Artificial intelligence (AI) governance is a pressing policy issue. AI systems help us complete a wide range of tasks, from driving to the store or vacuuming to diagnosing illnesses and providing disaster relief, and they are rapidly being adopted to assist with, or independently complete, many more. The result is a deserved focus on the safe development and deployment of AI systems. Governments are putting forth AI ethics principles, compiling AI inventories, and mandating AI risk assessments. But efforts to ensure AI systems are safe and effective require a standardized approach to classifying the varied types of AI systems in use.
Classifying AI systems involves identifying a set of observable system characteristics and assigning each system a predefined label for each characteristic. For example, for every AI system, we can observe its autonomy and assign it a predefined autonomy level that informs our understanding of what kind of system it is and how it works. A framework combines the identified characteristics, referred to as framework dimensions, and describes each AI system along those dimensions. Using such a framework, system developers, governing bodies, and users can classify systems in a uniform way and use those classifications to inform consequential decisions about AI technologies, monitor risk and bias, and manage system inventories.
To that end, CSET partnered with the Organization for Economic Cooperation and Development (OECD) AI Policy Observatory and U.S. Department of Homeland Security (DHS) Office of Strategy, Policy, and Plans to develop several frameworks for classifying AI systems.1 Each framework was built following the same process:
- Identify policy-relevant characteristics of AI systems (e.g., autonomy, impact, data collection method) to be the framework dimensions;
- Define a set of levels for each dimension (e.g., low, medium, high) into which an AI system can be assigned.
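To make the two-step process concrete, here is a minimal sketch of a framework as a data structure: a set of dimensions, each with a fixed list of levels, and a check that a classification assigns exactly one predefined level per dimension. The `Framework` class, the `data_collection` levels, and the example labels are hypothetical and chosen for illustration only; they are not the frameworks CSET tested.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    """A classification framework: each dimension has a fixed set of levels."""
    name: str
    dimensions: dict[str, tuple[str, ...]]  # dimension name -> allowed levels

    def classify(self, labels: dict[str, str]) -> dict[str, str]:
        """Return the labels if they assign exactly one predefined level per dimension."""
        missing = set(self.dimensions) - set(labels)
        unknown = set(labels) - set(self.dimensions)
        if missing or unknown:
            raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
        for dim, level in labels.items():
            if level not in self.dimensions[dim]:
                raise ValueError(f"{level!r} is not a defined level of {dim!r}")
        return dict(labels)


# Hypothetical framework: dimension names echo the examples in the text,
# but the data_collection levels are invented for illustration.
example = Framework(
    name="example",
    dimensions={
        "autonomy": ("low", "medium", "high"),
        "impact": ("low", "medium", "high"),
        "data_collection": ("human", "automated", "mixed"),
    },
)

# Hypothetical system labels, not taken from the report.
hypothetical_system = example.classify(
    {"autonomy": "high", "impact": "low", "data_collection": "automated"}
)
```

Rejecting labels outside the predefined levels is what keeps classifications uniform across users: every system ends up described in the same vocabulary.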
To test the usability of the four resulting frameworks, CSET fielded two rounds of a survey experiment in which more than 360 respondents completed over 1,800 unique AI system classifications using the frameworks. We tested individuals' ability to assign consistent and accurate classifications across frameworks to provide insight into how the public and policymakers could use the frameworks and which frameworks may be most effective.
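This summary does not spell out the exact scoring rule, so the sketch below rests on stated assumptions. It treats a classification as accurate when it matches a reference classification, and it uses the share of respondents choosing the most common level as a rough consistency proxy. The function names, the reference labels, and the toy responses are all hypothetical and are not survey results.

```python
from collections import Counter


def share_matching(responses, reference, dims=None):
    """Fraction of responses that match the reference level on every dimension in dims."""
    if dims is None:
        dims = list(reference)
    if not responses:
        return 0.0
    hits = sum(all(r.get(d) == reference[d] for d in dims) for r in responses)
    return hits / len(responses)


def per_dimension_accuracy(responses, reference):
    """Accuracy computed separately for each dimension."""
    return {d: share_matching(responses, reference, [d]) for d in reference}


def modal_agreement(responses, dim):
    """Consistency proxy: share of responses that chose the most common level."""
    counts = Counter(r[dim] for r in responses)
    return counts.most_common(1)[0][1] / len(responses)


# Toy data invented for illustration; these are not survey results.
reference = {"autonomy": "medium", "impact": "high"}
responses = [
    {"autonomy": "medium", "impact": "high"},
    {"autonomy": "high", "impact": "high"},
    {"autonomy": "low", "impact": "high"},
    {"autonomy": "medium", "impact": "high"},
]

overall = share_matching(responses, reference)               # 0.5
by_dim = per_dimension_accuracy(responses, reference)        # {'autonomy': 0.5, 'impact': 1.0}
autonomy_agreement = modal_agreement(responses, "autonomy")  # 0.5
```

Scoring each dimension separately, as `per_dimension_accuracy` does, is what lets an analysis separate, say, impact from autonomy rather than reporting only an all-or-nothing match rate.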
Based on the survey experiment, we find:
- Certain frameworks produced more consistent and accurate classifications. Higher-performing frameworks more than doubled the percentage of consistent and accurate classifications, compared to the lowest-performing framework.
- Including a summary rubric of framework dimensions improves classification. Consistent and accurate classifications decreased significantly when users were not provided with a rubric, compared with instances when a rubric was provided.
- Users were better at classifying an AI system’s impact level than its autonomy level. Across all frameworks, users consistently assigned the accurate impact level but struggled to consistently assign the accurate autonomy level. Users were also better at classifying a system’s deployment context than its technical characteristics.
- Users were better at classifying an AI system’s autonomy level when the framework provided more descriptive levels. The share of consistent and accurate classifications was higher when system autonomy levels were labeled “action,” “decision,” and “perception” rather than “high,” “medium,” and “low.”
- Classification depends on sufficient accessible information about the system. We found classifications were more varied when system descriptions did not include a specific use case, suggesting that the provided descriptions shaped how well users could classify the systems. More broadly, this process showed that classifying technical characteristics requires more information than is typically available about an AI system.
- CSET did not receive any funding for this research from the U.S. Department of Homeland Security or any other government entity. The views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DHS, and do not constitute a DHS endorsement of the rubric tested or evaluated.