AI regulation is rightfully garnering attention in the United States. Private sector developers have used congressional testimony and other venues to draw attention to the transformative potential of AI applications. Most recently, a number of companies also signed a voluntary pledge led by the White House to use safeguards in AI development. As policymakers decide how best to regulate AI, they first need to grasp the different types of harm that various AI applications might cause at the individual, national, and societal levels. At CSET, we recently developed a framework to categorize and analyze different AI harms based on several criteria. Policymakers can use CSET’s AI Harm Framework (and related annotation guide) to think holistically about different AI harms and how regulations might help those most affected by each type.
Components of AI Harm
AI systems can cause various types of harm. These harms can affect many different types of people, particularly vulnerable groups, but they can also create challenges related to the environment, property, infrastructure, and social structure. Thinking systematically about harm, its elements, and the various categories of AI harms can help us rigorously address different concerns that new AI applications might create.
Our research has identified four key components of AI harm:
- The AI system involved: Without an AI system, AI harm cannot occur. Examples of AI systems include chatbots powered by Large Language Models (LLMs), AI systems used to approve loans, AI-assisted driving in vehicles, and facial recognition used to unlock phones, among many others.
- The affected entity: Identifying or describing the affected entities helps us determine what harms occurred, the potential for future harm, or the risks associated with new systems. Examples include a person, company, organization, location, product, and the natural environment. Groups of individuals (e.g., females, minors, Catholics, Black Americans) and organizations can also be entities.
- The harm event or issue: This refers to the actual harm caused (an “event”) by the AI system or the potential harm (an “issue”) the system might one day cause. The harm can manifest in various ways, depending on the specific situation. For example, an individual being denied a loan for reasons that are incorrect, discriminatory, or illegal would be considered a harm event.
- The AI-linked connection to the harm: This component establishes a clear relationship between the AI system and the negative consequences. It helps us attribute the harm specifically to the AI system’s behavior. For example, a phone may have an embedded AI assistant, but that AI’s behavior is not directly linked to an expensive phone call placed by a child who was able to dial a random number. In contrast, an AI-enabled HR tool that uses names to recommend job applicants can be directly linked to harm if it produces discriminatory results.
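As a rough illustration, the four components above can be written down as a simple record with a check that all four are present. This is a minimal sketch, not CSET's annotation schema: the class name `HarmRecord`, its field names, and the helper functions are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HarmRecord:
    """Hypothetical record holding the four components described above."""
    ai_system: Optional[str]        # the AI system involved, if any
    affected_entity: Optional[str]  # who or what was (or could be) harmed
    harm: Optional[str]             # the harm event or issue
    ai_linked: bool                 # is the harm attributable to the AI's behavior?


def missing_components(record: HarmRecord) -> List[str]:
    """Return the names of the components that are absent from the record."""
    missing = []
    if not record.ai_system:
        missing.append("ai_system")
    if not record.affected_entity:
        missing.append("affected_entity")
    if not record.harm:
        missing.append("harm")
    if not record.ai_linked:
        missing.append("ai_linked")
    return missing


def is_ai_harm(record: HarmRecord) -> bool:
    """A record counts as AI harm here only if all four components are present."""
    return not missing_components(record)


# The two contrasting examples from the text.
hr_tool = HarmRecord(
    ai_system="AI-enabled HR tool",
    affected_entity="job applicants",
    harm="discriminatory recommendations",
    ai_linked=True,
)
phone_call = HarmRecord(
    ai_system="embedded AI assistant",
    affected_entity="phone owner",
    harm="expensive phone call",
    ai_linked=False,  # the child dialed the number; the assistant played no part
)
```

Under these assumptions, `is_ai_harm(hr_tool)` is true, while `missing_components(phone_call)` reports only the missing AI-linked connection. In practice, deciding whether a link exists takes human judgment that a boolean field cannot capture.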
Route to AI Harms
With these four components in mind, we can categorize a harm by how a particular AI system was involved:
- Unintentional harm caused by an AI system: When an AI system behaves unexpectedly or produces unforeseen outcomes, it can harm entities unintentionally, or in ways not envisioned by its developers. Failures in AI applications usually stem from differences in data, users, or conditions compared to those in which the AI was created, or from other unintended biases. For example, some facial recognition systems have been shown to struggle to identify individuals with darker skin tones because they were mostly trained on images of lighter-skinned people. This has led to unintentional harm based on race or ethnicity. Developing robust and reliable AI systems, and restricting how a model or its outputs can be used, can help reduce unintended harm.
- Intentional harm caused by an AI system: An AI system can be designed to harm other entities intentionally. For example, AI-enabled weapon systems are explicitly developed to cause harm to a selected target. Additionally, criminals or “bad actors” may misuse AI systems for harmful purposes, bypassing safety measures or employing novel applications for malicious or illegal activities. In 2019, an AI-enabled scam using a deepfake defrauded a UK firm of €220,000.
- Harm caused to an AI system: A perhaps less commonly considered category of AI harm is one where entities deliberately harm the AI system itself. The entity responsible can be a person, software, or even another AI system. This mode often relates to cybersecurity and vulnerabilities, including actions that compromise the AI system’s performance, extract sensitive data, or manipulate model parameters.
Given how fast AI is advancing, thinking about harms in the context of existing systems is insufficient for AI safety purposes. Policymakers and regulators also need an idea of harms that might occur in the future to ensure that their efforts do not rapidly become obsolete. We can therefore consider harms on a temporal spectrum, ranging from those that have definitively occurred, through those that are probable or implied, to those that might speculatively occur one day. While harms that have definitively occurred clearly involve all four of the key components discussed above, it is helpful to break out the nuanced differences between probable, implied, and speculative harms.
- Occurred harms have definitively happened and are harm events. The harm may be something easily observable like injury or physical damage. Alternatively, it may not be directly observable, as with mental or psychological harm, pain and suffering, or harm to intangible property (for example, IP theft or damage to a company’s reputation). An example of an ‘occurred’ harm is a fatal collision between a pedestrian and a self-driving Uber in Arizona in 2018. The accident happened because the car’s AI model struggled to classify jaywalkers as pedestrians, which delayed the AI’s recognition of the pedestrian.
- Probable harms involve a known risk or vulnerability of harm with deployed AI systems. While a specific harm may not yet have happened, there is a possibility that harm could imminently occur. Clearly-defined entities are currently at risk of experiencing harm. Thus, in order for a harm to be probable, the AI system must be currently deployed or used in conditions where its behavior could impact an entity. Awareness of potential AI harms is critical to mitigating risk since harms that may occur can quickly turn into harms that have actually occurred. For example, in 2018, actor and director Jordan Peele created a fake public service announcement in which a deepfake video of former U.S. President Barack Obama warned about the ease of creating and the potential harm stemming from deepfake videos. Since then, the use of AI-generated videos and images for political disinformation has increased and gone international, showing up in Ukraine, Turkey, and other states. The potential demonstrated in Peele’s PSA eventually became real.
- Implied harms involve situations where we cannot definitively say that an entity is at risk of imminent AI harm, but we have supporting evidence that harm could reasonably occur. For example, a 2020 study looked at AI-produced diagnostic labels for X-ray images. The study built a diagnostic AI using common training datasets that are typically used in the development of currently deployed AI. The study found that its AI under-diagnosed some subpopulations (based on gender, race, and socioeconomic status) because these subpopulations were underrepresented in the training data. The study did not examine results from known deployed, in-use AI systems and cannot definitively say that those deployed systems under-diagnose subpopulations. However, the study does make a credible argument that those systems are likely to under-diagnose and thus are associated with implied harm.
- Speculative harms are usually thought experiments that rely on non-existent AI capabilities or systems or speculate about future conditions that might come to pass. For speculative harms, there are no entities currently exposed to the harm. For example, a U.S. AI system that accidentally launches nuclear weapons and kills millions is a speculative AI harm, since the United States does not have an AI system that controls nuclear weapons and it has publicly committed to restricting AI control over nuclear weapons.
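The four temporal categories above can be read as a rough decision sequence. The sketch below is a simplification for illustration only: the enum `HarmStatus`, the function `classify_harm`, and its four yes/no inputs are hypothetical names, and real annotation involves judgment that these booleans do not capture.

```python
from enum import Enum


class HarmStatus(Enum):
    OCCURRED = "occurred"
    PROBABLE = "probable"
    IMPLIED = "implied"
    SPECULATIVE = "speculative"


def classify_harm(
    harm_happened: bool,
    system_deployed: bool,
    entities_at_risk: bool,
    supporting_evidence: bool,
) -> HarmStatus:
    """Map four yes/no answers onto the temporal categories described above."""
    if harm_happened:
        # A harm event has definitively taken place.
        return HarmStatus.OCCURRED
    if system_deployed and entities_at_risk:
        # Known risk with a deployed system and clearly defined exposed entities.
        return HarmStatus.PROBABLE
    if supporting_evidence:
        # No definitive exposure, but credible evidence that harm could occur.
        return HarmStatus.IMPLIED
    # Relies on capabilities, systems, or conditions that do not currently exist.
    return HarmStatus.SPECULATIVE


# The examples from the text, encoded under the assumptions above.
uber_collision = classify_harm(True, True, True, True)
xray_study = classify_harm(False, False, False, True)
nuclear_launch = classify_harm(False, False, False, False)
```

Here the 2018 Uber collision maps to occurred, the X-ray study to implied (the study's own system was not a deployed one), and the nuclear launch scenario to speculative. The order of the checks matters: a harm that has happened is classified as occurred regardless of the other answers.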
All AI harms, whether occurred, probable, implied, or speculative, are significant. Where AI harms have already occurred, we may need to change the AI system and our harm mitigation processes. Probable AI harms point to vulnerabilities or risks that should be managed or mitigated before any actual harm takes place. Finally, it is important to foresee and prepare for speculative AI harms that could have severe and irreversible consequences. Ideally, society and governments should establish safeguards, practices, agreements, and policies that prevent these speculative AI harms.
Differentiating between Simple and Complex Harms
CSET’s framework clearly breaks down the elements and types of AI harm analytically but, in the real world, AI harms may differ greatly across various applications and fields. AI harm can be straightforward, involving one AI system and a single harmed entity, but it can also be more complex and multifaceted. AI harm can involve multiple harmed entities, AI systems, or types of harm.
It is relatively common for an instance or event of AI harm to create multiple harms. This could involve harming multiple entities, harming the same entity in multiple ways, or both. AI harm can also lead to cascading harm, in which an initial harm directly or indirectly leads to additional harm. For example, in 2018, investigative journalists discovered that the Dutch Tax Authority had been using a risk prediction system to detect welfare fraud, and that the system had wrongfully flagged thousands of families as committing childcare benefits fraud. In addition to terminating benefits payments and plunging thousands of households into debt (tangible harm in the form of financial loss), the model disproportionately accused immigrant families because it considered a second nationality a high-risk factor (intangible harm in the form of disparate treatment).
It is also possible for an AI system to be both the harmed and harming entity. Consider a notional cyberattack on an AI embedded in an autonomous vehicle. If that cyberattack were used to control the vehicle’s AI system and crash the vehicle into a building, the embedded AI system would be both the harmed and harming entity. An entity (the people or software performing the cyberattack) harmed an AI system. In turn, other entities (the building and vehicle that sustained physical damage) were intentionally harmed by that AI system.
Thinking about the various components and types of AI harms systematically can help policymakers and regulators capitalize on AI’s potential safely and responsibly. We believe our framework is applicable across the different branches of government and agencies that AI will impact, and that our insights hold for those in Congress as much as they do for those in the Departments of Defense, Commerce, or Homeland Security.