Assessment

Classifying AI Systems

Catherine Aiken
| November 2021

This brief explores the development and testing of artificial intelligence system classification frameworks intended to distill AI systems into concise, comparable and policy-relevant dimensions. Comparing more than 1,800 system classifications, it points to several factors that increase the utility of a framework for human classification of AI systems and enable AI system management, risk assessment and governance.

AI Accidents: An Emerging Threat

Zachary Arnold and Helen Toner
| July 2021

As modern machine learning systems become more widely used, the potential costs of malfunctions grow. This policy brief describes how trends we already see today—both in newly deployed artificial intelligence systems and in older technologies—show how damaging the AI accidents of the future could be. It describes a wide range of hypothetical but realistic scenarios to illustrate the risks of AI accidents and offers concrete policy suggestions to reduce these risks.

Key Concepts in AI Safety: Interpretability in Machine Learning

Tim G. J. Rudner and Helen Toner
| March 2021

This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces interpretability as a means to enable assurance in modern machine learning systems.

Key Concepts in AI Safety: Robustness and Adversarial Examples

Tim G. J. Rudner and Helen Toner
| March 2021

This paper is the second installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces adversarial examples, a major challenge to robustness in modern machine learning systems.

Key Concepts in AI Safety: An Overview

Tim G. J. Rudner and Helen Toner
| March 2021

This paper is the first installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. In it, the authors introduce three categories of AI safety issues: problems of robustness, assurance, and specification. Other papers in this series elaborate on these and further key concepts.

AI Verification

Matthew Mittelsteadt
| February 2021

The rapid integration of artificial intelligence into military systems raises critical questions of ethics, design and safety. While many states and organizations have called for some form of “AI arms control,” few have discussed the technical details of verifying countries’ compliance with these regulations. This brief offers a starting point, defining the goals of “AI verification” and proposing several mechanisms to support arms inspections and continuous verification.

The rise of deepfakes could enhance the effectiveness of disinformation efforts by states, political parties and adversarial actors. How rapidly is this technology advancing, and who in reality might adopt it for malicious ends? This report offers a comprehensive deepfake threat assessment grounded in the latest machine learning research on generative models.