Key Concepts in AI Safety: An Overview

Tim G. J. Rudner and Helen Toner

March 2021

This paper is the first installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. In it, the authors introduce three categories of AI safety issues: problems of robustness, assurance, and specification. Other papers in this series elaborate on these and further key concepts.

Introduction

The past decade has seen the emergence of modern artificial intelligence and a variety of AI-powered technological innovations. This rapid transformation has predominantly been driven by machine learning, a subfield of AI in which computers learn patterns and form associations based on data. Machine learning has achieved success in application areas including image classification and generation, speech and text generation, and decision making in complex environments such as autonomous driving, video games, and strategy board games.

However, unlike the mathematical and computational tools commonly used in engineering, modern machine learning methods do not come with safety guarantees. While advances in fields such as control theory have made it possible to build complex physical systems, like those found in various types of aircraft and automobiles, that are validated and guaranteed to have an extremely low chance of failure, we do not yet have ways to produce similar guarantees for modern machine learning systems. As a result, many machine learning systems cannot be deployed without risking the system encountering a previously unknown scenario that causes it to fail.

The risk of system failures causing significant harm increases as machine learning becomes more widely used, especially in areas where safety and security are critical. To mitigate this risk, research into “safe” machine learning seeks to identify potential causes of unintended behavior in machine learning systems and develop tools to reduce the likelihood of such behavior occurring. This area of research is referred to as “AI safety” and focuses on technical solutions to ensure that AI systems operate safely and reliably. Many other challenges related to the safe deployment of AI systems—such as how to integrate them into existing networks, how to train operators to work effectively with them, and so on—are worthy of substantial attention, but are not covered here.

Problems in AI safety can be grouped into three categories: robustness, assurance, and specification. Robustness is about guaranteeing that a system continues to operate within safe limits even in unfamiliar settings; assurance seeks to establish that the system can be analyzed and understood easily by human operators; and specification is concerned with ensuring that the system’s behavior aligns with the system designer’s intentions.

Citation

Tim G. J. Rudner and Helen Toner, "Key Concepts in AI Safety: An Overview" (Center for Security and Emerging Technology, March 2021). https://doi.org/10.51593/20190040.

Other briefs in this series:

Key Concepts in AI Safety: Robustness and Adversarial Examples

March 2021

This paper is the second installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure…

Key Concepts in AI Safety: Interpretability in Machine Learning

March 2021

This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure…

Key Concepts in AI Safety: Specification in Machine Learning

November 2021

This paper is the fourth installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure…

Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning

June 2024

This paper is the fifth installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure…
