Analysis

Key Concepts in AI Safety: Specification in Machine Learning

Tim G. J. Rudner

Helen Toner

December 2021

This paper is the fourth installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” outlined three categories of AI safety issues—problems of robustness, assurance, and specification—and the subsequent two papers described problems of robustness and assurance, respectively. This paper introduces specification as a key element in designing modern machine learning systems that operate as intended.

Download Full Report

Other briefs in this series:

Introduction

Specification is the task of conveying to a machine learning system what exactly its designers would like it to do.¹ For some tasks—such as choosing which tiles in a CAPTCHA test contain a traffic light—it is relatively straightforward for the designer of such a system to write a precise description of what they are looking for. For many other tasks, however, it is difficult to capture the nuances of the intentions in precise, mathematical language.

In some ways, this type of challenge is not unique to machines. One prominent computer scientist describes specification problems in terms of familiar fictional analogues: As in the cases of King Midas, the Sorcerer’s Apprentice, or the genie in the lamp, you get exactly what you wished for, which may not necessarily be what you wanted.² Principal–agent problems in economics deal with related situations, where a task is delegated from one person (the principal) to another (the agent), but the agent’s understanding or incentives may diverge from the principal’s. Specification problems in machine learning systems arise due to similar dynamics: In all but the simplest settings, it is challenging to convey and incentivize desired behaviors, which may in turn lead to undesired behaviors. Ensuring that a given specification of a machine learning system results in a specific desired behavior and is in accordance with its designer’s intentions is a key challenge for machine learning research. To guard against failures, the goal is for machine learning systems to be robust to potential errors or inaccuracies in the specification. Getting this right will become increasingly critical as machine learning systems are deployed in higher-stakes and more complex settings. This brief provides an overview of specification problems for a policy audience, introducing key concepts, offering real-world examples, and suggesting implications for policymakers.

Download Full Report

Key Concepts in AI Safety: Specification in Machine Learning

“Specification” can also be used to refer to other parts of the process of designing and training a machine learning system, such as model specification, but these are not included for the purposes of this paper.
Stuart Russell, “Of Myths and Moonshine,” Edge, November 2014, https://www.edge.org/conversation/the-myth-of-ai#26015.

Analysis

Key Concepts in AI Safety: An Overview

March 2021

This paper is the first installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure… Read More

Analysis

Key Concepts in AI Safety: Robustness and Adversarial Examples

March 2021

This paper is the second installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure… Read More

Analysis

Key Concepts in AI Safety: Interpretability in Machine Learning

March 2021

This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure… Read More

Analysis

Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning

June 2024

This paper is the fifth installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure… Read More

Center for Security and Emerging Technology

Analysis

Key Concepts in AI Safety: Specification in Machine Learning

Introduction

Download Full Report

Related Content

Key Concepts in AI Safety: An Overview

Key Concepts in AI Safety: Robustness and Adversarial Examples

Key Concepts in AI Safety: Interpretability in Machine Learning

Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning

Analysis

Key Concepts in AI Safety: Specification in Machine Learning

Introduction

Download Full Report

Related Content

Key Concepts in AI Safety: An Overview

Key Concepts in AI Safety: Robustness and Adversarial Examples

Key Concepts in AI Safety: Interpretability in Machine Learning

Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning

This website uses cookies.