Other briefs in this series:
- Key Concepts in AI Safety: Robustness and Adversarial Examples
- Key Concepts in AI Safety: Interpretability in Machine Learning
- Key Concepts in AI Safety: Specification in Machine Learning
- Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning
Introduction
The past decade has seen the emergence of modern artificial intelligence and a variety of AI-powered technological innovations. This rapid transformation has predominantly been driven by machine learning, a subfield of AI in which computers learn patterns and form associations based on data. Machine learning has achieved success in application areas including image classification and generation, speech and text generation, and decision making in complex environments such as autonomous driving, video games, and strategy board games.
However, unlike the mathematical and computational tools commonly used in engineering, modern machine learning methods do not come with safety guarantees. While advances in fields such as control theory have made it possible to build complex physical systems, such as aircraft and automobiles, that are validated and guaranteed to have an extremely low chance of failure, we do not yet have ways to produce similar guarantees for modern machine learning systems. As a result, many machine learning systems cannot be deployed without the risk that they will encounter a previously unseen scenario that causes them to fail.
The risk of system failures causing significant harm increases as machine learning becomes more widely used, especially in areas where safety and security are critical. To mitigate this risk, research into “safe” machine learning seeks to identify potential causes of unintended behavior in machine learning systems and to develop tools to reduce the likelihood of such behavior occurring. This area of research is referred to as “AI safety” and focuses on technical solutions to ensure that AI systems operate safely and reliably. Many other challenges related to the safe deployment of AI systems, such as how to integrate them into existing networks and how to train operators to work effectively with them, are worthy of substantial attention but are not covered here.
Problems in AI safety can be grouped into three categories: robustness, assurance, and specification. Robustness guarantees that a system continues to operate within safe limits even in unfamiliar settings; assurance seeks to establish that it can be analyzed and understood easily by human operators; and specification is concerned with ensuring that its behavior aligns with the system designer’s intentions.