Other briefs in this series:
- Key Concepts in AI Safety: An Overview
- Key Concepts in AI Safety: Robustness and Adversarial Examples
- Key Concepts in AI Safety: Interpretability in Machine Learning
- Key Concepts in AI Safety: Specification in Machine Learning
Introduction
The last decade of progress in machine learning research has given rise to systems that are surprisingly capable but also notoriously unreliable. The chatbot ChatGPT, developed by OpenAI, provides a good illustration of this tension. Users interacting with the system after its release in November 2022 quickly found that while it could adeptly find bugs in programming code and author Seinfeld scenes, it could also be confounded by simple tasks. For example, one dialogue showed the bot claiming that the fastest marine mammal was the peregrine falcon, then changing its mind to the sailfish, then back to the falcon, despite the obvious fact that neither of these choices is a mammal. This kind of uneven performance is characteristic of deep learning systems (the type of AI system that has seen the most progress in recent years) and presents a significant challenge to their deployment in real-world contexts.
An intuitive way to handle this problem is to build machine learning systems that “know what they don’t know”, that is, systems that can recognize and account for situations where they are more likely to make mistakes. For instance, a chatbot could display a confidence score next to its answers, or an autonomous vehicle could sound an alarm when it finds itself in a scenario it cannot handle. That way, the system could be useful in situations where it performs well, and harmless in situations where it does not. This could be especially valuable for AI systems that are used in a wide range of settings, such as large language models (the technology that powers chatbots like ChatGPT), since these systems are very likely to encounter scenarios that diverge from the conditions they were trained and tested under.
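To make this idea concrete, the sketch below shows one simple pattern for a system that “knows what it doesn’t know”: it attaches a confidence score to its best answer and abstains when that score falls below a threshold. The toy classifier, its hard-coded scores, and the 0.70 cutoff are illustrative assumptions, not a description of how ChatGPT or any deployed system works.

```python
import math

# Illustrative sketch only: a toy "classifier" with hard-coded scores standing in
# for a trained model. The labels, scores, and threshold are assumptions made
# for illustration, not the behavior of any real system.
def toy_classifier(question: str) -> dict:
    # Hypothetical raw scores (logits) for three candidate answers,
    # returned regardless of the question asked.
    logits = {"sailfish": 2.0, "peregrine falcon": 1.5, "orca": 1.8}
    total = sum(math.exp(v) for v in logits.values())
    return {label: math.exp(v) / total for label, v in logits.items()}

CONFIDENCE_THRESHOLD = 0.70  # assumed cutoff; choosing it well is itself hard

def answer_with_confidence(question: str) -> str:
    scores = toy_classifier(question)
    best_label, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score >= CONFIDENCE_THRESHOLD:
        # Confident enough: answer and display the confidence score.
        return f"{best_label} (confidence: {best_score:.0%})"
    # Not confident enough: abstain instead of guessing.
    return f"I'm not sure (top guess '{best_label}' at only {best_score:.0%})"

print(answer_with_confidence("What is the fastest marine mammal?"))
```

The thresholding logic here is trivial; the hard part, as discussed below, is producing confidence scores that genuinely reflect how often the system is right.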
Unfortunately, designing machine learning systems that can recognize their limits is more challenging than it may appear at first glance. In fact, enabling machine learning systems to “know what they don’t know”, known in technical circles as “uncertainty quantification”, is an open and widely studied research problem within machine learning. This brief gives an introduction to how uncertainty quantification works, why it is difficult, and what the prospects are for the future.