Tag Archive: AI safety

Key Concepts in AI Safety: Interpretability in Machine Learning

Tim G. J. Rudner and Helen Toner | March 2021

This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces interpretability as a means to enable assurance in modern machine learning systems.
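As a concrete illustration of one common interpretability technique, the sketch below computes a gradient-based saliency map, which highlights the input pixels a model's prediction is most sensitive to. This is a minimal sketch assuming PyTorch; the one-layer model and random input are hypothetical stand-ins, not code from the paper.

```python
import torch
import torch.nn as nn

# Toy classifier standing in for any differentiable model (hypothetical).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

# Random "image" standing in for real data (hypothetical).
x = torch.rand(1, 1, 28, 28, requires_grad=True)

# Gradient of the predicted class's score with respect to each input pixel.
scores = model(x)
top_class = scores.argmax(dim=1).item()
scores[0, top_class].backward()

# Large absolute gradients mark pixels the prediction is most sensitive to.
saliency = x.grad.abs().squeeze()
print(saliency.shape)  # torch.Size([28, 28])
```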

Key Concepts in AI Safety: Robustness and Adversarial Examples

Tim G. J. Rudner and Helen Toner | March 2021

This paper is the second installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces adversarial examples, a major challenge to robustness in modern machine learning systems.
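The fast gradient sign method (FGSM) is one of the best-known ways to construct adversarial examples: it perturbs each input pixel by a small step in the direction that increases the model's loss. The sketch below is a minimal illustration assuming PyTorch, not code from the paper; the toy model, random input, and label are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Toy classifier standing in for any differentiable model (hypothetical).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

# Random "image" and label standing in for real data (hypothetical).
x = torch.rand(1, 1, 28, 28, requires_grad=True)
y = torch.tensor([3])

# FGSM: one small step per pixel in the direction that increases the loss.
loss = loss_fn(model(x), y)
loss.backward()
epsilon = 0.1  # perturbation budget
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

# Each pixel changes by at most epsilon, yet the prediction can flip.
print((x_adv - x).abs().max().item())
```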

Key Concepts in AI Safety: An Overview

Tim G. J. Rudner and Helen Toner | March 2021

This paper is the first installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. In it, the authors introduce three categories of AI safety issues: problems of robustness, assurance, and specification. Other papers in the series elaborate on these and other key concepts.

Building trust in human-machine teams

Brookings Institution
| February 18, 2021

CSET Research Fellow Margarita Konaev and Research Analyst Husanjot Chahal discuss research gaps on trust in human-machine teaming and how to build trustworthy AI for military systems and missions.

Plus: a RAND report on DOD, an OECD report on the semiconductor industry, and the AI Index

Tim G. J. Rudner is a Non-Resident AI/ML Fellow. He conducts research on probabilistic machine learning, reinforcement learning, and AI safety.

Among great powers, AI has become a new focus of competition due to its potential to transform the character of conflict and disrupt the military balance. This policy brief considers alternative paths toward AI safety and security.