This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces interpretability as a means to enable assurance in modern machine learning systems.
This paper is the second installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces adversarial examples, a major challenge to robustness in modern machine learning systems.
This paper is the first installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. In it, the authors introduce three categories of AI safety issues: problems of robustness, assurance, and specification. Other papers in this series elaborate on these and further key concepts.
CSET Research Fellow Margarita Konaev and Research Analyst Husanjot Chahal discuss research gaps on trust in human-machine teaming and how to build trustworthy AI systems for military systems and missions.
Among great powers, AI has become a new focus of competition due to its potential to transform the character of conflict and disrupt the military balance. This policy brief considers alternative paths toward AI safety and security.