Data Brief

The Inigo Montoya Problem for Trustworthy AI

The Use of Keywords in Policy and Research

Emelia Probasco, Autumn Toney, and Kathleen Curlee

June 2023

When the technology and policy communities use terms associated with trustworthy AI, could they be talking past one another? This paper examines the use of trustworthy AI keywords and the potential for an “Inigo Montoya problem” in trustworthy AI, inspired by "The Princess Bride" movie quote: “You keep using that word. I do not think it means what you think it means.”

Executive Summary

While top-level principles regarding trustworthy, ethical, and responsible artificial intelligence and machine learning (ML) are critical to the formation of international norms, so too is the detailed work of the academic and research communities in establishing precise framings, techniques, and tools that will help create or assess trustworthy AI. An obvious interest among policymakers, therefore, is to understand and assess where the technical community may be making progress that can be harnessed, and where policymakers would do well to support or otherwise incentivize more activity.

Understanding where progress is being made in developing trustworthy AI is complicated. First, the field of AI/ML is advancing rapidly, with new tools and techniques emerging in quick succession. Second, trustworthy AI is a nascent, multifaceted concept that is hard to bound. And third, policymakers and technical researchers may be talking past each other, at least in the published literature, by using the same key terms to describe trustworthy AI but ascribing different meanings to them.

This paper aims to assist technology policymakers interested in trustworthy AI by examining the use of trustworthy AI keywords in AI research publications and whether or not that use overlaps with how the research and development community uses the same terms. Drawing on the National Institute of Standards and Technology’s AI Risk Management Framework (NIST AI RMF), the paper defines a set of terms related to trustworthy AI and analyzes 2.3 million AI-related research publications from 2010 to 2021, with the following findings:

Roughly 14 percent of AI papers between 2010 and 2021 include at least one of 13 trustworthy AI keywords (322,209 keyword papers). The growth in the number of publications using these terms exceeds the growth of research on AI generally in the past five years.
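As a rough illustration of the counting behind this figure, the sketch below flags papers whose title or abstract mentions at least one keyword and computes the share of flagged papers. It is a minimal sketch, not the study's actual pipeline: the whole-word, case-insensitive matching rule, the function names, and the toy titles are assumptions, and the keyword list contains only the ten terms named in this summary (the full set of 13 is not listed here).

```python
import re

# The ten keywords named in this summary. The study uses 13; the other
# three are not listed here, so they are deliberately left out.
KEYWORDS = [
    "reliability", "robustness", "safety", "security", "resilience",
    "bias", "explainability", "interpretability", "transparency",
    "accountability",
]

# Whole-word, case-insensitive matching (an assumed rule, not the study's).
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def matched_keywords(text: str) -> set[str]:
    """Return the keywords that appear in a title or abstract."""
    return {match.lower() for match in PATTERN.findall(text)}


def keyword_share(papers: list[str]) -> float:
    """Fraction of papers that mention at least one keyword."""
    if not papers:
        return 0.0
    hits = sum(1 for text in papers if matched_keywords(text))
    return hits / len(papers)


# Hypothetical titles, invented for illustration only.
toy_papers = [
    "Improving the robustness of image classifiers",
    "A faster sorting network",
    "Bias and transparency in hiring models",
    "Graph embeddings for recommendation",
]

print(keyword_share(toy_papers))      # share of keyword papers in the toy corpus
print(f"{322_209 / 2_300_000:.1%}")   # share implied by the summary's rounded figures
```

The last line simply divides the 322,209 keyword papers by the roughly 2.3 million AI papers reported above, which is consistent with the "roughly 14 percent" figure.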

A review of the titles and abstracts of the most cited papers with a trustworthy AI keyword in 2021 reveals that researchers are using most of the keywords in ways that align with the intent of the NIST AI RMF. However, tracking trends in trustworthy AI research through keywords can be misleading, because not all papers that use a trustworthy AI keyword actually discuss that subject, and some keywords are used in different contexts more often than others. For example:
- The keywords reliability and robustness are the most frequently mentioned trustworthy AI terms in publications, and most of the titles and abstracts reviewed for this study indicate that the terms are used in ways that align with NIST’s AI RMF. These terms may appear frequently in part because they are commonly expected evaluation metrics used widely in AI research, a pattern noted in the review of titles and abstracts. However, in the case of reliability, a significant minority of papers use the term in research on how AI could improve the reliability of a non-AI system.
- Like reliability, the terms safety, security, and resilience are frequently used with varying meanings. While most of the titles and abstracts reviewed for this study use these terms in ways that align with NIST’s AI RMF definitions, a significant minority use them in research on how AI could improve the reliability, safety, security, and/or resilience of a non-AI system.
- While the keyword bias is frequently used in policy conversations in the context of mitigating or avoiding the harmful effects of discrimination, in AI publications it has two main uses: a technical one, describing meaningful components of an algorithm, and one describing unfair discrimination. NIST’s definition accounts for both, though it focuses on mitigating harmful bias in the sense of unfair discrimination. Researchers are evenly split between these two uses of the word bias.
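The examples above suggest why a raw keyword count can overstate how much research addresses trustworthy AI itself. One way to make that visible is to tally, for each keyword, how reviewed papers actually use the term. The sketch below is a minimal illustration under stated assumptions: the usage category names, the `usage_breakdown` helper, and the review labels are all hypothetical and are not the study's coding scheme or data.

```python
from collections import Counter

# Hypothetical review labels, invented for illustration only. Each tuple is
# (keyword, how the reviewed paper uses that keyword).
reviewed = [
    ("reliability", "about_ai_system"),
    ("reliability", "ai_applied_to_other_system"),
    ("reliability", "about_ai_system"),
    ("bias", "unfair_discrimination"),
    ("bias", "technical_term"),
]


def usage_breakdown(labels):
    """Per keyword, the fraction of reviewed papers in each usage category."""
    counts = {}
    for keyword, usage in labels:
        counts.setdefault(keyword, Counter())[usage] += 1
    return {
        keyword: {usage: n / sum(tally.values()) for usage, n in tally.items()}
        for keyword, tally in counts.items()
    }


breakdown = usage_breakdown(reviewed)
for keyword, shares in breakdown.items():
    print(keyword, shares)
```

A keyword whose reviewed papers fall mostly in one category is a more dependable trend signal than one whose uses are split, which is the distinction the bullets above draw between terms.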

Many publications that use the terms explainability, interpretability, transparency, and accountability are referencing how to develop AI models and systems that an end-user can trust, specifically in the context of the Explainable AI (XAI) research area. This is interesting, because, while trustworthy AI is not currently considered a research area, XAI has developed into one. Although the terms explainability and interpretability can be confusing to non-experts, they appear to be distinct and core to the XAI area of research.

Citation

Emelia Probasco, Autumn Toney, and Kathleen Curlee, "The Inigo Montoya Problem for Trustworthy AI" (Center for Security and Emerging Technology, June 2023). https://doi.org/10.51593/20230014a
