Executive Summary
Policymakers are debating the risks that advanced artificial intelligence (AI) technologies can pose if intentionally misused, from generating content for disinformation campaigns to instructing a novice in how to build a biological agent. Because the technology is improving rapidly and the potential dangers remain unclear, assessing risk is an ongoing challenge.
Malicious-use risks are often treated as a function of the likelihood and severity of the behavior in question. We focus on the likelihood that an AI system is misused for a particular application and leave severity assessments to future research.
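In standard risk-assessment terms, this division of labor corresponds to a simple decomposition. The notation below is illustrative; the brief itself states the relationship only in prose.

```latex
% Common risk decomposition (illustrative, not an equation from the brief):
\text{Risk} \;=\; \text{Likelihood of misuse} \times \text{Severity of harm}
```

The stages described next progressively reduce uncertainty about the likelihood term; the severity term is left out of scope.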
There are many strategies to reduce uncertainty about whether a particular AI system (call it X) is likely to be misused for a specific malicious application (call it Y). We describe how researchers can assess the likelihood of malicious use of advanced AI systems at three stages:
- Plausibility (P)
- Performance (P)
- Observed use (Ou)
Plausibility tests consider whether system X can do behavior Y at all. Performance tests ask how well X can perform Y. Tracking observed use reveals whether X is actually used to do Y in the real world.
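To make the staged structure concrete, the sketch below models the three assessments as an ordered pipeline. It is a hypothetical illustration under our own naming, not code from the paper; every identifier in it is invented.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The three PPOu stages, ordered by increasing real-world evidence."""
    PLAUSIBILITY = 1  # Can system X do behavior Y at all?
    PERFORMANCE = 2   # How well can X perform Y?
    OBSERVED_USE = 3  # Is X actually used to do Y in the real world?


@dataclass
class Finding:
    system: str    # the AI system under assessment (X)
    behavior: str  # the malicious application of concern (Y)
    stage: Stage
    evidence: str  # what the assessment at this stage produced


# Hypothetical walk through the stages for one system/behavior pair.
findings = [
    Finding("X", "Y", Stage.PLAUSIBILITY, "red-teamers elicited Y at least once"),
    Finding("X", "Y", Stage.PERFORMANCE, "Y succeeded in 12 of 100 benchmark trials"),
    Finding("X", "Y", Stage.OBSERVED_USE, "no confirmed real-world incidents to date"),
]

for f in findings:
    print(f"{f.stage.name}: {f.evidence}")
```

The ordering mirrors the framework: each successive stage supplies stronger evidence about likelihood than the one before it.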
Familiarity with these three stages of assessment—including the methods used at each stage, along with their limitations—can help policymakers critically evaluate claims about AI misuse threats, contextualize headlines describing research findings, and understand the work of the newly created network of AI safety institutes.
––––––––
This Issue Brief summarizes the key points in: Josh A. Goldstein and Girish Sastry, “The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems,” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, no. 1 (2024): 503–518.