Red-teaming is a popular evaluation methodology for AI systems, but it is still severely lacking in theoretical grounding and technical best practices. This blog introduces the concept of threat modeling for AI red-teaming and explores the ways that software tools can support or hinder red teams. To do effective evaluations, red-team designers should ensure their tools fit with their threat model and their testers.
As AI agents become more autonomous and capable, organizations need new approaches to deploy them safely at scale. This explainer introduces the rapidly growing field of AI control, which offers practical techniques for organizations to get useful outputs from AI agents even when the AI agents attempt to misbehave.
Kyle Crichton, Abhiram Reddy, Jessica Ji, Ali Crawford, Mia Hoffmann, Colin Shea-Blymyer, and John Bansemer
| September 2025
Organizations looking to adopt artificial intelligence (AI) systems face the challenge of deciphering a myriad of voluntary standards and best practices—requiring time, resources, and expertise that many cannot afford. To address this problem, this report distills over 7,000 recommended practices from 52 reports into a single harmonized framework. Integrating new AI guidance with existing safety and security practices, this work provides a road map for organizations navigating the complex landscape of AI guidance.
CSET’s Jessica Ji shared her expert analysis in an interview published by Science News. The interview discusses the U.S. government’s new action plan to integrate artificial intelligence into federal operations and highlights the significant privacy, cybersecurity, and civil liberties risks of using AI tools on consolidated sensitive data, such as health, financial, and personal records.
Frontier AI capabilities show no sign of slowing down so that governance can catch up, yet national security challenges need addressing in the near term. This blog post outlines a governance approach that complements existing commitments by AI companies. This post argues the government should take targeted actions toward AI preparedness: sharing national security expertise, promoting transparency into frontier AI development, and facilitating the development of best practices.
Despite recent upheaval in the AI policy landscape, AI evaluations—including AI red-teaming—will remain fundamental to understanding and governing the usage of AI systems and their impact on society. This blog post draws from a December 2024 CSET workshop on AI testing to outline challenges associated with improving red-teaming and suggest recommendations on how to address those challenges.
As new advanced AI systems roll out, there is widespread disagreement about malicious use risks. Are bad actors likely to misuse these tools for harm? This report presents a simple framework to guide the questions researchers ask—and the tools they use—to evaluate the likelihood of malicious use.
Jessica Ji, Jenny Jun, Maggie Wu, and Rebecca Gelles
| November 2024
Artificial intelligence models have become increasingly adept at generating computer code. They are powerful and promising tools for software development across many industries, but they can also pose direct and indirect cybersecurity risks. This report identifies three broad categories of risk associated with AI code generation models and discusses their policy and cybersecurity implications.
Kyle Crichton, Jessica Ji, Kyle Miller, John Bansemer, Zachary Arnold, David Batz, Minwoo Choi, Marisa Decillis, Patricia Eke, Daniel M. Gerstein, Alex Leblang, Monty McGee, Greg Rattray, Luke Richards, and Alana Scott
| October 2024
As critical infrastructure operators and providers seek to harness the benefits of new artificial intelligence capabilities, they must also manage associated risks from both AI-enabled cyber threats and potential vulnerabilities in deployed AI systems. In June 2024, CSET led a workshop to assess these issues. This report synthesizes our findings, drawing on lessons from cybersecurity and insights from critical infrastructure sectors to identify challenges and potential risk mitigations associated with AI adoption.
This year, CSET researchers returned to the DEF CON cybersecurity conference to explore how understandings of AI red-teaming practices have evolved among cybersecurity practitioners and AI experts. This blog post, a companion to "How I Won DEF CON’s Generative AI Red-Teaming Challenge", summarizes our takeaways and concludes with a list of outstanding research questions regarding AI red-teaming, some of which CSET hopes to address in future work.
This website uses cookies.
To learn more, please review this policy. By continuing to browse the site, you agree to these terms.
This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.