Executive Summary
Recent developments have improved the ability of large language models (LLMs) and other AI systems to generate computer code. While this is promising for the field of software development, these models can also pose direct and indirect cybersecurity risks. In this paper, we identify three broad categories of risk associated with AI code generation models: 1) models generating insecure code, 2) models themselves being vulnerable to attack and manipulation, and 3) downstream cybersecurity impacts such as feedback loops in training future AI systems.
Existing research has shown that, under experimental conditions, AI code generation models frequently output insecure code. However, evaluating the security of AI-generated code is a highly complex process with many interdependent variables. To further explore the risk of insecure AI-written code, we evaluated code generated by five LLMs. Each model was given the same set of prompts, designed to test likely scenarios in which buggy or insecure code might be produced. Our results show that almost half of the code snippets produced by these five models contain bugs that are often impactful and potentially exploitable. These results are limited to the narrow scope of our evaluation, but we hope they contribute to the larger body of research on the impacts of AI code generation models.
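As a minimal illustrative sketch of how such an experiment can be structured (not the harness used in our evaluation), the loop below prompts each model with the same set of tasks and scans every generated snippet with a static analyzer. The `generate_code` wrapper, the example prompts, and the choice of the open-source Bandit scanner are all assumptions made for illustration.

```python
# Illustrative sketch only, not the harness used in our evaluation:
# prompt each model with the same tasks, then scan every generated snippet
# with a static analyzer. `generate_code`, the example prompts, and the
# choice of the open-source Bandit scanner are all hypothetical.
import json
import subprocess
import tempfile

PROMPTS = [
    "Write a Python function that looks up a user in a SQL database by name.",
    "Write a Python function that saves an uploaded file to disk.",
]

def generate_code(model: str, prompt: str) -> str:
    """Hypothetical wrapper around whichever model API is being evaluated."""
    raise NotImplementedError("plug in the model client under test")

def scan_with_bandit(snippet: str) -> list[dict]:
    """Run Bandit on one snippet and return its reported findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(snippet)
        path = f.name
    proc = subprocess.run(["bandit", "-f", "json", path],
                          capture_output=True, text=True)
    return json.loads(proc.stdout).get("results", [])

def evaluate(models: list[str]) -> dict[str, int]:
    """Count, per model, how many prompts yielded at least one finding."""
    flagged = {m: 0 for m in models}
    for model in models:
        for prompt in PROMPTS:
            if scan_with_bandit(generate_code(model, prompt)):
                flagged[model] += 1
    return flagged
```

Even under such a setup, a single static analyzer will miss many classes of weaknesses, which is one reason evaluating the security of generated code remains difficult.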
Given both code generation models’ current utility and the likelihood that their capabilities will continue to improve, it is important to manage their policy and cybersecurity implications. Key findings include the following.
- Industry adoption of AI code generation models may pose risks to software supply chain security. However, these risks will not be evenly distributed: larger, better-resourced organizations will be better positioned to manage them than organizations facing cost and workforce constraints.
- Multiple stakeholders have roles to play in mitigating potential security risks related to AI-generated code. The burden of ensuring that AI-generated code is secure should rest not solely on individual users but also on AI developers, organizations producing code at scale, and those positioned to improve security broadly, such as policymaking bodies and industry leaders. Existing guidance, such as secure software development practices and the NIST Cybersecurity Framework, remains essential to ensure that all code, regardless of authorship, is evaluated for security before it enters production. Other cybersecurity guidance, such as secure-by-design principles, can be expanded to cover code generation models and other AI systems that affect software supply chain security.
- Code generation models also need to be evaluated for security, but doing so is currently difficult. Evaluation benchmarks for code generation models often measure the models’ ability to produce functional code but not their ability to generate secure code, which may incentivize deprioritizing security in favor of functionality during model training. There is also too little transparency around models’ training data, and too little understanding of their internal workings, to explore questions such as whether better-performing models produce more insecure code.