Memory safety errors are often seen as a never-ending source of cybersecurity problems, and memory safety is therefore increasingly cited as a high-priority issue in cybersecurity policy and strategy. It has been a hotly debated topic in computer security for decades and, fortunately, is not as complicated as some computer science textbooks make it seem. The cybersecurity issues surrounding memory safety are primarily rooted in two fundamental aspects of modern computing:
- The lack of strict separation in memory between code and data; and,
- The prevalence of widely used programming languages (primarily C/C++) that make it easy to introduce memory safety errors.
This blog post attempts to bridge the gap between the fundamental computer science challenges of memory safety and their implications for cybersecurity, in order to inform policy options.
A Primer On Modern Computer Architectures
Modern computer architectures have many complex components, including numerous microprocessors, memory and memory management units, graphics coprocessors, cryptographic coprocessors, storage devices, and more. Each of these components is complex enough to require volumes of text to describe with technical precision. While there have been many innovations and advancements over the decades, the core of modern computers still reflects fundamental concepts first described over 70 years ago.
The story of modern computers can be traced back to the Universal Turing Machine paper published in 1936. This paper is believed to be the origin of the “stored-program computer” idea, which John von Neumann drew on when he first described the “von Neumann architecture” in 1945. Traces of this architecture can still be found in modern hardware design. One of the defining aspects of the von Neumann architecture was how it simplified memory management. Memory is used to store and retrieve data, such as user input or the output a program produces as a result of computation. Earlier systems stored programs separately from the data they operated on, which made them slow and difficult to program and manage. The von Neumann architecture is a “stored-program” design that keeps CPU instructions in the same memory as data. This allowed for simpler, more efficient designs and innovative applications such as self-modifying code.
Despite all the rapid advancements in computing since these seminal papers, code and data still occupy the same memory in our modern computer architectures. This fundamental aspect of their design is at the root of why memory safety is still an issue today.
C/C++ – “Memory Unsafe” Programming Languages
The C language was first introduced in 1972, and C++ arrived in 1985. These languages are ubiquitous and are used to develop applications that run everything from modern smartphones to space vehicles, and everything in between. The role these languages have played in developing and powering our modern world cannot be overstated. They are as relevant today as they were decades ago, but they are generally considered memory unsafe despite updates and modernization efforts.
But why do software engineers reach for these languages in 2023 when there are many alternatives that have benefited from decades of research and improvement? The answer generally boils down to a mix of the following:
- A mature ecosystem of libraries and frameworks: If a software engineer needs an existing software component that can communicate over a standard protocol or parse a common file format, there are many stable, well-maintained options that can be integrated with little effort.
- Performance: These languages are compiled directly into machine code that is optimized for the best possible performance.
- Program size: Compiling directly to machine code without high-level abstractions also results in smaller programs on disk and in memory that can run on devices with lower-tier hardware. This is particularly important for embedded applications like those in vehicles or mobile phones in the developing world.
- Broad support: There are multiple mature and stable compilers available for transforming C/C++ into machine code for nearly every CPU architecture and operating system available on the market.
Memory Safety
Languages such as C/C++ offer great flexibility and performance by letting the programmer manipulate memory directly, rather than leaving these operations to a higher-level library or runtime. As a result, it is easy for a programmer to inadvertently introduce an error that allows a routine operation to corrupt the state of memory. For example, a programmer may intend for a user input string to be placed into a fixed-size region of memory but forget to validate the length of that string. If the operation that copies the string does not check it against the intended maximum length, it may overwrite other regions of memory. Such validation checks have to run every time the operation is performed and thus can introduce performance overhead. This unintended memory corruption can cause the program to crash or, worse, be used by an attacker to compromise the system by injecting new program instructions.
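To make the string example concrete, here is a minimal sketch in C of the length check described above. The function name `copy_input` and its return convention are assumptions made for illustration, not part of any standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (name and return codes are assumptions):
 * copies src into dst only if the whole string, including its
 * terminating NUL byte, fits in dst_size bytes.
 * Returns 0 on success and -1 if the input is rejected. */
int copy_input(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }

    size_t len = strlen(src);

    /* This is the validation step that is easy to forget. Without it,
     * the copy below could write past the end of dst and corrupt
     * whatever happens to be stored next to it. */
    if (len >= dst_size) {
        return -1;
    }

    memcpy(dst, src, len + 1); /* +1 copies the terminating NUL */
    return 0;
}
```

A caller with an 8-byte buffer could accept "hello" this way, while an 8-character input would be rejected because there would be no room left for the terminating NUL. Rejecting the input outright, rather than silently truncating it, is a design choice of this sketch.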
Memory safety issues can also arise when a program written in a language such as C/C++ triggers what is termed “undefined behavior.” This occurs when the program does something for which the language standard imposes no requirements on how the program should behave. For example, a language standard may define a function that copies the contents of memory from a source to a destination but leave undefined what happens when those regions of memory overlap. The result and impact of undefined behavior depend greatly on the implementation, environment, compiler, and hardware on which the code runs.
When combined with program runtime complexity, memory safety errors and undefined behavior in C/C++ programs inevitably result in exploitable security vulnerabilities. Despite decades of developer training, automated tooling, and language improvements, it is still quite easy to introduce bugs that corrupt the state of memory or introduce subtle, difficult-to-diagnose errors.
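The overlapping-copy case has a well-known instance in C: the standard library's `memcpy` has undefined behavior when the source and destination regions overlap, while `memmove` is defined for that situation. The sketch below uses a hypothetical helper, `shift_bytes`, to show where the difference matters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (the name is an assumption): moves the first
 * `count` bytes of buf forward by `offset` bytes. The caller must
 * guarantee that buf holds at least count + offset bytes. */
void shift_bytes(char *buf, size_t count, size_t offset)
{
    /* The source is buf[0 .. count) and the destination is
     * buf[offset .. offset + count). These regions overlap whenever
     * offset < count, so calling memcpy here would be undefined
     * behavior. memmove is specified to handle overlapping regions. */
    memmove(buf + offset, buf, count);
}
```

With `memcpy` in place of `memmove`, the same code might appear to work with one compiler or platform and produce corrupted output with another, which is what makes this class of bug hard to diagnose.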
Other popular programming languages such as Java, Python, and Rust (and many others) do not suffer from the same kinds of memory safety issues and are considered safer from these kinds of errors. They achieve this level of safety through various mechanisms, including language-level guarantees that reduce undefined behavior and runtime enforcement of checks on operations that would typically lead to memory safety issues in C/C++. Memory-safe languages often use type systems that define, and enforce, well-understood behaviors through the compiler and at runtime. In practical terms, type safety helps prevent programming mistakes that can lead to serious security vulnerabilities. Compilers for these languages are generally able to use those type-safety rules to make guarantees that would be difficult or impossible to make in C/C++. These languages aren’t perfect, and developers can still introduce other kinds of security issues, but they are largely free of the memory safety issues commonly found in C/C++ programs.
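One form of runtime enforcement is bounds checking on array access: memory-safe languages typically verify that an index is in range before reading or writing an element, and signal an error if it is not. A rough sketch of that check, written by hand in C, might look like the following. The function `checked_get` is hypothetical and only illustrates the idea; it is not how any particular language runtime is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (the name is an assumption): reads
 * values[index] only after confirming the index is within bounds.
 * On success it stores the element in *out and returns true.
 * On failure it returns false and leaves *out untouched. */
bool checked_get(const int *values, size_t length, size_t index, int *out)
{
    if (values == NULL || out == NULL || index >= length) {
        return false; /* out-of-range access is refused, not performed */
    }

    *out = values[index];
    return true;
}
```

In C, nothing forces a programmer to go through a helper like this, and a plain `values[index]` with an out-of-range index is undefined behavior. In a memory-safe language the equivalent check is applied by the language itself, so it cannot be forgotten.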
Without provable memory safety, it is exceedingly difficult, if not impossible, for a programmer to make any guarantees as to the safety or security of their program in both the colloquial and mathematical sense.
A Well Understood Problem
Memory unsafety is a well-understood problem in the technical community. Likewise, the security vulnerabilities that result from memory unsafety are a well-understood problem in policy circles. Decades of exploitable security vulnerabilities have clearly demonstrated the risk and increased interest in finding solutions.
Technical experts have implemented advanced software tools to detect memory safety vulnerabilities both early in the development process and long after software has been deployed to users. Significant effort has even gone into making unknown memory safety vulnerabilities difficult to exploit, often at the cost of software performance, through hardware and software modifications usually referred to as ‘exploit mitigations’. However, the explosive growth of software has resulted in a high number of memory safety vulnerabilities that, when combined, allow bad actors to bypass these mitigations.
Policymakers have recently responded by exploring options for mitigating these risks through levers such as government procurement policies, public/private partnerships with nonprofits in the open source ecosystem, and challenges that aim for new innovative approaches using AI to rapidly discover and fix vulnerabilities. At a high level, these efforts largely attempt to incentivize the use of memory-safe languages or to improve the industry’s ability to manage and respond to such vulnerabilities.
Challenges in Memory Safety
None of the attempts to mitigate these issues has fully succeeded yet, but it’s not for lack of effort. There are significant technical and non-technical challenges involved that make quick solutions elusive:
- There are billions of lines of code already written in C/C++. These programs power our modern world, enable e-commerce, move money, and allow us to communicate with people across the world. The sheer momentum and ubiquity of these mature ecosystems and their associated tooling make abandoning them a risky proposition. Fixing all of the existing code is likely not possible given the finite number of people with the necessary expertise.
- Rewriting this vast existing infrastructure in safer programming languages would come at an enormous cost in terms of both dollars and opportunity, and it won’t solve all security vulnerabilities.
- Even with safer programming languages, the very architectures at the foundations of modern computing still aren’t designed to eliminate the security issues introduced through memory unsafety.
Despite these challenges, current efforts to address memory safety issues are encouraging. Using AI to automatically rewrite code to use safer constructs and languages, hardware-enforced exploit mitigations, and general investments in safer languages like Rust are particularly promising; however, we shouldn’t expect the underlying issues to be mitigated overnight.