In the rush to implement national security use cases for artificial intelligence and machine learning, policymakers need to ensure they are properly weighing the risks, say experts in the field.
Like all software, artificial intelligence (AI)/machine learning (ML) is vulnerable to hacking. But because of the way it has to be trained, AI/ML is even more susceptible than most software—it can be successfully attacked even without access to the computer network it runs on.
“There hasn’t been enough policymaker attention on the risks of AI being hacked,” says Andrew Lohn, a senior fellow at the Center for Security and Emerging Technology—or CSET—a nonpartisan think tank attached to Georgetown University’s Walsh School of Foreign Service. “There are people pushing for adoption of AI without fully understanding the risks that they are going to have to accept along the way.”
In a series of papers and presentations, Lohn and his CSET colleagues have sought to draw attention to the growing body of academic research showing that AI/ML algorithms can easily be attacked in a host of different ways. “The aim isn’t to stop [the deployment of AI] but to make sure that people understand the whole package of what they’re asking for,” he says.
Lohn believes the AI hacking threat is not merely academic or theoretical. White hat hackers have successfully demonstrated real-world attacks against AI-powered autonomous driving systems such as those used by Tesla cars. Researchers from Chinese technology giant Tencent managed to get the car’s autopilot feature to switch lanes into oncoming traffic using inconspicuous stickers on the roadway.
At McAfee Security, researchers used similarly discreet stickers on a speed limit sign to get the car to accelerate to 85 miles an hour in a 35 mile-an-hour zone.
Based on his interactions with engineers working on commercial AI technology, Lohn believes these or other kinds of attacks have been carried out in the wild by real cyber threat actors. “People are very reluctant to talk about when they’ve been attacked,” he says. “It’s more of a nod and a wink.”
And, of course, successful attacks might not be detected. Neal Ziring, the technical director of the National Security Agency’s (NSA’s) Cybersecurity Directorate, told a Billington CyberSecurity webinar in January that although there was burgeoning academic literature about how to perform attacks, research on detecting them was “far less mature at this point.”
“The toughest aspect of securing AI/ML,” Ziring said, was the fact that it had a deployment pipeline that attackers could strike at any point.
AI/ML systems have to be trained before deployment using vast data sets—pictures of faces, for instance, are used to train facial recognition software. By looking at millions of labeled images, AI/ML can be trained to distinguish, say, cats from dogs.
But it’s this training pipeline that makes AI/ML vulnerable even to attackers who don’t have any access to the network it’s running on.
Data poisoning attacks work by seeding specially crafted images into AI/ML training sets, which in some cases have been scraped off the public Internet or harvested from social media or other platforms.
“If you don’t know where those data items are coming from,” warned Ziring, “if you don’t track their provenance and their integrity carefully, you could allow inappropriate examples or malicious examples to sneak in.”
Although indistinguishable to human eyes from a genuine image, the poisoned images contain data that can train the AI/ML to misidentify whole categories of items. “The mathematics of these systems, depending on what type of model you’re using, can be very susceptible to shifts in the way recognition or classification is done, based on even a small number of training items,” explained Ziring.
Indeed, according to a presentation last year by Stanford cryptography professor Dan Boneh, a single corrupt image in a training set can be enough to poison an algorithm and cause it to erroneously identify thousands of images.
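To make the mechanics concrete, the sketch below shows a hypothetical label-flipping poisoning attack against a simple scikit-learn classifier trained on synthetic data; the model, the dataset and the 5 percent poisoning rate are illustrative assumptions, not details from Boneh’s demonstration. Retraining on the corrupted labels typically pulls the decision boundary off course and lowers test accuracy.

```python
# Minimal label-flip poisoning sketch (scikit-learn, synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A clean two-class dataset and a baseline model trained on it.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean accuracy:   ", clean_model.score(X_test, y_test))

# Poison 5% of the training set by flipping labels on the points the clean
# model classifies most confidently; a mislabeled point far from the boundary
# exerts a strong pull on the retrained fit.
margins = np.abs(clean_model.decision_function(X_train))
n_poison = int(0.05 * len(y_train))
poison_idx = np.argsort(margins)[-n_poison:]
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

# Retrain on the poisoned labels and compare.
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```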
Poisoned images can be crafted in many different ways, Boneh explained, demonstrating one technique known as the fast gradient sign method, or FGSM, which uses a model’s own gradients to calculate how each pixel of an image should be nudged. Using FGSM, an attacker can introduce pixel-level changes called “perturbations” into an image. Though invisible to the human eye, these perturbations turn the image into an “adversarial example,” an input crafted to fool the model into misidentifying it.
“The model is trained to do one thing, but we can confuse it by submitting a carefully crafted adversarial example,” Boneh said.
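In code, FGSM amounts to only a few lines. The PyTorch sketch below is a minimal illustration, not Boneh’s setup: the untrained stand-in classifier, the random 32-by-32 image, its assumed label and the step size of 0.03 are placeholder assumptions; a real attack would target a trained model and a genuine photograph.

```python
# Minimal FGSM sketch (PyTorch). Model, image and label are placeholders.
import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Return an adversarial copy of x via the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Nudge every pixel by epsilon in the direction that increases the loss,
    # then clamp back to the valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Illustrative stand-in classifier and a random "image" (assumptions).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)   # one 32x32 RGB image, pixels in [0, 1]
y = torch.tensor([3])          # its assumed true class

x_adv = fgsm_perturb(model, x, y)
print("original prediction:   ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
print("largest pixel change:  ", (x_adv - x).abs().max().item())
```

Because the perturbation is capped at epsilon per pixel, the altered image looks identical to a human even as the model’s loss, and often its prediction, shifts.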
Attacks using FGSM are typically “white box” attacks, in which the attacker has access to the model’s inner workings, such as its architecture, parameters and source code. White box attacks can be conducted on open source AI/ML, of which there’s a swiftly growing library on GitHub and other open source repositories.
But academics have demonstrated plenty of “black box” data poisoning attacks too, where the attacker only has access to the inputs, the training data and the outputs—the ability to see how the system classifies incoming images.
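What “black box” access means in practice can be sketched with a simple query-only search: the attacker below never sees the victim’s weights, gradients or code, only the class probabilities returned for each query, and greedily keeps any small random perturbation that erodes the victim’s confidence in the true label. The scikit-learn victim, the digits dataset and the query budget are illustrative assumptions, and this particular sketch is an evasion-style attack rather than a poisoning one; it is meant only to show how little access such attacks require.

```python
# Minimal score-based black-box sketch: the attacker only queries outputs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

digits = load_digits()
victim = MLPClassifier(max_iter=500, random_state=0).fit(digits.data, digits.target)

def query(x):
    """Everything the attacker sees: the victim's output probabilities."""
    return victim.predict_proba(x.reshape(1, -1))[0]

rng = np.random.default_rng(0)
x = digits.data[0].copy()          # the image being attacked
true_label = digits.target[0]

# Greedy random search: keep any small perturbation that lowers the victim's
# confidence in the true class, stopping if the prediction flips.
for _ in range(2000):
    candidate = np.clip(x + rng.normal(0, 1.0, size=x.shape), 0, 16)
    if query(candidate)[true_label] < query(x)[true_label]:
        x = candidate
    if query(x).argmax() != true_label:
        break

print("true label:", true_label, "| victim now predicts:", query(x).argmax(),
      "with confidence", round(query(x).max(), 3))
```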
Indeed, the Defense Advanced Research Projects Agency (DARPA) says that the “rapidly proliferating” academic research in the field is “characterized by ever more complex attacks that require progressively less knowledge about the ML system being attacked, while proving increasingly strong against defensive countermeasures.” The agency’s Guaranteeing AI Robustness against Deception (GARD) program is charged with “the development of theoretical foundations for defensible ML” and “the creation and testing of defensible systems.”
GARD is building a testbed on which the robustness and defensibility of AI/ML systems can be measured under real-world threat scenarios. The aim? “To create deception-resistant ML technologies with stringent criteria for evaluating their robustness.”
The proliferation of open source AI/ML tools, including data training sets of uncertain provenance, opens the door to software supply-chain attacks, as well as data poisoning, points out CSET’s Lohn.
Contributors have already slipped malicious code into open source projects, so the threat is not purely hypothetical. “AI is a battleground for the great powers. Machine learning applications are increasingly high-value targets for highly sophisticated cyber adversaries,” he says.
One of the types of attack that most concerns national security agencies is the so-called extraction attack. Extraction is a black box attack technique used to reverse-engineer AI/ML models or get insight into the data used to train them.
The NSA’s Ziring explained extraction attacks this way: “If you’re a government agency, you’ve put a lot of effort into training your model, perhaps you used highly sensitive data to train it … an attacker might attempt to query your model in a mathematically guided fashion in order to extract facts about the model, its behavior or the data that was used to train it. If the data used to train it was highly sensitive, proprietary, nonpublic, you don’t want that to happen.”
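A bare-bones version of such an extraction attack can be sketched in a few lines: the attacker sends its own inputs to the victim model, records the answers and trains a surrogate on those question-and-answer pairs alone. The scikit-learn models and synthetic data below are illustrative assumptions, not any real system; the point is that nothing beyond query access is needed.

```python
# Minimal model-extraction sketch: train a surrogate from query/answer pairs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# The "sensitive" victim model, trained on data the attacker never sees.
X_private, y_private = make_classification(n_samples=5000, n_features=20,
                                            random_state=1)
victim = RandomForestClassifier(random_state=1).fit(X_private, y_private)

# The attacker generates its own query points and records the victim's answers.
rng = np.random.default_rng(0)
X_queries = rng.normal(size=(5000, 20))
y_answers = victim.predict(X_queries)

# A surrogate trained only on the stolen query/answer pairs.
surrogate = LogisticRegression(max_iter=1000).fit(X_queries, y_answers)

# Agreement with the victim on fresh inputs approximates how much of its
# behavior has been extracted.
X_fresh = rng.normal(size=(2000, 20))
agreement = (surrogate.predict(X_fresh) == victim.predict(X_fresh)).mean()
print(f"surrogate matches the victim on {agreement:.0%} of fresh queries")
```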
A public-private partnership model should be used to address threats to AI/ML, Ziring said. “Multistakeholder organizations are exactly the right place to pursue this. Right? Because you’re going to get that diversity of viewpoint, and the different insights from different stakeholders that can help inform a security consensus … necessary for securing AI/ML systems.”
“Attacks will happen in the real world,” warns University of Illinois at Urbana-Champaign Prof. Bo Li, arguing that it’s important to get out ahead of the threat. “I would say it’s important to investigate this now rather than later because, otherwise, it’s even more expensive to develop those defense algorithms once we already have the models deployed.”
Integrating “design principles like explainability and human-style knowledge and reasoning ability such as domain knowledge” could help, Li says. When explainability is designed into AI/ML systems from the beginning, their operators and defenders get tools they can use to build defenses, she explains.
“We need to understand why the model is giving us this prediction, what quantitative or qualitative measures I can trace back to know what information triggered this prediction, and if I know this prediction is wrong or has a problem, I can then understand what might have caused the problem.”
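One simple, widely used form of the traceability Li describes is a gradient-based saliency check, sketched below in PyTorch: for a given prediction, it ranks input pixels by how strongly they influenced the model’s output. The stand-in classifier and random image are placeholder assumptions; real deployments would apply the same idea to a trained model and logged inputs.

```python
# Minimal gradient-saliency sketch (PyTorch). Model and image are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
x = torch.rand(1, 3, 32, 32, requires_grad=True)                 # one RGB "image"

logits = model(x)
predicted = logits.argmax(dim=1).item()

# Gradient of the predicted class score with respect to the input pixels:
# large magnitudes mark the pixels that most drove this prediction.
logits[0, predicted].backward()
saliency = x.grad.abs().max(dim=1).values      # collapse the color channels

top = torch.topk(saliency.flatten(), k=5)
print("predicted class:", predicted)
print("pixels with the strongest influence:", top.indices.tolist())
```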