Executive Summary
Cyber threats are multiplying and escalating. AI could exacerbate the problem or be part of the solution. Innovations in machine learning (ML) methodologies have already proven their usefulness for cybersecurity. But can ML-enabled defenses deployed at scale contend with adaptive attackers? To level the playing field for defenders, machine learning must be able to perform reliably under sustained pressure from offensive campaigns—without constant human supervision.
Yet machine learning carries with it new security challenges. ML systems rely on patterns in data to make predictions, and attackers can manipulate those patterns to evade defenses. Techniques that make ML systems more robust to these deceptions often harm their accuracy: in effect, they prevent the system from using certain patterns that are useful for prediction, and thus make it more prone to error. This accuracy-robustness tradeoff creates a problem, for example, for an ML-based antivirus system that must continuously adapt to keep pace with evolving malware. The developer can carefully supervise the system and harden it against deceptive attacks, but doing so impedes its ability to accurately detect new malware. Deploying ML systems to deal with dynamic threats will require constantly balancing these risks.
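The tradeoff can be made concrete with a toy experiment. The sketch below is illustrative only and not drawn from this report: it assumes scikit-learn and NumPy, a synthetic dataset standing in for malware features, a linear classifier, and a simple gradient-sign (FGSM-style) evasion attack. Retraining on attacked copies of the data is a simplified, one-shot form of adversarial training; it typically buys robustness to that attack at some cost in clean accuracy.

```python
# Illustrative sketch of the accuracy-robustness tradeoff for a linear
# "malware detector" on synthetic data. Dataset, epsilon, and model
# choice are hypothetical, not taken from the report.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for benign/malicious feature vectors.
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

def fgsm_attack(model, X, y, eps):
    """Perturb inputs along the sign of the loss gradient (FGSM).
    For logistic regression the gradient w.r.t. x is (p - y) * w,
    so the attack direction is sign(w), flipped for the positive class."""
    w = model.coef_.ravel()
    direction = np.sign(w) * np.where(y == 1, -1, 1)[:, None]
    return X + eps * direction

def evaluate(model, X, y, eps):
    """Return accuracy on clean inputs and on attacked inputs."""
    clean_acc = model.score(X, y)
    robust_acc = model.score(fgsm_attack(model, X, y, eps), y)
    return clean_acc, robust_acc

EPS = 0.5  # attack budget (illustrative)

# 1) Standard training: optimized only for accuracy on clean data.
standard = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("standard ", evaluate(standard, X_test, y_test, EPS))

# 2) Hardened model: augment training data with attacked copies
#    (a simplified adversarial-training step), then retrain.
X_adv_train = fgsm_attack(standard, X_train, y_train, EPS)
robust = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_train, X_adv_train]),
    np.concatenate([y_train, y_train]))
print("hardened ", evaluate(robust, X_test, y_test, EPS))
```

The same tension appears at much larger scale: each hardening step constrains which patterns the detector may exploit, and hence what it can catch.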
This balancing act will present a serious challenge for automating ML-cyber defenses at scale. Defenders have options to manage risk tradeoffs by employing multiple systems, hardening them against specific threats, and offsetting their limitations with non-ML tools. But defenders’ reliance on machine learning will change the threat landscape in ways that undermine their ability to proactively make these tactical choices. Machine learning will expand the attack surface by creating new dependencies on widely relied-upon tools, services, and data sources. Persistent attackers will exploit the compromises defenders make between accuracy and robustness. The most sophisticated adversaries may use ML capabilities themselves to counter ML-based defenses. Defenders need tools to gauge and improve accuracy and robustness that can cope with these innovations in adversary tactics and offensive campaigns.
The imperative for policy and strategy will be to put defenders on solid ground by enabling and empowering them to manage these tradeoffs at the tactical level. To that end, this report offers four recommendations for government efforts to shape the trajectory of the emerging ML-cybersecurity ecosystem toward a more tenable situation for defenders:
- Build security into the process of ML design and development. The typical approach to development that prioritizes efficiency and accuracy will not suffice. Cybersecurity calls for systems that can learn continuously while under pressure from attackers and demonstrate provable robustness to real-world threats.
- Promote resilience through system diversity and redundancy. Machine learning will never be foolproof. For critical security settings, decisionmakers must set risk thresholds that limit the impact of ML vulnerabilities, including by offsetting those risks with complementary ML and non-ML tools and safeguards.
- Manage the risks that cut across the ML and cybersecurity ecosystem. This requires mapping out and protecting critical dependencies, from reliance on shared datasets and open-source tools to feedback loops in deployed systems.
- Counter strategic rivals’ attempts to compromise and sabotage ML development. Rivals will try to infiltrate development processes, whether to extract information in order to reverse engineer ML systems or to manipulate training data to corrupt them. Successful defense at a tactical level will depend on thwarting offensive campaigns that aim to fatally compromise ML defenses before they are even deployed.