“Data is the new oil,” or so we’ve been told. From policy pronouncements to media reports to op-eds, many have used the attractive analogy when discussing artificial intelligence. Kai-Fu Lee, author of AI Superpowers, has written, “in the age of AI, where data is the new oil, China is the new Saudi Arabia.”1
Yet reality is far messier. With a population of 1.4 billion people, robust surveillance and data collection capabilities, and access to private sector data, the Chinese government appears to have vast quantities of data.2 But even if China has far more data than the United States, does this raw data necessarily translate into a meaningful advantage for China? And if so, is this enough to overtake the United States in AI? Both countries invest in AI for military applications; will China’s potentially greater access3 to commercial data accelerate its development of AI-enabled weapons relative to the United States?
This paper reviews the challenges in assessing whether the United States or China has a “data advantage” in the military AI realm—i.e., whether one country has access to more data in a way that confers an advantage in developing military AI systems. We provide initial insights for measuring a relative data advantage by answering three questions that are important when evaluating data competitiveness. What does it mean to have a data advantage? Does commercial data matter for military AI? Will big data stay relevant for future AI applications?
The key assessments of this paper are as follows:
- Determining whether one country has a data advantage over the other is not as simple as measuring which country produces more raw data overall. Estimates that compare raw data in aggregate, without looking at specific applications or domain areas, are oversimplifications that do not accurately reflect the role of data.
- A country that first reaches the experimentation phase (i.e., where data for a specific application is digitally stored, cleaned and transformed, labeled, and optimized to train a machine learning algorithm) has an advantage over others for that application, as it is positioned to move faster toward developing its intended AI application.
- Commercial data, while useful, will be less relevant for military operational AI. China’s access to commercial market data is unlikely to confer a military operational advantage: data needs for military AI applications are environment-specific, and there is little ability to transfer commercial data and machine learning models to military applications.
- Certain emerging approaches might make big data collected from the real world less relevant in the future, even though the applicability of these approaches to military needs remains unclear.