Small Data’s Big AI Potential

Husanjot Chahal, Helen Toner, and Ilya Rahkovsky

September 2021

Conventional wisdom suggests that cutting-edge artificial intelligence is dependent on large volumes of data. An overemphasis on “big data” ignores the existence—and underestimates the potential—of several AI approaches that do not require massive labeled datasets. This issue brief is a primer on “small data” approaches to AI. It presents exploratory findings on the current and projected progress in scientific research across these approaches, which country leads, and the major sources of funding for this research.

Download Full Report

Executive Summary

This issue brief provides an introduction to and overview of “small data” artificial intelligence approaches—that is, approaches that help with situations where little or no labeled data is available and that reduce our dependency on massive datasets collected from the real world. According to the conventional understanding of AI, data is an essential strategic resource and any meaningful progress in cutting-edge AI techniques requires large volumes of data. This overemphasis on “big data” ignores the existence and overshadows the potential of the approaches we describe in this brief, which do not require massive datasets for training.

We present our analysis in two sections. The first introduces and classifies the main small data approaches, which we conceptualize in terms of five rough categories (transfer learning, data labeling, artificial data, Bayesian methods, and reinforcement learning), and lays out why they matter. In doing so, we aim not only to point out the potential benefits of using small data approaches, but also to deepen nontechnical readers’ understanding of when, and how, data is useful for AI. Drawing on original CSET datasets, the second section presents exploratory findings on current and projected progress in scientific research across small data approaches, which country leads, and the major sources of funding for this research. We draw the following four key takeaways from our findings:

a) Artificial intelligence is not synonymous with big data, and there are several alternative approaches that can be used in different small data settings.

b) Research into transfer learning is growing especially rapidly (even faster than the larger and better-known field of reinforcement learning), making this approach likely to work better and be more widely used in the future than it is today.

c) The United States and China are competing closely in small data approaches, with the United States leading in the two largest categories of reinforcement learning and Bayesian methods, and China holding a small but growing lead in the fastest-growing category of transfer learning.

d) Tentatively, transfer learning may be a promising target for greater U.S. government funding, given its smaller share of investments in small data approaches relative to investment patterns across AI as a field.

—

Correction (September 29, 2021): Due to an initial code error, some of the age-corrected citations in this brief were wrong upon publication. Table 2 and Figure 3 have since been updated with the right numbers, and corresponding edits have been made to the text on pages 17-18.

Center for Security and Emerging Technology