This issue brief provides an introduction to and overview of “small data” artificial intelligence approaches—that is, approaches that help with situations where little or no labeled data is available and that reduce our dependency on massive datasets collected from the real world. According to the conventional understanding of AI, data is an essential strategic resource and any meaningful progress in cutting-edge AI techniques requires large volumes of data. This overemphasis on “big data” ignores the existence and overshadows the potential of the approaches we describe in this brief, which do not require massive datasets for training.
We present our analysis in two sections. The first introduces and classifies the main small data approaches, which we conceptualize in terms of five rough categories (transfer learning, data labeling, artificial data, Bayesian methods, and reinforcement learning), and lays out why they matter. In doing so, we aim not only to point out the potential benefits of using small data approaches, but also to deepen nontechnical readers’ understanding of when, and how, data is useful for AI. Drawing on original CSET datasets, the second section presents exploratory findings on current and projected progress in scientific research across small data approaches, which country leads, and the major sources of funding for this research. We draw the following four key takeaways from our findings:
a) Artificial intelligence is not synonymous with big data, and there are several alternative approaches that can be used in different small data settings.
b) Research into transfer learning is growing especially rapidly (even faster than the larger and better-known field of reinforcement learning), making this approach likely to work better and be more widely used in the future than it is today.
c) The United States and China are competing closely in small data approaches, with the United States leading in the two largest categories of reinforcement learning and Bayesian methods, and China holding a small but growing lead in the fastest-growing category of transfer learning.
d) Tentatively, transfer learning may be a promising target for greater U.S. government funding, given its smaller share of investments in small data approaches relative to investment patterns across AI as a field.
Correction (September 29, 2021): Due to an initial code error, some of the age-corrected citations in this brief were wrong upon publication. Table 2 and Figure 3 have since been updated with the right numbers, and corresponding edits have been made to the text on pages 17-18.