When people hear “artificial intelligence,” many envision “big data.” There’s a reason for that: some of the most prominent AI breakthroughs in the past decade have relied on enormous data sets. Image classification made enormous strides in the 2010s thanks to the development of ImageNet, a data set containing millions of images hand sorted into thousands of categories. More recently, GPT-3, a language model that uses deep learning to produce humanlike text, benefited from training on hundreds of billions of words of online text. So it is not surprising that AI is tightly connected with “big data” in the popular imagination. But AI is not only about large data sets, and research in “small data” approaches has grown extensively over the past decade, with so-called transfer learning as an especially promising example.
Also known as “fine-tuning,” transfer learning is helpful in settings where you have little data on the task of interest but abundant data on a related problem. You first train a model on a large data set and then retrain it slightly on a smaller data set related to your specific problem. For example, by starting with an ImageNet classifier, researchers in Bangalore, India, used transfer learning to train a model to locate kidneys in ultrasound images using only 45 training examples. Likewise, a research team working on German-language speech recognition showed that they could improve their results by starting with an English-language speech model trained on a larger data set before using transfer learning to adjust that model for a smaller data set of German-language audio.
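To make the idea concrete, here is a minimal sketch of that two-step recipe in Python using PyTorch and torchvision. It is an illustration, not the code from the studies described above: the pretrained ResNet-18 backbone, the two-class output layer, and the `small_loader` data loader are all hypothetical stand-ins for a small, task-specific data set.

```python
import torch
import torch.nn as nn
from torchvision import models

# Step 1: start from a model already trained on ImageNet (the "big data" step).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor so only the new output layer is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new, smaller task
# (e.g., two hypothetical classes such as "kidney" vs. "background").
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Step 2: fine-tune on the small, task-specific data set.
# `small_loader` is a placeholder DataLoader holding only a few dozen labeled images.
def fine_tune(model, small_loader, epochs=5):
    model.train()
    for _ in range(epochs):
        for images, labels in small_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```

Because only the small replacement layer is trained, the model can adapt to the new task with far fewer labeled examples than training from scratch would require.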
Research in transfer learning approaches has grown impressively over the past 10 years. In a new report for Georgetown University’s Center for Security and Emerging Technology (CSET), we examined current and projected progress in scientific research across “small data” approaches, broken down into five rough categories: transfer learning, data labeling, artificial data generation, Bayesian methods and reinforcement learning. Our analysis found that transfer learning stands out as the category with the highest and most consistent research growth on average since 2010. This growth has even outpaced the larger and more established field of reinforcement learning, which in recent years has attracted widespread attention.
Read the full article at Scientific American.