The power of small data: In artificial intelligence, supersized data sets and models often steal the limelight. But these models are riddled with dangerous biases and are massive polluters. Instead of focusing efforts on building ever bigger data sets, policymakers should pay attention, and direct funding, to so-called small data techniques, which hold huge potential for AI, according to a report from Georgetown University’s Center for Security and Emerging Technology (CSET) by Husanjot Chahal, Helen Toner and Ilya Rahkovsky.
AI: Decoded rang Chahal and Toner to hear more.
Small data approaches have developed fast over the past decade. They include methods such as transfer learning, data labeling, artificial data, Bayesian methods and reinforcement learning. What they all have in common is that they do not rely on massive labeled datasets.
Fine-tuning big data: Chahal and Toner give transfer learning as an especially promising example. Through transfer learning, researchers first train a model on a big dataset, and then “fine-tune” it using a smaller dataset to tackle a very specific problem. This can be useful when little data exists. For example, researchers could train an AI model to evaluate the risk of a disease using a dataset covering the general population, and then fine-tune it to target a minority group with few digital health records.
This could also help smaller companies reap the benefits of big AI models without breaking the bank. A German team of researchers managed to use transfer learning to improve the performance of their smaller German speech recognition model by training it first with a bigger English-language model.
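The transfer-learning recipe described above, pretrain on plentiful data and then fine-tune on a small, specific dataset, can be sketched in a few lines. This is a toy illustration, not code from the CSET report: all of the data is synthetic, and simple least-squares fits stand in for the pretraining and fine-tuning steps. The "big" task and the "small" task share an underlying structure, so after pretraining, only two parameters need to be learned from 15 examples:

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Stage 1: "pretrain" on a large labeled dataset (hypothetical task A) ---
X_big = rng.normal(size=(2000, 10))
w_shared = rng.normal(size=10)  # hidden structure shared by both tasks
y_big = X_big @ w_shared + rng.normal(scale=0.1, size=2000)

# A least-squares fit stands in for pretraining; w_pre captures the shared structure.
w_pre, *_ = np.linalg.lstsq(X_big, y_big, rcond=None)

# --- Stage 2: fine-tune on a tiny dataset (hypothetical task B) ---
# Task B reuses the shared structure with a scale/offset shift, so only
# two parameters (a, b) must be learned from very few examples.
X_small = rng.normal(size=(15, 10))
y_small = 3.0 * (X_small @ w_shared) + 1.5 + rng.normal(scale=0.1, size=15)

feats = X_small @ w_pre  # frozen "pretrained" representation
A = np.column_stack([feats, np.ones(15)])
(a, b), *_ = np.linalg.lstsq(A, y_small, rcond=None)

print(f"fine-tuned scale={a:.2f}, offset={b:.2f}")  # close to the true 3.0 and 1.5
```

Fitting the same two-parameter model directly on the 15 raw examples would have to estimate the 10-dimensional structure from scratch; reusing the pretrained representation is what makes the tiny dataset sufficient.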
Helping with privacy: “Small data approaches can reduce the incentive to collect large amounts of personal data. And this is through approaches such as artificial data generation, where we don’t even need to collect real world data,” Chahal said.
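Chahal's point about artificial data can be made concrete with a toy sketch. The two-column dataset of ages and incomes here is invented, and this plain moment-matching approach is an illustration only, not a differentially private method from the report: keep aggregate statistics, discard the raw records, and sample synthetic ones from the aggregates.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical "real" personal data we'd rather not retain: ages and incomes.
real = rng.normal(loc=[40, 50_000], scale=[12, 15_000], size=(500, 2))

# Keep only aggregate statistics (mean and covariance), then discard the records.
mu, cov = real.mean(axis=0), np.cov(real, rowvar=False)
del real

# Generate artificial records from the aggregates alone; no individual's data
# appears in the synthetic set, but its overall statistics match the original's.
synthetic = rng.multivariate_normal(mu, cov, size=500)

print(synthetic.shape)  # (500, 2)
```

Real systems use richer generative models than a Gaussian fit, but the principle is the same: downstream training runs on the synthetic set, so the sensitive records never need to be collected at scale or kept around.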
Busting the big data myth: Chahal and Toner were hoping to “mythbust a little about this idea that has caught hold in DC that data is this huge strategic asset and that the U.S. is at some disadvantage relative to China, because China supposedly has access to so much more data and data is so critical,” Toner said. “We think that both of those points are not actually correct and that if your policy is being made on the basis of that understanding, that will probably be bad policy,” she continued.
Why this is good for Europe: The European Commission’s strategy is twofold: Make more data available from the bloc’s troves of industrial data, and create smart governance structures for it. Europe will struggle to match the U.S. and China when it comes to data, but it can use data more wisely. “Data advantage does not only come from a country that has access to lots of raw data as generated by a huge population size, but the advantage is more contingent upon what you do with that data,” Chahal said.
Leading the pack: Chahal and Toner found that the U.S. and China are the top two countries in producing small data research papers. But beyond that, European countries such as the U.K., France, Germany, Spain and the Netherlands dominate the ranking list.