Data, algorithms and models

Using Machine Learning to Fill Gaps in Chinese AI Market Data

Zachary Arnold, Joanne Boisson, Lorenzo Bongiovanni, Daniel Chou, Carrie Peelman, Ilya Rahkovsky | February 2021

In this proof-of-concept project, CSET and Amplyfi Ltd. used machine learning models and Chinese-language web data to identify Chinese companies active in artificial intelligence. Most of these companies were not labeled or described as AI-related in two high-quality commercial datasets. The authors' findings show that using structured data alone—even from the best providers—will yield an incomplete picture of the Chinese AI landscape.

From China to San Francisco: The Location of Investors in Top U.S. AI Startups

Rebecca Kagan, Rebecca Gelles, Zachary Arnold | February 2021

Foreign investors make up a significant portion of investors in top U.S. AI startups, with China as the leading location. The authors analyze data on domestic and foreign investment in the U.S. AI startup ecosystem, outlining the sources of global investment.

Corporate Investors in Top U.S. AI Startups

Rebecca Kagan, Rebecca Gelles, Zachary Arnold | February 2021

Corporate investors are significant players in the U.S. AI startup ecosystem, funding 71 percent of top U.S. AI startups. The authors analyze trends among the top corporate funders and the startups receiving corporate money.

Comparing Corporate and University Publication Activity in AI/ML

Simon Rodriguez, Tim Hwang, Rebecca Gelles | January 2021

Based on news coverage alone, it can seem as if corporations dominate research on artificial intelligence and machine learning compared with universities. Authors Simon Rodriguez, Tim Hwang and Rebecca Gelles analyze a decade of research publication data and find that, in fact, universities are the more dominant producers of AI papers. They also find that while corporations do tend to generate more citations to the work they publish in the field, these “high performing” papers are most frequently collaborations with university labs.

China’s STI Operations

William Hannas, Huey-Meei Chang | January 2021

Open source intelligence (OSINT) and science and technology intelligence (STI) are realized differently in the United States and China, with China placing greater value on both. In the United States’ understanding, OSINT “enables” classified reporting, while in China it is the intelligence of first resort. This contrast extends to STI, which has a lower priority in the U.S. system, whereas China’s top leaders personally lavish great attention on STI and rely on it for national decisions. Establishing a “National S&T Analysis Center” within the U.S. government could help to address this disparity.

AI and the Future of Cyber Competition

Wyatt Hoffman | January 2021

As states turn to AI to gain an edge in cyber competition, the technology will change the cat-and-mouse game between cyber attackers and defenders. Embracing machine learning systems for cyber defense could drive more aggressive and destabilizing engagements between states. Wyatt Hoffman writes that cyber competition already has the ingredients needed for escalation to real-world violence, even if these ingredients have yet to come together in the right conditions.

Mapping U.S. Multinationals’ Global AI R&D Activity

Roxanne Heston, Remco Zwetsloot | December 2020

Many factors influence where U.S. tech multinational corporations decide to conduct their global artificial intelligence research and development (R&D). Company AI labs are spread all over the world, especially in North America, Europe and Asia. But in contrast to AI labs, most company AI staff remain concentrated in the United States. Roxanne Heston and Remco Zwetsloot explain where these companies conduct AI R&D, why they select particular locations, and how they establish their presence there. The report is accompanied by a new open-source dataset of more than 60 AI R&D labs run by these companies worldwide.

Hacking AI

Andrew Lohn | December 2020

Machine learning systems’ vulnerabilities are pervasive. Hackers and adversaries can easily exploit them. As such, managing the risks is too large a task for the technology community to handle alone. In this primer, Andrew Lohn writes that policymakers must understand the threats well enough to assess the dangers that the United States, its military and intelligence services, and its civilians face when they use machine learning.

Automating Cyber Attacks

Ben Buchanan, John Bansemer, Dakota Cary, Jack Lucas, Micah Musser | November 2020

Based on an in-depth analysis of artificial intelligence and machine learning systems, the authors consider the future of applying such systems to cyber attacks and which strategies attackers are more or less likely to use. As nuanced, complex, and overhyped as machine learning is, they argue, it remains too important to ignore.

The Future of Data Science

National Academies of Sciences, Engineering, and Medicine | November 4, 2020

CSET Founding Director Jason Matheny presented the keynote address at a virtual colloquium, hosted by the National Academies of Sciences, Engineering, and Medicine, on the future of data science and its implications for privacy and national security.