Introduction
U.S. policymakers care about research output. Tracking research output can, among other things, inform assessments of a country's innovativeness or help evaluate the impact of specific funding initiatives. For artificial intelligence (AI) and other emerging technologies in particular, U.S. policymakers are eager to understand and compare research trends. But measuring research output is not as straightforward as it may seem. Output can be defined in different ways and drawn from different data sources, and these choices can affect the outcomes of assessments.
One way to measure output is to count the number of publications in a research area as an indicator of interest or focus in that area. Another approach takes citations into account to capture research quality, not just quantity. This data brief takes the first approach, counting publications, to explore trends in AI research.
When counting AI publication output, it is critical to consider where a work is published and in what language. In developing CSET's merged corpus of scholarly literature, we intentionally incorporated China National Knowledge Infrastructure (CNKI; 中国知网), a key Chinese-language data source, in addition to predominantly English-language sources: Web of Science, Digital Science Dimensions, Microsoft Academic Graph, arXiv, and Papers With Code.
Because CSET's corpus includes CNKI, counting AI research with it may produce different results than analyses that exclude CNKI. While it is sometimes appropriate to analyze CNKI and predominantly English-language publications separately, analyzing them jointly offers a broader view of the research landscape and enables an in-depth exploration of Chinese-language AI research. This data brief explores the implications of including CNKI by presenting AI research publication trends in CSET's merged corpus both with and without CNKI.