Policymakers, market analysts, and academic researchers often use commercial databases to identify companies and investments related to artificial intelligence (AI). High-quality commercial datasets have many advantages, but by design or by accident, they may overlook some AI-related companies. This proof-of-concept brief describes a new means of identifying these “missing” companies. We used machine learning (ML) models developed by Amplyfi Ltd. and Chinese-language web data to identify Chinese companies active in AI, then manually checked whether two leading commercial datasets, Crunchbase and PEData/Zero2IPO, included these companies and associated them with AI.
We found that most of the companies identified by Amplyfi’s models were not labeled or described as AI-related in these databases. Although our findings are preliminary, the sheer number of “hidden” companies suggests that, however one defines AI activity, using structured data alone, even from the best providers, will yield an incomplete picture of the Chinese AI landscape. ML-based approaches can complement these structured datasets, providing clearer insight into commercial AI activity in China.