COVID-19 Dataset (CORD-19)

A Free, Open Resource for the Global AI Community

The White House is tapping the expertise of researchers from Georgetown’s Center for Security and Emerging Technology to determine how data and open research can be used to address the COVID-19 pandemic. CSET has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a resource of more than 57,000 articles in JSON format about COVID-19 and the coronavirus family of viruses for use by the global machine learning community. The dataset represents the most extensive machine-readable coronavirus literature collection available for data and text mining to date.

“With this step, we’ve made available full-text, machine-readable resources to help speed response to this global crisis. The worldwide machine learning community now has the opportunity to apply recent advances in natural language processing to find answers to important questions about this infectious disease.”
Dewey Murdick, CSET Director of Data Science

CORD-19 contains 45,000 full-text articles with a wealth of information about the novel coronavirus (SARS-CoV-2), the associated illness COVID-19, and related viruses. The collection will be updated as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

At the request of the White House Office of Science and Technology Policy, CSET leads this effort in partnership with the Allen Institute for AI, the Chan Zuckerberg Initiative, Microsoft Research, and the National Library of Medicine of the National Institutes of Health. Read the press release here.

Now, preliminary answers to questions about COVID-19 — including estimations of reproduction rate, incubation period, and key risk factors — are emerging. Kaggle has summarized early findings extracted from the CORD-19 papers by machine learning algorithms. Learn more here.

Media Coverage

White House, Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset
Wired, Researchers Will Deploy AI to Better Understand Coronavirus
Wired, AI Can Help Scientists Find a Covid-19 Vaccine
Defense One, How to Counter China’s Coronavirus Disinformation Campaign
Forbes, A Call To Action To AI Experts: Join The Fight Against The Coronavirus
Forbes, Our Smartphone Data Can Predict How Coronavirus Will Spread
Federal News Network, Using data in the fight against coronavirus
FedScoop, White House aims to answer WHO’s coronavirus questions using natural language processing
GeekWire, AI2 and Microsoft join the White House’s push to enlist AI for the war on coronavirus
GeekWire, Software tools for mining COVID-19 research studies go viral among scientists
GeekWire, How AI is helping scientists in the fight against COVID-19, from robots to predicting the future
Nextgov, Government Partnership Offers Cash Prizes for AI Tools That Support Coronavirus Research
The Next Web, How AI helps scientists find reliable coronavirus research
TechCrunch, With launch of COVID-19 data hub, the White House issues a ‘call to action’ for AI researchers
Governing, AI to Interrogate Deep Archive to Find Insights on COVID-19
Analytics India, Top Hackathons Dedicated To Fight COVID-19
Psychology Today, White House Calls for AI Experts to Help COVID-19 Research
TechRepublic, Verizon Media builds search engine to help researchers find COVID-19 documents
The South African, CORD-19: Database of scientific articles launched to help AI fight COVID-19
Import AI, AnimeGAN; why Bengali is hard for OCR systems; help with COVID by mining the CORD-19 dataset
