A consortium of tech leaders — including Seattle’s Allen Institute for Artificial Intelligence, Microsoft and Facebook CEO Mark Zuckerberg’s charity — today unveiled an AI-enabled database that’s meant to give researchers quicker, surer access to resources relating to coronavirus and how to stop it.
The COVID-19 Open Research Dataset, or CORD-19, was created in response to a request from the White House’s Office of Science and Technology Policy. It takes advantage of AI tools to organize more than 24,000 articles about the COVID-19 disease and the SARS-CoV-2 coronavirus that causes it.
“We think that AI has an important part to play in solving this problem,” said Doug Raymond, general manager for the Semantic Scholar academic search engine at the Allen Institute for Artificial Intelligence, also known as AI2.
AI2’s CEO, Oren Etzioni, said his team leapt at the opportunity to participate in CORD-19. “We hesitated all of negative-two seconds,” he joked.
The CORD-19 database was built on the foundation laid by Semantic Scholar, and is being housed on Semantic Scholar’s website.
“The core problem is information overload in research,” Raymond explained. “There are dozens of institutions that have published research on coronavirus. … Putting all the information together in a common format that is comprehensive is a huge challenge for researchers, and it’s a great application of our AI capabilities.”
For years, AI2’s researchers have been using tools such as machine learning and natural language processing to extract key features from research literature and help researchers find studies that are most relevant to problems they’re trying to solve.
In 2018, AI2 partnered with Microsoft to expand the scope of Semantic Scholar. The AI-enabled database now takes in more than 182 million research papers from all fields of science.
Microsoft is playing a similar role in curating the contents of CORD-19.
“It’s all hands on deck as we face the COVID-19 pandemic,” Eric Horvitz, chief scientific officer at Microsoft, said in a news release. “We need to come together as companies, governments and scientists, and work to bring our best technologies to bear across biomedicine, epidemiology, AI and other sciences.”
The National Library of Medicine at the National Institutes of Health facilitated access to 10,000 scholarly articles related to coronavirus. AI2 transformed all that content and more into machine-readable form, and created an adaptive feed that keeps users up to date on the research areas in which they’re most interested.
The Chan Zuckerberg Initiative and Georgetown University’s Center for Security and Emerging Technology also contributed to the effort.
CORD-19 will continue to be updated as new research about coronavirus is published on preprint servers and in peer-reviewed publications. Raymond pointed out that Semantic Scholar can also link academic research to clinical trial data, GitHub data archives and non-academic reports based on research.
Researchers making use of the database can share the data mining tools and insights they develop in response to the CORD-19 call to action via the Kaggle data science community.
“We’re putting this dataset up in front of our community of 4.3 million data scientists in the hope that the world’s AI community can help find answers to a key set of questions about COVID-19,” Anthony Goldbloom, Kaggle’s co-founder and CEO, said in today’s news release.
The key questions were formulated in coordination with experts on infectious disease at the World Health Organization and a standing committee at the National Academies of Sciences, Engineering and Medicine.
Michael Kratsios, the White House’s chief technology officer, said decisive action in the scientific community will play a critical role in stopping the coronavirus outbreak. He called on the U.S. research community to make full use of CORD-19.
“The White House will continue to be a strong partner in this all-hands-on-deck approach,” Kratsios said. “We thank each institution for voluntarily lending its expertise and innovation to this collaborative effort.”
AI2’s Raymond said CORD-19 should provide a new avenue for open science, in accordance with the vision of Paul Allen, the late co-founder of Microsoft, who created the institute in 2014. Working on CORD-19 also gives researchers who make their home in Seattle, one of the hardest-hit U.S. hotspots for the coronavirus outbreak, a role in helping to end the crisis.
“We’re all impacted by it, and we’re excited to be able to contribute something based on the work that we do every day,” Raymond told GeekWire. “Hopefully it will help a lot of people and have an impact. This is what Paul Allen wanted us to be able to do, so we feel like we’re achieving some part of that.”