Coronaviruses comprise a large group of viruses that can cause a variety of ailments, from the common cold to Severe Acute Respiratory Syndrome (SARS). SARS-CoV-2 is the official name of the coronavirus strain behind the current global pandemic, and Covid-19 is the official name of the disease it causes.
Health workers and medical experts are working tirelessly on the front lines to diagnose, treat and provide care to Covid-19 patients. Governments across the globe are devising directives and coming up with interventions to beat back the coronavirus outbreak. We are at a crucial juncture now, but there are still many known unknowns about the coronavirus and Covid-19. It is critical that healthcare experts and policy makers quickly gain a better understanding of the coronavirus.
There is a large body of published literature about the coronavirus family going back several decades. A body of literature about SARS-CoV-2 and Covid-19 has also emerged as frontline medical researchers rapidly document their observations and experiences in the current outbreak. Medical and policy researchers should be able to query this body of knowledge and get answers to many pressing questions.
Synthesizing such vast amounts of data to generate useful insights is a good candidate for machine learning techniques. For example, more than 2,000 papers have been published in 2020 alone, and without the help of AI tools it is practically impossible to keep track of all the research. But before we can apply AI, the data needs to be accessible and in a format that machines can read.
Towards this end, based on a request from the White House Office of Science and Technology Policy, prominent industry and academic research groups have compiled a dataset of scholarly literature about Covid-19, SARS-CoV-2 and coronaviruses and are making it freely available to the public.
The organizations involved in this initiative are:
- Allen Institute for AI (AI2)
- Chan Zuckerberg Initiative
- Georgetown University’s Center for Security and Emerging Technology
- Microsoft
- National Library of Medicine at the National Institutes of Health
The COVID-19 Open Research Dataset (CORD-19) is a collection of research studies published in both peer-reviewed journals and non-peer-reviewed pre-print websites such as bioRxiv and medRxiv. Currently, it consists of over 13,000 full-text papers and abstracts for another 16,000 papers, and it is expected to be updated with new research as it becomes available.
Here is the call to action for AI experts, data scientists and technology professionals.
The World Health Organization (WHO) and the National Academies of Sciences, Engineering and Medicine have identified the key scientific questions about Covid-19. Here is a high-level summary of these key questions.
Based on available literature, what do we know about:
- Virus Transmission, Incubation and Stability, including seasonality, incubation period, asymptomatic transmission, persistence of the virus on different surfaces and effectiveness of protective gear.
- Medical Care, including challenges, solutions and best practices related to management of surge capacity, addressing shortages, telemedicine and home care.
- Risk Factors and Effective Mitigation Measures, such as the impact of pre-existing diseases, the impact of behavioral factors such as smoking and the susceptibility of groups such as pregnant women and …
- Virus Origins, Genetics and Evolution, including variations of the virus over time and geography, the different strains that may be in circulation, animal hosts and livestock infections.
- Non-pharmaceutical Interventions, including methods and barriers to prevent community spread, impact assessment of measures such as school closures, travel bans, physical distancing and prohibition of large gatherings.
- Vaccines and Therapeutics, including effectiveness of drugs in development, clinical effectiveness studies and approaches to distributing new therapeutics.
- Diagnostics and Surveillance, including screening policies and protocols, early detection, sampling methods, guidance at national, state and local levels and the trade-offs involved in rapid testing between speed, accuracy and accessibility.
- Ethical Considerations, including norms of social science research, impact and needs of caregivers and identifying drivers of fear, stigma and misinformation during outbreaks.
- Information Sharing and Collaboration, including data-collection standards, communication methods, coordination of local and Federal, private, public, non-commercial and academic communities.
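As a rough illustration of what querying the literature by these question areas could look like, here is a minimal Python sketch that tags paper abstracts with topics through simple keyword matching. Everything in it is an assumption made for illustration: the `title` and `abstract` field names, the keyword lists and the three sample records are invented placeholders, not the actual CORD-19 schema or real papers. A serious effort would likely rely on far more capable text-mining techniques.

```python
from typing import Dict, List

# Hypothetical keyword lists for three of the question areas above.
# These are illustrative guesses, not an official taxonomy.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "transmission": ["incubation period", "asymptomatic", "transmission"],
    "non_pharmaceutical": ["school closure", "travel ban", "physical distancing"],
    "diagnostics": ["screening", "rapid test", "early detection"],
}


def tag_topics(text: str) -> List[str]:
    """Return every topic whose keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def search(papers: List[Dict[str, str]], topic: str) -> List[str]:
    """Return the titles of papers whose title or abstract matches the topic."""
    return [
        paper["title"]
        for paper in papers
        if topic in tag_topics(paper["title"] + " " + paper["abstract"])
    ]


# Invented placeholder records; the field names are assumed, not the real schema.
SAMPLE_PAPERS: List[Dict[str, str]] = [
    {"title": "Placeholder paper A",
     "abstract": "Estimates of the incubation period and asymptomatic spread."},
    {"title": "Placeholder paper B",
     "abstract": "Effects of school closures and travel bans on community spread."},
    {"title": "Placeholder paper C",
     "abstract": "Rapid testing and screening protocols for early detection."},
]

if __name__ == "__main__":
    for topic in TOPIC_KEYWORDS:
        print(topic, search(SAMPLE_PAPERS, topic))
```

Plain substring matching is used here only because it is easy to follow. It will miss synonyms and produce false positives, which is exactly the kind of gap that more sophisticated machine learning approaches are meant to close.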
The dataset is available for download on AI2’s Semantic Scholar website. The machine learning and data science website Kaggle, a subsidiary of Google, has all the details about the specific pieces of the puzzle that AI experts can help put together.
The stakes have never been higher. Carpe Diem!