Bioinformatics is a field of biotechnology that lies at the intersection of data, computer science, and biology. Specifically, bioinformatics employs computational methods to address research questions in the biological sciences and is often used to store, process, and analyze biological data (such as genetic information, including DNA). Innovations in bioinformatics can aid in the development of critical medicines and tools for research in the life sciences.
To take a broad look at the bioinformatics research landscape, we examined research clusters (RCs) in CSET’s Map of Science. We searched for RCs with “Bioinformatics” as a common subject classification (see Microsoft Academic Graph). We then filtered out RCs with fewer than 50 papers from the past 5 years, RCs more than 20 years old, and RCs in which fewer than half of the papers have AI classifications, in order to ensure confidence in AI paper classifications. This resulted in 1,212 bioinformatics RCs.
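The filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ResearchCluster` record, its field names, and the toy values are hypothetical and do not reflect the actual Map of Science schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchCluster:
    # Hypothetical fields, invented for this sketch.
    rc_id: int
    subjects: tuple[str, ...]
    papers_last_5_years: int
    age_years: float
    share_with_ai_classification: float  # fraction of papers that carry an AI classification


def is_bioinformatics_rc(rc: ResearchCluster) -> bool:
    """Return True if the cluster passes every inclusion rule described in the text."""
    return (
        "Bioinformatics" in rc.subjects
        and rc.papers_last_5_years >= 50            # at least 50 recent papers
        and rc.age_years <= 20                      # no more than 20 years old
        and rc.share_with_ai_classification >= 0.5  # at least half of papers classified
    )


# Toy records, invented purely for illustration.
clusters = [
    ResearchCluster(1, ("Bioinformatics", "Medicine"), 120, 8.0, 0.9),
    ResearchCluster(2, ("Bioinformatics",), 30, 5.0, 0.8),    # too few recent papers
    ResearchCluster(3, ("Bioinformatics",), 200, 25.0, 0.7),  # too old
    ResearchCluster(4, ("Bioinformatics",), 90, 10.0, 0.4),   # too few classified papers
    ResearchCluster(5, ("Chemistry",), 300, 6.0, 0.95),       # not a bioinformatics RC
]

kept_ids = [rc.rc_id for rc in clusters if is_bioinformatics_rc(rc)]
print(kept_ids)
```

Only the first toy cluster passes all four checks; each of the others fails exactly one rule, which makes it easy to see what each filter removes.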
Figure 1. Broad subject area for bioinformatics RCs
Figure 1 breaks down the resulting bioinformatics RCs by top-level subject area. Most bioinformatics RCs fall within medicine, while just over a third fall within biology. Social science trails in a distant third place. One RC falls within the materials science subject area, representing less than 0.1 percent of bioinformatics RCs. For ease of visualization, this RC is excluded from the chart.
Figure 2. Country affiliation for papers in all bioinformatics RCs
As shown in Figure 2, the United States is the most prolific producer of papers in bioinformatics RCs, closely followed by China and then distantly by the United Kingdom.
Only a small proportion of bioinformatics RCs contain AI-related papers: less than 5 percent of these RCs contain at least 5 percent AI-related papers, and about 2.5 percent meet the 10 percent AI-related paper threshold for cross-disciplinary AI research. Note that papers in bioinformatics RCs with smaller proportions of AI-related papers may still rely heavily on computational methods, even if they do not use AI-specific methods.
In any case, it is useful to examine AI-heavy bioinformatics RCs to gain insight into what AI in bioinformatics looks like. Filtering for bioinformatics RCs with at least 10 percent AI-related papers results in 34 RCs. Looking at just these RCs, the United States’ lead in the number of papers widened, though the top 10 paper-producing countries did not change.
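The threshold comparison described above can be illustrated with a short Python sketch. The function name and the AI-paper shares below are hypothetical, made up for illustration; they are not values from the Map of Science.

```python
def share_at_or_above(ai_shares: list[float], threshold: float) -> float:
    """Return the fraction of clusters whose AI-paper share meets the threshold."""
    if not ai_shares:
        raise ValueError("ai_shares must not be empty")
    meeting = sum(1 for share in ai_shares if share >= threshold)
    return meeting / len(ai_shares)


# Invented AI-paper shares for eight hypothetical clusters.
toy_ai_shares = [0.00, 0.01, 0.02, 0.03, 0.04, 0.06, 0.12, 0.47]

for threshold in (0.05, 0.10):
    fraction = share_at_or_above(toy_ai_shares, threshold)
    print(f"{threshold:.0%} threshold: {fraction:.1%} of toy clusters")
```

Raising the threshold from 5 percent to 10 percent can only shrink the qualifying set, which is why the 10 percent group is a subset of the 5 percent group.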
Figure 3. Country affiliation for papers in bioinformatics and AI cross-disciplinary RCs with 10 percent or more AI-related papers
RC 70122 is the bioinformatics RC with the highest share of AI-related papers (47 percent are classified as AI-related) and is one of the seven bioinformatics RCs that consist of more than 25 percent AI papers. This RC falls within the social science subject area. RC 70122’s publication output is led by the United States.
In addition to social science, this RC has significant amounts of medicine, biology, psychology, and machine learning research. It appears to center on neurodegenerative illness and neuroimaging. More than 10 percent of its papers are computer vision papers.
RC 70122 core papers
- Biological subtypes of Alzheimer disease: A systematic review and meta-analysis, 2020, Neurology
- Uncovering the heterogeneity and temporal complexity of neurodegenerative diseases with Subtype and Stage Inference, 2018, Nature Communications
- DIVE: A spatiotemporal progression model of brain pathology in neurodegenerative disorders, 2019, arXiv
- Probabilistic disease progression modeling to characterize diagnostic uncertainty: Application to staging and prediction in Alzheimer’s disease, 2017, NeuroImage
- Heterogeneous patterns of brain atrophy in Alzheimer’s disease, 2018, Neurobiology of Aging
Industry-funded bioinformatics RCs
Additionally, of the 1,212 bioinformatics RCs we initially pulled, we found 16 that had more than 15 percent industry-funded papers. These 16 RCs are dominated by the United States, with only one having more than 10 percent Chinese-language papers (RC 28815 has 13 percent Chinese-language papers). Of the 16 RCs, 11 (69 percent) focus on topics related to cancer research. One of the non-cancer-focused RCs is RC 29289, which is the only RC in this group with more than 10 percent AI-related papers.
This bioinformatics RC with high industry funding and more than 10 percent AI-related papers focuses on the blood-brain barrier, drug discovery and pharmacology, and the central nervous system. Authors of 23 percent of the papers in this RC report industry funding, and the number of papers in this RC grew 36 percent in the last year. While it does not primarily focus on cancer-related research, some cancer-related keywords are present in the top CSET-extracted phrases from papers in this RC.
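The shares quoted for the industry-funded group follow directly from the counts given in the text. The small Python check below makes the arithmetic explicit; the helper name `percent_of` and the constant names are our own, introduced only for illustration.

```python
def percent_of(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Counts taken from the text.
TOTAL_BIOINFORMATICS_RCS = 1212  # RCs remaining after filtering
INDUSTRY_HEAVY_RCS = 16          # RCs with more than 15 percent industry-funded papers
CANCER_FOCUSED_RCS = 11          # cancer-focused RCs among those 16

cancer_share = percent_of(CANCER_FOCUSED_RCS, INDUSTRY_HEAVY_RCS)
industry_share = percent_of(INDUSTRY_HEAVY_RCS, TOTAL_BIOINFORMATICS_RCS)

print(f"Cancer-focused share of industry-heavy RCs: {cancer_share:.0f} percent")
print(f"Industry-heavy share of all bioinformatics RCs: {industry_share:.1f} percent")
```

Eleven of 16 is 68.75 percent, which rounds to the 69 percent reported above.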
RC 29289 core papers
- LightBBB: computational prediction model of blood-brain-barrier penetration based on LightGBM, 2020, Bioinformatics
- Prediction of brain:blood unbound concentration ratios in CNS drug discovery employing in silico and in vitro model systems, 2018, Drug Discovery Today
- A Generic Multi-Compartmental CNS Distribution Model Structure for 9 Drugs Allows Prediction of Human Brain Target Site Concentrations, 2016, Pharmaceutical Research
- Predicting drug concentration-time profiles in multiple CNS compartments using a comprehensive physiologically-based pharmacokinetic model, 2017, CPT: Pharmacometrics & Systems Pharmacology
- In Silico Prediction of Blood–Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods, 2018, ChemMedChem