The convergence of AI and biotechnology regularly made headlines in 2024. The year saw success stories for AI-developed drugs and a Nobel Prize for the pioneering developers of protein design tools, raising public awareness of the opportunity for AI to advance biomedical research. But 2024 also brought continued conversation around fears that bad actors could use AI to make biological or chemical weapons. These opposing narratives around AI for biotechnology made us wonder how biotech researchers are actually using AI in published research.
Building on prior analysis, we leveraged CSET’s merged academic corpus, enriched publication metadata, and research clusters to explore how AI is being used in biology research.1 We focused on 10 biology subfields, selected in collaboration with experts at the Chan Zuckerberg Initiative: Cell Biology, Single-cell Transcriptomics, Developmental Biology, Inflammation, Metagenomics, Metabolic Network Modeling, Neuroscience, Rare Diseases, Protein Structure Prediction, and Cancer Research. CSET’s enhanced publications corpus includes in-house field of study scoring (see here and here), AI relevance predictions (see here and here), and extracted AI/ML research tasks and method mentions (see here). This analysis involved the following steps:
1. Surface AI publications in selected biology subfields using our AI-relevance predictions and field of study scores, supplemented with keywords and Medical Subject Headings (MeSH) terms as needed.
2. Analyze the AI/ML tasks and methods mentioned in those publications, along with other publication metadata such as author affiliations.
3. Find research clusters in our Map of Science that include publications surfaced in Step 1.
4. Validate a set of research clusters for analysis, focusing on those predicted to experience high growth and identified as relevant through manual review.
5. Analyze the AI/ML tasks and methods most frequently mentioned in publications in those research clusters, along with other cluster metadata such as key concepts.
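To make Step 1 concrete, here is a minimal Python sketch of how publications might be surfaced. It is an illustration under stated assumptions, not our production pipeline: the `Publication` record, the score ranges, and both cutoff values are hypothetical, and the sample records are toy data.

```python
from dataclasses import dataclass, field

# Hypothetical cutoffs; the article does not state the thresholds actually used.
AI_THRESHOLD = 0.5
FIELD_THRESHOLD = 0.5


@dataclass
class Publication:
    pub_id: str
    ai_score: float      # assumed AI-relevance prediction in [0, 1]
    field_scores: dict   # assumed mapping: subfield name -> field of study score
    mesh_terms: set = field(default_factory=set)


def surface_ai_publications(pubs, subfield, mesh_fallback=frozenset()):
    """Keep AI-relevant publications that belong to `subfield`.

    A publication counts as in-field if its field score clears the cutoff
    or, as a supplement, if it carries one of the fallback MeSH terms.
    """
    selected = []
    for pub in pubs:
        if pub.ai_score < AI_THRESHOLD:
            continue  # not AI-relevant under the assumed cutoff
        in_field = pub.field_scores.get(subfield, 0.0) >= FIELD_THRESHOLD
        has_mesh = bool(pub.mesh_terms & mesh_fallback)
        if in_field or has_mesh:
            selected.append(pub)
    return selected


# Toy records for illustration only.
toy_pubs = [
    Publication("p1", 0.9, {"Neuroscience": 0.8}),
    Publication("p2", 0.2, {"Neuroscience": 0.9}),   # in-field but not AI-relevant
    Publication("p3", 0.7, {}, {"Neurons"}),         # caught by the MeSH supplement
]
surfaced = surface_ai_publications(toy_pubs, "Neuroscience", frozenset({"Neurons"}))
surfaced_ids = [p.pub_id for p in surfaced]
```

In this toy run, `p2` is dropped because its AI-relevance score is below the assumed cutoff, while `p3` is kept only because of the MeSH fallback.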
We focused on AI/ML tasks and methods because they represent AI “use cases.” A research task is what the research claims to do (“molecular docking” or “prediction of protein solubility”), while each research method we identified is an AI/ML-enabled approach to a task (“clustering” or “convolutional neural networks”). We mapped raw mentions extracted from publication titles and abstracts to a set of canonical tasks and methods defined in a widely used knowledge base. (Note that we summarize our results for tasks and methods separately below, rather than for task-method pairs; they do not necessarily appear together in the same publications.)
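The mapping from raw mentions to canonical names, and the separate tallies for tasks and methods, can be sketched roughly as follows. The alias table here is a hypothetical stand-in for the knowledge base, the sample records are toy data, and counting each canonical name at most once per publication is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical alias table: normalized raw mention -> canonical name.
ALIASES = {
    "cnn": "convolutional neural networks",
    "convolutional neural network": "convolutional neural networks",
    "deep learning": "deep learning",
    "image segmentation": "segmentation",
    "segmentation": "segmentation",
    "classification": "classification",
}


def canonicalize(mention):
    """Lowercase, collapse whitespace, and look up the canonical name (or None)."""
    key = " ".join(mention.lower().split())
    return ALIASES.get(key)


def count_canonical(publications, field_name):
    """Count canonical names, at most once per publication (assumed convention)."""
    counts = Counter()
    for pub in publications:
        names = {canonicalize(m) for m in pub[field_name]}
        names.discard(None)  # drop mentions with no canonical match
        counts.update(names)
    return counts


# Toy records for illustration only.
toy_pubs = [
    {"tasks": ["Image  Segmentation"], "methods": ["CNN", "convolutional neural network"]},
    {"tasks": ["classification", "segmentation"], "methods": ["deep learning"]},
    {"tasks": ["molecular docking"], "methods": ["CNN"]},
]

# Tasks and methods are tallied separately, not as task-method pairs.
task_counts = count_canonical(toy_pubs, "tasks")
method_counts = count_canonical(toy_pubs, "methods")
```

Note that “molecular docking” is silently dropped here only because the toy alias table does not include it; a fuller table would map it to its own canonical task.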
As shown in Figure 1, across more than 29,000 AI and biology publications published since 2010, the most frequent research tasks mentioned were classification, diagnosis, and segmentation. The most frequent AI/ML methods mentioned were convolutional neural networks, deep learning, and machine learning.
Figure 1. Most Mentioned AI/ML Tasks and Methods Across AI + Biology Subfields, 2010-2024
Since 2010, the same three tasks (classification, diagnosis, and segmentation) have consistently been at the top. We see more fluctuation over time in the most frequently mentioned AI/ML methods, as shown in Figure 2. This makes sense given the rapid developments in AI/ML methods since 2010, and the shifts that we observe align with trends in the field. Notably, convolutional neural networks went from being a scarcely mentioned method in 2010–2014 to topping the list after 2015. Deep learning experienced a similar rise over time. Meanwhile, classifiers and feature extraction were top methods in 2010–2014 but fell out of the top mentions after 2015.
Figure 2. Most Mentioned AI/ML Methods Across AI + Biology Subfields, By Time Period
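A comparison like the one in Figure 2 amounts to bucketing publications by year and ranking methods within each bucket. The sketch below shows one way to do that; the period boundaries are assumed for illustration (the text names only 2010–2014 and "after 2015"), and the records are toy data rather than our results.

```python
from collections import Counter, defaultdict

# Assumed period bins, for illustration only.
PERIODS = [(2010, 2014), (2015, 2019), (2020, 2024)]


def period_label(year):
    """Return the label of the bin containing `year`, or None if out of range."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def top_methods_by_period(records, n=3):
    """records: iterable of (year, [canonical method names]) pairs."""
    counts = defaultdict(Counter)
    for year, methods in records:
        label = period_label(year)
        if label is None:
            continue  # outside the study window
        counts[label].update(set(methods))  # once per publication
    return {label: c.most_common(n) for label, c in counts.items()}


# Toy records for illustration only.
toy_records = [
    (2009, ["classifiers"]),  # ignored: before 2010
    (2011, ["classifiers", "feature extraction"]),
    (2013, ["classifiers"]),
    (2016, ["convolutional neural networks", "deep learning"]),
    (2018, ["convolutional neural networks"]),
]
top = top_methods_by_period(toy_records)
```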
One takeaway from this exploration is that AI methods have been applied to research tasks that, well, we’d expect them to be helpful with! The top-mentioned tasks are priorities and challenges for biological research. It makes sense that AI/ML methods are being deployed, and adapted as AI/ML methods evolve, to tackle these longstanding challenges.
- Classification: Separating biological data into groups. AI/ML enables more rapid classification whether the classes are known in advance or have to be discovered from the data.
- Diagnosis + Cancer Detection: Identifying or predicting disease status. AI-enabled diagnosis and detection can be faster and more sensitive.
- Segmentation + Image Processing: Breaking down and preparing images (e.g., medical scans, microscope images) for analysis. AI can speed up and streamline this traditionally time-intensive process.
In addition to the set of 29,000 AI and biology publications, we looked at research clusters that contain these publications in order to get a broader picture of how AI is being used in these biology research areas. The full set of publications is spread across thousands of research clusters. To filter down to a smaller set of clusters for analysis, we prioritized clusters with a high share of AI-relevant research, field-relevant key concepts, and high predicted growth (including extreme-growth prediction, described in more detail here and here). Our manual review resulted in roughly 100 clusters.
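The automated part of this filtering, before manual review, could look something like the sketch below. The record layout, the `min_ai_share` cutoff, the boolean growth flag, and the concept lists are all hypothetical, and the cluster IDs are toy placeholders rather than real Map of Science clusters.

```python
def shortlist_clusters(clusters, field_concepts, min_ai_share=0.5):
    """Return IDs of clusters worth sending to manual review.

    A cluster is shortlisted when it has a high share of AI-relevant
    publications, is predicted to grow, and shares at least one key
    concept with the field of interest. The cutoff is an assumption.
    """
    shortlist = []
    for cluster in clusters:
        relevant = bool(set(cluster["key_concepts"]) & field_concepts)
        if cluster["ai_share"] >= min_ai_share and cluster["high_growth"] and relevant:
            shortlist.append(cluster["cluster_id"])
    return shortlist


# Toy clusters for illustration only.
toy_clusters = [
    {"cluster_id": "A", "ai_share": 0.8, "high_growth": True,
     "key_concepts": ["histopathology", "segmentation"]},
    {"cluster_id": "B", "ai_share": 0.9, "high_growth": True,
     "key_concepts": ["speech recognition"]},    # AI-heavy but off-topic
    {"cluster_id": "C", "ai_share": 0.1, "high_growth": True,
     "key_concepts": ["histopathology"]},        # on-topic but little AI
    {"cluster_id": "D", "ai_share": 0.7, "high_growth": False,
     "key_concepts": ["protein structure"]},     # not predicted to grow
]
candidates = shortlist_clusters(toy_clusters, {"histopathology", "protein structure"})
```

A filter like this only narrows the pool; the final set still depends on a person reading through each candidate cluster.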
One interesting distinction we observed when reviewing the smaller set of clusters was the prevalence of computer vision in some, while other clusters included a lot of AI research but not computer vision. This pattern reflects the two primary types of biological data: biological images like microscope images, histopathology slides, or medical scans, and non-imaging data like gene sequences, RNA expression, medical records, or measurements from assays and other experiments. Research areas with more computer vision publications tended to be those that rely more on imaging data, like Cell Biology and Inflammation. Meanwhile, Protein Structure Prediction is an example of an area where there is a lot of AI research, but not computer vision research.
Tables 1 and 2 provide details on some of the research clusters we identified. Table 1 highlights clusters that include high shares of AI and computer vision research. The top AI/ML tasks in these example clusters represent ways of processing images (segmentation, image recognition, processing, and analysis), characterizing images (classification), or extracting information from images (cell count, feature extraction, diagnosis). Some of these top tasks reflect fairly specific goals that align with the overall research cluster. For example, “Blood Cell Count” is a top task in Cluster 11064 (research on detecting blood diseases in microscope images), and “Polyp Segmentation” is a top task in Cluster 14173 (research on interpreting endoscopic images).
Table 1. Sample of AI and Biology Subfield Research Clusters with High CV Research
Table 2 highlights clusters with a high incidence of AI research but low computer vision research. The top AI/ML tasks in these example clusters represent approaches to their specific research goals, like single-cell RNA sequencing, drug discovery, protein function prediction, and cancer detection. (We still see a lot of classification as a general research task, however.)
Table 2. Sample of AI and Biology Subfield Research Clusters with Low CV Research
While we need to monitor risks and applaud major advances related to the convergence of AI and biotechnology, our exploration highlights the use of AI to address longstanding challenges and make incremental progress in biological research. AI helps find patterns, sort data into classes, screen for anomalies, and differentiate and break down complex images. As we noted previously, these applications align with a big-picture goal that was already important to scientists: sifting through large amounts of information to identify complex patterns and important variables. Researchers are taking advantage of these uses to advance biological science in many areas, for many purposes, at many universities and labs.