Recent articles on the dangers of large language models (LLMs) highlight the need to balance discussion of risks with the huge potential benefits of LLMs for health and medicine. While there is potential for serious risks (for example, that LLM-created systems might someday help untrained actors manufacture new biological or chemical weapons), LLMs already perform tasks that are important to scientists, like identifying complex patterns and important variables within large amounts of information. Described below are three particularly notable areas in the life sciences where LLMs are catalyzing meaningful advances: drug discovery, genetics, and precision medicine.
Drug Discovery
A class of LLMs called chemical language models (CLMs) can help discover new therapies by using text-based representations of chemical structures to predict potential drug molecules that target specific disease-causing proteins. These models have already outperformed traditional drug discovery approaches in certain cases, and researchers can use pre-trained CLMs for their own projects with tools like Nvidia and AstraZeneca’s MegaMolBART.
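To make "text-based representations of chemical structures" concrete, here is a minimal sketch of how a molecule written as a string might be split into tokens and mapped to integer IDs before a language model sees it. It assumes the SMILES notation, one widely used text format for molecules; the source does not say which format any particular CLM uses. The regular expression is deliberately simplified, and the helper names (`tokenize_smiles`, `build_vocab`, `encode`) are hypothetical, not part of MegaMolBART or any other tool.

```python
import re

# Simplified SMILES token pattern (an illustrative assumption, not a full grammar):
# bracket atoms like [Na+], two-letter halogens, common one-letter atoms,
# aromatic atoms, ring-closure labels, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/:.~@*$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES string: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Convert a token list into the integer IDs a model would consume."""
    return [vocab[token] for token in tokens]


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin written in SMILES notation
    tokens = tokenize_smiles(aspirin)
    vocab = build_vocab([tokens])
    print(tokens)
    print(encode(tokens, vocab))
```

Once molecules are sequences of token IDs, they can be handled with the same machinery used for ordinary text, which is what lets a language model learn patterns across many known compounds and propose new candidate strings.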
LLMs aren’t just enabling breakthroughs in drug design for traditional small-molecule drugs. Researchers have also used LLMs to improve or design new antibodies, a type of immune molecule that is also used as a therapy for diseases like viral infections, cancers, and autoimmune disorders.
Genetics
While scientists understand more than ever about how discrete stretches of DNA, called genes, contribute to physical traits, the sheer size and complexity of the human genome (more than 3 billion base pairs of DNA) make it difficult to connect every gene to its function.
LLMs trained on sequences of nucleotides (the A, C, T, and G bases that make up DNA) may be able to pinpoint patterns that are too subtle to recognize otherwise. For example, a team of more than two dozen collaborators, including researchers from top U.S. research labs, won a Gordon Bell Special Prize for COVID-19 Research for a model that tracked SARS-CoV-2 mutations and predicted possible variants of concern. Similar models could help inform public health measures.
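As a rough illustration of how a nucleotide sequence can be treated as text, the sketch below splits a sequence into fixed-length chunks (often called k-mers) and reports which chunks differ between a reference and a variant. It is a toy under stated assumptions: the choice of non-overlapping chunks of three bases, the function names `kmer_tokens` and `changed_tokens`, and the example sequences are all hypothetical, and nothing here describes the prize-winning model's actual method.

```python
VALID_BASES = set("ACGT")


def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into non-overlapping chunks of k bases.

    Any trailing bases that do not fill a whole chunk are dropped.
    """
    sequence = sequence.upper()
    unexpected = set(sequence) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]


def changed_tokens(reference, variant, k=3):
    """Return (position, reference_token, variant_token) for each differing chunk.

    Assumes both sequences are already aligned and of equal length.
    """
    ref_tokens = kmer_tokens(reference, k)
    var_tokens = kmer_tokens(variant, k)
    return [
        (index, ref, var)
        for index, (ref, var) in enumerate(zip(ref_tokens, var_tokens))
        if ref != var
    ]


if __name__ == "__main__":
    reference = "ATGGCCATT"  # made-up example sequence
    variant = "ATGGACATT"    # same sequence with a single base changed
    print(kmer_tokens(reference))
    print(changed_tokens(reference, variant))
```

A trained model goes much further than this direct comparison, learning statistical patterns across many sequences, but the input it works from is the same kind of tokenized text.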
Precision Medicine
In clinical settings, LLMs may help to process and interpret digital versions of patient files, called electronic health records (EHRs). When paired with an LLM that can identify health trends, the information in EHRs can help clinicians reach diagnoses and choose treatments that are best suited to a patient’s unique needs, an approach known as precision or personalized medicine.
Companies like Google and Microsoft are developing LLM chatbot-type tools to help medical professionals answer clinical questions. However, these tools are still being evaluated for safety and accuracy to avoid harming patients.
Given LLMs’ positive impacts on biology, what will it take to power new applications? Since these LLMs rely on the same “AI triad” as other AI applications (algorithms, data, and computing power), unlocking the full potential of LLMs for biology and medicine will require targeted initiatives in each of these three areas:
- Algorithms: The next generation of algorithms for biological LLMs will need to be written by researchers with expertise in both computer science and biology. Such researchers are in short supply, a gap that the U.S. can fill by attracting, training, and retaining interdisciplinary talent.
- Data: Biological LLMs rely on private health information and data like genetic sequences, health records, and clinical samples. Future LLMs will need databases that represent real-world diversity and that balance privacy concerns with the need for data. It is also important to remember that U.S. privacy policies do not apply to privately held or international health databases.
- Computing power: Large, expensive computing resources are more accessible to private-sector developers with deep pockets than to academic researchers. Future progress, particularly from academic institutions and research labs, is at risk without programs that drive innovation by providing low-cost access to biology-specific computing resources.
For more work by these authors, check out CSET’s Bio Research Topic.