Executive Summary
There is widespread consensus that open and freely available AI models benefit research, yet there is little empirical evidence detailing how this relationship manifests. This report aims to fill that gap by investigating the use of open large language models (LLMs) in published research, surveying which organizations and countries use them most frequently, and considering their wider impact on research. To this end, we identify and analyze more than 250 publications that use open models in ways that require access to model weights, and derive a taxonomy of use cases that openly available model weights exclusively or predominantly enable. We then review more than 130 publications that use closed models to compare use cases when model weights are and are not openly available.
Our analysis finds that open models enable a more diverse range of use cases than closed models. Of the eight high-level use cases for AI models we identified, five are exclusively enabled by access to model weights, two predominantly require weights, and one does not require weights. The use cases that require weights include continuously pretraining models to expand their general knowledge, compressing models to improve their efficiency, combining different models or synchronizing their modalities (e.g., text and imagery), and measuring how models function on particular hardware or how hardware performs when running models.
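To illustrate the kind of weight-level access these use cases depend on, the sketch below shows continued pretraining of an openly released model on a new domain corpus. It assumes the Hugging Face transformers and datasets libraries; the checkpoint name, corpus file, and training settings are placeholders rather than choices drawn from the studies in our sample.

```python
# Minimal sketch of continued pretraining on openly released weights.
# Assumes Hugging Face transformers/datasets. The checkpoint name,
# corpus file, and training settings below are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-160m"  # placeholder open-weight checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:  # some open checkpoints ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token

# New-domain text the researcher wants the model to absorb (placeholder file).
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="continued-pretrain",
                           num_train_epochs=1, per_device_train_batch_size=4),
    train_dataset=corpus,
    # Causal-LM collator copies input_ids into labels for next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("continued-pretrain/final")  # updated weights stay with the researcher
```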
Two use cases predominantly require access to weights: fine-tuning models for particular tasks or domains, and examining model internals to interpret their functionality. While some closed model application programming interfaces (APIs) allow for these use cases, the access offered is generally very limited and does not, for example, allow for customized fine-tuning or granular examination of model internals. These APIs are therefore generally less useful to researchers for these use cases, and most studies assessed in this report that conducted model fine-tuning or examination required access to model weights.
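As a contrast with what closed APIs typically expose, the following sketch reads out a model's per-layer hidden states, the kind of internal signal that interpretability studies build on and that generally requires the weights to be available locally. It assumes PyTorch and Hugging Face transformers; the checkpoint name and prompt are placeholders.

```python
# Minimal sketch of inspecting a model's internal activations, which assumes the
# weights are available locally. Uses PyTorch + Hugging Face transformers.
# The checkpoint name and prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # placeholder open-weight checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One hidden-state tensor per layer (plus the embedding layer), each of shape
# (batch, sequence_length, hidden_size) -- raw material for interpretability analyses.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    print(layer_idx, tuple(hidden.shape))
```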
The final use case is prompting, which we define as any form of input-output probing. Prompting allows for the evaluation of model performance, capabilities, alignment, and safety, among other things, and requires only minimal access to a model through a web or programming interface, so it can be conducted on both open and closed models. In our sample of papers that used closed models, researchers engaged almost exclusively in model prompting.
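Because prompting needs only input-output access, the same probing loop can target an open model served locally or a closed model behind a commercial API. The sketch below uses the OpenAI Python client as one example interface; the model identifier and probe prompts are placeholders.

```python
# Minimal sketch of input-output probing ("prompting") through a chat API.
# Uses the OpenAI Python client as one example interface; the model
# identifier and probe prompts are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads the API key (and optionally a custom base URL) from the environment

probes = [
    "Summarize the causes of the 2008 financial crisis in one sentence.",
    "Translate 'model weights' into French.",
]

for prompt in probes:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt, "->", response.choices[0].message.content)
```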
These open model use cases allow researchers to investigate a wider range of questions, explore more avenues of experimentation, and implement and demonstrate a wider range of techniques than if they only had access to closed models. For example, researchers can custom fine-tune or continuously pretrain open models to study how a model’s performance or behavior changes with the introduction of new datasets and techniques, or they can examine open models to assess how internal parameters and processes contribute to and influence model behavior, an important enabler of AI interpretability and auditing. We note that some researchers may prefer to use closed models, especially for prompting, because state-of-the-art models tend to be closed, often come with convenient user interfaces and APIs, and do not require the user to download and run the model on custom computing infrastructure. Notwithstanding such factors, we find that access to open models can support advances in important areas of research beyond what is possible with closed models.
When it comes to the types of authors and organizations conducting research that uses open models, we find that nearly 90% and 50% of the papers in our sample were produced by researchers at academic institutions and companies, respectively, with about 35% written in collaboration between authors at both types of organizations. While open models can be beneficial to lower-resource academic organizations, the prevalence of academia in our sample likely reflects the fact that academic researchers are more likely to publish their research. We also find that the majority of papers in our sample that use open models were produced by researchers at U.S. organizations (64%), followed by Chinese organizations (38%), which reflects broader trends in AI research output as well as the predominance of English-language research in our sample.