Following the release of ChatGPT earlier this year and related breakthroughs in large language models, or LLMs, CSET has received a lot of questions about LLMs and their implications. But many of these questions and discussions miss some basics about how LLMs work. We asked our NLP Engineer, James Dunham, to help us explain LLMs in plain English, and this is what we got… check it out!
What are LLMs?
Large language models, or LLMs, are systems that learn patterns in language from exposure to a lot of it. The LLMs people are talking about today use their knowledge of language to produce new text on demand. A chatbot like ChatGPT uses an LLM to generate text in response to human prompts.
How does it work?
If you ask ChatGPT “how do LLMs work?”, it takes those words and predicts what is likely to come next, applying its knowledge of which words tend to follow others, learned through exposure to existing text in different contexts (e.g., “LLMs work by …”).
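To make that concrete, here is a minimal sketch in Python. It uses the small, openly available GPT-2 model through the Hugging Face transformers library as a stand-in, since the much larger model behind ChatGPT is not public; the prompt and the number of candidates shown are arbitrary choices for illustration.

```python
# A sketch of next-token prediction, using the public GPT-2 model
# as a stand-in for the much larger LLM behind ChatGPT.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("LLMs work by", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # a plausibility score for every token

# The scores at the last position say what is likely to come next
top = torch.topk(logits[0, -1], k=5)
for score, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}: {score:.1f}")
```

The highest-scoring candidates are the continuations that most often followed words like these in the model's training text.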
ChatGPT is based on an LLM (initially GPT-3.5) that OpenAI trained to participate in dialogue using a lot of data and a variety of novel methods. For instance, training involved showing the model both sides of example conversations. Humans also reviewed possible responses from the model and graded them. All of this provided data for improving how ChatGPT responds in conversation.
What happens under the hood?
The input text is first split into words or word parts, called tokens, and each token is converted into a numerical representation. So we started with natural language text, but now we have a lot of numbers that encode useful information, learned during training, about each word or word part in context. The model then considers possible continuations of the text, evaluating how plausible each one seems based on its training. It chooses the best next token from a set of plausible candidates and repeats this process until stopping seems like the most plausible next move. All of this rests on the statistical patterns in text (the model of language) learned during training. But it can work astonishingly well, giving the appearance of encyclopedic knowledge and reasoning.
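Under the same stand-in assumption (the public GPT-2 model rather than ChatGPT's actual model), the loop below sketches the whole process just described: split the text into tokens, score possible next tokens, append a plausible one, and repeat until stopping looks like the best option.

```python
# A sketch of the full generation loop: tokenize, score, extend, repeat.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Step 1: split the input into tokens and turn them into numbers
token_ids = tokenizer.encode("How do LLMs work?", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(token_ids[0].tolist()))

# Steps 2 and 3: score possible continuations, keep a plausible next
# token, and repeat until stopping looks like the best option
for _ in range(40):
    with torch.no_grad():
        logits = model(token_ids).logits
    next_id = logits[0, -1].argmax()  # greedy choice; chatbots usually sample
    if next_id == tokenizer.eos_token_id:  # the model's "stop" marker
        break
    token_ids = torch.cat([token_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(token_ids[0]))
```

A real chatbot samples among the top candidates rather than always taking the single best one, which keeps its responses varied.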
What other kinds of AI are there?
ChatGPT and its underlying LLM are examples of generative artificial intelligence, meaning that they generate content. Another well-known generative model released in 2022 is Stable Diffusion, which creates images on demand. Many other varieties and applications of artificial intelligence exist, from autonomous cyber defense and visual surveillance to playing board games.
Other Questions?
If you have questions or would like to discuss research on these topics, reach out to CSET at cset@georgetown.edu. For other easy-to-read explanations of LLMs and generative AI, check out “What Are Generative AI, Large Language Models, and Foundation Models?” and “AI Chatbots Are Doing Something a Lot Like Improv.”