Nonparametric Language Models: Trading Data for Parameters (and Compute) in Large Language Models
YOU'RE INVITED!
Join us for a Triangle CS Distinguished Lecturer Series (TCSDLS) live broadcast from UNC! Snacks will be provided, or you can attend via Zoom. TCSDLS is cosponsored by Duke CS, NC State CS, and UNC CS.
ABSTRACT
Large language models (LLMs) such as ChatGPT have taken the world by storm, but are incredibly expensive to train, requiring significant amounts of data and computational resources. They also hallucinate, e.g. by regularly introducing made-up facts, and are difficult to keep up to date over time, as the world around them changes. In this talk, I will survey some of our recent work on non-parametric and retrieval-based language models, which are instead designed to be easily extensible and provide much more careful provenance for their predictions. The key idea is to trade parameters for data; rather than attempting to memorize all the world's facts and knowledge in the learned parameters of a single monolithic LM, we instead provide the model with an explicit knowledge store (e.g. a collection of web pages from Wikipedia) that can be used to look up information in real time. This is a relatively new research direction where best practices are still forming, but I will argue that retrieval augmentation is a very general idea that can lead to much more efficient training, can provide fundamentally new insights into how LLMs work, and is broadly applicable to a range of settings, including, e.g., models that do text-to-image generation. I will also provide, to the best of my ability, a guess about where things are going and what it would take to convince every major LLM to go non-parametric in the near future.
SPEAKER BIO
Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Director at Meta. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms and models, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. His honors include being named an ACL Fellow and winning a PECASE award, an Allen Distinguished Investigator award, and multiple best paper awards. Read more: https://www.cs.washington.edu/people/faculty/lsz