Probabilistic methods for designing functional protein structures
The biochemical functions of proteins, such as catalyzing a chemical reaction or binding to a virus, are typically conferred by the geometry of only a handful of atoms. This arrangement of atoms, known as a motif, is structurally supported by the rest of the protein, referred to as a scaffold. A central task in protein design is to identify a diverse set of stabilizing scaffolds to support a motif known or theorized to confer function. This long-standing challenge is known as the motif-scaffolding problem.
In this talk, I describe a statistical approach I have developed to address the motif-scaffolding problem. My approach involves (1) estimating a distribution supported on realizable protein structures and (2) sampling scaffolds from this distribution conditioned on a motif. For step (1), I adapt diffusion generative models to fit example protein structures from nature. For step (2), I develop sequential Monte Carlo algorithms to sample from the conditional distributions of these models. Finally, I describe how, with experimental and computational collaborators, I have generalized and scaled this approach to generate and experimentally validate hundreds of proteins with various functional specifications.
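As a rough illustration of the idea behind step (2), the toy sketch below uses sequential Monte Carlo to draw samples from a conditional distribution. It is not the algorithm presented in the talk: a one-dimensional standard normal prior stands in for the learned diffusion model, a noisy observation `y` stands in for the motif, and every name (`smc_conditional_sample`, `log_prior`, `log_lik`, `sigma`) is hypothetical. The sketch assumes a likelihood-tempering scheme, in which particles drawn from the prior are reweighted, resampled, and moved through a sequence of intermediate targets that ends at the conditional distribution.

```python
import math
import random


def log_prior(x):
    # Toy stand-in for the learned model: standard normal, up to a constant.
    return -0.5 * x * x


def log_lik(x, y, sigma):
    # Toy conditioning term: y observed as x plus Gaussian noise of scale sigma.
    return -0.5 * ((y - x) / sigma) ** 2


def smc_conditional_sample(y, sigma=0.5, n_particles=2000, n_steps=20, seed=0):
    """Approximate samples from p(x | y) using tempered sequential Monte Carlo."""
    rng = random.Random(seed)
    # Start with particles drawn from the unconditional distribution.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    betas = [t / n_steps for t in range(n_steps + 1)]

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Reweight: incremental importance weight between successive targets.
        log_w = [(b - b_prev) * log_lik(x, y, sigma) for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]

        # Resample: multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)

        # Move: one random-walk Metropolis step that leaves the current
        # tempered target, prior(x) * lik(x)^b, invariant.
        moved = []
        for x in particles:
            proposal = x + rng.gauss(0.0, 0.5)
            log_accept = (
                log_prior(proposal) + b * log_lik(proposal, y, sigma)
                - log_prior(x) - b * log_lik(x, y, sigma)
            )
            if rng.random() < math.exp(min(0.0, log_accept)):
                x = proposal
            moved.append(x)
        particles = moved

    return particles


samples = smc_conditional_sample(y=1.0)
posterior_mean = sum(samples) / len(samples)
```

In this Gaussian toy problem the conditional distribution is available in closed form, with mean y / (1 + sigma^2) and variance sigma^2 / (1 + sigma^2), so the particle estimates can be checked against it. No such closed form exists for protein structures, which is why a sampling method is needed there.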
Bio:
Brian Trippe is a postdoctoral fellow at Columbia University in the Department of Statistics, and a visiting researcher at the Institute for Protein Design at the University of Washington. He completed his Ph.D. in Computational and Systems Biology at the Massachusetts Institute of Technology, where he worked on Bayesian methods for inference in hierarchical linear models. In his research, Brian develops statistical machine learning methods to address challenges in biotechnology and medicine, with a focus on generative modeling and inference algorithms for protein engineering.