Data Science Approaches to Biological Systems
In the first part of my talk, I will present our work on intrinsically disordered protein regions (IDRs), which constitute over 40% of eukaryotic proteomes. Specifically, I will discuss how disordered regions can increase phenotypic complexity and contribute to diseases such as cancer. I will present IDR-Screen, a high-throughput experimental approach for discovering functional disordered regions in large libraries of sequences. I will emphasise how machine learning on these data can help us learn the rules that make disordered regions functional. These rules can be exploited for synthetic biology and to interpret the impact of mutations in disordered segments.
In the second part of my talk, I will discuss our work on G-protein-coupled receptors (GPCRs), which regulate virtually every aspect of human physiology and are a major class of drug targets. Specifically, I will cover our studies of activation and selectivity in the GPCR signalling system. I will then present our recent work integrating population-level data on millions of polymorphisms with atomic-level GPCR structures to investigate GPCR pharmacogenomics. I will highlight how mapping polymorphism data onto the structures of molecular machines can provide mechanistic insights into biochemical and phenotypic variation in biological systems.