Scalable Integrative Analysis of Large Biobank-Scale Whole Genome Sequencing Studies With Functional Data
Large-scale national and institutional biobanks that pair Whole Genome/Exome Sequencing (WGS/WES) data with Electronic Health Records (EHRs) have emerged rapidly worldwide. In this talk, I will discuss analytic tools and resources for the scalable analysis of large-scale biobank- and population-based WGS association studies of common and rare variants, which integrate WGS data with multi-faceted functional annotation data. Topics include fitting mixed models for continuous, discrete, and survival phenotypes using sparse genetic relatedness matrices (GRMs) in population- and biobank-based studies. They also include rare variant association tests and meta-analysis that incorporate multi-faceted variant functional annotations, including single-cell-based, cell-specific annotations, using individual-level data and WGS summary statistics. I will also give a demo of FAVOR (favor.genohub.org) and FAVORAnnotator. FAVOR is an online variant functional annotation portal and resource that provides multi-faceted functional annotations for about 9 billion variants genome-wide. FAVORAnnotator is a tool for functionally annotating any WGS/WES study. Cloud-based platforms for these resources will also be discussed. The presentation will be illustrated with ongoing large-scale population-based WGS studies and biobanks of quantitative, case-control, and time-to-event phenotypes. These include the Genome Sequencing Program (GSP) of the National Human Genome Research Institute, the Trans-Omics for Precision Medicine (TOPMed) Program of the National Heart, Lung, and Blood Institute, the UK Biobank, and FinnGen, which are collectively sequencing about 1 million genomes.
Dr. Lin's research interests lie in the development and application of scalable statistical and machine learning methods for analyzing massive data from the genome, exposome, and phenome, including large and complex genetic, genomic, epidemiological, and health data. Examples of her current research include analytic methods and applications for large-scale Whole Genome Sequencing studies, biobanks, and Electronic Health Records. Her work also covers techniques and tools for whole-genome variant functional annotation, analysis of gene-environment interplay, multiple phenotype analysis, polygenic risk prediction, and heritability estimation.
Location: Hock Plaza, Room #10089
Zoom link: https://bit.ly/3PhY9qp