Seminar Series: Learning Interpretable Representations of Biological Data
Center for Statistical Genetics and Genomics Faculty Candidate Seminar
Abstract: The increasing ease of collecting genome-scale data has rapidly accelerated its use in all areas of biomedical science. Translating genome-scale data into testable hypotheses, on the other hand, is challenging and remains an active area of method development. In this talk we present two machine learning approaches for learning data representations that are inspired by a mechanistic understanding of the data-generating process. First, we present a new constrained matrix decomposition approach that directly aligns a lower-dimensional representation with known biological pathways. Our method provides state-of-the-art accuracy in reconstructing known upstream variables through a biologically interpretable decomposition. Second, we present a new deep learning method for predicting enhancer-promoter interactions (EPIs) from DNA sequence. Our method borrows the attention mechanism, widely used for language translation tasks, to explicitly model EPIs in terms of biochemical compatibility.