RBPzoo: The Motifs, Binding Interface and Evolution History of 30,000 Eukaryotic RNA-binding Proteins (RBPs)
RNA-binding proteins (RBPs) are key regulators of gene expression. I will describe a new resource, the RBPzoo, which contains RNA motifs derived from in vitro selection data for 381 eukaryotic RBPs. I will also introduce a new machine learning algorithm, Joint Protein-Ligand Embedding (JPLE), which we trained on RBPzoo to learn a mapping from an RBP's sequence to its RNA motif. We used JPLE to assign RNA motifs and specificity-determining residues to nearly 30,000 RBPs from 690 eukaryotes. To illustrate the power of this resource, we identify 12 Arabidopsis thaliana RBPs that affect mRNA stability, using only their amino acid sequences and tissue-specific RNA-seq data from A. thaliana. We also reconstruct the evolution of >2,500 groups of RBP motifs, finding that 19 motifs representing core RNA processing functions are deeply conserved between plants and metazoa. Finally, we identify three phases of rapid motif expansion: one coinciding with two whole-genome duplications in the vertebrate lineage, and two recent expansions in the flowering plant and nematode lineages.