Title: Dissect global dynamics of protein interactome and gene regulation across human populations and in disease
Speaker: Haiyuan Yu, Professor, Department of Biological Statistics and Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, 335 Weill Hall, Ithaca, NY 14853 USA
Protein-protein interactions facilitate much of known cellular function. While simply knowing which proteins interact with each other provides valuable information to spur functional studies, far more specific hypotheses can be tested if the spatial contacts of interacting proteins are known. However, co-crystal structures and homology models cover only ~10% of all known human interactions. To address this gap, we developed ECLAIR (Ensemble Classifier Learning Algorithm to predict Interface Residues), a unified machine learning framework that we used to create the first multi-scale whole-proteome 3D structural interactome in human for all experimentally determined binary interactions reported in major databases (Nature Methods, 2018). We then demonstrate that our 3D interactome approach offers a generalizable interactome-based framework for prioritizing missense mutations that contribute risk to human disease and for understanding their mechanisms at the molecular level, by analyzing 2,821 de novo missense mutations identified from whole-exome sequencing of ~2,500 families from the Simons Simplex Collection. We find that interaction-disrupting de novo missense mutations are more common in autism probands, that these mutations principally affect hub proteins, and, importantly, that proteins harboring these mutations in male probands are significantly more likely to be known ASD genes, confirming the effectiveness of our framework (Nature Genetics, 2018).
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leveraged the ExAC database of 60,706 human exomes to experimentally investigate the impact of 2,009 missense single nucleotide variants (SNVs) across 2,185 protein-protein interactions, generating interaction profiles for 4,797 SNV-interaction pairs, of which 421 SNVs segregate at >1% allele frequency in human populations. Surprisingly, we find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual's genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations (Nature Communications, in press).
Distal enhancer elements remain one of the least understood genomic entities despite decades of research demonstrating their pivotal roles in development and disease. Recently, we developed the PRO-cap assay, which can detect transcription start sites (TSSs) genome-wide with at least an order of magnitude higher sensitivity than other assays. Since numerous chromatin features have been proposed to mark distal enhancer elements, we performed systematic and functional comparisons of enhancer predictions using our improved eSTARR-seq assay. Our results indicate that gene-distal divergent TSSs detected by PRO-cap are a robust predictor of enhancer activity, with higher specificity than histone modifications. We propose a model of regulatory elements defined by divergent TSS boundaries, and validate that these boundaries are necessary and sufficient to capture enhancer activities genome-wide.
Professor Haiyuan Yu received his B.S. in Biophysics from Peking University in 2000 and his Ph.D. in Computational Biology and Bioinformatics from Yale University in 2006. He completed postdoctoral training in Biomedical Systems Biology at Harvard Medical School (2006-2009). Since 2019, he has been a full Professor in the Department of Biological Statistics and Computational Biology and the Weill Institute for Cell and Molecular Biology at Cornell University.
His lab performs research in the broad area of Network Systems Biology with both high-throughput experimental (see Vo et al., Cell 2016) and integrative computational (see Wang et al., Nature Biotechnology 2012) methodologies, aiming to understand gene functions and their relationships within complex molecular networks and how perturbations to such systems may lead to various human diseases. The complexity of biological systems calls for building experimentally verified computational models based on high-quality large-scale datasets, which is truly the future of biomedical research and the main theme of the lab. The lab's research focuses on five main areas (http://www.yulab.org/):
1) Functional and Comparative Genomics
2) Molecular and Dynamic Proteomics
3) Structural Genomics and Simulations
4) Algorithms and Tools
5) Technology Development