2013.9.27 Prediction of protein-protein interaction at residue level
Li Liao, Ph.D.
Computer and Information Sciences
University of Delaware, Newark, DE, USA
Accurate identification of protein interaction sites is challenging both experimentally and computationally. It is nonetheless essential for understanding protein functions and interactions and their roles in cell biology and common human diseases, and it can facilitate the development of therapeutic treatments. We have developed computational methods to accurately predict protein interaction at the residue level. First, we developed a new decoding algorithm, ETBViterbi, which builds an early traceback mechanism into interaction profile hidden Markov models (ipHMMs) in order to incorporate the long-distance correlations between interacting residues, which are sometimes separated by dozens of amino acids in the primary structure of a protein. We show that this can improve the accuracy of interacting-residue prediction by up to 12%. Second, we developed a supervised learning method to further predict which residues in one protein interact with which residues in another protein, namely, the contact matrix. Each residue position in an interacting domain is represented as a 20-dimensional vector of Fisher scores calculated from the ipHMM, characterizing how similar the residue is to the domain family profile at that position. Each element of the contact matrix for a sequence pair is then represented by a feature vector that concatenates the vectors of the two corresponding residues. A support vector machine is trained to predict whether the two residues actually interact with each other. Cross-validation on a benchmark dataset shows that the prediction accuracy, measured as ROC score, reaches 0.91, compared with 0.78 for a previous method based on multiple sequence alignment.
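The contact-matrix representation described above can be illustrated with a minimal Python sketch. It only shows how the per-residue vectors of two domains might be concatenated into one feature vector per residue pair. The function name `pair_features`, the constant `FISHER_DIM`, and the toy numbers are assumptions made for illustration; they are not real Fisher scores, and the ipHMM scoring and the support vector machine training are not reproduced here.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

FISHER_DIM = 20  # per the abstract: a 20-dimensional Fisher-score vector per residue


def pair_features(
    domain_a: Sequence[Sequence[float]],
    domain_b: Sequence[Sequence[float]],
) -> Dict[Tuple[int, int], List[float]]:
    """Build one feature vector per contact-matrix element (i, j).

    Each vector concatenates the vector of residue i in domain A with the
    vector of residue j in domain B (hypothetical helper, illustration only).
    """
    for vec in list(domain_a) + list(domain_b):
        if len(vec) != FISHER_DIM:
            raise ValueError(f"expected {FISHER_DIM} scores per residue, got {len(vec)}")
    return {
        (i, j): list(domain_a[i]) + list(domain_b[j])
        for i, j in product(range(len(domain_a)), range(len(domain_b)))
    }


# Toy placeholder vectors (NOT real Fisher scores): 3 residues in A, 2 in B.
toy_a = [[0.1 * k] * FISHER_DIM for k in range(3)]
toy_b = [[-0.1 * k] * FISHER_DIM for k in range(2)]

features = pair_features(toy_a, toy_b)
print(len(features))          # one entry per (i, j) element of the contact matrix
print(len(features[(2, 1)]))  # length of each concatenated pair vector
```

In this sketch, each concatenated vector, paired with a label indicating whether residues i and j are in contact, would serve as one training example for a binary classifier such as the support vector machine mentioned in the abstract.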