Technical Highlight - June 2014
Short description: An algorithm identifies functional clusters of cell surface-anchored and secreted immunoglobulin superfamily proteins by comparing conserved regions and taking into account known protein–protein interaction data.
As more sequencing data become available, new tools to generate insights about protein function are sorely needed. Clustering algorithms that rely solely on pair-wise sequence similarity can miss functionally related proteins that share little sequence identity.
To predict functionally related families among the pharmacologically important immunoglobulin superfamily (IgSF) proteins, Fiser and colleagues (PSI NYSGRC) trained their algorithm to identify proteins that bind the same ligand in a similar manner. The algorithm, named PICTree, first compares sequence profile-based hidden Markov models, a comparison that amplifies signals from conserved regions, which generally correlate with functional importance. To further improve clustering, the algorithm was calibrated on a dataset that included ligand-interaction data from the STRING protein interaction database. In addition, the algorithm places more emphasis on sequence similarity within the N-terminal domain, which is frequently involved in ligand binding in cell-surface IgSF proteins.
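The general idea of this paragraph (weight the N-terminal domain more heavily, then calibrate against known interaction pairs) can be illustrated with a small sketch. This is not the PICTree implementation: the function names, the 0.7 weight, the toy distances, and the use of single-linkage clustering with a calibrated cutoff are all assumptions made for illustration, and the real method derives its distances from comparisons of hidden Markov model profiles.

```python
from itertools import combinations


def combined_distance(d_nterm, d_full, w_nterm=0.7):
    """Blend two distances, giving the N-terminal domain more weight.

    The weight 0.7 is a hypothetical value, not one taken from the paper.
    """
    return w_nterm * d_nterm + (1.0 - w_nterm) * d_full


def cluster(names, dist, cutoff):
    """Single-linkage clustering: link any two proteins with distance <= cutoff."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def calibrate_cutoff(names, dist, known_pairs, candidates):
    """Pick the cutoff that co-clusters known pairs with the fewest extra pairs."""
    def score(cutoff):
        together = set()
        for group in cluster(names, dist, cutoff):
            together.update(frozenset(p) for p in combinations(group, 2))
        return len(together & known_pairs) - len(together - known_pairs)

    return max(candidates, key=score)


# Toy data (hypothetical): (N-terminal distance, full-length distance) per pair.
names = ["P1", "P2", "P3", "P4"]
raw = {
    ("P1", "P2"): (0.1, 0.6), ("P1", "P3"): (0.8, 0.7), ("P1", "P4"): (0.9, 0.9),
    ("P2", "P3"): (0.7, 0.8), ("P2", "P4"): (0.9, 0.8), ("P3", "P4"): (0.2, 0.5),
}
dist = {frozenset(pair): combined_distance(*d) for pair, d in raw.items()}

# Pairs assumed to share a ligand, standing in for interaction-database evidence.
known_pairs = {frozenset(("P1", "P2")), frozenset(("P3", "P4"))}

best_cutoff = calibrate_cutoff(names, dist, known_pairs, [0.2, 0.4, 0.8])
clusters = cluster(names, dist, best_cutoff)
print(best_cutoff, clusters)
```

In this toy case the two known pairs are similar mainly in their N-terminal domains, so the weighting is what brings them under the calibrated cutoff; a cutoff that is too loose merges everything into one group and is penalized.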
Analysis of the 477 human cell-surface or secreted IgSF proteins identified 83 clusters with 2–34 members each, plus 87 singletons. In this initial analysis, and as a step toward their goal of defining ligand interactions for all IgSF proteins, the researchers predicted the function of a previously uncharacterized protein, VSIG8.
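As a quick reading aid for these counts, the following lines work out how many proteins fall into clusters and the mean cluster size. They assume that every one of the 477 proteins is either a singleton or a member of exactly one cluster, which the text implies but does not state outright.

```python
total_proteins = 477  # human cell-surface or secreted IgSF proteins analyzed
n_clusters = 83       # clusters with 2-34 members each
n_singletons = 87     # proteins not grouped with any other

# Assumption: each protein is either a singleton or in exactly one cluster.
clustered = total_proteins - n_singletons
mean_size = clustered / n_clusters

print(clustered)           # 390
print(f"{mean_size:.1f}")  # 4.7
```

Under that assumption, 390 proteins are clustered, with a mean of roughly 4.7 members per cluster, so most clusters must sit well below the 34-member maximum.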
Of the five IgSF functional pairs in the training dataset that PICTree failed to identify, four would require additional experimental information to establish whether the pairs indeed share common binding modes. The authors also propose incorporating protein-specific binding site information in future versions of the algorithm: for example, in secreted IgSF proteins, unlike cell-surface ones, the binding site could lie outside the N-terminus.
For now, large-scale structural genomics efforts could use information about functional families, as well as about single interactors that currently lack such information, to prioritize targets for experimental analysis.
E.H. Yap et al. Functional clustering of immunoglobulin superfamily proteins with protein-protein interaction information calibrated hidden Markov model sequence profiles.
J. Mol. Biol. 426, 945-961 (2014). doi:10.1016/j.jmb.2013.11.009