Technical Highlight - June 2012
Short description: FunTree integrates multiple kinds of data to address evolution of enzyme function in structurally defined superfamilies.
In order to predict functions of newly identified enzymes or design new functions, an understanding of enzyme function evolution is necessary. This can be achieved by integrating data from structural genomics projects with literature curation and selected predictions. To date, however, this has not been realized on a sufficiently broad or detailed scale.
Now, work from the PSI MCSG, funded by the Wellcome Trust, has yielded a pipeline for such integration and a resource, FunTree, which currently comprises data for 276 superfamilies representing over 2 million sequences from UniProtKB.
In the FunTree pipeline, Furnham and colleagues account for possible enzyme function changes resulting from alterations in single or multiple domains. Their workflow considers domains within superfamilies as structurally similar groups (SSGs), as well as multi-domain architecture (MDA). Starting with curated domain structures from CATH and MDA data from CATH-Gene3D, FunTree integrates mechanistic data from MACiE; sequence, Enzyme Commission (EC) number and taxonomic data from UniProtKB; Catalytic Site Atlas data; and other sources for phylogenetic analysis. A subsequent metabolite analysis integrates small molecule and reaction information from KEGG.
FunTree output includes a sequence diversity summary, a similarity tree of small molecules involved, and EC number distribution. One can select SSGs to organize superfamily data by structural similarity, or MDA groups to view by overall domain composition. The next level has detailed information for SSGs and MDAs, including phylogenetic trees annotated with links to sequence, structure and mechanism data, which can be navigated via the Google Maps API.
The authors' analysis of trends across the 276 superfamilies revealed that most have few SSGs and MDAs, with a few notable exceptions. Likewise, most superfamilies have one or a few catalytic functions, as defined by different EC numbers. The exceptions include one superfamily with 223 functions.
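As a rough illustration of what counting catalytic functions by EC number can look like, the following minimal sketch tallies distinct, fully specified EC numbers per superfamily. It is not the authors' implementation: the function name, the superfamily identifiers and the toy annotations are all hypothetical, and skipping incomplete EC numbers is an assumption made here for simplicity.

```python
from collections import defaultdict


def count_functions(annotations):
    """Return {superfamily_id: number of distinct complete EC numbers}.

    `annotations` is an iterable of (superfamily_id, ec_number) pairs.
    Both the input shape and the identifiers are hypothetical.
    """
    ec_by_superfamily = defaultdict(set)
    for superfamily_id, ec_number in annotations:
        # Assumption: ignore partially specified EC numbers such as "3.4.-.-"
        if "-" in ec_number:
            continue
        ec_by_superfamily[superfamily_id].add(ec_number)
    return {sf: len(ecs) for sf, ecs in ec_by_superfamily.items()}


# Toy data, invented for illustration only
annotations = [
    ("SF_A", "1.1.1.1"),
    ("SF_A", "1.1.1.1"),   # duplicate annotation, counted once
    ("SF_A", "1.1.1.2"),
    ("SF_B", "3.4.21.4"),
    ("SF_B", "3.4.-.-"),   # incomplete EC number, skipped
]
counts = count_functions(annotations)
print(counts)
```

Using a set per superfamily means that many sequences sharing one EC number count as a single function, which matches the idea of functions being defined by different EC numbers.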
In addition to gaining insight into functional evolution, Furnham and colleagues envision the ability to input new sequences into FunTree to determine functional space. For now, the expansion of FunTree is being planned as more data are added to CATH/CATH-Gene3D.
N. Furnham et al. FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies.
Nucleic Acids Res. 40, D776-D782 (2012). doi:10.1093/nar/gkr852
N. Furnham et al. Exploring the Evolution of Novel Enzyme Functions within Structurally Defined Protein Superfamilies.
PLoS Comput. Biol. 8 (2012). doi:10.1371/journal.pcbi.1002403