TY - JOUR
T1 - Prediction of enzyme function based on 3D templates of evolutionarily important amino acids
AU - Kristensen, David M.
AU - Ward, R. Matthew
AU - Lisewski, Andreas
AU - Erdin, Serkan
AU - Chen, Brian Y.
AU - Fofanov, Viacheslav Y.
AU - Kimmel, Marek
AU - Kavraki, Lydia E.
AU - Lichtarge, Olivier
N1 - Funding Information:
This work was supported in part by NSF DBI-0547695 (OL and LK) and by NIH GM066099, GM079656, and the March of Dimes MOD FY06-371 (OL), and by a Sloan Fellowship and the Brown School of Engineering at Rice University (LK). This work was also supported by training fellowships from the Keck Center for Interdisciplinary Bioscience Training from the W.M. Keck Foundation (AML) and NLM Grant No. 5T15LM07093 (RMW, DMK, BYC), from the NIH institutional postdoctoral fellowship program in medical genetics NIH 5 T32 GM07526-29 (SE), and from the VIGRE Training in Bioinformatics Grant NSF DMS 0240058 (VYF).
PY - 2008/1/11
Y1 - 2008/1/11
AB - Background: Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. Results: Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits with the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required both to geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than that of either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. Conclusion: These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large-scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
UR - http://www.scopus.com/inward/record.url?scp=38849090164&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38849090164&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-9-17
DO - 10.1186/1471-2105-9-17
M3 - Article
C2 - 18190718
AN - SCOPUS:38849090164
SN - 1471-2105
VL - 9
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 17
ER -