TY - JOUR
T1 - Cavity-aware motifs reduce false positives in protein function prediction.
AU - Chen, Brian Y.
AU - Bryant, Drew H.
AU - Fofanov, Viacheslav Y.
AU - Kristensen, David M.
AU - Cruess, Amanda E.
AU - Kimmel, Marek
AU - Lichtarge, Olivier
AU - Kavraki, Lydia E.
PY - 2006
Y1 - 2006
N2 - Determining the function of proteins is a problem with immense practical impact on the identification of inhibition targets and the causes of side effects. Unfortunately, experimental determination of protein function is expensive and time-consuming. For this reason, algorithms for computational function prediction have been developed to focus and accelerate this effort. These algorithms are comparison techniques which identify matches of geometric and chemical similarity between motifs, representing known functional sites, and substructures of functionally uncharacterized proteins (targets). Matches of statistically significant geometric and chemical similarity can identify targets with active sites cognate to the matching motif. Unfortunately, statistically significant matches can include false positive matches to functionally unrelated proteins. We target this problem by presenting Cavity Aware Match Augmentation (CAMA), a technique which uses C-spheres to represent active clefts which must remain vacant for ligand binding. CAMA rejects matches to targets without similar binding volumes. On 18 sample motifs, we observed that introducing C-spheres eliminated 80% of false positive matches and maintained 87% of true positive matches found with identical motifs lacking C-spheres. Analyzing a range of C-sphere positions and sizes, we observed that some high-impact C-spheres eliminate more false positive matches than others. High-impact C-spheres can be detected with a geometric analysis we call Cavity Scaling, permitting us to refine our initial cavity-aware motifs to contain only high-impact C-spheres. In the absence of expert knowledge, Cavity Scaling can guide the design of cavity-aware motifs to eliminate many false positive matches.
AB - Determining the function of proteins is a problem with immense practical impact on the identification of inhibition targets and the causes of side effects. Unfortunately, experimental determination of protein function is expensive and time-consuming. For this reason, algorithms for computational function prediction have been developed to focus and accelerate this effort. These algorithms are comparison techniques which identify matches of geometric and chemical similarity between motifs, representing known functional sites, and substructures of functionally uncharacterized proteins (targets). Matches of statistically significant geometric and chemical similarity can identify targets with active sites cognate to the matching motif. Unfortunately, statistically significant matches can include false positive matches to functionally unrelated proteins. We target this problem by presenting Cavity Aware Match Augmentation (CAMA), a technique which uses C-spheres to represent active clefts which must remain vacant for ligand binding. CAMA rejects matches to targets without similar binding volumes. On 18 sample motifs, we observed that introducing C-spheres eliminated 80% of false positive matches and maintained 87% of true positive matches found with identical motifs lacking C-spheres. Analyzing a range of C-sphere positions and sizes, we observed that some high-impact C-spheres eliminate more false positive matches than others. High-impact C-spheres can be detected with a geometric analysis we call Cavity Scaling, permitting us to refine our initial cavity-aware motifs to contain only high-impact C-spheres. In the absence of expert knowledge, Cavity Scaling can guide the design of cavity-aware motifs to eliminate many false positive matches.
UR - http://www.scopus.com/inward/record.url?scp=34250860664&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250860664&partnerID=8YFLogxK
U2 - 10.1142/9781860947575_0037
DO - 10.1142/9781860947575_0037
M3 - Article
C2 - 17369649
AN - SCOPUS:34250860664
SN - 1752-7791
SP - 311
EP - 323
JO - Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference
JF - Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference
ER -