TY - GEN
T1 - A statistical model to correct systematic bias introduced by algorithmic thresholds in protein structural comparison algorithms
AU - Fofanov, Viacheslav Y.
AU - Chen, Brian Y.
AU - Bryant, Drew H.
AU - Moll, Mark
AU - Lichtarge, Olivier
AU - Kavraki, Lydia
AU - Kimmel, Marek
PY - 2008
Y1 - 2008
N2 - The identification of protein function is crucial to understanding cellular processes and selecting novel proteins as drug targets. However, experimental methods for determining protein function can be expensive and time-consuming. Protein partial structure comparison methods seek to guide and accelerate the process of function determination by matching characterized functional site representations (motifs) to substructures within uncharacterized proteins (matches). One common difficulty of all protein structural comparison techniques is the computational cost of obtaining a match. In an effort to maintain practical efficiency, some algorithms employ geometric threshold-based searches to eliminate biologically irrelevant matches. Thresholds refine and accelerate the method by limiting the number of potential matches that need to be considered. However, because statistical models rely on the output of the geometric matching method to accurately measure statistical significance, geometric thresholds can also artificially distort the basis of statistical models, making statistical scores dependent on geometric thresholds and potentially causing significant reductions in the accuracy of the functional annotation method. This paper proposes a point-weight-based correction approach that quantifies and models the dependence of statistical scores on geometric thresholds to account for the systematic bias introduced by these heuristics. Using a benchmark dataset of 20 structural motifs, we show that the point-weight correction procedure accurately models the information lost during the geometric comparison phase, removing systematic bias and greatly reducing misclassification rates of functionally related proteins while maintaining specificity.
AB - The identification of protein function is crucial to understanding cellular processes and selecting novel proteins as drug targets. However, experimental methods for determining protein function can be expensive and time-consuming. Protein partial structure comparison methods seek to guide and accelerate the process of function determination by matching characterized functional site representations (motifs) to substructures within uncharacterized proteins (matches). One common difficulty of all protein structural comparison techniques is the computational cost of obtaining a match. In an effort to maintain practical efficiency, some algorithms employ geometric threshold-based searches to eliminate biologically irrelevant matches. Thresholds refine and accelerate the method by limiting the number of potential matches that need to be considered. However, because statistical models rely on the output of the geometric matching method to accurately measure statistical significance, geometric thresholds can also artificially distort the basis of statistical models, making statistical scores dependent on geometric thresholds and potentially causing significant reductions in the accuracy of the functional annotation method. This paper proposes a point-weight-based correction approach that quantifies and models the dependence of statistical scores on geometric thresholds to account for the systematic bias introduced by these heuristics. Using a benchmark dataset of 20 structural motifs, we show that the point-weight correction procedure accurately models the information lost during the geometric comparison phase, removing systematic bias and greatly reducing misclassification rates of functionally related proteins while maintaining specificity.
UR - http://www.scopus.com/inward/record.url?scp=58049175626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58049175626&partnerID=8YFLogxK
U2 - 10.1109/BIBMW.2008.4686202
DO - 10.1109/BIBMW.2008.4686202
M3 - Conference contribution
AN - SCOPUS:58049175626
SN - 9781424428908
T3 - Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
SP - 1
EP - 8
BT - Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
T2 - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
Y2 - 3 November 2008 through 5 November 2008
ER -