The MASH pipeline for protein function prediction and an algorithm for the geometric refinement of 3D motifs

Brian Y. Chen, Viacheslav Y. Fofanov, Drew H. Bryant, Bradley D. Dodson, David M. Kristensen, Andreas M. Lisewski, Marek Kimmel, Olivier Lichtarge, Lydia E. Kavraki

Research output: Contribution to journal › Article › peer-review

38 Scopus citations


The development of new and effective drugs is strongly affected by the need to identify drug targets and to reduce side effects. Resolving these issues depends partially on a thorough understanding of the biological function of proteins. Unfortunately, the experimental determination of protein function is expensive and time-consuming. To support and accelerate the determination of protein functions, algorithms for function prediction are designed to gather evidence indicating functional similarity with well-studied proteins. One such approach is the MASH pipeline, described in the first half of this paper. MASH identifies matches of geometric and chemical similarity between motifs, representing known functional sites, and substructures of functionally uncharacterized proteins (targets). Observations from several research groups concur that statistically significant matches can indicate functionally related active sites. One major subproblem is the design of effective motifs, which have many matches to functionally related targets (sensitive motifs), and few matches to functionally unrelated targets (specific motifs). Current techniques select and combine structural, physical, and evolutionary properties to generate motifs that mirror functional characteristics in active sites. This approach ignores incidental similarities that may occur with functionally unrelated proteins. To address this problem, we have developed Geometric Sieving (GS), a parallel distributed algorithm that efficiently refines motifs, designed by existing methods, into optimized motifs with maximal geometric and chemical dissimilarity from all known protein structures. In an exhaustive comparison of all possible motifs based on the active sites of 10 well-studied proteins, we observed that optimized motifs were among the most sensitive and specific.
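The abstract's notions of sensitive and specific motifs can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the conventional definitions (sensitivity as the fraction of functionally related targets that a motif matches, specificity as the fraction of unrelated targets that it does not match), and the `MatchResult` record, the function name, and the target identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    """Hypothetical record of one motif-versus-target comparison."""
    target_id: str   # made-up identifier, not a real structure ID
    matched: bool    # True if the motif had a statistically significant match
    related: bool    # True if the target is functionally related to the motif


def sensitivity_specificity(results):
    """Return (sensitivity, specificity) using the conventional definitions.

    Assumption: these textbook definitions stand in for whatever exact
    scoring the paper uses, which the abstract does not specify.
    """
    tp = sum(1 for r in results if r.matched and r.related)
    fn = sum(1 for r in results if not r.matched and r.related)
    tn = sum(1 for r in results if not r.matched and not r.related)
    fp = sum(1 for r in results if r.matched and not r.related)
    # Guard against empty classes rather than dividing by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Fabricated toy data for illustration: 3 related and 4 unrelated targets.
example_results = [
    MatchResult("related_A", matched=True, related=True),
    MatchResult("related_B", matched=True, related=True),
    MatchResult("related_C", matched=False, related=True),
    MatchResult("unrelated_A", matched=True, related=False),
    MatchResult("unrelated_B", matched=False, related=False),
    MatchResult("unrelated_C", matched=False, related=False),
    MatchResult("unrelated_D", matched=False, related=False),
]

sens, spec = sensitivity_specificity(example_results)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under these assumptions, a motif refined to reduce incidental matches with unrelated proteins would show up as fewer false positives, and therefore a higher specificity, on the same set of targets.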

Original language: English (US)
Pages (from-to): 791-816
Number of pages: 26
Journal: Journal of Computational Biology
Issue number: 6
State: Published - Jul 2007
Externally published: Yes


Keywords

  • Functional annotation of proteins
  • Motif design
  • Motif optimization
  • Pattern matching
  • Protein function prediction

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

