TY - JOUR
T1 - Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin
AU - Bokulich, Nicholas A.
AU - Kaehler, Benjamin D.
AU - Rideout, Jai Ram
AU - Dillon, Matthew
AU - Bolyen, Evan
AU - Knight, Rob
AU - Huttley, Gavin A.
AU - Caporaso, J. Gregory
N1 - Funding Information:
This work was funded in part by National Science Foundation award 1565100 to JGC and RK, awards from the Alfred P. Sloan Foundation to JGC and RK, awards from the Partnership for Native American Cancer Prevention (NIH/NCI U54CA143924 and U54CA143925) to JGC, and National Health and Medical Research Council of Australia award APP1085372 to GAH, JGC and RK. These funding bodies had no role in the design of the study, the collection, analysis, or interpretation of data, or in writing the manuscript.
Publisher Copyright:
© 2018 The Author(s).
PY - 2018/5/17
Y1 - 2018/5/17
AB - Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
UR - http://www.scopus.com/inward/record.url?scp=85052651259&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052651259&partnerID=8YFLogxK
U2 - 10.1186/s40168-018-0470-z
DO - 10.1186/s40168-018-0470-z
M3 - Article
C2 - 29773078
AN - SCOPUS:85052651259
SN - 2049-2618
VL - 6
JO - Microbiome
JF - Microbiome
IS - 1
M1 - 90
ER -