TY - JOUR
T1 - Species abundance information improves sequence taxonomy classification accuracy
AU - Kaehler, Benjamin D.
AU - Bokulich, Nicholas A.
AU - McDonald, Daniel
AU - Knight, Rob
AU - Caporaso, J. Gregory
AU - Huttley, Gavin A.
N1 - Publisher Copyright:
© 2019, The Author(s).
PY - 2019/12/1
Y1 - 2019/12/1
AB - Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
UR - http://www.scopus.com/inward/record.url?scp=85073157132&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073157132&partnerID=8YFLogxK
U2 - 10.1038/s41467-019-12669-6
DO - 10.1038/s41467-019-12669-6
M3 - Article
C2 - 31604942
AN - SCOPUS:85073157132
SN - 2041-1723
VL - 10
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 4643
ER -