The impact of bioinformatic choices on Coccidioides variant identification accuracy

  • Marco Marchetti
  • Emanuel M. Fonseca
  • Kimberly E. Hanson
  • Bridget Barker
  • Katharine S. Walter

Research output: Contribution to journal › Article › peer-review

Abstract

Emerging fungal pathogens, such as Coccidioides, the causative agent of Valley fever, or coccidioidomycosis, pose significant clinical and public health challenges. While advances in genomic epidemiology have enhanced our understanding of Coccidioides evolutionary history, effective variant identification is complicated by the genome's structural complexity. Repetitive elements, transposable sequences, and regions of low complexity can lead to incorrect variant calls, affecting downstream analyses. Further, accurate species identification is essential for understanding the spread of C. immitis and C. posadasii, which, despite having distinct primary geographic distributions, can co-occur. Distinguishing between these species is critical for interpreting patterns of transmission, emergence, and potential shifts in endemicity. To address this, we developed a pipeline to identify genetic variants and assign species directly from sequencing reads. We evaluated the performance of variant identification both across the genome and after excluding repetitive regions identified by NUCmer, a commonly used tool, on simulated genomic data and empirically generated sequence data. Whole-genome calling detected the highest number of single-nucleotide polymorphisms (SNPs), over 80,000 on average in both species, but included a substantial number of false positives, with 42,834 true positives and 38,115 false positives identified. Masking repetitive regions significantly enhanced accuracy. In C. immitis, masking with NUCmer increased sensitivity from 70.1% to 91.7% and precision from 52.7% to 91.1%. Similarly, in C. posadasii, sensitivity improved from 80.0% to 96.1% and precision from 53.1% to 90.4%. These improvements were also reflected in overall F1 scores, which rose from 60%–64% in whole-genome analysis to over 90% after masking. Using simulated reads, our pipeline recovered 83,400 SNPs in C. posadasii, with 40,163 shared across regions and a Jaccard index of 0.36. Species classification was highly accurate: 100% in simulations and 98.9% in 175 publicly available samples. Here, we provide a benchmarked variant and species identification pipeline for Coccidioides and quantify the impact of genomic region on variant identification performance, which may have downstream impacts on phylogenetic and genomic epidemiology inference.

IMPORTANCE Accurate genetic analysis is essential for tracking and understanding emerging fungal pathogens like Coccidioides, the cause of Valley fever. However, the complex structure of fungal genomes makes it difficult to identify genetic differences reliably. This study demonstrates that the choice of genomic regions has a substantial impact on variant detection accuracy. We developed and tested a new tool called cocci-call and found that focusing on specific regions of the genome dramatically improves the accuracy of genetic variant detection. This improvement could enhance how researchers monitor outbreaks, track fungal evolution, and design better diagnostics. By identifying high-confidence regions for analysis, our work helps standardize how Coccidioides genomes are studied and compared, laying the groundwork for more accurate and reproducible genomic research in this important pathogen.
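The accuracy figures quoted in the abstract follow the standard definitions of precision, sensitivity, F1 score, and Jaccard index. The Python sketch below is only an illustration of those definitions applied to the reported numbers; it is not taken from the cocci-call pipeline, and the function names and the representation of an SNP as a (contig, position, alternate allele) tuple are assumptions made for this example.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of called variants that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def f1_score(prec: float, sens: float) -> float:
    """Harmonic mean of precision and sensitivity."""
    return 2 * prec * sens / (prec + sens)


def jaccard(calls_a, calls_b) -> float:
    """Shared calls divided by all distinct calls in either set."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Whole-genome calling, counts as reported in the abstract.
wg_precision = precision(42_834, 38_115)

# F1 from the reported per-species precision and sensitivity.
f1_immitis_whole = f1_score(0.527, 0.701)
f1_immitis_masked = f1_score(0.911, 0.917)
f1_posadasii_whole = f1_score(0.531, 0.800)
f1_posadasii_masked = f1_score(0.904, 0.961)

# Toy SNP sets (hypothetical data) to show the Jaccard calculation.
set_a = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")}
set_b = {("chr1", 250, "T"), ("chr2", 40, "G"), ("chr2", 99, "C")}
toy_jaccard = jaccard(set_a, set_b)

if __name__ == "__main__":
    print(f"whole-genome precision: {wg_precision:.3f}")
    print(f"C. immitis F1: {f1_immitis_whole:.3f} -> {f1_immitis_masked:.3f}")
    print(f"C. posadasii F1: {f1_posadasii_whole:.3f} -> {f1_posadasii_masked:.3f}")
    print(f"toy Jaccard index: {toy_jaccard:.2f}")
```

Under these definitions, the reported true-positive and false-positive counts give a whole-genome precision of about 53%, and the reported per-species precision and sensitivity values give F1 scores of roughly 60% and 64% before masking and above 90% after, in line with the ranges stated in the abstract.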

Original language: English (US)
Journal: Microbiology Spectrum
Volume: 13
Issue number: 10
State: Published - Oct 7 2025

Keywords

  • Coccidioides immitis
  • Coccidioides posadasii
  • fungal genomics
  • repetitive elements
  • species identification
  • variant calling pipeline

ASJC Scopus subject areas

  • Physiology
  • Ecology
  • Genetics
  • General Immunology and Microbiology
  • Cell Biology
  • Microbiology (medical)
  • Infectious Diseases
