Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal illness in children younger than 5 years of age in low- and middle-income nations and an emerging enteric pathogen in industrialized nations. Although ETEC is an important cause of diarrhea, little is known about its genomic composition. To address this, we sequenced the genomes of five ETEC isolates obtained from children in Guinea-Bissau with diarrhea. These five isolates represent distinct and globally dominant ETEC clonal groups. Comparative genomic analyses utilizing a gene-independent whole-genome alignment method demonstrated that sequenced ETEC strains share approximately 2.7 million bases of genomic sequence. Phylogenetic analysis of this "core genome" confirmed the diverse history of the ETEC pathovar and provided a finer resolution of the E. coli relationships than multilocus sequence typing. No identified genomic regions were conserved exclusively in all ETEC genomes; however, we identified more genomic content conserved among ETEC genomes than among non-ETEC E. coli genomes, suggesting that ETEC isolates share a genomic core. Comparisons of known virulence genes and of surface-exposed and colonization factor genes across all sequenced ETEC genomes not only identified variability but also indicated that some antigens are restricted to the ETEC pathovar. Overall, the generation of these five genome sequences, in addition to the two previously generated ETEC genomes, highlights the genomic diversity of ETEC. These studies increase our understanding of ETEC evolution and provide insight into virulence factors and conserved proteins, which may be targets for vaccine development.
ASJC Scopus subject areas
- Infectious Diseases