PopPUNK advances speed of bacterial pathogen surveillance
Differences in genetic diversity among bacterial pathogens correlate with clinically important factors, such as virulence and antimicrobial resistance, prompting the need to identify clusters of similar bacterial strains. However, current bacterial clustering and typing approaches are not suitable for real-time pathogen surveillance and outbreak detection.
In a study published today in Genome Research, researchers developed PopPUNK (Population Partitioning Using Nucleotide K-mers), a computational tool for analyzing tens of thousands of bacterial genomes in a single run, up to 200-fold faster than previous methods. Using k-mers, short sections of DNA of length k, this software enables rapid estimation of the proportion of k-mers present in one genome that are also shared by another. Differences in k-mer content between genomes may represent changes to individual bases in otherwise similar stretches of DNA or differences in gene content. By calculating these relationships across isolates, the population structure of a species can be efficiently estimated.
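To make the k-mer idea concrete, the following is a minimal sketch in Python that computes the proportion of k-mers in one sequence that also occur in another. It is an illustration only, not PopPUNK's implementation: the function names (kmers, shared_kmer_proportion) and the two short toy sequences are hypothetical, and exact set comparison is assumed here for simplicity, which would be too slow for real genomes.

```python
def kmers(seq, k):
    """Return the set of all distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_proportion(a, b, k):
    """Return the fraction of distinct k-mers in a that also occur in b."""
    kmers_a = kmers(a, k)
    kmers_b = kmers(b, k)
    if not kmers_a:
        # Sequence shorter than k: no k-mers to compare.
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a)


# Toy sequences (hypothetical): identical except for one base (G -> C).
genome_a = "ACGTACGTGA"
genome_b = "ACGTACCTGA"

for k in (3, 5):
    proportion = shared_kmer_proportion(genome_a, genome_b, k)
    print(f"k={k}: {proportion:.2f} of k-mers in genome_a are shared")
```

In this toy case a single base change removes a larger share of matching k-mers at k=5 than at k=3, because each longer k-mer has more positions at which it can overlap the changed base.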
Importantly, PopPUNK applies a machine learning method that enables easy identification of emerging strains in a population. Using a previously published data set of E. coli isolates collected over a ten-year study, PopPUNK was able to efficiently classify the prevalence of different strains in the population each year and identify the emergence of antibiotic-resistant strains over time.
The researchers envision that PopPUNK will expedite the identification of bacterial strains as the number of bacterial genomes being sequenced increases and, importantly, allow public health agencies to quickly identify outbreak strains that pose a public health risk.