The year 2011 brought two of the deadliest bacterial outbreaks the world has seen during the last 25 years. The two epidemics accounted for more than 4,200 cases of infectious disease and 80 deaths. Software developed at Georgia Tech was used to help characterize the bacteria that caused each outbreak. Such analysis helps scientists better understand the underlying microbiologic features of the disease-causing organisms and shows promise for supporting faster and more efficient outbreak investigations in the future.

From 2008 to 2010, a team of bioinformatics graduate students, led by School of Biology Associate Professor King Jordan, worked in close collaboration with the Centers for Disease Control and Prevention (CDC) to create an integrated suite of tools for the analysis of microbial genome sequences. At that time, CDC scientists needed a fast and accurate system that could automate the analysis of sequenced genomes from disease-causing organisms. They turned to the Jordan lab at Georgia Tech to help develop such a tool. The Georgia Tech scientists created an open source package, the Computational Genomics Pipeline (CG-pipeline), to help meet CDC’s need. The software platform is now used worldwide in public health research and response efforts.

“Determining the order of DNA bases for an entire genome has become relatively cheap and easy in recent years because of technological advancements,” said Jordan. “The hard part is figuring out what the information means. Our software takes that next step. It analyzes the sequences, finds the genes and provides clues as to which genes are involved in making people sick. Manually, this process used to take weeks, months or a year. Now it takes us about 24 hours.”

The CG-pipeline software has been used to analyze last summer’s outbreak of severe Escherichia coli (E. coli) infections that started in Germany and eventually led to illnesses in 16 European countries, Canada and the United States. It was one of the largest E. coli outbreaks in history, causing 50 deaths and 4,075 confirmed cases worldwide. The bacterium was traced to sprouts. Andrey Kislyuk, a graduate of the Bioinformatics Ph.D. program who helped Jordan create the software, used the CG-pipeline while working at Pacific Biosciences to understand why the strain of the bacteria that caused the outbreak was so virulent.

“The software was used to determine that genetic material from two previously distinct strains of E. coli was combined in a new, hyper-virulent strain,” said Kislyuk. “The resulting hybrid strain seems to be more lethal than either of the parent strains.”

Another Bioinformatics Ph.D. graduate who helped design and implement the pipeline, Lee Katz, analyzed the bacteria that caused last year’s outbreak of listeriosis in the United States while working at the CDC. That outbreak was traced back to cantaloupes from a single farm in Colorado that were tainted with Listeria. Over the span of several months, there were 146 confirmed cases of listeriosis and 30 deaths, making it the deadliest outbreak of foodborne illness in the U.S. in 25 years. Using the CG-pipeline, Katz was able to identify an important epidemiological genomic marker, which will help track invasive strains of Listeria.

The CG-pipeline software platform can be used to analyze any microbial genome sequence. It has already been applied to bacteria that cause a variety of diseases, including cholera, salmonella and bacterial meningitis.

Katz continues to work closely with the Jordan lab to improve the software. This collaboration is important in CDC’s efforts to mine genome sequence information in the service of public health using software developed at Georgia Tech.