Supercomputer dramatically accelerates rapid genome analysis

February 19, 2014, University of Chicago Medical Center
Beagle, a Cray XE6 supercomputer at Argonne National Laboratory, supports computation, simulation and data analysis for the biomedical research community. Credit: Argonne National Laboratory

Although the time and cost of sequencing an entire human genome have plummeted, analyzing the resulting three billion base pairs of genetic information from a single genome can take many months.

In the journal Bioinformatics, however, a University of Chicago-based team—working with Beagle, one of the world's fastest supercomputers devoted to life sciences—reports that genome analysis can be radically accelerated. This computer, based at Argonne National Laboratory, is able to analyze 240 full genomes in about two days.
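As a rough illustration of what that throughput means, the sketch below averages the quoted batch time over the quoted number of genomes. The function name is hypothetical, and "about two days" is treated as exactly two days. Because the genomes are processed concurrently, the result is an amortized figure, not the time any single genome takes from start to finish.

```python
# Figures quoted in the article; "about two days" is assumed to be exactly 2.
GENOMES_PER_BATCH = 240
BATCH_DAYS = 2


def amortized_minutes_per_genome(genomes: int, days: float) -> float:
    """Average wall-clock minutes per genome when a whole batch runs together."""
    return days * 24 * 60 / genomes


if __name__ == "__main__":
    minutes = amortized_minutes_per_genome(GENOMES_PER_BATCH, BATCH_DAYS)
    print(f"{minutes:.0f} minutes per genome, averaged over the batch")
```

Under those assumptions the average works out to 12 minutes per genome, compared with the months a single genome can otherwise take.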

"This is a resource that can change patient management and, over time, add depth to our understanding of the genetic causes of risk and disease," said study author Elizabeth McNally, MD, PhD, the A. J. Carlson Professor of Medicine and Human Genetics and director of the Cardiovascular Genetics clinic at the University of Chicago Medicine.

"The supercomputer can process many genomes simultaneously rather than one at a time," said first author Megan Puckelwartz, a graduate student in McNally's laboratory. "It converts whole genome sequencing, which has primarily been used as a research tool, into something that is immediately valuable for patient care."

Because the genome is so vast, those involved in clinical genetics have turned to exome sequencing, which focuses on the two percent or less of the genome that codes for proteins. This approach is often useful. An estimated 85 percent of disease-causing mutations are located in coding regions. But the rest, about 15 percent of clinically significant mutations, come from non-coding regions, once referred to as "junk DNA" but now known to serve important functions. If not for the tremendous data-processing challenges of analysis, whole genome sequencing would be the method of choice.

To test the system, McNally's team used raw sequencing data from 61 human genomes and analyzed that data on Beagle. They used publicly available software packages and one quarter of the computer's total capacity. They found that shifting to the supercomputer environment improved accuracy and dramatically accelerated speed.

"Improving analysis through both speed and accuracy reduces the price per genome," McNally said. "With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of the genome. New technology promises to bring the costs of sequencing down to around $1,000 per genome. Our goal is to get the cost of analysis down into that range."

"This work vividly demonstrates the benefits of dedicating a powerful supercomputer resource to biomedical research," said co-author Ian Foster, director of the Computation Institute and Arthur Holly Compton Distinguished Service Professor of Computer Science. "The methods developed here will be instrumental in relieving the data analysis bottleneck that researchers face as genetic sequencing grows cheaper and faster."

The finding has immediate medical applications. McNally's Cardiovascular Genetics clinic, for example, relies on rigorous interrogation of the genes from an initial patient as well as multiple family members to understand, treat and prevent disease. More than 50 genes can contribute to cardiomyopathy. Other genes can trigger heart failure, rhythm disorders or vascular problems.

"We start genetic testing with the patient," she said, "but when we find a significant mutation we have to think about testing the whole family to identify individuals at risk."

The range of testable mutations has radically expanded. "In the early days we would test one to three genes," she said. "In 2007, we did our first five-gene panel. Now we order 50 to 70 genes at a time, which usually gets us an answer. At that point, it can be more useful and less expensive to sequence the whole genome."

The information from these genomes, combined with careful attention to patient and family histories, "adds to our knowledge about these inherited disorders," McNally said. "It can refine the classification of these disorders," she said. "By paying close attention to family members with genes that place them at increased risk, but who do not yet show signs of disease, we can investigate early phases of a disorder. In this setting, each patient is a big-data problem."

Beagle, a Cray XE6 supercomputer housed in the Theory and Computing Sciences (TCS) building at Argonne National Laboratory, supports computation, simulation and data analysis for the biomedical research community. It is available for use by University of Chicago researchers, their collaborators and "other meritorious investigators." It was named after the HMS Beagle, the ship that carried Charles Darwin on his famous scientific voyage in 1831.
