Many diseases, such as cancer, diabetes, and schizophrenia, tend to be passed down through families. After researchers sequenced the human genome about 15 years ago, they had high hopes that this trove of information would reveal the genes that underlie these strongly heritable diseases.
However, around 2010, scientists began to realize that this wasn't panning out. For one, there just weren't enough patients: In order to unearth a statistically significant genetic marker, researchers would need groups of patients much larger than what they had been able to assemble so far. Furthermore, many of the variants that these studies turned up were found outside the regions of DNA that encode proteins, making it much more difficult to figure out how they might cause disease.
A new study from MIT addresses both of those problems. By combining information on gene-disease associations with maps of chemical modifications known as epigenomic marks, which control what genes are turned on, the researchers were able to identify additional genetic contributors to a heritable cardiac disorder that makes people more susceptible to heart failure.
"This approach overcomes a major hurdle in the human genetics field and addresses an important question surrounding the hidden heritability of many complex traits," says Laurie Boyer, the Irwin and Helen Sizer Career Development Associate Professor of Biology and Biological Engineering at MIT and one of the senior authors of the study.
This strategy could also shed light on many other inherited diseases, the researchers say.
"The exciting part is that we've applied this to one trait in one tissue, but we can apply this now to basically every disease," says Xinchen Wang, an MIT graduate student and the paper's lead author. "The new direction for us now is to target some of the bigger diseases like cholesterol-related heart disease and Alzheimer's."
Manolis Kellis, a professor of computer science and a member of MIT's Computer Science and Artificial Intelligence Laboratory and of the Broad Institute, is also a senior author of the paper, which appears in the May 10 issue of the journal eLife.
Since the Human Genome Project was completed, scientists have compared the genetic makeup of thousands of people in search of genetic differences associated with particular diseases. These studies, known as genome-wide association studies (GWAS), have revealed genetic markers linked with type 2 diabetes, Parkinson's disease, obesity, and Crohn's disease, among others.
However, in order for a variant to be considered significant, it must meet stringent statistical criteria based on how frequently it appears in patients and how much of an effect it has on the disease. Until now, the only way to yield more significant "hits" for a given trait was to double or triple the number of people in the studies, which is difficult and expensive.
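Why does enlarging a study help? A minimal sketch, assuming the conventional genome-wide threshold of p < 5 × 10^-8 (not stated in the article) and a test statistic that grows with the square root of the sample size for a fixed true effect, shows how a variant that is merely suggestive in one cohort can cross the threshold once the cohort is doubled. The starting z-score is hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold (assumed)


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    # erfc stays accurate far into the tail, where 1 - cdf would lose precision
    return math.erfc(abs(z) / math.sqrt(2))


def z_after_scaling(z_observed, scale):
    """Expected z for the same true effect in a cohort `scale` times larger.

    Assumes the statistic grows with the square root of sample size, as it does
    under standard large-sample approximations for a fixed effect.
    """
    return z_observed * math.sqrt(scale)


z0 = 4.5  # hypothetical suggestive variant in the original cohort
for scale in (1, 2, 3):
    p = two_sided_p(z_after_scaling(z0, scale))
    print(f"{scale}x cohort: p = {p:.1e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

Under these assumptions the hypothetical variant sits around p ≈ 7 × 10^-6 in the original cohort and only clears the bar once the study is roughly doubled, which is the expensive route the MIT team sought to avoid.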
The MIT team took an alternative approach, which was to try to identify variants that don't occur often enough to reach genome-wide significance in the smaller studies but still have an impact on a particular disease.
"Below this genome-wide significance threshold lies a large number of markers that perhaps we should be paying attention to," Kellis says. "If we can successfully prioritize new disease genes in these subthreshold loci, we can have a head start in developing new therapeutics that target these genes."
To test the usefulness of this strategy, the researchers focused on a cardiac trait known as the QT interval, which is a measure of how long it takes for electrical impulses to flow through the heart as it contracts. Variations in this interval are a risk factor for arrhythmia and heart failure, which is one of the leading causes of death in the United States.
Genome-wide association studies had already yielded about 60 genetic markers linked with variations in QT interval length. The MIT team created a computer algorithm that first analyzes these known markers to discover common epigenomic properties among them, and then uses those properties to pick out subthreshold genetic markers that share them and are therefore likely contributors to the disease trait.
This analysis revealed that many of the known, significant genetic variants were located in parts of the genome known as enhancers, which control gene activity from a distance. The enhancers harboring these variants were also active specifically in heart tissue, tended to lie in DNA regions that are more likely to be regulatory, and fell in regions that are conserved across primate species.
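One simple way to quantify "common epigenomic properties" is a fold enrichment: how much more often each annotation appears among the known QT-associated loci than among background loci. The sketch below is illustrative only and is not the study's actual method; the feature labels, the `Locus` type, the pseudocount, and the toy loci are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the four properties described above
FEATURES = ("enhancer", "heart_specific", "regulatory", "primate_conserved")


@dataclass(frozen=True)
class Locus:
    name: str
    annotations: frozenset  # which FEATURES are observed at this locus


def fold_enrichment(known, background, pseudocount=0.5):
    """Frequency of each feature among known loci divided by its background frequency.

    The pseudocount keeps ratios finite when a feature never occurs in one set.
    """
    def freq(loci, feature):
        hits = sum(feature in locus.annotations for locus in loci)
        return (hits + pseudocount) / (len(loci) + 2 * pseudocount)

    return {f: freq(known, f) / freq(background, f) for f in FEATURES}


# Toy data: known GWAS loci versus randomly chosen background loci
known = [
    Locus("known1", frozenset({"enhancer", "heart_specific", "regulatory", "primate_conserved"})),
    Locus("known2", frozenset({"enhancer", "heart_specific", "primate_conserved"})),
    Locus("known3", frozenset({"enhancer", "regulatory"})),
    Locus("known4", frozenset({"enhancer", "heart_specific", "regulatory"})),
]
background = [
    Locus("bg1", frozenset({"regulatory"})),
    Locus("bg2", frozenset()),
    Locus("bg3", frozenset({"enhancer"})),
    Locus("bg4", frozenset({"primate_conserved"})),
]

for feature, ratio in sorted(fold_enrichment(known, background).items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {ratio:.2f}x")
```

Features with ratios well above 1 (here, heart-specific activity) are the ones a prioritization step would lean on; a real analysis would use genome-wide annotation maps and a proper statistical test rather than four toy loci.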
The researchers then analyzed the variants that were only weakly associated with QT interval and found approximately 60 additional locations that shared most of these properties, potentially doubling the number of candidate regions previously identified using genetic evidence alone.
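The prioritization step can be pictured as a filter: keep variants whose association is real but below genome-wide significance, then require them to share most of the enriched properties. In this hypothetical sketch the article's "weakly associated" is interpreted as 5 × 10^-8 ≤ p < 10^-3 (the upper bound is an assumption), and "most" is interpreted as at least three of the four properties; variant names and annotations are invented.

```python
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide threshold (assumed)
SUBTHRESHOLD_ALPHA = 1e-3  # hypothetical upper bound for a "weak" association

ENRICHED = frozenset({"enhancer", "heart_specific", "regulatory", "primate_conserved"})


def prioritize(candidates, min_shared=3):
    """Return (name, shared_count) for subthreshold variants carrying enough enriched properties."""
    hits = []
    for name, p_value, annotations in candidates:
        is_subthreshold = GENOME_WIDE_ALPHA <= p_value < SUBTHRESHOLD_ALPHA
        shared = len(ENRICHED & annotations)
        if is_subthreshold and shared >= min_shared:
            hits.append((name, shared))
    # Rank by number of shared properties (most first), then by name for stable output
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


candidates = [
    ("var1", 2e-6, frozenset({"enhancer", "heart_specific", "regulatory", "primate_conserved"})),
    ("var2", 4e-5, frozenset({"enhancer", "heart_specific", "regulatory"})),
    ("var3", 3e-4, frozenset({"regulatory"})),                                 # too few properties
    ("var4", 1e-9, frozenset({"enhancer", "heart_specific", "regulatory"})),  # already genome-wide significant
    ("var5", 0.2, frozenset({"enhancer", "heart_specific", "regulatory"})),   # no meaningful association
]
print(prioritize(candidates))  # [('var1', 4), ('var2', 3)]
```

Requiring both a weak statistical signal and shared epigenomic context is what lets such a filter surface candidates that neither line of evidence would justify on its own.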
Next, the researchers sought to predict the target genes that these genetic variants affect. To do so, they analyzed models of the three-dimensional structure of chromosomes to predict the long-distance contacts between enhancer regions harboring subthreshold variants and their potential target genes. They selected about two dozen of those genes for further study, and from their own experiments combined with an analysis of previous gene knockout studies, they found that many of the predicted new target genes did have an effect on the heart's ability to conduct electrical impulses.
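The article does not say how contacts were scored. As a hypothetical sketch, suppose a three-dimensional chromosome model assigns each enhancer-gene pair a normalized contact score between 0 and 1; linking an enhancer to its likely targets then reduces to thresholding and ranking those scores. The function name, scores, cutoff, and gene labels below are invented for illustration.

```python
def predict_targets(enhancer, contacts, min_contact=0.5, max_genes=2):
    """Rank candidate target genes for one enhancer by predicted 3D contact strength.

    `contacts` maps (enhancer, gene) pairs to a normalized contact score in [0, 1].
    Genes below `min_contact` are dropped and at most `max_genes` are kept.
    """
    scored = [
        (gene, score)
        for (enh, gene), score in contacts.items()
        if enh == enhancer and score >= min_contact
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [gene for gene, _ in scored[:max_genes]]


# Hypothetical contact scores between enhancers and nearby genes
contacts = {
    ("enh1", "GENE_A"): 0.82,
    ("enh1", "GENE_B"): 0.35,
    ("enh1", "GENE_C"): 0.61,
    ("enh2", "GENE_B"): 0.74,
}
print(predict_targets("enh1", contacts))  # ['GENE_A', 'GENE_C']
```

Because enhancers can act over long distances, the nearest gene is not necessarily the target; ranking by contact strength instead lets a distant but physically close gene (in 3D) rise to the top of the list for experimental follow-up.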
"This is the smoking gun we were looking for," Kellis says. "We now have genetic evidence from humans, epigenomic evidence from heart cells, and experimental data from mice, together showing that the genetic differences in subthreshold enhancers influence heart function."
Boyer's lab now plans to apply this approach to learning more about congenital heart defects.
"We know very little about the genetic etiology of congenital heart defects. Every 15 minutes a baby is born with a congenital heart defect, and it's a devastating set of defects," she says. "We could now go back to some of these genomic and epigenomic studies to improve our understanding of the biology of these different defects."
The approach developed by the MIT team is general and should allow researchers working on many traits to identify genetic markers that are invisible when using genome-wide association studies alone. This can speed up the development of new therapies, especially for rare diseases, where gathering sufficiently large groups of patients can be very difficult and sometimes impossible.
"Instead of waiting for years until subthreshold variants are elucidated with genetics, we can skip ahead and begin characterizing the prioritized regions and genes immediately," Boyer says.
"We expect that an expanded set of candidate drug targets can shorten the path to new therapeutics by decades for many devastating disorders, and help translate these insights into tangible improvements in human health," Kellis says.
The research was funded by the National Institutes of Health and the National Heart, Lung, and Blood Institute Bench to Bassinet Program.
Xinchen Wang et al. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures, eLife (2016). DOI: 10.7554/eLife.10557