In September 2012, the Encyclopedia of DNA Elements (ENCODE) Project Consortium, a multi-institution collaboration that included the Broad Institute, capped off nine years of research with a flurry of papers that characterized proteins, enzymes, and other functional elements of the human genome. These elements, which were once dismissed as "junk DNA" because they were not among the protein-coding genes, are now thought to fulfill key functions, often regulating how and when genes are activated.
Because of their critical roles, such regulatory players are also suspected to contribute to a variety of diseases. However, pinpointing these roles by combing through the vast amount of existing data poses a mounting challenge. The ENCODE project alone has produced over 1,500 publicly available datasets. Additionally, genome-sequencing projects and genome-wide association studies (GWAS) have contributed troves of data on the human genetic code and the genetic variants suspected in disease. While the data generated to date constitute a valuable resource, the sheer volume that researchers must sift through can be overwhelming.
A team led by Broad researcher Soumya Raychaudhuri has developed a method that may make it easier for researchers to take on this challenge. In a study published online in Nature Genetics on December 23, 2012, the group showed that genetic variations thought to be associated with disease tend to cluster near specific chromatin marks – biochemical modifications of the proteins that package DNA inside the cell. These chromatin marks can serve as a sort of genomic signpost, indicating that gene regulation may be taking place nearby. Raychaudhuri's team found that the clustering around these marks is most pronounced in cell types that are related to the disease. This suggests that comparing suspected genetic variants against the most informative chromatin marks could help researchers home in on the cell types and regulatory pathways involved in a given disease.
Initially inspired by the growing volume of "genetic hits" coming out of GWAS, Raychaudhuri's team set out to create a strategy that would help researchers refine their searches for the genetic culprits of disease. The team's goal was to find one chromatin mark assay that might replace several that would otherwise be needed to narrow the focus of genomic disease research.
"Our hope was that we would be able to take the genome hits coming out of GWAS and use the ENCODE data or other epigenetic data to find the best marks to look at," said Raychaudhuri, who is also assistant professor of medicine at Harvard Medical School and Brigham and Women's Hospital.
Previous studies had shown that genetic variants associated with disease affect the body in cell-specific ways. For instance, variants associated with high cholesterol might only affect liver cells. The researchers thought they might be able to use that cell-specificity to their advantage. They speculated that, if genetic variants were contributing to disease by affecting gene activity only in those cell types most relevant to the disease, then perhaps these variants could be mapped to the chromatin marks in those cell types.
Raychaudhuri's team tested this hypothesis by looking at four different diseases and traits with known, associated SNPs (single-nucleotide polymorphisms: single-letter variations in the genetic code that are suspected of playing a role in the disease). They mapped SNPs connected with LDL cholesterol, diabetes, rheumatoid arthritis, and psychiatric disease over the chromatin marks. They expected to see the SNPs cluster near the chromatin peaks in the cell types most relevant to each disease. That is exactly what they found.
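The mapping idea in this paragraph can be illustrated with a minimal Python sketch. It is not the scoring or the statistics used in the study; it simply counts, for each cell type, the fraction of trait-associated SNPs that fall within a fixed distance of a chromatin peak. The function names, the 2,000-base-pair window, and all coordinates below are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, pos):
    """Distance from pos to the closest value in a sorted list (None if empty)."""
    if not sorted_positions:
        return None
    i = bisect_left(sorted_positions, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    return min(abs(pos - p) for p in candidates)


def fraction_near_peaks(snps, peaks_by_cell_type, window=2000):
    """For each cell type, the fraction of SNPs within `window` bp of a peak.

    snps: list of (chromosome, position) tuples.
    peaks_by_cell_type: dict mapping cell type -> list of (chromosome, peak position).
    window: assumed proximity cutoff in base pairs (an illustrative choice).
    """
    result = {}
    for cell_type, peaks in peaks_by_cell_type.items():
        # Group peak positions by chromosome and sort them for binary search.
        by_chrom = {}
        for chrom, summit in peaks:
            by_chrom.setdefault(chrom, []).append(summit)
        for positions in by_chrom.values():
            positions.sort()

        hits = 0
        for chrom, pos in snps:
            d = nearest_distance(by_chrom.get(chrom, []), pos)
            if d is not None and d <= window:
                hits += 1
        result[cell_type] = hits / len(snps)
    return result


# Made-up coordinates, not real SNPs or real chromatin peaks.
toy_snps = [("chr1", 1000), ("chr1", 50000), ("chr2", 7000), ("chr3", 300)]
toy_peaks = {
    "liver": [("chr1", 1500), ("chr1", 49000), ("chr2", 7100)],
    "brain": [("chr1", 900000), ("chr3", 100000)],
}
fractions = fraction_near_peaks(toy_snps, toy_peaks)
print(fractions)
```

In this toy input, three of the four SNPs sit near a "liver" peak and none sit near a "brain" peak. A real analysis would also need to ask whether such a difference exceeds what comparable, randomly chosen SNPs would show, which this sketch does not attempt.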
SNPs associated with LDL cholesterol clustered near chromatin marks in liver cells, and rheumatoid arthritis-related SNPs were found near the chromatin marks in CD4+ regulatory T-cells, which are known to play a role in autoimmune disease. Only fourteen SNPs have been positively associated with psychiatric disease, but these, too, overlapped with the chromatin marks in the expected cell type – brain tissue. The team found the results for diabetes particularly encouraging: SNPs associated with the disease overlapped with chromatin marks in two cell types – liver and pancreatic islet tissue – both of which have long been connected with diabetes.
Since there are dozens of chromatin marks that could be looked at in this sort of investigation (and new data from ENCODE and similar efforts may yield more), the team also aimed to identify which chromatin marks might be most informative. They found that a chromatin mark known as H3K4me3 consistently overlapped SNPs in the cell types associated with each disease.
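One simple way to picture the comparison between marks is sketched below in Python. It assumes that an overlap fraction has already been computed for every mark and cell type, and it ranks marks by how far the overlap in the disease-relevant cell type stands above the average of the other cell types. The scoring rule, the function name, and the placeholder names `mark_A` and `mark_B` are assumptions made for illustration; the numbers are invented and are not results from the study.

```python
def rank_marks(overlap, relevant_cell_type):
    """Rank marks by relevant-cell-type overlap minus mean overlap elsewhere.

    overlap: dict mapping mark -> {cell type -> fraction of SNPs near a peak}.
    relevant_cell_type: the cell type expected to matter for the disease.
    Returns mark names, most informative first under this toy criterion.
    """
    scores = {}
    for mark, by_cell in overlap.items():
        others = [v for c, v in by_cell.items() if c != relevant_cell_type]
        background = sum(others) / len(others) if others else 0.0
        scores[mark] = by_cell[relevant_cell_type] - background
    return sorted(scores, key=scores.get, reverse=True)


# Invented overlap fractions for two placeholder marks.
toy_overlap = {
    "mark_A": {"liver": 0.75, "brain": 0.25, "t_cell": 0.25},
    "mark_B": {"liver": 0.5, "brain": 0.5, "t_cell": 0.25},
}
ranking = rank_marks(toy_overlap, "liver")
print(ranking)
```

Here `mark_A` ranks first because its overlap is concentrated in the relevant cell type, whereas `mark_B` overlaps SNPs about as often in an unrelated one.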
The team hopes that these findings will help scientists focus their search when scouring the genome for the cause of disease.
"It can be hard to prioritize which variant to look at when there are many SNPs or whole regions associated with a disease," explained the paper's first author, Gosia Trynka, who is a postdoctoral scholar in Raychaudhuri's lab. "Our approach suggests that you can find a cell type that is most relevant and then follow up on those chromatin peaks within that cell type as a starting point for that research."
Raychaudhuri echoed this point, noting that identifying the specific cell types associated with a disease can be useful in itself for targeting therapeutics and diagnosing the onset of disease, but that it is only an early step in discovering the genetic pathways contributing to the disease.
"Our hope was that, through this approach, we can orient people to the right cell types to study, and orient them to the right epigenetic marks to look at in those cell types," Raychaudhuri said. "Our approach helps you figure out the regulatory element that's involved. Connecting that regulatory element to a specific gene is another, separate leap."
The team is currently using their chromatin mark method to search all of the available data for the cell types relevant to rheumatoid arthritis and other autoimmune diseases. Locating these cell types could help researchers target therapies for these conditions more effectively, and could help narrow the search for the pathways – and ultimately the genes – involved in these diseases.
Trynka, et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nature Genetics. Online December 23, 2012. DOI: 10.1038/ng.2504.