ENCODE project: Researchers unlock disease information hidden in genome's control circuitry

University of Washington genome scientist Dr. John Stamatoyannopoulos studies the control circuitry of the human genome. Credit: Clare McLean

Researchers at the University of Washington have determined that the majority of genetic changes associated with more than 400 common diseases and clinical traits affect the genome's regulatory circuitry. These are the regions of DNA that contain instructions dictating when and where genes are switched on or off. Most of these changes affect circuits that are active during early human development, when body tissues are most vulnerable.

By creating extensive blueprints of the control circuitry, the research also exposed previously hidden connections between different diseases. These connections may explain common clinical features and offer a new approach for pinpointing the specific types of cells and tissues that either cause or are most affected by a particular disease. The findings advance the understanding of disease and open new avenues for the development of diagnostics and treatments. The findings appear in the Sept. 5 online issue of Science.

"Genes occupy only a tiny fraction of the genome, and most efforts to map the genetic causes of disease were frustrated by signals that pointed away from genes. Now we know that these efforts were not in vain, and that the signals were in fact pointing to the genome's 'operating system,' the instructions for which are hidden in millions of locations around the genome," said Dr. John A. Stamatoyannopoulos, associate professor of genome sciences and medicine at the UW. "The findings provide a new lens through which to view the role of genetics and genome function in disease."

The genome's control circuitry is encoded in millions of regulatory regions: short DNA sequences that are scattered throughout the 98 percent of the genome that does not specify the protein product of a gene. Specialized proteins, called regulatory factors, recognize specific DNA sequences in these regulatory regions, thereby creating switches that turn genes on and off. In many cases, these switches are located far away from the genes that they control. These distances have made it difficult to determine the relationship between specific switches and genes.

The researchers used a special molecular probe called a nuclease to detect all of the regulatory regions active in each cell type they studied. The specific nuclease they used—called DNase I—snips the genome where regulatory factors are bound to DNA. By treating cells with DNase I and analyzing the pattern of snipped DNA sequences using massively parallel sequencing technology and high-performance computers, the researchers were able to create comprehensive maps of all the regulatory DNA in many different types of cells. These maps were then analyzed with advanced software algorithms to sort through the data and expose previously hidden connections between disease-associated genetic variation and specific regulatory regions.
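
The mapping step can be pictured as a search for stretches of the genome where DNase I cuts pile up. The Python sketch below is a toy illustration only, not the study's pipeline: the function name call_hotspots, the fixed bin size, the cut threshold, and the cut positions are all hypothetical, and it handles a single chromosome.

```python
from collections import Counter

def call_hotspots(cut_sites, bin_size=100, min_cuts=5):
    """Return merged half-open (start, end) intervals covering every
    fixed-size bin that holds at least min_cuts cut positions.
    Toy heuristic, assumed for illustration only."""
    counts = Counter(pos // bin_size for pos in cut_sites)
    dense_bins = sorted(b for b, c in counts.items() if c >= min_cuts)
    hotspots = []
    for b in dense_bins:
        start, end = b * bin_size, (b + 1) * bin_size
        if hotspots and hotspots[-1][1] == start:
            # This bin touches the previous hotspot, so extend it.
            hotspots[-1] = (hotspots[-1][0], end)
        else:
            hotspots.append((start, end))
    return hotspots

# Hypothetical cut positions on one chromosome (not data from the study).
cuts = [105, 110, 130, 150, 190, 205, 220, 240, 260, 299,
        1000, 5003, 5010, 5020, 5050, 5090]
print(call_hotspots(cuts))  # [(100, 300), (5000, 5100)]
```

Real analyses work from sequencing reads across all chromosomes and use statistical models rather than a fixed threshold; the sketch only shows the idea of turning cut positions into a map of candidate regulatory regions.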

The regulatory mapping and analysis was conducted on 349 cell and tissue samples. These included samples from all major organs as well as 233 tissue samples from different stages of early human development. In total, nearly 4 million distinct regulatory regions were discovered, though only about 200,000 of these were 'on' in any particular cell type.

To make a connection with diseases and clinical traits, the researchers analyzed genetic variants that had been strongly associated with diseases and traits through so-called genome-wide association studies, which compare genetic information between groups of people with or without a particular disease or trait. During the past decade, hundreds of genome-wide association studies involving hundreds of thousands of patients worldwide have been performed for over 400 diseases and traits. Nearly 95 percent of the time, these studies flagged genetic variants that were located outside the protein-coding regions of genes. Comparison of these data with the regulatory DNA blueprints yielded several key findings:

  • 76 percent of disease-associated variants in non-gene regions are actually located within or are tightly linked to regulatory DNA. This suggests that many diseases result from changes in when, where, and how genes are turned on rather than changes to the gene itself.
  • 88 percent of the regulatory regions that contained disease-associated DNA variants were active in early human (fetal) development. Because many of these variants are associated with common diseases that occur in adults, the finding indicates that factors influencing the genome's regulatory circuitry early in development may impact the risk of developing particular diseases later in life.
  • DNA changes associated with specific diseases tend to occur in the specific short DNA codes recognized by regulatory proteins involved in physiological processes related to the disease or the organs or cells affected by the disease. For example, DNA variants associated with diabetes tend to occur in the codes recognized by regulatory proteins that control various aspects of sugar metabolism and insulin secretion. Similarly, variants associated with immune system disorders, such as multiple sclerosis, asthma, or lupus, are found in specific recognition codes for proteins that regulate immune system function.
  • Many seemingly unrelated diseases share common regulatory circuitry, including diseases that affect the immune system, different types of cancers, and a range of neuropsychiatric disorders.
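
At its core, the comparison behind these findings is an interval-overlap question: what fraction of disease-associated variants fall inside, or close to, a mapped regulatory region? The Python sketch below shows one minimal way to compute such a fraction. It is an illustration under stated assumptions, not the study's method: the coordinates are hypothetical toy data, regions are half-open intervals that do not overlap each other, and a simple distance window stands in for "tightly linked," which the study would have defined genetically.

```python
from bisect import bisect_right

def build_index(regions):
    """Map chromosome -> (sorted starts, matching ends) for half-open
    (chrom, start, end) regions. Assumes regions do not overlap."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index

def near_regulatory_dna(index, chrom, pos, window=0):
    """True if pos lies within `window` bases of a region on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last region that starts at or before pos + window.
    i = bisect_right(starts, pos + window) - 1
    return i >= 0 and ends[i] > pos - window

def fraction_in_regulatory_dna(variants, regions, window=0):
    """Share of (chrom, pos) variants in or near the given regions."""
    index = build_index(regions)
    hits = sum(near_regulatory_dna(index, c, p, window) for c, p in variants)
    return hits / len(variants)

# Hypothetical toy data (not from the study).
regions = [("chr1", 100, 250), ("chr1", 900, 1100), ("chr2", 500, 650)]
variants = [("chr1", 120), ("chr1", 300), ("chr1", 1099),
            ("chr2", 700), ("chr3", 10)]
print(fraction_in_regulatory_dna(variants, regions))             # 0.4
print(fraction_in_regulatory_dna(variants, regions, window=60))  # 0.8
```

Widening the window raises the fraction, which is why the definition of "tightly linked" matters when interpreting a figure such as 76 percent.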

The study also revealed a wealth of additional connections between genetic variants and disease that had been lurking within existing genome-wide association study data. Viewing these data through the lens of regulatory DNA exposed thousands of variants that were highly selectively localized within regulatory DNA of disease-specific cell types. These variants had previously been ignored because the stringent selection criteria used in earlier studies did not take regulatory regions into account.

Another surprising finding was that the regulatory circuitry blueprints could be used to pinpoint cell types that play a role in specific diseases, without requiring any prior knowledge about how the disease worked. For example, genetic variants associated with Crohn's disease (a common type of inflammatory bowel disease) were found to be concentrated in the regulatory DNA mapped in two specific subsets of immune cells, the same cell types that took decades of prior research to link with the development of Crohn's disease. Applying this approach systematically will enable researchers to identify cell types not previously known to play a role in a particular disease, expanding our understanding of the disease process and potentially leading to new therapies.
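
One way to make "concentrated in" concrete is an enrichment test: for each cell type, ask whether a disease's variants land in that cell type's regulatory DNA more often than chance would predict, given how many of all tested variants land there. The Python sketch below uses a hypergeometric tail probability for this. The choice of test is an assumption made for illustration (the article does not describe the statistics used), and the cell-type labels and counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N,
    of which K count as successes."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total

def rank_cell_types(hits_by_cell_type, n_disease, n_all):
    """Sort cell types by enrichment p-value, smallest first.
    hits_by_cell_type maps a label to (disease variants in that cell
    type's regulatory DNA, all variants in that regulatory DNA)."""
    results = []
    for name, (disease_hits, all_hits) in hits_by_cell_type.items():
        p = hypergeom_sf(disease_hits, n_all, all_hits, n_disease)
        results.append((p, name))
    return sorted(results)

# Hypothetical counts: 1,000 variants overall, 50 tied to one disease.
hits = {
    "immune_subset_A": (20, 100),  # 20 of 50 disease variants fall here
    "cell_type_B": (5, 100),       # 5 of 50, about the chance level
}
for p, name in rank_cell_types(hits, n_disease=50, n_all=1000):
    print(f"{name}: p = {p:.3g}")
```

In this toy input both cell types cover the same share of all variants, so only the first, where disease variants are four times more frequent than expected, ranks as strongly enriched. No prior knowledge of the disease is needed to produce the ranking, which mirrors the point made above.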
