AI accurately predicts effects of genetic mutations in biological dark matter

July 16, 2018, Simons Foundation
Researchers created a new deep learning framework called ExPecto to predict the effect of genetic mutations across different tissue systems in the human body. The chart above shows the results for the expression of immunity-related GTPase family M (IRGM), a gene involved in the immune response associated with Crohn's disease, ulcerative colitis and general inflammatory bowel disease. A subset of possible mutations is listed horizontally while tissue systems are listed vertically. Red indicates mutations that ExPecto predicts signal an increase in the expression of IRGM in a given system, while blue represents an expected decrease in expression. Credit: HumanBase

A new machine learning framework, dubbed ExPecto, can predict the effects of genetic mutations in the so-called "dark matter" regions of the human genome. ExPecto pinpoints how specific mutations can disrupt the way genes turn on and off throughout your body. Such disruptions in gene expression can sometimes have fatal consequences.

Using the method, its creators at the Flatiron Institute's Center for Computational Biology (CCB) in New York City and at Princeton University computed the genetic ramifications of more than 140 million mutations in different tissues. The researchers also pinpointed mutations potentially responsible for increasing the risk of several immune-related diseases, including chronic hepatitis B virus (HBV) infection and Crohn's disease.

ExPecto could one day aid in the selection of drug therapies and help illuminate how evolution shaped our genetic code, the researchers report in a study published on July 16 in Nature Genetics.

"ExPecto can examine any genetic variant and predict its effect on gene expression," says principal investigator Olga Troyanskaya, deputy director of genomics at CCB and a professor at Princeton. "That's incredibly exciting."

Your DNA contains genes that serve as blueprints for building proteins, the workhorse molecules of our bodies responsible for carrying out important tasks such as ferrying oxygen, communicating with other cells and fighting infections. Protein-coding sequences make up less than two percent of your whole genome. All of these genes are present in cells throughout your body. This ubiquity means that protein-encoding genes vital to brain function, for instance, also exist in your digestive tract, lying dormant.

Genes are switched on and off by the other 98 percent of your genome, the "dark matter" portion that doesn't code for proteins. Most mutations are found in this noncoding region. A mutation is essentially a genetic typo: an addition, deletion or alteration in the genomic sequence. Mutations in the noncoding region can sometimes cause genes to be expressed, or to fail to be expressed, in the wrong part of your body at the wrong time, increasing the risk of diseases such as cancer.

Identifying the specific mutation responsible is difficult because the noncoding portion of DNA is so large. Previous studies compared the genomes of many individuals with a given disease, searching for mutations the individuals had in common. This approach, however, becomes increasingly tricky for rarer mutations. Furthermore, strings of DNA are sometimes inherited in large clusters, so scientists struggle to pinpoint which particular piece of genetic code is the troublemaker.

The study authors took a different approach. They developed ExPecto (named after the Patronus charm from the Harry Potter series) as a program that can read a raw sequence of DNA and predict the corresponding effect on gene expression.

ExPecto harnesses deep learning methods from artificial intelligence. Using a single reference genome, the researchers trained the program to understand how DNA controls gene expression across more than 200 different tissues and cell types. From this information, ExPecto can predict the effect of any mutation, even mutations that scientists have never seen before.
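As a rough illustration of what predicting the effect of a mutation from raw sequence can look like, the toy sketch below scores a reference sequence and a mutated copy with the same predictor and reports the difference. It is not ExPecto's code. The scoring function (`toy_expression_model`, which simply measures G/C content), the helper names and the eight-letter example sequence are all hypothetical stand-ins, and treating the effect as "mutated score minus reference score" is an assumption made for illustration. A real system would replace the toy scorer with a trained deep learning model that produces one prediction per tissue.

```python
# Toy sketch only: the scoring function below is a made-up stand-in,
# not ExPecto's trained deep learning model.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def toy_expression_model(encoded):
    """Hypothetical predictor: scores a sequence by its G/C fraction.

    A real model would be a trained network with one output per tissue.
    """
    gc_count = sum(row[1] + row[2] for row in encoded)
    return gc_count / len(encoded)


def apply_substitution(seq, position, alt_base):
    """Return a copy of seq with the base at `position` (0-based) replaced."""
    return seq[:position] + alt_base + seq[position + 1:]


def variant_effect(seq, position, alt_base, model=toy_expression_model):
    """Assumed effect measure: model(mutated) minus model(reference)."""
    ref_score = model(one_hot(seq))
    alt_score = model(one_hot(apply_substitution(seq, position, alt_base)))
    return alt_score - ref_score


def scan_all_substitutions(seq, model=toy_expression_model):
    """Score every possible single-base substitution in the sequence."""
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt != ref_base:
                effects[(pos, ref_base, alt)] = variant_effect(seq, pos, alt, model)
    return effects


reference = "ACGTTGCA"  # made-up example sequence
effects = scan_all_substitutions(reference)
```

Because the predictor only needs a sequence as input, the same scan works for substitutions that have never been observed in any person, which is the property the article highlights. Each position has three alternative bases, so a sequence of length n yields 3n candidate substitutions.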

The researchers used ExPecto to predict the mutations that contribute to Crohn's disease, chronic HBV infection and Behçet's disease. Study co-author Chandra Theesfeld then experimentally verified the results. For all three diseases, she found that ExPecto's predicted candidate was a more promising potential contributor to the disease than those proposed by previous studies.

The researchers hope that ExPecto will one day help medical experts identify the genetic contributors to a patient's disease and develop therapies customized to the patient's genome. "Once you know which protein is affected and what the protein does, then you can design drugs that can fix the problem," says study co-author Jian Zhou, a Flatiron research fellow at CCB. For instance, "if you can't produce a certain protein, then you could design a therapy that makes up for the missing protein."

Anyone can access ExPecto's predictions of the effects of more than 140 million possible mutations near protein-encoding genes. These results are available online as part of HumanBase, a data-driven prediction system about human biology and disease developed by the research team. Visitors can type in a gene and see all the potential mutations that could affect that gene's expression in any of 218 tissues and cell types.

Zhou anticipates that ExPecto will be particularly useful for studying the evolutionary consequences of mutations. He and his colleagues found, for instance, that mutations were less likely to affect genes expressed throughout the human body than genes specialized for one specific tissue type. "We don't have a full explanation yet," he says, but the result could be related to the robustness of more ubiquitous genes. An issue with a body-wide gene can have a higher likelihood of being fatal or otherwise preventing the individual from passing on his or her genetic information. "Evolution has already done the experiments for us," Zhou says.

More information: Jian Zhou et al, Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk, Nature Genetics (2018). DOI: 10.1038/s41588-018-0160-6
