New computational method predicts genes likely to be causal in disease

August 10, 2015
This stylistic diagram shows a gene in relation to the double helix structure of DNA and to a chromosome (right). The chromosome is X-shaped because it is dividing. Introns are regions often found in eukaryote genes that are removed in the splicing process (after the DNA is transcribed into RNA): Only the exons encode the protein. The diagram labels a region of only 55 or so bases as a gene. In reality, most genes are hundreds of times longer. Credit: Thomas Splettstoesser/Wikipedia/CC BY-SA 4.0

A new computational method developed by scientists from the University of Chicago improves the detection of genes that are likely to be causal for complex diseases and biological traits. The method, PrediXcan, estimates gene expression levels across the whole genome - a better measure of biological action than single mutations - and integrates these estimates with genome-wide association study (GWAS) data. PrediXcan has the potential to identify gene targets for therapeutic applications faster and with greater accuracy than traditional methods. It is described online in Nature Genetics on Aug 10, 2015.

"PrediXcan tells us which genes are more likely to affect a disease or trait by learning the relationship between genotype, gene expression levels from large-scale transcriptome studies, and disease associations from GWAS studies," said study leader Hae Kyung Im, PhD, research associate (assistant professor) of genetic medicine at the University of Chicago. "This is the first method that accounts for the mechanisms of gene regulation, and can be applied to any heritable disease or phenotype."

Genome-wide association studies are a critical tool in the detection of genes involved in complex diseases such as diabetes and cancer or traits such as height and obesity. GWASs determine these links by identifying single-letter DNA variants that appear more frequently in individuals with a disease or trait of interest. However, significant follow-up work is needed to understand the mechanism of action of these variants. Most disease-associated variants do not alter the function of a gene but instead change how much of the gene is copied in cells. On their own, these studies cannot establish a causal relationship: a genetic variant may instead contribute to altered expression levels of the true causal genes, which remain undetected by a GWAS.
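The basic GWAS comparison described above can be sketched for a single variant. This is a minimal illustration, not the analysis used in any particular study: it assumes a simple chi-square test on allele counts, and the function name and the counts are hypothetical.

```python
import math

def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of
    allele counts: rows are cases/controls, columns are alt/ref alleles."""
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: the alternate allele is seen on 30% of case
# chromosomes but only 20% of control chromosomes.
stat, p_value = allele_chi_square(case_alt=300, case_ref=700,
                                  control_alt=200, control_ref=800)
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

A real GWAS repeats a test like this across millions of variants, which is why a significant hit still says little about which gene, if any, is responsible.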

Transcriptome studies such as the National Institutes of Health's Genotype-Tissue Expression (GTEx) program aim to overcome this limitation by studying gene expression levels and regulation mechanisms and their relationship with diseases, instead of only DNA sequence. But transcriptome studies also have significant limitations, such as an inability to rule out reverse causality - whether gene expression levels are altered by disease, or whether disease arises due to altered gene expression.

To develop a method of detecting associations between genes and traits that avoids these issues, Im and her colleagues integrated both transcriptome and GWAS data into a single computational framework, which they named PrediXcan. The method uses computational algorithms to learn how genome sequence influences gene expression, based on large-scale transcriptome datasets such as GTEx. The learned relationship can then be used to create computational estimates of gene expression levels from any whole-genome sequence or chip dataset.
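The learning step can be sketched as fitting a model for one gene on a reference panel and then applying it to a new genome. The article does not say which learning algorithm is used, so the ridge regression below is an assumption chosen for brevity; all function names and the toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_expression_model(genotypes, expression, penalty=0.1):
    """Fit expression ~ intercept + sum(weight_k * dosage_k) for one gene,
    with an L2 (ridge) penalty on the weights. Each row of `genotypes`
    holds one person's allele dosages (0, 1 or 2) at nearby variants."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[k] for row in genotypes) / n for k in range(p)]
    y_mean = sum(expression) / n
    # Centre the data so the intercept is not penalised.
    X = [[row[k] - col_means[k] for k in range(p)] for row in genotypes]
    y = [value - y_mean for value in expression]
    # Normal equations: (X'X + penalty * I) w = X'y
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) + (penalty if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    weights = solve(A, rhs)
    intercept = y_mean - sum(w * m for w, m in zip(weights, col_means))
    return intercept, weights

def predict_expression(intercept, weights, dosages):
    """Estimate expression for a new genome from its dosages alone."""
    return intercept + sum(w * d for w, d in zip(weights, dosages))

# Toy reference panel: six people, two variants, measured expression
# generated as 1.0 + 0.5 * variant_1 - 0.3 * variant_2 (no noise).
panel_genotypes = [[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]]
panel_expression = [1.0, 1.5, 1.7, 0.4, 1.2, 1.4]

intercept, weights = fit_expression_model(panel_genotypes, panel_expression)
estimate = predict_expression(intercept, weights, [2, 0])
print(weights, estimate)
```

The key property is that the final call needs only genotype data, so expression can be estimated for people whose expression was never measured.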

Genomes that have been sequenced as part of a GWAS can be run through PrediXcan to generate a gene expression level profile, which is then analyzed to determine the association between each gene and the disease state or trait being studied.
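For a single gene, the association step might look like the sketch below, which regresses a trait on genetically predicted expression. It assumes a quantitative trait and ordinary least squares; the function name and all values are hypothetical and are not taken from the study.

```python
import math

def expression_trait_association(predicted_expression, trait):
    """Regress a quantitative trait on predicted expression of one gene.
    Returns the slope, the Pearson correlation and the t statistic
    (n - 2 degrees of freedom) for the test that the slope is zero."""
    n = len(trait)
    mean_x = sum(predicted_expression) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted_expression)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted_expression, trait))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return slope, r, t_stat

# Hypothetical data for eight study participants.
predicted = [0.4, 0.8, 1.0, 1.2, 1.4, 1.5, 1.7, 2.0]
trait = [3.1, 2.9, 2.6, 2.2, 2.4, 1.9, 1.8, 1.2]

slope, r, t_stat = expression_trait_association(predicted, trait)
direction = "higher" if slope > 0 else "lower"
print(f"slope = {slope:.2f}; {direction} predicted expression goes with higher trait values")
```

The sign of the slope is what carries the direction of effect: here, lower predicted expression accompanies higher trait values. Repeating the test for every gene yields a genome-wide profile of gene-trait associations.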

The method can not only identify potentially causal genes but also determine directionality - whether high or low levels of expression might cause the disease or trait. Because the calculations are based on DNA sequence data and not on physical measurements of expression, the method can tease apart the genetically determined component of gene expression from the effects of the trait itself (avoiding reverse causality) and from other factors such as environment. With PrediXcan, validation studies only need to test a few thousand genes at most, instead of millions of potential single mutations. In addition, the method can be used to reanalyze existing genomic datasets with a focus on mechanism in a high-throughput manner, addressing a major gap in GWAS studies.
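Testing genes rather than individual variants also lightens the statistical burden of multiple comparisons. The arithmetic below is only an illustration: the Bonferroni correction is an assumption rather than something the article specifies, and the counts of 1,000,000 variants and 5,000 genes are hypothetical stand-ins for "millions" and "a few thousand".

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    error rate at `alpha` across `n_tests` tests."""
    return alpha / n_tests

# Hypothetical test counts (assumptions, not figures from the study).
variant_threshold = bonferroni_threshold(0.05, 1_000_000)
gene_threshold = bonferroni_threshold(0.05, 5_000)

print(f"per-variant threshold: {variant_threshold:.1e}")
print(f"per-gene threshold:    {gene_threshold:.1e}")
print(f"ratio: {gene_threshold / variant_threshold:.0f}x")
```

Under these assumed counts, a gene-level test has to clear a threshold about 200 times less stringent than a variant-level test.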

"This integrates what we know about consequences of genetic variation in the transcriptome in order to discover genes, instead of just looking at mutations," Im said. "In a way, we're modeling one mechanism through which genes affect disease or traits, which is the regulation of gene expression level."

While PrediXcan can discover links between genes and traits in a high-throughput manner, Im notes that because it creates estimates based on data, it is most accurate for strongly heritable traits. However, almost every complex trait or disease has a genetic component. The method can be used to predict the influence of that component, reducing the complexity of follow-up studies.

Im is now working to improve PrediXcan's predictions and to apply the method to mental health disorders. In addition, she is working to expand it beyond gene expression levels, to predict the links between diseases or traits and protein levels, epigenetics and other measurements that can be estimated based on genomic data.

"GWAS studies have been incredibly successful at finding genetic links to disease, but they have been unable to account for mechanism," Im said. "We now have a computational method that allows us to understand the consequences of GWAS studies."


More information: A gene-based association method for mapping traits using reference transcriptome data, DOI: 10.1038/ng.3367
