New informatics tool makes the most of genomic data

July 11, 2018, University of Illinois at Urbana-Champaign
Professor of Computer Science and Willett Faculty Scholar, Saurabh Sinha, is co-director of the Big Data to Knowledge Center at the University of Illinois. Credit: University of Illinois at Urbana-Champaign

The rise of genomics, the shift from considering genes singly to collectively, is adding a new dimension to medical care; biomedical researchers hope to use the information contained in human genomes to make better predictions about individual health, including responses to therapeutic drugs. A new computational tool developed through a collaboration between the University of Illinois and the Mayo Clinic combines multiple types of genomic information to make stronger predictions about what genomic features are associated with specific drug responses.

The tool, described in Genome Research, was developed by members of KnowEnG, a Center of Excellence established by an NIH Big Data to Knowledge (BD2K) Initiative award to the University of Illinois in partnership with the Mayo Clinic. KnowEnG stands for Knowledge Engine for Genomics, representing the center's mission to develop analytical resources for biomedical work with genomic data. The Center is housed within the Carl R. Woese Institute for Genomic Biology at the University of Illinois.

"We all know treatment outcomes for complex diseases like cancers vary dramatically among individuals, from lacking of efficacy resulting in disease recurring to severe toxicity resulting in noncompliance in patients who cannot tolerate these life-saving drugs," said Liewei Wang, a professor of pharmacology at the Mayo Clinic. "Therefore, it is extremely important for us to understand better of how and why patients respond differently, so that we can truly individualize their therapies by choosing the right drug at the right dose."

The researchers' first step toward this goal was a large-scale data collection effort. They assembled a panel of lab-reared tumor cells derived from a diverse set of individuals, and exposed samples of those cells to one of a set of common anticancer drugs. This allowed them to quantify the drug responses of different genetic backgrounds in a directly comparable way.

Using these data, Mayo Clinic researchers wanted to ask which characteristics of each individual's cells helped determine their unique set of responses to the drugs tested. They collected data on the "expression" of every gene in the genome: how often each gene was being read by the cell and used to create the protein it encodes.

The team also wanted to look at where those differences in gene expression might come from. DNA sequences surrounding genes in the genome influence when genes are expressed. So do the actions of special proteins called transcription factors, which bind to DNA and make it easier or harder for genes to be read by cellular machinery. Finally, how different regions of the long DNA strands of the genome are coiled up, the "epigenetic state" of genomic DNA, also helps determine how likely a gene is to be expressed.

The team decided to collect data on all of these characteristics of their lines of cells. They had built a comprehensive dataset, but lacked something vital—an analytical tool that could use it to full advantage.

"There was no tool that would exploit all of these together," said Professor of Computer Science and Willett Faculty Scholar Saurabh Sinha, who co-directs the BD2K Center. "From the question came the data . . . then came our part, what do you do with it?"

Sinha and graduate student Casey Hanson developed an algorithm that takes in data on gene expression, genomic factors that help control gene expression, and resulting traits (such as drug response) and uses these to predict which genes are most important in determining the latter. They based their work on a tool they had previously developed named "Gene Expression in the Middle," or GENMi. Their new model, because of its ability to appropriately weight and integrate multiple sources of data, is named "probabilistic GENMi" or pGENMi.

"It's a more rigorous tool; it should automatically handle how to weight different aspects of the data when it's trying to look at many different types of data to reach a common conclusion," Sinha said. "Methodologically, that was the most challenging part, the development of the probabilistic model."
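To make the idea of weighting and integrating several data types concrete, here is a minimal sketch in Python. It is not pGENMi and does not reproduce the probabilistic model described in the paper: it uses fixed, hand-picked weights, whereas Sinha describes a model that handles the weighting automatically. The function name rank_tfs, the gene and transcription factor labels, the evidence types, and all numbers are hypothetical and chosen only for illustration.

```python
def rank_tfs(gene_pvalues, evidence, weights, threshold=0.05):
    """Rank transcription factors (TFs) by a weighted enrichment score.

    gene_pvalues: gene -> p-value for association between that gene's
                  expression and the trait (for example, drug response).
    evidence:     TF -> {evidence type -> set of genes the TF may regulate}.
    weights:      evidence type -> fixed weight (an assumption of this
                  sketch; a probabilistic model would learn these).
    """
    all_genes = set(gene_pvalues)
    # Genes whose expression is associated with the trait.
    associated = {g for g, p in gene_pvalues.items() if p < threshold}
    # Fraction of all genes that are associated: the baseline rate.
    background = len(associated) / len(all_genes)

    scores = {}
    for tf, by_type in evidence.items():
        score = 0.0
        for ev_type, targets in by_type.items():
            targets = targets & all_genes
            if not targets:
                continue
            # How much more often are this TF's putative targets
            # trait-associated than genes in general?
            frac = len(targets & associated) / len(targets)
            score += weights.get(ev_type, 0.0) * (frac - background)
        scores[tf] = score

    # Highest score first: the TFs most worth testing experimentally.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data.
gene_pvalues = {"g1": 0.01, "g2": 0.02, "g3": 0.5,
                "g4": 0.8, "g5": 0.03, "g6": 0.9}
evidence = {
    "TF_A": {"binding": {"g1", "g2", "g5"}, "methylation": {"g1", "g2"}},
    "TF_B": {"binding": {"g3", "g4"}, "methylation": {"g4", "g6"}},
}
weights = {"binding": 1.0, "methylation": 0.5}

ranking = rank_tfs(gene_pvalues, evidence, weights)
for tf, score in ranking:
    print(tf, round(score, 2))
```

In this toy example, the hypothetical TF_A ranks first because its putative target genes are the ones whose expression tracks the trait. The output is a ranked list of candidates for follow-up experiments, not a final answer, which matches how the article describes the role of the tool's predictions.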

Because this tool is the first of its kind, the team had to get creative to assess how well it was working. They had no prior standard of performance for comparison, and the results generated by pGENMi are the basis for further experimental work, not an endpoint.

"Our end result was testable predictions . . . a ranking of what experiments to do and verify that this transcription factor indeed has a role in regulating the response to that drug," Sinha said.

"In a lot of computer science and bioinformatics papers, there is a gold standard database to validate predictions against—but we didn't have the luxury of that," Hanson said. "We had to search a vast literature to try to find, among the myriad ways of doing so and stating that one has done so, experiments that [could] confirm our hypothesis." The team's mix of computer science and biological knowledge was what made this task possible.

Hanson and his coauthors examined whether the predictions generated by the algorithm included associations that were already confirmed by the studies he identified. The literature searches revealed examples in which transcription factors highlighted by pGENMi had been experimentally manipulated, resulting in changes in drug responsiveness. Many of the predictions generated by pGENMi were supported by previous work, making it likely that those not supported by prior work are novel but real associations.

"For example . . . we found a paper in which rapamycin [an anticancer drug] decreased GATA1 [a transcription factor's] binding with DNA. Another paper, we found that . . . rapamycin increased expression of a gene, ERCC1," Hanson said. The same paper linked the transcription factor, GATA1, to ERCC1's expression. Hanson noted that "our own experiments showed that knocking down GATA1 changed the sensitivity of cells to rapamycin," in agreement with the previous work.

To test pGENMi's results even further, the group selected transcription factors predicted to impact responsiveness, as well as several predicted to have little impact, and reduced their function in lab-grown cancer cells. For the majority of the transcription factors examined, these experimental results were consistent with pGENMi's predictions.

Although in this initial project pGENMi was used to explore the factors that influence the response of cancer cells to drugs, its flexibility would allow for a wide range of applications.

"We have generated tools that can be used broadly by the research community. These tools will be open to anyone who might have the right data sets to both help generate hypothesis and also to help refine the algorithms," Wang said. "This is a perfect example of how expertise in complementary research areas, in this case, computational science and pharmacoproteomics, come together to make a difference."

More information: Casey Hanson et al, Principled multi-omic analysis reveals gene regulatory mechanisms of phenotype variation, Genome Research (2018). DOI: 10.1101/gr.227066.117
