Scientists build better way to decode the genome

April 6, 2018, Columbia University
Gradually eliminating low-affinity binding sites identified by NRLB (from left to right) results in a gradual reduction of gene expression (white). Credit: Mann Lab/Columbia's Zuckerman Institute

The genome is the body's instruction manual. It contains the raw information—in the form of DNA—that determines everything from whether an animal walks on four legs or two, to one's potential risk for disease. But this manual is written in the language of biology, so making sense of all that it encodes has proven challenging. Now, Columbia University researchers have developed a computational tool that shines a light on the genome's most hard-to-translate segments. With this tool in hand, scientists can get closer to understanding how DNA guides everything from growth and development to aging and disease.

The researchers recently published their findings in the Proceedings of the National Academy of Sciences.

"The genomes of even simple organisms such as the fruit fly contain 120 million letters worth of DNA, much of which has yet to be decoded because the cues its provides have been too subtle for existing tools to pick up," said Richard Mann, PhD, a principal investigator at Columbia's Mortimer B. Zuckerman Mind Brain Behavior Institute and a senior author of the paper. "But our new algorithm lets us sweep through these millions of lines of genetic code and pick up even the faintest signals, resulting in a much more complete picture what DNA encodes."

Geneticists have long looked for ways to decipher the mysteries hidden in DNA. One such mystery has involved a particularly pervasive class of genes known as the Hox genes.

"Hox genes are the body's master architects; they drive some of the earliest and most critical aspects of growth and differentiation, such as where in a developing embryo the head and limbs should be positioned," said Dr. Mann, who is also the Higgins Professor of Biochemistry and Molecular Biophysics (in Systems Biology) at Columbia University Irving Medical Center. "Hox genes do this by producing proteins called transcription factors, which bind to DNA sequences in order to turn large cohorts of genes on or off; like flipping thousands of switches in exactly the right order."

But decades of research into Hox genes uncovered a paradox: Even though each individual Hox gene guided a different feature of growth, the Hox transcription factors were all binding strongly and visibly to the same set of easily identifiable DNA sequences.

In 2015, Dr. Mann and his team discovered that the Hox transcription factors were also binding at many other locations—just more discretely at so-called 'low-affinity sites.' The scientists believed these low-affinity binding sites to be key to the Hox transcription factors being able to drive one aspect of development versus another. The problem remained how to decipher these sites from the genome.

To address this challenge, Dr. Mann and his lab joined forces with the lab of Harmen Bussemaker, PhD, a Professor in Columbia's Department of Biological Sciences and Systems Biology and an expert in building computational models of genetic activity.

A few years ago, the two labs developed a genetic sequencing method called SELEX-seq to systematically characterize all Hox binding sites. But their approach still had limitations: It required the same DNA fragment to be sequenced over and over again. With each new round, more pieces of the puzzle were revealed, but information about those critical low-affinity binding sites remained hidden.

"It was akin to running the same paragraph through Google translate multiple times, but in the end still only ten percent of the words are accurately translated," said Dr. Mann.

To overcome this challenge, Dr. Bussemaker and his team developed a sophisticated new computer algorithm that was able to explain—for the first time—the behavior of all DNA sequences in the SELEX-seq experiment. They called this algorithm No Read Left Behind, or NRLB.

"In simple terms, NRLB allows us cover the entire spectrum of binding sites—from the highest to the lowest affinity—with a much greater degree of sensitivity and accuracy than any existing method, including state-of-the-art deep learning algorithms" said Dr. Bussemaker, who was the paper's other senior author. "Building on that foundation, we now hope to develop more in-depth biological and computational models to help answer the most complicated questions about the genome."

"For example, diseases such as schizophrenia, Parkinson's disease and autism have been mapped to particular DNA regions that do not appear to have a clear function," said Dr. Mann. "With NRLB, scientists could potentially piece together how transcription factors bind to and activate those regions. This will be critical for finding ways to manipulate that activity to one day reduce one's risk of disease."

More information: Chaitanya Rastogi et al, Accurate and sensitive quantification of protein-DNA binding affinity, Proceedings of the National Academy of Sciences (2018). DOI: 10.1073/pnas.1714376115

Related Stories

Recommended for you

Critical role of DHA on foetal brain development revealed

August 17, 2018
Duke-NUS researchers have found evidence that a natural form of Docosahexaenoic Acid (DHA) made by the liver called Lyso-Phosphatidyl-Choline (LPC-DHA), is critical for normal foetal and infant brain development, and that ...

New algorithm could improve diagnosis of rare diseases

August 17, 2018
Today, diagnosing rare genetic diseases requires a slow process of educated guesswork. Gill Bejerano, Ph.D., associate professor of developmental biology and of computer science at Stanford, is working to speed it up.

Gene silencing critical for normal breast development

August 17, 2018
Researchers have discovered that normal breast development relies on a genetic 'brake', a protein complex that keeps swathes of genes silenced.

Officials remove special rules for gene therapy experiments

August 16, 2018
U.S. health officials are eliminating special regulations for gene therapy experiments, saying that what was once exotic science is quickly becoming an established form of medical care with no extraordinary risks.

Genetic link discovered between circadian rhythms and mood disorders

August 15, 2018
Circadian rhythms are regular 24-hour variations in behaviour and activity that control many aspects of our lives, from hormone levels to sleeping and eating habits.

Ovarian cancer genetics unravelled

August 14, 2018
Patterns of genetic mutation in ovarian cancer are helping make sense of the disease, and could be used to personalise treatment in future.

0 comments

Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.