Scientists build better way to decode the genome

April 6, 2018, Columbia University
Gradually eliminating low-affinity binding sites identified by NRLB (from left to right) results in a gradual reduction of gene expression (white). Credit: Mann Lab/Columbia's Zuckerman Institute

The genome is the body's instruction manual. It contains the raw information—in the form of DNA—that determines everything from whether an animal walks on four legs or two, to one's potential risk for disease. But this manual is written in the language of biology, so making sense of all that it encodes has proven challenging. Now, Columbia University researchers have developed a computational tool that shines a light on the genome's most hard-to-translate segments. With this tool in hand, scientists can get closer to understanding how DNA guides everything from growth and development to aging and disease.

The researchers recently published their findings in the Proceedings of the National Academy of Sciences.

"The genomes of even simple organisms such as the fruit fly contain 120 million letters' worth of DNA, much of which has yet to be decoded because the cues it provides have been too subtle for existing tools to pick up," said Richard Mann, PhD, a principal investigator at Columbia's Mortimer B. Zuckerman Mind Brain Behavior Institute and a senior author of the paper. "But our new algorithm lets us sweep through these millions of lines of genetic code and pick up even the faintest signals, resulting in a much more complete picture of what DNA encodes."

Geneticists have long looked for ways to decipher the mysteries hidden in DNA. One such mystery has involved a particularly pervasive class of genes known as the Hox genes.

"Hox genes are the body's master architects; they drive some of the earliest and most critical aspects of growth and differentiation, such as where in a developing embryo the head and limbs should be positioned," said Dr. Mann, who is also the Higgins Professor of Biochemistry and Molecular Biophysics (in Systems Biology) at Columbia University Irving Medical Center. "Hox genes do this by producing proteins called transcription factors, which bind to DNA sequences in order to turn large cohorts of genes on or off; like flipping thousands of switches in exactly the right order."

But decades of research into Hox genes uncovered a paradox: Even though each individual Hox gene guided a different feature of growth, the Hox transcription factors were all binding strongly and visibly to the same set of easily identifiable DNA sequences.

In 2015, Dr. Mann and his team discovered that the Hox transcription factors were also binding at many other locations, just more discreetly, at so-called 'low-affinity sites.' The scientists believed these low-affinity binding sites to be key to the Hox transcription factors' ability to drive one aspect of development versus another. The problem remained how to identify these sites in the genome.

To address this challenge, Dr. Mann and his lab joined forces with the lab of Harmen Bussemaker, PhD, a Professor in Columbia's Department of Biological Sciences and Systems Biology and an expert in building computational models of genetic activity.

A few years ago, the two labs developed a genetic sequencing method called SELEX-seq to systematically characterize all Hox binding sites. But their approach still had limitations: It required the same DNA fragment to be sequenced over and over again. With each new round, more pieces of the puzzle were revealed, but information about those critical low-affinity binding sites remained hidden.

"It was akin to running the same paragraph through Google Translate multiple times, but in the end still only ten percent of the words are accurately translated," said Dr. Mann.

To overcome this challenge, Dr. Bussemaker and his team developed a sophisticated new computer algorithm that was able to explain—for the first time—the behavior of all DNA sequences in the SELEX-seq experiment. They called this algorithm No Read Left Behind, or NRLB.

"In simple terms, NRLB allows us to cover the entire spectrum of binding sites, from the highest to the lowest affinity, with a much greater degree of sensitivity and accuracy than any existing method, including state-of-the-art deep learning algorithms," said Dr. Bussemaker, who was the paper's other senior author. "Building on that foundation, we now hope to develop more in-depth biological and computational models to help answer the most complicated questions about the genome."

"For example, diseases such as schizophrenia, Parkinson's disease and autism have been mapped to particular DNA regions that do not appear to have a clear function," said Dr. Mann. "With NRLB, scientists could potentially piece together how transcription factors bind to and activate those regions. This will be critical for finding ways to manipulate that activity to one day reduce one's risk of disease."

More information: Chaitanya Rastogi et al, Accurate and sensitive quantification of protein-DNA binding affinity, Proceedings of the National Academy of Sciences (2018). DOI: 10.1073/pnas.1714376115
