New software helps detect adaptive genetic mutations

February 20, 2018 by Kevin Stacey, Brown University
Using a new machine learning approach, researchers at Brown's Center for Computational Molecular Biology found adaptive mutations in metabolic genes in a group of African hunter-gatherers. One mutation the software found is closely linked to a protein-altering mutation that is virtually absent in populations around the world, but has a frequency of 27 percent in the hunter-gatherer genome data. Credit: Ramachandran lab / Brown University

Researchers from Brown University have developed a new method for sifting through genomic data in search of genetic variants that have helped populations adapt to their environments. The technique, dubbed SWIF(r), could be helpful in piecing together the evolutionary history of people around the world, and in shedding light on the evolutionary roots of certain diseases and medical conditions.

SWIF(r) brings several different statistical tests together into a single machine-learning framework. That framework can then be used to scan genomic data from multiple individuals and compute the probabilities that individual mutations or regions of a genome are adaptive.

"These individual statistical techniques are useful, but none of them is particularly powerful on its own," said Lauren Alpert Sugden, a postdoctoral researcher at Brown who led the software's development. "The method we've developed combines those techniques in a way that's careful and that produces an output that's easy to interpret."

Alpert Sugden works in the lab of Sohini Ramachandran, an associate professor and director of Brown's Center for Computational Molecular Biology. The researchers describe their work in the journal Nature Communications.
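
The article says SWIF(r) turns several statistics into a probability that a site is adaptive, but it does not give the formula. As a hedged sketch of the general idea only, the Python below applies Bayes' rule to per-statistic likelihoods under an "adaptive" and a "neutral" model (in practice these would be estimated from data such as simulations; here they are plain numbers) plus an assumed prior probability of adaptation. It uses the simplest combination, naive Bayes, which multiplies the likelihoods as if the statistics were independent. That independence assumption is the one the article says SWIF(r) is built to avoid, so this is not SWIF(r)'s method. The function name `posterior_adaptive` is hypothetical.

```python
import math
from typing import Sequence


def posterior_adaptive(
    lik_adaptive: Sequence[float],
    lik_neutral: Sequence[float],
    prior_adaptive: float,
) -> float:
    """P(adaptive | observed statistics) by Bayes' rule.

    Each list holds the likelihood of one statistic's observed value under
    that class. Multiplying them is the naive-Bayes assumption that the
    statistics are independent given the class.
    """
    if not 0.0 < prior_adaptive < 1.0:
        raise ValueError("prior_adaptive must be strictly between 0 and 1")
    if len(lik_adaptive) != len(lik_neutral):
        raise ValueError("need one likelihood per statistic for each class")
    # Sum logs instead of multiplying, so many small likelihoods don't underflow.
    log_a = math.log(prior_adaptive) + sum(math.log(l) for l in lik_adaptive)
    log_n = math.log(1.0 - prior_adaptive) + sum(math.log(l) for l in lik_neutral)
    d = log_a - log_n  # log posterior odds of "adaptive"
    # Numerically stable logistic function of the log odds.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)
```

For example, `posterior_adaptive([2.0, 3.0], [1.0, 1.0], 0.5)` returns 6/7, about 0.857. Because truly adaptive sites are rare in a genome, a realistic prior is far below 0.5, and the same evidence then yields a much lower posterior; the prior is an assumption the user of such a sketch has to supply.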

Exploring adaptation

The vast majority of mutations that commonly occur in the genomes of humans and other animals are neutral, meaning they neither help nor hurt an individual's survival. But every once in a while nature hits on a mutation that's beneficial—one that aids in an organism's survival or reproductive success. These adaptive mutations can spread quickly (evolutionarily speaking) through a population in subsequent generations, a process known as a selective sweep.
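
To see why a beneficial mutation can spread quickly, here is the textbook deterministic model of selection in a haploid population. It is an assumption for illustration: it ignores diploidy, dominance and random genetic drift, and it is not part of SWIF(r). With a selective advantage `s`, each generation multiplies the allele's odds p/(1 - p) by (1 + s). The function name `sweep_trajectory` is hypothetical.

```python
from typing import List


def sweep_trajectory(p0: float, s: float, generations: int) -> List[float]:
    """Allele frequency over time under deterministic haploid selection.

    Carriers leave (1 + s) offspring for every 1 left by non-carriers, so
    p' = p (1 + s) / (1 + p s).
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a frequency in [0, 1]")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + p * s)
        freqs.append(p)
    return freqs


# Usage: a variant starting at 0.1% frequency with a 5% advantage.
traj = sweep_trajectory(p0=0.001, s=0.05, generations=400)
first_majority = next(t for t, p in enumerate(traj) if p >= 0.5)
print(f"majority after {first_majority} generations; final frequency {traj[-1]:.3f}")
```

In this example the variant goes from 1 in 1,000 copies to a majority in under 150 generations, which is fast on evolutionary timescales.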

SWIF(r) looks for the statistical signatures of selective sweeps in genomic datasets. It does so using machine learning and a combination of four established statistical tests measuring different signatures of adaptation. One test checks if a particular mutation appears in a population more frequently than it does in other populations. Others measure genetic variation in a region of the genome, with the idea that strong selection would tend to reduce variability.
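
The article does not name the four tests. As a hedged illustration of the two kinds it describes, the functions below compute a textbook Wright-style F_ST for one biallelic site in two equally weighted populations (how much allele frequencies differ between populations) and the average expected heterozygosity across a window of sites (how much variation a region carries). These are standard population-genetics quantities chosen for illustration; they are not claimed to be SWIF(r)'s exact statistics, and the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    # Probability that two randomly drawn alleles at a biallelic site differ.
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1: float, p2: float) -> float:
    """Wright-style F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)  # diversity of the pooled population
    if h_t == 0.0:
        return 0.0  # site is monomorphic everywhere: no differentiation
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_t - h_s) / h_t


def mean_diversity(freqs: Sequence[float]) -> float:
    """Average expected heterozygosity over the sites in a genomic window.

    Under a selective sweep this tends to drop near the selected site.
    """
    if not freqs:
        raise ValueError("window contains no sites")
    return sum(expected_heterozygosity(p) for p in freqs) / len(freqs)


# Usage with made-up allele frequencies.
print(fst_two_pops(0.9, 0.1))               # strongly differentiated site
print(mean_diversity([0.02, 0.01, 0.0]))    # low-variation window
print(mean_diversity([0.4, 0.3, 0.5]))      # high-variation window
```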

This isn't the first technique that brings multiple tests into one composite framework. But part of what's new about SWIF(r) is that it controls for correlations that arise between those tests, which can throw off the results. The acronym SWIF(r) stands for "SWeep Inference Framework (controlling for correlation)," a lowercase "r" being the mathematical notation for correlation.
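
The paper's title names averaged one-dependence estimation (AODE) as the approach. The toy sketch below is a generic AODE, not SWIF(r)'s implementation: it assumes binned (categorical) statistics and raw, unsmoothed counts, whereas the real statistics are continuous-valued and a practical version would smooth its estimates. It shows the core point about correlation. Naive Bayes counts two perfectly correlated statistics as two independent pieces of evidence, while AODE lets each statistic take a turn as a "parent" that the others are conditioned on, so a duplicate adds no extra confidence. Function names and data are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


def _normalise(scores: Dict[str, float]) -> Dict[str, float]:
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("query values never seen in training data")
    return {c: s / total for c, s in scores.items()}


def naive_bayes_posterior(rows: Sequence[Row], labels: Sequence[str], x: Row) -> Dict[str, float]:
    """P(class | x) assuming every statistic is independent given the class."""
    n = len(labels)
    scores = {}
    for c in sorted(set(labels)):
        in_c = [r for r, y in zip(rows, labels) if y == c]
        score = len(in_c) / n  # P(c)
        for i, v in enumerate(x):
            score *= sum(r[i] == v for r in in_c) / len(in_c)  # P(x_i | c)
        scores[c] = score
    return _normalise(scores)


def aode_posterior(rows: Sequence[Row], labels: Sequence[str], x: Row) -> Dict[str, float]:
    """P(class | x) by averaged one-dependence estimation.

    Each statistic j takes a turn as the parent; every other statistic is
    modelled conditional on both the class and x_j, and the m resulting
    estimates of P(c, x) are averaged.
    """
    n, m = len(labels), len(x)
    scores = {}
    for c in sorted(set(labels)):
        in_c = [r for r, y in zip(rows, labels) if y == c]
        total = 0.0
        for j in range(m):
            parent = [r for r in in_c if r[j] == x[j]]
            if not parent:
                continue  # P(c, x_j) = 0, so this parent contributes nothing
            term = len(parent) / n  # P(c, x_j)
            for i in range(m):
                if i != j:
                    # P(x_i | c, x_j)
                    term *= sum(r[i] == x[i] for r in parent) / len(parent)
            total += term
        scores[c] = total / m
    return _normalise(scores)


# Toy training data: one binned statistic, then the same statistic duplicated
# to mimic two perfectly correlated tests.
single = [("high",)] * 3 + [("low",)] + [("high",)] + [("low",)] * 3
labels = ["adaptive"] * 4 + ["neutral"] * 4
doubled = [r + r for r in single]

print(naive_bayes_posterior(single, labels, ("high",))["adaptive"])
print(naive_bayes_posterior(doubled, labels, ("high", "high"))["adaptive"])
print(aode_posterior(doubled, labels, ("high", "high"))["adaptive"])
```

With one copy of the statistic the posterior is 0.75. Naive Bayes on the duplicated pair inflates it to about 0.9, while the AODE estimate stays at 0.75, because once the parent statistic is known its duplicate carries no further information.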

SWIF(r) has several advantages over other composite techniques, the researchers say. While most techniques identify only regions of the genome likely to contain adaptive mutations, SWIF(r) can also identify the particular mutations themselves. And while other techniques return results that can be difficult to interpret, SWIF(r) returns a simple probability that an individual mutation or genome region is adaptive.

To show that the technique works, the researchers validated it on a simulated dataset in which known adaptive mutations were included, as well as on canonical adaptive mutations that have been identified in human genomes through multiple molecular experiments. SWIF(r) was shown to outperform both individual statistical techniques and other composite techniques in picking out those adaptive mutations, while producing a lower rate of false positives.
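
Because simulated data come with known labels, comparisons like these come down to counting at a chosen probability cutoff: what fraction of truly adaptive sites get called (the true-positive rate) and what fraction of neutral sites get wrongly called (the false-positive rate). A minimal sketch with made-up probabilities; the function name `rates_at_threshold` is hypothetical and this is not the paper's evaluation code.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    probs: Sequence[float], is_adaptive: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (true-positive rate, false-positive rate) when every site whose
    probability is >= threshold is called adaptive."""
    if len(probs) != len(is_adaptive):
        raise ValueError("need one true label per probability")
    tp = fp = pos = neg = 0
    for p, truth in zip(probs, is_adaptive):
        called = p >= threshold
        if truth:
            pos += 1
            tp += int(called)
        else:
            neg += 1
            fp += int(called)
    tpr = tp / pos if pos else float("nan")
    fpr = fp / neg if neg else float("nan")
    return tpr, fpr


# Usage: made-up per-site probabilities with known simulation labels.
probs = [0.9, 0.8, 0.3, 0.6, 0.1]
truth = [True, True, True, False, False]
for cutoff in (0.2, 0.5, 0.85):
    tpr, fpr = rates_at_threshold(probs, truth, cutoff)
    print(f"cutoff {cutoff}: TPR {tpr:.2f}, FPR {fpr:.2f}")
```

Raising the cutoff trades detection power for fewer false positives; a method that does better on both at once, as the article reports for SWIF(r), dominates across cutoffs.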

Real-world data

Having demonstrated that SWIF(r) works, the researchers applied it to real genomic data from the ‡Khomani San, a group of hunter-gatherers living in southern Africa.

"The ‡Khomani San have the largest genetic diversity of any living population," Alpert Sugden said, "which is interesting from our perspective because there's a lot of opportunity for adaptive mutations to arise."

Among other findings, SWIF(r) identified several adaptive mutations in a set of genes responsible for energy and fat storage. That's interesting from the perspective of what's known as the "thrifty gene" hypothesis, the researchers say.

The hypothesis suggests that because hunter-gatherers often experience an inconsistent food supply, they're likely to have a genetic predisposition to storing energy in the form of fat. However, those genes could be a liability in agricultural societies where food supply tends to be more consistent, potentially contributing to obesity and complications like type 2 diabetes. A deeper dive into the functions of the adaptive genes identified by SWIF(r) may be helpful in further exploring the thrifty gene idea.

Ramachandran says the way in which they used SWIF(r) on the ‡Khomani San data is instructive for how the technique might be used moving forward. The researchers say they didn't start out expecting to find adaptations in metabolic genes; those genes simply popped out of the data as it was analyzed. That contrasts with how such research is currently done, Ramachandran says.

"The way we study genetic adaptation now is we start by looking at a particular trait or phenotype, and then we work backward to identify the associated genes," she said. "This new approach uses data-driven machine learning to start in the genome, searching for adaptive signatures that we can then follow up with more study. So we think this is a way of generating new and interesting hypotheses to test."

The researchers have made the SWIF(r) code open-source, and they hope that other research groups will use it to explore genomic data from populations worldwide.


More information: Lauren Alpert Sugden et al. Localization of adaptive variants in human genomes using averaged one-dependence estimation, Nature Communications (2018). DOI: 10.1038/s41467-018-03100-7
