For decades, the human genome could only tell us what we already suspected about the evolution of certain traits. Researchers were able to trace the genetic origin stories of lactose tolerance (as opposed to lactose intolerance), malaria resistance, and more only after observing these successful traits in specific populations. Now, the study of positive selection – the ability to determine which genetic changes have conferred an evolutionary advantage – has reached a turning point: the genome itself can be used as a starting point to guide scientists to important genetic locations, leading to hypotheses about human health and disease.
In a paper published this week in Cell, researchers from the Broad Institute, Harvard University, Harvard Medical School, and elsewhere describe the tools and resources that have come together to make this shift possible. Pardis Sabeti, a senior associate member at the Broad and an associate professor at Harvard, and Shari Grossman, a graduate research assistant in the Sabeti and Lander labs, worked with colleagues to develop a tool known as the Composite of Multiple Signals (CMS) test to detect the genetic signals of positive selection and trace these signals to specific sites in the genome. Thanks to the public release of sequence data from the 1000 Genomes Project, the research team has been able to put CMS into practice to generate a list of 412 candidate signals.
"The field has entered a whole new era of discovery," said Sabeti, who began working on algorithms to detect selection as a medical student and continued during her postdoctoral fellowship in the Lander laboratory at the Broad Institute. "We now have the right tools and the right datasets and are poised to pinpoint important variants."
Because of the genome's structure, signals of positive selection are difficult to trace back to specific sites in the genome. Just as one might struggle to trace the sound of a car alarm to a specific vehicle among thousands in a city, researchers have been able to pick up the sirens of positive selection, but have been unable to pinpoint the precise variants emitting them.
"There are many different ways of detecting selection," said Grossman. Previously, researchers would pick one of these means and design a test based on it, but they would be left with thousands of variants to sift through. "We wanted to combine all of these tests into one, simple test. And that's what CMS is. Combining tests allows us to localize the signal down to 100 candidate variants or less, which is a much more feasible number of variants to test."
The research team then followed up on these candidate signals, looking at possible functions. Several important categories of pathways emerged from the team's analysis, including pathways tied to metabolism, skin pigmentation, and the immune system. Within the latter, the CMS test pointed to genes involved in the activation of the immune system, as well as genes that influence the receptors that detect foreign invaders. As a proof of concept, the researchers took a deeper look at the gene TLR5, which has been implicated in the response to flagellated bacteria. TLR5 is a toll-like receptor – part of the first line of defense against bacteria. The particular variant that the researchers uncovered makes the immune system respond less dramatically to invaders, which, paradoxically, seems to help in the fight against them.
"We were thinking, 'Why would decreasing the signal be important?'" Grossman recalls. "One possibility involves the role of TLR5 in facilitating certain bacterial infections. It turns out that in order for these bacteria to enter the host organism, they have to invade activated immune cells and hitch a ride to the lymph nodes. If the receptors are never activated, the bacteria have much more difficulty infecting the host."
Unlike previous work that has identified large regions of the genome as perhaps harboring signals of positive selection, the new work offers a catalog of specific mutations worth pursuing. In a second Cell paper, published in the same edition of the journal, Sabeti and a team that included anthropologists, biologists, dermatologists, and others pursued another mutation from the CMS results – one that appears to affect sweat, skin, hair, and teeth. They tested the mutation in a mouse model.
"With this new data, we – and others – can examine numerous mutations and search for biologically meaningful outcomes," said Sabeti.
The researchers note that the work was enabled by the data produced from the 1000 Genomes Project, an endeavor to sequence the genomes of more than a thousand people and release this information publicly.
"When the 1000 Genomes data were published, we had a complete set of variants and we realized we could make this list that we'd been dreaming of making," said Grossman. The researchers have added information about function and expression changes as well as the influence of disruptions in the regions outside of genes. With help from the Broad's RNAi Platform and through the use of genome engineering tools, they plan to continue pursuing and expanding these annotations, scaling up their studies to add context and deepen their understanding of the function of telltale variants.
More information: Grossman S et al. Identifying Recent Adaptations in Large-Scale Genomic Data, Cell February 14, 2013. DOI: 10.1016/j.cell.2013.01.035