An international team of researchers has identified just 200 positions in the human genome that they believe capture much of the genetic diversity in European Americans, a population with one of the most diverse and complex historic origins on Earth. Their findings narrow the search among the single-letter DNA variations known as single nucleotide polymorphisms, or SNPs, some of which cause disease and others of which simply reflect ancestry and account for the minute variations within the European American population.
"With this study, we looked at a very large population to determine how each individual could be stratified based on his or her DNA," said Petros Drineas, assistant professor of computer science at Rensselaer Polytechnic Institute and one of the two lead authors of the study. The researchers can now begin to analyze each SNP to understand the possible biological significance of those genetic, ancestral differences.
The research, which was published in the July 2008 edition of PLoS Genetics, is the first to isolate genetic ancestral clues based on a method that is purely computational, requiring no prior information about participants' ancestry. The other lead author of the study is Peristera Paschou of the Democritus University of Thrace in Greece.
The researchers plan to use the data to determine if any of the approximately 200 ancestry informative SNPs that they have identified change the way the body develops. "We want to see if the SNPs tied to a specific ancestry hold any biological significance to populations of different origins. We want to see if the SNPs that we isolated are related to natural selection and adaptation, for example to the weather conditions of different regions," Drineas said. To help do so, the research team will move from the computer lab to the biology lab for further study.
In addition, the researchers hope that their findings will help narrow down the search for those SNPs that cause disease, according to Drineas.
Our genes are being increasingly linked to our susceptibility to certain diseases. Today, scientists are on the prowl to isolate and understand these "weakest links" in our DNA. With the discovery of each tiny SNP that is linked to specific diseases, researchers come closer to understanding our predisposition to certain diseases, as well as to developing cures.
However, SNPs linked to disease account for only a minuscule fraction of the estimated 10 million SNPs found in the human genome. Scientists have made great strides in narrowing the genetic playing field to just the variations that cause disease, but other sources of minor genetic variation, such as ancestry, have only recently been accounted for. With this study, researchers will be able to quickly and inexpensively identify the genetic variations that are linked to ancestry and unrelated to disease, and remove many of them from contention as causes of disease, thus greatly narrowing the search.
With this method, the researchers did not need prior information from the participants regarding their ancestry, which is required for most current genetic population studies. "Because this method is purely computational and leverages linear algebraic methods such as Principal Components Analysis, without the use of information on self-reported ancestry, we were able to treat the data as a black box," Drineas said. Drineas does note that such self-reporting in genetics studies remains a fairly accurate and important way to trace ancestry, but is often difficult in populations as varied as European Americans.
The European American population was chosen because its genetic background, reflecting its historic origins, is among the most complex on the planet, requiring fine resolution characterization of the genetic code in order to define genetic structure, according to Drineas.
The researchers analyzed 1,521 individuals for more than 300,000 SNPs across the entire genome. The data were made available by the National Institute of Neurological Disorders and Stroke (NINDS) as well as the CAP (Cholesterol and Pharmacogenetics) and PRINCE (Pravastatin Inflammation/CRP Evaluation) studies. The team used linear algebra to find patterns in the highly diverse data. When the data sets were analyzed using the proposed algorithms, these patterns pointed to SNPs shared between groups from the same ancestral background.
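To make the idea of "finding patterns with linear algebra" concrete, here is a minimal Python sketch of Principal Components Analysis applied to a genotype matrix. It is an illustration under stated assumptions, not the team's code: the data are simulated (two hypothetical groups whose allele frequencies differ at a handful of SNPs), and the function names simulate_genotypes and principal_component_scores are invented for this example. The group labels are used only to check the result afterward; the analysis itself never sees them, which mirrors the "black box" approach described above.

```python
import numpy as np

def simulate_genotypes(n_per_group=50, n_snps=500, n_informative=40, seed=0):
    """Hypothetical toy data: rows are individuals, columns are SNPs (0, 1 or 2 copies)."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.2, 0.8, size=n_snps)
    freq_b = freq_a.copy()
    # Assumption for illustration: the first n_informative SNPs differ sharply by group.
    freq_a[:n_informative] = 0.1
    freq_b[:n_informative] = 0.9
    genotypes = np.vstack([
        rng.binomial(2, freq_a, size=(n_per_group, n_snps)),
        rng.binomial(2, freq_b, size=(n_per_group, n_snps)),
    ]).astype(float)
    labels = np.repeat([0, 1], n_per_group)  # kept only to check the result
    return genotypes, labels

def principal_component_scores(genotypes, k=2):
    """Project individuals onto the top-k principal components via the SVD."""
    centered = genotypes - genotypes.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

genotypes, labels = simulate_genotypes()
scores = principal_component_scores(genotypes, k=2)
pc1 = scores[:, 0]
print("mean PC1 score, group 0:", pc1[labels == 0].mean())
print("mean PC1 score, group 1:", pc1[labels == 1].mean())
```

In this toy setting the two simulated groups land at opposite ends of the first principal component even though no ancestry labels were supplied to the analysis.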
"Much of the genetic variation was found to stretch between two 'points' – what we speculate is the Northern European to Southern European ancestry axis," according to Drineas and Paschou. Importantly, their study removes any redundant SNPs uncovered during the modeling process, better targeting the most informative SNPs and reducing genotyping cost.
Source: Rensselaer Polytechnic Institute