Scientists explore safeguards for genomic data privacy

by Nancy Owano weblog

By now the general public has become aware that mobile phone applications, bank security systems and credit card databases are not immune to vulnerabilities; information thefts happen. Some computer scientists now say it's time to recognize that vulnerabilities in genetic databases need recognition too. Are anonymous genetic profiles truly anonymous? Is data de-identifying technically feasible for genetic data?

People having their genomes mapped for research or for private information would prefer to assume the data will be anonymous. After all, as an article in Inside Science points out, a person's genome carries information about inherited diseases and physical traits, stored in strands of DNA. "The consequences of being able to search, cross-reference, and analyze this information are profound." The solution is not to stop the research but to find ways where research can continue without privacy abuse.

At a February 16 symposium at the American Association for the Advancement of Science. (AAAS) Annual Meeting, a panel of experts addressed tensions between the need for genetic privacy and the medical community's interest in genetic data. Yaniv Erlich, a fellow at the Whitehead Institute for Biomedical Research, has been a key voice in discussing the genetic privacy issue. At the February meeting, he talked about routes by which genetic data can be breached..

Also, in a review submitted October last year, "Routes for Breaching and Protecting Genetic Privacy," Erlich and Arvind Narayanan, an assistant professor in the Department of Computer Science at Princeton, called attention to the topic of of genetic information.

"We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity," they wrote. "Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of ."

(Narayanan, elsewhere, in talking about his doctoral research on problems with data anonymization, said his thesis, "in a sentence, is that the level of anonymity that consumers expect—and companies claim to provide—in published or outsourced databases is fundamentally unrealizable.")

The two authors suggested that privacy by design algorithms include access control as well as differential privacy and cryptographic techniques. "So far," they said, "data custodians of mainly adopted access control as a mitigation strategy." They added that new developments in cryptographic techniques "may usher in an additional arsenal of security by design techniques."

In January 2013, Erlich took part in a study, "Identifying Personal Genomes by Surname Inference," published in Science, that clearly demonstrated a potential for breaches of privacy in genomics studies. The authors of that study are Melissa Gymrek, Amy L. McGuire, David Golan, Eran Halperin and Erlich. They showed how identities of volunteers who donate personal genome sequence data for research may be revealed merely on the basis of publicly available information. The researchers recovered the identities of nearly 50 anonymous participants in the 1000 Genomes Project through free, publicly accessible Internet resources.

"Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources."

The findings were shared with officials at the National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS). In response, NIGMS and NHGRI moved certain demographic information from the publicly-accessible portion of the NIGMS cell repository to help reduce the risk of future breaches.

More information: Science 18 January 2013: Vol. 339 no. 6117 pp. 321-324 DOI: 10.1126/science.1229566

add to favorites email to friend print save as pdf

Related Stories

Personal Genome Project Canada launches

Dec 10, 2012

The Personal Genome Project Canada (PGP-C) launches this week giving Canadians an unprecedented opportunity to participate in a groundbreaking research study about human genetics and health.

Recommended for you

Changes in scores of genes contribute to autism risk

Oct 29, 2014

Small differences in as many as a thousand genes contribute to risk for autism, according to a study led by Mount Sinai researchers and the Autism Sequencing Consortium (ASC), and published today in the journal Nature.

Dozens of genes associated with autism in new research

Oct 29, 2014

Two major genetic studies of autism, led in part by UC San Francisco scientists and involving more than 50 laboratories worldwide, have newly implicated dozens of genes in the disorder. The research shows ...

Genetic link to kidney stones identified

Oct 29, 2014

A new breakthrough could help kidney stone sufferers get an exact diagnosis and specific treatment after genetic links to the condition were identified.

User comments

Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.