Genome analysis with near-complete privacy possible, say researchers

August 17, 2017
A depiction of the double helical structure of DNA. Its four coding units (A, T, C, G) are color-coded in pink, orange, purple and yellow. Credit: NHGRI

It is now possible to scour complete human genomes for the presence of disease-associated genes without revealing any genetic information not directly associated with the inquiry, say Stanford University researchers.

This "genome cloaking" technique, devised by biologists, computer scientists and cryptographers at the university, ameliorates many concerns about genomic privacy and potential discrimination based on an individual's genome sequence.

Using the technique, the researchers were able to identify the responsible gene mutations in groups of patients with four rare diseases; pinpoint the likely culprit of a genetic disease in a baby by comparing his DNA with that of his parents; and determine which out of hundreds of patients at two individual medical centers with similar symptoms also shared gene mutations. They did this all while keeping 97 percent or more of the participants' unique genetic information completely hidden from anyone other than the individuals themselves.

"We now have the tools in hand to make certain that genomic discrimination doesn't happen," said Gill Bejerano, PhD, associate professor of developmental biology, of pediatrics and of computer science. "There are ways to simultaneously share and protect this . Now we can perform powerful genetic analyses while also completely protecting our participants' privacy."

Bejerano shares senior authorship of the research, which will be published Aug. 18 in Science, with Dan Boneh, PhD, professor of computer science and of electrical engineering. Graduate students Karthik Jagadeesh and David Wu share lead authorship of the study.

Applying cryptography techniques

The researchers hope that routine implementation of their technique will help individuals overcome any qualms about privacy that may keep them from sharing their genome sequences. In particular, people may be concerned that DNA sequences or genetic variants currently unassociated with diseases may in the future be linked with as-yet-unidentified increases in risk.

"These are techniques that the cryptography community has been developing for some time," said Boneh, who is the Rajeev Motwani Professor in the School of Engineering. "Now we are applying them to biology. Basically, if you have 1 million people with genomic data they would like to keep private, this approach lets researchers analyze the data in aggregate and only report on findings that are pertinent. An individual might have dozens of anomalous genes, but the researchers and clinicians will only learn about the genes relevant to the study, and nothing else."

When the human genome was fully sequenced in 2001, it was hailed as a remarkable achievement. For the first time, the 3 billion nucleotides that encode the approximately 20,000 genes that keep our bodies running smoothly were tidily listed as a string of letters. But every human has many variations from the published, consensus sequence. These individual differences are what make us unique, but they can also confer increased risk of disease.

More than 7,000 diseases are caused by variations in the sequence of a single gene. But in order to determine which variations cause the condition, it has been necessary until now to compare the genetic sequences of hundreds or thousands of individuals with and without the disease, letter by letter. Geneticists (or their computer software) then make a list of all the differences and identify which are found primarily in people with the disease under study but rarely in any unaffected people. Those variations are then considered to be prime disease-causing suspects.
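The conventional comparison described above can be illustrated with a minimal sketch. It assumes each person's genome has already been reduced to a set of variant labels; the function name, the labels and the two cutoff fractions are hypothetical choices for illustration, not values from the study.

```python
from collections import Counter


def candidate_variants(affected, unaffected,
                       min_case_fraction=0.8, max_control_fraction=0.05):
    """Return variants common among affected people and rare among unaffected ones.

    Each person is a set of variant labels. Both cutoffs are illustrative
    assumptions, not thresholds taken from the study.
    """
    if not affected or not unaffected:
        raise ValueError("both groups must contain at least one person")

    # Count how many people in each group carry each variant.
    case_counts = Counter(v for person in affected for v in person)
    control_counts = Counter(v for person in unaffected for v in person)

    candidates = []
    for variant, n_cases in case_counts.items():
        case_fraction = n_cases / len(affected)
        control_fraction = control_counts[variant] / len(unaffected)
        if (case_fraction >= min_case_fraction
                and control_fraction <= max_control_fraction):
            candidates.append(variant)
    return sorted(candidates)


# Made-up example data: three affected and three unaffected people.
affected = [
    {"GENE_A:c.100A>G", "GENE_B:c.5C>T"},
    {"GENE_A:c.100A>G"},
    {"GENE_A:c.100A>G", "GENE_C:c.9G>A"},
]
unaffected = [{"GENE_B:c.5C>T"}, {"GENE_C:c.9G>A"}, set()]

suspects = candidate_variants(affected, unaffected)
print(suspects)
```

Note that this plain approach needs every participant's full list of variants in one place, which is exactly the exposure the cloaking technique is meant to avoid.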

"There is a general conception that we can only find meaningful differences by surveying the entire genome," said Bejerano. "But these meaningful differences make up only a very tiny proportion of our DNA. There are now amazing tools in computer science and cryptography that allow researchers to pinpoint only these differences while keeping the remainder of the genome completely private."

In 2008, President George W. Bush signed the Genetic Information Nondiscrimination Act, which prohibits discrimination in matters of health insurance and employment based on an individual's genetic information. But there are many other arenas in which such discrimination could potentially occur, including the purchase of life or disability insurance or applying for a loan.

Giving power to the individual

Jagadeesh and Wu worked together to adapt a cryptographic approach known as Yao's protocol and cloud computing for use with human genomes. A key component of the technique is the involvement of the individual whose genome is to be studied. In particular, each individual encrypts their genome (with the help of a simple algorithm on their own computer or smart phone) into a linear series of values describing the presence or absence of the gene variants under study, without revealing any other information about their genetic sequence. The encrypted information is uploaded into the cloud and the researchers then use a secure, multi-party computation (a cryptographic technique that ensures the input data remain private) to conduct the analysis and reveal only those gene variants likely to be pertinent to the investigation.

"In this way, no person or computer, other than the individuals themselves, has access to the complete set of genetic information," said Bejerano. In each case, the analysis was performed within seconds or minutes with moderate computing power. They hope to extend the technique to include diseases caused by combinations of multiple genetic variants or to handle tens of thousands of sequences such as those found in -wide association studies.

Ultimately, the goal is to find the best way to share genetic information with researchers while protecting each patient's privacy, in order to advance medical knowledge.

"Often people who have diseases, or those who know that a particular genetic disease runs in their family, are the most reluctant to share their genomic information because they know it could potentially be used against them in some way," said Bejerano. "They are missing out on helping themselves and others by allowing researchers and clinicians to learn from their DNA sequences."

More information: K.A. Jagadeesh et al., "Deriving genomic diagnoses without revealing patient genomes," Science (2017). science.sciencemag.org/cgi/doi … 1126/science.aam9710
