Team streamlines biomedical research by making genetic data easier to search

May 10, 2016, The Scripps Research Institute
Members of the Scripps Research Institute team include (left to right) Lelong, Andrew Su, Chunlei Wu, Jiwen Xin and Ginger Tsueng. Credit: Photo courtesy of the Scripps Research Institute.

Call them professional "data wranglers." A team of scientists at The Scripps Research Institute (TSRI) is expanding web services to make biomedical research more efficient. With their free, public projects, MyGene.info and MyVariant.info, researchers around the world have a faster way to spot new connections between genes and disease.

"This is about how to deliver information quickly to biologists," said Chunlei Wu, associate professor of molecular medicine at TSRI.

Wu and TSRI Associate Professor Andrew Su co-led a new study published in the journal Genome Biology reporting on progress in setting up these services and the positive response from users so far.

Good News, Bad News

Here's the good news: Genetic sequencing is faster and more affordable these days, giving scientists a better understanding of the mechanisms behind many diseases. The bad news? This flood of data means scientists have to wade through multiple databases and PDF files to gather useful information.

Wu said he has spent hours downloading and parsing data, often running into problems when he discovers that the original data creators didn't annotate information in a standard way.

With support from the National Institutes of Health's (NIH) "Big Data to Knowledge" (BD2K) initiative, Wu, Su and their colleagues have begun to tame this problem by creating a data-harvesting platform that automatically imports and updates data from a variety of public databases. The aggregated data are then structured and delivered via two high-performance web search services, MyGene.info and MyVariant.info, powered by the latest cloud-computing technology.

"Now researchers can focus on their own work instead of going through the data-wrangling effort," said Wu. MyGene.info and MyVariant.info are also powerful because of their ability to scale up as the user base and datasets grow. MyGene.info holds information on more than 13 million genes from about 15,000 species. The service receives four to five million user "queries" each month, and the researchers are prepared to accommodate even more by expanding their use of Amazon cloud servers. MyVariant.info currently covers more than 316 million unique variants gathered from 14 community data sources.

The services have received positive feedback from the research community so far, said Ginger Tsueng, scientific outreach project manager in the Su lab and co-author of the new study. In just this year, MyVariant.info has received more than four million hits, while MyGene.info has handled more than 17 million.
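
To give a sense of what one of these "queries" looks like, here is a minimal Python sketch of a gene search request. It is an illustration rather than the team's own code: the base URL, the `v3` API version, and the parameter names (`q`, `species`, `fields`, `size`) are assumptions drawn from the service's public documentation and may change, and the helper names `build_gene_query_url` and `search_genes` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and API version; confirm against the service's documentation.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(term, species="human",
                         fields=("symbol", "name", "entrezgene"), size=5):
    """Build a search URL for a gene keyword (hypothetical helper)."""
    params = {
        "q": term,                   # search term, e.g. a gene symbol
        "species": species,          # restrict results to one species
        "fields": ",".join(fields),  # return only the listed fields
        "size": size,                # maximum number of hits to return
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def search_genes(term, **kwargs):
    """Send the query and return the list of hits (needs network access)."""
    import json
    from urllib.request import urlopen

    with urlopen(build_gene_query_url(term, **kwargs), timeout=10) as response:
        # Assumes the response is a JSON object with a "hits" list.
        return json.load(response).get("hits", [])


if __name__ == "__main__":
    # Print the request URL only; call search_genes("cdk2") to fetch results.
    print(build_gene_query_url("cdk2"))
```

Keeping the URL construction separate from the network call makes it easy to inspect exactly what would be sent before any request is made.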

A Foundation for Future Applications

The researchers have made these services open source to encourage others to use the data and develop their own applications.

For example, researchers at the University of Washington have built an interface that retrieves data from MyGene.info and contributes additional information to run MyGene2, a site that aims to connect patients who share rare genetic diseases. MyGene.info also provides the backbone for BioGPS, a resource for learning about gene and protein function, run by Su, Wu and TSRI programmer Max Nanis.

Another project in the pipeline is an app built on the MyVariant.info platform that displays variants when a user scans a gene name, for example from a poster at a scientific conference.

"Bioinformatics tools and analyses are highly dependent on having solid foundations of other tools on which to build," said Su. "MyGene.info and MyVariant.info are key pieces of infrastructure that many bioinformaticians are using every day."


More information: Jiwen Xin et al, High-performance web services for querying gene and variant annotation, Genome Biology (2016). DOI: 10.1186/s13059-016-0953-9
