Team streamlines biomedical research by making genetic data easier to search

May 10, 2016
Members of the Scripps Research Institute team include (left to right) Lelong, Andrew Su, Chunlei Wu, Jiwen Xin and Ginger Tsueng. Credit: Photo courtesy of the Scripps Research Institute.

Call them professional "data wranglers." A team of scientists at The Scripps Research Institute (TSRI) is expanding web services to make biomedical research more efficient. With their free, public projects, MyGene.info and MyVariant.info, researchers around the world have a faster way to spot new connections between genes and disease.

"This is about how to deliver information quickly to biologists," said Chunlei Wu, associate professor of molecular medicine at TSRI.

Wu and TSRI Associate Professor Andrew Su co-led a new study published in the journal Genome Biology reporting on progress in setting up these services and the positive response from users so far.

Good News, Bad News

Here's the good news: Genetic sequencing is faster and more affordable these days, giving scientists a better understanding of mechanisms behind many diseases. The bad news? This flood of data means scientists have to wade through multiple databases and PDF files to gather useful information.

Wu said he has spent hours downloading and parsing data, often running into problems when he discovers that the original data creators didn't annotate information in a standard way.

With support from the National Institutes of Health's (NIH) "Big Data to Knowledge" (BD2K) initiative, Wu, Su and their colleagues have begun to tame this problem by creating a data-harvesting platform to automatically import and update data from a variety of public databases. The data they aggregate are then structured and delivered via two high-performance web search services, MyGene.info and MyVariant.info, powered by the latest cloud-computation technology.

"Now researchers can focus on their own work instead of going through the data-wrangling effort," said Wu.

MyGene.info and MyVariant.info are also powerful because of their ability to scale up as the user base and datasets grow.

MyGene.info holds information on more than 13 million genes from about 15,000 species. The service receives four to five million user "queries" each month, and the researchers are prepared to accommodate even more by expanding their use of Amazon cloud servers. MyVariant.info currently covers more than 316 million unique variants gathered from 14 community data sources.
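For readers curious what one of those queries might look like, here is a minimal sketch in Python. It only builds the web address for a gene search and does not contact the service. The endpoint path, the parameter names (`q`, `species`, `fields`) and the helper function name are assumptions based on the service's public v3 interface, not details from the study, so check them against the current MyGene.info documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL of the MyGene.info v3 query endpoint (not stated in the article).
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(term, species="human", fields=("symbol", "name", "entrezgene")):
    """Return a URL that asks the service for genes matching `term`.

    `build_gene_query_url` is a hypothetical helper written for illustration;
    the parameter names below are assumptions about the public API.
    """
    params = {
        "q": term,                   # search term, e.g. a gene symbol
        "species": species,          # restrict results to one species
        "fields": ",".join(fields),  # limit the annotation fields returned
    }
    # urlencode percent-escapes reserved characters such as ":" and ",".
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gene_query_url("symbol:CDK2"))
```

Opening the resulting address in a browser, or passing it to any HTTP client, should return the matching gene records as structured text, which is what lets other tools build on the service without downloading and parsing the source databases themselves.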

The services have received positive feedback from the research community so far, said Ginger Tsueng, scientific outreach project manager in the Su lab and co-author of the new study. This year alone, MyVariant.info has received more than four million hits, while MyGene.info has handled more than 17 million.

A Foundation for Future Applications

The researchers have made these services open source to encourage others to use the data and develop their own applications.

For example, researchers at the University of Washington have built an interface that retrieves data from MyGene.info and contributes additional information to run MyGene2.org, a site that aims to connect patients who share rare genetic diseases. MyGene.info also provides the backbone for BioGPS, a resource for learning about gene and protein function, run by Su, Wu and TSRI programmer Max Nanis.

Another project in the pipeline is an app built on the MyVariant.info platform that displays variants when a user scans a gene name—from a poster at a scientific conference, for example.

"Bioinformatics tools and analyses are highly dependent on having solid foundations of other tools on which to build," said Su. "MyGene.info and MyVariant.info are key pieces of infrastructure that many bioinformaticians are using every day."

More information: Jiwen Xin et al, High-performance web services for querying gene and variant annotation, Genome Biology (2016). DOI: 10.1186/s13059-016-0953-9
