Finding real value in big data for public health

July 2, 2014, San Diego State University

Media reports of public health breakthroughs made possible by big data have been largely oversold, according to a new study, published today in the American Journal of Preventive Medicine.

"Many studies deserve praise for being the first of their kind, but if we actually began relying on the claims made by big data in public health, we would come to some peculiar conclusions," said John W. Ayers, San Diego State University Graduate School of Public Health research professor and senior author of the study. "Some of these conclusions may even pose serious public health harm."

But don't throw away that data just yet.

The authors maintain that the promise of big data can be fulfilled by tweaking existing methodological and reporting standards. In the study, Ayers and his colleagues demonstrate this by revising the inner plumbing of the Google Flu Trends (GFT) digital disease surveillance system, which was heavily criticized last year after producing erroneous forecasts.

"Assuming you can't use big data to improve public health is simply wrong," added Ayers. "Existing shortcomings are a result of methodologies, not the data themselves."

A solution for Google Flu Trends

In the first external revision proposed to GFT, Ayers and co-researchers David Zhang, Mauricio Santillana (both with Harvard University), and Benjamin Althouse (with the Santa Fe Institute) explored new methods for using open-sourced, publicly available Google search archives to forecast influenza, an approach that can serve as a blueprint for fixing broader shortcomings.

To address GFT's problems, the team significantly beefed up the existing GFT model. First, rather than relying on a single trend that represents a group of influenza search queries, they monitored changes in individual search queries, giving greater algorithmic weight to some queries than to others based on how much each could improve predictions when checked against patient data collected by health agencies.
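The idea of weighting individual queries against patient data can be illustrated with a small sketch. It is not the team's actual model: it assumes a plain least-squares fit with no intercept, and the function names, query volumes, and illness rates below are invented for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_query_weights(weekly_volumes, ili_rates):
    """Return one least-squares weight per search query (normal equations)."""
    n_queries = len(weekly_volumes[0])
    xtx = [[sum(week[i] * week[j] for week in weekly_volumes)
            for j in range(n_queries)] for i in range(n_queries)]
    xty = [sum(week[i] * rate for week, rate in zip(weekly_volumes, ili_rates))
           for i in range(n_queries)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: each row is one week, each column one search query.
weekly_volumes = [[1, 5, 2], [2, 3, 1], [3, 6, 4], [4, 2, 3], [5, 4, 6]]
# Synthetic "patient record" rates in which the second query carries no signal.
ili_rates = [0.5 * week[0] + 0.2 * week[2] for week in weekly_volumes]

weights = fit_query_weights(weekly_volumes, ili_rates)
print([round(w, 3) for w in weights])
```

In this toy data the uninformative second query ends up with a weight near zero, which is the behavior a single aggregated trend cannot express.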

Second, instead of relying on investigator opinion for periodic updates to the model, the team built in automatic updating that adjusts the weight given to any single query in the model each and every week based on artificial intelligence techniques to maximize predictive accuracy.
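A rough sketch of what a weekly self-update could look like follows. The article does not say which technique the team used, so this swaps in a simple normalized gradient step as a stand-in; the function name, learning rate, and numbers are all assumptions.

```python
def weekly_update(weights, query_volumes, observed_ili, learning_rate=0.5):
    """Nudge each query weight toward this week's observed illness rate.

    Stand-in technique (normalized least-mean-squares step), not the
    method described in the study.
    """
    predicted = sum(w * x for w, x in zip(weights, query_volumes))
    error = observed_ili - predicted
    norm = sum(x * x for x in query_volumes)
    if norm == 0:
        # No search activity this week, so there is nothing to learn from.
        return list(weights)
    step = learning_rate * error / norm
    # Queries with larger volume this week absorb more of the correction.
    return [w + step * x for w, x in zip(weights, query_volumes)]


# Hypothetical week: the model over-predicts because the first query spiked.
weights = [1.0, 1.0]
query_volumes = [2.0, 1.0]
observed_ili = 2.0

updated = weekly_update(weights, query_volumes, observed_ili)
before = abs(observed_ili - sum(w * x for w, x in zip(weights, query_volumes)))
after = abs(observed_ili - sum(w * x for w, x in zip(updated, query_volumes)))
print(updated, before, after)
```

With a learning rate of 0.5 the step halves that week's error, and the query that spiked loses the most weight. No investigator has to decide when to intervene.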

During the 2009 H1N1 pandemic and the 2012/2013 season, two critically important periods of influenza surveillance in the United States, the alternative method yielded more accurate influenza predictions than GFT every week, and it was typically more accurate than GFT during other influenza seasons.

"With these tweaks, GFT could live up to the high expectations it originally aspired to," Ayers said. "Still, the greatest strength of our model is how the queries being used to describe influenza trends are changing over time as search patterns change in the population or the model occasionally underperforms due to false-positive queries."

For example, during the 2012/2013 season, GFT predicted that 10.6% of the population had influenza-like illness when only 6.1% did according to patient records. The team's alternative significantly reduced the error in that prediction, estimating that 7.7% of people would have the flu. And within two weeks the model self-updated, considerably changing the weight given to certain queries that spiked during that time, improving the model for future performance.
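The size of that improvement can be worked out from the three figures quoted above. The percentage reduction below is derived from those figures and is not a number reported in the study; the variable names are illustrative.

```python
# Figures quoted in the article for the 2012/2013 season (percent of population).
observed = 6.1      # influenza-like illness according to patient records
gft_estimate = 10.6
alt_estimate = 7.7  # the team's alternative model

# Absolute error of each estimate, in percentage points.
gft_error = round(abs(gft_estimate - observed), 1)
alt_error = round(abs(alt_estimate - observed), 1)

# Share of GFT's error that the alternative model removed.
error_reduction = 1 - alt_error / gft_error

print(f"GFT error: {gft_error} points")
print(f"Alternative error: {alt_error} points")
print(f"Error reduction: {error_reduction:.0%}")
```

On these figures GFT overshot by 4.5 percentage points and the alternative by 1.6, a reduction of roughly 64 percent.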

What's next for big data

"Big data is no substitute for good methods, and consumers need to better discern good from bad methods," Ayers said. To achieve these ends, he and his colleagues added that digital disease surveillance researchers need greater transparency in the reporting of studies and better methods when using big data.

"When dealing with big data methods, it is extremely important to make sure they are transparent and free," co-author Althouse added. "Reproducibility and validation are keystones of the scientific method, and they should be at the center of the big data revolution."

Importantly, these criticisms shouldn't be taken as an indictment of the promise of big data, or of the early attempts to wrangle it into something beneficial for the public, Ayers said. Now that the initial hype is wearing off, researchers can begin seriously exploring and testing the strengths and limitations of existing models and sharpening their methodologies.

"We certainly don't want any single entity or investigator, let alone Google, who has been at the forefront of developing and maintaining these systems, to feel like they are unfairly the targets of our criticism," Ayers said. "It's going to take the entire community recognizing and rectifying existing shortcomings. When we do, big data will certainly yield big impacts."


