Forecasting the flu better by combining 'big' and traditional data
Three UC San Diego researchers say they can predict the spread of flu a week into the future with as much accuracy as Google Flu Trends can display levels of infection right now.
The study, published in Scientific Reports (an online journal from the publishers of Nature), uses social network analysis to combine the power of Google Flu Trends' "big data" with traditional flu monitoring data from the U.S. Centers for Disease Control and Prevention (CDC).
"Our innovation," said corresponding author Michael Davidson, a doctoral student in political science at UC San Diego, "is to construct a network of ties between different U.S. health regions based on information from the CDC. We asked: Which places in years past got the flu at about the same time? That told us which regions of the country have the strongest ties, or connections, and gave us the analytic power to improve Google's predictions."
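To make the idea of region-to-region ties concrete, here is a minimal sketch, not the study's actual procedure. It assumes each region has a series of past weekly flu levels and uses Pearson correlation between two regions' series as the strength of their tie. The region names, the numbers, and the choice of correlation as the similarity measure are all illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def build_ties(history):
    """Return {(region_a, region_b): tie_strength} for every pair of regions.

    Assumption: regions whose past flu levels rose and fell together get a
    strong tie; negative correlations are clipped to zero (no tie).
    """
    regions = sorted(history)
    ties = {}
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            ties[(a, b)] = max(pearson(history[a], history[b]), 0.0)
    return ties


# Made-up weekly flu levels for three hypothetical regions.
history = {
    "region_A": [1.0, 2.0, 4.0, 3.0, 1.5],
    "region_B": [1.1, 2.2, 3.9, 3.1, 1.4],
    "region_C": [3.0, 1.0, 0.5, 1.0, 3.5],
}
ties = build_ties(history)
```

In this toy data, regions A and B peak in the same weeks and end up strongly tied, while region C peaks at other times and gets no tie to A.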
Google Flu Trends (GFT) is very good, Davidson said, at showing where in the U.S. people are searching for information on flu and flu-like symptoms. These data are valuable, he said, because they arrive in real time, about two weeks before the CDC can issue its reports. But GFT has also made some infamous errors, which probably reflect widespread public concern about flu more than actual confirmed illness.
By weighting GFT predictions with a social network derived from CDC reports on laboratory-tested cases of flu, the researchers were able to refine and improve GFT's predictions.
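The article does not give the weighting formula, so the following is only a hedged sketch of what network weighting could look like. It assumes each region's GFT value is blended with the values of the regions it is tied to, in proportion to tie strength. The function name `network_weighted`, the `self_weight` parameter, and all numbers are hypothetical.

```python
def network_weighted(gft, ties, self_weight=1.0):
    """Blend each region's GFT value with those of its tied regions.

    gft:  {region: current GFT estimate}
    ties: {(region_a, region_b): tie_strength}, treated as undirected
    Assumption: a simple weighted average, not the published method.
    """
    adjusted = {}
    for region, value in gft.items():
        total = self_weight * value
        weight_sum = self_weight
        for (a, b), strength in ties.items():
            if region == a and b in gft:
                neighbor = b
            elif region == b and a in gft:
                neighbor = a
            else:
                continue
            total += strength * gft[neighbor]
            weight_sum += strength
        adjusted[region] = total / weight_sum
    return adjusted


# Made-up example: region_A shows a search spike that its closely tied
# neighbor region_B does not share.
gft = {"region_A": 5.0, "region_B": 2.0, "region_C": 2.0}
ties = {("region_A", "region_B"): 0.9, ("region_B", "region_C"): 0.1}
adjusted = network_weighted(gft, ties)
```

Here the spike in region A is pulled toward the level of its strongly tied neighbor, which is the kind of correction that could dampen a search surge driven by public concern rather than illness.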
The researchers are optimistic their work will soon be put to public use. "We hope our method will be implemented by epidemiologists and data scientists," Davidson said, "to better target prevention and treatment efforts, especially during epidemics."