Hashtag health: Using Twitter to track the spread of influenza
A social media-monitoring program led by San Diego State University geography professor Ming-Hsiang Tsou could help physicians and health officials learn when and where severe flu outbreaks are occurring in real time. In results published last month in the Journal of Medical Internet Research, Tsou showed that his technique could allow officials to direct resources to outbreak zones more quickly and efficiently and to better contain the spread of the disease.
"There is the potential to use social media to really improve the way we monitor the flu and other public health concerns," Tsou said.
The Centers for Disease Control and Prevention (CDC) defines flu season as the period from October through May, usually peaking around February. But because no one can predict exactly when and where outbreaks will occur, hospitals and regional health agencies struggle to plan where and when to deploy physicians and nurses armed with vaccines and medicines.
There is about a two-week lag between hospitals first noticing an uptick in flu patients and the CDC issuing a regional warning. Tsou and his colleagues, funded by a $1.3 million grant from the National Science Foundation, wanted to find a quicker, more efficient way to identify these patterns.
They selected 11 U.S. cities and monitored tweets originating within a 17-mile radius of each one. Whenever someone tweeted the keyword "flu" or "influenza," the program recorded characteristics of that tweet, including the username, location, whether it was an original tweet or a retweet, and whether it linked to a Web site.
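The keyword-and-radius filtering described above can be sketched in Python. This is a minimal illustration, not the SDSU team's code: the `Tweet` and `FluRecord` types, the `record_flu_tweets` function, the whole-word keyword regex, and the "RT @" prefix used to spot retweets are all hypothetical choices. Distance from the city center is computed with the haversine great-circle formula.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical choices: whole-word, case-insensitive match on "flu" or "influenza",
# and any http(s) URL counts as a link to a Web site.
KEYWORDS = re.compile(r"\b(flu|influenza)\b", re.IGNORECASE)
LINK = re.compile(r"https?://\S+")
EARTH_RADIUS_MILES = 3958.8


@dataclass
class Tweet:
    username: str
    text: str
    lat: float
    lon: float


@dataclass
class FluRecord:
    username: str
    lat: float
    lon: float
    is_retweet: bool
    has_link: bool


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def record_flu_tweets(tweets, center, radius_miles=17.0):
    """Keep keyword-matching tweets posted within radius_miles of center = (lat, lon)."""
    records = []
    for t in tweets:
        # Skip tweets posted outside the monitoring radius.
        if haversine_miles(center[0], center[1], t.lat, t.lon) > radius_miles:
            continue
        # Skip tweets that do not mention either keyword.
        if not KEYWORDS.search(t.text):
            continue
        records.append(FluRecord(
            username=t.username,
            lat=t.lat,
            lon=t.lon,
            # Assumption: the classic "RT @user" prefix marks a retweet.
            is_retweet=t.text.startswith("RT @"),
            has_link=bool(LINK.search(t.text)),
        ))
    return records
```

For example, calling `record_flu_tweets(tweets, (32.7157, -117.1611))` keeps only flu-related tweets posted within 17 miles of central San Diego. The coordinates are approximate and chosen only for illustration.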
From June 2012 to the beginning of December, the algorithm recorded 161,821 tweets containing the word "flu" and 6,174 containing "influenza."
Tsou compared his team's findings to regional data based on the CDC's definition of influenza-like illnesses (ILI). Nine of the 11 cities showed a statistically significant correlation between an increase in the number of tweets mentioning those keywords and regionally reported outbreaks. In five of those cities, Tsou's algorithm picked up on the outbreaks earlier than the regional reports. The cities with the strongest correlations were San Diego, Denver, Jacksonville, Seattle and Fort Worth.
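The article does not spell out the statistical procedure the team used. One plausible sketch assumes that a city's tweet counts and regional ILI rates have been aggregated into equal-length weekly series. A lagged Pearson correlation can then show both whether the two series move together and whether tweets run ahead of the official data. The `pearson` and `best_lead` functions and the `max_lag` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def best_lead(tweet_counts, ili_rates, max_lag=3):
    """Return (lag, r) for the lag (in weeks) at which tweets best predict ILI.

    A lag of k pairs tweet_counts[t] with ili_rates[t + k], so a positive lag
    means the tweet series runs k weeks ahead of the reported ILI series.
    """
    n = len(tweet_counts)
    best = None
    for lag in range(max_lag + 1):
        r = pearson(tweet_counts[: n - lag], ili_rates[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

A best lag greater than zero means the tweet series tracked the ILI series most closely when shifted that many weeks earlier. Judging statistical significance would need a further step, such as the p-value returned by `scipy.stats.pearsonr`.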
"Traditional procedures take at least two weeks to detect an outbreak," Tsou said. "With our method, we're detecting daily."
Original tweets and tweets without Web site links also proved more predictive than retweets or those that did include links, possibly because original and non-linking tweets are more likely to reflect individuals posting about their own symptoms, Tsou said.
The next step in Tsou's ongoing research will be hunting for even finer-grained correlations between ILI data and specific symptomatic keywords like "cough," "sneeze," "congestion," and "sore throat."
Tsou envisions this kind of "infoveillance" applying to a range of public health concerns, such as monitoring regional incidences of heart attack or diabetes. The project is connected to a larger SDSU initiative, Human Dynamics in the Mobile Age, one of the university's four recently selected Areas of Excellence. Tsou is a core faculty member for the initiative.
"In social media, there's a lot of noise in the data," Tsou said. "But if we can filter that noise out and focus on what's relevant, we can find all kinds of useful connections between real life and cyberspace."