Estimating county health statistics by looking at tweets

March 27, 2014

A researcher at Illinois Institute of Technology (IIT) has found that Twitter knows if you're obese, or at least if your county is. Tweets can accurately predict a county's rates of obesity, diabetes, teen births, health insurance coverage, and access to healthy foods, according to Aron Culotta, assistant professor of computer science and director of the Text Analysis in the Public Interest Lab. As a result, Twitter and other social media may complement existing data sources that public health officials use to identify at-risk communities and offer support. Culotta will report his findings in a paper, "Estimating County Health Statistics with Twitter," to be given at CHI 2014, the ACM (Association for Computing Machinery) CHI Conference on Human Factors in Computing Systems, April 26-May 1 in Toronto.

For each of the 100 most populous counties in the U.S., Culotta collected 27 health-related statistics. He also collected more than 1.4 million Twitter user profiles and 4.3 million tweets over a nine-month span from the same 100 counties. He then performed a statistical analysis to identify how accurately the statistics can be predicted from the Twitter data and which linguistic markers are most predictive of each statistic.

Among other things, Culotta found that the tweets predicted 6 of the 27 county-level health statistics, including obesity, diabetes, teen births, health insurance coverage, and access to healthy foods. Models that augmented demographic variables (race, age, gender, income) with linguistic variables (from Twitter) were more accurate than models using demographic variables alone for 20 of the 27 statistics considered. That is, the Twitter data helped to make the traditional models more accurate, suggesting that this new methodology can complement existing approaches. For two statistics (limited access to healthy foods and prevalence of fast foods), the Twitter model alone was actually more accurate than the demographic variable model.
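The comparison between a demographics-only model and one augmented with linguistic variables can be sketched as follows. This is an illustration only, not the method from the paper: it assumes ordinary least squares and leave-one-out evaluation, and the counties, the `income` and `word_rate` variables, and the outcome values are all synthetic and hypothetical. The synthetic outcome is built to depend on the word rate, so the lower error of the augmented model here shows the mechanics of the comparison and is not evidence about real counties.

```python
import random

def fit_ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def loo_mse(rows, targets):
    """Leave-one-out error: each county is predicted by a model fit on the others."""
    errors = []
    for i in range(len(rows)):
        coefs = fit_ols(rows[:i] + rows[i + 1:], targets[:i] + targets[i + 1:])
        errors.append((predict(coefs, rows[i]) - targets[i]) ** 2)
    return sum(errors) / len(errors)

# Synthetic, hypothetical counties: one demographic variable, one linguistic variable.
rng = random.Random(0)
counties = []
for _ in range(40):
    income = rng.random()      # stand-in demographic variable
    word_rate = rng.random()   # stand-in linguistic variable from tweets
    outcome = 2.0 * income + 3.0 * word_rate + rng.gauss(0, 0.1)
    counties.append((income, word_rate, outcome))

demo_only = [[c[0]] for c in counties]
augmented = [[c[0], c[1]] for c in counties]
outcomes = [c[2] for c in counties]

mse_demo = loo_mse(demo_only, outcomes)
mse_aug = loo_mse(augmented, outcomes)
print(f"demographic only: {mse_demo:.3f}  augmented: {mse_aug:.3f}")
```

Evaluating on held-out counties matters in a setup like this, because adding variables always improves the fit on the data a model was trained on, whether or not they carry real signal.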

Analysis of Twitter data for most health concerns, such as influenza, focuses on detecting specific mentions of a symptom of interest, e.g., "Staying home from work today with a sore throat." But Culotta investigated more nuanced linguistic cues that correlate with the overall health of a population. He identified the linguistic indicators that are most predictive of each outcome. For example, references to religion and certain pronouns ("we", "her") correlate with better socio-emotional support. References to money and inhibition correlate with lower unemployment. References to family and love correlate with higher rates of teen births. For obesity, indicators include what are known as Negative Engagement words (e.g., "tired", "bored", "sleepy"), as well as profanity.
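One way to turn tweets into linguistic indicators like these is to count how often words from each category appear. The sketch below is a minimal illustration under stated assumptions: the `LEXICON` word lists are hypothetical (only "tired", "bored", "sleepy", "we", and "her" come from the examples above), and the tokenization is deliberately simple. It does not reproduce the lexicons or preprocessing used in the study.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon. Category names echo the article; the word lists
# are illustrative and far smaller than any real lexicon.
LEXICON = {
    "negative_engagement": {"tired", "bored", "sleepy"},
    "pronoun": {"we", "her"},
    "family": {"family", "mom", "dad"},
}

def category_rates(tweets):
    """Return, for each lexicon category, the fraction of tokens that fall in it."""
    counts = Counter()
    total = 0
    for text in tweets:
        for token in re.findall(r"[a-z']+", text.lower()):
            total += 1
            for category, words in LEXICON.items():
                if token in words:
                    counts[category] += 1
    if total == 0:
        return {category: 0.0 for category in LEXICON}
    return {category: counts[category] / total for category in LEXICON}

# Two made-up tweets standing in for one county's tweets.
sample = ["So tired and bored today", "We love her family"]
rates = category_rates(sample)
print(rates)
```

Rates computed this way for each county could then serve as the linguistic variables in a regression against a health statistic.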

"Twitter activity provides a more fine-grained representation of a community's health than demographics alone," Culotta said. "The reason for this appears to come from the insights Twitter provides into personality, attitudes, and behavior, which in turn correlate with health outcomes."

The U.S. Centers for Disease Control and Prevention leads community health data collection and intervention efforts, such as the Behavioral Risk Factor Surveillance System, to identify vulnerable populations and better target intervention strategies. But such programs take considerable time and often are limited in sample size or geographic specificity. Culotta's research suggests that social media could be a complementary data source to identify at-risk communities.

Culotta said, "While this new methodology requires further experimentation, we believe it can aid public health researchers by providing (1) a more nuanced alternative to demographic profiles for identifying at-risk populations; (2) a low-cost method to measure risk across different subpopulations; and (3) a process to help formulate new hypotheses about the relationship between environment, behaviors, and health outcomes, which can then be tested in a more controlled setting."



