Technology researcher discusses tracking disease outbreaks via social media
Today, when there is an outbreak of disease, the first reports of it are likely to be online, through Facebook or Twitter. And as word in cyberspace goes viral, it can map closely to the spread of the actual virus in the physical world. That's the conclusion of NYU researcher Rumi Chunara (BS '04), whose paper analyzing Twitter and other online activity surrounding the 2010 outbreak of cholera in Haiti made waves in the public health world. So much so that in 2014 she was named to MIT Technology Review's "35 Innovators Under 35" list for her work in digital disease detection. Ben Tomlin from Caltech's Alumni Association spoke with Chunara about her research and the emerging area of crowdsourced health data.
What is the focus of your work?
The goal of my research is to try and understand how infectious disease spreads in populations. Traditional health systems are really the gold standard for collecting and analyzing information on viral outbreaks, but information can travel slowly. With the proliferation of mobile Internet-based systems, we can crowdsource real-time information to offer clinicians and the public an enhanced picture of the path and progress of an outbreak.
Your study of the cholera outbreak in Haiti was one of the first to compare online activity with the movement of disease. What did you find?
In January of 2010, Haiti suffered a catastrophic earthquake. Nine months later, the Haitian Ministry of Public Health and Population (MSPP) announced an outbreak of cholera, which eventually affected nearly half a million people.
While I was at my previous post with HealthMap (a real-time disease surveillance website) and the Children's Hospital Informatics Program at Harvard Medical School, my fellow researchers and I reviewed the first 100 days of the outbreak. We took data from HealthMap and Twitter and compared it to Haiti's official reports, which were published daily. We then developed epidemiological models to try and get a sense of the rate at which things were changing.
Overall, we found a strong correlation between the Twitter data and official reports. We also found points when the Twitter data could be used to estimate the trajectory of the outbreak. That's particularly useful because data from social media is publicly available in real time. Since we published that study, we have continued to improve on these efforts, as have a number of other groups.
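To make the idea of comparing online activity with official reports concrete, here is a minimal sketch in Python. It is not the method used in the study, which the interview does not describe in detail. It assumes two daily count series of equal length, uses made-up numbers rather than any real outbreak data, and the function names (`pearson`, `lagged_correlation`, `best_lag`) are hypothetical. It simply checks how strongly an informal signal correlates with official counts when the official series is shifted by a few days.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(informal, official, lag):
    """Correlate informal[t] with official[t + lag] (lag in days, lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(informal, official)
    return pearson(informal[:-lag], official[lag:])


def best_lag(informal, official, max_lag):
    """Lag (0..max_lag) at which the informal signal best matches official counts."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(informal, official, k),
    )


# Synthetic illustration only: these are NOT data from the Haiti study.
# The "official" series is built to trail the informal one by two days.
tweets = [0, 1, 3, 8, 15, 22, 18, 12, 7, 4, 2, 1]
official = [0, 0] + [10 * v for v in tweets[:-2]]

lead_days = best_lag(tweets, official, max_lag=4)
print("informal signal leads official reports by", lead_days, "day(s)")
```

A high correlation at a positive lag would suggest that the informal signal moves ahead of official counts, but as the interview stresses, such a signal is a supplement to traditional surveillance and not a substitute for it.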
What are the challenges to this type of research?
First, we have to keep in mind that this data is not a replacement for traditional methods of disease tracking but is meant to augment them. We also have to be careful to isolate the relevant data. In 2013, Google Flu Trends, a terrific service that monitors web searches to predict outbreaks of the flu, overestimated the number of cases in the U.S. by a good margin. It was a good cautionary example. Online searches are just a proxy for what might be happening.
Is a person searching for themselves or someone in their household? Are they really sick? If they are sick, do they really have the flu? Are they influenced by the media or other social networks? These questions led me to launch GoViral last year, a research program that combines online data from mobile apps with home diagnostic kits. This is the first time that we have actually crowdsourced to get diagnostic samples from people. We're able to get a clearer picture of a person's online activity correlated with their actual health.
How do you imagine this kind of research being used?
The main advantages are in speed of access to medical information. That's important, because in our era of increased mobility, diseases can spread globally more quickly. First responders could leverage information to target emerging situations or create public health infrastructure.
Wearable technology and mobile apps are starting to collect more data on health and fitness. Will this affect your research?
Absolutely. There has been a lot of discussion surrounding the development and implications of those products. This field of research will continue to grow as we acquire new capabilities to collect information. At the same time, there are significant issues to be thought out regarding privacy and security, which will take some time. We are seeing a faster pace of information sharing between the tech industry and academic research, overall. Hopefully it will offer more opportunities for scientists to examine data and then see results from their research deployed.
You are advocating for more crowdsourcing of public health data. Why?
I think that we can do more than just monitor data from social media traffic. We can also ask the public directly for their help in collecting health information. Our early work with GoViral shows that they can be willing and effective partners. By actively crowdsourcing, we can collect information at the point of care. We can also learn about other things, like contact patterns and social interactions that affect disease dynamics.
And there is another advantage to this approach: while conducting research, we can engage and educate individuals to become more proactive in their own health—which is ultimately the best way to curb the spread of disease.