Big data analysis shows health care professionals at risk treating Ebola
The Ebola crisis is disturbing and alarming in many ways. Among them: The fact that the U.S. response to date hasn't fully utilized the statistical and big data tools that could play a vital role in both protecting health workers from exposure and stemming broader spread of the virus in the United States and elsewhere.
Exhibit A: The Centers for Disease Control and Prevention initially telegraphed complete confidence that its protocols could prevent any domestic transmission, including to health workers. On Sept. 30, just after Thomas Eric Duncan—who had recently traveled to Texas from Liberia—was diagnosed with Ebola, CDC head Thomas Frieden said in a news conference that "we're stopping it in its tracks in this country."
Less than two weeks later, on Oct. 12, the CDC was backpedaling. A nurse who had provided care for Duncan, and who had reportedly followed "full CDC precautions," including "gown, glove, mask and shield," was diagnosed with Ebola. At first Frieden implicitly blamed the nurse, telling an interviewer that "clearly there was a breach in protocol." A day later, he apologized, acknowledging that "we have to rethink the way we address Ebola infection control." Then, on Oct. 15, the Texas Department of State Health Services announced that "a second health care worker at Texas Health Presbyterian Hospital who provided care for the first Ebola patient diagnosed in the United States has tested positive for the disease."
Despite the CDC's apparent belief to the contrary, basic statistics make clear that the very real possibility of transmission to health workers was entirely foreseeable. Why? Because the CDC's infection control protocol appears to have been designed without fully recognizing how the laws of probability operate in the combined presence of 1) an extremely contagious virus, and 2) large numbers of contacts between health workers and Ebola patients.
The gloves, mask, and other gear used for infection control are undoubtedly very protective. But when used in the real world, as opposed to in the laboratory, they cannot possibly be completely protective. That fact should have been suspected earlier, and it has now been demonstrated by transmissions to health care workers in both Texas and Madrid. Each time a health worker caring for an Ebola patient wears and then removes protective gear, there is some very small probability of an exposure. And over many repetitions, those small probabilities compound into a substantial chance of at least one exposure. As I explained in an Oct. 12 article in Forbes:
If you do something once that has a very low probability of a very negative consequence, your risks of harm are low. But if you repeat that activity many times, the laws of probability—or more specifically, a formula called the "binomial distribution"—will eventually catch up with you.
For example, consider an activity that, each time you do it, has a 1 percent chance of exposing you to a highly dangerous chemical. If you do it once, you have a 1 percent chance of exposure. If you do it twice, your chances of at least one exposure are slightly under 2 percent. After 20 times, you have an 18 percent chance of at least one exposure, and after 69 times the exposure probability crosses above 50 percent. After 250 times, the odds of exposure are about 92 percent. And the exposure odds top 99 percent after about 460 times.
In other words, something that has a very low probability after one repetition can become far more likely if it is done enough times by enough people (or by one person alone). These are the kinds of nonintuitive insights available from looking at data through the right lens.
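The figures in that example can be checked with a few lines of code. If each repetition is assumed to be independent and to carry the same per-use exposure probability p (an idealization, since real patient contacts vary), the chance of at least one exposure in n repetitions is 1 - (1 - p)^n, the complement of the binomial probability of zero exposures. The Python sketch below is only an illustration under those assumptions; the function names are made up for this example and do not come from any library.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance of at least one exposure in n independent repetitions,
    each with the same per-repetition exposure probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Probability of zero exposures is (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p) ** n


def repetitions_to_exceed(p: float, target: float) -> int:
    """Smallest number of repetitions whose cumulative exposure
    chance is strictly greater than target."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must be strictly between 0 and 1")
    n = 0
    while prob_at_least_one(p, n) <= target:
        n += 1
    return n


if __name__ == "__main__":
    p = 0.01  # the 1 percent per-repetition chance used in the example
    for n in (1, 2, 20, 69, 250, 460):
        print(f"{n:>3} repetitions: {prob_at_least_one(p, n):.1%}")
    print("first count above 50%:", repetitions_to_exceed(p, 0.50))
    print("first count above 99%:", repetitions_to_exceed(p, 0.99))
```

Under these assumptions the 50 percent threshold is first crossed at 69 repetitions and the 99 percent threshold at 459, consistent with the "about 460" cited in the example. The independence assumption is the weak point: if some procedures are far riskier than others, a single p understates the risk for the workers who perform them.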
In developing protocols to protect the health workers on the front lines in the fight against Ebola, statistical methods—and more broadly, the big data those methods can be so vital in analyzing—shouldn't be an afterthought. Instead, they should be a core component of the strategy to help us understand how health workers can be better protected and how the spread of Ebola can be slowed and hopefully stopped in the general population.
Concretely, what does this mean? Big data involves collecting and analyzing large volumes of diverse information to tease out significant patterns, correlations and trends that might otherwise escape notice. Statistical methods provide the tools for performing that analysis. In the context of the Ebola crisis, here are some suggestions to help bring the power of big data to protecting health workers:
We should collect and make use of the wealth of data that can be obtained about interactions between health workers and Ebola patients. In the United States, each contact should be documented in full detail: What did the health worker do? For how long was he or she in contact with the patient? What specific protective gear was he or she wearing? What type of ventilation system was used in the room? What was the condition of the patient? Was the health worker alone in the room with the patient or were there also other health workers in the room? A mechanism to gather, suitably anonymize (for the privacy of both patients and health workers), report, and exchange these data should be developed.
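To make the idea concrete, here is one way such a contact record might be structured. This is a minimal sketch, not a proposed standard: every field name, the `ContactRecord` class, and the helper functions are hypothetical, chosen only to mirror the questions listed above. The salted hash shown is pseudonymization rather than full anonymization; a real system would need a properly reviewed privacy design.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContactRecord:
    """One documented contact between a health worker and a patient.
    All field names are hypothetical and for illustration only."""
    worker_id: str                    # who made the contact
    patient_id: str                   # which patient was treated
    activity: str                     # what the health worker did
    duration_minutes: float           # how long the contact lasted
    protective_gear: Tuple[str, ...]  # specific gear worn
    ventilation_type: str             # ventilation system in the room
    patient_condition: str            # condition of the patient
    other_workers_present: int        # 0 if the worker was alone


def pseudonymize(identifier: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 digest (truncated)."""
    digest = hashlib.sha256((salt + identifier).encode("utf-8"))
    return digest.hexdigest()[:16]


def prepare_for_sharing(record: ContactRecord, salt: str) -> dict:
    """Return the record as a dict with worker and patient IDs replaced."""
    data = asdict(record)
    data["worker_id"] = pseudonymize(record.worker_id, salt)
    data["patient_id"] = pseudonymize(record.patient_id, salt)
    return data


if __name__ == "__main__":
    # Made-up example values, not real data.
    record = ContactRecord(
        worker_id="nurse-001",
        patient_id="patient-A",
        activity="drew blood",
        duration_minutes=12.5,
        protective_gear=("gown", "glove", "mask", "shield"),
        ventilation_type="negative pressure",
        patient_condition="symptomatic",
        other_workers_present=1,
    )
    print(prepare_for_sharing(record, salt="keep-this-secret"))
```

Using the same secret salt across records keeps a given worker's or patient's contacts linkable for analysis without exposing the original identifier in the shared data.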
In addition, we shouldn't be using health workers as guinea pigs in a potentially deadly trial-and-error process to learn about Ebola transmission. According to a survey of U.S. nurses recently conducted by National Nurses United, 85 percent of respondents reported that they have not received Ebola training that allows them to "interact and ask questions." On a conference call Oct. 14 organized by NNU, some of the nurses at the hospital that treated Duncan said they were "unsupported and unprepared" and that protocols changed repeatedly.
Health officials should develop a protocol for simulations that health workers can opt to conduct in their own workplaces, so they can practice the process of donning, using, and removing protective gear and interacting with model "patients" shedding harmless imitations of the virus. The components of the imitation substance should be chosen so that it can be detected in the tiniest quantities, making it easy to identify (simulated) exposure events. Simulations would give health workers vital and possibly lifesaving practice in a setting that provides them with actionable feedback, as well as generating reams of data about the weak links in the current infection control protocol. The lessons learned would apply not only to Ebola but also to other pathogens that are certain to surface in the future.
Big data and statistics alone aren't going to keep health workers safe from Ebola. But they can certainly help. If we are going to ask health workers to repeatedly step into rooms with patients who are contagious with a virus that now appears to have a fatality rate of about 70 percent, we have an obligation to do everything possible to minimize the chances that they might be exposed. And today, we're not doing nearly enough.