In an unprecedented windfall for public access to health data, University of Pittsburgh Graduate School of Public Health researchers have collected and digitized all weekly surveillance reports for reportable diseases in the United States going back more than 125 years.
The easily searchable database, described in the Nov. 28 issue of the New England Journal of Medicine, is free and publicly available. Supported by the Bill & Melinda Gates Foundation and the National Institutes of Health (NIH), the project's goal is to aid scientists and public health officials in the eradication of deadly and devastating diseases.
"Using this database, we estimate that more than 100 million cases of serious childhood contagious diseases have been prevented, thanks to the introduction of vaccines," said lead author Willem G. van Panhuis, M.D., Ph.D., assistant professor of epidemiology at Pitt Public Health. "But we also are able to see a resurgence of some of these diseases in the past several decades as people forget how devastating they can be and start refusing vaccines."
Despite the availability of a pertussis vaccine since the 1920s, the largest pertussis epidemic in the U.S. since 1959 occurred last year. Measles, mumps and rubella outbreaks also have reoccurred since the early 1980s.
"Analyzing historical epidemiological data can reveal patterns that help us understand how infectious diseases spread and what interventions have been most effective," said Irene Eckstrand, Ph.D., of NIH, which partially funded the research through its Models of Infectious Disease Agent Study. "This new work shows the value of using computational methods to study historical data—in this case, to show the impact of vaccination in reducing the burden of infectious diseases over the past century."
"We are very excited about the release of the database," said Steven Buchsbaum, deputy director, Discovery and Translational Sciences, for the Bill & Melinda Gates Foundation. "We anticipate this will not only prove to be an invaluable tool permitting researchers around the globe to develop, test and validate epidemiological models, but also has the potential to serve as a model for how other organizations could make similar sets of critical public health data more broadly, publicly available."
The digitized dataset is dubbed Project Tycho™, for 16th century Danish nobleman Tycho Brahe, whose meticulous astronomical observations enabled Johannes Kepler to derive the laws of planetary motion.
"Tycho Brahe's data were essential to Kepler's discovery of the laws of planetary motion," said senior author Donald S. Burke, M.D., Pitt Public Health dean and UPMC-Jonas Salk Chair of Global Health. "Similarly, we hope that our Project Tycho disease database will help spur new, life-saving research on patterns of epidemic infectious disease and the effects of vaccines. Open access to disease surveillance records should be standard practice, and we are working to establish this as the norm worldwide."
The researchers selected eight vaccine-preventable contagious diseases for a more detailed analysis: smallpox, polio, measles, rubella, mumps, hepatitis A, diphtheria and pertussis. By overlaying the reported outbreaks with the year of vaccine licensure, the researchers are able to give a clear, visual representation of the effect that vaccines have in controlling communicable diseases.
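For readers who want to try a comparison of this kind, the short Python sketch below splits yearly case counts at a vaccine licensure year and compares the average annual counts on each side. It is only an illustration and not the researchers' method: the function name `split_by_licensure`, the input layout (a mapping of year to reported cases) and the numbers in `toy_cases` are all assumptions invented for this example, and none of the figures come from Project Tycho.

```python
from statistics import mean


def split_by_licensure(cases_by_year, licensure_year):
    """Return (mean annual cases before licensure, mean annual cases from licensure onward).

    cases_by_year: dict mapping a year (int) to reported cases (hypothetical layout).
    licensure_year: first year counted as the vaccine era.
    """
    before = [n for year, n in cases_by_year.items() if year < licensure_year]
    after = [n for year, n in cases_by_year.items() if year >= licensure_year]
    if not before or not after:
        raise ValueError("need at least one year on each side of licensure")
    return mean(before), mean(after)


# Made-up counts for a hypothetical disease; NOT Project Tycho data.
toy_cases = {1960: 400, 1961: 500, 1962: 450, 1963: 100, 1964: 50, 1965: 30}

pre, post = split_by_licensure(toy_cases, licensure_year=1963)
print(f"mean annual cases before licensure: {pre}")
print(f"mean annual cases from licensure on: {post}")
```

A simple before-and-after average like this ignores population growth, changes in reporting and long-term trends, so it should be read as a starting point for exploration rather than an estimate of cases prevented.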
"Infectious disease research is critically dependent on reliable historical data to understand underlying epidemic dynamics. However, my colleagues and I repeatedly find ourselves digging out historical datasets from various sources in different states of preservation," said Dr. van Panhuis. "By digitizing and giving open access to the entire collection of U.S. notifiable disease data, we've made a bold move toward solving this problem."
The researchers obtained all weekly notifiable disease surveillance tables published between 1888 and 2013 (approximately 6,500 tables) in various historical reports, including the U.S. Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report. These tables were available only on paper or as PDF scans in online repositories. Because computers could not read them, the data had to be entered by hand. With an estimated 200 million keystrokes, the data, including death counts, reporting locations, time periods and diseases, were digitized. A total of 56 diseases were reported for at least some period during the 125-year span, with no single disease reported continuously.
"This work by the Tycho Team is remarkable and represents the next step in making government data accessible and useful," said Bryan Sivak, U.S. Department of Health and Human Services chief technology officer and entrepreneur in residence. "This is a great example of how our policies on open data and public access accelerate the use of computer-readable data by researchers and application developers to create new tools and provide valuable insights into the nation's health."
All these data can now be explored and retrieved by anyone on the Project Tycho Web site at http://www.tycho.pitt.edu. The open access release of these data has ignited a collaboration with the United States Open Government Initiative and, in the near future, the Project Tycho database will be available on the HealthData.gov Web pages.
"Historical records are a precious yet undervalued resource. As Danish philosopher Soren Kierkegaard said, we live forward but understand backward," explained Dr. Burke. "By 'rescuing' these historical disease data and combining them into a single, open-access, computable system, we now can better understand the devastating impact of epidemic diseases, and the remarkable value of vaccines in preventing illness and death."