The COVID-19 pandemic has highlighted both the necessity and the difficulty of using clinical data to inform state and national public health policymaking. In a new study, Regenstrief Institute and Indiana University researchers demonstrate that machine learning models trained on clinical data from a statewide health information exchange can predict, at the patient level, the likelihood of hospitalization for individuals with the virus.

"It has been quite challenging to bring the bread-and-butter data generated by healthcare systems together with decision-making, entities which have long been separate and distinct," said study senior author Shaun Grannis, M.D., M.S., Regenstrief Institute vice president for data and analytics and professor of family medicine at Indiana University School of Medicine. "Our work shows how you can build and employ AI (artificial intelligence) models to securely utilize the clinical information in a health information exchange to support public health needs such as predicting hospital utilization within one week and within six weeks of onset of COVID infection.

"When new circumstances requiring rapid response arise, such as emergence of omicron or other new variants, once there are sufficient cases to train models, one can confidently access and plug into these readily available models to make accurate public health predictions and provide valuable insights into patient-level need for healthcare resource utilization," said Dr. Grannis.

The researchers used clinical data from 96,026 individuals from all 957 zip codes in Indiana to train decision models that predicted healthcare resource utilization.

"Since the onset of COVID-19, researchers, healthcare systems, public health departments and others have leveraged existing data repositories and health information infrastructure for rapid analytics," said study first author Suranga Kasturi, Ph.D., a Regenstrief Institute research scientist and an assistant professor of pediatrics at IU School of Medicine. "Machine learning has been invaluable in these efforts."

"But any model is only as good as the data that goes into it," he added. "The broad, robust data from the Indiana Network for Patient Care is representative of the U.S. population. What we have done could be characterized as a precursor of how AI tools can be deployed across the entire country with the important caveat that whatever models are used should be evaluated for fairness across all subpopulations."

The Indiana Network for Patient Care (INPC), a regional health information exchange developed by Regenstrief Institute and managed by the Indiana Health Information Exchange (IHIE), is the nation's largest inter-organizational clinical data repository and houses more than 14 billion pieces of patient data.

The research is published in the Journal of Medical Internet Research.

More information: Suranga N Kasturi et al, Predicting COVID-19–Related Health Care Resource Utilization Across a Statewide Patient Population: Model Development Study, Journal of Medical Internet Research (2021). DOI: 10.2196/31337
