October 7, 2022

Study finds the risks of sharing health care data are low

by Anne Trafton, Massachusetts Institute of Technology

In recent years, scientists have made great strides in their ability to develop artificial intelligence algorithms that can analyze patient data and come up with new ways to diagnose disease or predict which treatments work best for different patients.

The success of those algorithms depends on access to patient health data that has been stripped of personal information that could be used to identify individuals in the dataset. However, the possibility that individuals could be identified through other means has raised concerns among privacy advocates.

In a new study, a team of researchers led by MIT Principal Research Scientist Leo Anthony Celi has quantified the potential risk of this kind of patient re-identification and found that it is currently extremely low relative to the risk of a data breach. In fact, between 2016 and 2021, the period examined in the study, there were no reports of patient re-identification through publicly available health data.

The findings suggest that the potential risk to patient privacy is greatly outweighed by the gains for patients, who benefit from better diagnosis and treatment, says Celi. He hopes that in the near future, these datasets will become more widely available and include a more diverse group of patients.

"We agree that there is some risk to patient privacy, but there is also a risk of not sharing data," he says. "There is harm when data is not shared, and that needs to be factored into the equation."

Celi, who is also an instructor at the Harvard T.H. Chan School of Public Health and an attending physician with the Division of Pulmonary, Critical Care and Sleep Medicine at the Beth Israel Deaconess Medical Center, is the senior author of the new study. Kenneth Seastedt, a thoracic surgery fellow at Beth Israel Deaconess Medical Center, is the lead author of the paper, which appears today in PLOS Digital Health.

Risk-benefit analysis

Large health record databases created by hospitals and other institutions contain a wealth of information on diseases such as heart disease, cancer, macular degeneration, and COVID-19, which researchers use to try to discover new ways to diagnose and treat disease.

Celi and others at MIT's Laboratory for Computational Physiology have created several publicly available databases, including the Medical Information Mart for Intensive Care (MIMIC), which they recently used to develop algorithms that can help doctors make better medical decisions. Many other research groups have also used the data, and others have created similar databases in countries around the world.

Typically, when patient data is entered into this kind of database, certain types of identifying information are removed, including patients' names, addresses, and phone numbers. This is intended to prevent patients from being re-identified and having information about their medical conditions made public.

However, concerns about privacy have slowed the development of more publicly available databases with this kind of information, Celi says. In the new study, he and his colleagues set out to ask what the actual risk of patient re-identification is. First, they searched PubMed, a database of scientific papers, for any reports of patient re-identification from publicly available health data, but found none.

To expand the search, the researchers then examined media reports from September 2016 to September 2021, using Media Cloud, an open-source global news database and analysis tool. In a search of more than 10,000 U.S. media publications during that time, they did not find a single instance of patient re-identification from publicly available health data.

In contrast, they found that during the same time period, health records of nearly 100 million people were stolen through data breaches of information that was supposed to be securely stored.

"Of course, it's good to be concerned about patient privacy and the risk of re-identification, but that risk, although it's not zero, is minuscule compared to the issue of cyber security," Celi says.

Better representation

More widespread sharing of de-identified health data is necessary, Celi says, to help expand the representation of minority groups in the United States, who have traditionally been underrepresented in medical studies. He is also working to encourage the development of more such databases in low- and middle-income countries.

"We cannot move forward with AI unless we address the biases that lurk in our datasets," he says. "When we have this debate over privacy, no one hears the voice of the people who are not represented. People are deciding for them that their data need to be protected and should not be shared. But they are the ones whose health is at stake; they're the ones who would most likely benefit from data-sharing."

Instead of asking for patient consent to share data, which he says may exacerbate the exclusion of many people who are now underrepresented in publicly available health data, Celi recommends enhancing the existing safeguards that protect such datasets. One new strategy that he and his colleagues have begun using is to share the data in a way that prevents it from being downloaded and lets the database's administrators monitor every query run on it. This allows them to flag any user inquiry that seems like it might not be for legitimate research purposes, Celi says.

"What we are advocating for is performing data analysis in a very secure environment so that we weed out any nefarious players trying to use the data for some other reasons apart from improving population health," he says. "We're not saying that we should disregard patient privacy. What we're saying is that we have to also balance that with the value of data sharing."

More information: Kenneth P. Seastedt et al, Global healthcare fairness: We should be sharing more, not less, data, PLOS Digital Health (2022). DOI: 10.1371/journal.pdig.0000102

Journal information: PLOS Digital Health

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Study finds the risks of sharing health care data are low (2022, October 7) retrieved 23 June 2024 from https://medicalxpress.com/news/2022-10-health.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.



