April 3, 2024

Good evidence confuses ChatGPT when used for health information, study finds

A world-first study has found that when ChatGPT is asked a health-related question, the more evidence it is given, the less reliable it becomes, with the accuracy of its responses falling to as low as 28%.

The study was recently presented at Empirical Methods in Natural Language Processing (EMNLP), a natural language processing conference. The findings are published in the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.

As large language models (LLMs) like ChatGPT explode in popularity, they pose a potential risk to the growing number of people using online tools for key health information.

Scientists from CSIRO, Australia's national science agency, and The University of Queensland (UQ) explored a hypothetical scenario of an average person (non-professional health consumer) asking ChatGPT if "X" treatment has a positive effect on condition "Y."

The 100 questions presented ranged from "Can zinc help treat the common cold?" to "Will drinking vinegar dissolve a stuck fish bone?"

ChatGPT's response was compared to the known correct response, or "ground truth," based on existing medical knowledge.

CSIRO Principal Research Scientist and Associate Professor at UQ Dr. Bevan Koopman said that even though the risks of searching for health information online are well documented, people continue to seek health information online, and increasingly via tools such as ChatGPT.

"The widespread popularity of using LLMs online for answers on people's health is why we need continued research to inform the public about risks and to help them optimize the accuracy of their answers," Dr. Koopman said. "While LLMs have the potential to greatly improve the way people access information, we need more research to understand where they are effective and where they are not."

The study looked at two question formats. The first was a question only. The second was a question biased with supporting or contrary evidence.

Results revealed that ChatGPT was quite good at giving accurate answers in the question-only format, with 80% accuracy in this scenario.

However, when the language model was given an evidence-biased prompt, accuracy fell to 63%. It fell again, to 28%, when an "unsure" answer was allowed. This finding runs contrary to the popular belief that prompting with evidence improves accuracy.
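To make the comparison concrete, here is a minimal Python sketch of how the two question formats and the accuracy score could be set up. It is an illustration only: the prompt wording, the function names build_prompt and accuracy, and the toy answers are all assumptions, not the study's actual prompts, code, or data.

```python
from typing import List, Optional


def build_prompt(question: str,
                 evidence: Optional[str] = None,
                 allow_unsure: bool = False) -> str:
    """Build a question-only or evidence-biased prompt.

    The wording here is hypothetical; the study's real prompts may differ.
    """
    options = "Yes, No, or Unsure" if allow_unsure else "Yes or No"
    parts = []
    if evidence is not None:
        # Evidence-biased format: the passage is placed before the question.
        parts.append(f"Evidence: {evidence}")
    parts.append(f"Question: {question}")
    parts.append(f"Answer {options}.")
    return "\n".join(parts)


def accuracy(predictions: List[str], ground_truth: List[str]) -> float:
    """Fraction of predictions that match the ground-truth label."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truth)
    )
    return correct / len(ground_truth)


# Toy illustration with made-up labels (not results from the study).
toy_truth = ["yes", "no", "yes", "yes", "yes"]
toy_answers = ["yes", "no", "unsure", "yes", "no"]
print(f"{accuracy(toy_answers, toy_truth):.0%}")  # prints "60%"
```

In this sketch an "unsure" reply never matches a yes/no ground truth, so it counts as incorrect. That is one plausible scoring rule under which allowing "unsure" can only lower measured accuracy, but whether the study scored it this way is an assumption.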

"We're not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy," Dr. Koopman said.

ChatGPT launched on November 30, 2022, and has quickly become one of the most widely used LLMs. LLMs are a form of artificial intelligence that can recognize, translate, summarize, predict, and generate text.

Study co-author UQ Professor Guido Zuccon, Director of AI for the Queensland Digital Health Centre (QDHeC), said that major search engines are now integrating LLMs and search technologies in a process called Retrieval Augmented Generation.

"We demonstrate that the interaction between the LLM and the search component is still poorly understood and controllable, resulting in the generation of inaccurate health information," said Professor Zuccon.

Next steps for the research are to investigate how the public uses the health information generated by LLMs.

More information: Bevan Koopman et al, Dr ChatGPT tell me what I want to hear: How different prompts impact health answer correctness, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023). DOI: 10.18653/v1/2023.emnlp-main.928

Provided by CSIRO

Citation: Good evidence confuses ChatGPT when used for health information, study finds (2024, April 3) retrieved 11 July 2024 from https://medicalxpress.com/news/2024-04-good-evidence-chatgpt-health.html