
ChatGPT still not very good at diagnosing human ailments

Credit: Sanket Mishra from Pexels

A team of medical researchers at Western University's Schulich School of Medicine and Dentistry has found that despite being trained on terabytes of data, the LLM ChatGPT is still not good at diagnosing human ailments. In their study, published in the open-access journal PLOS ONE, the group presented the popular LLM with 150 case studies and prompted it to provide a diagnosis.

Prior research and anecdotal evidence have shown that LLMs such as ChatGPT can produce impressive results for some prompts, such as a request to write a love poem for a girlfriend, but they can also return incorrect or bizarre responses. Many in the field have urged caution when relying on LLM output for high-stakes topics such as health advice.

For this new study, the team in Canada evaluated how well ChatGPT could diagnose human ailments when given the symptoms of real patients as described in actual case studies. They chose 150 case studies from Medscape, an online resource used by medical professionals for informational and educational purposes, each accompanied by a known, accurate diagnosis. They then prompted ChatGPT 3.5 with the pertinent details of each case, such as patient history, lab results and office exam findings, and asked it for a diagnosis and/or a treatment plan.
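The paper's exact prompts are not reproduced here. As a rough, hypothetical sketch, a single Medscape-style case might be submitted to ChatGPT 3.5 through the OpenAI Python SDK along the following lines; the case fields and prompt wording are assumptions for illustration, not the authors' actual pipeline.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical case fields; the study fed in patient history, labs and exam findings.
case = {
    "history": "54-year-old with three days of fever and productive cough",
    "labs": "WBC 14.2 x10^9/L, CRP elevated",
    "exam": "Crackles at the right lung base",
}

prompt = (
    f"Patient history: {case['history']}\n"
    f"Lab results: {case['labs']}\n"
    f"Office exam findings: {case['exam']}\n"
    "Give the most likely diagnosis, your rationale with citations, "
    "and a brief treatment plan."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)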

After the LLM returned an answer, the research team graded the result on how close it came to the correct diagnosis. They also graded how well it explained the rationale behind its diagnosis, including whether it offered citations, an important part of medical diagnostics. They then averaged the scores across all 150 case studies and found that the LLM gave a correct diagnosis just 49% of the time.
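For readers curious about the arithmetic behind the headline number, the 49% figure is simply the share of graded cases judged correct. A toy illustration follows; the per-case split is an assumption chosen only to reproduce the rounded percentage, not data from the study.

# Toy illustration of the accuracy calculation, not the study's grading code.
# Each entry marks whether ChatGPT's answer for one case was judged correct;
# a 74/150 split is assumed here because it rounds to the reported 49%.
graded = [True] * 74 + [False] * 76

accuracy = sum(graded) / len(graded)
print(f"Correct diagnoses: {accuracy:.0%}")  # -> 49%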

The researchers note that while the LLM scored poorly on accuracy, it did a good job of describing how it reached its diagnosis, a characteristic the team suggests might prove useful for teaching medical learners. They also note that the LLM was reasonably good at ruling out possible ailments. They conclude by suggesting that LLMs are not yet ready for use in diagnostic settings.

More information: Ali Hadi et al, Evaluation of ChatGPT as a diagnostic tool for medical learners and clinicians, PLOS ONE (2024). DOI: 10.1371/journal.pone.0307383

Journal information: PLoS ONE

© 2024 Science X Network

Citation: ChatGPT still not very good at diagnosing human ailments (2024, August 1) retrieved 1 August 2024 from https://medicalxpress.com/news/2024-08-chatgpt-good-human-ailments.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
