ChatGPT flunks self-assessment test for urologists
At a time of growing interest in the potential role of artificial intelligence (AI) technology in medicine and healthcare, a new study reported in Urology Practice finds that the groundbreaking ChatGPT chatbot performs poorly on a major specialty self-assessment tool.
ChatGPT achieved less than a 30% rate of correct answers on the American Urological Association's (AUA) widely used Self-Assessment Study Program for Urology (SASP). "ChatGPT not only has a low rate of correct answers regarding clinical questions in urologic practice, but also makes certain types of errors that pose a risk of spreading medical misinformation," comment Christopher M. Deibert, MD, MPH, and colleagues of University of Nebraska Medical Center.
Can AI-trained chatbot pass a test of clinical urology knowledge?
Recent advances in large language models (LLMs) provide opportunities for adapting AI technology as a tool for mediating human interaction. "With adequate training and application, these AI systems can process complex information, analyze relationships between ideas, and generate coherent responses to an inquiry," note the authors.
ChatGPT (Chat Generative Pre-Trained Transformer) is an innovative LLM chatbot that has spurred interest in use in a wide range of settings—including health and medicine. In one recent study, ChatGPT scored at or near passing levels on all three steps of the United States Medical Licensing Examination (USMLE), without any special training or feedback on medical topics. Could this innovative AI-trained tool perform similarly well on a more advanced test of clinical knowledge in a surgical specialty?
To find out, Dr. Deibert and colleagues evaluated ChatGPT's performance on the SASP, a 150-question practice examination addressing the core curriculum of medical knowledge in urology. The SASP is a valuable test of clinical knowledge for urologists in training and for practicing specialists preparing for board certification. The study excluded 15 questions containing visual information such as pictures or graphs.
ChatGPT scores low on SASP, with 'redundant and cyclical' explanations
Overall, ChatGPT gave correct answers to less than 30% of SASP questions: 28.2% of multiple-choice questions and 26.7% of open-ended questions. The chatbot provided "indeterminate" responses to several questions. On these questions, accuracy decreased further when the model was asked to regenerate its answers.
For most open-ended questions, ChatGPT provided an explanation for the selected answer. The explanations provided by ChatGPT were longer than those provided by SASP, but "frequently redundant and cyclical in nature," according to the authors.
"Overall, ChatGPT often gave vague justifications with broad statements and rarely commented on specifics," Dr. Deibert and colleagues write. Even when given feedback, "ChatGPT continuously reiterated the original explanation despite it being inaccurate."
ChatGPT's poor accuracy on the SASP contrasts with its performance on the USMLE and other graduate-level exams. The authors suggest that while ChatGPT may do well on tests requiring recall of facts, it seems to fall short on questions pertaining to clinical medicine, which require "simultaneous weighing of multiple overlapping facts, situations and outcomes."
"Given that LLMs are limited by their human training, further research is needed to understand their limitations and capabilities across multiple disciplines before it is made available for general use," Dr. Deibert and colleagues conclude. "As is, utilization of ChatGPT in urology has a high likelihood of facilitating medical misinformation for the untrained user."
More information: Linda My Huynh et al, New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology, Urology Practice (2023). DOI: 10.1097/UPJ.0000000000000406