AI uncovers bias in dermatology training tools
Skin diseases do not look the same across the skin-tone spectrum, and medical textbooks and presentations used to train dermatologists often lack example images of darker skin tones. During the recent pandemic, for instance, studies showed annotated photos of COVID-19's dermatologic symptoms lacked adequate representation of darker tones.
Other research has gone so far as to suggest that such disparities may contribute to why skin cancer diagnosis—particularly melanoma, which is among the most metastatic of all cancers—is often "significantly delayed" for people of color and why death rates are higher in those same populations.
"Unfairness in the teaching materials equates to unfairness in society," says Roxana Daneshjou, a dermatologist and biomedical data scientist at Stanford University. Daneshjou recently co-authored a study published in npj Digital Medicine introducing the Skin Tone Analysis for Representation in EDucational materials (STAR-ED) framework that uses machine learning to assess bias in skin tones in frequently used medical training materials.
The team trained STAR-ED on thousands of images in medical textbooks, lecture notes, presentation slides, and journal articles to determine just how underrepresented black and brown skin is in the materials. They found that just one in ten images throughout these materials is in the black-brown range on the Fitzpatrick Scale used to evaluate skin tone.
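The one-in-ten figure is a simple proportion over per-image skin-tone labels. As a minimal sketch of that tally (not the STAR-ED code itself), the Python below assumes each image has already been assigned a Fitzpatrick type from 1 to 6, and assumes that types V and VI make up the "black-brown range". The function name `dark_tone_share` and the sample labels are hypothetical and are not data from the study.

```python
from collections import Counter

# Assumption: Fitzpatrick types V and VI are treated as the "black-brown range".
DARK_TYPES = {5, 6}


def dark_tone_share(fitzpatrick_labels):
    """Return the fraction of images labeled Fitzpatrick type V or VI."""
    counts = Counter(fitzpatrick_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled images to evaluate")
    dark = sum(counts[t] for t in DARK_TYPES)
    return dark / total


# Hypothetical labels for ten images (illustrative only, not study data).
labels = [1, 2, 2, 3, 3, 3, 4, 4, 2, 6]
share = dark_tone_share(labels)
print(f"{share:.0%} of images fall in Fitzpatrick types V-VI")
```

An editor could run the same kind of tally per chapter or per slide deck to see where darker skin tones are missing, once the images have been graded.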
"We're turning AI-bias storyline on its head a little bit," Daneshjou says. "There's lots of news out there of bias in AI models, but in this case we've trained an AI model that detects human bias."
The STAR-ED study is not the first to highlight this disparity, Daneshjou acknowledges, but it is the first to grade skin tone automatically with computer algorithms, which makes it far more scalable than earlier efforts that relied on human-labeled data. Human annotators, Daneshjou says, are prone to fatigue and variability in judging skin tone. While STAR-ED is not perfect, its results closely mirror the findings of human annotators, who spent many hours labeling the data.
In highlighting the disparity in existing training materials, Daneshjou's hope is that authors and editors will use STAR-ED to evaluate their soon-to-be-published textbooks, journals, and slides for potential biases and to remedy any biases prior to publication.
"Our suggestion is physicians are not being trained adequately and that this shortfall may contribute to why people of color with psoriasis, eczema, melanoma, and other skin diseases don't get diagnosed and treated sooner and better," Daneshjou adds. "The bottom line is that we need to get more images of black and brown skin diseases into the training literature."
STAR-ED works on a comprehensive range of file formats—pdf, png, jpeg, pptx, and docx among others—and can be applied to materials beyond textbooks, such as research papers, image study sets, and lecture slides, to bring greater representation into classrooms, reading materials, and seminars.
"We envision STAR-ED helping medical educators, publishers, clinicians, and even students quickly and more easily assess their educational materials for skin-tone bias," she says. Ultimately, she hopes that would translate to earlier, more frequent, more accurate dermatological diagnoses for some already underserved populations.
The team plans to get STAR-ED into the hands of publishers, editors, and content creators and, potentially, to extend it to other educational domains, such as history textbooks, to identify similar gaps in representation.
In this iteration, STAR-ED did not consider text, tables, and other written content, which its creators hope to integrate later to further reduce bias. One limitation of the methodology, the authors noted, is that it did not fully exclude diseased or lesional skin, which can obscure the appearance, and thus the tone grading, of the subject's healthy skin. Last, Daneshjou noted that the Fitzpatrick Scale itself is known to be biased, a concern the team hopes to address in future versions by using alternative tone scales.
More information: Girmaw Abebe Tadesse et al, Skin Tone Analysis for Representation in Educational Materials (STAR-ED) using machine learning, npj Digital Medicine (2023). DOI: 10.1038/s41746-023-00881-0