Deep learning can distinguish recalled-benign mammograms from malignant and negative images
An artificial intelligence (AI) approach based on a deep learning convolutional neural network (CNN) could identify nuanced mammographic imaging features specific to recalled-but-benign (false-positive) mammograms and distinguish such mammograms from those identified as malignant or negative.
The study is published in Clinical Cancer Research, a journal of the American Association for Cancer Research, by Shandong Wu, Ph.D., assistant professor of radiology, biomedical informatics, bioengineering, intelligent systems, and clinical and translational science, and director of the Intelligent Computing for Clinical Imaging lab in the Department of Radiology at the University of Pittsburgh, Pennsylvania.
"In order to catch breast cancer early and help reduce mortality, mammography is an important screening exam; however, it currently suffers from a high false recall rate," said Wu. "These false recalls result in undue psychological stress for patients and a substantial increase in clinical workload and medical costs. Therefore, research on possible means to reduce false recalls in screening mammography is an important topic to investigate."
Wu and colleagues studied whether a technique in artificial intelligence called deep learning could be applied to analyze a large set of mammograms in order to distinguish images from women with a malignant diagnosis, images from women who were recalled and were later determined to have benign lesions (false recalls), and images from women determined to be breast cancer-free at the time of screening.
"The assumption is that there may be some nuanced imaging features associated with some mammogram images that could lead to a false/unnecessary recall when the images are interpreted by human radiologists, and our goal is to utilize a deep learning CNN-based method to build a computer toolkit to identify those potential mammogram images," Wu said.
The researchers used a total of 14,860 images from 3,715 patients in two independent mammography datasets: the Full-Field Digital Mammography dataset (FFDM; 1,303 patients) and the Digital Database for Screening Mammography (DDSM; 2,412 patients). They built CNN models and used enhanced model-training approaches to investigate six classification scenarios for distinguishing benign, malignant, and recalled-benign mammogram images.
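With three image categories, a set of binary classification tasks falls out naturally. The article does not spell out which six scenarios were studied, so the enumeration below is only one plausible reading, assuming the three pairwise tasks plus the three one-vs-rest tasks:

```python
from itertools import combinations

# The three image categories described in the article.
classes = ["negative", "recalled-benign", "malignant"]

# Assumption (not taken from the paper): the six scenarios are the
# three pairwise comparisons plus the three one-vs-rest comparisons.
pairwise = [f"{a} vs {b}" for a, b in combinations(classes, 2)]
one_vs_rest = [f"{c} vs rest" for c in classes]
scenarios = pairwise + one_vs_rest

print(len(scenarios))  # 6
for s in scenarios:
    print(s)
```

Each scenario would then be trained and evaluated as its own binary classifier, with an AUC reported per task.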
When the datasets from FFDM and DDSM were combined, the area under the receiver operating characteristic curve (AUC) for distinguishing benign, malignant, and recalled-benign images ranged from 0.76 to 0.91. The higher the AUC, the better the performance, with a maximum of 1, Wu explained. "AUC is a metric that summarizes the comparison of true positives against false positives, so it gives an indication not only of accuracy (how many were correctly identified), but also how many were falsely identified," he said.
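Wu's description of AUC can be made concrete with a small sketch. AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the rank-sum formulation); the labels and scores below are illustrative only, not data from the study:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation:
    the fraction of positive/negative pairs in which the positive case
    scores higher than the negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # ~0.889: 8 of 9 pairs ranked correctly
```

An AUC of 1.0 means every positive outranks every negative; 0.5 is chance-level ranking, so the reported 0.76 to 0.91 range sits well above chance.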
Wu said, "We showed that there are imaging features unique to recalled-benign images that deep learning can identify and potentially help radiologists in making better decisions on whether a patient should be recalled or is more likely a false recall."
"Based on the consistent ability of our algorithm to discriminate all categories of mammography images, our findings indicate that there are indeed some distinguishing features/characteristics unique to images that are unnecessarily recalled," Wu noted. "Our AI models can augment radiologists in reading these images and ultimately benefit patients by helping reduce unnecessary recalls."
As limitations of the study, Wu noted that additional independent datasets could help further evaluate the accuracy and robustness of the algorithms, and that alternative deep learning models, architectures, and model-training strategies could help improve performance.