More Evidence Backs AI-Supported Mammography

Interval breast cancer rates with AI-supported mammography were noninferior to those with standard double reading by radiologists in a randomized trial.
Use of AI also resulted in higher sensitivity and the same specificity compared with standard double reading.
AI-supported mammography can be used to improve screening performance, while reducing screen reading workload, researchers suggested.

Mammography screening supported by artificial intelligence (AI) showed consistently favorable outcomes compared with standard double reading by radiologists, including a noninferior interval cancer rate, according to results from the MASAI randomized trial.

Among more than 105,000 women, interval cancer rates were 1.55 and 1.76 per 1,000 participants in the intervention and control groups, respectively, for a proportion ratio of 0.88 (95% CI 0.65-1.18, P=0.41) that met the noninferiority criterion, reported Kristina Lång, PhD, of Lund University in Malmö, Sweden, and colleagues in The Lancet.
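The headline ratio can be checked with simple arithmetic: the proportion ratio is the intervention-group rate divided by the control-group rate, and one minus that ratio is the relative reduction behind the "12% fewer interval cancers" figure the researchers report. The minimal Python sketch below uses only the two rates reported above; the function names are illustrative, and the confidence interval and noninferiority test would need the underlying counts and the trial's prespecified margin, which the article does not give.

```python
def proportion_ratio(rate_intervention: float, rate_control: float) -> float:
    """Ratio of two rates expressed on the same scale (here, per 1,000 participants)."""
    return rate_intervention / rate_control


def relative_reduction(ratio: float) -> float:
    """Relative reduction implied by a ratio below 1 (e.g. 0.88 means about 12% fewer)."""
    return 1.0 - ratio


# Interval cancer rates per 1,000 participants, as reported for the MASAI trial
ai_rate = 1.55
standard_rate = 1.76

ratio = proportion_ratio(ai_rate, standard_rate)
print(f"proportion ratio: {ratio:.2f}")                        # 0.88
print(f"relative reduction: {relative_reduction(ratio):.0%}")  # 12%
```

Because the published ratio was computed from case counts rather than rounded rates, this rate-based check matches it only to two decimal places.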

Since the analysis was not powered to detect superiority, the researchers presented differences between the intervention and control groups descriptively, and reported that AI-supported screening resulted in the following outcomes versus standard screening:

12% fewer interval cancers and 16% fewer invasive interval cancers
27% fewer non-luminal A subtypes and a similar number of luminal A subtypes (7% more)
By histological type, 23% fewer invasive interval cancers of no special type, but 20% more of the lobular type
21% fewer T2+ interval cancers and a similar number of T1 tumors (3% fewer)
Fewer interval cancers with TNM stage I and II+ (21% and 13% fewer, respectively)

Use of AI also resulted in higher sensitivity (80.5% vs 73.8%) and the same specificity (98.5%) compared with standard double reading.
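In screening studies, sensitivity is commonly defined as the share of cancers in the screened population that screening detects, with interval cancers counted as the cancers it missed; specificity is the share of women without cancer who are not recalled. The Python sketch below encodes those textbook definitions. The class name and the counts in the example are made up for illustration and are not MASAI data, and the trial's exact outcome definitions may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical screening outcome counts (illustrative only, not trial data)."""
    screen_detected: int   # cancers found at screening (true positives)
    interval_cancers: int  # cancers diagnosed after a negative screen (false negatives)
    false_positives: int   # women recalled who turned out not to have cancer
    true_negatives: int    # women without cancer who were not recalled

    def sensitivity(self) -> float:
        """Share of all cancers that screening detected."""
        return self.screen_detected / (self.screen_detected + self.interval_cancers)

    def specificity(self) -> float:
        """Share of cancer-free women who were correctly not recalled."""
        return self.true_negatives / (self.true_negatives + self.false_positives)


# Made-up round numbers chosen only to show the calculation
counts = ScreeningCounts(screen_detected=80, interval_cancers=20,
                         false_positives=150, true_negatives=9_850)
print(f"sensitivity: {counts.sensitivity():.1%}")
print(f"specificity: {counts.specificity():.1%}")
```

Under this definition, fewer interval cancers at a similar detection level translate directly into higher sensitivity, which is why the two findings move together.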

Initial results from the trial showed that AI-supported mammography resulted in a cancer detection rate comparable to that of two breast radiologists working together (6.1 vs 5.1 per 1,000 screened), while also cutting the screen-reading workload by 44%.

A follow-up per-protocol analysis showed that AI-supported screening led to a significant increase in the cancer detection rate versus standard double reading (6.4 vs 5.0 per 1,000 screened participants), with small nonsignificant increases in recall rates and false-positive rates.

“Together with results from the two previous protocol-defined analyses of the MASAI trial, we show that an AI-supported mammography screen reading procedure can be used to improve screening performance, while reducing screen reading workload compared with standard double reading without AI,” wrote Lång and colleagues.

“Further analyses of subsequent screening rounds and cost-effectiveness will clarify the long-term balance of benefits and harms and could provide a strong rationale for implementing AI in population-based mammography screening programs, particularly in the context of workforce shortages,” they noted.

In an accompanying commentary, M. Luke Marinovich, PhD, MPH, and Nehmat Houssami, MBBS, PhD, both of the University of Sydney, cautioned against over-interpreting the noninferiority findings on interval cancer rates, but conceded that they provide “substantial reassurance that the AI screening workflow used in the [trial] is a safe alternative to double-reading of screening mammography.”

“Some breast cancer screening programs have already moved to include AI in the screen-reading workflow, generally in selected screeners,” they wrote. “Results of the MASAI trial will probably accelerate such moves or inform newer trials and large-scale implementation studies.”

From April 2021 to December 2022, 105,934 women who were undergoing mammography screening at four sites in Sweden were randomly assigned in a 1:1 ratio to AI-supported mammography screening or to standard double reading by radiologists without AI. Median age was 53.8 years in the intervention group and 53.7 years in the control group.

In the intervention arm, a specialist AI system analyzed the mammograms and triaged low-risk cases to single reading and high-risk cases to double reading by radiologists. The AI also served as detection support for the radiologists by highlighting suspicious findings in the images.
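The triage step can be pictured as a routing rule on a per-exam AI risk score. The article does not describe the score scale or cut-off, so the 1-10 scale, the threshold of 10, and the function names below are hypothetical assumptions for illustration only. The sketch also shows why triage reduces workload: every exam routed to single reading saves one radiologist read compared with double reading of all exams.

```python
from collections import Counter

# Hypothetical cut-off: exams scoring at or above this value go to double reading.
# The real system's score scale and threshold are assumptions, not trial details.
HIGH_RISK_THRESHOLD = 10


def route_exam(risk_score: int, threshold: int = HIGH_RISK_THRESHOLD) -> str:
    """Return 'double' for high-risk exams and 'single' for all others."""
    return "double" if risk_score >= threshold else "single"


def reads_needed(risk_scores: list[int], threshold: int = HIGH_RISK_THRESHOLD) -> int:
    """Total radiologist reads: one per single-read exam, two per double-read exam."""
    routes = Counter(route_exam(s, threshold) for s in risk_scores)
    return routes["single"] + 2 * routes["double"]


def workload_reduction(risk_scores: list[int], threshold: int = HIGH_RISK_THRESHOLD) -> float:
    """Fractional reduction in reads versus double reading every exam."""
    baseline = 2 * len(risk_scores)
    return 1 - reads_needed(risk_scores, threshold) / baseline


# Made-up scores for ten exams; only the last two reach the double-reading cut-off.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 10, 10]
print(reads_needed(scores))                # 12
print(f"{workload_reduction(scores):.0%}")  # 40%
```

This toy model covers only the triage role; the AI's second role in the trial, highlighting suspicious findings for the reading radiologists, is not modeled here.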

The AI system was trained, validated, and tested with more than 200,000 examinations from multiple institutions across more than 10 countries.

The authors acknowledged several limitations to their trial related to generalizability, pointing out that the study was conducted in Sweden, with one mammographic vendor, one AI system, low baseline recall rates, and overall experienced radiologists.

“Outcomes might vary between different screen reading procedures or with less-experienced radiologists, who might have higher false positive rates and greater AI influence on performance,” they wrote.