Breast cancer continues to be one of the most pressing health concerns worldwide, ranking as the most prevalent form of cancer among women and a leading cause of all cancer-related deaths. In 2022, 2.3 million new cases were diagnosed, and 670,000 women died globally from the disease. By 2050, new cases and deaths are expected to grow by 38 percent and 68 percent, respectively. However, while one in eight women will face a breast cancer diagnosis during her lifetime in the US and Europe, there is still reason for hope. Mortality rates are declining in these regions, largely thanks to earlier detection through screening and advances in systemic therapies.
Mammography remains the most effective tool for early detection of breast cancer, helping to reduce later-stage diagnoses and mortality. Breast cancer develops in stages, typically starting within the ducts and potentially progressing to invasive disease. As tumor size increases, so does the likelihood of metastasis and mortality, making it critical for radiologists to detect cancers while they are still small and confined to the ducts.
Screening programs using mammography are highly labor-intensive, however, owing both to the large number of patients screened and to the requirement in most European programs for double reading, where two specialists independently review each mammogram. Despite careful review, about 30 percent of tumors are missed during screening and later appear as interval cancers, which tend to be more aggressive. Retrospective studies show that 20–25 percent of these missed cancers were already visible on earlier mammograms but went unrecognized.
Several factors may contribute to tumors going unrecognized. In particular, dense breast tissue can make tumors difficult to see as both dense tissue and cancer appear white on an X-ray, creating a "snowstorm" effect that can hide tumors. Young women and those using hormone replacement therapy are more likely to have dense breast tissue, which further increases the chance that a tumor might be missed.

These statistics are particularly concerning given the growing shortage of trained radiologists. In the UK, for example, 25 percent of breast units lack at least one breast radiologist, and retirements are projected to outpace new appointments over the next decade. According to the Royal College of Radiologists (RCR), as of 2023, the UK faced a shortfall of approximately 30 percent of clinical radiologists, projected to increase to 40 percent by 2028. Delays in cancer diagnosis carry serious consequences, with a patient's risk of death increasing by 10 percent for every month treatment is postponed. Despite this, in a 2024 consensus report by the RCR, every radiology leader in the UK reported diagnostic scan delays due to staff shortages.
Against this backdrop, researchers and clinicians are increasingly exploring whether artificial intelligence (AI) could help fill the gap. In particular, mammography has emerged as an early opportunity to integrate AI into clinical workflows, spurred by advances in algorithm development, the growing shortage of radiologists, and the workload burden of double readings.
“AI could play an important role in optimizing workflow and accessibility,” said Suzanne van Winkel, a clinical epidemiologist focused on AI and personalized approaches in breast cancer screening at Radboud University Medical Center. “This might involve replacing a second reader, triaging low-risk populations with stand-alone AI, or potentially substituting for multiple readers in the future if its performance meets or exceeds that of human experts.”
Recent studies are now evaluating whether AI can support radiologists, enhance diagnostic accuracy, and potentially replace some of the human workload.
Retrospective studies of AI in breast cancer screening
Computer-aided detection (CAD) has been FDA-approved for use in mammography since 1998. A decade later, it was employed in analyzing 74 percent of screening mammograms for Medicare patients in the US. Despite this widespread adoption, multiple studies have found that CAD did not improve diagnostic accuracy, and in some cases, it even reduced screening sensitivity, meaning fewer true breast cancers were detected during screening.
However, the era of AI is rapidly changing this picture, thanks to the success of novel algorithms based on deep learning convolutional neural networks. These technologies excel at automating cognitively difficult tasks, such as driving cars and recognizing speech. In medical imaging, deep learning-based AI is rapidly narrowing the gap between humans and computers.
One of the first major tests of AI’s potential in breast cancer screening came in 2019. Researchers pooled nearly 2,700 digital mammograms from seven countries, each with radiologist assessments and biopsy-confirmed outcomes.
When analyzing the same exams, the AI system performed on par with the average of the 101 radiologists in the study and outperformed more than 60 percent of them individually. However, the best radiologists still outperformed the AI, suggesting that while AI could match average human proficiency at scale, it had yet to reach the top tier of expert judgment.
A later study expanded on this approach by evaluating over 1.1 million mammograms from the German national breast cancer screening program and proposed a collaborative AI-radiologist decision-referral system. In this two-part system, the AI automatically processed high-confidence exams and referred uncertain or potentially high-risk cases to radiologists for expert evaluation.
This hybrid approach maintained the expertise of human readers but also improved overall screening performance: Sensitivity increased by 2.6 percentage points and specificity by 1.0 percentage point compared with individual radiologists. Importantly, the system correctly triaged 63 percent of normal exams, substantially reducing radiologist workload without compromising diagnostic accuracy. This approach demonstrates how AI can enhance screening accuracy, adapt to the heterogeneous demands of population screening, and support radiologists rather than replace them.
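The decision-referral idea described above can be sketched in a few lines of Python. This is only an illustration of the routing logic, not the system used in the German study: the score scale, the threshold value, and the names `Exam`, `route`, and `CONFIDENT_NORMAL_BELOW` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen for illustration only. The article does not
# report the operating point used in the study.
CONFIDENT_NORMAL_BELOW = 0.05


@dataclass
class Exam:
    exam_id: str
    ai_score: float  # assumed AI suspicion score between 0 and 1


def route(exam: Exam) -> str:
    """Decide who assesses an exam under this simplified referral rule."""
    if exam.ai_score < CONFIDENT_NORMAL_BELOW:
        # The AI is confident the exam is normal and processes it itself.
        return "ai"
    # Uncertain or potentially high-risk exams go to a radiologist.
    return "radiologist"


exams = [Exam("a", 0.01), Exam("b", 0.40), Exam("c", 0.99)]
routes = {exam.exam_id: route(exam) for exam in exams}

# Share of exams that never reach a human reader in this toy example.
triaged_share = sum(r == "ai" for r in routes.values()) / len(exams)
print(routes, f"{triaged_share:.0%}")
```

In a real program, the threshold would have to be set and validated against confirmed outcomes, since it determines both how much workload is removed and how many cancers could slip through unread.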
Despite the promising results of these studies, a key limitation is that both relied on retrospective, enriched datasets rather than real-world screening populations. In practice, the prevalence of cancer, the types of lesions encountered, and workflow conditions differ from these curated datasets, meaning the performance of both AI systems and radiologists might not fully reflect actual outcomes.
Implementing AI in real-world screening
To address the gap in real-world data, the first randomized, controlled, population-based trial of AI-assisted mammography screening was conducted in Sweden between 2021 and 2022. The MASAI study enrolled 105,000 women aged 40–80, who were randomly allocated to either AI-supported screening or standard double reading without AI.
The trial demonstrated that AI-assisted screening detected nearly 30 percent more cancers than standard double reading, identifying 338 cancers compared with 262 in the control group. Notably, AI-supported screening also specifically identified more aggressive subtypes that are often more challenging to detect with standard screening. These included non-luminal A invasive cancers — such as triple-negative, HER2 (human epidermal growth factor receptor 2)-positive, and luminal B cancers — as well as high-grade ductal carcinoma in situ (DCIS). These subtypes are clinically important because they tend to grow faster, have a higher likelihood of spreading to lymph nodes, and are associated with poorer prognosis compared with luminal A cancers.
Triple-negative cancers, for example, lack hormone receptors and HER2 expression, limiting treatment options, while HER2-positive and luminal B cancers have higher proliferation rates and greater aggressiveness. High-grade DCIS is more likely to progress to invasive disease if untreated, unlike low-grade DCIS, which is generally indolent. By detecting these cancers at an earlier, lymph-node negative stage, AI-supported screening could enable timely intervention and improve clinical outcomes.

A similar study in Germany, the PRAIM study, compared AI-supported double reading with standard double reading in more than 463,000 women aged 50–69. Its results were similar: radiologists using AI achieved a higher cancer detection rate and identified a higher proportion of aggressive and clinically relevant invasive subtypes. The study also found improvements in efficiency and diagnostic accuracy, reinforcing the promise of AI-assisted screening across large populations and real-world screening workflows.
As Stefan Bunk, cofounder and Chief Technology Officer of Vara, which funded the PRAIM study, noted, “The integration of AI into breast cancer screening creates a remarkable win-win-win scenario. First, we achieve higher cancer detection rates, catching more cases that might otherwise be missed. Second, this improved detection comes with fewer false positives, which means less stress and anxiety for the women participating in screening programs. Third, AI significantly reduces radiologist workload, allowing our already stretched and aging workforce to focus their expertise on the most critical cases.”
Consistent with these findings, both the MASAI and the PRAIM trials showed that AI-supported screening improved detection without increasing false positives and cut the reading workload by almost half, demonstrating that AI can enhance patient outcomes while making screening programs more efficient and sustainable.
While these findings are highly encouraging, the ultimate value of AI in breast cancer screening will not be judged solely by the number of cancers detected in the short term. What truly matters is whether earlier detection leads to fewer interval cancers, less advanced disease, and improved survival over time. Long-term follow-up data is essential for answering this question.
Long-term success
One of the first large-scale attempts to examine long-term follow-up data came from the Netherlands in 2025, where van Winkel and her team retrospectively analyzed 42,236 mammograms from more than 42,000 women screened between 2016 and 2018. Using a commercially available AI system, they compared outcomes across different screening scenarios: stand-alone AI reading, AI as a second reader, and the traditional single- and double-reading models. Crucially, the study linked mammography results with outcomes from the Netherlands Cancer Registry and tracked women for up to 52 months, enabling the team to assess not only cancers detected at screening but also interval cancers and those diagnosed later.
The rationale behind this design was clear: to test whether AI could identify cancers that radiologists miss, and to determine if those missed cases were clinically significant or likely to progress. By comparing tumor characteristics, such as size, lymph-node involvement, and invasiveness, the researchers could judge whether AI was simply flagging borderline lesions or whether it was catching aggressive cancers at an earlier stage when treatment is most effective.
Van Winkel explained, “We chose to compare AI directly with a second human reader rather than evaluating AI as a support tool, because from a scientific perspective this design provides a clearer and more methodologically robust insight into the relative performance of AI versus radiologists.” This head-to-head approach enabled the researchers to highlight differences and potential overlaps in diagnostic accuracy, without affecting the performance of the original human readers. This was crucial for pinpointing cases where AI detected tumors that human readers had missed.
In the study, cancers were categorized based on when they were diagnosed relative to the screening exam. Cancers found within six months of a screening, following a referral based on the human-read mammogram, were considered screen-detected. Those diagnosed between the first screening and the next scheduled round were classified as interval cancers, while cancers detected 22 months or more after the first screening were labeled future breast cancers. Using this framework, van Winkel said, “AI was able to identify 28 additional interval cancers and 33 future breast cancers beyond the 291 cancers originally detected by human readers.”
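The timing-based categories can be written as a small rule, sketched here in Python. The six-month and 22-month cutoffs come from the description above. Everything else is an assumption made for illustration: the function name `categorize`, the treatment of the six-month boundary as inclusive, and the simplification that the interval window ends exactly where the future-cancer window begins.

```python
SCREEN_DETECTED_WINDOW_MONTHS = 6   # from the study description
FUTURE_CANCER_FROM_MONTHS = 22      # from the study description


def categorize(months_to_diagnosis: float, referred_by_reader: bool) -> str:
    """Assign a diagnosed cancer to one of the three timing categories.

    Simplification: a referred case diagnosed after the six-month window
    falls through to the timing-only rules below.
    """
    if referred_by_reader and months_to_diagnosis <= SCREEN_DETECTED_WINDOW_MONTHS:
        return "screen-detected"
    if months_to_diagnosis < FUTURE_CANCER_FROM_MONTHS:
        return "interval"
    return "future"


print(categorize(3, referred_by_reader=True))    # screen-detected
print(categorize(10, referred_by_reader=False))  # interval
print(categorize(30, referred_by_reader=False))  # future
```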
Analysis of the cancers flagged by AI but missed by human readers revealed that they were often larger and more invasive by the time of eventual detection, indicating that these were clinically relevant tumors rather than incidental findings. “This suggests that many women who were later diagnosed with these cancers might have benefited from earlier AI detection,” van Winkel noted.
In other words, AI didn’t just find more cancers — it found the ones that benefit most from early intervention, detecting aggressive tumors earlier when treatment is most likely to succeed and giving patients a better chance at successful outcomes.
Maximizing AI benefits requires careful integration
Overall, van Winkel’s study suggests that combining human and AI reading may offer the greatest benefit for population-based screening, increasing breast cancer detection by 8.4 percent compared with standard double reading. However, the study also revealed an important trade-off.
Adding AI as a second reader boosted overall cancer detection, but it increased the number of women recalled for further assessment by about 70 percent. With AI, 2,112 women were flagged, compared with 1,244 women under standard double reading. This increase reflects AI’s ability to spot subtle or early signs that human readers might miss, but it also captures more benign cases, leading to more false positives and a greater workload for radiologists, as well as additional stress and follow-up for patients. This finding contrasts with the results of the MASAI and PRAIM studies, a difference that is likely due to how AI was integrated into the screening process.
In the Dutch study, AI was used as a full second reader, meaning every mammogram was reviewed by both a radiologist and the AI system — a setup that naturally increases recalls when both parties flag potential abnormalities. In contrast, the MASAI and PRAIM trials used AI in a triage or support role, where low-risk cases identified by AI were either single read or automatically excluded from double reading. This design significantly reduced the number of cases requiring human review, helping maintain or even lower recall rates while still improving cancer detection.
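To put the Dutch recall figures in relative terms, the arithmetic can be worked through in a short Python sketch. The recall counts are the ones reported above. Using the 42,236 analyzed mammograms as the denominator for a recall rate is an assumption made here for illustration, and `relative_increase` is a helper name invented for the example.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Return the fractional increase of `new` over `baseline`."""
    return (new - baseline) / baseline


recalls_with_ai = 2112         # radiologist plus AI as second reader
recalls_double_reading = 1244  # standard double reading
exams = 42236                  # assumed denominator: mammograms analyzed

increase = relative_increase(recalls_with_ai, recalls_double_reading)
rate_with_ai = recalls_with_ai / exams
rate_double_reading = recalls_double_reading / exams

print(f"Relative increase in recalls: {increase:.1%}")  # 69.8%
print(f"Recall rate with AI: {rate_with_ai:.1%}")
print(f"Recall rate, double reading: {rate_double_reading:.1%}")
```

Under these assumptions, adding AI as a full second reader corresponds to roughly 70 percent more recalls, which is the trade-off that a triage design like the one used in MASAI and PRAIM is meant to avoid.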
Implementing AI in real-world workflows therefore requires careful planning. As van Winkel explained, “When two human radiologists disagree, a consensus meeting is held to resolve discrepancies. This indicates the need to design a similar form of consensus or arbitration process when integrating AI.”
The MASAI trial provides a useful model: AI triaged examinations so that low-risk cases were single read, while high-risk cases were double read, with consensus meetings available whenever needed. The key to this approach is ensuring that neither human nor AI findings are dismissed automatically, preserving oversight while optimizing efficiency. “This hybrid model shows a pragmatic pathway for integrating AI safely into clinical practice,” van Winkel said.
A support, not a replacement
While AI’s performance in detecting cancers is impressive, its integration into screening programs carries broader implications — both for radiologists and for patients. Van Winkel is clear that AI is not a replacement for human expertise, but rather a tool that can reshape how radiologists work.
“Radiologists shouldn’t fear for their jobs, although their tasks may shift over time. A likely scenario is that with the use of AI for triaging low-risk screening exams, radiologists will see far fewer ‘normal’ mammograms and much more often, or almost exclusively, abnormal ones. This allows them to focus their expertise where it is needed most, while AI helps to handle the more routine aspects of screening,” van Winkel said.
Patient perception and communication are equally critical. Introducing AI into breast cancer screening raises questions about trust, privacy, and the role of human care. “It’s essential to explain clearly what AI is doing and why,” van Winkel explained. “Women need to understand that the goal is earlier and more accurate detection, not less personal care.”
By supporting radiologists in triaging low-risk exams, highlighting subtle abnormalities, and reducing workload, AI may be able to enhance accuracy, efficiency, and consistency across screening programs. If integrated well, AI could lead to earlier detection of aggressive cancers, better patient outcomes, and the potential to make breast screening more accessible, scalable, and sustainable. The future of breast cancer screening is likely to be a partnership between human insight and AI, combining the strengths of both to deliver the best possible care.
















