Poster Spotlight 11: Applying AI to Pathology and Risk Stratification
Session Details
Moderator
Amrita Basu, University of California San Francisco, San Francisco, CA
Presentation number PD11-02
Artificial Intelligence for Tumor-Infiltrating Lymphocytes in Early-Stage TNBC: Results of a Collaborative Prospective TIL Validation Challenge
Julia R Dixon-Douglas, Gustave Roussy, Villejuif, France
J. R. Dixon-Douglas1, D. Drubay2, R. Salgado3, B. Acs4, J. A. van de Laark5, Y. Yuan6, M. Amgad7, L. A. Cooper7, Y. B. Hagos8, K. AbdulJabbar9, J. Meakin10, B. Van Ginneken5, H. Yan9, J. Lemonnier11, F. Penault-Llorca11, M. Lacroix-Triki12, H. Jounsuu13, P. Kellokumpu-Lehtinen14, S. Loibl15, C. Denkert16, G. Viale17, M. Colleoni18, C. Sotiriou19, M. Piccart20, M. Dieci21, S. Demaria22, R. Kammler23, A. C. Wolff24, S. Adams25, S. Badve26, R. J. Gray27, G. Curigliano28, A. Vincent-Salomon29, T. Nielsen30, L. Pusztai31, F. Ciompi5, S. Michiels32, S. Loi33; 1Unit 981 Molecular predictors and new targets in oncology, Gustave Roussy, Villejuif, FRANCE, 2Biostatistics and Epidemiology Department, Gustave Roussy, Villejuif, FRANCE, 3Department of Anatomical Pathology, GZA-ZNA-Hospitals, Antwerp, BELGIUM, 4Department of Oncology Pathology, Karolinska Institute, Stockholm, SWEDEN, 5Department of Pathology, Radboud University Medical Center, Nijmegen, NETHERLANDS, 6Division of Pathology and Laboratory Medicine, Department of Translational Molecular Pathology, Case45, Houston, TX, 7Feinberg School of Medicine, Northwestern University, Chicago, IL, 8Computational Pathology, Institute of Cancer Research, London, UNITED KINGDOM, 9Case45, Case45, London, UNITED KINGDOM, 10Pathology, Radboud University Medical Center, Nijmegen, NETHERLANDS, 11Unicancer, Unicancer, Paris, FRANCE, 12Department Anatomical Pathology, Gustave Roussy, Villejuif, FRANCE, 13Department of Oncology, HUS Comprehensive Cancer Center, Helsinki, FINLAND, 14Faculty of Medicine and Health Technology, Tampere University, Tampere, FINLAND, 15German Breast Group, German Breast Group, Forschungs, GERMANY, 16Institute of Pathology, Philipps-University Marburg and University Hospital Marburg (UKGM), Marburg, GERMANY, 17Department of Pathology and Laboratory Medicine, European Institute of Oncology (IEO), Milan, ITALY, 18Department of Breast Oncology, European Institute of Oncology (IEO), Milan, ITALY, 19Breast cancer translational research laboratory, Institut Jules Bordet, Brussels, BELGIUM, 20Department of Medical Oncology, Institut Jules Bordet, Brussels, BELGIUM, 21Department of Surgery, Oncology and Gastroenterology DISCOG, University of Padua, Padua, ITALY, 22Radiation Oncology, Weill Cornell, New York City, NY, 23Translational Research Working Group Breast Cancer, ETOP-IBCSG, Bern, SWITZERLAND, 24Medical Oncology, Johns Hopkins Kimmel Cancer Centre, Lutherville, MD, 25Medical Oncology, NYU Langone Laura and Isaac Perlmutter Cancer Center, New York, NY, 26Cell and Molecular Biology Research, Winship Cancer Institute, Emory University, Atlanta, GA, 27ECOG-ACRIN Biostatistics Center, Dana Farber Cancer Institute, Boston, MA, 28Development of New Drugs for Innovative Therapies, European Institute of Oncology (IEO), Milan, ITALY, 29Department of Pathology, Institut Curie, Paris, FRANCE, 30Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, CANADA, 31Breast Cancer Translational Research, Yale School of Medicine, New Haven, CT, 32Department of Biostatistics and Epidemiology, Gustave Roussy, Villejuif, FRANCE, 33Department of Cancer Research, Peter MacCallum Cancer Centre, Melbourne, AUSTRALIA.
Background: Tumor-infiltrating lymphocytes (TIL) provide key prognostic information in triple-negative breast cancer (TNBC). The CATALINA challenge evaluated multiple TIL-scoring AI algorithms (“cTIL”) on whole-slide images (WSIs) from prospective clinical trial cohorts to assess the analytical validity and prognostic performance of cTIL, compared with pathologist-scored stromal TIL (sTIL). Aim: To independently assess the prognostic performance of computational TIL (cTIL) models, compared with pathologist-scored sTIL, in a large, prospective cohort. Methods: Two independently developed AI algorithms, producing a total of 5 cTIL scores, were applied to digitized slides blinded to sTIL score and outcomes. We compared agreement between cTIL and sTIL (Spearman’s rho) on 220 breast cancer H&E WSIs. We then collected H&E WSIs and clinical outcome data for 1,356 early-stage TNBC patients enrolled across seven prospective trials, with pathologist sTIL scored using international guidelines. Five-year invasive disease-free survival (IDFS), distant disease-free survival (DDFS), and overall survival (OS) were assessed by Cox proportional hazards models including sTIL and cTIL, adjusted for age, tumor size, nodal status, grade, and treatment. Cases were categorized as high or low for sTIL and cTIL, using a previously validated cut-off of 30% for sTIL and the 75th percentile to dichotomize cTIL measures. Discordant cases (cTIL-high/sTIL-low or vice versa) were analyzed for outcome patterns. Results: Moderate correlation (rho 0.37–0.47) was observed between cTIL and sTIL scores from the 220 WSIs. Pathologist sTIL and all 5 cTIL scores were statistically significantly associated with improved 5-year DDFS, IDFS and OS in the multivariable model. After adjustment for sTIL, only one cTIL score (percentage_lymphocyte) retained prognostic significance, whereas sTIL remained highly statistically significant for all endpoints (Table 1). In discordant cases, five-year survival probabilities most closely matched the sTIL category.
For example, for the percentage_lymphocyte score, the estimated 5-year DDFS (95% CI) was 0.82 (0.76–0.88) in the cTIL/sTIL concordant high/high group, 0.66 (0.63–0.70) in the concordant low/low group, 0.67 (0.60–0.75) in the cTIL-high/sTIL-low group, and 0.78 (0.72–0.84) in the cTIL-low/sTIL-high group. Conclusion: In this large, multicenter validation, AI-based TIL quantification demonstrated moderate agreement with pathologist sTIL and a favourable association with prognosis. However, sTIL and cTIL were not directly interchangeable, and discordant cases most closely matched sTIL prognosis. These findings underscore the importance of comparing computational tools to existing, validated biomarkers in large prospective cohorts with comprehensive clinical survival data.
| TIL variable | HR for cTIL score [95% CI] | p-value (cTIL) | HR for sTIL score [95% CI] | p-value (sTIL) |
|---|---|---|---|---|
| DDFS | | | | |
| sTILs | – | – | 0.71 [0.63; 0.81] | <0.001 |
| ai_til | 0.81 [0.71; 0.93] | <0.001 | – | – |
| ai_til adjusted for sTIL | 0.95 [0.83; 1.08] | 0.42 | 0.73 [0.63; 0.84] | <0.001 |
| calMeanAllStroma | 0.80 [0.72; 0.90] | <0.001 | – | – |
| calMeanAllStroma adjusted for sTIL | 0.90 [0.79; 1.01] | 0.079 | 0.74 [0.65; 0.85] | <0.001 |
| calMeanAnyStroma | 0.79 [0.69; 0.90] | <0.001 | – | – |
| calMeanAnyStroma adjusted for sTIL | 0.89 [0.78; 1.01] | 0.082 | 0.74 [0.65; 0.85] | <0.001 |
| itlr | 0.80 [0.71; 0.90] | <0.001 | – | – |
| itlr adjusted for sTIL | 0.90 [0.79; 1.02] | 0.087 | 0.74 [0.65; 0.85] | <0.001 |
| percentage_lymphocyte | 0.77 [0.69; 0.86] | <0.001 | – | – |
| percentage_lymphocyte adjusted for sTIL | 0.86 [0.76; 0.96] | 0.012 | 0.77 [0.67; 0.88] | <0.001 |
| OS | | | | |
| sTILs | – | – | 0.74 [0.66; 0.83] | <0.001 |
| ai_til | 0.87 [0.76; 0.99] | 0.036 | – | – |
| ai_til adjusted for sTIL | 1.00 [0.88; 1.14] | 0.981 | 0.75 [0.66; 0.85] | <0.001 |
| calMeanAllStroma | 0.83 [0.74; 0.94] | 0.003 | – | – |
| calMeanAllStroma adjusted for sTIL | 0.93 [0.82; 1.06] | 0.272 | 0.76 [0.67; 0.85] | <0.001 |
| calMeanAnyStroma | 0.82 [0.72; 0.94] | 0.003 | – | – |
| calMeanAnyStroma adjusted for sTIL | 0.92 [0.81; 1.06] | 0.247 | 0.75 [0.67; 0.85] | <0.001 |
| itlr | 0.81 [0.71; 0.91] | <0.001 | – | – |
| itlr adjusted for sTIL | 0.89 [0.78; 1.02] | 0.095 | 0.78 [0.69; 0.87] | <0.001 |
| percentage_lymphocyte | 0.79 [0.70; 0.88] | <0.001 | – | – |
| percentage_lymphocyte adjusted for sTIL | 0.88 [0.77; 0.99] | 0.040 | 0.76 [0.66; 0.88] | <0.001 |
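The high/low categorization and the four concordance groups described in the Methods can be sketched as follows. This is a minimal illustration, not the study's code: the 30% sTIL cut-off and the 75th-percentile cTIL cut-off come from the abstract, while the function names, the use of `>=` at the cut-off, the percentile interpolation method, and the toy scores are assumptions.

```python
from statistics import quantiles

STIL_CUTOFF = 30.0  # percent; the validated sTIL cut-off stated in the abstract


def percentile_75(values):
    """75th percentile of a list (assumption: 'inclusive' interpolation)."""
    return quantiles(values, n=4, method="inclusive")[2]


def categorize(stil_scores, ctil_scores):
    """Return the cTIL cut-off and one concordance label per case."""
    ctil_cut = percentile_75(ctil_scores)
    labels = []
    for s, c in zip(stil_scores, ctil_scores):
        c_level = "high" if c >= ctil_cut else "low"      # assumption: >= is "high"
        s_level = "high" if s >= STIL_CUTOFF else "low"   # assumption: >= is "high"
        labels.append(f"cTIL-{c_level}/sTIL-{s_level}")
    return ctil_cut, labels


# Toy scores for illustration only (not study data).
stil = [5, 10, 40, 60, 20, 35, 2, 80]
ctil = [1, 2, 3, 4, 5, 6, 7, 8]
cut, groups = categorize(stil, ctil)
for s, c, g in zip(stil, ctil, groups):
    print(s, c, g)
```

By construction the 75th-percentile rule labels roughly a quarter of cases cTIL-high, whereas the share of sTIL-high cases depends on the cohort, so the two dichotomies need not select the same number of patients.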
Presentation number PD11-01
Development of a Multi-Modal Artificial Intelligence (MMAI) Model for Predicting Distant Metastasis in HR+ Early-Stage Invasive Breast Cancer
Charles E. Geyer Jr., University of Pittsburgh, Pittsburgh, PA
C. E. Geyer Jr.1, D. A. Kates-Harbeck2, P. Rastogi3, R. Kates4, M. Filipits5, D. Hlauschek5, C. Fesl5, M. Christgen6, U. Nitz7, S. Kuemmel8, M. Graeser9, H. Christgen10, O. Gluz11, T. Freeman12, S. Anderson13, H. Pinckaers14, A. Piehler15, W. Zwerink14, J. Zhang15, S. Joun15, J. Ross16, C. Chao17, J. Griffin18, H. Kreipe19, M. Gnant5, N. Wolmark20, N. Harbeck2; 1NSABP and Department of Medicine, University of Pittsburgh, Pittsburgh, PA, 2WSG and Dep. OB/GYN and CCC Munich, LMU University Hospital, Munich, GERMANY, 3NSABP and Department of Medicine, University of Pittsburgh, Pittsburgh, PA, 4-, West German Study Group, Moenchengladbach, GERMANY, 5-, Austrian Breast and Colorectal Cancer Study Group (ABCSG), Vienna, AUSTRIA, 6WSG and Institute of Pathology, Hannover Medical School, Hannover, GERMANY, 7-, West German Study Group, Mönchengladbach, GERMANY, 8WSG and Interdisciplinary Breast Center, Evangelical Hospital Essen-Mitte, Essen, GERMANY, 9WSG and Dept. of Obstetrics and Gynecology, University Witten-Herdecke, Witten, GERMANY, 10Institute of Pathology, Hannover Medical School, Hannover, GERMANY, 11WSG and Breast Center Niederrhein, Evangelical Bethesda Hospital, Moenchengladbach, GERMANY, 12NSABP and Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 13NSABP and NRG Oncology Statistical Center, University of Pittsburgh, Pittsburgh, PA, 14AI, Artera Inc., Mountain View, CA, 15Biostatistics, Artera Inc., Mountain View, CA, 16Product Strategy, Artera Inc., Mountain View, CA, 17Medical Science, Artera Inc., Mountain View, CA, 18Clinical Development, Artera Inc., Mountain View, CA, 19WSG and Department of Pathology, Medical School Hannover, Hannover, GERMANY, 20NSABP and Department of Surgery, University of Pittsburgh, Pittsburgh, PA.
Background: Accurately predicting the risk of distant metastasis (DM) in hormone receptor–positive (HR+) early-stage breast cancer (EBC) remains a key clinical challenge. Current prognostic tools often rely on limited clinical or genomic features. We developed and evaluated a pathology-based multimodal artificial intelligence (MMAI) prognostic biomarker that integrates clinical and histopathological data from six Phase III randomized trials to predict the risk of DM in HR+ EBC. Methods: Digitized pre-treatment biopsy and surgical slides from six Phase III trials conducted by three cooperative groups (WSG, NSABP, and ABCSG) were used in this study. The MMAI model was developed on 8,616 patients from four trials (ADAPT, PlanB, B34, and ABCSG 6), incorporating AI-derived histopathological features alongside clinical variables including age, tumor size, and nodal status to predict risk of DM. The continuous MMAI raw score was locked prior to validation in two cohorts of patients with HR-positive invasive breast cancer: the NSABP B14 Tamoxifen Set and NSABP B39. The primary endpoint was time to DM, and model performance was evaluated using the area under the time-dependent receiver operating characteristic curve (tdAUC), Cox proportional hazards regression, and Kaplan-Meier curves. The 10-year tdAUC, hazard ratios (HR), and 10-year estimated DM rates were reported with 95% confidence intervals (CI). A clinical comparator model including age, tumor size, and pathological N stage was used as a performance benchmark. Performance in pre-specified subgroups, including nodal and menopausal status, was also assessed. Cut points were empirically determined by maximizing the difference in restricted mean survival time of 10-year DM between risk groups with clinically guided constraints, using data from 4,332 pN0-1 patients pooled from the six trials (ABCSG6, ADAPT, B14, B34, B39, and PlanB) that were held out for model testing and evaluation.
Results: MMAI raw scores were generated for 2,188 patients in B14 and 1,198 patients in B39 with HR+ stage I-II EBC. The median follow-up time was 17.6 years for B14 and 9.4 years for B39. MMAI demonstrated strong prognostic performance: in B14, the locked MMAI showed a 10-year tdAUC of 0.71 [0.67-0.74] compared to 0.65 [0.62-0.69] for the clinical comparator model; in B39, the 10-year tdAUC for MMAI was 0.72 [0.59-0.82] versus 0.69 [0.60-0.79] for the clinical comparator model. The MMAI raw score was significantly associated with risk of DM in both B14 (HR [95% CI] = 2.06 [1.81-2.35]) and B39 (HR [95% CI] = 2.31 [1.63-3.28]). The score remained significant after adjusting for age, tumor size, and pathological N stage in both B14 (HR [95% CI] = 1.94 [1.68-2.25]) and B39 (HR [95% CI] = 2.06 [1.36-3.11]). Subgroup analyses showed consistent prognostic performance across nodal status and menopausal status. Using the additional pooled data described above, cut points were chosen to ensure sufficient sample sizes within risk categories, balancing statistical power for future validation and clinical interpretability. This resulted in 65.5% of patients being classified as low risk, 10.5% as intermediate risk, and 24.0% as high risk, with corresponding 10-year DM-free rates of 95.5% (95% CI: 94.6%–96.3%), 89.5% (95% CI: 86.2%–92.4%), and 83.6% (95% CI: 81.1%–86.0%), respectively. Conclusion: We have successfully developed and evaluated an MMAI model across six Phase III randomized breast cancer trials, demonstrating its utility as a prognostic biomarker for predicting risk of DM in HR+ EBC patients. This non-tissue-destructive technology with faster turnaround is practical for real-world use and holds significant promise for personalizing patient care in breast cancer.
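The 10-year DM-free rates quoted per risk group are Kaplan-Meier estimates. A minimal sketch of that estimator is shown below; the function names and the toy follow-up data are hypothetical and are not taken from the trials, and the abstract does not describe the software actually used.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each event time.

    times  : follow-up in years
    events : 1 = distant metastasis observed, 0 = censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def survival_at(curve, horizon):
    """Estimated event-free probability at the given horizon (step function)."""
    s = 1.0
    for t, surv in curve:
        if t <= horizon:
            s = surv
    return s


# Toy data for illustration only (not trial data).
toy_times = [2, 4, 4, 6, 8, 11, 12, 15]
toy_events = [1, 0, 1, 0, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(survival_at(toy_curve, 10))
```

Applying `survival_at(curve, 10)` separately to the low, intermediate, and high risk groups would give the kind of 10-year DM-free rates reported above; confidence intervals are omitted from this sketch.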
Presentation number PD11-03
Predicting treatment outcomes in breast cancer from H&E slides using pathology foundation models with multiple instance learning
Anthony Sun, UCSF, San Francisco, CA
A. Sun1, S. Venters1, C. Yau1, D. Wolf2, G. Hirst1, M. Campbell1, A. Asare1, W. Symmans3, L. Brown-Swigart2, N. Hylton4, J. Perlmutter5, A. DeMichele6, D. Yee7, H. Rugo8, A. Borowsky9, F. Howard10, L. Esserman1, L. van’t Veer2, A. Basu1; 1Surgery, UCSF, San Francisco, CA, 2Laboratory Medicine, UCSF, San Francisco, CA, 3Anatomical Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, TX, 4Radiology, UCSF, San Francisco, CA, 5–, Gemini Group, –, CA, 6Hematology/Oncology, University of Pennsylvania, Philadelphia, PA, 7Hematology, Oncology, and Transplantation, University of Minnesota, Minneapolis, MN, 8Medical Oncology, UCSF, San Francisco, CA, 9Pathology and Lab Medicine, UC Davis, Sacramento, CA, 10Medicine, University of Chicago, Chicago, IL.
Background. Pathologic complete response (pCR) is the absence of residual invasive cancer in the breast and axillary lymph nodes after neoadjuvant therapy. In breast cancer treatment, pCR is a proven surrogate for long-term outcomes. However, accurately predicting pCR at diagnosis remains a clinical challenge. Current tools primarily rely on clinical, genomic, or transcriptomic data. Advances in computational pathology and deep learning enable the extraction of meaningful features from H&E-stained whole slide images (WSIs) to identify phenotypic biomarkers of response. Methods. We applied attention-based multiple instance learning (MIL) to predict pCR from pre-treatment H&E-stained frozen tumor biopsy WSIs in the I-SPY2 trial. A total of 3,306 WSIs from 911 patients across 13 treatment arms were tiled and filtered. Vectors of 1024 features were extracted per tile using the UNI pathology foundation model. The MIL model was independently trained and evaluated with 3-fold cross-validation on these vectors for each arm and tumor subtype. We compared model performance by AUROC between MIL and two elastic net regression models: one trained on pathologist-assessed features (tumor grade, DCIS, invasive histology, and lymphovascular invasion), and another adding clinical features: pre-treatment MRI functional tumor volume (FTV) and transcriptome-derived response predictive subtypes (RPS). Results. 298 of 911 patients achieved pCR. By subtype, the cohort comprised 142 HR+/HER2+, 347 HR+/HER2-, 85 HR-/HER2+, and 337 HR-/HER2- patients. MIL performance varied by arm (AUROC 0.501-0.893), with 6 arms achieving statistically significant performance (lower bound of the 95% CI > 0.5). The highest model performance was in HER2+ cohorts: (1) Paclitaxel + Trastuzumab and (2) Paclitaxel + Pertuzumab + Trastuzumab (AUROC = 0.893 and 0.785, respectively) (Table 1). Of the 6 arms, MIL outperformed the elastic net trained on pathologist-assessed histology features in 5 arms. After including FTV and RPS in the elastic net, MIL still outperformed in 3 arms.
Across subtypes, the model predicted better in HR+ subgroups (HR+/HER2- AUROC = 0.706, HR+/HER2+ AUROC = 0.677) than in HR- subgroups (HR-/HER2+ AUROC = 0.533, HR-/HER2- AUROC = 0.548). Conclusion. These findings demonstrate the feasibility of applying MIL vision models to predict treatment-specific response in breast cancer, even with frozen section WSIs and limited data. MIL detects important histology patterns not captured by conventional pathology. Even with added MRI and transcriptomic data, the model provides complementary predictive value. This approach enables early, accurate predictions from routine histology and supports personalized, less toxic treatment—particularly in under-resourced settings.
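To make the MIL step concrete, the sketch below shows attention pooling over per-tile feature vectors in its commonly used form: each tile gets a score from a small tanh layer, scores are softmax-normalized into attention weights, and the weighted sum of tile vectors feeds a logistic output. The 1024-dimensional tile vectors follow the abstract; the hidden size, the random weights, the tile count, and all function names are assumptions, and the abstract does not specify the authors' exact architecture or training procedure.

```python
import math
import random


def attention_pool(tiles, V, w):
    """Pool tile vectors into one slide vector using attention weights.

    tiles : list of feature vectors (one per tile)
    V     : hidden x dim projection matrix
    w     : hidden-length scoring vector
    """
    scores = []
    for h in tiles:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(tiles[0])
    slide = [sum(a * h[j] for a, h in zip(attn, tiles)) for j in range(dim)]
    return attn, slide


def predict_pcr(slide, c, b):
    """Logistic output on the pooled slide vector (probability of pCR)."""
    logit = sum(ci * zi for ci, zi in zip(c, slide)) + b
    return 1.0 / (1.0 + math.exp(-logit))


# Untrained random weights and synthetic tiles, for shape checking only.
random.seed(0)
DIM, HIDDEN, N_TILES = 1024, 8, 5
tiles = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TILES)]
V = [[random.gauss(0, 0.01) for _ in range(DIM)] for _ in range(HIDDEN)]
w = [random.gauss(0, 1) for _ in range(HIDDEN)]
c = [random.gauss(0, 0.01) for _ in range(DIM)]

attn, slide = attention_pool(tiles, V, w)
prob = predict_pcr(slide, c, 0.0)
print([round(a, 3) for a in attn], round(prob, 3))
```

In a trained model the attention weights indicate which tiles contributed most to the slide-level prediction, which is what allows such models to highlight histology regions associated with response.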
| Arm | Total | pCR | HR+/HER2+ | HR+/HER2- | HR-/HER2+ | HR-/HER2- | MIL AUROC (95% CI) | Hist. EN AUROC (95% CI) | Hist. + FTV + RPS EN AUROC (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| Paclitaxel + Trastuzumab | 27 | 7 | 17 | 0 | 10 | 0 | 0.893 (0.730–1.000) | 0.336 (0.047–0.639) | 0.575 (0.270–0.840) |
| Paclitaxel + Pertuzumab + Trastuzumab | 43 | 26 | 28 | 0 | 15 | 0 | 0.785 (0.640–0.905) | 0.500 (0.328–0.680) | 0.839 (0.695–0.944) |
| Paclitaxel + Ganitumab | 96 | 22 | 0 | 50 | 0 | 46 | 0.709 (0.564–0.841) | 0.518 (0.371–0.662) | 0.555 (0.403–0.702) |
| Paclitaxel | 164 | 26 | 0 | 88 | 0 | 76 | 0.702 (0.585–0.809) | 0.535 (0.416–0.650) | 0.517 (0.403–0.620) |
| T-DM1 + Pertuzumab | 52 | 30 | 35 | 0 | 17 | 0 | 0.683 (0.534–0.825) | 0.558 (0.410–0.724) | 0.695 (0.541–0.849) |
| Paclitaxel + ABT 888 + Carboplatin | 70 | 25 | 0 | 32 | 0 | 38 | 0.662 (0.537–0.790) | 0.612 (0.477–0.747) | 0.924 (0.860–0.978) |
| Paclitaxel + Neratinib | 108 | 38 | 38 | 18 | 23 | 29 | 0.618 (0.496–0.725) | 0.462 (0.350–0.576) | 0.415 (0.312–0.522) |
| Paclitaxel + Pembrolizumab | 64 | 31 | 0 | 35 | 0 | 29 | 0.616 (0.473–0.757) | 0.491 (0.347–0.634) | 0.844 (0.717–0.934) |
| Paclitaxel + AMG 386 + Trastuzumab | 19 | 6 | 15 | 0 | 4 | 0 | 0.615 (0.249–0.949) | 0.442 (0.155–0.726) | 0.564 (0.199–0.867) |
| Paclitaxel + AMG 386 | 100 | 30 | 0 | 55 | 0 | 45 | 0.602 (0.490–0.716) | 0.719 (0.598–0.826) | 0.846 (0.747–0.930) |
| Paclitaxel + MK-2206 | 52 | 16 | 0 | 22 | 0 | 30 | 0.582 (0.397–0.764) | 0.521 (0.360–0.677) | 0.693 (0.525–0.832) |
| Paclitaxel + MK-2206 + Trastuzumab | 25 | 16 | 9 | 0 | 16 | 0 | 0.549 (0.278–0.795) | 0.656 (0.407–0.869) | 0.622 (0.327–0.883) |
| Paclitaxel + Ganetespib | 91 | 25 | 0 | 47 | 0 | 44 | 0.501 (0.369–0.633) | 0.615 (0.500–0.727) | 0.521 (0.396–0.641) |
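The AUROC values in the table can be read as the probability that a randomly chosen pCR case receives a higher predicted score than a randomly chosen non-pCR case. A minimal sketch of that pairwise definition follows; the function name and toy scores are hypothetical, and the abstract does not state how the confidence intervals were computed, so none are shown.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positives and negatives.

    scores : predicted scores (higher = more likely pCR)
    labels : 1 = pCR, 0 = no pCR
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predictions for illustration only (not study data).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(auroc(toy_scores, toy_labels))
```

With small arms such as those with 19 to 27 patients, only a handful of positive-negative pairs exist, which is consistent with the wide confidence intervals reported for those rows.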
Presentation number PD11-04
Comparative performance of an AI-based digital pathology tool and genomic signatures in early ER+/HER2- breast cancer
Victor Aubert, Owkin, Paris, France
V. Aubert1, V. Gaury1, I. Garberis2, Z. Vaquette1, E. Hocquet1, D. Almaraz-Klippel3, F. Daidj2, D. Drubay4, D. Jacobs1, N. Arfaoui1, L. Guillou1, D. Lin1, J. Guillon1, C. Barcenas3, F. Andre5, S. Krishnamurthy6, M. Lacroix-Triki7; 1Diagnostics, Owkin, Paris, FRANCE, 2INSERM U981, Gustave Roussy, Villejuif, FRANCE, 3Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, 4Office of Biostatistics and Epidemiology, Gustave Roussy, Villejuif, FRANCE, 5Cancer Medicine, Gustave Roussy, Villejuif, FRANCE, 6Anatomical Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, 7Anatomical Pathology, Gustave Roussy, Villejuif, FRANCE.
Background: Prognostic tools are important for guiding adjuvant therapy in early breast cancer (EBC), particularly for patients with estrogen receptor-positive, HER2-negative (ER+/HER2-) tumors, where optimizing the treatment strategy, such as deciding whether to de-escalate, remains a clinical challenge (1). Recent advances in artificial intelligence (AI) applied to whole-slide images have opened new possibilities for capturing prognostic information directly from routinely available H&E slides of invasive breast tumors. This may offer a scalable, cost-effective, and accessible approach to personalized risk assessment. This study evaluates the prognostic performance of RlapsRisk BC (RR) (2), an AI digital pathology-based tool, in comparison with EndoPredict (EP) (3) and Oncotype DX (ODX) (4), using five-year distant recurrence-free survival (dRFS) as the primary endpoint. Methods: We conducted a retrospective analysis of ER+/HER2- EBC patients from two independent cohorts who underwent genomic testing as part of their routine clinical care. The first cohort comprised patients treated at Gustave Roussy (France) between 2016 and 2019, all of whom underwent the EP test (according to local guidelines), which combines gene expression with clinico-pathologic factors to produce the EPclin score (EP cohort). The second cohort included patients with node-negative disease treated at MD Anderson Cancer Center (USA), who underwent ODX testing and received adjuvant endocrine therapy alone (ODX cohort). For both cohorts, a minimum of five years of clinical follow-up was available. Diagnostic H&E-stained slides from surgical resection specimens were digitized at x20 and analyzed using RR. Results: In the EP cohort (n=381, 7 events), RR achieved an AUC of 0.73 versus 0.57 for EPclin, while in the ODX cohort (n=154, 42 events), its AUC reached 0.78 compared to 0.58 for ODX, indicating stronger discriminative ability in both settings.
RR classified fewer patients as high risk than EPclin (33% vs. 66%) and more patients as high risk than ODX (47.4% vs. 37.7%), suggesting improved alignment with observed risk. Among patients classified as low risk by each tool, the 5-year dRFS was 99.6% with RR and 98.3% with EPclin in the EP cohort, and 95.3% with RR compared to 80.2% with ODX in the ODX cohort. In the high-risk groups, 5-year dRFS was 96.3% with RR and 98.6% with EPclin in the EP cohort, and 65.3% with RR versus 84.0% with ODX in the ODX cohort. Notably, RR high-risk classification captured the majority of observed events in both cohorts (6 of 7 in EP and 35 of 42 in ODX), reinforcing its utility for identifying patients at true elevated risk. Conclusion: These findings suggest that RR may support treatment de-escalation by safely expanding the proportion of patients classified as low risk who experience excellent outcomes within their risk group, while also improving identification of patients with a genuinely higher likelihood of recurrence. This ability to improve risk stratification makes it a promising tissue-efficient alternative to current genomic assays such as ODX and EP in ER+/HER2- early breast cancer.
References
1. Krop I, Ismaila N, Andre F, et al. Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast cancer: American Society of Clinical Oncology clinical practice guideline focused update. J Clin Oncol. 2017;35(24):2838-2847. doi:10.1200/JCO.2017.74.0472
2. Garberis I, Gaury V, Saillard C, et al. Deep learning assessment of metastatic relapse risk from digitized breast cancer histological slides. Nat Commun. 2025;16:5876.
3. Dubsky P, Filipits M, Jakesz R, et al. EndoPredict improves the prognostic classification derived from common clinical guidelines in ER-positive, HER2-negative early breast cancer. Ann Oncol. 2012;24(3):640-647. doi:10.1093/annonc/mds358
4. Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817-2826.
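The event counts reported in the Results of this abstract can be turned into simple proportions. The sketch below only re-derives figures from the numbers given in the text (cohort sizes, event counts, and events falling in the RR high-risk group); the function and variable names are illustrative.

```python
def fraction(part, whole):
    """Simple proportion helper."""
    return part / whole


# Numbers as reported in the abstract.
EP_N, EP_EVENTS, EP_EVENTS_RR_HIGH = 381, 7, 6
ODX_N, ODX_EVENTS, ODX_EVENTS_RR_HIGH = 154, 42, 35

ep_event_rate = fraction(EP_EVENTS, EP_N)               # share of EP cohort with an event
odx_event_rate = fraction(ODX_EVENTS, ODX_N)            # share of ODX cohort with an event
ep_captured = fraction(EP_EVENTS_RR_HIGH, EP_EVENTS)    # events in RR high-risk group
odx_captured = fraction(ODX_EVENTS_RR_HIGH, ODX_EVENTS)

print(f"EP cohort:  event rate {ep_event_rate:.1%}, captured by RR high risk {ep_captured:.1%}")
print(f"ODX cohort: event rate {odx_event_rate:.1%}, captured by RR high risk {odx_captured:.1%}")
```

The capture fractions are about 86% and 83%, but the EP figure rests on only 7 events, so it is more sensitive to a single reclassified case than the ODX figure.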
Presentation number PD11-05
Inferring Spatial Genomic Expression profiles from H&E-Stained Histology Using Weakly Supervised Deep Learning in Breast Cancer
Shweta S Chavan, PATHOMIQ, Inc., New York, NY
S. S. Chavan1, C. Feng2, H. Muhammad3, H. S. Basu2, W. Huang2, R. Roy2, G. Wilding2, G. B. Mills4, S. Kummar5; 1Computational Biology, PATHOMIQ, Inc., New York, NY, 2Computational Biology, PATHOMIQ, Inc., Cupertino, CA, 3AI, PATHOMIQ, Inc., Cupertino, CA, 4Precision Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, 5Molecular Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR.
Background: Genomic markers such as ESR1, PGR, EGFR, MKI67, and FOXC1, along with clinically validated multigene signatures like Oncotype DX, MammaPrint, Prosigna, and EndoPredict, are widely used to inform prognosis and guide therapeutic decisions in breast cancer. However, the widespread use of these gene expression-based assays is often constrained by their high cost and limited accessibility. To address this challenge, we developed an AI model capable of inferring the spatial distribution of gene expression directly from routine digitized H&E-stained whole slide images (WSIs). Methods: We trained a weakly supervised multiple instance learning model using a publicly available dataset (TCGA) containing paired bulk RNA-seq and H&E WSIs across multiple disease types. Although trained solely on bulk tissue-level expression data, the model reconstructs spatial gene expression patterns, offering a scalable and cost-effective alternative to spatial transcriptomics. This pan-cancer model was trained to predict expression levels of thousands of genes, including known breast cancer-relevant genes such as ESR1, PGR, FOXA1, AURKA, BIRC5, MELK, MYBL2, PLK1, CDC20, CCNE1 and NAT1. Once trained, the model was evaluated on a held-out validation dataset of breast cancer WSIs to measure slide-level prediction performance using Pearson’s correlation. Spatial predictions were validated on external WSI datasets (HEST and CPTAC) that include associated spatial transcriptomics and on an external WSI dataset (IMPRESS) with associated immunohistochemistry (IHC) data. Results: Our model predicted the expression of over 2,500 genes with a Pearson correlation greater than 0.6. Notably, breast cancer-relevant genes such as ESR1, FOXA1, and AURKA showed particularly high correlation between predicted expression values and ground truth bulk RNA-seq measurements.
We also qualitatively compared the predicted spatial expression patterns of PD-L1, CD163, and CD8 with corresponding IHC-stained tissue sections. Regions of high predicted expression aligned with areas of high marker density in the IHC images. Additionally, comparison with spatial transcriptomic data from 10x Genomics demonstrated strong concordance between our model’s spatial predictions and the ground truth measurements, particularly for breast cancer-relevant genes. Conclusions: Our results demonstrate that routine H&E-stained whole slide images can be leveraged to accurately infer both bulk and spatial gene expression using a weakly supervised AI model. This capability enables large-scale, retrospective spatial analysis of archival tissue and opens new avenues for understanding tumor heterogeneity in breast cancer using standard pathology slides. By providing gene-level insights without the need for additional staining or molecular assays, this approach may augment current histopathological evaluation and support more informed clinical research, risk stratification, and treatment planning.
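The slide-level evaluation described in the Methods (Pearson's correlation between predicted and measured expression, per gene, with a 0.6 threshold) can be sketched as below. The gene labels and values are placeholders rather than study data, and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def genes_above(predicted, measured, threshold=0.6):
    """Genes whose predicted vs measured expression correlates above threshold.

    predicted, measured : dict mapping gene -> list of per-slide values
    """
    return [g for g in predicted if pearson(predicted[g], measured[g]) > threshold]


# Placeholder genes and values, one entry per slide (illustration only).
predicted = {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 2, 3, 4]}
measured = {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 1, 3, 2]}

for gene in predicted:
    print(gene, round(pearson(predicted[gene], measured[gene]), 3))
print(genes_above(predicted, measured))
```

Counting the genes returned by `genes_above` over the full gene panel corresponds to the "over 2,500 genes" summary statistic reported in the Results.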
Presentation number PD11-06
Discussant: Artificial Intelligence and Pathology
Frederick Matthew Howard, University of Chicago, Chicago, IL
Presentation number PD11-07
AI-driven risk reclassification in HR+/HER2− breast cancer: real-world comparison with the 21-gene assay
Elena Diana Chiru, Cancer Center Baselland, Liestal, Switzerland
E. D. Chiru1, L. Sojak1, J. Witowski2, K. Zeng2, C. Kurzeder3, S. Muenst4, M. Vetter1; 1Medical Oncology, Cancer Center Baselland, Liestal, SWITZERLAND, 2Medical Oncology, ATARAXIS AI, New York, NY, 3Medical Oncology, Breast Center, Basel, SWITZERLAND, 4Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, SWITZERLAND.
Background: Risk stratification in hormone receptor-positive breast cancer (HR+ BC) is essential for personalized therapy. While the 21-gene expression assay (Oncotype DX, ODX) guides chemotherapy (CHT) decisions, its intermediate-risk group often yields therapeutic uncertainty. The Ataraxis Breast model (ATX) is a multimodal artificial intelligence (AI) tool that integrates digital pathology with clinical variables to produce a continuous recurrence risk score. Trained on over 400 million pathology images, ATX has shown robust prognostic value in external cohorts. We evaluated ATX against ODX in real-world HR+ BC patients (pts) from Basel University Hospital (USB), focusing on the reclassification of intermediate-risk cases and overall prognostic performance. Methods: We retrospectively analyzed 269 HR+/HER2- BC pts treated at USB (2010-2021) with available ODX and digitized pathology slides. The primary endpoint was disease-free interval (DFI). ATX scores stratified pts into low vs high risk. Performance metrics included hazard ratios (HR), C-index, and 10-year calibration. Risk reclassification was assessed by comparing ATX vs ODX categories. Multivariate Cox models included age, tumor grade, and ODX score. Subgroup analyses were done for node-positive and CHT-treated pts. Results: ATX high-risk pts had significantly reduced DFI vs low-risk pts (HR = 2.37, p = 0.032; C-index = 0.70). In contrast, ODX intermediate-risk pts (n=167) showed poor outcome separation (HR = 1.61, p = 0.39). ATX reclassified 77% of intermediate ODX pts to low risk, and 23% to high risk. Among low ODX pts, 33% were up-classified by ATX; 44% of high ODX pts were down-classified. ATX scores aligned with clinical risk features (nodal status, tumor size) and actual outcomes.
In multivariate analysis, ATX was a strong independent predictor (HR = 4.18, p < 0.001) (Table 1), whereas ODX was not. In subgroups, ATX outperformed ODX: the C-index was 0.75 in node-positive pts and 0.80 in those receiving CHT. Performance remained strong across tumor stages, age, and histological types, supporting ATX’s versatility. Calibration analysis showed ATX predictions aligned closely with observed 10-year recurrence risks (R² = 0.85). Conclusions: ATX showed superior prognostic accuracy and reclassification capacity vs ODX in HR+ BC pts at USB. It clarified risk for most intermediate ODX cases, supporting more precise treatment decisions. ATX may guide CHT de-escalation in low-risk pts and intensification in high-risk cases. These findings support ATX’s clinical integration and prospective validation.
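The C-index reported for ATX measures how often, among comparable patient pairs, the patient with the earlier event also has the higher predicted risk. A minimal sketch of Harrell's concordance index follows; the function name and toy data are hypothetical, and the abstract does not state which implementation was used.

```python
def c_index(times, events, risk):
    """Harrell's concordance index for right-censored data.

    times  : follow-up time per patient
    events : 1 = recurrence observed, 0 = censored
    risk   : predicted risk score (higher = expected earlier event)
    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only (not study data).
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.5, 0.6, 0.1]
print(c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.70 to 0.80 range reported here.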
Presentation number PD11-08
Time Implications of an Oncology Intelligence Platform vs Standard Practice: A UK Single Centre Breast Cancer Multi Disciplinary Team Meeting Simulation Trial
Olubukola Ayodele, University of Leicester, Leicester, United Kingdom
J. Tan1, L. Cook2, R. Williams2, A. Khan3, A. Tiwari4, P. Garodia5, M. Hasanova6, B. Lamb7, O. Ayodele8, S. Adomah9, G. Langton10, R. Pearson11, A. Ghose12, A. Maniam2; 1School of Medical Sciences, University of Manchester, Manchester, UNITED KINGDOM, 2Cancer Services, Isle of Wight NHS Trust, Newport, UNITED KINGDOM, 3Kent and Medway Medical School, University of Kent, Canterbury, UNITED KINGDOM, 4University College London Hospital Cancer Collaborative, Princess Alexandra Hospital NHS Trust, Harlow, UNITED KINGDOM, 5Division of Medicine, University College London, London, UNITED KINGDOM, 6Chief Executive Officer, OncoFlow AI, London, UNITED KINGDOM, 7MDT Development, North East London Cancer Alliance, London, UNITED KINGDOM, 8Leicester Cancer Research Centre, University of Leicester, Leicester, UNITED KINGDOM, 9Breast Unit, The Royal Marsden NHS Foundation Trust, London, UNITED KINGDOM, 10Clatterbridge Cancer Centre, The Clatterbridge Cancer Centre NHS Foundation Trust, Wirral, UNITED KINGDOM, 11First 4 Health Group – E7 Health, Lord Lister Health Centre, London, UNITED KINGDOM, 12Chief Medical Officer, OncoFlow AI, London, UNITED KINGDOM.
Background Multi-Disciplinary Team Meetings (MDTMs) are the “gold standard” in the UK cancer care continuum. The National Health Service (NHS) England conducts 55,000 MDTMs annually, consuming over 1.2 million hours of clinician time. This spans pre-MDTM case preparation through to discussion during the MDTM. At the Isle of Wight (IOW), weekly breast cancer MDTMs typically review 30-40 cases over Microsoft (MS) Teams as standard practice, occupying ~2 hours of clinicians’ time. For the same case volumes, case preparation requires an estimated 50-80 hours per week across three key roles: surgeon, radiologist, and pathologist. The Oncoflow™ platform uses Artificial Intelligence, primarily large language models (LLMs), to perform Data Extraction and Treatment Matching to assist Cancer MDTMs. A simulation trial was performed with the objective of evaluating the time-saving benefit of this platform against standard practice in MDTM case preparation. Methods Two prospective, synchronous, simulation (non-EHR/Electronic Health Record integrated) breast cancer MDTMs were conducted via MS Teams in 2 phases. The MDT consisted of 1 MDT Coordinator, 1 Surgeon, 1 Medical Oncologist and Radiation Oncologist equivalent from the IOW. Phase 1 was the Standard Arm, whereas Phase 2 was the Intervention Arm, which implemented the Oncoflow™ AI-powered Cancer MDTM Coordinator CoPilot software platform, a class 1 UKCA (UK Conformity Assessed) MHRA (Medicines and Healthcare products Regulatory Agency) registered medical device (RN 32434). A set of 10 breast cancer cases was discussed in each Arm. The cases differed between Arms but were complexity-matched, with equal numbers of “simple” (5), “edge” (2) and “complex” (3) cases. Disease stage varied: 5 cases each were early post-operative and metastatic. Results OncoFlow achieved 100% accuracy in LLM-assisted data extraction and parameter validation.
Standard vs Intervention had the following time implications: the entire MDTM case preparation time was 120 minutes vs 38 seconds, and active MDTM case discussion lasted 29 vs 26 minutes. Broken down by complexity, discussion times were 14m 7s vs 8m 17s for the 5 “Simple” cases, 5m 12s vs 5m 40s for the 2 “Edge” cases and 10m 26s vs 12m 20s for the 3 “Complex” cases. Projecting these results to the Real-World IOW Breast Cancer MDTM of 30-40 cases gives Case Preparation Time Savings of 6-8 hours/meeting and 310-414 hours annually. A paired t-test showed this difference to be statistically significant vs standard practice (p=0.000000002535; 95% CI: 10.77, 13.11). Using OncoFlow, time savings of 35-47 minutes/MDTM and 30-40 hours/year were noted during Simple Case active discussion, along with 12-16 minutes/meeting and 10-14 hours/year of Net MDT time. Complex Case discussions were longer by 19-25 minutes/MDTM and 16-22 hours/year vs standard. Conclusion The Oncoflow™ Intelligent Platform was about 190 times faster than standard manual case preparation for MDTM presentation. It also demonstrated MDTM time streamlining implications, cutting “simple” case discussion time by 40% and creating more capacity for highly robust “complex” case discussions with a 20% time surplus. These findings align well with the NHS England 2020 Streamlining MDTMs guidance. Engagement with other sites nationally is ongoing for a multicentre, national, simulation MDTM experience with greater sample size.
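The headline figures in this abstract can be re-derived from the reported timings. The following is a minimal sketch, not the authors' analysis: it assumes linear per-case scaling from the 10 simulated cases and 52 meetings per year, neither of which is stated in the abstract, and all names are illustrative.

```python
# Re-derive the reported speed-up and projected preparation savings.
# Assumptions (not stated in the abstract): linear per-case scaling
# from the 10 simulated cases, and 52 weekly meetings per year.

PREP_STANDARD_S = 120 * 60  # manual preparation: 120 minutes for 10 cases
PREP_PLATFORM_S = 38        # platform preparation: 38 seconds for 10 cases
CASES_SIMULATED = 10
MEETINGS_PER_YEAR = 52      # assumption


def speedup(standard_s: float, platform_s: float) -> float:
    """How many times faster the platform is than standard practice."""
    return standard_s / platform_s


def prep_hours_saved_per_meeting(cases: int) -> float:
    """Preparation hours saved for a meeting with the given number of cases."""
    saved_per_case_s = (PREP_STANDARD_S - PREP_PLATFORM_S) / CASES_SIMULATED
    return saved_per_case_s * cases / 3600


def relative_change(standard_s: float, platform_s: float) -> float:
    """Fractional change in duration; negative means shorter with the platform."""
    return (platform_s - standard_s) / standard_s


if __name__ == "__main__":
    print(f"speed-up: {speedup(PREP_STANDARD_S, PREP_PLATFORM_S):.1f}x")
    for cases in (30, 40):
        per_meeting = prep_hours_saved_per_meeting(cases)
        per_year = per_meeting * MEETINGS_PER_YEAR
        print(f"{cases} cases: {per_meeting:.1f} h/meeting, {per_year:.0f} h/year")
    # Discussion times in seconds: simple 14m07s vs 8m17s, complex 10m26s vs 12m20s
    print(f"simple cases:  {relative_change(847, 497):+.0%}")
    print(f"complex cases: {relative_change(626, 740):+.0%}")
```

Under these assumptions the sketch gives values in line with the reported speed-up, the 6-8 hours/meeting and 310-414 hours/year projections, and the roughly 40% and 20% changes in discussion time.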
Presentation numberPD11-09
The St. Gallen AI Consensus – Should AI have a vote?
Anton Oseledchyk, University Hospital Basel, Basel, Switzerland
A. Oseledchyk1, W. Weber2, B. Kasenda1; 1Medical Oncology, University Hospital Basel, Basel, SWITZERLAND, 2Breast Cancer Center, University Hospital Basel, Basel, SWITZERLAND.
Introduction: Early breast cancer treatment requires complex individual risk assessment integrating tumor biology, staging, and patient factors. This complexity creates heterogeneous treatment approaches globally. Expert panels provide consensus guidance, but artificial intelligence may offer advantages including real-time access to all published data and freedom from individual, institutional, cultural, and emotional biases. This study represents the first comprehensive comparison between AI-generated breast cancer treatment recommendations and expert panel consensus across early-stage breast cancer management. Methods: We analyzed 80 distinct clinical scenarios from the final voting of the 2025 St. Gallen International Breast Cancer Conference. We included breast/axillary surgery, radiation therapy, systemic treatment, elderly care, and recurrence management. We excluded scenarios focusing on genetic risk (e.g. BRCA testing) and DCIS. Clinical scenarios were presented to four Large Language Models (LLMs): Claude Sonnet 4, Google Gemini 2.5 Flash, ChatGPT-4o and DeepSeek-V3. Dates of interaction were July 8th and 9th, 2025. We designed three sequential prompts to (1) answer all clinical scenarios selecting single best options, (2) compare AI responses with expert panel percentages and identify disagreements, (3) analyze disagreements with evidence-based rationales for both AI and expert positions. The primary outcome was the agreement rate between AI and expert consensus, defined as identical answers for standard questions. Results: Overall agreement rates varied substantially: ChatGPT 60.0% (48/80), Gemini 57.5% (46/80), DeepSeek 48.8% (39/80), Claude 26.3% (21/80). Agreement differed dramatically by clinical category, ranging from 70.8% (endocrine therapy, n=6) to 25.0% (radiation therapy, n=5). Further, LLMs showed 56.8% agreement in axillary surgery (n=11), 58.3% in sentinel lymph node omission (n=3) and 35.9% for genomic risk scores (n=16).
In a second step, we unblinded each LLM to the expert opinion and the answers of the other LLMs and prompted each LLM to review its discordant answers. The LLMs revised their initial answers and accepted the panelists’ recommendation for 50-100% of these questions (ChatGPT 50% [16/32]; Claude 100% [59/59]). Still, 19%-50% of discordant answers remained unchanged (DeepSeek 19% [8/41]; ChatGPT 50% [16/32]). Further in-depth analysis regarding the evidence-based reasoning of AI will be presented at the meeting. Conclusion: There was poor alignment between AI and expert medical consensus in early breast cancer treatment decisions. This reveals current limitations in AI’s ability to integrate complex and multifactorial clinical reasoning. At the same time, AI offers an unbiased and comprehensive data-driven perspective, effectively serving as a critical mirror for expert panels.
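The agreement rates and the discordant-answer denominators reported in this abstract follow directly from the concordant counts out of 80 scenarios. A minimal sketch of that bookkeeping, using only the counts given in the abstract (the names are illustrative, not the authors' code):

```python
# Agreement bookkeeping from the counts reported in the abstract.
TOTAL_SCENARIOS = 80

# Concordant answers per model, as reported.
CONCORDANT = {"ChatGPT": 48, "Gemini": 46, "DeepSeek": 39, "Claude": 21}

# Discordant answers left unchanged after unblinding, where reported.
UNCHANGED = {"ChatGPT": 16, "DeepSeek": 8}


def agreement_rate(concordant: int, total: int = TOTAL_SCENARIOS) -> float:
    """Percentage of scenarios where the model matched the panel."""
    return 100 * concordant / total


def discordant_count(concordant: int, total: int = TOTAL_SCENARIOS) -> int:
    """Number of scenarios where the model disagreed with the panel."""
    return total - concordant


if __name__ == "__main__":
    for model, hits in CONCORDANT.items():
        misses = discordant_count(hits)
        line = f"{model}: {agreement_rate(hits):.2f}% agreement, {misses} discordant"
        if model in UNCHANGED:
            share = 100 * UNCHANGED[model] / misses
            line += f", {share:.1f}% of discordant answers unchanged"
        print(line)
```

The discordant counts (32 for ChatGPT, 41 for DeepSeek, 59 for Claude) match the denominators quoted for the second step.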
Presentation numberPD11-10
Optimizing HER2 Diagnostic Pathways: AI Assistance Enriches Gene Amplified Cases in Equivocal Category and Reduces Turnaround Time
Manuela Vecsler, Ibex Medical Analytics, Tel Aviv, Israel
M. Vecsler1, S. Krishnamurthy2, S. Schnitt3, A. Vincent-Salomon4, E. Provenzano5, R. Canas-Marques6, L. Arnould7, E. Shearon8, P. Chandra9, P. Borkowski10, S. Declercq11, J. Loane12, A. Gunavardhan13, L. Di Tommaso14, V. Krauss15, P. Richard16, M. Brevet17, M. Grinwald1, D. Mevorach1, R. Ziv1, S. Stein1, G. Mallel1, M. J. T. Senior18, R. J. Hill18, J. Longshore19, S. Judith1, C. Linhart1; 1-, Ibex Medical Analytics, Tel Aviv, ISRAEL, 2Department of Pathology, MD Anderson Cancer Center, TX, TX, 3Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, 4Department of Pathology, Institut Curie, Paris, FRANCE, 5Department of Histopathology, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, UNITED KINGDOM, 6Department of Pathology, Champalimaud Clinical Center, Lisbon, PORTUGAL, 7Department of Pathology, Center Georges-Francois Leclerc, Dijon, FRANCE, 8Department of Pathology, Alverno Laboratories, Hammond, IN, 9Department of Pathology, PathGroup Labs, Nashville, TN, 10Department of Pathology, Quest Diagnostics, Tampa, FL, 11Department of Pathology, Ziekenhuis netwerk, Antwerpen, BELGIUM, 12Department of Pathology, NHS Greater Glasgow and Clyde, Glasgow, UNITED KINGDOM, 13Glan Clwyd Histopathology Department, Betsi Cadwaladr University Health Board, NHS, Wales, UNITED KINGDOM, 14Pathology Unit, IRCCS Humanitas Research Hospital, Milan, ITALY, 15Department of Pathology, Diagnósticos da América, S.A, BRAZIL, 16Department of Pathology, MediPath, Frejus, FRANCE, 17Department of Pathology, Cypath, Lyon, FRANCE, 18AZ, AstraZeneca, Cambridge, UNITED KINGDOM, 19AZ, AstraZeneca, Durham, UNITED KINGDOM.
Background HER2 expression is a key prognostic and treatment-influencing factor in breast cancer and is assessed for all invasive breast carcinoma (BC) cases. As with all immunohistochemistry (IHC) staining, the visual interpretation of HER2 expression is subjective and semi-quantitative, which leads to intra- and inter-pathologist variability. One of the accepted reference standards for determining HER2 status in equivocal HER2 2+ cases is in situ hybridization (ISH) to assess gene amplification from FFPE tumor samples. However, assessment of gene amplification by ISH can significantly increase result turnaround times. Here, we evaluate the clinical utility (accuracy and user feedback) of an artificial intelligence (AI)-aided HER2 IHC scoring solution on whole-slide images of HER2 IHCs of breast samples. Methods The study cohort included both biopsies and excisions from 2,300 patients from 13 US, EU, and UK clinical laboratories, including academic medical centers and reference/private laboratories. HER2 IHC slides of diverse BC subtypes from primary and metastatic tumors were stained with anti-HER2 antibody (4B5, VENTANA) and scanned with different scanners. This observational two-arm multi-reader study compared the performance of 28 pathologists (“readers”) on HER2 scoring (each pathologist reviewed 50-200 slides) unassisted vs. aided by an AI HER2 solution (Ibex Breast HER2®). The AI tool automatically detects the invasive tumor area and on-slide controls, classifies tumor cells based on their HER2 staining pattern, and derives a slide-level HER2 IHC score by applying ASCO/CAP guidelines. Both study arms were compared to the ground truth (GT), established as the majority score of three expert breast pathologists who reviewed the slides manually and included ISH results for HER2 IHC 2+ slides.
Results Across 8,226 slide reviews, pathologists achieved a significant improvement in overall accuracy when assisted by the AI tool, increasing from 76.4% without AI assistance to 82.0% (P<0.0001 by McNemar test). Importantly, accuracy improved across all clinical HER2 categories (null, ultralow, low, positive), increasing from 79.3% to 84.4% with AI support. Interestingly, AI-assisted pathologists classified fewer cases as equivocal (HER2 2+), reducing the proportion of slides requiring ISH testing from 29.0% to 18.6%, which may translate to a 35% reduction in patient turnaround time. Feedback from the reader pathologists’ user survey indicated that AI assistance increased confidence in HER2 scoring accuracy and consistency. Additionally, 83% of pathologists expressed motivation to continue using AI-assisted HER2 scoring over their standard manual scoring. Conclusions AI-assisted HER2 scoring significantly improved both diagnostic accuracy and score consistency across diverse breast cancer cases and clinical settings. Importantly, the use of AI reduced equivocal (HER2 2+) classifications and enriched for HER2 gene amplification among cases classified as HER2 2+, potentially enabling faster diagnostic turnaround. Pathologist feedback indicated increased confidence in scoring and strong motivation to adopt AI-assisted workflows over manual interpretation. These findings highlight the value of AI systems in biomarker interpretation, providing pathologists with enhanced decision-making tools with explainability at the individual cell level and improving diagnostic precision in HER2 IHC interpretation. Improved scoring accuracy and consistency can support more reliable patient stratification and treatment selection, helping to ensure that patients receive the most appropriate HER2-targeted therapies.
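One reading of the 35% figure is that it is the relative reduction in the share of slides referred for ISH. The sketch below shows that arithmetic only; equating it with turnaround time assumes turnaround scales with the ISH referral rate, which the abstract does not state, and the function names are illustrative.

```python
# Relative vs. absolute change for the percentages reported in the abstract.

def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    return (before_pct - after_pct) / before_pct


def absolute_gain(before_pct: float, after_pct: float) -> float:
    """Change in percentage points."""
    return after_pct - before_pct


if __name__ == "__main__":
    # Slides requiring ISH: 29.0% unassisted vs 18.6% AI-assisted
    print(f"ISH referrals: {relative_reduction(29.0, 18.6):.1%} relative reduction")
    # Overall accuracy: 76.4% unassisted vs 82.0% AI-assisted
    print(f"accuracy: {absolute_gain(76.4, 82.0):+.1f} percentage points")
```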
Presentation numberPD11-11
Multimodal Deep Learning for Recurrence Stratification for Early-Stage Breast Cancer in Resource-Constrained Environments
Mohan Uttarwar, 1Cell Ai, Foster City, CA
J. Shinde1, A. Ulle2, C. Cha Chinglemba2, Y. Thoudam2, T. Gupte2, S. PM2, G. Shafi2, H. Kothavade2, E. Gustafson3, R. Jawale4, A. Khan4, R. Kolhe5, K. Bloom6, M. Uttarwar6; 1OncoPredikt, OneCell Diagnostics Inc., New York, NY, 2OncoPredikt, 1Cell Ai, Mumbai, INDIA, 3Clinical Services, CorePlus, Carolina, PUERTO RICO, 4Department of Pathology, Baystate Health, Springfield, MA, 5Department of Pathology, Medical College of Georgia, Augusta University, Augusta, GA, 6OncoPredikt, 1Cell Ai, Foster City, CA.
Background: Oncologists currently rely on genomic assays such as Oncotype Dx, MammaPrint and EndoPredict to guide adjuvant chemotherapy decisions in early-stage breast cancer. While offering improved risk stratification, they are costly, time-consuming, and inaccessible in many settings. Whole-slide H&E images, already generated for every patient, embed complementary prognostic biology that remains under-exploited in daily practice. Advances in artificial intelligence (AI) have shown promise in predicting molecular biomarkers directly from H&E-stained histopathology slides. In this study, we validate TRINITY AI, a multimodal deep learning platform that integrates morphology, clinicopathologic variables, and AI-inferred transcriptomics from H&E slides to predict a recurrence risk score. Methods: TRINITY AI was trained on 1219 breast cancer cases from TCGA and CPTAC. A self-supervised transformer-based foundation model was used to infer transcriptomic profiles from H&E whole-slide images (WSIs). These inferred profiles, along with clinical variables (e.g. patient age, tumour size, grade, nodal status) and morphological features, are embedded into a shared latent space to learn patient-level cross-modal relationships and yield a continuous risk score. External diagnostic validation employed three distinct cohorts (n = 166) with matched Oncotype Dx scores for qualified slides that underwent strict quality control and pathologist confirmation. Prognostic utility was interrogated in 1051 TCGA cases with ≥5-year distant-recurrence follow-up using multivariable Cox models (endpoint = distant recurrence-free interval [DRFI]). Results: Against the pooled external cohort, TRINITY AI classified low-risk disease with specificity 95%, negative predictive value (NPV) 92%, and area-under-the-curve 0.88, outperforming clinicopathologic nomograms and approximating Oncotype Dx. In cohort-specific analysis, it achieved an NPV of approximately 87% or higher and a specificity of 92% or higher.
High-risk predictions conferred a 3.80-fold increase in DRFI hazard (95% CI 2.08-7.18) after adjustment for size, grade, nodal status, and subtype; C-index 0.698 (95% CI: 0.622-0.770). Conclusions: TRINITY AI transforms the ubiquitous H&E slide into a genomics-grade breast cancer recurrence assay that resides within the pathologist’s digital workflow. Its performance across independent cohorts, high rule-out capacity, rapid turnaround, and fractional cost make it a pragmatic solution for equitable precision oncology. Critically, it offers a new option in underdeveloped and developing countries, where patients are often forced to choose between the cost of chemotherapy and that of expensive genomic testing. Its survival performance further supports TRINITY AI as a robust, independent predictor of distant breast cancer recurrence.
| Metric | Cohort 1 (n=65) | Cohort 2 (n=43) | Cohort 3 (n=38) | Combined (n=146) |
| Sensitivity | 86.67% | 44.44% | 25% | 64.29% |
| Specificity | 92% | 97.06% | 97.06% | 94.92% |
| PPV | 76.47% | 80% | 50% | 75% |
| NPV | 95.83% | 86.84% | 91.67% | 91.8% |
| Number of samples | 65 | 43 | 38 | 146 |
| Positive, n (%) | 15 (23.08%) | 9 (20.93%) | 4 (10.53%) | 28 (19.18%) |
| Negative, n (%) | 50 (76.92%) | 34 (79.07%) | 34 (89.47%) | 118 (80.82%) |
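The percentages in the table above are consistent with a single set of confusion-matrix counts per cohort. The sketch below is an illustration rather than the authors' data: the true/false positive and negative counts are back-calculated from the reported percentages and class sizes (an assumption), and "positive" follows the table's own labelling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier."""
    tp: int
    fn: int
    tn: int
    fp: int

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# ASSUMPTION: counts inferred from the reported percentages, not published data.
COHORTS = {
    "Cohort 1": Confusion(tp=13, fn=2, tn=46, fp=4),
    "Cohort 2": Confusion(tp=4, fn=5, tn=33, fp=1),
    "Cohort 3": Confusion(tp=1, fn=3, tn=33, fp=1),
}


def pool(cohorts: dict) -> Confusion:
    """Sum counts across cohorts to obtain the combined confusion matrix."""
    values = list(cohorts.values())
    return Confusion(
        tp=sum(c.tp for c in values),
        fn=sum(c.fn for c in values),
        tn=sum(c.tn for c in values),
        fp=sum(c.fp for c in values),
    )


if __name__ == "__main__":
    for name, c in {**COHORTS, "Combined": pool(COHORTS)}.items():
        print(
            f"{name}: sens {c.sensitivity():.2%}, spec {c.specificity():.2%}, "
            f"PPV {c.ppv():.2%}, NPV {c.npv():.2%}"
        )
```

Pooling counts rather than averaging per-cohort percentages is what reproduces the Combined column, since the cohorts differ in size and prevalence.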
Presentation numberPD11-12
Discussant: Large Language Models and Risk Stratification
Brett Beaulieu-Jones