Title
AsymMirai: Interpretable Mammography-based Deep Learning Model for 1–5-year Breast Cancer Risk Prediction
Background
Mirai, a state-of-the-art deep learning–based algorithm for predicting short-term breast cancer risk, outperforms standard clinical risk models. However, Mirai is a black box, risking overreliance on the algorithm and incorrect diagnoses.
Method
This retrospective study involved mammograms obtained from patients in the EMory BrEast imaging Dataset, known as EMBED, from January 2013 to December 2020. To approximate 1–5-year breast cancer risk predictions from Mirai, another deep learning–based model, AsymMirai, was built with an interpretable module: local bilateral dissimilarity (localized differences between left and right breast tissue). Pearson correlation coefficients were computed between the risk scores of Mirai and those of AsymMirai. Subgroup analysis was performed in patients for whom AsymMirai’s year-over-year reasoning was consistent. AsymMirai and Mirai risk scores were compared using the area under the receiver operating characteristic curve (AUC), and 95% CIs were calculated using the DeLong method.
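The two comparison statistics named above can be sketched in a few lines. This is a minimal illustration on made-up toy scores, not the study's analysis code: the variable names and values are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formulation, and the DeLong confidence intervals used in the study are not implemented here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: one risk score per examination from each model,
# plus a 0/1 label for cancer within the prediction horizon.
mirai_scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.20]
asym_scores = [0.15, 0.30, 0.45, 0.70, 0.75, 0.10]
cancer_within_horizon = [0, 0, 1, 1, 1, 0]

r = pearson_r(mirai_scores, asym_scores)
auc_mirai = roc_auc(mirai_scores, cancer_within_horizon)
auc_asym = roc_auc(asym_scores, cancer_within_horizon)
print(f"r = {r:.4f}, Mirai AUC = {auc_mirai:.3f}, AsymMirai AUC = {auc_asym:.3f}")
```

The pairwise loop is quadratic in the number of examinations and is only meant to make the definition visible; a sorted-rank implementation would be the usual choice at the scale of EMBED.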
Conclusion
Localized bilateral dissimilarity, an imaging marker for breast cancer risk, approximated the predictive power of Mirai and was a key to Mirai’s reasoning.
Results
Screening mammograms (n = 210067) from 81824 patients (mean age, 59.4 years ± 11.4 [SD]) were included in the study. Deep learning–extracted bilateral dissimilarity produced risk scores similar to those of Mirai (1-year risk prediction, r = 0.6832; 4–5-year prediction, r = 0.6988) and achieved performance similar to that of Mirai. For AsymMirai, the 1-year breast cancer risk AUC was 0.79 (95% CI: 0.73, 0.85) (Mirai, 0.84; 95% CI: 0.79, 0.89; P = .002), and the 5-year risk AUC was 0.66 (95% CI: 0.63, 0.69) (Mirai, 0.71; 95% CI: 0.68, 0.74; P < .001). In a subgroup of 183 patients for whom AsymMirai repeatedly highlighted the same tissue over time, AsymMirai achieved a 3-year AUC of 0.92 (95% CI: 0.86, 0.97).
Figure
Figure 1: Exclusion flowchart for the validation cohort. The EMory BrEast imaging Dataset (EMBED) validation split included 23382 patients and 76373 examinations from 2013 to 2020. Examinations with data abnormalities (42 patients, 1344 examinations), examinations without two-dimensional (2D) images (88 patients, 2271 examinations), examinations without all four screening views (5810 patients, 28175 examinations), and diagnostic examinations (1228 patients, 2595 examinations) were excluded. The resulting cohort included 16314 patients with 41988 examinations. The number of patients and examinations with sufficient follow-up data to evaluate 1-year (16314 patients, 41988 examinations), 2-year (10523 patients, 28895 examinations), 3-year (8408 patients, 21274 examinations), 4-year (6807 patients, 15414 examinations), and 5-year (5419 patients, 10598 examinations) areas under the receiver operating characteristic curve are at the bottom of the figure.
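The examination counts in the Figure 1 caption can be checked with simple subtraction. The short sketch below uses only the numbers quoted in the caption; the dictionary keys are descriptive labels chosen here for readability.

```python
# Examination counts quoted in the Figure 1 caption.
validation_exams = 76373
excluded_exams = {
    "data abnormalities": 1344,
    "no 2D images": 2271,
    "not all four screening views": 28175,
    "diagnostic examinations": 2595,
}

# Subtract every exclusion category from the validation split.
remaining_exams = validation_exams - sum(excluded_exams.values())
print(remaining_exams)  # 41988, matching the caption
```

The patient counts do not subtract the same way (23382 minus the four excluded patient counts gives 16214, not 16314). One plausible reading is that some patients contributed both excluded and retained examinations, so the patient-level exclusions are not mutually exclusive with the final cohort.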
Figure 2: Architecture comparison of AsymMirai (left) and Mirai (right). Both models feed the four screening views into the same convolutional neural network (CNN) layers, but reasoning diverges thereafter. AsymMirai has fewer computational layers and instead calculates differences in the latent features, as shown by heat maps in the craniocaudal (CC) asymmetry and mediolateral oblique (MLO) asymmetry steps. AsymMirai then finds the prediction window containing the highest differences for each view, represented by red boxes in the Get Prediction Window step. The maximum feature differences within these windows are averaged to create a risk score. The Mirai architecture was described by Yala et al (13). AHL = additive hazard layer.
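The AsymMirai steps described in the Figure 2 caption (feature differences, prediction window, averaged maxima) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: the feature maps are tiny single-channel grids rather than CNN outputs, the difference is a plain absolute difference after mirroring one side, the window is chosen by largest summed difference, and the window size and all function names are hypothetical.

```python
def asymmetry_map(left, right):
    """Absolute difference between the left map and the mirrored right map."""
    mirrored_right = [row[::-1] for row in right]  # flip so anatomy lines up
    return [[abs(a - b) for a, b in zip(l_row, r_row)]
            for l_row, r_row in zip(left, mirrored_right)]

def best_window(diff, win_h, win_w):
    """(top, left) of the window with the largest summed difference."""
    rows, cols = len(diff), len(diff[0])
    best_total, best_pos = float("-inf"), (0, 0)
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            total = sum(diff[i][j]
                        for i in range(top, top + win_h)
                        for j in range(left, left + win_w))
            if total > best_total:  # first window wins on ties
                best_total, best_pos = total, (top, left)
    return best_pos

def risk_score(views, win_h=2, win_w=2):
    """Average, over views, of the maximum difference inside each window."""
    maxima, windows = [], []
    for left, right in views:
        diff = asymmetry_map(left, right)
        top, lft = best_window(diff, win_h, win_w)
        windows.append((top, lft))
        maxima.append(max(diff[i][j]
                          for i in range(top, top + win_h)
                          for j in range(lft, lft + win_w)))
    return sum(maxima) / len(maxima), windows

# Toy "latent features" for the CC and MLO views (made-up values).
cc_left = [[3, 1, 1, 1], [1, 1, 1, 1], [7, 1, 1, 1]]
cc_right = [[1, 1, 1, 3], [1, 1, 1, 1], [1, 1, 1, 1]]
mlo_left = [[1, 1, 1, 1], [1, 1, 5, 1], [1, 1, 1, 1]]
mlo_right = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

score, windows = risk_score([(cc_left, cc_right), (mlo_left, mlo_right)])
print(score, windows)  # 5.0 [(1, 0), (0, 1)]
```

In the toy CC view, the value 3 in the top row is mirrored on both sides and therefore contributes no difference, while the unmatched 7 drives the window selection. The returned window positions are what would be drawn as red boxes.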
Figure 3: AsymMirai model outputs. Input images are full-field screening mammograms. The two bilateral screening images are overlaid within the heat map, and the prediction window (red box) indicates the area with the highest dissimilarity. The heat map and prediction window are visualizations of AsymMirai’s model outputs, not post hoc saliency maps such as GradCAM. Analyzing these outputs provides a deeper understanding of the scores, in these cases distinguishing confounded reasoning from nonconfounded reasoning for patients with macro asymmetries. (A–C) Images in patients who developed cancer within 1–5 years. (A) In a 49-year-old White woman with unilateral breast augmentation who underwent annual screening, AsymMirai predicted high risk for developing cancer. Biopsy confirmed invasive ductal carcinoma in the right breast 5 years later. The prediction window was not affected by the unilateral implant. (B) In a 43-year-old African American woman with initial screening at 42 years old, AsymMirai predicted high risk of developing cancer. The prediction window corresponds to retroareolar asymmetry. Biopsy performed 4 years later confirmed invasive ductal carcinoma in the right breast. Intramammary lymph nodes were correctly ignored. (C) In a 50-year-old African American woman with regular screening and coarse heterogeneous calcifications at the 12-o’clock position, AsymMirai predicted high risk for developing cancer. Biopsy confirmed bilateral invasive ductal carcinoma 20 months later, with the cancer in the left breast occurring in the 12-o’clock position. (D–F) Images in patients who did not develop cancer but had identifiably confounded risk predictions. (D) In a 60-year-old White woman with bilateral breast augmentation and regular screening mammograms, AsymMirai predicted moderate risk for developing cancer, confounded by artificial asymmetry caused by the exclusion of the implant from the right craniocaudal view. (E) In a 73-year-old White woman with regular screening mammograms and known dystrophic calcifications in the left breast, AsymMirai predicted high risk for developing cancer, confounded by poor positioning in the left mediolateral oblique view and possible distortion in the right mediolateral oblique view. (F) In a 65-year-old African American woman with bilateral benign microcalcifications, AsymMirai predicted moderate risk for developing cancer, confounded by the calcifications. Among the patients with no cancer, Mirai correctly identified the patient in D as having a low risk for developing cancer (20th percentile risk) but also misclassified patients in E and F (84th and 95th percentiles, respectively). These examples were chosen without knowledge of Mirai’s risk scores. Unlike when reviewing the tissue in AsymMirai’s prediction window, there is no way to ex ante identify the cases where Mirai was confounded because it produces only a score. CC = craniocaudal, IDC = invasive ductal carcinoma, MLO = mediolateral oblique.
Figure 4: Prediction power of AsymMirai location consistency. (A) Full-field screening mammograms obtained at three time points in a White woman. AsymMirai predicted moderate risk for developing cancer, with high location consistency across three screenings. The patient was diagnosed with ductal carcinoma in situ in 2020. The location consistency is defined in Appendix S5. Consistency is expressed as the percentage of the window shift, with a shift of 100% representing no overlap from one year to the next. The red boxes are AsymMirai’s prediction windows for each examination. (B) Graph of AsymMirai 3-year risk area under the receiver operating characteristic (ROC) curve (AUC) for patient subgroups with increasing location inconsistency. The x-axis is the number of patients included in the subgroup. Model performance is highest for patients with the highest location consistency (left part of the plot), as measured by the shift from the preceding examination’s prediction window location. The shaded areas represent the 95% CIs at each threshold. (C) Graph of AsymMirai 3-year risk AUC for patient subgroups with increasing location inconsistency. Same as in B, except that on the x-axis location consistency is expressed as the window shift percentage. The dotted vertical line indicates a window shift of 50%. (D) AsymMirai ROC curves for selected location consistency thresholds as measured by the shift from the previous prediction window location. Model performance improved for patients with high location consistency between examinations, as indicated by lower window shifts. The legend contains the number of patients with an examination satisfying each threshold followed by the number of patients with at least one 3-year valid examination from each subgroup. A 3-year valid examination can include either 3 years of negative screening follow-up or a cancer diagnosis within 3 years. CC = craniocaudal, FPR = false-positive rate, MAX = maximum, MLO = mediolateral oblique, TPR = true-positive rate.
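The window shift percentage in Figure 4 is defined in Appendix S5, which is not reproduced here. The sketch below is therefore only one possible reading, stated as an assumption: for two equal-size prediction windows, the shift is taken as the fraction of the window area that no longer overlaps, so identical windows give 0% and disjoint windows give 100%, consistent with the caption's description of a 100% shift. The box layout `(top, left, height, width)` and the function name are hypothetical.

```python
def window_shift_percent(prev_box, curr_box):
    """Percentage of the window area not shared by two equal-size windows.

    Boxes are (top, left, height, width). This overlap-based definition is
    an assumption for illustration, not the definition from Appendix S5.
    """
    top_a, left_a, height, width = prev_box
    top_b, left_b, _, _ = curr_box
    # Overlap extent along each axis, clipped at zero when windows are apart.
    overlap_h = max(0, min(top_a, top_b) + height - max(top_a, top_b))
    overlap_w = max(0, min(left_a, left_b) + width - max(left_a, left_b))
    return 100.0 * (1.0 - (overlap_h * overlap_w) / (height * width))

# Hypothetical prediction windows from three consecutive examinations.
exam_windows = [(0, 0, 10, 10), (0, 5, 10, 10), (0, 30, 10, 10)]
shifts = [window_shift_percent(a, b)
          for a, b in zip(exam_windows, exam_windows[1:])]
print(shifts)  # [50.0, 100.0]
```

Under this reading, a patient's examination would fall to the left of the dotted line in panel C when its shift from the preceding window is below 50%.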
Figure 5: Comparison of the performance of Mirai and AsymMirai on EMory BrEast imaging Dataset (EMBED) validation screening mammograms. (A) AsymMirai 1–5-year breast cancer risk prediction receiver operating characteristic (ROC) curves and area under the curve (AUC) values, with 95% CIs in parentheses. (B) Mirai 1–5-year breast cancer risk prediction ROC curves and AUC values, with 95% CIs in parentheses. The AUC CIs for AsymMirai and Mirai overlap for each year. (C) Density plots show prediction correlation for AsymMirai and Mirai with 1-, 3-, and 5-year risk. The Pearson correlation coefficients were 0.6832 (95% CI: 0.6780, 0.6882), 0.7011 (95% CI: 0.6962, 0.7059), and 0.6987 (95% CI: 0.6938, 0.7036) for 1-, 3-, and 5-year risk, respectively. The 2- and 4-year risks are omitted because the predictions are the same as those for the 3- and 5-year risks, respectively.
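The narrow confidence intervals around the Pearson coefficients in panel C follow from the large sample size. The caption does not say how these intervals were computed; the sketch below uses the standard Fisher z-transformation as an assumed method and takes n = 41988 (the number of validation examinations in Figure 1) as an assumed sample size.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)                # map r onto an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Assumed inputs: 1-year r from the caption, n from the Figure 1 cohort.
low, high = pearson_ci(0.6832, 41988)
print(f"{low:.4f}, {high:.4f}")
```

With these assumptions the interval comes out at roughly 0.678 to 0.688, close to the reported 0.6780 to 0.6882; the small remaining gap could come from rounding of r or from a different method or sample size in the original analysis.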
Table
Table 1: Descriptive Statistics of Patients Included in the Validation Data Set
Table 2: AsymMirai and Mirai Subgroup Performance Analysis