Artificial intelligence-based technology for semi-automated segmentation of rectal cancer using high-resolution MRI

Translation · first published 2023-03-01 11:15:48 · last modified 2023-03-01 20:34:32

License: CC 4.0 BY-SA

Original link: http://cwres.ncu.edu.cn/s/org/plos/journals/G.https/plosone/article?id=10.1371/journal.pone.0269931&;x-chain-id=87ggu1lmphq8

Tags: #artificial intelligence #algorithms

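The study developed an AI-based algorithm for segmenting and staging rectal cancer. It uses deep learning to automatically segment the tumor, rectum, and mesorectum on high-resolution MR images. Segmentation accuracy was high, with Dice similarity coefficients (DSC) of 0.727, 0.930, and 0.917 for tumor, rectum, and mesorectum, respectively. In distinguishing T2 from T3 stage, the algorithm reached a sensitivity of 0.773 and a specificity of 0.768. It can aid risk stratification and the planning of individualized treatment in rectal cancer, and is expected to play a role in personalized medicine.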

Original title: Artificial intelligence–based technology for semi-automated segmentation of rectal cancer using high-resolution MRI

Authors:

Atsushi Hamabe,
Masayuki Ishii,
Rena Kamoda,
Saeko Sasuga,
Koichi Okuya,
Kenji Okita,
Emi Akizuki,
Yu Sato,
Ryo Miura,
Koichi Onodera,
Masamitsu Hatakenaka,
Ichiro Takemasa

Abstract

Aim

Although MRI has a substantial role in directing treatment decisions for locally advanced rectal cancer, precise interpretation of the findings is not necessarily available at every institution. In this study, we aimed to develop artificial intelligence-based software for the segmentation of rectal cancer that can be used for staging to optimize treatment strategy and for preoperative surgical simulation.

Method

Images from a total of 201 patients who underwent preoperative MRI were analyzed for training data. The resected specimen was processed in a circular shape in 103 cases. Using these datasets, ground-truth labels were prepared by annotating MR images with ground-truth segmentation labels of tumor area based on pathologically confirmed lesions. In addition, the areas of rectum and mesorectum were also labeled. An automatic segmentation algorithm was developed using a U-net deep neural network.

Results

The developed algorithm could estimate the area of the tumor, rectum, and mesorectum. The Dice similarity coefficients between manual and automatic segmentation were 0.727, 0.930, and 0.917 for tumor, rectum, and mesorectum, respectively. The T2/T3 diagnostic sensitivity, specificity, and overall accuracy were 0.773, 0.768, and 0.771, respectively.

Conclusion

This algorithm can provide objective analysis of MR images at any institution, and aid risk stratification in rectal cancer and the tailoring of individual treatments. Moreover, it can be used for surgical simulations.

Introduction

In rectal cancer treatment, accurate diagnosis is crucial in determining individual treatment
strategies and achieving curable resection. Multidisciplinary treatment including preoperative
chemoradiotherapy is standard therapy for locally advanced rectal cancer (LARC) to prevent
local recurrence after total mesorectal excision (TME), and here MRI has the pivotal role of
defining the baseline stage of rectal cancer [1, 2]. ESMO and NCCN guidelines recommend
MRI as a mandatory preoperative examination [3, 4].

Although the accuracy of MRI in predicting the stage of rectal cancer has been high in previous studies comparing MRI findings with histopathology in relatively small series, the MERCURY study that prospectively incorporated larger series did not replicate the prior excellent
results [5–11]. In addition, when expert radiologists interpreted the MR images according to
strictly defined protocols, satisfactory accuracy was maintained, but this is not necessarily the
practice at every institution [12]. Other possible concerns include inter-observer differences in
difficult cases, or the shortage of specialized radiologists in some developed countries [13, 14].
If a system supporting MRI diagnosis could be implemented, it would be useful in many
circumstances.

Recent progress in applied artificial intelligence (AI) has increased its importance in medical care, especially in medical image analysis [15–17]. The use of AI-based diagnostic supporting technology is enabled by advances in deep learning technology (DL). With the use of a
substantial number of high-quality training datasets, DL can make an algorithm that predicts
clinical output with high accuracy. Ronneberger et al. introduced the U-net for the segmentation of two-dimensional (2D) biomedical images [18], and Milletari et al. extended the U-net
to three-dimensional (3D) images [19]. Regarding tumor segmentation from MR images, the
previous studies used these 2D or 3D U-nets and showed that the results of segmentation were
comparable to those achieved by human experts in multiple types of cancer [20, 21]. While
there have been several studies attempting to segment rectal cancers, the depth of tumor invasion could not be assessed or the accuracy of segmentation could stand further improvement
[22, 23]. We have performed the PRODUCT study (UMIN000034364), in which we measured
the circumferential resection margin (CRM) of LARC as a primary endpoint in laparoscopic
surgery. Resected specimens including rectal cancer were processed in a circular shape with
mesorectum attached for pathological diagnosis, though this has not been the general practice
in Japan. In addition, we started to measure CRM according to the practice in Western countries, not only in the cases enrolled in the PRODUCT study but also in other LARC cases as a
clinical practice. As a spin-off, available sections of these specimens show the areas of LARC
that correspond to the MR images, thus providing high-quality training datasets which we
consider advantageous in making ground-truth labels that can be used for DL.

Based on this background, we hypothesized that DL might resolve the difficulties related to
MRI diagnosis by using MR images annotated with ground-truth labels reflecting the pathologically proved cancer area. In this study, we aimed to develop AI-based software to support the staging diagnosis of rectal cancer and to visualize the segmentation of rectal cancer, which can be used to optimize treatment strategy and in surgical simulations.

Materials and methods

Patients

The patients who underwent surgery for rectal cancer between January 2016 and July 2020 in our institution were retrospectively analyzed (Fig 1). A total of 201 MRI exams were used for training data (Table 1). Of these, a resected specimen was processed in a circular shape in 103 cases, and neoadjuvant treatment was administered in 55 cases. A total of 98 opened specimens in which mesorectum was detached according to the standard Japanese procedure were included in the analysis. The protocol for this research project was approved by the Ethics Committee of Sapporo Medical University. Informed consent was not required due to the fact that data was anonymized. The procedures were in accordance with the provisions of the Declaration of Helsinki of 1995 (as revised in Brazil, 2013).

Fig 1. Details of a total of 201 cases used as training data. Group 1 images were used to prepare ground-truth labels for segmentation. Group 2
images were used as ground-truth labels having pathological information of T staging alone.

Table 1. Summary of the analyzed cases.

Magnetic resonance imaging

MR images were acquired using a 3.0-T (N = 93) or 1.5-T (N = 108) MR scanner (Ingenia; Philips Healthcare, Best, the Netherlands). A phased-array coil (dStream Torso coil; Philips Healthcare, Best, the Netherlands) was used for signal reception. In 4 patients who were referred from other hospitals, different MR scanners were used (3.0-T Skyra; Siemens, Erlangen, Germany in 2 patients, and 1.5-T Signa HDXt; GE Healthcare, Cleveland, OH, USA in 2 patients). Before examination, bowel peristalsis was prevented by intramuscular injection of butylscopolamine if possible. Neither bowel preparation nor air insufflation was performed. After identifying the tumor on sagittal T2-weighted images, axial T2-weighted images were acquired in which the angle of the plane was made perpendicular to the long axis of the tumor (TR/TE, 4000/90 ms; 3-mm slice thickness; 0.5-mm interslice gap; 150-mm field of view; 288 × 288 matrix; spatial resolution, 0.52 × 0.52 mm pixel size). Three-dimensional isotropic T2-weighted fast spin-echo images have also been acquired routinely since October 2018 (TR/TE, 1500/200 ms; 256-mm field of view; 288 × 288 matrix; spatial resolution, 0.89 × 0.89 mm).

Processing of resected specimen

In the PRODUCT study, we developed a new method to precisely measure the pathological CRM, which we named “transverse slicing of a semi-opened rectal specimen” [24]. First, the anterior side of the rectum is opened longitudinally from the oral stump to the anal side up to 2 cm oral to the tumor border. Similarly, the rectum is opened on the anal side to the tumor if sufficient distal margin is resected. That is, the area of rectum between 2 cm above and below the borders of the rectal cancer is not incised. The mesorectum attached to the opened region of the rectum is removed to harvest embedded lymph nodes, while the mesorectum is left attached where the rectum is not opened. After the removal of the mesorectum, the dissection plane is marked using India ink for the purpose of demarcating it and supporting CRM measurement. Next to the inking, a piece of soft sponge is inserted in the rectal lumen to keep the in situ circular shape and the specimen is pinned to a cork board under gentle tension, followed by fixation in 10% formalin. After fixation, a circular area of the rectum is transversely sliced as thinly as possible. Pathologists analyzed all sections after staining with hematoxylin-eosin and diagnosed pathological findings.

Ground-truth label

Since we use a supervised training method to develop automatic segmentation algorithms, ground-truth labels were required. For all 201 cases, baseline T stages were labeled based on the pathological diagnosis or on the assessment of pathological sections if the patients had undergone neoadjuvant treatment. Segmentation labels, which represent whether each voxel of an MR image belongs to the target subject or not, were prepared for 135 of the 201 cases by two surgeons (AH and MI) who each has more than 10 years’ clinical experience treating colorectal cancer. Before starting the analysis, they received several lectures from a qualified pathologist to train them to find the area of rectal cancer or to predict the baseline area of rectal cancer before neoadjuvant treatment by discriminating fibrosis or necrosis on hematoxylin and eosin sections. These surgeons created MR images annotated with ground-truth segmentation labels, including the areas of tumor, rectum, and mesorectum, using 3D MRI analysis software (Fig 2). The rectal area was defined as the area within the muscularis propria.

Fig 2. Preparation for ground-truth segmentation labels. (a) Section of a circular specimen. (b) Pathological section of the specimen stained with hematoxylin-eosin revealing areas of tumor, rectum, and mesorectum. (c) Axial MR image of the rectal cancer. (d) Ground-truth segmentation labels were used to annotate the MR images. The areas colored magenta, yellow, and cyan represent tumor, rectum, and mesorectum, respectively.

Automatic segmentation algorithm

We developed an automatic segmentation algorithm that extracts the tumor, rectum, and mesorectum areas in 3D from T2-weighted MR images using a deep neural network. The network architecture is a 3D variant of U-net, which is popular for biomedical image segmentation [18]. It consists of encoder and decoder parts with skip connections (Fig 3). The convolutional block in each encoder and decoder consists of a 3 × 3 × 3 or 1 × 3 × 3 convolution layer, a batch normalization layer, and a rectified linear unit operation. The deconvolution blocks are transposed convolutional operators with a kernel size of 4 × 4 × 4 voxels. The skip connections include a 1 × 1 × 1 convolution layer, a batch normalization layer, and a rectified linear unit operation. The input to the network is a 3D MR image. The output has the same spatial dimensions as the input, with 3 channels, one each for the mesorectum, rectum, and tumor area probabilities. A sigmoid function is applied so that these three channels take values from 0.0 to 1.0. Final segmentation results were obtained by binarizing the values, using a threshold of 0.5.
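The last step described above (sigmoid, then a 0.5 threshold on each of the three output channels) can be sketched in plain Python. This is only an illustration under stated assumptions: the channel names, the flat list representation of voxels, and the choice of a strict `>` comparison at exactly 0.5 are not taken from the paper.

```python
import math

# Channel order is an assumption made for this sketch.
CHANNELS = ("mesorectum", "rectum", "tumor")


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw network output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(logits, threshold=0.5):
    """Turn per-voxel raw outputs of one channel into a 0/1 mask.

    Whether a value of exactly 0.5 counts as foreground is not stated
    in the text; a strict comparison is assumed here.
    """
    return [1 if sigmoid(v) > threshold else 0 for v in logits]


def segment(logits_by_channel):
    """Binarize each channel independently (masks may overlap)."""
    return {name: binarize(logits_by_channel[name]) for name in CHANNELS}
```

Because each channel gets its own sigmoid rather than a shared softmax, a voxel can belong to more than one structure, for example tumor inside the rectum.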

Fig 3. U-net.

The architecture of the segmentation network for the areas of tumor, rectum, and mesorectum.

Our algorithm calculates the T stage from the binary segmentation results. A case is classified as T2 or below when the tumor area is completely included in the area of the rectum and is not in contact with its contour. Otherwise, when the tumor area touches the contour or at least a part of it is outside the rectum, the case is classified as T3 or above. This rule follows the T staging rules for tumor invasion into the area of the rectum (Fig 4). Generally, a DL-based segmentation method works to maximize the volume overlap between the segmentation result and the ground-truth label image. However, a risk of disagreement in T-staging is inherent if T-staging is based on segmentation results of tumor and rectum that are mutually independent. To deal with this concern, we introduced a novel loss that can directly maximize T-staging accuracy in model training. The loss consists of two terms. The first term is the so-called Dice loss [19], used for segmentation purposes. It is computed from $N$, the number of voxels, $p$, the probability output by the network, and $g$, the ground-truth label. This term works to maximize the overlap between the ground-truth label and the probability maps.
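The Dice loss formula itself does not appear in this copy of the text. For orientation only, the Dice coefficient of the cited reference [19] can be written with the same symbols ($N$, $p$, $g$). It is an assumption that the authors used exactly this variant, with squared terms in the denominator, no smoothing constant, and the loss taken as one minus the coefficient.

```latex
\mathrm{Loss}_{\mathrm{SEG}}
  = 1 - \frac{2 \sum_{i=1}^{N} p_i \, g_i}
             {\sum_{i=1}^{N} p_i^{2} + \sum_{i=1}^{N} g_i^{2}}
```

With binary labels $g_i \in \{0, 1\}$, this expression is 0 when the probability map reproduces the label exactly and approaches 1 when the two do not overlap.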

Fig 4. Staging algorithm.

Left, T2 case, and right, T3 case. The magenta, yellow, and cyan areas represent tumor, rectum and mesorectum, respectively.
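The staging rule illustrated in Fig 4 can be sketched as a small function over binary masks. This is a minimal sketch, not the authors' implementation: masks are represented as sets of `(z, y, x)` voxel coordinates, and the "contour" of the rectum is assumed to mean rectum voxels with at least one in-plane 4-neighbor outside the rectum. The function names are hypothetical.

```python
# In-plane 4-neighborhood; using only in-plane neighbors is an assumption,
# chosen so that the first and last labeled slices are not treated as contour.
IN_PLANE_NEIGHBORS = ((0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def rectum_contour(rectum):
    """Rectum voxels that have at least one in-plane neighbor outside the rectum."""
    contour = set()
    for (z, y, x) in rectum:
        for dz, dy, dx in IN_PLANE_NEIGHBORS:
            if (z + dz, y + dy, x + dx) not in rectum:
                contour.add((z, y, x))
                break
    return contour


def t_stage(tumor, rectum):
    """Classify a case from binary tumor and rectum masks.

    'T2 or below' requires the tumor to lie entirely inside the rectum
    AND not touch the rectum contour; everything else is 'T3 or above'.
    """
    if not tumor:
        raise ValueError("tumor mask is empty")
    inside = tumor <= rectum                      # subset test
    touches = bool(tumor & rectum_contour(rectum))
    return "T2 or below" if inside and not touches else "T3 or above"
```

For example, with a 5 × 5 rectum square on one slice, a tumor voxel at the center yields "T2 or below", while a tumor voxel on the edge of the square or outside it yields "T3 or above".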

The second term of the loss function is a cross-entropy loss, used for accurate staging. It is computed from $p_{\text{staging}}$, the probability of the predicted staging, which in turn is obtained from $p_{\text{cancer}}$ and $p_{\text{rectal tube}}$, the probability maps of the tumor and rectum, respectively. $p_{\text{staging}}$ takes a high value when any voxel simultaneously has low rectum probability and high tumor probability. This term works to reduce the tumor area outside of the rectum for T2 cases. On the other hand, it works to increase the tumor area outside of the rectum for T3 cases.

To summarize, we train the network by minimizing a loss function that combines the two terms, $\mathrm{Loss}_{\mathrm{SEG}}$ (the Dice term) and $\mathrm{Loss}_{\mathrm{STG}}$ (the staging term). $\lambda$ is a parameter used to balance the two terms, and it was experimentally determined to be 0.02. During training, $\mathrm{Loss}_{\mathrm{SEG}}$ is evaluated only for the cases with ground-truth segmentation labels, while $\mathrm{Loss}_{\mathrm{STG}}$ is evaluated for all cases. We used the Adam optimizer to minimize the loss function, with the following parameters: base learning rate, 0.003; beta1, 0.9; beta2, 0.999; and epsilon, 1 × 10⁻⁸. The batch size was 5 samples, including 3 cases with ground-truth segmentation labels and 2 cases with only ground-truth staging. All experiments were conducted on an NVIDIA DGX-2 machine using the NVIDIA V100 GPU with 80 GB of memory.
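The two-term loss described above can be sketched in plain Python. The exact formulas are not given in this copy of the text, so everything below is an assumption chosen to be consistent with the prose: the staging probability is taken as the maximum over voxels of `p_cancer * (1 - p_rectal_tube)`, the staging term is a binary cross-entropy against the T3-or-above label, the Dice term uses the squared-denominator form of reference [19] on the tumor channel only, and the total is assumed to be `Loss_SEG + lambda * Loss_STG`.

```python
import math

EPS = 1e-7  # numerical guard, not from the paper


def dice_loss(p, g):
    """1 - Dice coefficient between probabilities p and labels g (flat lists)."""
    inter = sum(pi * gi for pi, gi in zip(p, g))
    denom = sum(pi * pi for pi in p) + sum(gi * gi for gi in g)
    return 1.0 - 2.0 * inter / (denom + EPS)


def staging_probability(p_cancer, p_rectal_tube):
    """High when any voxel has high tumor and low rectum probability (assumed form)."""
    return max(pc * (1.0 - pr) for pc, pr in zip(p_cancer, p_rectal_tube))


def staging_loss(p_cancer, p_rectal_tube, is_t3_or_above):
    """Binary cross-entropy between the staging probability and the T-stage label."""
    p = min(max(staging_probability(p_cancer, p_rectal_tube), EPS), 1.0 - EPS)
    return -math.log(p) if is_t3_or_above else -math.log(1.0 - p)


def total_loss(p_cancer, p_rectal_tube, is_t3_or_above, g_cancer=None, lam=0.02):
    """Combine both terms; the Dice term is skipped for staging-only cases."""
    seg = dice_loss(p_cancer, g_cancer) if g_cancer is not None else 0.0
    return seg + lam * staging_loss(p_cancer, p_rectal_tube, is_t3_or_above)
```

Passing `g_cancer=None` mirrors the statement that the segmentation term is evaluated only for cases with ground-truth segmentation labels, while the staging term is evaluated for every case.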

In the network training, each training image is augmented by several image-processing techniques such as scaling, rotation, and slice thickness conversion to improve segmentation accuracy. Also, the input image is cropped around the tumor area and rescaled to a 0.5 mm³ isotropic voxel size and a 256 × 256 × 128 voxel grid. In the test phase, a user inputs an estimated center position of the tumor, and then the image around the tumor position is processed.
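To make the cropping step concrete, the physical extent of the processed region can be worked out from the grid size and voxel size. This is a sketch under assumptions: the voxel size is read as 0.5 mm per edge, the axis order is taken to be (x, y, z) with 256, 256, and 128 voxels, and `crop_box_mm` is a hypothetical helper name.

```python
def crop_box_mm(center_mm, shape_vox=(256, 256, 128), voxel_mm=0.5):
    """Return ((x0, x1), (y0, y1), (z0, z1)) in mm around a tumor center.

    center_mm: user-supplied estimate of the tumor center, in mm.
    shape_vox: number of voxels per axis after rescaling (assumed x, y, z).
    voxel_mm:  isotropic voxel edge length in mm (assumed 0.5 mm).
    """
    box = []
    for c, n in zip(center_mm, shape_vox):
        half = n * voxel_mm / 2.0   # half the physical extent along this axis
        box.append((c - half, c + half))
    return tuple(box)
```

Under these assumptions the network sees a region of 128 mm × 128 mm × 64 mm centered on the position the user provides.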

Workflow for evaluation and statistical analysis

We evaluated two aspects of the algorithm: segmentation accuracy and staging accuracy. Ten-fold cross-validation was conducted. The data were randomly divided into 10 datasets. Eight of the 10 datasets were used for training the network parameters. The remaining two datasets were used for validation and evaluation, respectively. During training, the performance of the network was evaluated on the validation dataset at every 100th iteration. We chose the network parameters that performed best on the validation dataset, using the sum of the Dice score, sensitivity, and specificity, and then applied them to the evaluation dataset. We repeated this procedure ten times, rotating the roles of training, validation, and evaluation among the datasets.
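The rotation of roles across the 10 datasets can be sketched as follows. The text does not say how the validation dataset is paired with the evaluation dataset in each round; pairing each evaluation fold with the next fold (cyclically) is an assumption, as are the function names and the fixed random seed.

```python
import random


def split_into_folds(case_ids, n_folds=10, seed=0):
    """Randomly divide case IDs into n_folds roughly equal datasets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def fold_roles(n_folds=10):
    """Yield (training fold indices, validation fold index, evaluation fold index)."""
    for k in range(n_folds):
        eval_fold = k
        val_fold = (k + 1) % n_folds        # assumed pairing
        train = [f for f in range(n_folds) if f not in (eval_fold, val_fold)]
        yield train, val_fold, eval_fold
```

Over the ten rounds every dataset serves exactly once as the evaluation set, so each case receives one prediction from a model that never saw it during training or model selection.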

Regarding the segmentation accuracy, we calculated the Dice similarity coefficient (DSC) between manual segmentation and automatic segmentation [25]. The DSC is defined as follows:

$$\mathrm{DSC} = \frac{2\,|P \cap G|}{|P| + |G|}$$

where $P$ is the segmentation result and $G$ is the ground truth. The DSC ranges from 0.0 to 1.0, and DSC = 1.0 means that the results overlap completely. Note that, since not all of the training data have corresponding ground-truth segmentation, we evaluated the segmentation accuracy using 135 cases.
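As a small worked example of the DSC, the sketch below computes it for masks given as sets of voxel identifiers. The set representation and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_similarity(pred, truth):
    """Dice similarity coefficient between two voxel sets.

    pred:  voxels labeled by the automatic segmentation (P).
    truth: voxels labeled in the manual ground truth (G).
    """
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))
```

For two masks of three voxels each that share two voxels, the DSC is 2 × 2 / (3 + 3), about 0.667. This also shows why small structures are penalized heavily: a one-voxel shift removes a large share of the overlap.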

Next, the T-staging accuracies were evaluated with all 201 cases by calculating the sensitivity and specificity. The sensitivity is defined as follows:

$$\mathrm{Sensitivity} = \frac{|P_{T3} \cap G_{T3}|}{|G_{T3}|}$$

where $P_{T3}$ is the set of cases whose predicted T stage is T3 or above, and $G_{T3}$ is the set of cases whose ground-truth T stage is T3 or above. Specificity is defined as follows:

$$\mathrm{Specificity} = \frac{|P_{T2} \cap G_{T2}|}{|G_{T2}|}$$

where $P_{T2}$ is the set of cases whose predicted T stage is T2 or below, and $G_{T2}$ is the set of cases whose ground-truth T stage is T2 or below.
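The staging metrics can be computed from paired per-case labels as in the sketch below. The boolean encoding (True meaning T3 or above) and the function name are assumptions, and the example data are made up for illustration rather than taken from the study.

```python
def staging_metrics(predicted_t3, truth_t3):
    """Sensitivity, specificity, and accuracy for T3-or-above vs T2-or-below.

    Both arguments are equal-length sequences of booleans,
    True meaning 'T3 or above' and False meaning 'T2 or below'.
    """
    pairs = list(zip(predicted_t3, truth_t3))
    n_pos = sum(1 for _, g in pairs if g)          # |G_T3|
    n_neg = len(pairs) - n_pos                     # |G_T2|
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case in each ground-truth class")
    tp = sum(1 for p, g in pairs if p and g)           # |P_T3 ∩ G_T3|
    tn = sum(1 for p, g in pairs if not p and not g)   # |P_T2 ∩ G_T2|
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "accuracy": (tp + tn) / len(pairs),
    }
```

With five toy cases, predictions `[True, True, False, False, True]` against ground truth `[True, False, False, True, True]` give a sensitivity of 2/3, a specificity of 1/2, and an accuracy of 3/5.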

Results are presented as the number of cases evaluated for categorical data and expressed as the median and interquartile range (IQR) for quantitative data. Univariate analysis was performed using the Wilcoxon rank-sum test. Statistical analyses were performed using JMP Pro 15.1.0 software (SAS Institute, Cary, NC, USA).

Results

Segmentation accuracy

The developed algorithm could successfully estimate the areas of the tumor, rectum, and mesorectum, in which the ground-truth labels and segmentation results of typical cases corresponded well (Fig 5a). The summary of evaluation results regarding the segmentation accuracy demonstrated that the median DSCs for tumor, rectum, and mesorectum were 0.727, 0.930, and 0.917, respectively (Fig 5b). Mucinous cancer exhibits high intensity on T2 in contrast to the most common histology of adenocarcinoma. Therefore, we investigated DSCs in mucinous cancer patients (N = 6) to analyze whether this feature affects segmentation accuracy. As a result, the DSC was lower in the cases of mucinous cancer compared with those of the other histology (0.358 [0.167–0.596] vs 0.736 [0.605–0.801], P = 0.0024). In addition, on the assumption that the DSC of the tumor might easily have been lowered by a slight positional deviation in the smaller tumor, the correlation between the DSC and the diameter of the tumor was investigated after excluding mucinous cancer (Fig 5c). We then observed a significant correlation between the two values (Pearson correlation coefficient = 0.2418; P = 0.0081). After excluding cancers of diameters less than 20 mm, the median DSC of the tumor was slightly elevated, to 0.739 [0.615–0.801].

Fig 5. Results of segmentation accuracy.

(a) Representative MR images, the ground-truth segmentation labels, and AI-predicted segmentations. (b) Summary of evaluation results regarding the segmentation accuracy. (c) Scatter plots showing the relationships between tumor diameter and the Dice similarity coefficients.

Correlation between pathological and AI T stage

The guidelines used worldwide regard distinguishing between T2 and T3 as one of the important factors directing treatment decisions. Therefore, we investigated our method’s diagnostic accuracy in discriminating T2 from T3 as an initial assessment. The summary of correlation between pathological T stage and AI-predicted T stage was analyzed (Table 2). The T-staging sensitivity, specificity, and overall accuracy were 0.773, 0.768, and 0.771, respectively. For comparison, we evaluated a baseline model that was trained by using a standard dice loss with only ground-truth segmentation labels. The baseline model obtained a sensitivity, specificity, and overall accuracy of 0.765, 0.756, and 0.761, showing that the AI developed in this study could achieve better performance in T-staging. As in the analysis of segmentation accuracy, the diagnostic accuracy was recalculated after the exclusion of small cancers and mucinous cancer. As a result, the T-staging sensitivity, specificity, and overall accuracy were 0.789, 0.714, and 0.762, respectively.

Table 2. Summary of pathological T stage and AI-predicted T stage.

Discussion

In this study, an algorithm for diagnosing and staging rectal cancer was successfully developed using DL technology. It could be used in future semi-automation software to aid physicians. The characteristic feature of this algorithm is that it can output the segmentation that visualizes the areas of tumor, rectum, and mesorectum. This could be used not only for T-factor staging, but also for preoperative surgical simulation. In the future, based on the provided visual information, we will be able to choose the surgical plane to be dissected or decide whether the combined resection of an adjacent organ is necessary. In addition, we think the algorithm will also help multidisciplinary teams tailor treatment to individual patients.

Two meta-analyses have investigated the diagnostic accuracy of MRI and shown favorable results, with about 85% sensitivity and 75% specificity for diagnosing tumor invasion beyond the muscularis propria [10, 11]. However, these results are subject to substantial selection bias, which can be associated with higher reported than actual accuracy. This is partly reflected by the fact that the carefully designed prospective study, MERCURY, demonstrated diagnostic accuracy that was acceptable but that did not reach the values reported in the meta-analyses. Accurately diagnosing rectal cancer using MRI would, in reality, not be easy. Furthermore, although MRI scanners are plentiful in Japan, certified radiologists are in quite short supply, leaving individual radiologists with excessive workloads. This is also the case in other developed countries [13, 14]. Given this situation, a method that can improve the acquisition of objective MRI findings at every institution is needed. We think the current algorithm might play a substantial role in providing equal access to MRI diagnosis in institutions or regions where there are shortages of trained personnel.

As MRI technology has advanced in recent decades, it is important to re-evaluate the accuracy of MRI. Since neoadjuvant CRT was established as a standard treatment in Western countries, it has become difficult to validate the accuracy of baseline MRI findings by simply comparing them with the corresponding pathology. In the current study, we made a training dataset by annotating the pathologically proven tumor areas on MRI images. In the cases with neoadjuvant therapy, the baseline area of the tumor was predicted by the pathological evidence of fibrosis or necrosis. These processes might be useful in making reliable training datasets even in cases with neoadjuvant treatment, suggesting that the algorithm for segmentation might reflect the typical results of MRI today.

Some recent studies have tried to estimate rectal cancer–related parameters on preoperative MR images using AI, and have shown that the accuracy was acceptable [22, 26–28]. However, these studies had several limitations: tumor tissue was not visualized on the MR image, the relationship of the tumor with the mesorectal fascia was difficult to assess, the results were not based on high-resolution MRI, or the ground-truth labels were not based on pathological assessment, the last issue being the one we consider to be most critical. We think there is much room for improvement in the clinical application of AI. However, the software developed in this study has various strengths. First, the ground-truth labels are based on the pathological findings in circular specimens, providing the high-quality training datasets that are essential in establishing a reliable algorithm. Second, the algorithm can output the segmentation of the tumor, rectum, and mesorectum. This feature is valuable for staging the tumor, for individual multidisciplinary treatment decision making, and for the preoperative simulation that is required by colorectal surgeons in order to obtain curative resection. Third, we used high-resolution MRI in this analysis, though the MRI acquisition protocols differ from those used in the MERCURY study. Thus, this system can be applied anywhere if the appropriate protocol and an adequate scanner are used for image acquisition. We note that the accuracy of our algorithm was insufficient in analyzing some types of tumors, including mucinous cancer and small tumors. Although the quality of segmentation can also be regarded as favorable as a whole, it would be ideal if these hurdles were cleared with future refinement. However, because these small tumors rarely infiltrate the mesorectum or surrounding tissues, this algorithm can still be regarded as useful for diagnosing locally advanced rectal cancers.

The current study has several limitations. First, validation using the test data acquired in various conditions should be performed to confirm the generalizability of the algorithm. Currently, we are planning a validation study using an independent large series to investigate the algorithm’s effectiveness. Simultaneously, we will continue to improve the software’s performance in assessing other important factors, including mesorectal fascia involvement. Second, the workload involved in preparing individual ground-truth labels is too heavy for the number of training sets to be readily increased. Third, as explained in the results, the accuracy of this system is still insufficient to be used for mucinous tumors and it is not able to estimate the shape of small tumors. We think this limitation can be overcome with the use of more training datasets in the future.

In conclusion, we have successfully developed the first AI-based algorithm for segmenting rectal cancer. This system can provide stable results at any institution and contribute to rectal cancer risk stratification and the tailoring of individual treatments, and is likely to gain importance in the era of individualized medical care.
