qq_39472939

Tags: deep learning, artificial intelligence, computer vision

Article link: https://blog.csdn.net/qq_39472939/article/details/128292863

Automatic microscopic diagnosis of diseases using an improved UNet++ architecture

ABSTRACT

Anthrax is a severe infectious disease caused by the bacterium Bacillus anthracis. This paper aims to design and implement a fast and reliable system, based on microscopic image processing of patient tissue samples, for the automatic diagnosis of anthrax and other tissue diseases, metastasis detection, patient prognosis, etc. An improved UNet++ architecture is proposed to segment microscopic images of patient tissue samples. The proposed model combines multi-scale features by adding skip connections in two paths: the forward path from the encoder to the decoder, and the decoder path to the output. These new connections improve the performance of UNet++. Integrating squeeze and excitation-inception blocks in the new skip connections provides the network with features at different scales and with different kernel sizes. Several convolutional networks are used as the backbone to extract powerful representations in the encoder section. The use of batch normalization, dropout, and the LReLU activation function in this model accelerates convergence and increases the generalization power of the model. To overcome the problem of data imbalance between classes, a weighted hybrid loss function is proposed, which further improves segmentation efficiency. The semantic segmentation results are converted to instance segmentation using the marker-based watershed algorithm. Experimental results show that, despite the many challenges of microscopic image analysis, the proposed model is a reliable system for the automatic diagnosis of anthrax and other tissue diseases. It produces better results than state-of-the-art architectures.

1. Introduction

Anthrax is one of the major diseases common to humans and animals and is caused by spores of the gram-positive bacillus B. anthracis. The disease remains a health problem in developing countries. Most cases of anthrax result from occupational exposure, for example among people who keep wild animals and are therefore at risk. Humans become infected through contact with animals suffering from anthrax or with their products, such as wool. More than 95% of anthrax infections worldwide are of the cutaneous type, caused by the entry of bacterial spores into wounds or skin scratches (Misgie et al., 2015).

Direct observation and bacterial culture are commonly used for the initial diagnosis of anthrax. Definitive diagnosis is possible in a reference laboratory or by advanced molecular methods such as polymerase chain reaction (PCR). Another important diagnostic method is the examination of stained pathology specimens for gram-positive, thick, long, quadrilateral bacilli, which sometimes form long chains. Examining a large number of stained slides containing tissue samples to diagnose the disease is a difficult task that depends on visual examination by a specialist. Moreover, the results obtained by different specialists are rarely the same (Yang et al., 2020; Doganay et al., 2010).

The importance of artificial intelligence is now widely recognized. Machine learning, one of the most important branches of this field, has a special place in all sciences. Automated methods have significantly improved the speed and accuracy of object classification, detection, and segmentation in microscopic images. Machine learning techniques have been widely and very successfully used in medical and biological fields (Goodfellow et al., 2016). Deep learning has demonstrated its power by solving many problems in medical image processing, such as classification, detection, and segmentation of Computerized Tomography (CT), Magnetic Resonance Imaging (MRI), ultrasound, and pathology images. It is a learning method that extracts appropriate features by directly processing the raw data for object detection, image segmentation, or classification. Because it learns representations and extracts appropriate features automatically, it leads to much better results (LeCun et al., 2015). Deep learning has attracted a lot of attention in the analysis of microscopic images, including nucleus detection, cell segmentation, tissue segmentation, image classification, and so on (Sirinukunwattana et al., 2016).

Indeed, computer-assisted microscopic imaging plays an important role in the diagnosis and prognosis of diseases. Microscopic image analysis has helped diagnose various diseases, including breast cancer, lung cancer, and brain tumors. Due to the high volume of image data, manual processing is difficult and sometimes even impossible. Examining a large number of slides containing stained samples is a tedious task that depends on a specialist's visual examination. Also, the results of such examinations by several specialists are rarely the same (Sommer and Gerlich, 2013; Nie et al., 2016; Grana and Chyzhyk, 2015).

Furthermore, microscopic imaging has more complex features than other imaging techniques such as MRI and ultrasound. High-resolution microscopic images contain complex patterns and dependencies, so their relationship with the type of disease or the image label is very complex. In digital pathology, image data is usually generated using a specific staining method, which brings many processing challenges, including complex backgrounds, non-uniform intensity, and touching or overlapping cells and nuclei (Xing and Yang, 2016; McCann et al., 2014).

The use of deep learning for medical image segmentation in state-of-the-art research, including work on microscopic images, has led to promising results. Generally, the segmentation methods presented in the literature are divided into two categories: semantic segmentation and instance segmentation. Semantic segmentation is the process of assigning a class label to each pixel in the image. Convolutional Neural Networks (CNNs), as the first semantic segmentation networks, treat this as classifying the image at the pixel level, using the patch extracted around each pixel. In this way, a sliding window over the input image is used to create a probability map, and the image is then segmented by thresholding. To segment the neural membrane in microscopic images, Ciresan et al. (2012) used CNNs to classify each pixel. CNNs have also been employed for the segmentation of cervical images (Song et al., 2015), of regions related to breast cancer in histopathological images (Su et al., 2015), and of muscle perimysium in skeletal muscle images (Sapkota et al., 2015).
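The patch-based scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: `classify_patch` stands in for a trained CNN, and `mean_intensity`, the 3x3 patch size, the border clamping, and the 0.4 threshold are hypothetical choices that do not come from the cited papers.

```python
def sliding_window_segment(image, classify_patch, half=1, threshold=0.5):
    """Label each pixel by classifying the patch centred on it."""
    height, width = len(image), len(image[0])
    prob_map = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Collect the (2*half+1)^2 patch, clamping coordinates at the border.
            patch = [
                image[min(max(r + dr, 0), height - 1)][min(max(c + dc, 0), width - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            prob_map[r][c] = classify_patch(patch)
    # Threshold the probability map to obtain the binary segmentation.
    mask = [[1 if p >= threshold else 0 for p in row] for row in prob_map]
    return prob_map, mask


def mean_intensity(patch):
    """Hypothetical stand-in for a trained CNN: the mean patch intensity."""
    return sum(patch) / len(patch)


image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
prob_map, mask = sliding_window_segment(image, mean_intensity, threshold=0.4)
```

Every pixel needs its own classifier call on a largely overlapping patch, which is the computational redundancy that fully convolutional networks remove.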

The Fully Convolutional Neural Network (FCNN) was then proposed, in which the fully connected layers are replaced with convolutional layers. It can create probability maps through end-to-end training and greatly increases the computational efficiency of image segmentation (Long et al., 2015). Various modifications have been made to the FCNN to further improve segmentation performance. For example, the UNet architecture follows a typical FCNN (the contracting path) with an expanding path whose up-convolutions upsample the feature maps to restore the image size, so that the network can use context information. In the analysis of medical images, UNet is the best-known of the newer convolutional architectures (Ronneberger et al., 2015). UNet has also been used to segment nerve membranes (Arganda-Carreras, 2015) and glioblastoma-astrocytoma and HeLa cells in microscopic images (Maška, 2014). Drozdzal et al. (2016) investigated the use of short skip paths, similar to those of ResNet, in addition to the long skip paths in UNet. Subsequent studies have tried to improve segmentation performance by changing the UNet structure.

For example, SegNet has made changes in the number of convolution layers according to the index of the pooling layers (Chen et al., 2017), and DeepLab has replaced the deconvolution layers with the Conditional Random Field (CRF), which is a probabilistic graphical model (Xie et al., 2016). HRNet (Yuan et al., 2019), EfficientNet (Tan and Le, 2019), Attention UNet (Oktay, 2018), LinkNet (Chaurasia and Culurciello, 2017), PSPNet (Zhao et al., 2017), FPN (Martinsson and Mogren, 2019), and UNet++ (Zhou et al., 2018) are recent models that have been proposed for semantic segmentation.

Instance segmentation goes one step beyond semantic segmentation. In addition to detecting and segmenting the objects in an image, it separates individual objects from one another, whether or not they belong to the same class, at the cost of higher complexity. In instance segmentation, if several objects of one class are close to each other or overlap, the boundary of each must be precisely defined. In semantic segmentation, by contrast, the goal is to determine the boundary of the whole area made up of all these objects (Liu et al., 2017). Architectures such as Mask-RCNN (Johnson, 2018), ResNeSt (Zhang, 2020), DetectoRS (Qiao et al., 2020), and TensorMask (Chen et al., 2019) have been used for instance segmentation. Mask-RCNN is an advanced deep network that has recently been used for various purposes, including detection, localization, and instance segmentation. It outperformed the other submitted models in all tasks of various challenges, such as the 2016 COCO object detection, instance segmentation, and captioning challenges (Johnson, 2018).
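To make the distinction concrete, the sketch below turns a binary semantic mask into instance labels by giving each 4-connected foreground region its own id. It is an illustrative baseline only (the function name and the toy mask are assumptions). It separates objects that do not touch; touching or overlapping objects would end up in one region, which is exactly the case that needs a dedicated instance segmentation method.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected foreground region of a binary mask its own id."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or labels[r][c] != 0:
                continue
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, next_id


# Semantic mask: one class, two separate objects.
semantic_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
instance_labels, count = label_instances(semantic_mask)
```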

Apart from a few studies on the detection and segmentation of B. anthracis bacteria alone (not the immune system cells), no other papers have been published in this field. Zhao (2020), Zhao et al. (2019), and Wang (2019) segment anthrax spores (related to crops, fruits, and vegetables, not humans) in microscopic images prepared from spore culture vessels. In our previous paper (Hoorali et al., 2020), we examined the detection and segmentation of B. anthracis bacteria in similar microscopic images.

Compared to natural images, segmentation of medical images (especially microscopic images) requires more precision in order to accurately diagnose the type and stage of the disease. There are many challenges in microscopic image analysis, such as high image resolution, artifacts, object crowding, overlapping, and the similarity of different cell types to each other. Feature maps at different scales contain different information: larger scales contain low-level information such as the edges and corners of objects, while smaller scales contain high-level semantic information and determine the location of objects. Therefore, combining features from different layers and scales improves segmentation accuracy even on complex backgrounds (Zhou et al., 2018; Lin et al., 2017). Some recent studies have combined multi-scale features to improve the performance of image segmentation models. Kushnure et al. (2021) proposed a multi-scale UNet (MS-UNet) with feature recalibration, in which the multi-scale Res2Net module enhances the receptive field of the CNN and improves the network's learning ability. In addition, a squeeze and excitation (SE) network performs channel-wise recalibration of the multi-scale features. This multi-scale approach improved segmentation efficiency and reduced the computational complexity of the network. Recently, a modified multi-scale UNet++ (M2UNet++) architecture was proposed for automatic liver segmentation, in which multi-scale features are modified by adaptive channel-wise recalibration of the feature maps. Specifically, the features of the skip connections and nested convolution layers are modified using the squeeze and excitation network (SENet). These changes improved high-level information representation and segmentation efficiency (Kushnure and Talbar, 2022).

In this paper, we propose an improved UNet++ architecture to segment the bacterium B. anthracis and important immune cells (including macrophages, neutrophils, and lymphocytes) in microscopic images of tissue samples from anthrax patients, using the two challenging anthrax datasets presented in Section 2.1. The model's performance has also been evaluated on the MoNuSAC dataset for segmenting different cell types (epithelial, macrophage, neutrophil, and lymphocyte). The proposed model has four advantages over UNet++: 1) using more skip connections to combine multi-scale features, both in the decoder section and in the connections between the encoder and decoder sections of UNet++, which improves its performance; 2) using squeeze and excitation-inception blocks in the skip connections added between the encoder and decoder; 3) evaluating several architectures as the encoder's backbone to extract powerful representations in the encoder and studying their effect on the segmentation results; 4) proposing a hybrid weighted loss function to handle the data imbalance between classes, which improves segmentation efficiency. The performance of the proposed model is compared with several state-of-the-art segmentation architectures, namely segmentation-based models, including UNet++ (Zhou et al., 2018), Attention UNet (Oktay, 2018), LinkNet (Chaurasia and Culurciello, 2017), PSPNet (Zhao et al., 2017), and FPN (Martinsson and Mogren, 2019), and a detection-based model (Mask-RCNN (Li et al., 2017)). The semantic segmentation results of the segmentation-based models are converted to instance segmentation using the marker-based watershed algorithm.

Fig. 1. Setup of microscope and camera for the acquisition of RGB images from the patient slides.

2. Materials and methods

2.1. The anthrax datasets used in this paper were obtained by imaging different regions of slides containing tissue samples from anthrax patients (Fig. 1). The slides were provided in collaboration with the vice-chancellor of Esfarayen Faculty of Medical Sciences, in the biochemistry and serology laboratory. The method of preparing the evaluation datasets is described in our previous paper (Hoorali et al., 2020). We used two different anthrax datasets to train and evaluate the proposed model and the other architectures. To prepare the first dataset, 48 slides containing tissue samples from anthrax patients were used. Imaging different regions of the slides yielded 213 images. The second dataset, which has more bacteria in its images and is more challenging than the first, was prepared by taking 243 images from 55 slides containing patient tissue samples. Fig. 2 shows two examples of the prepared images, in which some important components are marked.

We faced many challenges in detecting and segmenting bacteria and cells in this research. As can be seen in the microscopic images of Fig. 2, B. anthracis bacteria and some major immune cells, including macrophages, neutrophils, and lymphocytes, can appear in any shape, angle, and color intensity. Overlapping of different regions, the similarity of important immune cells, staining spots, and tissue filaments in parts of the image are other challenges that may reduce detection and segmentation accuracy.

In addition to the prepared anthrax datasets, and to make the experimental results more objective (that is, to evaluate the proposed model and compare it with other state-of-the-art models on more datasets), the multi-organ nuclei segmentation and classification (MoNuSAC) dataset (Grand Challenge) was also used. This publicly available dataset includes H&E-stained whole slide images of four organs (breast, kidney, lung, and prostate), annotated with nuclei boundaries and cell types (epithelial, macrophage, lymphocyte, and neutrophil cells). Some annotated example images from this dataset are shown in Fig. 3. Given that the outputs of the segmentation-based models are semantic segmentation masks for the input images, we used the marker-based watershed transform and morphological operations to convert them to instance segmentation masks.

Fig. 2. Two microscopic images of the prepared anthrax datasets used in this paper.

Fig. 3. Two sub-images cropped from whole slide images of the MoNuSAC dataset, and their masks: epithelial cells are shown in red, lymphocytes in yellow, macrophages in green, and neutrophils in blue.

2.2. Instance segmentation using UNet++ and improved UNet++

2.2.1. UNet++

In recent years, many deep architectures have been proposed that modify the UNet structure to improve segmentation performance. One of these is UNet++, proposed by Zhou et al. (2018). UNet++ changes the skip connections of UNet in order to segment medical images accurately. The skip connections in UNet connect the encoder and decoder subnetworks directly, which combines different semantic features and plays an important role in recovering object details and creating an accurate segmentation mask. However, on medical images, especially crowded ones, UNet does not perform very well. In UNet++, unlike UNet, the skip connections do not connect the encoder and decoder sections directly. Instead, dense convolution blocks consisting of several convolution layers are used between the two sections. As shown in Fig. 4, the difference between UNet and UNet++ is the use of dense skip connections to connect the encoder and decoder subnetworks, as well as the use of deep supervision.

Dense skip connections improve the gradient flow, and deep supervision lets the network operate in two modes, one favoring accuracy and one favoring speed. To increase accuracy, the output segmentations of all branches are averaged; to increase speed, the final segmentation map is selected from one of the segmentation branches (Zhou et al., 2018). The loss function used in Zhou et al. (2018) is given in Eq. 1.

$$L(Y, \hat{Y}) = -\frac{1}{N}\sum_{b=1}^{N}\left(\frac{1}{2}\cdot Y_b\cdot \log \hat{Y}_b + \frac{2\cdot Y_b\cdot \hat{Y}_b}{Y_b + \hat{Y}_b}\right) \tag{1}$$

where $N$, $\hat{Y}_b$, and $Y_b$ denote the batch size, the predicted probabilities, and the ground truth of the $b$th image, respectively. It uses a combination of binary cross-entropy and the Dice coefficient for each of the semantic levels associated with the different-resolution feature maps produced by the nested skip paths.
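As a rough illustration of the combined cross-entropy and Dice loss described above, the following sketch evaluates it on flattened masks. The reduction over pixels (a mean for the cross-entropy term, sums for the Dice term) and the small constant `eps` are assumptions made for this example; the text does not spell them out.

```python
import math


def bce_dice_loss(y_true, y_pred, eps=1e-7):
    """-(1/N) * sum over images of (0.5 * Y*log(P) + 2*Y*P / (Y + P))."""
    n_images = len(y_true)
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        # Cross-entropy term, averaged over the pixels of one image (assumption).
        ce = sum(t * math.log(p + eps) for t, p in zip(truth, pred)) / len(truth)
        # Dice term: twice the overlap divided by the total mass of both masks.
        overlap = sum(t * p for t, p in zip(truth, pred))
        dice = 2.0 * overlap / (sum(truth) + sum(pred) + eps)
        total += 0.5 * ce + dice
    return -total / n_images


truth = [[1.0, 0.0, 1.0, 0.0]]           # one flattened 2x2 ground-truth mask
perfect = bce_dice_loss(truth, [[1.0, 0.0, 1.0, 0.0]])
inverted = bce_dice_loss(truth, [[0.0, 1.0, 0.0, 1.0]])
```

Written this way, the loss approaches -1 for a perfect prediction (the Dice term reaches 1 and the cross-entropy term vanishes) and grows as the prediction gets worse.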

Where $x^{i,j}$ is the output of node $X^{i,j}$, which connects the encoder upsampling layer $i$ to the dense block's convolution layer $j$. In the proposed model, $H$ represents a convolution operation followed by batch normalization, an activation function, and dropout, and $u(\cdot)$, $D(\cdot)$, and $[\,]$ represent an up-sampling layer, a down-sampling layer, and concatenation, respectively. An advantage of the proposed architecture over the UNet++ model is the use of batch normalization and dropout in the up-sampling and down-sampling paths. Batch normalization accelerates convergence and prevents vanishing gradients, and dropout prevents overfitting and increases the generalization power of the model. LReLU activation is also used in the proposed architecture instead of ReLU to prevent the vanishing gradient problem. The feature maps of the decoder section of the proposed model are obtained according to Eq. 3.
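The reason a leaky activation helps against vanishing gradients can be seen from a small sketch. The negative slope of 0.01 is a common default and an assumption here; the value used in the model is not stated.

```python
def relu(x):
    """Standard ReLU: negative inputs are clipped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """LReLU: negative inputs are scaled down instead of clipped."""
    return x if x > 0 else negative_slope * x


def relu_grad(x):
    """Derivative of ReLU: zero for every negative input."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, negative_slope=0.01):
    """Derivative of LReLU: small but non-zero for negative inputs."""
    return 1.0 if x > 0 else negative_slope


for value in (-2.0, 3.0):
    print(value, relu(value), leaky_relu(value), relu_grad(value), leaky_relu_grad(value))
```

For a negative pre-activation the ReLU gradient is exactly zero, so no error signal passes through that unit, whereas LReLU still passes a small gradient.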

2.2.2. Improved UNet++

Accurate segmentation requires combining features at different scales. Although UNet++ addresses this to some extent, feature maps can be combined at a wider range of scales. Our purpose in this paper is the segmentation of microscopic images, a field with many challenges, such as high image resolution, artifacts, object crowding, and overlapping. We therefore changed the structure of UNet++ so that we can take advantage of a full-scale combination of feature maps to improve segmentation efficiency. An illustration of the proposed UNet++-based model is shown in Fig. 5.

Feature maps at different scales contain different information, so combining them yields better representations. Feature maps at larger scales contain low-level information such as the edges and corners of objects, while those at smaller scales contain high-level semantic information that determines the location of objects (Zhou et al., 2018; Lin et al., 2017). To keep the features of all scales, we combine features at different scales by adding skip connections to the architecture (see the green and yellow arrows in Fig. 5). In the decoder section, the green skip connections allow feature maps at any scale to access the feature maps at all lower scales. The yellow connections also allow feature maps at each level to access feature maps at higher scales (each of which is a combination of feature maps at lower scales).

Furthermore, integrating squeeze and excitation-inception blocks in the added skip connections provides the network with features at different scales and with different kernel sizes (Muhammad and Aramvith, 2019). The squeeze and excitation technique improves the quality of the representations produced by the inception block by modeling the dependencies between feature map channels. It weighs each channel adaptively through a content-aware mechanism, emphasizing features that contain more information and suppressing less useful ones (Hu et al., 2018). These changes greatly help to improve segmentation accuracy and efficiency. Another important role of the skip connections is to increase the semantic similarity between the encoder and decoder feature maps, which makes the optimization problem easier for the optimizer to solve.
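The channel reweighting described above can be sketched as three steps: squeeze (global average pooling per channel), excitation (two fully connected layers with ReLU and sigmoid), and scaling. This is a minimal illustration, not the block used in the model: the weights `w1` and `w2` are arbitrary values rather than learned ones, biases are omitted, and the inception branch is left out.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Reweight channels: global average pool -> FC + ReLU -> FC + sigmoid -> scale."""
    # Squeeze: one scalar per channel (mean over all spatial positions).
    v = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: bottleneck FC layer with ReLU, then FC layer with sigmoid.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value of a channel by that channel's gate in (0, 1).
    scaled = [[[g * x for x in row] for row in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Two 2x2 channels; the weights are arbitrary illustrative values.
features = [
    [[1.0, 1.0], [1.0, 1.0]],   # channel 0, mean 1.0
    [[0.0, 2.0], [2.0, 4.0]],   # channel 1, mean 2.0
]
w1 = [[1.0, 1.0]]               # 2 channels -> 1 hidden unit (reduction factor 2)
w2 = [[1.0], [-1.0]]            # 1 hidden unit -> 2 channel gates
scaled, gates = squeeze_excite(features, w1, w2)
```

With these weights the first channel receives a gate close to 1 and the second a gate close to 0, so the second channel is suppressed while the first is kept almost unchanged.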

The stack of feature maps represented by $x^{i,j}$ in the encoder section and in the layers between the encoder and decoder is computed as stated in (Zhou et al., 2018):

Where SE_I, $u_i(\cdot)$, and $D_i(\cdot)$ represent the squeeze and excitation-inception block, up-sampling $i$ times, and down-sampling $i$ times, respectively. As mentioned, squeeze and excitation is a feature recalibration mechanism in which global information is used to emphasize the features that contain more important information. In the squeeze stage, the input feature maps of size $W \times H \times C$ ($X_C \in \mathbb{R}^{H \times W}$) are shrunk along the spatial dimensions, and the global distribution of channel-wise responses is obtained using a global pooling operation as follows (Eq. 4),

Where $C$ is the number of channels of the feature maps after applying the transformation to the input, and $V_C$ represents the one-dimensional vector $V \in \mathbb{R}^{C}$. In the excitation stage, which models the association between channels, $W$ is determined via learning, and a gating mechanism is then used to produce the weight of each channel. In this stage, to improve the generalization capability of the network, the input vector ($V$) is isolated channel-wise, and the gating mechanism is implemented with two fully connected layers with nonlinear activation functions (ReLU and sigmoid), as shown in the following equation.

E = FEx（V，W）= σ（g（V，W））(5)其中g（V，W）=W2δ（W1W）)、W1∈RCr×C和W2∈RC×Cr与第一和第二全连接层相关，r是控制全连接层计算成本的降维因子。如（Rundo，2019）中所述，通过使用r = 8实现了医学图像的最佳分割性能。最后，再次重新加权特征图。在这一阶段，特征映射为用等式表示的通道 6.XC=FScale（XC，EC）=EC×XC(6)，其中XC∈RH×W和EC∈[0,1]。在这一步中，通过分配更大的权重，重点关注包含更重要信息的特征，并将较小的权重分配给较不重要的特征（Hu et al.，2018）。挤压和激发起始块的结构和

正在上传…重新上传取消

$E = F_{Ex}(V, W) = \sigma(g(V, W))$ (5)

where $g(V, W) = W_2\,\delta(W_1 V)$, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weights of the first and second fully connected layers. $r$ is the dimensionality reduction factor that controls the computational cost of the fully connected layers. As mentioned in (Rundo, 2019), the best segmentation performance for medical images is achieved with $r = 8$. Finally, the feature maps are recalibrated channel-wise, as expressed in Eq. 6:

$\tilde{X}_C = F_{Scale}(X_C, E_C) = E_C \times X_C$ (6)

where $X_C \in \mathbb{R}^{H \times W}$ and $E_C \in [0, 1]$. In this step, features that contain more important information are emphasized with larger weights, and less important features receive smaller weights (Hu et al., 2018).

Fig. 6. (a) Inception block and (b) Squeeze and Excitation-Inception block used in the proposed improved UNet++ model.

The structures of the squeeze and excitation-inception block and the inception block, which are integrated in the added skip connections of the proposed model, are shown in Fig. 6. The number and size of the filters used in the inception block are specified in the figure. To extract more efficient and appropriate features, instead of using the common encoder of the original improved UNet++ architecture, we examined various deep networks such as VGG, ResNet, and DenseNet as the encoder. In this case, $X^{0,0}$, $X^{1,0}$, $X^{2,0}$ and $X^{3,0}$ are the outputs of the encoder's backbone at different scales, and $X^{4,0}$ is obtained from $X^{3,0}$, as shown in Fig. 5. The segmentation results for the different backbones are given in the experimental results section. Details of the proposed structure, which has 25,621,641 parameters when EfficientNetB6 is the encoder's backbone, are shown in Table 1. Because of the large number of layers and the table size limitation, only the most important layers are shown: the input layer, the backbone layers from which the feature maps are extracted, and the decoder and output layers. For each layer, the table gives the name and type, the shape of the output tensor, the number of parameters, and the names of the layers connected to it. Since the outputs of segmentation-based models are semantic segmentation masks for the input images, we apply the marker-based watershed transform to these outputs to convert them to instance segmentation masks.
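The channel recalibration of Eq. 5 and Eq. 6 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the squeeze step is global average pooling, that δ is ReLU and σ is the sigmoid (as in Hu et al., 2018), and it uses random weights and a small hypothetical channel count purely for demonstration.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Channel-wise recalibration of a feature map x with shape (H, W, C)."""
    # Squeeze (assumed): global average pooling, one descriptor per channel.
    v = x.mean(axis=(0, 1))                      # shape (C,)
    # Excitation (Eq. 5): two fully connected layers with a C/r bottleneck.
    hidden = np.maximum(w1 @ v, 0.0)             # delta assumed to be ReLU, shape (C/r,)
    e = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigma assumed to be sigmoid, shape (C,)
    # Scale (Eq. 6): multiply every channel by its weight.
    return x * e, e

# Hypothetical sizes: C = 16 channels, reduction factor r = 8.
rng = np.random.default_rng(0)
C, r = 16, 8
x = rng.normal(size=(4, 4, C))
w1 = rng.normal(size=(C // r, C))    # first fully connected layer, shape (C/r, C)
w2 = rng.normal(size=(C, C // r))    # second fully connected layer, shape (C, C/r)
y, e = squeeze_excite(x, w1, w2)
```

With r = 8 the bottleneck has only C/8 units, which is what keeps the two fully connected layers cheap.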

2.2.3. Marker-based watershed transform and morphological operations for instance segmentation

In this paper, the marker-based watershed method is used to convert the semantic segmentation results of the segmentation-based models to instance segmentation results. This method combines the watershed transform with morphological operations: first, the positions of the markers are determined by morphological operations; then the watershed algorithm determines the exact boundaries of the objects for instance segmentation.

The watershed transform is a technique in which each image is treated as a topographic surface. To segment the image into different regions (catchment basins), the surface is assumed to be flooded from its minima while barriers are built to prevent water from different basins from merging. In the end, the barriers define the boundaries between regions. The main problem with the watershed transform is over-segmentation of the image, which usually occurs because of small energy disturbances, noise, or other irregularities in the image. The watershed transform is usually applied to the image gradient, and estimating sharp gradients to determine the boundaries between regions is difficult.

The marker-based watershed algorithm is a major enhancement of the watershed transform that floods the topographic surface from a previously defined set of markers. It was proposed to solve the over-segmentation problem of the watershed algorithm and to enable instance-level segmentation. Markers determine which basins should be merged and which should not. They are defined by labeling the regions of the image that most likely belong to the foreground (objects), the regions that most likely belong to the background (non-object regions), and the regions we are unsure about. The watershed algorithm is then applied (Zhang et al., 2010). The markers are determined as follows. First, we binarize the image using the Otsu thresholding method. Then, using morphological operations (closing and opening), we remove small noise and small holes inside the objects. Regions close to the center of the objects then belong to the foreground, regions far enough away belong to the background, and the uncertain regions correspond to the object boundaries. Next, using two further morphological operations (erosion and dilation), we remove the boundaries of the objects so that the regions belonging to the objects and to the background are determined exactly. Once these regions have been determined, the markers are set. Applying the watershed algorithm then yields the exact boundaries of the objects, which is especially important for overlapping objects. In this way, the semantic segmentation results of the segmentation-based models are converted to instance segmentation results.
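The marker-determination steps described above (Otsu binarization, opening and closing, then erosion and dilation) can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes bright objects on a dark background in an 8-bit grayscale image, a 3×3 structuring element, and a hypothetical `iterations` parameter. Labeling each foreground core separately and the flooding itself are left to a library routine such as OpenCV's `cv2.watershed`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the class at or below each level
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)                    # between-class variance per candidate level
    valid = denom > 0                          # skip levels where one class is empty
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def _shifted_stack(mask, pad_value):
    """Stack the 9 shifts of a boolean mask over a 3x3 neighbourhood."""
    p = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def erode(mask, iterations=1):
    for _ in range(iterations):
        mask = _shifted_stack(mask, True).all(axis=0)
    return mask

def dilate(mask, iterations=1):
    for _ in range(iterations):
        mask = _shifted_stack(mask, False).any(axis=0)
    return mask

def build_markers(gray, iterations=2):
    """Label pixels as 1 = sure background, 2 = sure foreground, 0 = unknown."""
    binary = gray > otsu_threshold(gray)
    binary = dilate(erode(binary))             # opening removes small noise
    binary = erode(dilate(binary))             # closing fills small holes
    sure_fg = erode(binary, iterations)        # object cores
    sure_bg = ~dilate(binary, iterations)      # pixels far from every object
    markers = np.zeros(gray.shape, dtype=np.int32)
    markers[sure_bg] = 1
    markers[sure_fg] = 2
    return markers

# Synthetic example: one bright square plus a single noisy pixel.
gray = np.full((20, 20), 20, dtype=np.uint8)
gray[5:15, 5:15] = 200
gray[1, 1] = 200
markers = build_markers(gray)
```

The band of zeros left around each object is the uncertain region that the watershed flooding later assigns to one object or to the background.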

2.3. Bacillus anthracis bacteria and important immune cells detection and instance segmentation

Examining a large number of slides containing stained samples is a tedious task that depends on a specialist's visual examination and prior knowledge, so the results obtained by different specialists often differ. To overcome such problems, automated analysis of medical images using machine vision and image processing algorithms is proposed in Gurcan et al. (2009). Effective automatic segmentation of nuclei, cells, or bacteria in microscopic images is a basic prerequisite for computer-assisted image-based diagnosis and underlies many image analyses, such as diagnosing disease and its progression, identifying cell type and morphology, and identifying the type of bacteria (Meijering, 2012). So far, no research has been conducted on the automatic diagnosis of anthrax or on the accurate detection and instance segmentation of B. anthracis and important immune cells in microscopic images. Previously, UNet and Mask-RCNN were used for nuclei segmentation and achieved the best results in Kaggle's 2018 Data Science Bowl (Caicedo, 2019). In this paper, we propose an improved UNet++ architecture and compare its performance with several segmentation-based models and the Mask-RCNN architecture for segmenting microscopic images of patient tissue samples. The proposed method is summarized in Algorithm 1.

Algorithm 1. (The proposed algorithm)


2.4. Implementation

A) Data preparation and pre-processing

To train the deep neural network for segmentation, we need to annotate data and create a training dataset. For the anthrax datasets, a microbiologist performed this task by detecting and segmenting the regions of B. anthracis bacteria and important microscopic immune cells in the images. For the MoNuSAC dataset, images annotated with the help of expert pathologists are available. Because microscopic images usually have a high resolution and CPU and GPU memory is limited, they must be divided into small patches and/or resized to smaller dimensions.

In addition, a deep neural network (DNN) takes a long time to process an entire input image at its original size. After examining the segmentation results for different input sizes, we concluded that resizing the input image to 512 × 512 pixels does not change the segmentation results significantly, because reducing the image to this size does not change the morphology of the important objects (bacteria and cells). Therefore, to reduce the computational cost and speed up training and testing, we reduce the original images from 2696 × 2696 to 512 × 512 pixels. A sliding window with a size of 64 × 64 and an overlap of 50% is then used to sweep the whole image for training. For testing, the same window size is used with a stride of 8 pixels. For the MoNuSAC dataset, a window size of 96 × 96 with an overlap of 50% is used.

Fig. 7. An example of augmentation. (a) Original image, (b) Image with color augmentation, (c) Image with 90-degree rotation, (d) Image with additive Gaussian noise.

In the testing phase, to reconstruct the whole segmented image from the patches, the predicted patches are arranged from left to right and from top to bottom, and in the areas where patches overlap the results are averaged to obtain the final inference. Before training the network, a preprocessing stage applies feature normalization to the raw microscopic images. This increases the generalization power of the network and accelerates the convergence of the gradient descent algorithm.
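The sliding-window sweep and the overlap-averaged reconstruction described above can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, a 50% overlap of a 64 × 64 window is interpreted as a stride of 32 pixels, and the image size minus the window size is assumed to be a multiple of the stride so that every pixel is covered.

```python
import numpy as np

def window_starts(size, window, stride):
    """Top-left offsets of the windows along one axis."""
    return list(range(0, size - window + 1, stride))

def extract_patches(image, window, stride):
    """Collect patches left to right, then top to bottom."""
    h, w = image.shape[:2]
    return [(r, c, image[r:r + window, c:c + window])
            for r in window_starts(h, window, stride)
            for c in window_starts(w, window, stride)]

def reconstruct(patches, shape, window):
    """Average overlapping patch predictions into one full-size map."""
    total = np.zeros(shape, dtype=float)
    count = np.zeros(shape[:2], dtype=float)
    for r, c, pred in patches:
        total[r:r + window, c:c + window] += pred
        count[r:r + window, c:c + window] += 1
    if total.ndim == 3:                 # per-class probability maps of shape (H, W, C)
        count = count[..., None]
    return total / count

# Patch counts for a 512 x 512 image with 64 x 64 windows.
train_per_axis = len(window_starts(512, 64, 32))   # stride 32 (50% overlap)
test_per_axis = len(window_starts(512, 64, 8))     # stride 8

# Round trip on a small hypothetical image: identity predictions reproduce the input.
image = np.arange(64 * 64, dtype=float).reshape(64, 64)
rebuilt = reconstruct(extract_patches(image, 16, 8), image.shape, 16)
```

Under these assumptions a 512 × 512 image gives 15 × 15 = 225 training patches and 57 × 57 = 3249 test patches, so the denser test stride trades inference time for more predictions averaged per pixel.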

Overfitting usually occurs when the amount of training data is small. This is a common problem with medical data, and it causes the network to fail to segment new, unseen images accurately. Thus, to improve the generalization capability of the deep architectures and to prevent overfitting, data augmentation (rotation at 8 angles, flipping, additive Gaussian noise, and color augmentation) is used in this paper. An important advantage of data augmentation is that it increases the segmentation accuracy of the network by enlarging the dataset without any additional image annotation. For rotation, multiples of 45 degrees (45, 90, 135, …) are used. To overcome the color differences between microscopic images, which often result from different staining conditions and slide thicknesses, color augmentation is applied: in each color channel of the image, a random mean (between -0.08 and +0.08) is added and the channel is multiplied by a random coefficient (between 0.92 and 1.08). An example of data augmentation is shown in Fig. 7.
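The color augmentation described above can be sketched as below. This is an illustration under assumptions that the text does not state: pixel values are taken to be normalized to [0, 1], the scaling is applied before the shift, and the result is clipped back to [0, 1]. The function name is hypothetical.

```python
import numpy as np

def color_augment(image, rng):
    """Randomly scale and shift each colour channel of an image with values in [0, 1]."""
    channels = image.shape[-1]
    shift = rng.uniform(-0.08, 0.08, size=channels)   # random mean added per channel
    scale = rng.uniform(0.92, 1.08, size=channels)    # random coefficient per channel
    return np.clip(image * scale + shift, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.uniform(0.2, 0.8, size=(8, 8, 3))           # hypothetical RGB patch
augmented = color_augment(img, rng)
```

Because one shift and one coefficient are drawn per channel, the whole image changes tint consistently, which mimics a staining difference rather than pixel noise.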

Given that B. anthracis bacteria and major immune cells can appear in different shapes, angles, and color intensities in the microscopic images, proper data preprocessing and augmentation, together with tuning of the model parameters, can greatly reduce the existing challenges.

A) The loss function

An important challenge in the segmentation of microscopic images, which reduces segmentation efficiency, is the imbalance of training data across classes. The important objects in the microscopic images, including B. anthracis bacteria and the different cells to be segmented, are very small compared to the background. Class imbalance biases the network toward classifying data into the classes with a higher percentage of pixels. On the other hand, IoU and Dice are major criteria for evaluating semantic segmentation. To handle data imbalance and achieve better IoU and other segmentation criteria, we propose a hybrid weighted loss function, as in Eq. 7:

$Loss_{Seg} = W_C \cdot (\alpha \cdot Loss_{CE} + \beta \cdot Loss_{IoU} + \gamma \cdot Loss_{Dice} + \delta \cdot Loss_{Boundary\_Dice})$ (7)

where $W_C$ is a coefficient that is inversely proportional to the probability of each class. $Loss_{CE}$, $Loss_{IoU}$, $Loss_{Dice}$, and $Loss_{Boundary\_Dice}$ are the categorical cross-entropy loss, IoU loss, Dice loss, and boundary Dice loss, respectively, and are defined as follows:

where $y_{n,c}$ and $\hat{y}_{n,c}$ are the ground truth and the predicted probability for the $n$th pixel in the batch with respect to class $c$; $y_B$ and $\hat{y}_B$ are the same values restricted to the pixels around the ground truth and predicted masks; and $N$ is the number of pixels in one batch. IoU and Dice are two important metrics for evaluating the quality of segmentation results. Boundary_Dice is the Dice criterion computed between the boundaries around the predicted mask and the ground truth mask. α, β, γ and δ are the weight coefficients of these losses and are selected experimentally. In this study, α = 1, β = 3, γ = 3 and δ = 0.5 led to the best results.
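The structure of Eq. 7 can be sketched with NumPy as below. Because the individual loss definitions are not reproduced in this text, every term here is an assumption: a standard categorical cross-entropy, soft IoU and soft Dice losses computed per class, a boundary Dice term computed only on pixels flagged by a caller-supplied boundary mask, and class weights normalized to sum to 1. It illustrates how the weighting combines the terms; it is not the authors' exact formulation.

```python
import numpy as np

EPS = 1e-7

def soft_overlap_losses(y_true, y_pred):
    """Per-class soft IoU and Dice losses for arrays of shape (N, C)."""
    inter = (y_true * y_pred).sum(axis=0)
    total = y_true.sum(axis=0) + y_pred.sum(axis=0)
    iou = (inter + EPS) / (total - inter + EPS)
    dice = (2 * inter + EPS) / (total + EPS)
    return 1 - iou, 1 - dice

def inverse_frequency_weights(y_true):
    """Class weights inversely proportional to class frequency (assumed normalisation)."""
    w = 1.0 / (y_true.mean(axis=0) + EPS)
    return w / w.sum()

def hybrid_loss(y_true, y_pred, boundary, class_weights,
                alpha=1.0, beta=3.0, gamma=3.0, delta=0.5):
    """Class-weighted sum of CE, IoU, Dice and boundary Dice terms, in the spirit of Eq. 7."""
    ce = -(y_true * np.log(np.clip(y_pred, EPS, 1.0))).mean(axis=0)
    iou_loss, dice_loss = soft_overlap_losses(y_true, y_pred)
    # Boundary Dice (assumed): the same Dice term restricted to boundary pixels.
    _, bdice_loss = soft_overlap_losses(y_true[boundary], y_pred[boundary])
    per_class = alpha * ce + beta * iou_loss + gamma * dice_loss + delta * bdice_loss
    return float((class_weights * per_class).sum())

# Toy batch: 6 pixels, 3 classes, class 0 dominating like a background class.
y_true = np.eye(3)[[0, 0, 0, 0, 1, 2]]
boundary = np.array([False, False, False, True, True, True])
weights = inverse_frequency_weights(y_true)
loss_perfect = hybrid_loss(y_true, y_true, boundary, weights)
loss_uniform = hybrid_loss(y_true, np.full((6, 3), 1 / 3), boundary, weights)
```

In this toy batch the dominant class receives the smallest weight, so errors on the rare classes contribute more to the total, which is the stated purpose of $W_C$.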

B) Parameter settings

This section gives more details about the parameters of the proposed architecture and of the other models used for microscopic image segmentation in the experimental results section. For the improved UNet++ and UNet++ architectures, the initial learning rate is 0.001 and is reduced by 10% every 1000 iterations. The filters in all layers of improved UNet++ and UNet++ are of size 3 × 3, except in the last layer of the decoder, where they are of size 1 × 1; down-sampling is performed by 2 × 2 max-pooling with stride 2. The number of filters in the last layer equals the number of classes (5 classes, including the background). Dropout with a rate of 0.2 is used to improve the generalizability of the network. The batch size is 5, and Adam is used as the optimizer.
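The learning-rate schedule described above can be written as a one-line function. This is a sketch under one assumption: "reduced by 10% every 1000 iterations" is read as a staircase decay in which the current rate is multiplied by 0.9 at each step boundary. The function and parameter names are hypothetical.

```python
def learning_rate(iteration, initial=0.001, decay=0.10, step=1000):
    """Step decay: cut the current rate by `decay` after every `step` iterations."""
    return initial * (1.0 - decay) ** (iteration // step)

schedule = [learning_rate(i) for i in (0, 999, 1000, 2500)]
```

Under this reading the rate stays at 0.001 for the first 1000 iterations, drops to 0.0009, and is 0.00081 by iteration 2500.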

The backbone used in the Mask-RCNN architecture is FPN (with a ResNet-101 backbone), which, despite its greater complexity and longer execution time compared with ResNet-50, leads to much better results. The weights of a model pre-trained on the MSCOCO database are used as the initial weights. The layers are then trained in three stages: first, the network heads are trained from randomly initialized weights; then the higher layers (stage four and above of the ResNet-101 model) are trained; and finally all layers are trained. The learning rate is 0.001 for the first two stages and 0.0001 for the last stage. A batch size of 4 is used, and the model is trained using stochastic gradient descent with a momentum of 0.9 and an L2 regularizer with a weight decay of 0.0001. The parameters of the other networks are set as stated in their papers. To prevent overfitting, the number of epochs is determined from the validation data: training continues while the validation loss decreases and stops when it starts to increase. In addition, several experiments were performed with different hyperparameters, and those that produced the best segmentation results were selected for training.

C) Evaluation criterion

Five metrics, including the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), Precision, and Recall, are used to evaluate the quality of the segmentation results of the proposed models (Eq. 12 to Eq. 15). To calculate these metrics, the segmentation results are compared with the ground truths, which are annotated images prepared by a specialist.

TP, FP, TN, and FN are defined as follows:

True Positive (TP): pixels correctly segmented as bacteria or important immune cells

False Positive (FP): pixels falsely segmented as bacteria or important immune cells

True Negative (TN): pixels correctly detected as background

False Negative (FN): pixels falsely detected as background
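Given these counts, the four named metrics follow directly. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by this text) to match Eq. 12 to Eq. 15; the function name and the example counts are hypothetical.

```python
def segmentation_metrics(tp, fp, fn):
    """Standard pixel-wise metrics from confusion counts (TN is not needed here)."""
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "Precision": tp / (tp + fp),
        "Recall": tp / (tp + fn),
    }

# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives.
metrics = segmentation_metrics(tp=80, fp=20, fn=10)
```

None of the four formulas uses TN, so the large background does not inflate these scores the way it would inflate plain pixel accuracy.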

3. Experimental results and discussion

We compare the performance of our proposed model with several important and state-of-the-art segmentation architectures (Attention UNet, UNet++, LinkNet, PSPNet, FPN and Mask-RCNN) for segmenting bacteria and cells in raw test images taken from patient slides. For the experiments with segmentation-based models on the anthrax dataset, we reduce the original images from 2696 × 2696 to 512 × 512 pixels, and a sliding window with a size of 64 × 64 and an overlap of 50% is used to sweep the whole image for training. For testing, the same window size is used with a stride of 8 pixels. For the MoNuSAC dataset, a window size of 96 × 96 with an overlap of 50% is used. Of the total images, 80% were selected for training and 20% for testing. Furthermore, 18% of the training data were randomly selected as validation data in order to prevent overfitting.
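The split described above can be sketched as follows. The 20% test and 18% validation fractions come from the text; the shuffling, the seed, and the rounding are assumptions, and the function name is hypothetical.

```python
import random

def split_indices(n, test_frac=0.20, val_frac=0.18, seed=0):
    """Shuffle n sample indices, hold out a test set, then carve validation from training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    n_val = round(len(train) * val_frac)       # 18% of the training portion
    return train[n_val:], train[:n_val], test

train_idx, val_idx, test_idx = split_indices(1000)
```

Since the validation set is taken from the training portion, the effective proportions are about 65.6% training, 14.4% validation, and 20% test.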

3.1. Baseline

There is no similar study on automating the microscopic diagnosis of anthrax, so this section reviews studies on the automated diagnosis of other microbial diseases. The first study on automatically detecting M. tuberculosis in sputum smear scans obtained by a high-resolution slide scanning system was presented in (Hu et al., 2019). The accuracy and reliability of three CNN models, Inception v3, ResNet, and DenseNet, were verified on a dataset of 2,630 sputum smear microscopic images. Accuracy, Precision, Sensitivity, F1 score, ROC curve, and AUC were used to evaluate the models. Based on the experimental results, Inception v3 had the best performance, with all indicators equal to 98.4%.

Fatin et al. proposed an intelligent identification and counting system that detects the presence of M. tuberculosis in ZN-stained sputum smear images by applying digital image processing and artificial intelligence techniques. First, the image contrast is improved using a combination of global enhancement, local enhancement, and Contrast Limited Adaptive Histogram Equalization (CLAHE). Then, color thresholding is used to segment the enhanced image. Next, color, shape, and texture features are extracted from the resulting image, and Neighborhood Component Analysis (NCA) and ReliefF analysis are used to select the relevant features. Finally, a Multilayer Perceptron (MLP), k-Nearest Neighbors (k-NN), and a Support Vector Machine (SVM) are used for classification. The MLP neural network performed best, with accuracy, sensitivity, and specificity of 93.8%, 93.4%, and 94.1%, respectively (Rosli et al., 2019).

Table 3

Segmentation results of the proposed improved UNet++ model using different optimizers.

In Górriz et al. (2018), a deep learning method is presented for diagnosing Leishmania disease from microscopic images of cultures generated by infecting RAW macrophage cells with three species of Leishmania parasites. The UNet architecture, which is an FCNN, is used for the automatic segmentation of Leishmania parasites, which are then classified into three classes: promastigotes, amastigotes, and adhered parasites. A post-processing method estimates the number and size of each parasite type. Non-uniform sampling and the Generalized Dice Loss function are used to deal with data imbalance. At its best (for the amastigote class), this method achieved a Dice score, precision, recall, and F1-score of 77.7%, 75.7%, 82.3%, and 77.7%, respectively.

In Mithra and Emmanuel (2019), a method for automatic identification of tuberculosis bacteria in sputum microscopic images is proposed. Noise reduction and intensity correction are used as preprocessing steps to improve image quality for subsequent processing. Then, the Channel Area Thresholding (CAT) algorithm is used to segment the tuberculosis bacteria in the microscopic images. At this step, the first thresholds are determined from the intensity of the green channel of the images, and the subsequent thresholds are determined from the total number of white pixels in the image obtained from the previous step, for the minimum and maximum bacterial region size. Then, using the Location-Oriented Histogram (LoH) and Speeded-Up Robust Features (SURF) descriptors, features are extracted from the segmented candidate bacterial regions obtained in the previous step. The extracted features are used to train a Deep Belief Neural Network (DBN) to identify the regions related to bacteria accurately. Finally, the level of infection is computed by counting bacilli, and the category of the sputum image is determined (no bacillus, few bacillus, and overlapping bacillus). They achieved a classification accuracy of 97.5% for automated identification of Mycobacterium tuberculosis in sputum microscopic images.

Song et al. proposed an automatic method to diagnose bacterial vaginosis in microscopic images and achieved a sensitivity, specificity, and accuracy of 58.3%, 87.1%, and 79.1%, respectively. In that paper, quantitative analysis of bacterial morphotypes is performed in three steps: segmentation of the bacterial regions, separation of overlapping bacteria, and classification of the bacterial morphotypes. The bacterial regions are segmented by evaluating global contrast and spatial weight coherence, and a Markov Random Field (MRF) is also used to segment small objects. Next, the rod-like shape features are enhanced using Hessian matrices to obtain markers for splitting the overlapping bacteria. Finally, shape features of the bacteria are extracted and an AdaBoost classifier is trained, and the diagnosis is made according to the Nugent score criteria (Song, 2016).

3.2. Performance comparison of different optimizers

We applied different optimizers to the proposed model to find the best one. The optimizers tested were SGD, Adam, Nadam, and RMSProp. The segmentation efficiency of the improved UNet++ model with these optimizers is shown in Table 3. Based on these results, the Adam optimizer performs best, with precision and recall of 81.2% and 90.1%, respectively. Nadam performs almost as well as Adam, but because Adam was still better, Adam is used for the proposed model. Of the other two optimizers, SGD performed the weakest, and RMSProp performed worse than Adam and Nadam.

3.3. Effect of using the hybrid weighted loss function

In the second part of the experiments, we investigated the effect of the proposed weighted hybrid loss function (L2) (Eq. 7), compared with the loss function used in the UNet++ paper (L1) (Eq. 1), on the segmentation efficiency of UNet++ and improved UNet++. The only difference is that, because the task is multi-class segmentation, categorical cross-entropy is used instead of binary cross-entropy. The segmentation results of UNet++ and improved UNet++ for both loss functions are shown in Table 4.

As shown in Table 4, for both models the segmentation efficiency with the proposed weighted hybrid loss function is clearly better than with the loss function proposed in the UNet++ paper. Data imbalance is an important challenge in microscopic image segmentation and reduces segmentation efficiency: when the classes are imbalanced, the network tends to assign objects to the most populated class. Including class weights and IoU, which is a major criterion for evaluating semantic segmentation, in the proposed loss function significantly improves segmentation performance by reducing the number of false negatives and false positives. In addition, the proposed model outperforms UNet++ on all evaluation criteria with both loss functions.

Table 4

Segmentation results of UNet++ and improved UNet++ with two different loss functions. Underlined numbers represent the highest metrics, and bold numbers represent the highest metrics for each network separately.

Table 5

Layer names of the encoder’s backbones from which feature maps are extracted.


3.4. Effect of using different convolution networks as the encoder's backbone

As mentioned, to improve the segmentation efficiency of the models, common convolutional networks that perform well in classification tasks can be used as the encoder's backbone. For this purpose, the performance of UNet++ is first examined with different convolutional networks as its encoder backbone, and the backbone with the best segmentation results is then selected to evaluate the proposed model. Therefore, in the second part of the experiments, the segmentation results of UNet++ are investigated with different convolutional networks as the encoder's backbone. The names of the layers from which feature maps at different scales are extracted (the inputs of the skip connections between encoder and decoder) are given in Table 5.

The segmentation results of UNet++ with different encoder backbones are shown in Table 6. Since L2 led to better segmentation results than L1 in the previous experiment, L2 (the proposed weighted hybrid loss function) is used in this experiment as the loss function for updating the model weights. Comparing the segmentation performance of the different encoder backbones of UNet++ in Table 6, it can be concluded that EfficientNetB4 (on the recall criterion) and EfficientNetB6 (on all criteria) performed better than the other networks. Therefore, in the later experiments, EfficientNetB6 is used as the encoder's backbone of UNet++ and of the proposed model. Because the immune cells are highly similar to each other, bacteria resemble tissue filaments, and the microscopic images contain artifacts and staining spots that resemble important objects, misclassification is possible. The lower precision relative to recall means that there are more false positives than false negatives, which supports this claim. Comparing DenseNet-121 with DenseNet-201, and EfficientNetB6 with EfficientNetB7, it can be concluded that a deeper network does not necessarily lead to better segmentation results.

Table 6

Segmentation results of UNet++ using different encoder backbones. Results in bold represent the highest metric for the investigated network.

Table 7

Investigation of the importance of each proposed component on the segmentation efficiency of UNet++.

1. 为了更好地了解UNet++架构（效率netb6作为编码器的主干）中的每一个变化对改进显微图像分割结果的影响，我们进行了消融研究。不同结构的分割结果如表7所示。如表7所示，与其他组件相比，在UNet+ +的编码器和解码器部分之间添加跳过连接对提高分割效率的影响最为显著。之后，在解码器部分中添加的跳过连接对提高模型性能的影响最大。最后，在跳过连接中使用挤压和激励起始块的改进份额最低。

3.5. Ablation study

To better understand the effect of each change made to the UNet++ architecture (with EfficientNetB6 as the encoder backbone) on the segmentation of microscopic images, an ablation study is conducted. The segmentation results of the different structures are shown in Table 7. As shown in Table 7, adding the skip connections between the encoder and decoder sections of UNet++ has the most significant impact on segmentation efficiency compared to the other components. The skip connections added in the decoder section have the next greatest effect on model performance. Finally, the use of squeeze and excitation-inception blocks in the skip connections contributes the smallest share of the improvement.


3.6. Comparison with the state-of-the-art models

In the first part of this experiment, the segmentation efficiency of the proposed model is compared with several important state-of-the-art segmentation-based models, including UNet++, Attention UNet, LinkNet, PSPNet, and FPN, on the two prepared anthrax datasets. The segmentation efficiency of the proposed model is also compared with that of Mask-RCNN, which is a detection-based model. The segmentation results of these networks, with a Feature Pyramid Network (FPN) as the backbone of Mask-RCNN, are shown in Table 8.

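To build a feature pyramid, FPN also needs a backbone network; in this paper, we used a pretrained ResNet101. The parameters and loss function used in UNet++ and the proposed model are set according to the parameter settings section and the results of the previous experiments (EfficientNetB6, L2, and Adam as the encoder backbone, loss function, and optimizer of both models). For the other networks, these are set according to what is stated in their papers. The evaluation results of all segmentation-based models on the microscopic images were obtained with the sliding window technique (a window size of 64 × 64 with a stride of 8 pixels for our prepared datasets, and a window size of 96 × 96 with an overlap of 50% for the MoNuSAC dataset). To reconstruct the whole segmented image from the patches, the predicted patches are arranged from left to right and from top to bottom, and the regions where different patches overlap are averaged to obtain the final inference.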
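The patch-wise inference and averaging step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `sliding_window_segment` and `toy_model` are hypothetical names, the model is replaced by a stand-in that returns one score per pixel, and the image size is assumed to fit the window and stride so that every pixel is covered by at least one patch.

```python
import numpy as np

def sliding_window_segment(image, predict_patch, window=64, stride=8):
    """Predict on overlapping patches and average the overlapping regions.

    predict_patch is a hypothetical callable that maps a (window, window)
    patch to a (window, window) map of per-pixel scores.
    """
    height, width = image.shape[:2]
    score_sum = np.zeros((height, width), dtype=np.float64)
    hits = np.zeros((height, width), dtype=np.float64)
    # Visit patches left to right, top to bottom.
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = image[top:top + window, left:left + window]
            score_sum[top:top + window, left:left + window] += predict_patch(patch)
            hits[top:top + window, left:left + window] += 1.0
    # Average where patches overlap; pixels never visited keep a score of 0.
    return score_sum / np.maximum(hits, 1.0)

# Stand-in for a trained network: a simple per-pixel threshold.
rng = np.random.default_rng(0)
image = rng.random((128, 128))
toy_model = lambda patch: (patch > 0.5).astype(np.float64)

full = sliding_window_segment(image, toy_model, window=64, stride=8)
print(full.shape)
```

Because the stand-in model is purely per-pixel, every overlapping patch agrees and the averaged map equals the threshold applied to the whole image. With a real network, patches disagree near their borders, which is where the averaging matters.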

According to this table, the proposed model performed better than the other models on all evaluation criteria over both prepared anthrax datasets. Among the segmentation-based models, UNet++ and then Attention UNet also had acceptable performance after the proposed model. FPN performed somewhat better than LinkNet, and PSPNet had the worst performance among all the models. Mask-RCNN also achieved good results despite the existing challenges, but some segmentation-based models (improved UNet++, UNet++, and Attention UNet) performed somewhat better. One reason for this superiority may be the method of segmentation. As mentioned, instance segmentation methods are generally divided into two categories: 1) detection-based methods that first detect the bounding box, and 2) segmentation-based methods that use pixel-based dense features.

In some cases, the bounding box in the Mask-RCNN architecture, which is a detection-based method, does not fully and precisely enclose the object, and parts of the object are sometimes cut off. Also, because the segmentation phase of this network is applied to the ROIs and the mask branch is a simple fully convolutional network, the segmentation accuracy is somewhat reduced. On the other hand, because objects of the same class rarely overlap in these images, accuracy does not decrease when the semantic results are converted to instance segmentation in the segmentation-based architectures.

In the next part of the experiments, we compared the computational complexity and execution speed of the proposed model with those of the other models. Fig. 8 compares the average segmentation time per patch image of the prepared anthrax dataset on a GPU (Nvidia GTX 1060 6 GB) for all segmentation-based models, each using EfficientNetB6 as its encoder backbone (which led to the best results in the previous experiments). This time was obtained by averaging the execution time over the test patch images.

To compare the computational complexity of the proposed model with that of the other models, the number of parameters of each model is shown in Fig. 9. As shown in Figs. 8 and 9, the computational complexity and execution time of the proposed model (25,621,641 parameters) are not much greater than those of UNet++ (22,764,809 parameters): 0.0369 s vs. 0.03103 s on GPU. In terms of average run time and number of parameters, the proposed model is closest to Attention UNet (with a 0.03 s difference in the average execution time on the test patch images).
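To put "not much greater" into numbers, the overhead relative to UNet++ can be computed from the parameter counts and per-patch run times quoted above. This is a small arithmetic sketch; the variable names are hypothetical and only the four figures come from the text.

```python
# Figures quoted in the text for the proposed model and UNet++.
params_proposed = 25_621_641
params_unetpp = 22_764_809
time_proposed = 0.0369    # seconds per patch on GPU
time_unetpp = 0.03103     # seconds per patch on GPU

extra_params = params_proposed - params_unetpp
param_overhead = extra_params / params_unetpp
extra_time = time_proposed - time_unetpp
time_overhead = extra_time / time_unetpp

print(f"extra parameters: {extra_params:,} (+{param_overhead:.1%})")
print(f"extra time per patch: {extra_time * 1000:.2f} ms (+{time_overhead:.1%})")
```

From these two pairs of figures, the proposed model has about 12.5% more parameters and takes about 18.9% longer per patch than UNet++.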

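Fig. 8. Average test time of the patch test image segmentation on GPU.

Fig. 9. Number of parameters of the different segmentation-based models.

Among all the investigated models, improved UNet++, Attention UNet, and UNet++ had the highest number of parameters and the longest execution time, respectively. Among the remaining models, LinkNet had the highest execution time. For qualitative evaluation, Fig. 10 shows the segmentation results for two images from the prepared anthrax datasets, which contain bacteria and the main immune cells, including lymphocytes, macrophages, and neutrophils. Comparing the segmentation results with the ground truths in the first column of the figure shows that, despite the challenges, our proposed architecture outperformed the other models in both semantic and instance segmentation. The improved UNet++ mistakenly detected and segmented one insignificant region of the image as a macrophage (shown by a black box).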

Mask-RCNN and UNet++ were able to detect and segment the most important components. However, UNet++ mistakenly predicted one of the lymphocytes as a macrophage and segmented one insignificant region as a macrophage, and Mask-RCNN incorrectly identified two insignificant regions of the image as macrophages. All models successfully segmented the bacteria despite their overlap with an artifact. All architectures successfully segmented and detected the important components of the second microscopic image. Comparing the segmentation results with the ground truths in the second column of Fig. 10 shows that the proposed improved UNet++ model segmented the important objects in the image more accurately than the other two models.

Fig. 11 shows two challenging images from the anthrax datasets that are crowded and have overlapping components. Here, all architectures made some mistakes in segmenting the images. According to the ground truth images, neither image contains any important objects, yet all architectures erroneously detected and segmented some components (including staining spots, shadows, etc.) as important objects. However, the proposed model made fewer mistakes in detection and segmentation than the other models. In the first-column image, with 4 mistakes, it performed the same as UNet++ and better than Mask-RCNN; in the second-column image, with two mistakes, it performed better than the other two models.

Fig. 10. Segmentation results of two test images by UNet++, improved UNet++, and Mask-RCNN. a) Raw test images, b) Ground truth images, c) Instance segmentation results of UNet++ using a marker-based watershed algorithm, d) Instance segmentation results of improved UNet++, e) Instance segmentation results of Mask-RCNN.

In the last part of this experiment, the performance of the proposed model was also compared with other state-of-the-art models on the MoNuSAC dataset. The evaluation was performed on two different datasets, each containing H&E stained whole-slide images of four organs (breast, kidney, lung, and prostate) from 20 different patients. The segmentation results of the proposed model and the other segmentation-based models are shown in Table 9. As mentioned, the sliding window technique with a window size of 96 × 96 and an overlap of 50% was used to sweep the whole-slide images of this dataset to detect and segment the important cells (epithelial cells, macrophages, lymphocytes, and neutrophils).

According to this table, the proposed model performed better than the other models on most evaluation criteria over both MoNuSAC datasets. On the first dataset, the proposed model performed somewhat better than UNet++ and Attention UNet (on the Dice criterion its performance was almost the same as that of UNet++, and on the IoU criterion it was slightly (0.2%) weaker than Attention UNet). On the second dataset, the proposed model had the best performance of all models. UNet++ and then Attention UNet also had acceptable performance among the segmentation-based models after the proposed model. FPN and LinkNet performed similarly, with FPN slightly ahead. On this dataset, as on the anthrax datasets, PSPNet had the worst performance among all the models.

The segmentation results for three sub-images of the MoNuSAC dataset using UNet++ and improved UNet++ are given in Fig. 12. Comparing the segmentation results with the ground truths in the first and second rows of the figure shows that, despite the challenges, our proposed architecture performed better than UNet++. UNet++ made several mistakes in identifying cell types, detecting and segmenting all or part of some lymphocytes as epithelial cells. In the second-row image, UNet++ also mistakenly identified an insignificant part of the image as an epithelial cell. The proposed model also made errors: in the first-row image, a lymphocyte was incorrectly detected and segmented as an epithelial cell, and in the second-row image, an insignificant part of the image was mistakenly detected as a macrophage by both models. In general, however, the proposed model made fewer mistakes than UNet++. In the third-row image, both models performed almost the same. As the resulting mask images in the figure show, the segmentation accuracy of the proposed model was higher than that of UNet++ in all images.

Table 9

Segmentation results of the proposed improved UNet++ model and some state-of-the-art models on the MoNuSAC dataset.

3.7. Class-wise segmentation results of the proposed model

The class-wise segmentation results of the improved UNet++ model using the proposed weighted hybrid loss function on raw microscopic images of size 512 × 512 pixels are shown in Table 10. The evaluation results based on the different criteria are presented separately for each class on the prepared anthrax dataset (dataset 1) and the MoNuSAC dataset (dataset 1). According to this table, despite the existing challenges, the proposed model achieved good detection and segmentation results for the objects (bacteria and cells) belonging to the different classes. In general, class imbalance causes a network to favor the classes with a higher percentage of pixels. Both datasets have a significant imbalance between the data of different classes. In the anthrax microscopic images in particular, the ratio of foreground to background is highly unbalanced. However, this imbalance does not affect the class-wise segmentation performance of the proposed method.

The main reason behind these results is the weighted hybrid loss function (Eq. 7). During training, it gives more weight to the classes with a smaller percentage of pixels: the weight assigned to the data of each class is inversely proportional to the probability of that class. As shown in Table 10, although macrophages and neutrophils are far less numerous than lymphocytes and epithelial cells in the MoNuSAC dataset, the segmentation efficiency of the model is higher for these cells.
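The inverse-probability weighting described above can be illustrated with a short sketch. It does not reproduce Eq. 7: the hybrid loss itself is not shown here, so a plain weighted cross-entropy is used as a stand-in term, and the names `inverse_frequency_weights` and `weighted_cross_entropy`, as well as the normalization of the weights to sum to 1, are assumptions made for illustration.

```python
import numpy as np

def inverse_frequency_weights(label_map, num_classes):
    """Class weights inversely proportional to each class's pixel frequency."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes).astype(np.float64)
    freq = counts / counts.sum()
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0                 # classes absent from the map get weight 0
    weights[present] = 1.0 / freq[present]
    return weights / weights.sum()       # assumption: normalize to sum to 1

def weighted_cross_entropy(probs, label_map, weights):
    """Mean per-pixel cross-entropy, scaled by the weight of the true class.

    probs has shape (H, W, C) and holds softmax outputs.
    """
    rows = np.arange(label_map.shape[0])[:, None]
    cols = np.arange(label_map.shape[1])[None, :]
    p_true = probs[rows, cols, label_map]
    per_pixel = -weights[label_map] * np.log(np.clip(p_true, 1e-12, 1.0))
    return per_pixel.mean()

# Toy label map: 90 background pixels (class 0) and 10 foreground pixels (class 1).
label_map = np.zeros((10, 10), dtype=int)
label_map[0, :] = 1
weights = inverse_frequency_weights(label_map, num_classes=2)

# An uninformative prediction: probability 0.5 for both classes everywhere.
probs = np.full((10, 10, 2), 0.5)
loss = weighted_cross_entropy(probs, label_map, weights)
print(weights, loss)
```

With these weights (0.1 for the background and 0.9 for the foreground), the 10 foreground pixels contribute as much to the total loss as the 90 background pixels, so the rare class is not drowned out during training.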

The segmentation results for epithelial cells are somewhat weaker than for the other cells. Epithelial cells are mostly thin and elongated, with a large nucleus and the geometric shape of a squamous cell, but their appearance varies depending on the organ they belong to. Macrophages are larger; neutrophils are smaller than macrophages and have a characteristic multilobed nucleus and a large cytoplasm. Lymphocytes are smaller, with a large nucleus and a small cytoplasm.

In the case of the anthrax dataset, despite the small number of lymphocytes and B. anthracis bacteria compared to the objects of other classes, the segmentation efficiency is high for lymphocytes, and the proposed model performed well on the bacteria. B. anthracis bacteria are thin and rod-shaped. The presence of tissue filaments or color spots that closely resemble bacteria in some parts of the microscopic images is one of the challenges in detecting and segmenting B. anthracis in these images. The proposed model detected and segmented neutrophils, and then lymphocytes, with relatively higher efficiency than the objects of other classes.


3.8. Discussion

This paper aims to develop a reliable system for microscopic image analysis to support the diagnosis of tissue diseases, metastasis detection, patient prognosis, etc., which is a challenging task. Microscopic diagnosis by humans may suffer from decreased accuracy and an increased error rate because of fatigue and reduced visual acuity. The important challenges include the large number of objects (bacteria and cells) in microscopic images, image clutter and component overlap, the high similarity of some objects to each other, the differing appearance of the same object in different parts of the image, and image artifacts. In addition, in the case of anthrax, the very small number of B. anthracis bacteria in the microscopic images makes diagnosis difficult for a specialist.

Fig. 12. Segmentation results of three sub-images of the MoNuSAC dataset. a) Raw test sub-images, b) Ground truth masks, c) Predicted masks using UNet++, d) Predicted masks using improved UNet++. Epithelial cells are annotated in red, lymphocytes in yellow, macrophages in green, and neutrophils in blue.

Table 10

Class-wise segmentation results of the improved UNet++ model on the raw test images of the anthrax and MoNuSAC datasets.

We proposed a novel method based on an improved UNet++ for instance segmentation of microscopic images. The proposed model was evaluated on two prepared anthrax datasets for the diagnosis of this disease through the detection and segmentation of B. anthracis bacteria and the major cells of the immune system. It was also evaluated on the publicly available MoNuSAC dataset for the detection and segmentation of different cell types. The proposed model combines multi-scale features by adding multi-scale skip connections in its decoder section and between the encoder and decoder subnetworks, which improved the performance of UNet++. Integrating squeeze and excitation-inception blocks in the skip connections added between the encoder and decoder, with channel-wise adaptive recalibration of the feature maps, led to the extraction of multi-scale features with better representation. A weighted hybrid loss function was also proposed, which, in addition to solving the problem of data imbalance between the classes, improved the segmentation efficiency of the models (UNet++ and improved UNet++).

We experimentally verified the potential of the proposed model for microscopic image segmentation in disease diagnosis. In the first experiment, we investigated the effect of the proposed weighted hybrid loss function (L2), compared with the loss function proposed in the UNet++ paper (L1), on the segmentation efficiency of UNet++ and improved UNet++. The results showed that the proposed weighted hybrid loss function improved the segmentation efficiency of both models on the different evaluation criteria, and that the proposed model outperformed UNet++ with both loss functions. In the next experiment, we investigated the effect of using different convolutional networks as the encoder backbone of UNet++. According to the results, EfficientNetB6 led to the best results on all evaluation criteria compared to the other convolutional networks. The results of this experiment also showed that using a deeper network does not necessarily lead to better segmentation results. Then, in an ablation study, we investigated the effect of each proposed modification to the UNet++ architecture on the segmentation results. Because feature maps at different scales contain different information, combining multi-scale features has a great impact on the segmentation results. The experimental results demonstrated that adding the skip connections between the encoder and decoder sections of UNet++ had the greatest impact on segmentation efficiency compared to the other components, followed by the skip connections added in the decoder section. The inception blocks used in the skip connections added between the encoder and decoder extract more useful multi-scale features with better representation and high computational efficiency. In addition, the squeeze and excitation technique improves the quality of the representations produced by the inception blocks.
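The channel-wise recalibration performed by a squeeze and excitation step, as discussed above, can be sketched in a few lines. This is a generic illustration of the squeeze and excitation idea only, not the paper's squeeze and excitation-inception block: the function name `squeeze_excite`, the random weights, the reduction from 8 to 2 channels, and the omission of bias terms are all assumptions.

```python
import numpy as np

def squeeze_excite(features, w_reduce, w_expand):
    """Rescale each channel of a (H, W, C) feature map by a learned gate.

    w_reduce has shape (C, C // r) and w_expand has shape (C // r, C);
    biases are omitted for brevity.
    """
    squeezed = features.mean(axis=(0, 1))                  # squeeze: global average per channel
    hidden = np.maximum(squeezed @ w_reduce, 0.0)          # bottleneck layer with ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w_expand)))     # excitation: sigmoid gate per channel
    return features * gates, gates                         # broadcast the gates over H and W

# Random stand-ins for a feature map and trained weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 8))
w_reduce = rng.standard_normal((8, 2))
w_expand = rng.standard_normal((2, 8))

recalibrated, gates = squeeze_excite(features, w_reduce, w_expand)
print(recalibrated.shape, gates.round(3))
```

Each gate lies between 0 and 1, so informative channels can be kept close to their original scale while less useful ones are suppressed. In an inception-style block the same gating would be applied to the concatenated outputs of branches with different kernel sizes.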

Afterwards, the computational complexity and execution speed of the proposed model were compared with those of the other models. Based on the results of these experiments, the computational complexity and execution time of the proposed model are not much greater than those of UNet++ (25,621,641 parameters and an average run time of 0.0369 s for the proposed model, compared to 22,764,809 parameters and an average run time of 0.03103 s for UNet++). Among all the investigated models, improved UNet++, Attention UNet, and UNet++ had the highest number of parameters and the longest execution time, respectively. Therefore, the only limitation of the proposed model compared to UNet++ and the other investigated models is that, because of the new skip connections and squeeze and excitation-inception blocks, it takes slightly more execution time on GPU (with a 0.03 s difference in the average run time on test patch images compared to Attention UNet and a 0.05 s difference compared to UNet++).

Finally, in the last experiment, the class-wise segmentation efficiency of the proposed method on the anthrax and MoNuSAC datasets was evaluated. Because the proposed weighted hybrid loss function addresses the class imbalance problem, the proposed model performed well in detecting and segmenting the objects of classes with a smaller percentage of pixels (lymphocytes and B. anthracis bacteria in the anthrax dataset, and macrophages and neutrophils in the MoNuSAC dataset). In the MoNuSAC dataset, the segmentation efficiency of the proposed model on all criteria was higher for macrophages and neutrophils and weaker for epithelial cells than for the other cells. In the anthrax dataset, the results were more balanced, and the detection and segmentation of neutrophils and lymphocytes led to somewhat better results.

According to the performed evaluations, despite the challenges in microscopic images, the proposed model can be used as a reliable system for diagnosing tissue diseases, detecting metastasis, supporting patient prognosis, etc. This applies especially to anthrax, whose automatic diagnosis was investigated in this paper. Because the number of bacteria in the wound and in the microscopic field of view is inherently low, automatic localization of B. anthracis bacteria would ease microscopic diagnosis. However, because tissue filaments may appear elongated and thin, and some image artifacts look like bacteria, the algorithm mistakenly identifies them as bacteria in some cases. The same problem exists in detecting the important cells of the immune system because of their apparent similarity to each other.


4. Conclusion

A novel method based on an improved UNet++ was proposed for the detection and instance segmentation of important objects in microscopic images for disease diagnosis (B. anthracis bacteria and major cells of the immune system in the anthrax datasets, and different types of cells in the MoNuSAC dataset). The images of the datasets were obtained under different conditions, and segmenting them poses many challenges. The proposed model combines multi-scale features by adding skip connections in its decoder section and at the junction of the encoder and the decoder, which improved the performance of UNet++. Squeeze and excitation-inception blocks were also used in the skip connections added between the encoder and decoder to extract features with better representations. Batch normalization and dropout were used in the improved UNet++ model to accelerate convergence and increase the generalizability of the system. To overcome the problem of data imbalance between classes, a weighted hybrid loss function was proposed, which improved the segmentation efficiency.

Based on the IoU, Dice, recall, and precision criteria, the improved UNet++ showed good performance on the first anthrax dataset in segmenting B. anthracis bacteria and important immune cells, with corresponding values of 80.1%, 88.9%, 94.1%, and 84.3%. On the second anthrax dataset and the MoNuSAC dataset, the proposed model also performed better than the other investigated state-of-the-art models. The experimental results on raw microscopic images showed that the proposed improved UNet++ model could be highly effective in the automatic diagnosis of diseases. The only limitation of the proposed model compared to UNet++ is that, because of the new skip connections and squeeze and excitation-inception blocks, it takes slightly more execution time on GPU (0.05 s more than UNet++ and 0.03 s more than Attention UNet).

CRediT authorship contribution statement

Fatemeh Hoorali: Conceptualization, Writing - Original draft preparation, Software. Hossein Khosravi: Supervision, Writing - Reviewing and Editing. Bagher Moradi: Data Preparation, Validation, Writing - Reviewing.

Acknowledgements

All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.