CNN Week 3: Image_segmentation_Unet_v2

姑苏落雨心中

Last modified 2022-08-29 20:49:54


Category: CNN. Tags: computer vision, deep learning, convolutional neural networks

First published 2021-08-06 09:31:49

Article link: https://blog.csdn.net/a1137608040/article/details/119413588


Image Segmentation with U-Net

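U-Net is a type of CNN designed for fast, precise image segmentation.
The kind of segmentation U-Net performs is called semantic image segmentation. Like object detection, it answers the question: which objects are in the image, and where are they?
In object detection, the bounding box that marks an object may include pixels that do not belong to it. Semantic segmentation is more accurate: it predicts a precise pixel-level mask for each object. Below is an example of a semantically segmented image:
[Figure: example of a semantically segmented image]
For a self-driving car, specific markings in the scene are key factors in driving decisions, which calls for a pixel-level understanding of the image.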

Packages

The following packages are used:

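import tensorflow as tf
import numpy as np
import imageio                    # used below for imageio.imread
import matplotlib.pyplot as plt   # used below for plotting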

from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dropout 
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import concatenate

from test_utils import summary, comparator

Load and Split the Data

Load the downloaded data and split it.

N = 2
# Read one image and its mask from image_list and mask_list
img = imageio.imread(image_list[N])
mask = imageio.imread(mask_list[N])
#mask = np.array([max(mask[i, j]) for i in range(mask.shape[0]) for j in range(mask.shape[1])]).reshape(img.shape[0], img.shape[1])

# arr[0] shows the original image, arr[1] shows the segmentation mask
fig, arr = plt.subplots(1, 2, figsize=(14, 10))
arr[0].imshow(img)
arr[0].set_title('Image')
arr[1].imshow(mask[:, :, 0])
arr[1].set_title('Segmentation')

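To understand an array of shape (x, y, z), we print one in PyCharm. Arrays of shape (a, b, c, d) seen earlier print as an a×b grid whose elements are c×d arrays, so here we try three dimensions instead.

The result shows that the array becomes X arrays of shape Y×Z.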
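A small NumPy sketch of the same experiment. The shape (2, 3, 4) is an arbitrary example, not the dataset's real size. It also shows why mask[:, :, 0] in the plotting code yields a 2-D array with one value per pixel.

```python
import numpy as np

# Arbitrary example shape (X, Y, Z) = (2, 3, 4); not taken from the dataset.
a = np.arange(24).reshape(2, 3, 4)

print(a.shape)            # (2, 3, 4)
print(len(a))             # 2 -> the outer level holds X = 2 blocks
print(a[0].shape)         # (3, 4) -> each block is a Y x Z array
print(a[:, :, 0].shape)   # (2, 3) -> fixing the last index leaves an X x Y array
```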

Split Your Dataset into Unmasked and Masked Images

Split the data into unmasked images and masked images, that is, keep the original images and their masks as separate items.

Preprocess Your Data

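Images are usually normalized by dividing by 255 so that pixel values fall between 0 and 1. Here we instead use the TensorFlow function tf.image.convert_image_dtype with dtype tf.float32, which performs the same normalization and leaves pixel values in the range 0 to 1.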
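As a rough illustration of what this normalization does, the NumPy sketch below scales uint8 pixel values by 1/255. It is a stand-in for calling tf.image.convert_image_dtype(img, tf.float32) on uint8 input, not a replacement for it, and the sample pixel values are made up.

```python
import numpy as np

# Made-up uint8 pixel values covering the full range.
pixels = np.array([[0, 51, 255]], dtype=np.uint8)

# Convert to float32 and divide by the uint8 maximum.
scaled = pixels.astype(np.float32) / 255.0

print(scaled.dtype)                 # float32
print(scaled.min(), scaled.max())   # 0.0 1.0
```

Note that process_path applies the conversion to the image only; the mask keeps its integer class IDs.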

# Load an image and its mask from their file paths
def process_path(image_path, mask_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask, channels=3)
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    return img, mask
# Resize the image and its mask to 96x128
def preprocess(image, mask):
    input_image = tf.image.resize(image, (96, 128), method='nearest')
    input_mask = tf.image.resize(mask, (96, 128), method='nearest')

    return input_image, input_mask

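# Assumption: dataset pairs each image path with its mask path; the post does not show how it was built
dataset = tf.data.Dataset.from_tensor_slices((image_list, mask_list))
image_ds = dataset.map(process_path)
processed_image_ds = image_ds.map(preprocess)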

The code above loads each image together with its mask and then resizes both.
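One detail worth spelling out is why preprocess resizes with method='nearest'. Mask values are class IDs, so an interpolating resize could blend two IDs into a value that is not a valid class. The NumPy sketch below uses a made-up 2×4 mask and a simplified 2× horizontal downsample (plain striding versus averaging neighbours) to show the difference. It is an illustration only, not how tf.image.resize is implemented.

```python
import numpy as np

# Made-up mask with two hypothetical class IDs, 2 and 7.
mask = np.array([[2, 7, 7, 7],
                 [2, 7, 7, 7]])

# Nearest-style downsample: keep every second pixel, so only existing IDs survive.
nearest = mask[:, ::2]

# Averaging-style downsample: mean of each horizontal pair of pixels.
averaged = mask.reshape(2, 2, 2).mean(axis=-1)

print(nearest)    # contains only the IDs 2 and 7
print(averaged)   # contains 4.5, which is not a class ID
```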

U-Net

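U-Net is named for its U shape. It was originally created for tumor detection and is a good choice for semantic segmentation tasks. It builds on an earlier architecture called the Fully Convolutional Network (FCN), which replaces the fully connected layers with a transposed convolution layer. Fully connected layers destroy the spatial information in the features; replacing them with transposed convolutions preserves it and also means the input size no longer has to be fixed.
However, the final layers of the FCN lose information because of too much downsampling. Once that information is lost it is hard to upsample, and the output ends up looking rough.
U-Net builds on the FCN with a similar design but differs in some important ways. Instead of a single transposed convolution at the end of the network, it uses a matching number of convolutions to downsample the input image into feature maps and transposed convolutions to upsample those maps back to the original input size. It also adds skip connections so that information is retained during encoding; these help prevent information loss as well as model overfitting.
The U-Net architecture:
[Figure: U-Net architecture]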

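Contracting path: the image first passes through several convolutional layers, which reduce its height and width while increasing the number of channels.
Crop function: creates the skip connection by concatenating the image from the contracting path to the image on the expanding path.
Expanding path: grows the image back to its original size while shrinking the number of channels.
Final Feature Mapping Block: the last layer uses a 1x1 convolution to reduce the channel dimension so that there is one layer per class.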
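The final 1x1 convolution can be read as a per-pixel linear map over the channel axis. The NumPy sketch below, with random made-up weights, shows the channel count dropping from 32 to 23 (one channel per class) while height and width stay unchanged. The array names are illustrative only, and the sizes assume the 96×128 input and 23 classes used later in the post.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(96, 128, 32))   # H x W x channels feature map
weights = rng.normal(size=(32, 23))         # 1x1 kernel: channels_in x n_classes
bias = np.zeros(23)

# A 1x1 convolution applies the same matrix multiply at every pixel.
logits = features @ weights + bias

print(logits.shape)   # (96, 128, 23)
```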

Encoder (Downsampling Block)


Exercise 1 - conv_block

def conv_block(inputs=None, n_filters=32, dropout_prob=0, max_pooling=True):
    """
    Convolutional downsampling block

    Arguments:
        inputs -- Input tensor
        n_filters -- Number of filters for the convolutional layers
        dropout_prob -- Dropout probability, passed to the Dropout layer as its rate
        max_pooling -- Use MaxPooling2D to reduce the spatial dimensions of the output volume
    Returns: 
        next_layer, skip_connection --  Next layer and skip connection outputs
    """

    ### START CODE HERE
    conv = Conv2D(n_filters, # Number of filters
                  3,   # Kernel size   
                  activation='relu',
                  padding='same',
                  kernel_initializer='he_normal')(inputs)
    conv = Conv2D(n_filters, # Number of filters
                  3,   # Kernel size
                  activation='relu',
                  padding='same',
                  kernel_initializer='he_normal')(conv)
    ### END CODE HERE
    
    # if dropout_prob > 0 add a dropout layer, with the variable dropout_prob as parameter
    if dropout_prob > 0:
         ### START CODE HERE
        conv = tf.keras.layers.Dropout(dropout_prob)(conv)
         ### END CODE HERE
         
        
    # if max_pooling is True add a MaxPooling2D with 2x2 pool_size
    if max_pooling:
        ### START CODE HERE
        next_layer = tf.keras.layers.MaxPooling2D((2,2))(conv)
        ### END CODE HERE
        
    else:
        next_layer = conv
        
    skip_connection = conv
    
    return next_layer, skip_connection

conv_block() returns two values: next_layer and skip_connection.
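To make the two return values concrete: when max_pooling is True, next_layer is the pooled tensor with half the height and width, while skip_connection keeps the pre-pooling resolution. The NumPy sketch below mimics 2×2 max pooling on a single made-up 4×4 channel. It illustrates the operation and is not the Keras implementation.

```python
import numpy as np

# Stand-in for one channel of `conv` (made-up values 0..15).
conv = np.arange(16).reshape(4, 4)

# 2x2 max pooling with stride 2: max over each non-overlapping 2x2 window.
pooled = conv.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(conv.shape, pooled.shape)   # (4, 4) (2, 2)
print(pooled)                     # [[ 5  7]
                                  #  [13 15]]
```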

Decoder (Upsampling Block)

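The decoder, or upsampling block, upsamples the features back to the original image size. At each upsampling level, you take the output of the corresponding encoder block and concatenate it before feeding the result to the next decoder block.
The decoder has two new components, up and merge: the transposed convolution and the skip connection.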
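A shape-only sketch of up and merge, using NumPy arrays of zeros in place of real tensors. With strides of 2 and 'same' padding, a transposed convolution doubles height and width, so its output lines up with the skip tensor and the two can be concatenated along the channel axis. The sizes assume the first decoder block for a 96×128 input with n_filters=32; they are worked out by hand, not taken from the post.

```python
import numpy as np

batch = 1
expansive_input = np.zeros((batch, 6, 8, 512))       # deepest encoder output
contractive_input = np.zeros((batch, 12, 16, 256))   # skip tensor from the encoder
n_filters = 256

# Conv2DTranspose with strides=(2, 2) and padding='same' doubles height and width
# and outputs n_filters channels; only the resulting shape is mimicked here.
_, h, w, _ = expansive_input.shape
up = np.zeros((batch, 2 * h, 2 * w, n_filters))

# concatenate(..., axis=3) stacks the two tensors along the channel axis.
merge = np.concatenate([up, contractive_input], axis=3)

print(up.shape)      # (1, 12, 16, 256)
print(merge.shape)   # (1, 12, 16, 512)
```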

Exercise 2 - upsampling_block

def upsampling_block(expansive_input, contractive_input, n_filters=32):
    """
    Convolutional upsampling block

    Arguments:
        expansive_input -- Input tensor from previous layer
        contractive_input -- Input tensor from previous skip layer
        n_filters -- Number of filters for the convolutional layers
    Returns: 
        conv -- Tensor output
    """
    
    ### START CODE HERE
    up = Conv2DTranspose(
                 n_filters,    # number of filters
                 3,    # Kernel size
                 strides=(2,2),
                 padding='same')(expansive_input)
    
    # Merge the previous output and the contractive_input along the channel axis
    merge = concatenate([up, contractive_input], axis=3)
    conv = Conv2D(n_filters,   # Number of filters
                 3,     # Kernel size
                 activation='relu',
                 padding='same',
                 kernel_initializer='he_normal')(merge)
    conv = Conv2D(n_filters,  # Number of filters
                 3,   # Kernel size
                 activation='relu',
                 padding='same',
                 kernel_initializer='he_normal')(conv)
    ### END CODE HERE
    
    return conv

Build the Model

Here we chain the encoder, bottleneck, and decoder together.

Exercise 3 - unet_model

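In the unet_model function, you specify the input shape, the number of filters, and the number of classes.

First half of the model:

Start with a conv block that takes the model inputs and the number of filters.
Then chain the first output element of each block to the input of the next conv block.
Double the number of filters at each step.
Add a dropout of 0.3 in conv_block4.
In the final conv block, set dropout to 0.3 again and turn off max pooling.

Second half of the model:

Use cblock5 as expansive_input and cblock4 as contractive_input, with n_filters * 8. This is the bottleneck layer.
Chain the output of the previous block as expansive_input, together with the corresponding contractive block output.
Note that you must use the second element of the contractive block, the one taken before the max pooling layer.
At each step, use half the number of filters of the previous block.
conv9 is a Conv2D layer with ReLU activation, He normal initializer, and same padding.
Finally, conv10 is a Conv2D layer with n_classes filters and a kernel size of 1.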
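The steps above can be checked with plain arithmetic. The sketch below tracks height, width, and filter count through the five encoder blocks and four decoder blocks, assuming the default input of 96×128 and n_filters=32, and asserts that every decoder block meets a skip tensor of matching shape. It computes shapes only and builds no layers.

```python
h, w, n_filters = 96, 128, 32   # defaults of unet_model

# Encoder: filters double each block; blocks 1-4 end with 2x2 max pooling.
skips = []
for i in range(5):
    filters = n_filters * 2 ** i
    skips.append((h, w, filters))   # skip_connection shape (before pooling)
    print(f"cblock{i + 1}: conv output {h}x{w}x{filters}")
    if i < 4:                       # cblock5 is built with max_pooling=False
        h, w = h // 2, w // 2

# Decoder: each block doubles height and width and halves the filters,
# pairing with the encoder skip of the same resolution.
for j, i in enumerate(range(3, -1, -1)):
    h, w = h * 2, w * 2
    filters = n_filters * 2 ** i
    assert (h, w, filters) == skips[i]
    print(f"ublock{6 + j}: output {h}x{w}x{filters}, skip from cblock{i + 1}")
```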

def unet_model(input_size=(96, 128, 3), n_filters=32, n_classes=23):
    """
    Unet model

    Arguments:
        input_size -- Input shape
        n_filters -- Number of filters for the convolutional layers
        n_classes -- Number of output classes
    Returns: 
        model -- tf.keras.Model
    """
    # Build the input tensor from the given input shape
    inputs = Input(input_size)
    # Contracting Path (encoding): build the encoder from downsampling blocks
    # Add a conv_block with the inputs of the unet_ model and n_filters
    ### START CODE HERE
    # conv_block returns two values: (next_layer, skip_connection)
    cblock1 = conv_block(inputs, n_filters)
    # Chain the first element of the output of each block (e.g. cblock1[0]) to be the input of the next conv_block.
    # Double the number of filters at each new step
    cblock2 = conv_block(cblock1[0], n_filters * 2)  # doubling each step: multiply by successive powers of 2
    cblock3 = conv_block(cblock2[0], n_filters * 4)
    cblock4 = conv_block(cblock3[0], n_filters * 8, dropout_prob=0.3) # Include a dropout of 0.3 for this layer
    # Include a dropout of 0.3 for this layer, and avoid the max_pooling layer (pooling is turned off here)
    cblock5 = conv_block(cblock4[0], n_filters * 16, dropout_prob=0.3, max_pooling=False)
    ### END CODE HERE
    
    # Expanding Path (decoding): build the decoder from upsampling blocks
    # Add the first upsampling_block.
    # Use the cblock5[0] as expansive_input and cblock4[1] as contractive_input and n_filters * 8
    ### START CODE HERE
    ublock6 = upsampling_block(cblock5[0], cblock4[1],  n_filters * 8)
    # Chain the output of the previous block as expansive_input and the corresponding contractive block output.
    # Note that you must use the second element of the contractive block i.e before the maxpooling layer. 
    # At each step, use half the number of filters of the previous block 
    ublock7 = upsampling_block(ublock6, cblock3[1],  n_filters * 4) 
    ublock8 = upsampling_block(ublock7, cblock2[1],  n_filters * 2) 
    ublock9 = upsampling_block(ublock8, cblock1[1],  n_filters)
    ### END CODE HERE

    conv9 = Conv2D(n_filters,
                 3,
                 activation='relu',
                 padding='same',
                 kernel_initializer='he_normal')(ublock9)

    # Add a Conv2D layer with n_classes filter, kernel size of 1 and a 'same' padding
    ### START CODE HERE
    conv10 = Conv2D(n_classes, (1,1), padding='same')(conv9)
    ### END CODE HERE
    
    model = tf.keras.Model(inputs=inputs, outputs=conv10)

    return model

The main thing to watch here is that conv_block() returns two values, next_layer and skip_connection, which is why the code indexes each block with [0] and [1].

Loss Function

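In semantic segmentation, you need as many masks as there are object classes. In the dataset used here, every pixel of every mask is assigned a single integer, from 0 to num_classes-1, that says which class it belongs to. In the model output, the predicted class is the layer with the highest probability.
This differs from categorical crossentropy, where the labels are one-hot encoded. Here we use sparse categorical crossentropy as the loss function to perform pixel-wise multiclass prediction. When dealing with a large number of classes, sparse categorical crossentropy is more efficient than other loss functions.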
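The difference between the sparse and one-hot forms can be sketched in NumPy. The logits and labels below are made up (two pixels, three classes). Both forms give the same loss, but the sparse form only needs one integer per pixel instead of a num_classes-wide vector. This is an illustration of the formula, not the Keras implementation.

```python
import numpy as np

# Made-up logits for 2 pixels and 3 classes, plus integer labels (sparse form).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

# Softmax over the class axis (the step that from_logits=True asks the loss to apply).
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Sparse form: index the probability of the true class directly.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# One-hot form: same result, but needs a full label vector per pixel.
one_hot = np.eye(3)[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=-1)

print(np.allclose(sparse_loss, dense_loss))   # True
```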

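# Build the model with the default 96x128x3 input before compiling it
unet = unet_model((96, 128, 3))
unet.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])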

The code above compiles the U-Net model.

Dataset Handling

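The display function used below shows an input image alongside its ground truth, the true mask. The true mask is the target that the trained model's output should match as closely as possible.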
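The post does not include the body of display. A minimal sketch of such a helper, assuming matplotlib and NumPy, might look like the following. The titles and figure size are arbitrary choices, and the function is a hypothetical stand-in rather than the author's version.

```python
import numpy as np
import matplotlib.pyplot as plt

def display(display_list):
    """Show the given images side by side (hypothetical helper, not from the post)."""
    titles = ['Input Image', 'True Mask', 'Predicted Mask']
    fig, axes = plt.subplots(1, len(display_list), figsize=(15, 5), squeeze=False)
    for ax, item, title in zip(axes[0], display_list, titles):
        arr = np.asarray(item)
        if arr.ndim == 3 and arr.shape[-1] == 1:
            arr = arr[..., 0]   # imshow expects (H, W) for single-channel data
        ax.imshow(arr)
        ax.set_title(title)
        ax.axis('off')
    plt.show()
```

The single-channel check matters because the masks produced by process_path have shape (height, width, 1), which imshow does not accept directly.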

for image, mask in image_ds.take(1):
    sample_image, sample_mask = image, mask
    print(mask.shape)
display([sample_image, sample_mask])

for image, mask in processed_image_ds.take(1):
    sample_image, sample_mask = image, mask
    print(mask.shape)
display([sample_image, sample_mask])

Train the Model

EPOCHS = 40
VAL_SUBSPLITS = 5
BUFFER_SIZE = 500
BATCH_SIZE = 32
processed_image_ds.batch(BATCH_SIZE)  # no effect: batch() returns a new dataset and the result is discarded
train_dataset = processed_image_ds.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
print(processed_image_ds.element_spec)
model_history = unet.fit(train_dataset, epochs=EPOCHS)

Create Predicted Masks

def create_mask(pred_mask):
    pred_mask = tf.argmax(pred_mask, axis=-1)
    pred_mask = pred_mask[..., tf.newaxis]
    return pred_mask[0]
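create_mask turns the model's per-class output into a single class ID per pixel. The same three steps in NumPy, on made-up logits for a batch of one 2×2 image with three classes, as a sketch:

```python
import numpy as np

# Made-up logits: batch of 1, a 2x2 image, 3 classes.
pred = np.array([[[[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]],
                  [[0.0, 0.1, 4.0], [0.3, 0.9, 0.2]]]])

mask = np.argmax(pred, axis=-1)   # (1, 2, 2): winning class per pixel
mask = mask[..., np.newaxis]      # (1, 2, 2, 1): restore a channel axis
first = mask[0]                   # (2, 2, 1): first item of the batch

print(first[..., 0])   # [[1 0]
                       #  [2 1]]
```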