Semantic Segmentation with TensorFlow 2.3.0 (Part 2)

Article link: https://blog.csdn.net/wchwdog13/article/details/112686243

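In the previous part, we split the dataset into a training set and a test set. This part covers data processing, model construction, and training.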

1. Data Processing

To keep the program tidy and readable, we wrap the data loading and processing code in functions.

# Read one image from its file path
def read_jpg(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    return img

# Read one segmentation mask from its file path
def read_png(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_png(img, channels=1)
    return img

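# Normalize an input image and its segmentation mask
# input_image is the image to segment; input_mask is its segmentation mask
def normalize(input_image, input_mask):
    input_image = tf.cast(input_image, tf.float32)/127.5 - 1  # scale pixel values to the range [-1, 1]
    input_mask -= 1  # shift the mask labels so that each pixel is 0, 1, or 2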
    return input_image, input_mask
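
The arithmetic inside normalize can be checked by hand. The following is a framework-free sketch (plain Python, no TensorFlow); the helper names scale_pixel and shift_label are hypothetical and only mirror the two lines of normalize. It assumes 8-bit pixel values and mask labels 1, 2, 3, as implied by the comments above.

```python
# Plain-Python mirror of normalize(); helper names are illustrative only.
def scale_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to the range [-1.0, 1.0]."""
    return value / 127.5 - 1


def shift_label(label):
    """Shift a mask label from {1, 2, 3} to the zero-based {0, 1, 2}."""
    return label - 1


scaled = [scale_pixel(p) for p in (0, 255)]    # darkest and brightest pixel
labels = [shift_label(l) for l in (1, 2, 3)]   # the three mask classes

print(scaled)   # [-1.0, 1.0]
print(labels)   # [0, 1, 2]
```

Zero-based labels matter later on: sparse_categorical_crossentropy expects integer class indices in the range 0 to num_classes - 1.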

# Read and preprocess one sample with the three functions above; returns the input image and its segmentation mask
def load_image(input_image_path, input_mask_path):
    input_image = read_jpg(input_image_path)
    input_mask = read_png(input_mask_path)
    input_image = tf.image.resize(input_image, (224, 224))
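    # Nearest-neighbor resizing keeps the mask values valid class labels (the default bilinear mode would blend them);
    # the cast keeps the mask float32, which nearest-neighbor resizing would otherwise leave as uint8
    input_mask = tf.cast(tf.image.resize(input_mask, (224, 224), method='nearest'), tf.float32)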
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask

Define the batching parameters.

BATCH_SIZE = 8
BUFFER_SIZE = 100
STEPS_PER_EPOCH = train_count // BATCH_SIZE
VALIDATION_STEPS = test_count // BATCH_SIZE
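
To make the floor division concrete, here is a small sketch with made-up sample counts; train_count and test_count below are hypothetical placeholders, since the real values come from the previous part. Keras needs STEPS_PER_EPOCH because the training dataset is repeated indefinitely further down, so an epoch has no natural end.

```python
BATCH_SIZE = 8
train_count = 1000   # hypothetical; the real value comes from the previous part
test_count = 205     # hypothetical

steps_per_epoch = train_count // BATCH_SIZE    # full batches per training epoch
validation_steps = test_count // BATCH_SIZE    # full batches per validation pass
leftover = test_count % BATCH_SIZE             # samples in the final partial batch

print(steps_per_epoch)    # 125
print(validation_steps)   # 25
print(leftover)           # 5
```

With these numbers, the last 5 test samples fall into a partial batch that validation does not use.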

Apply load_image to every file in the training and test datasets.

train = dataset_train.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test = dataset_test.map(load_image)

Shuffle the training dataset and batch both datasets.

train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test.batch(BATCH_SIZE)
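
The effect of shuffle(BUFFER_SIZE) followed by batch(BATCH_SIZE) can be pictured with a conceptual sketch. This is plain Python and not tf.data's actual implementation; buffered_shuffle and batch are hypothetical helpers. The idea it illustrates: elements are drawn at random from a fixed-size buffer, so when the buffer is smaller than the dataset the shuffling is only local.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # replace it with the incoming one
    rng.shuffle(buffer)
    yield from buffer                    # drain whatever is left


def batch(items, batch_size):
    """Group items into consecutive lists of at most batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
batches = batch(shuffled, 8)
print([len(b) for b in batches])   # [8, 8, 4]
```

Because the buffer holds only 5 elements here, the first element emitted is always one of the first 5 inputs; a larger buffer gives a more thorough shuffle.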

We can inspect the training dataset:

train_dataset

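The output is <PrefetchDataset shapes: ((None, 224, 224, 3), (None, 224, 224, 1)), types: (tf.float32, tf.float32)>. Here (None, 224, 224, 3) is the shape of a batch of input images and (None, 224, 224, 1) is the shape of the corresponding segmentation masks; None stands for the batch dimension. The 3 in (None, 224, 224, 3) is the number of color channels, since the images are RGB. Semantic segmentation essentially classifies every pixel of the image, so the 1 in (None, 224, 224, 1) is a single channel that stores each pixel's class label: 0, 1, or 2.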

Next, display one image from the training dataset together with its segmentation mask.

%matplotlib inline

for img, mask in train_dataset.take(1):
    plt.subplot(1,2,1)
    plt.imshow(tf.keras.preprocessing.image.array_to_img(img[0]))
    plt.subplot(1,2,2)
    plt.imshow(tf.keras.preprocessing.image.array_to_img(mask[0]))

2. Model Construction

The segmentation model is a fully convolutional network (FCN). For background, see https://blog.csdn.net/qinghuaci666/article/details/80863032

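First we extract features, using weights that others have already trained on the ImageNet dataset; this is a transfer-learning technique. include_top=False means that only the convolutional base is used and the fully connected classifier on top is dropped. The upsampling layers are not part of VGG16, so their weights have to be trained here.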

# weights='imagenet' loads the weights pretrained on ImageNet
# include_top=False keeps only the convolutional base and drops the fully connected layers
covn_base = tf.keras.applications.VGG16(weights='imagenet', 
                                        input_shape=(224, 224, 3),
                                        include_top=False)

Following the FCN design, we take the outputs of several intermediate layers.

layer_names = [
    'block5_conv3',   # 14x14x512
    'block4_conv3',   # 28x28x512
    'block3_conv3',   # 56x56x256
    'block5_pool',    # 7x7x512
]
layers = [covn_base.get_layer(name).output for name in layer_names]
# Create the feature-extraction model
down_stack = tf.keras.Model(inputs=covn_base.input, outputs=layers)
down_stack.trainable = False
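
The shapes noted next to the layer names follow from VGG16's structure: the convolutions use 'same' padding and keep the spatial size, while each of the five blocks ends in a 2x2 max pooling that halves it. The sketch below reproduces that arithmetic in plain Python; vgg16_feature_shapes is a hypothetical helper, and the keys are shortened block names rather than the Keras layer names.

```python
# (block name, output channels) for the five VGG16 convolutional blocks
VGG16_BLOCKS = [
    ("block1", 64),
    ("block2", 128),
    ("block3", 256),
    ("block4", 512),
    ("block5", 512),
]


def vgg16_feature_shapes(input_size=224):
    """Return the (height, width, channels) after each block's convs and pool."""
    shapes = {}
    size = input_size
    for name, channels in VGG16_BLOCKS:
        shapes[name + "_conv"] = (size, size, channels)   # convs keep the size
        size //= 2                                        # pooling halves it
        shapes[name + "_pool"] = (size, size, channels)
    return shapes


shapes = vgg16_feature_shapes()
print(shapes["block3_conv"])   # (56, 56, 256)
print(shapes["block4_conv"])   # (28, 28, 512)
print(shapes["block5_conv"])   # (14, 14, 512)
print(shapes["block5_pool"])   # (7, 7, 512)
```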

Upsample with transposed convolutions.

inputs = tf.keras.layers.Input(shape=(224, 224, 3))
o1, o2, o3, x = down_stack(inputs)
x1 = tf.keras.layers.Conv2DTranspose(512, 3, padding='same', 
                                     strides=2, activation='relu')(x)  # 14*14*512
x1 = tf.keras.layers.Conv2D(512, 3, padding='same', activation='relu')(x1)  # 14*14*512
c1 = tf.add(o1, x1)    # 14*14*512
x2 = tf.keras.layers.Conv2DTranspose(512, 3, padding='same', 
                                     strides=2, activation='relu')(c1)  # 28*28*512
x2 = tf.keras.layers.Conv2D(512, 3, padding='same', activation='relu')(x2)  # 28*28*512
c2 = tf.add(o2, x2)
x3 = tf.keras.layers.Conv2DTranspose(256, 3, padding='same', 
                                     strides=2, activation='relu')(c2)  # 56*56*256
x3 = tf.keras.layers.Conv2D(256, 3, padding='same', activation='relu')(x3)  # 56*56*256
c3 = tf.add(o3, x3)

x4 = tf.keras.layers.Conv2DTranspose(128, 3, padding='same', 
                                     strides=2, activation='relu')(c3)  # 112*112*128
x4 = tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu')(x4)  # 112*112*128

predictions = tf.keras.layers.Conv2DTranspose(3, 3, padding='same', 
                                     strides=2, activation='softmax')(x4)   # 224*224*3

model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
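
The decoder can be traced on shapes alone. With padding='same' and strides=2, a Keras Conv2DTranspose doubles the height and width, and each skip connection adds two tensors that must have identical shapes. The sketch below follows the five upsampling steps in plain Python; upsample and add are hypothetical stand-ins for the Keras layers and work on shape tuples only.

```python
def upsample(shape, filters, stride=2):
    """Shape after Conv2DTranspose(filters, 3, strides=stride, padding='same')."""
    height, width, _ = shape
    return (height * stride, width * stride, filters)


def add(a, b):
    """Element-wise addition requires both shapes to match exactly."""
    if a != b:
        raise ValueError(f"shape mismatch: {a} vs {b}")
    return a


# Encoder outputs taken from VGG16 (see layer_names above)
o1, o2, o3 = (14, 14, 512), (28, 28, 512), (56, 56, 256)
x = (7, 7, 512)

c1 = add(o1, upsample(x, 512))    # 14x14x512
c2 = add(o2, upsample(c1, 512))   # 28x28x512
c3 = add(o3, upsample(c2, 256))   # 56x56x256
x4 = upsample(c3, 128)            # 112x112x128
predictions = upsample(x4, 3)     # 224x224x3, one probability per class

print(predictions)   # (224, 224, 3)
```

This is why the filter counts of the transposed convolutions are 512, 512, and 256: they have to match the channel counts of the encoder outputs they are added to.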

3. Model Training

First, compile the model.

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
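
The loss name deserves a short illustration. sparse_categorical_crossentropy takes integer class labels (not one-hot vectors), which is why a mask of shape (224, 224, 1) can be compared against an output of shape (224, 224, 3). For one pixel the loss is the negative log of the probability assigned to the true class. The following plain-Python sketch shows that calculation; the function names are hypothetical and it leaves out numerical details such as the clipping Keras applies to probabilities.

```python
import math


def pixel_loss(label, probs):
    """Cross-entropy for one pixel: -log of the true class's probability."""
    return -math.log(probs[label])


def mean_loss(labels, probs_per_pixel):
    """Average the per-pixel losses over all pixels."""
    losses = [pixel_loss(l, p) for l, p in zip(labels, probs_per_pixel)]
    return sum(losses) / len(losses)


labels = [0, 2]                  # true classes of two pixels
probs = [[0.7, 0.2, 0.1],        # confident and correct -> small loss
         [0.6, 0.3, 0.1]]        # true class gets only 0.1 -> large loss

print(round(pixel_loss(labels[0], probs[0]), 4))   # 0.3567
print(round(pixel_loss(labels[1], probs[1]), 4))   # 2.3026
```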

Train the model.

EPOCHS = 20
history = model.fit(train_dataset,
                    epochs=EPOCHS,
                    steps_per_epoch=STEPS_PER_EPOCH,
                    validation_steps=VALIDATION_STEPS,
                    validation_data=test_dataset)

As the training output shows, the accuracy is above 0.9, which is quite high.
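
For a segmentation model, the accuracy metric is a per-pixel accuracy: the fraction of pixels whose predicted class equals the label. A minimal plain-Python sketch of that calculation on a tiny 2x3 mask follows; pixel_accuracy is a hypothetical helper and the masks are made-up values, not results from this model.

```python
def pixel_accuracy(true_mask, pred_mask):
    """Fraction of pixels whose predicted class equals the true class."""
    total = 0
    correct = 0
    for true_row, pred_row in zip(true_mask, pred_mask):
        for t, p in zip(true_row, pred_row):
            total += 1
            correct += int(t == p)
    return correct / total


true_mask = [[0, 0, 1],
             [2, 1, 1]]
pred_mask = [[0, 0, 1],
             [2, 2, 1]]   # one of the six pixels is wrong

acc = pixel_accuracy(true_mask, pred_mask)
print(round(acc, 4))   # 0.8333
```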

Next, view the original image, the ground-truth mask, and the predicted mask side by side.

num = 3
for image, mask in test_dataset.take(1):
    # The model output has shape (batch, 224, 224, 3); the last axis holds each pixel's 3 class probabilities
    pred_mask = model.predict(image)
    pred_mask = tf.argmax(pred_mask, axis=-1)  # index of the largest probability, i.e. the predicted class
    pred_mask = pred_mask[..., tf.newaxis]  # shape (batch, 224, 224, 1)

    plt.figure(figsize=(10, 10))
    for i in range(num):
        # One row per sample, 3 columns: image, ground-truth mask, predicted mask
        plt.subplot(num, 3, i*3+1)
        plt.imshow(tf.keras.preprocessing.image.array_to_img(image[i]))
        plt.subplot(num, 3, i*3+2)
        plt.imshow(tf.keras.preprocessing.image.array_to_img(mask[i]))
        plt.subplot(num, 3, i*3+3)
        plt.imshow(tf.keras.preprocessing.image.array_to_img(pred_mask[i]))
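
The argmax step above turns per-pixel probabilities into class labels. Here is a framework-free sketch of the same idea on a tiny 2x2 "image" with 3 classes; argmax_last_axis is a hypothetical helper and the probabilities are made up for illustration.

```python
def argmax_last_axis(probs):
    """Map height x width x classes probabilities to height x width x 1 labels."""
    return [
        [[max(range(len(pixel)), key=pixel.__getitem__)] for pixel in row]
        for row in probs
    ]


probs = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.3, 0.6], [0.4, 0.35, 0.25]],
]

pred = argmax_last_axis(probs)
print(pred)   # [[[0], [1]], [[2], [0]]]
```

The trailing dimension of size 1 plays the same role as tf.newaxis in the code above: it gives the predicted mask the same layout as the ground-truth mask.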