[TensorFlow 2] U-Net for CityScapes Semantic Segmentation and the Resize Interpolation Problem

1. Dataset introduction

  • This post implements semantic segmentation of urban street scenes with a U-Net model in TensorFlow 2. The data is CityScapes (homepage: https://www.cityscapes-dataset.com/), which provides original images and segmentation maps, split into training and test sets.
    (Figure: original image)
    (Figure: segmentation map)
  • The segmentation output has 34 classes. Each class is a distinct object category, and each category is annotated with its own color:
Object name          | Class ID | Color (R, G, B)
---------------------|----------|-----------------
unlabeled            | 0        | (0, 0, 0)
ego vehicle          | 1        | (0, 0, 0)
rectification border | 2        | (0, 0, 0)
out of roi           | 3        | (0, 0, 0)
static               | 4        | (0, 0, 0)
dynamic              | 5        | (111, 74, 0)
ground               | 6        | (81, 0, 81)
road                 | 7        | (128, 64, 128)
sidewalk             | 8        | (244, 35, 232)
parking              | 9        | (250, 170, 160)
rail track           | 10       | (230, 150, 140)
building             | 11       | (70, 70, 70)
wall                 | 12       | (102, 102, 156)
fence                | 13       | (190, 153, 153)
guard rail           | 14       | (180, 165, 180)
bridge               | 15       | (150, 100, 100)
tunnel               | 16       | (150, 120, 90)
pole                 | 17       | (153, 153, 153)
polegroup            | 18       | (153, 153, 153)
traffic light        | 19       | (250, 170, 30)
traffic sign         | 20       | (220, 220, 0)
vegetation           | 21       | (107, 142, 35)
terrain              | 22       | (152, 251, 152)
sky                  | 23       | (70, 130, 180)
person               | 24       | (220, 20, 60)
rider                | 25       | (255, 0, 0)
car                  | 26       | (0, 0, 142)
truck                | 27       | (0, 0, 70)
bus                  | 28       | (0, 60, 100)
caravan              | 29       | (0, 0, 90)
trailer              | 30       | (0, 0, 110)
train                | 31       | (0, 80, 100)
motorcycle           | 32       | (0, 0, 230)
bicycle              | 33       | (119, 11, 32)
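  • For coloring predicted masks later, this table can be flattened into a lookup array; a minimal sketch (the PALETTE helper is an illustrative addition, with RGB values copied from the table above):
import numpy as np

# PALETTE[i] holds the RGB color of class i.
PALETTE = np.array([
    (0,0,0), (0,0,0), (0,0,0), (0,0,0), (0,0,0),               # IDs 0-4
    (111,74,0), (81,0,81), (128,64,128), (244,35,232),         # IDs 5-8
    (250,170,160), (230,150,140), (70,70,70), (102,102,156),   # IDs 9-12
    (190,153,153), (180,165,180), (150,100,100), (150,120,90), # IDs 13-16
    (153,153,153), (153,153,153), (250,170,30), (220,220,0),   # IDs 17-20
    (107,142,35), (152,251,152), (70,130,180), (220,20,60),    # IDs 21-24
    (255,0,0), (0,0,142), (0,0,70), (0,60,100), (0,0,90),      # IDs 25-29
    (0,0,110), (0,80,100), (0,0,230), (119,11,32),             # IDs 30-33
], dtype=np.uint8)

# color_mask = PALETTE[mask]  # maps (H, W) class IDs to an (H, W, 3) RGB image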

2. The U-Net model

  • U-Net is a baseline model widely used for semantic segmentation in autonomous driving and medical imaging. It has a U-shaped encoder-decoder structure: the left half downsamples to extract features, and the right half upsamples, fusing features via skip connections. The final layer outputs a class for every pixel (34 classes here).
    (Figure: U-Net architecture)

3. Development workflow

1) Data loading and preprocessing

  • Imports
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import glob
import matplotlib.pyplot as plt
  • Load the data and shuffle it.
    Note: glob does not guarantee file order, so on some platforms (e.g., Kaggle Notebooks) the images and segmentation maps will not pair up one-to-one; the sorted() calls in the snippet below guard against this.
# training set
train_path_raw = glob.glob(r'D:\tensorflowDataSet\cityUNET\image2\images\train\*\*.png')
train_label_path_raw = glob.glob(r'D:\tensorflowDataSet\cityUNET\gtFine\train\*\*gtFine_labelIds.png')
# test set (the CityScapes "val" split)
test_path_raw = glob.glob(r'D:\tensorflowDataSet\cityUNET\image2\images\val\*\*.png')
test_label_path_raw = glob.glob(r'D:\tensorflowDataSet\cityUNET\gtFine\val\*\*gtFine_labelIds.png')
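
# glob does not guarantee file order (notably on Kaggle), so sort both lists to
# keep each image aligned with its gtFine label. This assumes sorted file names
# pair up one-to-one, which holds for the CityScapes naming scheme.
train_path_raw = sorted(train_path_raw)
train_label_path_raw = sorted(train_label_path_raw)
test_path_raw = sorted(test_path_raw)
test_label_path_raw = sorted(test_label_path_raw)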

train_total_num = len(train_path_raw) #2975
test_total_num = len(test_path_raw)   #500
# shuffle the training set (same permutation for images and labels)
index = np.random.permutation(train_total_num)
train_path = np.array(train_path_raw)[index]
train_label_path = np.array(train_label_path_raw)[index]
  • Image preprocessing: the original images are (2048, 1024) pixels; to fit into GPU memory, everything is uniformly resized to (256, 256), with image augmentation applied.
def read_image(path):  # load an original RGB image
    image = tf.io.read_file(path)
    image = tf.image.decode_png(image,channels=3)
    return image

def read_label(path):  # load a label (semantic segmentation) image
    image = tf.io.read_file(path)
    image = tf.image.decode_png(image,channels=1)
    return image

def crop_img(image,mask):  # augmentation: random crop to (256,256)

    img = tf.concat([image,mask],axis=-1)  # stack image and mask so both get the same crop
    # resize to (280,280) with nearest-neighbor interpolation
    # (why nearest-neighbor? see section 4 below)
    img = tf.image.resize(img,(280, 280),method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    img = tf.image.random_crop(img,size=[256,256,4])

    return img[:,:,:3],img[:,:,3:]

def normalize(image,mask):  # scale images to [-1,1] to speed up training
    image = tf.cast(image,tf.float32) /127.5 - 1
    mask = tf.cast(mask,tf.int32)
    return image,mask

def image_handler_train(image_path,label_path):  # training-sample pipeline
    train_img = read_image(image_path)
    label_img = read_label(label_path)

    image,mask = crop_img(train_img,label_img)

    if tf.random.uniform(()) > 0.5:  # augmentation: random horizontal flip
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)

    if tf.random.uniform(()) > 0.5:  # augmentation: brightness (image only --
        image = tf.image.adjust_brightness(image,0.5)  # label IDs must never be altered)

    image,mask = normalize(image,mask)
    return image,mask

def image_handler_test(image_path,label_path):  # test-sample pipeline (no augmentation)
    train_img = read_image(image_path)
    label_img = read_label(label_path)

    train_img = tf.image.resize(train_img,size=(256,256),method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    label_img = tf.image.resize(label_img,size=(256,256),method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)

    image,mask = normalize(train_img,label_img)
    return image,mask
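Before wiring up tf.data, it helps to eyeball one preprocessed pair; a quick sketch, assuming the shuffled path arrays from above:
# Display a single augmented training pair side by side.
img, msk = image_handler_train(train_path[0], train_label_path[0])
plt.subplot(1, 2, 1)
plt.imshow((img + 1) / 2)      # undo the [-1,1] scaling for display
plt.subplot(1, 2, 2)
plt.imshow(tf.squeeze(msk))    # label IDs in [0, 33]
plt.show()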
  • Building the tf.data datasets
BUFFERSIZE = 100 
BATCHSIZE = 20
AUTO = tf.data.experimental.AUTOTUNE  # let tf.data tune parallelism automatically
# training dataset
train_path_dataset = tf.data.Dataset.from_tensor_slices((train_path,train_label_path))
train_dataset = train_path_dataset.map(image_handler_train,num_parallel_calls=AUTO)
train_dataset = train_dataset.repeat().shuffle(BUFFERSIZE).batch(BATCHSIZE).prefetch(AUTO)
# test dataset (built from the raw, unshuffled path lists)
test_path_dataset = tf.data.Dataset.from_tensor_slices((test_path_raw,test_label_path_raw))
test_dataset = test_path_dataset.map(image_handler_test,num_parallel_calls=AUTO)
test_dataset = test_dataset.batch(BATCHSIZE)
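A quick check that the pipeline emits the expected shapes and dtypes:
# Expect images (20, 256, 256, 3) float32 and masks (20, 256, 256, 1) int32.
for imgs, msks in train_dataset.take(1):
    print(imgs.shape, imgs.dtype)
    print(msks.shape, msks.dtype)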

2) Building the model with the functional API

  • The model can be written layer by layer following the U-Net diagram, or assembled from reusable sub-model blocks. Both variants are shown below.
# U-Net written layer by layer
def make_unet_model():
    input = keras.Input(shape=(256,256,3))
    x1 = layers.Conv2D(64,3,strides=1,padding='same')(input)
    x1 = layers.BatchNormalization()(x1)
    x1 = layers.ReLU()(x1)
    x1 = layers.Conv2D(64,3,strides=1,padding='same')(x1)
    x1 = layers.BatchNormalization()(x1)
    x1 = layers.ReLU()(x1)  #(256,256,64)

    x2 = layers.MaxPooling2D()(x1) #(128,128,64)

    x2 = layers.Conv2D(128,3,strides=1,padding='same')(x2)
    x2 = layers.BatchNormalization()(x2)
    x2 = layers.ReLU()(x2)
    x2 = layers.Conv2D(128,3,strides=1,padding='same')(x2)
    x2 = layers.BatchNormalization()(x2)
    x2 = layers.ReLU()(x2)  #(128,128,128)

    x3 = layers.MaxPooling2D()(x2) #(64,64,128)

    x3 = layers.Conv2D(256,3,strides=1,padding='same')(x3)
    x3 = layers.BatchNormalization()(x3)
    x3 = layers.ReLU()(x3)
    x3 = layers.Conv2D(256,3,strides=1,padding='same')(x3)
    x3 = layers.BatchNormalization()(x3)
    x3 = layers.ReLU()(x3) #(64,64,256)

    x4 = layers.MaxPooling2D()(x3) #(32,32,256)

    x4 = layers.Conv2D(512,3,padding='same',strides=1)(x4)
    x4 = layers.BatchNormalization()(x4)
    x4 = layers.ReLU()(x4)
    x4 = layers.Conv2D(512,3,padding='same',strides=1)(x4)
    x4 = layers.BatchNormalization()(x4)
    x4 = layers.ReLU()(x4) #(32,32,512)

    x5 = layers.MaxPooling2D()(x4) #(16,16,512)

    x5 = layers.Conv2D(1024,3,padding='same',strides=1)(x5)
    x5 = layers.BatchNormalization()(x5)
    x5 = layers.ReLU()(x5)
    x5 = layers.Conv2D(1024,3,padding='same',strides=1)(x5)
    x5 = layers.BatchNormalization()(x5)
    x5 = layers.ReLU()(x5) #(16,16,1024)

    x4_ = layers.Conv2DTranspose(512,2,padding='same',strides=2)(x5)
    x4_ = layers.BatchNormalization()(x4_)
    x4_ = layers.ReLU()(x4_) #(32,32,512)

    x3_ = tf.concat([x4,x4_],axis=-1) #(32,32,1024)
    x3_ = layers.Conv2D(512,3,padding='same',strides=1)(x3_)
    x3_ = layers.BatchNormalization()(x3_)
    x3_ = layers.ReLU()(x3_)
    x3_ = layers.Conv2D(512,3,padding='same',strides=1)(x3_)
    x3_ = layers.BatchNormalization()(x3_)
    x3_ = layers.ReLU()(x3_) #(32,32,512)

    x3_= layers.Conv2DTranspose(256,2,padding='same',strides=2)(x3_)
    x3_ = layers.BatchNormalization()(x3_)
    x3_ = layers.ReLU()(x3_) #(64,64,256)

    x2_ = tf.concat([x3,x3_],axis=-1) #(64,64,512)
    x2_ = layers.Conv2D(256,3,padding='same',strides=1)(x2_)
    x2_ = layers.BatchNormalization()(x2_)
    x2_ = layers.ReLU()(x2_)
    x2_ = layers.Conv2D(256,3,padding='same',strides=1)(x2_)
    x2_ = layers.BatchNormalization()(x2_)
    x2_ = layers.ReLU()(x2_) #(64,64,256)

    x2_ = layers.Conv2DTranspose(128,2,padding='same',strides=2)(x2_)
    x2_ = layers.BatchNormalization()(x2_)
    x2_ = layers.ReLU()(x2_) #(128,128,128)

    x1_ = tf.concat([x2,x2_],axis=-1) #(128,128,256)
    x1_ = layers.Conv2D(128,3,padding='same',strides=1)(x1_)
    x1_ = layers.BatchNormalization()(x1_)
    x1_ = layers.ReLU()(x1_)
    x1_ = layers.Conv2D(128,3,padding='same',strides=1)(x1_)
    x1_ = layers.BatchNormalization()(x1_)
    x1_ = layers.ReLU()(x1_) #(128,128,128)

    x1_= layers.Conv2DTranspose(64,2,padding='same',strides=2)(x1_) 
    x1_ = layers.BatchNormalization()(x1_)
    x1_ = layers.ReLU()(x1_) #(256,256,64)

    x_ = tf.concat([x1,x1_],axis=-1) #(256,256,128)
    x_ = layers.Conv2D(64,3,padding='same',strides=1)(x_)
    x_ = layers.BatchNormalization()(x_)
    x_ = layers.ReLU()(x_)
    x_ = layers.Conv2D(64,3,padding='same',strides=1)(x_)
    x_ = layers.BatchNormalization()(x_)
    x_ = layers.ReLU()(x_) #(256,256,64)
    # output layer: 34 classes
    output = layers.Conv2D(34,1,padding='same',strides=1,activation='softmax')(x_) #(256,256,34)

    return keras.Model(inputs=input,outputs=output)
  • Building the model from composable blocks
# basic building blocks
def down_sample(filters,kernel):  # encoder conv block
    model = keras.Sequential()
    model.add(layers.Conv2D(filters,kernel,strides=1,padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    return model

def max_pooling():                # encoder pooling block
    model = keras.Sequential()
    model.add(layers.MaxPooling2D())
    return model

def up_sample(filters,kernel):    # decoder transposed-conv block
    model = keras.Sequential()
    model.add(layers.Conv2DTranspose(filters,kernel,padding='same',strides=2))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    return model

def get_unet_model():
    input = keras.Input(shape=(256,256,3)) #(256,256,3)
    
    down_models = [
        down_sample(64,3), #(256,256,64)
        down_sample(64,3),  
        max_pooling(),     #(128,128,64)
        down_sample(128,3),#(128,128,128)
        down_sample(128,3), 
        max_pooling(),     #(64,64,128)
        down_sample(256,3),#(64,64,256)
        down_sample(256,3), 
        max_pooling(),     #(32,32,256)
        down_sample(512,3),#(32,32,512)
        down_sample(512,3),
        max_pooling(),     #(16,16,512)
        down_sample(1024,3),#(16,16,1024)
        down_sample(1024,3)#(16,16,1024)
    ]
    
    down_output = []
    x = input
    for i,down in enumerate(down_models):
        x = down(x)
        if i % 3 == 1:
            down_output.append(x)
            
    down_output = reversed(down_output[:-1])
    
    up_models = [
        up_sample(512,2),
        up_sample(256,2),
        up_sample(128,2),
        up_sample(64,2)
    ]
    
    up_conv2_model_1 = [
        down_sample(512,3),
        down_sample(256,3),
        down_sample(128,3),
        down_sample(64,3)
    ]
    
    up_conv2_model_2 = [
        down_sample(512,3),
        down_sample(256,3),
        down_sample(128,3),
        down_sample(64,3)
    ]
    
    for d_out,up,conv2_1,conv2_2 in zip(down_output,up_models,up_conv2_model_1,up_conv2_model_2):
        x = up(x)
        x = tf.concat([d_out,x],axis=-1)
        x = conv2_1(x)
        x = conv2_2(x)
    # output layer: 34 classes
    x = layers.Conv2D(34,1,padding='same',strides=1,activation='softmax')(x)
    
    return keras.Model(inputs=input,outputs=x)
# build the model
model = get_unet_model()
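A dummy forward pass is a cheap way to confirm the wiring before training:
# Expect per-pixel class probabilities: (1, 256, 256, 3) -> (1, 256, 256, 34).
dummy = tf.zeros((1, 256, 256, 3))
print(model(dummy).shape)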

3) Model compilation and training

  • Evaluation metrics
    i) acc: per-pixel classification accuracy.
    ii) MeanIoU: mean intersection-over-union across classes; the closer to 1, the better the fit.
# Subclass MeanIoU: the built-in metric compares class indices, but the model
# outputs per-class probabilities, so take the argmax first. Keras drives
# metrics through update_state, so that is the method to override.
class MeanIoU(tf.keras.metrics.MeanIoU):
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight=sample_weight)
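A standalone check of the patched metric with random tensors shaped like one training batch (a quick sketch; random inputs should score near chance level):
m = MeanIoU(num_classes=34)
y_true = tf.random.uniform((2, 256, 256, 1), maxval=34, dtype=tf.int32)
y_pred = tf.random.uniform((2, 256, 256, 34))   # stand-in for softmax output
m.update_state(y_true, y_pred)
print(m.result().numpy())                       # a value in [0, 1]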
  • Compiling the model
model.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['acc',MeanIoU(num_classes=34)])
  • Training
EPOCHS = 80
train_step = train_total_num // BATCHSIZE
val_step = test_total_num // BATCHSIZE

history = model.fit(train_dataset,
           epochs=EPOCHS,
           steps_per_epoch= train_step,
           validation_data=test_dataset,
           validation_steps=val_step)
  • Metrics visualization (a plotting sketch follows the figures)
    (Figure: loss curves)
    (Figure: accuracy curves)
    (Figure: MeanIoU curves)
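A minimal plotting sketch for these curves. The metric key names ('acc', 'mean_io_u') are assumptions that vary across TF versions, so verify them with history.history.keys():
# Plot training vs. validation curves for each recorded metric.
for key in ['loss', 'acc', 'mean_io_u']:
    plt.figure()
    plt.plot(history.history[key], label='train ' + key)
    plt.plot(history.history['val_' + key], label='val ' + key)
    plt.legend()
plt.show()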

  • Conclusion
    The curves show severe overfitting. Dropout would help suppress it (a sketch of adding it to the building blocks follows), and the weak test-set performance suggests adding more training data and stronger image augmentation.
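One way to act on the Dropout suggestion in the composable variant: give down_sample an optional rate (the dropout_rate parameter is an illustrative addition, not part of the code above):
def down_sample(filters, kernel, dropout_rate=0.0):  # encoder conv block + optional Dropout
    model = keras.Sequential()
    model.add(layers.Conv2D(filters, kernel, strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.ReLU())
    if dropout_rate > 0:
        model.add(layers.Dropout(dropout_rate))      # active only during training
    return model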

  • Model prediction
    The first column is the original image, the second the ground-truth segmentation map, and the third the predicted segmentation; a plotting sketch follows.
    (Figure: prediction examples)
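A sketch of how such a panel can be produced from the test set:
# Plot rows of: original image | ground-truth mask | predicted mask.
num = 3
for imgs, msks in test_dataset.take(1):
    preds = tf.argmax(model.predict(imgs), axis=-1)   # (batch, 256, 256) class IDs
    for i in range(num):
        plt.subplot(num, 3, i * 3 + 1)
        plt.imshow((imgs[i] + 1) / 2)
        plt.subplot(num, 3, i * 3 + 2)
        plt.imshow(tf.squeeze(msks[i]))
        plt.subplot(num, 3, i * 3 + 3)
        plt.imshow(preds[i])
plt.show()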

4) The resize interpolation problem

  • When resizing we used tf.image.ResizeMethod.NEAREST_NEIGHBOR, but it is far from the only interpolation method available:
class ResizeMethod(object):
  BILINEAR = 'bilinear'            # bilinear interpolation
  NEAREST_NEIGHBOR = 'nearest'     # nearest-neighbor interpolation
  BICUBIC = 'bicubic'              # bicubic interpolation
  AREA = 'area'                    # area interpolation
  LANCZOS3 = 'lanczos3'            # Lanczos interpolation, kernel radius 3
  LANCZOS5 = 'lanczos5'            # Lanczos interpolation, kernel radius 5
  GAUSSIAN = 'gaussian'            # Gaussian-kernel interpolation
  MITCHELLCUBIC = 'mitchellcubic'  # Mitchell-Netravali cubic interpolation
  • Why nearest-neighbor? Because nearest-neighbor resizing only ever copies existing pixel values, it cannot change the set of class IDs in a label image (the labels must stay within the 34 target classes). Interpolating methods blend neighboring pixels and produce fractional values that belong to no class:
label_img = read_label(test_label_path_raw[1])

resized = tf.image.resize(label_img,size=(256,256),method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
np.unique(resized.numpy())
# output: [ 1,  2,  3,  4,  7,  8,  9, 11, 13, 17, 20, 21, 23, 25, 26, 33]


# The following methods blend pixel values and can corrupt the class IDs.
resized = tf.image.resize(label_img,size=(256,256),method=tf.image.ResizeMethod.BILINEAR)
np.unique(resized.numpy())
# output: [ 1.  ,  1.25,  1.5 ,  1.75,  2.  ,  2.5 ,  3.  ,  3.25,  4.  ,4.5 ,  5.5 ,  6.5 ,  6.75,  7.  ,  7.25,  7.5 ,  7.75,  8. ......]


resized = tf.image.resize(label_img,size=(256,256),method=tf.image.ResizeMethod.AREA)
np.unique(resized.numpy())
# output: [ 1.  ,  1.03125,  1.0625 ,  1.09375,  1.125  ,  1.15625, 1.1875 ,  1.21875,  1.25   ,  1.28125,  1.34375,  1.375  ...]
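
The effect is easy to reproduce on a toy tensor: between road (7) and sidewalk (8), bilinear interpolation invents fractional "classes" that belong to no object, while nearest-neighbor only copies existing IDs (expected outputs shown for TF2's default half-pixel sampling):
# A 1x2 "label image" with classes 7 and 8, upsampled to width 4.
toy = tf.constant([[[7.0], [8.0]]])   # shape (1, 2, 1): height, width, channels
print(tf.image.resize(toy, (1, 4), method='bilinear').numpy().ravel())
# -> [7.   7.25 7.75 8.  ]  fractional values appear
print(tf.image.resize(toy, (1, 4), method='nearest').numpy().ravel())
# -> [7. 7. 8. 8.]          only valid class IDs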
