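Background
Semantic segmentation is an important component of many visual understanding systems. Its main applications include medical image analysis, autonomous driving, and land-cover classification. The earliest semantic segmentation algorithms relied on thresholding, histograms, region partitioning, and clustering, whereas deep-learning-based segmentation methods fall mainly into the following categories: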
- Fully convolutional networks
- Convolutional models with graphical models
- Encoder-decoder based models
- Multi-scale and pyramid network based models
- R-CNN based models (for instance segmentation)
- Dilated convolutional models and DeepLab family
- Recurrent network based models
- Attention-based models
- Generative models and adversarial training
- Convolutional models with active contour
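This article mainly covers the Deeplab family of algorithms, with a focus on Deeplabv3plus. A translation of the survey paper on semantic segmentation algorithms will follow later (see reference 1).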
Deeplab Family
Dilated convolution (also known as atrous convolution) is defined as follows.
$$y_i=\sum_{k=1}^{K}x[i+rk]\,w[k]$$
where $r$ is the dilation rate, i.e. the spacing between the weights in the convolution kernel.
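To make the formula concrete, here is a minimal pure-Python sketch of a 1-D dilated convolution. The function name `dilated_conv1d` is hypothetical, the kernel index is 0-based rather than 1-based as in the formula, and only positions where the dilated kernel fits entirely inside the input are computed (no padding).

```python
def dilated_conv1d(x, w, rate=1):
    """Compute y[i] = sum_k x[i + rate * k] * w[k] at 'valid' positions only."""
    # Distance between the first and last input sample touched by the kernel.
    span = rate * (len(w) - 1)
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]
print(dilated_conv1d(x, w, rate=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(x, w, rate=2))  # [-4, -4, -4]
```

With `rate=2` the same three weights cover five input samples instead of three, which is how dilation enlarges the receptive field without adding parameters.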
DeeplabV1
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
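Deeplabv1 combines deep convolutional neural networks (DCNNs) with probabilistic graphical models (CRFs). Two properties limit the accuracy of DCNNs on semantic segmentation: the translation invariance of their high-level features, and the repeated pooling and downsampling that reduce resolution. To counter the resolution loss caused by downsampling and pooling, Deeplab uses atrous convolution to enlarge the receptive field and capture more semantic information.
Deeplabv1 makes the following modifications:
- The fully connected layers of VGG16 are converted to convolutional layers
- Downsampling is removed from the last two max-pooling layers
- The subsequent convolutional layers are changed to atrous convolutions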
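The interplay between removing downsampling and adding dilation can be sketched in a few lines of Python. The helper `strides_to_rates` is hypothetical and models VGG16 only as its five stride-2 pooling layers. The rule it applies (once the target output stride is reached, run the layer at stride 1 and multiply the dilation rate of everything after it by the removed stride) is the usual convention, stated here as an assumption and not taken from the article.

```python
def strides_to_rates(strides, target_output_stride):
    """Trade stride for dilation once the target output stride is reached.

    Returns (plan, output_stride, trailing_rate), where plan holds one
    (stride, rate) pair per downsampling layer and trailing_rate is the
    dilation rate for the layers that follow the last one.
    """
    current_stride = 1
    rate = 1
    plan = []
    for stride in strides:
        if current_stride >= target_output_stride:
            # Keep the resolution: no downsampling here, and every later
            # layer is dilated by the stride that was removed.
            plan.append((1, rate))
            rate *= stride
        else:
            plan.append((stride, 1))
            current_stride *= stride
    return plan, current_stride, rate


# Five stride-2 pooling layers, as in VGG16.
print(strides_to_rates([2, 2, 2, 2, 2], 32))
# ([(2, 1), (2, 1), (2, 1), (2, 1), (2, 1)], 32, 1)
print(strides_to_rates([2, 2, 2, 2, 2], 8))
# ([(2, 1), (2, 1), (2, 1), (1, 1), (1, 2)], 8, 4)
```

Removing the downsampling of the last two pooling layers lowers the output stride from 32 to 8, and the layers after them need rates 2 and 4 to see the same context as before.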
DeeplabV2
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
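Compared with DeeplabV1, the main improvements in Deeplabv2 are:
- It proposes ASPP (atrous spatial pyramid pooling), which samples with several different rates to capture objects at multiple scales and achieve better segmentation results.
- The backbone uses ResNet.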
DeeplabV3
Rethinking Atrous Convolution for Semantic Image Segmentation
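Overview
- The main problems Deeplab V3 addresses are:
a. The feature map resolution is too low, so restoring it to the original image resolution is not precise enough
b. Poor performance on objects at multiple scales
- To address these two problems, it proposes:
a. Atrous convolution
b. The multi-grid method
c. ASPP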
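As a small illustration of the multi-grid method: in the DeeplabV3 paper, each of the three convolutions in a block gets its own unit rate, and the final atrous rate is the block's base rate multiplied by that unit rate. The helper name `multi_grid_rates` below is hypothetical; the default grid (1, 2, 4) is the example used in the paper.

```python
def multi_grid_rates(block_rate, multi_grid=(1, 2, 4)):
    """Final atrous rate of each conv in a block = block rate * unit rate."""
    return tuple(block_rate * unit for unit in multi_grid)


print(multi_grid_rates(2))  # (2, 4, 8)
print(multi_grid_rates(4))  # (4, 8, 16)
```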
DeeplabV3+
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
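Unlike DeeplabV3, DeeplabV3+ has the following main characteristics:
- It uses an encoder-decoder structure (high-level features provide semantics, and the decoder gradually recovers boundary information), which improves segmentation quality while paying attention to boundary information (similar to the U-Net structure)
- In the encoder, Xception serves as the DCNN together with ASPP, and depthwise separable convolution is applied in both the ASPP and decoder modules
- Decoder: used to recover the boundary details of objects
- The model achieves high segmentation accuracy, and the paper analyzes the model design principles and model variants in detail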
Depthwise separable convolution:
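A depthwise separable convolution splits a standard convolution into a per-channel k x k depthwise convolution followed by a 1x1 pointwise convolution. The following sketch only counts weights to show why this is cheaper. The helper names are hypothetical, biases are ignored (the listings in this article use `use_bias=False`), and the 256-channel sizes are just an illustrative choice.

```python
def conv_params(kernel_size, c_in, c_out):
    """Weights in a standard convolution (no bias)."""
    return kernel_size * kernel_size * c_in * c_out


def separable_conv_params(kernel_size, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (no bias)."""
    depthwise = kernel_size * kernel_size * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1x1 conv that mixes the channels
    return depthwise + pointwise


standard = conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
print(standard, separable)  # 589824 67840
```

For a 3x3 kernel with 256 input and output channels, the separable version needs roughly one ninth of the weights.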
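- Effect of output stride
Here output_stride denotes the ratio of the input image's spatial resolution to that of the output feature map. For image classification, output_stride is usually 32. For semantic segmentation, output_stride = 16 or 8 can be used to extract denser feature maps, which requires modifying the stride of the last one or two blocks (for example, changing the stride from 2 to 1; when output_stride = 8, the last two blocks use atrous rates of 2 and 4 respectively). DeeplabV3 uses atrous convolution to extract features at arbitrary resolution. Deeplabv3 also uses the ASPP module, which extracts convolutional features at different scales by setting different rates, together with image-level features. DeeplabV3+ takes the last feature map of Deeplabv3 as its encoder output (a feature map with 256 channels), which contains rich semantic features.
- Xception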
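The arithmetic behind output_stride can be checked with a short sketch. The function name `encoder_output_size` is hypothetical, and it assumes every downsampling layer uses 'same'-style padding, so each stride-2 step maps a side of length n to ceil(n / 2). The ASPP rates per output stride are the ones that appear in the Xception listing later in this article.

```python
import math

# ASPP atrous rates used by the listings in this article, keyed by output stride.
ASPP_RATES = {16: (6, 12, 18), 8: (12, 24, 36)}


def encoder_output_size(height, width, output_stride):
    """Spatial size of the encoder output for a given input size."""
    return math.ceil(height / output_stride), math.ceil(width / output_stride)


for output_stride in (32, 16, 8):
    size = encoder_output_size(512, 512, output_stride)
    print(output_stride, size, ASPP_RATES.get(output_stride))
# 32 (16, 16) None
# 16 (32, 32) (6, 12, 18)
# 8 (64, 64) (12, 24, 36)
```

Halving the output stride doubles the feature map side, and the ASPP rates double with it so that each branch still covers the same region of the input image.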
Deeplabv3+ implementation
model.py:
from __future__ import absolute_import, division, print_function
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Activation, BatchNormalization,
                                     Concatenate, Conv2D, DepthwiseConv2D,
                                     Dropout, GlobalAveragePooling2D, Input,
                                     Lambda, Softmax, ZeroPadding2D)
from tensorflow.keras.models import Model
from nets.mobilenet import mobilenetV2
from nets.Xception import Xception
def SepConv_BN(x, filters, prefix, stride=1, kernel_size=3, rate=1, depth_activation=False, epsilon=1e-3):
    # Compute the padding, depending on whether height/width should shrink
    if stride == 1:
        depth_padding = 'same'
    else:
        kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size_effective - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        # For odd kernel sizes pad_beg == pad_end, so the padding is symmetric
        x = ZeroPadding2D((pad_beg, pad_end))(x)
        depth_padding = 'valid'
    # Without depth_activation, apply a single ReLU before the separable conv
    if not depth_activation:
        x = Activation('relu')(x)
    # Separable conv: 3x3 depthwise conv first, then 1x1 pointwise conv
    # The 3x3 depthwise conv is dilated
    x = DepthwiseConv2D((kernel_size, kernel_size), strides=(stride, stride), dilation_rate=(rate, rate),
                        padding=depth_padding, use_bias=False, name=prefix + '_depthwise')(x)
    x = BatchNormalization(name=prefix + '_depthwise_BN', epsilon=epsilon)(x)
    if depth_activation:
        x = Activation('relu')(x)
    # 1x1 conv to project to the requested number of channels
    x = Conv2D(filters, (1, 1), padding='same',
               use_bias=False, name=prefix + '_pointwise')(x)
    x = BatchNormalization(name=prefix + '_pointwise_BN', epsilon=epsilon)(x)
    if depth_activation:
        x = Activation('relu')(x)
    return x
def Deeplabv3(input_shape, num_classes, alpha=1., backbone="mobilenet", downsample_factor=16):
    img_input = Input(shape=input_shape)
    if backbone == "xception":
        x, atrous_rates, skip1 = Xception(img_input, alpha, downsample_factor=downsample_factor)
    elif backbone == "mobilenet":
        x, atrous_rates, skip1 = mobilenetV2(img_input, alpha, downsample_factor=downsample_factor)
    else:
        raise ValueError('Unsupported backbone - `{}`, Use mobilenet, xception.'.format(backbone))
    size_before = tf.keras.backend.int_shape(x)
    # ASPP feature extraction module
    # Extract features with atrous convolutions at different dilation rates
    # Branch 0: 1x1 conv
    b0 = Conv2D(256, (1, 1), padding='same', use_bias=False, name='aspp0')(x)
    b0 = BatchNormalization(name='aspp0_BN', epsilon=1e-5)(b0)
    b0 = Activation('relu', name='aspp0_activation')(b0)
    # Branch 1: rate = 6 (12 when downsample_factor is 8)
    b1 = SepConv_BN(x, 256, 'aspp1',
                    rate=atrous_rates[0], depth_activation=True, epsilon=1e-5)
    # Branch 2: rate = 12 (24 when downsample_factor is 8)
    b2 = SepConv_BN(x, 256, 'aspp2',
                    rate=atrous_rates[1], depth_activation=True, epsilon=1e-5)
    # Branch 3: rate = 18 (36 when downsample_factor is 8)
    b3 = SepConv_BN(x, 256, 'aspp3',
                    rate=atrous_rates[2], depth_activation=True, epsilon=1e-5)
    # Branch 4: global average pooling, then expand_dims to restore the two
    # spatial axes, then a 1x1 conv to adjust the channels
    b4 = GlobalAveragePooling2D()(x)
    b4 = Lambda(lambda x: K.expand_dims(x, 1))(b4)
    b4 = Lambda(lambda x: K.expand_dims(x, 1))(b4)
    b4 = Conv2D(256, (1, 1), padding='same', use_bias=False, name='image_pooling')(b4)
    b4 = BatchNormalization(name='image_pooling_BN', epsilon=1e-5)(b4)
    b4 = Activation('relu')(b4)
    # Resize back to the feature map's height/width with resize_images
    b4 = Lambda(lambda x: tf.compat.v1.image.resize_images(x, size_before[1:3], align_corners=True))(b4)
    #-----------------------------------------#
    #   Concatenate the five branches,
    #   then fuse them with a 1x1 conv.
    #-----------------------------------------#
    x = Concatenate()([b4, b0, b1, b2, b3])
    # Compress with a 1x1 conv -> 32,32,256
    x = Conv2D(256, (1, 1), padding='same', use_bias=False, name='concat_projection')(x)
    x = BatchNormalization(name='concat_projection_BN', epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = Dropout(0.1)(x)
    skip_size = tf.keras.backend.int_shape(skip1)
    #-----------------------------------------#
    #   Upsample the enhanced-feature branch
    #-----------------------------------------#
    x = Lambda(lambda xx: tf.compat.v1.image.resize_images(xx, skip_size[1:3], align_corners=True))(x)
    #----------------------------------#
    #   Shallow-feature branch
    #----------------------------------#
    dec_skip1 = Conv2D(48, (1, 1), padding='same', use_bias=False, name='feature_projection0')(skip1)
    dec_skip1 = BatchNormalization(name='feature_projection0_BN', epsilon=1e-5)(dec_skip1)
    dec_skip1 = Activation(tf.nn.relu)(dec_skip1)
    #-----------------------------------------#
    #   Concatenate with the shallow features,
    #   then extract features with convolutions
    #-----------------------------------------#
    x = Concatenate()([x, dec_skip1])
    x = SepConv_BN(x, 256, 'decoder_conv0',
                   depth_activation=True, epsilon=1e-5)
    x = SepConv_BN(x, 256, 'decoder_conv1',
                   depth_activation=True, epsilon=1e-5)
    #-----------------------------------------#
    #   Classify every pixel
    #-----------------------------------------#
    # 512,512
    size_before3 = tf.keras.backend.int_shape(img_input)
    # 512,512,21
    x = Conv2D(num_classes, (1, 1), padding='same')(x)
    x = Lambda(lambda xx: tf.compat.v1.image.resize_images(xx, size_before3[1:3], align_corners=True))(x)
    x = Softmax()(x)
    model = Model(img_input, x, name='deeplabv3plus')
    return model
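To follow the data flow of the model above without running TensorFlow, the sketch below tracks (height, width, channels) through the ASPP and decoder steps. The helper `trace_decoder_shapes` is hypothetical. It assumes a square input whose side is divisible by the downsample factor, the Xception backbone (2048 output channels), and a skip connection at 1/4 of the input resolution, as the shape comments in the Xception listing indicate.

```python
def trace_decoder_shapes(input_size=512, num_classes=21, downsample_factor=16,
                         backbone_channels=2048):
    """Return (height, width, channels) at the main steps of the decoder."""
    enc = input_size // downsample_factor  # spatial size of the backbone output
    skip = input_size // 4                 # spatial size of skip1
    return {
        "backbone_output": (enc, enc, backbone_channels),
        "aspp_concat": (enc, enc, 5 * 256),           # b4, b0, b1, b2, b3
        "concat_projection": (enc, enc, 256),         # 1x1 conv after the concat
        "upsampled_to_skip": (skip, skip, 256),       # bilinear resize to skip1 size
        "feature_projection0": (skip, skip, 48),      # 1x1 conv on skip1
        "decoder_concat": (skip, skip, 256 + 48),
        "decoder_conv1": (skip, skip, 256),
        "logits": (skip, skip, num_classes),
        "output": (input_size, input_size, num_classes),  # final resize + softmax
    }


for name, shape in trace_decoder_shapes().items():
    print(f"{name:20s} {shape}")
```

Projecting the skip features down to 48 channels keeps the 256 ASPP channels dominant after concatenation, which is a design choice discussed in the DeeplabV3+ paper.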
Xception
from tensorflow.keras import layers
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     DepthwiseConv2D, ZeroPadding2D)
def _conv2d_same(x, filters, prefix, stride=1, kernel_size=3, rate=1):
    # Compute the padding, depending on whether height/width should shrink
    if stride == 1:
        return Conv2D(filters,
                      (kernel_size, kernel_size),
                      strides=(stride, stride),
                      padding='same', use_bias=False,
                      dilation_rate=(rate, rate),
                      name=prefix)(x)
    else:
        kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size_effective - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        # For odd kernel sizes pad_beg == pad_end, so the padding is symmetric
        x = ZeroPadding2D((pad_beg, pad_end))(x)
        return Conv2D(filters,
                      (kernel_size, kernel_size),
                      strides=(stride, stride),
                      padding='valid', use_bias=False,
                      dilation_rate=(rate, rate),
                      name=prefix)(x)
def SepConv_BN(x, filters, prefix, stride=1, kernel_size=3, rate=1, depth_activation=False, epsilon=1e-3):
    # Compute the padding, depending on whether height/width should shrink
    if stride == 1:
        depth_padding = 'same'
    else:
        kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size_effective - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        # For odd kernel sizes pad_beg == pad_end, so the padding is symmetric
        x = ZeroPadding2D((pad_beg, pad_end))(x)
        depth_padding = 'valid'
    # Without depth_activation, apply a single ReLU before the separable conv
    if not depth_activation:
        x = Activation('relu')(x)
    # Separable conv: 3x3 depthwise conv first, then 1x1 pointwise conv
    # The 3x3 depthwise conv is dilated
    x = DepthwiseConv2D((kernel_size, kernel_size), strides=(stride, stride), dilation_rate=(rate, rate),
                        padding=depth_padding, use_bias=False, name=prefix + '_depthwise')(x)
    x = BatchNormalization(name=prefix + '_depthwise_BN', epsilon=epsilon)(x)
    if depth_activation:
        x = Activation('relu')(x)
    # 1x1 conv to project to the requested number of channels
    x = Conv2D(filters, (1, 1), padding='same',
               use_bias=False, name=prefix + '_pointwise')(x)
    x = BatchNormalization(name=prefix + '_pointwise_BN', epsilon=epsilon)(x)
    if depth_activation:
        x = Activation('relu')(x)
    return x
def _xception_block(inputs, depth_list, prefix, skip_connection_type, stride,
                    rate=1, depth_activation=False, return_skip=False):
    residual = inputs
    for i in range(3):
        residual = SepConv_BN(residual,
                              depth_list[i],
                              prefix + '_separable_conv{}'.format(i + 1),
                              stride=stride if i == 2 else 1,
                              rate=rate,
                              depth_activation=depth_activation)
        if i == 1:
            skip = residual
    if skip_connection_type == 'conv':
        shortcut = _conv2d_same(inputs, depth_list[-1], prefix + '_shortcut',
                                kernel_size=1,
                                stride=stride)
        shortcut = BatchNormalization(name=prefix + '_shortcut_BN')(shortcut)
        outputs = layers.add([residual, shortcut])
    elif skip_connection_type == 'sum':
        outputs = layers.add([residual, inputs])
    elif skip_connection_type == 'none':
        outputs = residual
    if return_skip:
        return outputs, skip
    else:
        return outputs
def Xception(inputs, alpha=1, downsample_factor=16):
    if downsample_factor == 8:
        entry_block3_stride = 1
        middle_block_rate = 2  # ! Not mentioned in paper, but required
        exit_block_rates = (2, 4)
        atrous_rates = (12, 24, 36)
    elif downsample_factor == 16:
        entry_block3_stride = 2
        middle_block_rate = 1
        exit_block_rates = (1, 2)
        atrous_rates = (6, 12, 18)
    else:
        raise ValueError('Unsupported factor - `{}`, Use 8 or 16.'.format(downsample_factor))
    # 256,256,32
    x = Conv2D(32, (3, 3), strides=(2, 2),
               name='entry_flow_conv1_1', use_bias=False, padding='same')(inputs)
    x = BatchNormalization(name='entry_flow_conv1_1_BN')(x)
    x = Activation('relu')(x)
    # 256,256,64
    x = _conv2d_same(x, 64, 'entry_flow_conv1_2', kernel_size=3, stride=1)
    x = BatchNormalization(name='entry_flow_conv1_2_BN')(x)
    x = Activation('relu')(x)
    # 256,256,128 -> 256,256,128 -> 128,128,128
    x = _xception_block(x, [128, 128, 128], 'entry_flow_block1',
                        skip_connection_type='conv', stride=2,
                        depth_activation=False)
    # 128,128,256 -> 128,128,256 -> 64,64,256
    # skip = 128,128,256
    x, skip1 = _xception_block(x, [256, 256, 256], 'entry_flow_block2',
                               skip_connection_type='conv', stride=2,
                               depth_activation=False, return_skip=True)
    x = _xception_block(x, [728, 728, 728], 'entry_flow_block3',
                        skip_connection_type='conv', stride=entry_block3_stride,
                        depth_activation=False)
    for i in range(16):
        x = _xception_block(x, [728, 728, 728], 'middle_flow_unit_{}'.format(i + 1),
                            skip_connection_type='sum', stride=1, rate=middle_block_rate,
                            depth_activation=False)
    x = _xception_block(x, [728, 1024, 1024], 'exit_flow_block1',
                        skip_connection_type='conv', stride=1, rate=exit_block_rates[0],
                        depth_activation=False)
    x = _xception_block(x, [1536, 1536, 2048], 'exit_flow_block2',
                        skip_connection_type='none', stride=1, rate=exit_block_rates[1],
                        depth_activation=True)
    return x, atrous_rates, skip1
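The explicit padding used by `_conv2d_same` and `SepConv_BN` when stride > 1 can be checked with plain arithmetic. In the sketch below, `fixed_padding` repeats the computation from the listing, while `valid_output_length` is a hypothetical helper that applies the standard output-size formula for a 'valid' convolution to the padded input.

```python
def fixed_padding(kernel_size, rate=1):
    """Effective kernel size and the (begin, end) padding from the listing."""
    kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
    pad_total = kernel_size_effective - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return kernel_size_effective, pad_beg, pad_end


def valid_output_length(n, kernel_size, stride, rate=1):
    """Output length of a 'valid' convolution applied after fixed_padding."""
    k_eff, pad_beg, pad_end = fixed_padding(kernel_size, rate)
    return (n + pad_beg + pad_end - k_eff) // stride + 1


print(fixed_padding(3, rate=1))                # (3, 1, 1)
print(fixed_padding(3, rate=2))                # (5, 2, 2)
print(valid_output_length(256, 3, stride=2))   # 128
print(valid_output_length(256, 3, stride=2, rate=2))  # 128
```

The padding depends only on the kernel size and the rate, not on the input size, so a strided layer halves the resolution in the same way for any input and any dilation rate.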
Loss function design
import tensorflow as tf
from tensorflow.keras import backend as K
def dice_loss_with_CE(beta=1, smooth=1e-5):
    def _dice_loss_with_CE(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        CE_loss = - y_true[..., :-1] * K.log(y_pred)
        CE_loss = K.mean(K.sum(CE_loss, axis=-1))
        tp = K.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        fp = K.sum(y_pred, axis=[0, 1, 2]) - tp
        fn = K.sum(y_true[..., :-1], axis=[0, 1, 2]) - tp
        score = ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)
        score = tf.reduce_mean(score)
        dice_loss = 1 - score
        # dice_loss = tf.Print(dice_loss, [dice_loss, CE_loss])
        return CE_loss + dice_loss
    return _dice_loss_with_CE
def CE():
    def _CE(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        CE_loss = - y_true[..., :-1] * K.log(y_pred)
        CE_loss = K.mean(K.sum(CE_loss, axis=-1))
        # dice_loss = tf.Print(CE_loss, [CE_loss])
        return CE_loss
    return _CE
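To see what the two loss terms compute, here is a pure-Python sketch on a handful of pixels. The function name `ce_and_dice` is hypothetical. It assumes `y_true` is already a plain one-hot vector per pixel (that is, the extra last channel that the listing slices off with `[..., :-1]` has been removed), and it uses 1e-7 for clipping on the assumption that this matches the default `K.epsilon()`.

```python
import math


def ce_and_dice(y_true, y_pred, beta=1.0, smooth=1e-5, eps=1e-7):
    """Cross-entropy and dice loss for lists of per-pixel class vectors."""
    n_pixels = len(y_true)
    n_classes = len(y_true[0])
    # Clip the probabilities so that log() stays finite.
    y_pred = [[min(max(p, eps), 1.0 - eps) for p in pixel] for pixel in y_pred]
    # Cross-entropy: sum over classes, mean over pixels.
    ce = sum(
        -t * math.log(p)
        for t_pixel, p_pixel in zip(y_true, y_pred)
        for t, p in zip(t_pixel, p_pixel)
    ) / n_pixels
    # Soft F-beta (dice when beta = 1) per class, then averaged over classes.
    scores = []
    for c in range(n_classes):
        tp = sum(t[c] * p[c] for t, p in zip(y_true, y_pred))
        fp = sum(p[c] for p in y_pred) - tp
        fn = sum(t[c] for t in y_true) - tp
        scores.append(((1 + beta ** 2) * tp + smooth)
                      / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth))
    dice = 1 - sum(scores) / n_classes
    return ce, dice


labels = [[1, 0], [0, 1]]
print(ce_and_dice(labels, [[1.0, 0.0], [0.0, 1.0]]))  # both terms close to 0
print(ce_and_dice(labels, [[0.5, 0.5], [0.5, 0.5]]))  # ce = ln 2, dice close to 0.5
```

Cross-entropy scores each pixel independently, while the dice term measures overlap per class over the whole batch, which helps when classes are imbalanced.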
References
- Survey of semantic segmentation
- DeeplabV3+ TensorFlow 2.0 implementation. If you find it useful, please give it a star, thank you!