Implementing FCN Semantic Segmentation in Keras and Training on Your Own Data: FCN-16s and FCN-8s

In Implementing FCN Semantic Segmentation in Keras and Training on Your Own Data: FCN-32s we built the simplest possible FCN, but its segmentation quality was poor. In this post we improve it the way the paper describes.

1. Transposed Convolutions

The FCN-32s post restored the original image size in the crudest possible way: a fixed, non-learned 32x upsampling using nearest-neighbour interpolation (`UpSampling2D`'s default). The results were poor. But simply switching the interpolation to bilinear already helps a lot:

model.add(keras.layers.UpSampling2D(size = (32, 32), interpolation = "bilinear",
                                    name = "upsampling_6"))
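To see why bilinear helps, compare the two schemes on a toy 1-D signal. This is a minimal numpy sketch, not Keras's exact resize implementation (whose pixel-alignment convention differs slightly): nearest simply duplicates each value, producing blocky edges, while bilinear interpolates a smooth ramp between samples.

```python
import numpy as np

x = np.array([1.0, 3.0])

# nearest: each sample is repeated, giving hard, blocky transitions
nearest = np.repeat(x, 2)

# bilinear (linear in 1-D): resample at 4 evenly spaced positions
# between the original samples, giving a smooth transition
new_pos = np.linspace(0, 1, 4)
bilinear = np.interp(new_pos, [0, 1], x)

print(nearest)    # [1. 1. 3. 3.]
print(bilinear)   # roughly [1.   1.67 2.33 3.  ]
```

The same effect in 2-D is why the bilinear mask boundaries above look far less jagged than the nearest-neighbour ones.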

(figure: nearest-neighbour result)
(figure: bilinear result)
The paper uses transposed convolutions instead. A transposed convolution has learnable parameters, so in principle it should beat plain interpolation. Let's verify right away by replacing the `UpSampling2D` above with a transposed convolution:

# model.add(keras.layers.UpSampling2D(size = (32, 32), interpolation = "bilinear",
#                                    name = "upsampling_6"))

model.add(keras.layers.Conv2DTranspose(filters = 512,
                                       kernel_size = (3, 3),
                                       strides = (32, 32),
                                       padding = "same",
                                       kernel_initializer = "he_uniform",
                                       name = "Conv2DTranspose_6"))

(figure: transposed-convolution result)
As you can see, the result is actually worse; these hyperparameters are probably not a good fit (try changing them yourself and see). So for the final upsampling we will stick with bilinear interpolation, and use transposed convolutions only for the earlier, shallower upsampling steps.
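It helps to have a concrete picture of what a stride-s transposed convolution computes: each input pixel is scattered into the output as a kernel-weighted "stamp", s pixels apart, and the kernel weights are learned. Below is a minimal single-channel numpy sketch of that view. It produces the "full" output (slightly larger than `stride * input`); Keras's `padding = "same"` crops it back to exactly `stride * input`.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution, 'full' output (no crop)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            # each input pixel stamps a kernel-weighted patch into the output
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
k = np.ones((2, 2))        # stands in for a learned kernel
y = transposed_conv2d(x, k, stride=2)
print(y.shape)             # (4, 4): spatial size doubled
```

Note that with an all-ones 2 x 2 kernel and stride 2 this reduces exactly to nearest-neighbour upsampling; training lets the network learn better kernels than that, which is the whole appeal over fixed interpolation.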

2. FCN-16s

FCN-32s segments poorly mainly because repeated convolution and pooling discard fine detail from the original image. The fix is to reuse information from the earlier, shallower convolutions: the paper's skip architecture. Using a skip architecture means we need access to intermediate layers, so the model is rewritten in the functional API:

# define the model
from tensorflow import keras   # assumed import (TensorFlow 2.x Keras)

project_name = "fcn_segment"

channels = 3
std_shape = (320, 320, channels) # input size; std_shape[0]: img_rows, std_shape[1]: img_cols

img_input = keras.layers.Input(shape = std_shape, name = "input")

conv_1 = keras.layers.Conv2D(32, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_1")(img_input)
max_pool_1 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_1")(conv_1)

conv_2 = keras.layers.Conv2D(64, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_2")(max_pool_1)
max_pool_2 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_2")(conv_2)

conv_3 = keras.layers.Conv2D(128, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_3")(max_pool_2)
max_pool_3 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_3")(conv_3)

conv_4 = keras.layers.Conv2D(256, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_4")(max_pool_3)
max_pool_4 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_4")(conv_4)

conv_5 = keras.layers.Conv2D(512, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_5")(max_pool_4)
max_pool_5 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_5")(conv_5)

Written this way, every intermediate output is readily available. Next, upsample max_pool_5 by 2x with a transposed convolution so it matches max_pool_4's size, then add the two tensors together; that gives us the 16s branch:

# transposed conv: upsample max_pool_5 by 2x to match max_pool_4's size
up6 = keras.layers.Conv2DTranspose(256, kernel_size = (3, 3),
                                   strides = (2, 2),
                                   padding = "same",
                                   kernel_initializer = "he_normal",
                                   name = "upsampling_6")(max_pool_5)
                
_16s = keras.layers.add([max_pool_4, up6])

# upsampling _16s by 16x restores the input resolution
up7 = keras.layers.UpSampling2D(size = (16, 16), interpolation = "bilinear",
                                name = "upsampling_7")(_16s)

# the kernel here is 3 x 3 as well; it could be changed as in FCN-32s
conv_7 = keras.layers.Conv2D(1, kernel_size = (3, 3), activation = "sigmoid",
                             padding = "same", name = "conv_7")(up7)

Finally, assemble the model from input to output:

model = keras.Model(img_input, conv_7, name = project_name)

model.summary()
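A quick sanity check of the spatial sizes involved (pure Python, assuming the 320 x 320 input above): each pooling layer halves the resolution, so max_pool_5 is 10 x 10, and one stride-2 transposed convolution brings it to 20 x 20, exactly the size of max_pool_4. The `filters = 256` in up6 matches max_pool_4's channel count, which is what makes the element-wise `add` legal.

```python
size = 320
sizes = []
for i in range(1, 6):
    size //= 2                      # each 2x2 max-pool halves the resolution
    sizes.append(size)
    print(f"max_pool_{i}: {size} x {size}")

# max_pool_5 (10 x 10) -> stride-2 transposed conv -> 20 x 20 == max_pool_4
assert sizes[-1] * 2 == sizes[-2]
```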

That completes FCN-16s. Training and prediction work exactly as in the FCN-32s post, and the predictions are now clearly much better:
(figure: FCN-16s prediction)

3. FCN-8s

FCN-8s differs from FCN-16s only in that it also merges in max_pool_3. The code:

from tensorflow import keras   # assumed import (TensorFlow 2.x Keras)

project_name = "fcn_segment"

channels = 3
std_shape = (320, 320, channels) # input size; std_shape[0]: img_rows, std_shape[1]: img_cols

img_input = keras.layers.Input(shape = std_shape, name = "input")

conv_1 = keras.layers.Conv2D(32, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_1")(img_input)
max_pool_1 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_1")(conv_1)

conv_2 = keras.layers.Conv2D(64, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_2")(max_pool_1)
max_pool_2 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_2")(conv_2)

conv_3 = keras.layers.Conv2D(128, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_3")(max_pool_2)
max_pool_3 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_3")(conv_3)

conv_4 = keras.layers.Conv2D(256, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_4")(max_pool_3)
max_pool_4 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_4")(conv_4)

conv_5 = keras.layers.Conv2D(512, kernel_size = (3, 3), activation = "relu",
                             padding = "same", name = "conv_5")(max_pool_4)
max_pool_5 = keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2),
                                    name = "max_pool_5")(conv_5)

# transposed conv: upsample max_pool_5 by 2x to match max_pool_4's size
up6 = keras.layers.Conv2DTranspose(256, kernel_size = (3, 3),
                                   strides = (2, 2),
                                   padding = "same",
                                   kernel_initializer = "he_normal",
                                   name = "upsampling_6")(max_pool_5)
                
_16s = keras.layers.add([max_pool_4, up6])

# transposed conv: upsample _16s by 2x to match max_pool_3's size
up_16s = keras.layers.Conv2DTranspose(128, kernel_size = (3, 3),
                                      strides = (2, 2),
                                      padding = "same",
                                      kernel_initializer = "he_normal",
                                      name = "Conv2DTranspose_16s")(_16s)
                                  
_8s = keras.layers.add([max_pool_3, up_16s])

# upsampling _8s by 8x restores the input resolution
up7 = keras.layers.UpSampling2D(size = (8, 8), interpolation = "bilinear",
                                name = "upsampling_7")(_8s)

# the kernel here is 3 x 3 as well; it could be changed as in FCN-32s
conv_7 = keras.layers.Conv2D(1, kernel_size = (3, 3), activation = "sigmoid",
                             padding = "same", name = "conv_7")(up7)

model = keras.Model(img_input, conv_7, name = project_name)

model.summary()

The prediction is shown below, still without binarization; after thresholding, it is essentially identical to the ground truth.
(figure: FCN-8s prediction)
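Binarizing is just thresholding the sigmoid output; a commonly used (here assumed) cutoff is 0.5. A minimal numpy sketch, where `pred` stands in for one mask from `model.predict`:

```python
import numpy as np

# pred stands in for one predicted mask (sigmoid outputs, values in [0, 1])
pred = np.array([[0.1, 0.7],
                 [0.9, 0.4]])

# threshold at 0.5: 1 = foreground, 0 = background
mask = (pred > 0.5).astype(np.uint8)
print(mask)
```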

4. Summary

And that's FCN semantic segmentation. The network does not have to match the paper exactly; shape it to the task at hand. Once you have grasped the principle (convolutions learn the target's features, then the low-resolution feature maps are decoded back up to the original resolution: an encoder-decoder), you can vary the structure freely. After all, many later networks are themselves variations on FCN. Try changing the architecture and see whether training improves.

5. Code Download

The complete code is available for download as a Jupyter Notebook example.

Previous: Implementing FCN Semantic Segmentation in Keras and Training on Your Own Data: FCN-32s
Next: Implementing FCN Semantic Segmentation in Keras and Training on Your Own Data: Multi-class Segmentation
