This week I worked through Baidu PaddlePaddle's 7-day image segmentation check-in camp (course link). This post organizes and records my notes.
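1. Overview of image segmentation
1.1 Types of image segmentation
- Semantic segmentation: classify every pixel
- Instance segmentation: predict a mask for the object inside each box
- Panoptic segmentation: classify the background pixels and predict a mask for the object inside each box
1.2 The fundamental goal of semantic segmentation: pixel-level classification
1.3 Basic pipeline of a semantic segmentation algorithm
- Input: image (RGB)
- Output: a single-channel map with the same size as the input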
- Training process:
  - Input: image + label
  - Forward: out = model(image)
  - Compute the loss: loss = loss_func(out, label)
  - Backward: loss.backward()
  - Update the weights: optimizer.minimize(loss)
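The five training steps can be sketched end to end without any framework. The following is only a minimal NumPy illustration, not the course's code: the "model" is a hypothetical single 1x1 convolution (one weight matrix shared by all pixels), the data is a random toy image, and the gradient is written out by hand in place of loss.backward() and optimizer.minimize(loss).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: one 3-channel 8x8 "image" and a per-pixel label map with 2 classes.
image = rng.normal(size=(3, 8, 8)).astype(np.float32)   # c, h, w
label = (image[0] > 0).astype(np.int64)                  # h, w

# Hypothetical model: a 1x1 convolution, i.e. a per-pixel linear classifier.
num_classes = 2
weight = np.zeros((num_classes, 3), dtype=np.float32)


def forward(weight, image):
    # logits[k, i, j] = sum_c weight[k, c] * image[c, i, j]
    return np.einsum("kc,chw->khw", weight, image)


def softmax_cross_entropy(logits, label):
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    prob = exp / exp.sum(axis=0, keepdims=True)
    h, w = label.shape
    # probability assigned to the correct class at every pixel
    picked = prob[label, np.arange(h)[:, None], np.arange(w)[None, :]]
    return -np.log(picked).mean(), prob


lr = 0.5
losses = []
for step in range(20):
    out = forward(weight, image)                       # forward
    loss, prob = softmax_cross_entropy(out, label)     # compute the loss
    # backward: d loss / d logits = (prob - onehot) / number of pixels
    onehot = np.eye(num_classes, dtype=np.float32)[label].transpose(2, 0, 1)
    grad_logits = (prob - onehot) / label.size
    grad_weight = np.einsum("khw,chw->kc", grad_logits, image)
    weight -= lr * grad_weight                         # update the weights
    losses.append(float(loss))

# Prediction: argmax over the class axis gives a single-channel map of size h x w.
pred = forward(weight, image).argmax(axis=0)
```

The last line also shows the input/output contract from the list above: a multi-channel image goes in, and a single-channel class map with the same height and width comes out.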
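1.4 Evaluation metrics for semantic segmentation
- mIoU: the intersection-over-union (IoU) of each class, averaged over all classes
- mAcc: mean accuracy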
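Both metrics can be computed from a confusion matrix. The sketch below is a NumPy illustration with hypothetical function names. It assumes mAcc means per-class pixel accuracy averaged over classes, which is one common definition and may differ from the one used in the course.

```python
import numpy as np


def confusion_matrix(pred, label, num_classes):
    """Pixel counts: rows are ground-truth classes, columns are predicted classes."""
    index = label.reshape(-1) * num_classes + pred.reshape(-1)
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou_and_acc(pred, label, num_classes):
    cm = confusion_matrix(pred, label, num_classes)
    intersection = np.diag(cm)                               # correctly labelled pixels per class
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection   # predicted + ground truth - overlap
    seen = union > 0                                         # skip classes absent from both maps
    iou = intersection[seen] / union[seen]
    has_gt = cm.sum(axis=1) > 0                              # classes that occur in the label
    acc = intersection[has_gt] / cm.sum(axis=1)[has_gt]
    return iou.mean(), acc.mean()


label = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou, macc = mean_iou_and_acc(pred, label, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIoU is 7/12; the per-class accuracies are 1/2 and 1, so the mAcc is 0.75.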
1.5 Building a dataloader in Paddle's dynamic graph mode
Construct a dataloader that supplies the training images and their labels as the model's input.
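To illustrate what such a loader has to produce, here is a framework-free sketch. The dataset class, its random contents, and the function name are all hypothetical; in Paddle a generator like this would be handed to the framework's own loader rather than iterated directly.

```python
import numpy as np


class ToySegmentationDataset:
    """Hypothetical in-memory dataset: random images and label maps derived from them."""

    def __init__(self, num_samples=10, size=16, seed=0):
        rng = np.random.default_rng(seed)
        self.images = rng.random((num_samples, 3, size, size)).astype(np.float32)
        # one class id per pixel, same height and width as the image
        self.labels = (self.images.mean(axis=1) > 0.5).astype(np.int64)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


def batch_iterator(dataset, batch_size, shuffle=True, seed=0):
    """Yield (images, labels) batches shaped (n, c, h, w) and (n, h, w)."""
    order = np.arange(len(dataset))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        images = np.stack([dataset[i][0] for i in idx])
        labels = np.stack([dataset[i][1] for i in idx])
        yield images, labels


batches = list(batch_iterator(ToySegmentationDataset(), batch_size=4))
```

With 10 samples and a batch size of 4, the iterator yields two full batches and one final batch of 2.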
2. FCN (Fully Convolutional Network)
Replace the fully connected layers with $1\times 1$ convolutions, which change only the number of channels.
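A small NumPy check makes this concrete: a $1\times 1$ convolution applies the same fully connected layer to the channel vector of every pixel, so the spatial size is untouched and only the channel count changes. The shapes and random weights below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 8, 5, 7))   # n, c_in, h, w
fc_weight = rng.normal(size=(3, 8))        # c_out, c_in
fc_bias = rng.normal(size=(3,))


def conv1x1(x, weight, bias):
    # out[n, k, i, j] = sum_c weight[k, c] * x[n, c, i, j] + bias[k]
    return np.einsum("kc,nchw->nkhw", weight, x) + bias[None, :, None, None]


out = conv1x1(features, fc_weight, fc_bias)   # shape (1, 3, 5, 7): only channels changed

# The fully connected layer applied to a single pixel's channel vector
# gives the same numbers as the convolution output at that pixel.
pixel = features[0, :, 2, 4]
fc_out = fc_weight @ pixel + fc_bias
```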
2.1 Three upsampling methods:
- upsampling: bilinear interpolation
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.dygraph import to_variable

x = np.array([[1, 2], [3, 4]])
x = x.astype(np.float32)
# add batch and channel axes: n, c, h, w
x = x[np.newaxis, np.newaxis, :, :]
with fluid.dygraph.guard():
    x = to_variable(x)
    # scale=2 is an assumed example value; the original snippet left `scale` undefined
    y = fluid.layers.interpolate(x, scale=2, align_corners=True)
    print(y.numpy())
- transpose conv
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.dygraph import to_variable
# ParamAttr specifies the convolution's parameter attributes
from paddle.fluid import ParamAttr

x = np.array([[1, 2], [3, 4]])
x = x.astype(np.float32)
# add batch and channel axes: n, c, h, w
x = x[np.newaxis, np.newaxis, :, :]
with fluid.dygraph.guard(fluid.CPUPlace()):
    x = to_variable(x)
    # initialize every kernel weight to 1.0
    param_attr = ParamAttr(name="param", initializer=fluid.initializer.Constant(1.0))
    conv2dTranspose = fluid.dygraph.Conv2DTranspose(num_channels=1, num_filters=1, filter_size=3, param_attr=param_attr)
    y = conv2dTranspose(x)
    print(y.numpy())
    print(conv2dTranspose.weight.numpy())
- up-pooling
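To see what the three methods compute, they can be written out by hand for a single-channel 2-D array. This is a NumPy sketch with hypothetical function names, not Paddle's implementation. It assumes the align_corners=True convention for bilinear interpolation, a stride-1 unpadded transposed convolution, and max unpooling (values put back at the positions recorded during max pooling) as the meaning of up-pooling.

```python
import numpy as np


def bilinear_upsample(x, out_h, out_w):
    """Bilinear interpolation where corner pixels map to corner pixels."""
    in_h, in_w = x.shape
    rows = np.linspace(0, in_h - 1, out_h)   # source coordinate of each output row
    cols = np.linspace(0, in_w - 1, out_w)   # source coordinate of each output column
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, in_h - 1)
    c1 = np.minimum(c0 + 1, in_w - 1)
    wr = (rows - r0)[:, None]                # vertical interpolation weights
    wc = (cols - c0)[None, :]                # horizontal interpolation weights
    top = x[r0][:, c0] * (1 - wc) + x[r0][:, c1] * wc
    bottom = x[r1][:, c0] * (1 - wc) + x[r1][:, c1] * wc
    return top * (1 - wr) + bottom * wr


def conv_transpose(x, kernel, stride=1):
    """Transposed convolution: every input pixel stamps a scaled kernel onto the output."""
    in_h, in_w = x.shape
    k_h, k_w = kernel.shape
    out = np.zeros(((in_h - 1) * stride + k_h, (in_w - 1) * stride + k_w), dtype=x.dtype)
    for i in range(in_h):
        for j in range(in_w):
            out[i * stride:i * stride + k_h, j * stride:j * stride + k_w] += x[i, j] * kernel
    return out


def max_pool_with_indices(x, size=2):
    """Max pooling that also records where each maximum sat inside its window."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(h // size, w // size, size * size)
    return blocks.max(axis=-1), blocks.argmax(axis=-1)


def max_unpool(pooled, indices, size=2):
    """Put every pooled value back at its recorded position; all other pixels stay zero."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size), dtype=pooled.dtype)
    for i in range(h):
        for j in range(w):
            di, dj = divmod(int(indices[i, j]), size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out


x = np.array([[1, 2], [3, 4]], dtype=np.float32)
up_bilinear = bilinear_upsample(x, 4, 4)
up_transpose = conv_transpose(x, np.ones((3, 3), dtype=np.float32))
pooled, indices = max_pool_with_indices(np.arange(16, dtype=np.float32).reshape(4, 4))
up_unpooled = max_unpool(pooled, indices)
```

For the 2x2 input and the all-ones 3x3 kernel, the transposed convolution yields the 4x4 array [[1, 3, 3, 2], [4, 10, 10, 6], [4, 10, 10, 6], [3, 7, 7, 4]]: overlapping stamps are summed, which is why the centre values are largest.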
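2.2 FCN network architecture
The network as a whole uses an encoder-decoder structure.
Advantages:
1. Accepts inputs of arbitrary size
2. Combines information from shallow layers
Disadvantage: no context information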
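The first advantage can be checked with simple size arithmetic. The sketch below assumes the encoder layout that the implementation in section 2.3 appears to use: a first 3x3 convolution with padding 100, further 3x3 convolutions with padding 1 that keep the size, five 2x2 stride-2 poolings with ceil_mode=True, and a 7x7 fc6 convolution without padding. The function name is hypothetical.

```python
import math


def fcn_encoder_size(size, first_pad=100, num_pools=5, fc6_kernel=7):
    """Spatial size after the encoder and fc6, for a square input of side `size`."""
    # first 3x3 conv: out = in + 2 * pad - 2 (the other 3x3 convs use pad 1 and keep the size)
    size = size + 2 * first_pad - 2
    for _ in range(num_pools):
        size = math.ceil(size / 2)     # 2x2 pooling, stride 2, ceil_mode=True
    return size - fc6_kernel + 1       # 7x7 conv without padding


sizes = {s: fcn_encoder_size(s) for s in (1, 32, 224, 500)}
too_small = fcn_encoder_size(32, first_pad=1)
```

Under these assumptions even a 1-pixel input still reaches fc6 with a valid 1x1 map, while with ordinary padding of 1 a 32-pixel input would shrink to a single pixel before fc6 and the 7x7 kernel would no longer fit (the computed size is negative).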
2.3 Implementing FCN in Paddle
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.dygraph import Conv2D, Conv2DTranspose, Dropout, Pool2D, to_variable
from vgg import VGG16BN  # VGG16 with batch norm, used as the backbone


class FCN8s(fluid.dygraph.Layer):
    # TODO: create fcn8s model
    def __init__(self, num_classes=59):
        super(FCN8s, self).__init__()
        backbone = VGG16BN(pretrained=False)

        # encoder: the five VGG16 stages, each followed by 2x2 pooling
        self.layer1 = backbone.layer1
        # pad the first conv by 100 so that small inputs still fit the 7x7 fc6 kernel
        self.layer1[0].conv._padding = [100, 100]
        self.pool1 = Pool2D(pool_size=2, pool_stride=2, ceil_mode=True)
        self.layer2 = backbone.layer2
        self.pool2 = Pool2D(pool_size=2, pool_stride=2, ceil_mode=True)
        self.layer3 = backbone.layer3
        self.pool3 = Pool2D(pool_size=2, pool_stride=2, ceil_mode=True)
        self.layer4 = backbone.layer4
        self.pool4 = Pool2D(pool_size=2, pool_stride=2, ceil_mode=True)
        self.layer5 = backbone.layer5
        self.pool5 = Pool2D(pool_size=2, pool_stride=2, ceil_mode=True)

        # fc6 and fc7: the VGG fully connected layers recast as convolutions
        self.fc6 = Conv2D(num_channels=512, num_filters=4096, filter_size=7, act='relu')
        self.fc7 = Conv2D(num_channels=4096, num_filters=4096, filter_size=1, act='relu')
        self.drop = Dropout()

        # 1x1 convs that turn feature maps into per-class score maps
        self.score = Conv2D(num_channels=4096, num_filters=num_classes, filter_size=1)
        self.score_pool3 = Conv2D(num_channels=256, num_filters