Classification networks such as AlexNet and VGGNet run an image through convolutions and predict a single class for the whole picture, e.g. deciding whether a photo shows a cat (> ω<) or a dog (> ω<). Semantic segmentation is a closely related task: it assigns a class label to each region of the image according to its semantic content, separating the different object categories. In the figure below, people, trees, roads, and so on each receive their own label. Semantic segmentation must predict the class of every pixel.
![img](https://i-blog.csdnimg.cn/blog_migrate/eb0a096527e1c398eca5de71e855bc96.png)
A classification network predicts one class for the entire image. FCN adapts such a network by replacing its fully connected layers with convolutional layers, yielding an end-to-end trainable semantic segmentation network.
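The "replace fully connected with convolution" step rests on a simple equivalence: a dense layer over a pool5-style feature map computes the same dot products as a convolution whose kernel covers the whole map. A toy numpy check (channel and filter counts shrunk for illustration; not part of the article's code):

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((7, 7, 8))        # toy pool5-style feature map
w_fc = rng.standard_normal((7 * 7 * 8, 16))  # dense weights, 16 output units

# Dense layer: flatten the map, then one matrix multiply -> (16,)
fc_out = feat.reshape(-1) @ w_fc

# Equivalent 7x7 "valid" convolution: the kernel covers the whole map, so
# the single output position is the same dot product, one per filter.
w_conv = w_fc.reshape(7, 7, 8, 16)
conv_out = np.tensordot(feat, w_conv, axes=([0, 1, 2], [0, 1, 2]))

print(np.allclose(fc_out, conv_out))  # True
```

Because the convolutional form no longer fixes the spatial size, the same weights can then be applied to larger inputs, producing a coarse score map instead of a single vector.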
1. FCN32
Layers conv1 through pool5 follow the VGG16 architecture, shrinking the image by a factor of 32. FCN replaces the fully connected layers with convolutions, producing feature maps of 4096 × (h/32) × (w/32) and then nClasses × (h/32) × (w/32), which classifies each (coarse) pixel. The convolutional output is then upsampled 32× in a single step back to the original size, nClasses × h × w.
```python
from keras.applications import vgg16
from keras.layers import (Activation, Conv2D, Conv2DTranspose, Dropout,
                          Input, Reshape)
from keras.models import Model

def FCN32(nClasses, input_height, input_width):
    img_input = Input(shape=(input_height, input_width, 3))
    # VGG16 without its fully connected layers; output is (h/32, w/32, 512)
    # (5 blocks with 64, 128, 256, 512, 512 filters, all 3x3 kernels)
    model = vgg16.VGG16(include_top=False, weights='imagenet',
                        input_tensor=img_input)
    # 4096 filters caused an OOM error on this machine, so reduced to 1024
    o = Conv2D(filters=1024, kernel_size=(7, 7), padding='same',
               activation='relu', name='fc6')(model.output)
    o = Dropout(0.5)(o)
    o = Conv2D(filters=1024, kernel_size=(1, 1), padding='same',
               activation='relu', name='fc7')(o)
    o = Dropout(0.5)(o)
    o = Conv2D(filters=nClasses, kernel_size=(1, 1), padding='same',
               activation='relu', name='score_fr')(o)
    # one-step 32x upsampling back to the input resolution
    o = Conv2DTranspose(filters=nClasses, kernel_size=(32, 32),
                        strides=(32, 32), padding='valid',
                        activation=None, name='score2')(o)
    o = Reshape((-1, nClasses))(o)
    o = Activation("softmax")(o)
    fcn32 = Model(img_input, o)
    return fcn32
```
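The 32× geometry above can be sanity-checked with plain shape arithmetic (a standalone sketch, independent of the model code): five stride-2 poolings halve each side, and a transpose convolution with kernel = stride = 32 and `padding='valid'` restores the input size.

```python
# Five stride-2 poolings halve each spatial side.
def pool_out(size, times=5):
    for _ in range(times):
        size //= 2
    return size

# Transpose conv output size for padding='valid': (in - 1) * stride + kernel
def transpose_conv_out(size, kernel, stride):
    return (size - 1) * stride + kernel

h = 320
h5 = pool_out(h)                                      # 320 -> 10 after pool5
restored = transpose_conv_out(h5, kernel=32, stride=32)
print(h5, restored)  # 10 320
```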
Segmentation result:
2. FCN16
Upsampling the convolutional output 32× in a single step yields a result made up of large blocky patches, so fine image detail is segmented poorly. FCN16 instead deconvolves the convolutional output (2×), fuses it with the pool4 feature map, and then upsamples 16× to recover the original size.
Loading the weights of the already-trained FCN32 model reduces the training needed.
```python
from keras.layers import (Activation, Conv2D, Conv2DTranspose, Reshape,
                          UpSampling2D, add)
from keras.models import Model

def FCN16(nClasses, input_height, input_width):
    # reuse the trained FCN32 model defined above
    # (note: the FCN32 arguments are hardcoded here, as in the original)
    model = FCN32(11, 320, 320)
    model.load_weights("model.h5")
    # 2x upsample fc7 to 512 channels so it matches block4_pool, then fuse
    skip1 = Conv2DTranspose(512, kernel_size=(3, 3), strides=(2, 2),
                            padding='same', kernel_initializer="he_normal",
                            name="upsampling6")(model.get_layer("fc7").output)
    summed = add(inputs=[skip1, model.get_layer("block4_pool").output])
    # 16x bilinear upsampling back to the input resolution
    up7 = UpSampling2D(size=(16, 16), interpolation='bilinear',
                       name='upsampling_7')(summed)
    o = Conv2D(nClasses, kernel_size=(3, 3), activation='relu',
               padding='same', name='conv_7')(up7)
    o = Reshape((-1, nClasses))(o)
    o = Activation("softmax")(o)
    fcn16 = Model(model.input, o)
    return fcn16
```
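The skip fusion can be sketched shape-wise in plain numpy, with nearest-neighbour repetition standing in for the learned transpose convolution and the bilinear upsampling (toy channel count; illustrative only, not the model code):

```python
import numpy as np

coarse = np.random.rand(10, 10, 4)   # fc7-level map at 1/32 resolution
pool4 = np.random.rand(20, 20, 4)    # block4_pool map at 1/16 resolution

# 2x upsample the coarse map so the shapes match, then fuse by addition
up2 = coarse.repeat(2, axis=0).repeat(2, axis=1)     # (20, 20, 4)
fused = up2 + pool4

# 16x upsample the fused map back to the input resolution
full = fused.repeat(16, axis=0).repeat(16, axis=1)   # (320, 320, 4)
print(full.shape[:2])  # (320, 320)
```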
Segmentation result:
3. FCN8
FCN8 follows the same approach, additionally fusing in the pool3 feature map.
```python
from keras.layers import (Activation, Conv2D, Conv2DTranspose, Reshape,
                          UpSampling2D, add)
from keras.models import Model

def FCN8(nClasses, input_height, input_width):
    # reuse the trained FCN32 model defined above
    # (note: the FCN32 arguments are hardcoded here, as in the original)
    model = FCN32(11, 320, 320)
    model.load_weights("model.h5")
    # 2x upsample fc7 and fuse with block4_pool (1/16 resolution)
    skip1 = Conv2DTranspose(512, kernel_size=(3, 3), strides=(2, 2),
                            padding='same', kernel_initializer="he_normal",
                            name="up7")(model.get_layer("fc7").output)
    summed = add(inputs=[skip1, model.get_layer("block4_pool").output])
    # 2x upsample again and fuse with block3_pool (1/8 resolution)
    skip2 = Conv2DTranspose(256, kernel_size=(3, 3), strides=(2, 2),
                            padding='same', kernel_initializer="he_normal",
                            name='up4')(summed)
    summed = add(inputs=[skip2, model.get_layer("block3_pool").output])
    # 8x bilinear upsampling back to the input resolution
    up7 = UpSampling2D(size=(8, 8), interpolation='bilinear',
                       name='upsampling_7')(summed)
    o = Conv2D(nClasses, kernel_size=(3, 3), activation='relu',
               padding='same', name='conv_7')(up7)
    o = Reshape((-1, nClasses))(o)
    o = Activation("softmax")(o)
    fcn8 = Model(model.input, o)
    return fcn8
```
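Since all three models end in Reshape((-1, nClasses)) followed by softmax, each training target must be a flattened one-hot mask of shape (h·w, nClasses). The article does not show its data pipeline, so this conversion helper is an assumed sketch in numpy (toy 2×2 mask, nClasses=11 as in the code above):

```python
import numpy as np

def mask_to_onehot(mask, n_classes):
    """mask: (h, w) integer class labels -> (h*w, n_classes) one-hot."""
    flat = mask.reshape(-1)                     # row-major flatten, h*w entries
    return np.eye(n_classes, dtype=np.float32)[flat]

mask = np.array([[0, 1], [2, 1]])
y = mask_to_onehot(mask, n_classes=11)
print(y.shape)        # (4, 11)
print(y[1].argmax())  # 1
```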
Segmentation result:
[Note]:
This machine has limited compute, so FCN32 was trained for 200 iterations, reaching roughly 82% accuracy; FCN16 and FCN8 were trained on top of the FCN32 model for 20 iterations, also around 82%. The figure below shows the training-accuracy curves for FCN16 and FCN8.