DeepLabv2

换个名字就很好

Article link: https://blog.csdn.net/AliceH1226/article/details/122090060

Prerequisites

DeepLabv1
SPP (spatial pyramid pooling)

Three contributions

1. Uses atrous (dilated) convolution
The repeated combination of max-pooling and striding at consecutive layers of these networks reduces significantly the spatial resolution of the resulting feature maps, typically by a factor of 32 across each direction in recent DCNNs.
A partial remedy is to use “deconvolutional” layers, which however requires additional memory and time.
Atrous convolution allows us to compute the responses of any layer at any desirable resolution without increasing the number of parameters or the amount of computation.
Besides, it also allows us to arbitrarily enlarge the field-of-view of filters at any DCNN layer, which offers an efficient mechanism to control the field-of-view and finds the best trade-off between accurate localization (small field-of-view) and context assimilation (large field-of-view).
(1) Repeatedly combining pooling with strided convolution greatly reduces the spatial resolution.
(2) Transposed convolution ("deconvolution") can recover the lost resolution, but it costs extra memory and time.
(3) Atrous convolution can produce responses at any desired resolution without extra parameters or computation.
(4) It can also enlarge the field-of-view arbitrarily, so the field-of-view can be tuned to trade off accurate localization (small field-of-view) against context (large field-of-view).
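
To make points (3) and (4) concrete, here is a minimal 1-D sketch in plain Python. It assumes the usual definition y[i] = sum_k x[i + r*k] * w[k]; the function names and the toy signal are made up for illustration and are not taken from the paper's code.

```python
def atrous_conv1d(x, w, rate=1):
    """1-D atrous convolution over valid positions: y[i] = sum_k x[i + rate*k] * w[k]."""
    span = rate * (len(w) - 1)  # distance between the first and last filter tap
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


def effective_kernel_size(kernel_size, rate):
    """Field-of-view of a filter with (rate - 1) holes between neighbouring taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = [1, 0, -1]  # the same 3 parameters are reused for every rate

dense = atrous_conv1d(signal, weights, rate=1)    # taps at i, i+1, i+2
dilated = atrous_conv1d(signal, weights, rate=3)  # taps at i, i+3, i+6

print(dense, effective_kernel_size(3, 1))
print(dilated, effective_kernel_size(3, 3))
```

The filter keeps its 3 weights and does 3 multiplications per output in both cases; only the spacing of the taps changes, which is why the field-of-view grows from 3 to 7 at no extra cost.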

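2. Designs and uses ASPP (atrous spatial pyramid pooling)
The convolutions in ASPP are atrous convolutions. The branches use different rates but the same kernel size.
SPP uses pooling instead, with different kernel sizes.
As for the output, the ASPP paper says "The features extracted for each sampling rate are further processed in separate branches and fused to generate the final result.", so the output is obtained by further processing each convolution branch's result separately and then fusing the branches.
The output of SPP is a k*M-dimensional vector, where k is the number of filters in the last conv layer (number of output channels) and M is the number of bins.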
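
A small sketch of the k*M claim for SPP, in plain Python. This is a simplified version that assumes max pooling over evenly split bins and pyramid levels 1x1 and 2x2 (so M = 1 + 4 = 5); the function name `spp` and the toy inputs are hypothetical, and the original SPP layer computes its pooling windows slightly differently.

```python
def spp(feature_maps, levels=(1, 2)):
    """Max-pool every channel into an n x n grid per level and concatenate the bins."""
    out = []
    for fmap in feature_maps:  # one H x W map per channel
        h, w = len(fmap), len(fmap[0])
        for n in levels:
            for by in range(n):
                for bx in range(n):
                    rows = range(by * h // n, (by + 1) * h // n)
                    cols = range(bx * w // n, (bx + 1) * w // n)
                    out.append(max(fmap[y][x] for y in rows for x in cols))
    return out


# k = 1 channel of size 2 x 2
small = [[[1, 2], [3, 4]]]
# k = 3 channels of size 6 x 5
large = [[[c * 100 + y * 5 + x for x in range(5)] for y in range(6)] for c in range(3)]

vec_small = spp(small)
vec_large = spp(large)
print(len(vec_small), len(vec_large))
```

The output length depends only on the channel count k and the bin count M, not on the spatial size of the input, which is the point of pooling into a fixed set of bins.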
ASPP
ASPP for VGG-16 employs several parallel fc6-fc7-fc8 branches.
They all use 3×3 kernels but different atrous rates r in the ‘fc6’ in order to capture objects of different size.
[Figure: deeplab_aspp]
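
The parallel-branch idea can be sketched in 1-D with plain Python. Assumptions to keep in mind: the quote above only says the branches are "fused", so summation is assumed here as the fusion step; the per-branch fc7/fc8 processing is left out; and the rates (1, 2, 4), the weights and the function names are toy values chosen for illustration.

```python
def atrous_conv1d_same(x, w, rate):
    """Zero-padded 1-D atrous convolution (odd kernel length); output has len(x) items."""
    half = len(w) // 2
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(len(w)):
            j = i + rate * (k - half)  # tap position, centred on i
            if 0 <= j < len(x):        # taps outside the signal read zero padding
                acc += x[j] * w[k]
        y.append(acc)
    return y


def aspp1d(x, branch_weights, rates):
    """One atrous branch per rate, same kernel size; branches fused by summation (assumed)."""
    branches = [atrous_conv1d_same(x, w, r) for w, r in zip(branch_weights, rates)]
    return [sum(vals) for vals in zip(*branches)]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
rates = (1, 2, 4)
branch_weights = [[1, 1, 1] for _ in rates]  # each branch owns its own 3-tap kernel

fused = aspp1d(impulse, branch_weights, rates)
print(fused)
```

Feeding an impulse shows where each branch looks: the small rate responds next to the centre, the large rate far away from it, and the fused output covers all of these scales at once.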

3. Combines DCNNs with a fully connected CRF for segmentation
This is covered in the DeepLabv1 notes.

Additional notes

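1. The learning rate follows the "poly" policy: the base learning rate is multiplied by (1 - iter/max_iter)^power with power = 0.9, which performs 1.17% better than the "step" policy. The step policy means "reduce the learning rate at a fixed step size", i.e. the learning rate drops at fixed intervals.
2. v1 is built on VGG-16, while v2 also adopts ResNet-101 as its backbone.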
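
The two learning rate policies from note 1 can be compared with a short plain-Python sketch. The base learning rate, `max_iter`, `step_size` and `gamma` below are illustrative values picked for this example, not settings confirmed by these notes; only power = 0.9 comes from the text.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """'poly' policy: base_lr * (1 - iter/max_iter)^power, decaying smoothly to 0."""
    return base_lr * (1 - iteration / max_iter) ** power


def step_lr(base_lr, iteration, step_size, gamma=0.1):
    """'step' policy: multiply the rate by gamma once every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)


base_lr, max_iter = 0.001, 20000  # hypothetical training setup
for it in (0, 5000, 10000, 15000, 20000):
    print(it, poly_lr(base_lr, it, max_iter), step_lr(base_lr, it, step_size=8000))
```

The poly curve changes a little at every iteration, while the step curve stays flat and then drops abruptly at each multiple of `step_size`.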