WDSR Paper Reading Notes

  1. Wider channels before the ReLU help improve accuracy: with the same number of parameters, reduce the channel count of the layers that carry the residual-identity propagation and increase the channel count before the ReLU (see the WDSR-A sketch under item 5).
  2. Use weight normalization rather than batch normalization: it gives better accuracy and faster convergence, and batch normalization has effectively been abandoned in SR (see the sketch below).
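
A minimal sketch of applying weight normalization to a conv layer with PyTorch's standard `torch.nn.utils.weight_norm` API (the channel counts and patch size are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

# Weight normalization reparameterizes the conv weight as w = g * v / ||v||,
# decoupling magnitude (g) from direction (v). Unlike BN it uses no batch
# statistics, so the train-time and test-time formulations are identical.
conv = nn.utils.weight_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))

x = torch.randn(1, 64, 48, 48)   # a small 48x48 training patch (illustrative)
y = conv(x)
print(y.shape)                   # torch.Size([1, 64, 48, 48])
```
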
  3. Linear low-rank convolution (1×1 kernels) helps provide wider channels with the same number of parameters while giving better accuracy (see the WDSR-B sketch under item 6).
  4. The paper mentions the following interesting convolution variants, worth a closer look when there is time (a small sketch of the group and depthwise separable variants follows after this sub-list):
    1. Flattened convolution: flattened convolutions [13] consist of a consecutive sequence of one-dimensional filters across all directions in 3D space (lateral, vertical and horizontal) to approximate conventional convolutions.
    2. Group convolution: group convolutions [38] divide features into groups channel-wise and perform convolutions inside each group individually, followed by a concatenation to form the final output.
    3. Depthwise separable convolution: a stack of a depthwise convolution (i.e. a spatial convolution performed independently over each channel of the input) followed by a pointwise convolution (i.e. a 1×1 convolution), without non-linearities in between.
    4. Inverted residuals: residual blocks (as in MobileNetV2) whose identity shortcuts connect the narrow layers while the block expands to a wider representation internally.
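
A small sketch of how the group and depthwise separable variants above can be expressed with standard PyTorch layers (channel counts are illustrative assumptions):

```python
import torch.nn as nn

in_ch, out_ch = 64, 64

# Group convolution: channels are split into `groups` groups and convolved
# independently; groups=1 recovers an ordinary convolution.
group_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=4)

# Depthwise separable convolution: a depthwise (per-channel) spatial conv
# followed by a pointwise 1x1 conv, with no non-linearity in between.
depthwise_separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise
    nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise
)
```
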
  5. WDSR-A: the channel expansion ratio should not exceed 4, otherwise the residual-identity layers become too slim, possibly even narrower than the final 3-channel output; a ratio of 2-4 is generally appropriate (see the sketch below).
    Assume the width of the identity mapping pathway (Fig. 2) is w1 and the width before activation inside the residual block is w2.
    We introduce the expansion factor before activation as r, thus w2 = r × w1. In vanilla residual networks (e.g., as used in EDSR and MDSR) we have w2 = w1 and the number of parameters is 2 × w1^2 × k^2 per residual block. The computational complexity (mult-add operations) is a constant scaling of the parameter count once the input patch size is fixed. To keep the same complexity, w1^2 = w1' × w2' = r × w1'^2, so the residual identity mapping pathway needs to be slimmed by a factor of √r while the activation can be expanded √r (= r^0.5) times.
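
A minimal sketch of a WDSR-A-style residual body with wide activation, plus a quick check that slimming the identity width by √r while widening the activation by r keeps the parameter count constant (the widths below are illustrative assumptions, not the paper's exact settings):

```python
import math
import torch.nn as nn

def wdsr_a_body(w1, r, k=3):
    """Slim identity width w1, wide activation width w2 = r * w1 before the ReLU."""
    w2 = int(r * w1)
    return nn.Sequential(
        nn.Conv2d(w1, w2, k, padding=k // 2),   # expand before the ReLU (wide activation)
        nn.ReLU(inplace=True),
        nn.Conv2d(w2, w1, k, padding=k // 2),   # project back to the identity width
    )

def block_params(w1, r, k=3):
    w2 = r * w1
    return 2 * w1 * w2 * k * k        # w1*w2*k^2 + w2*w1*k^2, biases ignored

w1, r = 64, 4
w1_slim = int(w1 / math.sqrt(r))      # identity pathway slimmed by sqrt(r)
print(block_params(w1, 1), block_params(w1_slim, r))   # 73728 73728: same budget
```
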
     
  6. WDSR-B: adding a ReLU between the 1×1 conv and the 3×3 conv significantly reduces accuracy; the paper takes this as another phenomenon supporting wide activation (see the sketch below).
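
A minimal sketch of a WDSR-B-style body: a wide 1×1 expansion with the only ReLU, followed by the linear low-rank pair (1×1 reduction then 3×3) with no ReLU between them. The expansion factor r and the low-rank ratio here are illustrative assumptions:

```python
import torch.nn as nn

def wdsr_b_body(w1, r=6, low_rank=0.8):
    w_wide = int(r * w1)              # very wide activation via a cheap 1x1 conv
    w_low = int(low_rank * w1)        # reduced width for the low-rank pair
    return nn.Sequential(
        nn.Conv2d(w1, w_wide, kernel_size=1),
        nn.ReLU(inplace=True),        # the only non-linearity in the block
        nn.Conv2d(w_wide, w_low, kernel_size=1),          # linear low-rank reduction
        nn.Conv2d(w_low, w1, kernel_size=3, padding=1),   # NO ReLU before this conv
    )
```
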
  7. In SR there is almost no overfitting, so regularization is unnecessary, and the regularization effect provided by BN is redundant;
  8. moreover, SR needs the same formulation at training and test time, otherwise accuracy drops, while BN uses different formulations during training and testing (see the small demo after item 9);
  9. the batch sizes and patch sizes used in SR are both too small, so BN is not a good fit.
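
A small demo of the formulation mismatch in BN: in training mode it normalizes with the current batch statistics, in eval mode with the running averages, so the same input is transformed differently (purely illustrative):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(2, 8, 24, 24)   # a tiny batch of small patches, as is typical in SR

bn.train()
y_train = bn(x)                 # normalized with this batch's mean/variance
bn.eval()
y_eval = bn(x)                  # normalized with the running statistics

print(torch.allclose(y_train, y_eval))  # False: train and test formulations differ
```
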

  10. The conv before the global residual pathway, the conv after the global residual pathway, and the conv after the pixel shuffle were all removed; these removals were found not to affect accuracy while speeding up computation (see the sketch below).
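
A sketch of the simplified structure this implies: the body output goes straight into one conv + pixel shuffle, and the global residual pathway upsamples the LR input with its own conv + pixel shuffle, with no extra convs before or after the addition (widths, kernel sizes and the scale factor are illustrative assumptions; the residual body is omitted):

```python
import torch
import torch.nn as nn

scale, n_feats = 2, 32

head = nn.Conv2d(3, n_feats, 3, padding=1)                 # features fed to the body
tail = nn.Sequential(nn.Conv2d(n_feats, 3 * scale ** 2, 3, padding=1),
                     nn.PixelShuffle(scale))               # straight to HR, no extra conv

# global residual pathway: LR input straight to HR space, nothing before/after it
skip = nn.Sequential(nn.Conv2d(3, 3 * scale ** 2, 5, padding=2),
                     nn.PixelShuffle(scale))

x = torch.randn(1, 3, 48, 48)
sr = tail(head(x)) + skip(x)    # (1, 3, 96, 96)
print(sr.shape)
```
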
  11. In summary, the paper really proposes three key points, and they mainly improve efficiency; accuracy is not greatly improved:
    1. wider activation
    2. linear low-rank convolution
    3. weight normalization (WN)