WDSR Paper Reading Notes

  1. Wider channels before the ReLU help improve accuracy: with the same number of parameters, reduce the channel count of the layers that carry the residual-identity propagation and increase the channel count before the ReLU (see the WDSR-A sketch under item 5).
  2. Use weight normalization rather than batch normalization: it gives better accuracy and faster convergence, and batch normalization has effectively been abandoned in SR (see the sketch below).
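
A minimal sketch of applying weight normalization to a conv layer with PyTorch's standard `torch.nn.utils.weight_norm` API (the channel counts and patch size are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

# Weight normalization reparameterizes the conv weight as w = g * v / ||v||,
# decoupling magnitude (g) from direction (v). Unlike BN it uses no batch
# statistics, so the train-time and test-time formulations are identical.
conv = nn.utils.weight_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))

x = torch.randn(1, 64, 48, 48)   # a small 48x48 training patch (illustrative)
y = conv(x)
print(y.shape)                   # torch.Size([1, 64, 48, 48])
```
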
  3. Linear low-rank convolution (1×1 kernels) helps provide wider channels with the same number of parameters while giving better accuracy (see the WDSR-B sketch under item 6).
  4. The paper mentions the following interesting convolution variants, worth a closer look when there is time (a small sketch of the group and depthwise separable variants follows after this sub-list):
    1. Flattened convolution: flattened convolutions [13] consist of a consecutive sequence of one-dimensional filters across all directions in 3D space (lateral, vertical and horizontal) to approximate conventional convolutions.
    2. Group convolution: group convolutions [38] divide features into groups channel-wise and perform convolutions inside each group individually, followed by a concatenation to form the final output.
    3. Depthwise separable convolution: a stack of a depthwise convolution (i.e. a spatial convolution performed independently over each channel of the input) followed by a pointwise convolution (i.e. a 1×1 convolution), without non-linearities in between.
    4. Inverted residuals: residual blocks (as in MobileNetV2) whose identity shortcuts connect the narrow layers while the block expands to a wider representation internally.
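
A small sketch of how the group and depthwise separable variants above can be expressed with standard PyTorch layers (channel counts are illustrative assumptions):

```python
import torch.nn as nn

in_ch, out_ch = 64, 64

# Group convolution: channels are split into `groups` groups and convolved
# independently; groups=1 recovers an ordinary convolution.
group_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=4)

# Depthwise separable convolution: a depthwise (per-channel) spatial conv
# followed by a pointwise 1x1 conv, with no non-linearity in between.
depthwise_separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise
    nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise
)
```
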
  5. WDSR-A: the channel expansion ratio should not exceed 4, otherwise the residual-identity layers become too slim, possibly even narrower than the final 3-channel output; a ratio of 2-4 is generally appropriate (see the sketch below).
    Assume the width of the identity mapping pathway (Fig. 2) is w1 and the width before activation inside the residual block is w2.
    We introduce the expansion factor before activation as r, thus w2 = r × w1. In vanilla residual networks (e.g., as used in EDSR and MDSR) we have w2 = w1 and the number of parameters is 2 × w1^2 × k^2 per residual block. The computational complexity (mult-add operations) is a constant scaling of the parameter count once the input patch size is fixed. To keep the same complexity, w1^2 = w1' × w2' = r × w1'^2, so the residual identity mapping pathway needs to be slimmed by a factor of √r while the activation can be expanded √r (= r^0.5) times.
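
A minimal sketch of a WDSR-A-style residual body with wide activation, plus a quick check that slimming the identity width by √r while widening the activation by r keeps the parameter count constant (the widths below are illustrative assumptions, not the paper's exact settings):

```python
import math
import torch.nn as nn

def wdsr_a_body(w1, r, k=3):
    """Slim identity width w1, wide activation width w2 = r * w1 before the ReLU."""
    w2 = int(r * w1)
    return nn.Sequential(
        nn.Conv2d(w1, w2, k, padding=k // 2),   # expand before the ReLU (wide activation)
        nn.ReLU(inplace=True),
        nn.Conv2d(w2, w1, k, padding=k // 2),   # project back to the identity width
    )

def block_params(w1, r, k=3):
    w2 = r * w1
    return 2 * w1 * w2 * k * k        # w1*w2*k^2 + w2*w1*k^2, biases ignored

w1, r = 64, 4
w1_slim = int(w1 / math.sqrt(r))      # identity pathway slimmed by sqrt(r)
print(block_params(w1, 1), block_params(w1_slim, r))   # 73728 73728: same budget
```
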
     
  6. WDSR-B: adding a ReLU between the 1×1 conv and the 3×3 conv significantly reduces accuracy; the paper takes this as another phenomenon supporting wide activation (see the sketch below).
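
A minimal sketch of a WDSR-B-style body: a wide 1×1 expansion with the only ReLU, followed by the linear low-rank pair (1×1 reduction then 3×3) with no ReLU between them. The expansion factor r and the low-rank ratio here are illustrative assumptions:

```python
import torch.nn as nn

def wdsr_b_body(w1, r=6, low_rank=0.8):
    w_wide = int(r * w1)              # very wide activation via a cheap 1x1 conv
    w_low = int(low_rank * w1)        # reduced width for the low-rank pair
    return nn.Sequential(
        nn.Conv2d(w1, w_wide, kernel_size=1),
        nn.ReLU(inplace=True),        # the only non-linearity in the block
        nn.Conv2d(w_wide, w_low, kernel_size=1),          # linear low-rank reduction
        nn.Conv2d(w_low, w1, kernel_size=3, padding=1),   # NO ReLU before this conv
    )
```
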
  7. In SR there is almost no overfitting, so regularization is unnecessary, and the regularization effect provided by BN is redundant;
  8. moreover, SR needs the same formulation at training and test time, otherwise accuracy drops, while BN uses different formulations during training and testing (see the small demo after item 9);
  9. the batch sizes and patch sizes used in SR are both too small, so BN is not a good fit.
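
A small demo of the formulation mismatch in BN: in training mode it normalizes with the current batch statistics, in eval mode with the running averages, so the same input is transformed differently (purely illustrative):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(2, 8, 24, 24)   # a tiny batch of small patches, as is typical in SR

bn.train()
y_train = bn(x)                 # normalized with this batch's mean/variance
bn.eval()
y_eval = bn(x)                  # normalized with the running statistics

print(torch.allclose(y_train, y_eval))  # False: train and test formulations differ
```
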

  10. The conv before the global residual pathway, the conv after the global residual pathway, and the conv after the pixel shuffle were all removed; these removals were found not to affect accuracy while speeding up computation (see the sketch below).
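
A sketch of the simplified structure this implies: the body output goes straight into one conv + pixel shuffle, and the global residual pathway upsamples the LR input with its own conv + pixel shuffle, with no extra convs before or after the addition (widths, kernel sizes and the scale factor are illustrative assumptions; the residual body is omitted):

```python
import torch
import torch.nn as nn

scale, n_feats = 2, 32

head = nn.Conv2d(3, n_feats, 3, padding=1)                 # features fed to the body
tail = nn.Sequential(nn.Conv2d(n_feats, 3 * scale ** 2, 3, padding=1),
                     nn.PixelShuffle(scale))               # straight to HR, no extra conv

# global residual pathway: LR input straight to HR space, nothing before/after it
skip = nn.Sequential(nn.Conv2d(3, 3 * scale ** 2, 5, padding=2),
                     nn.PixelShuffle(scale))

x = torch.randn(1, 3, 48, 48)
sr = tail(head(x)) + skip(x)    # (1, 3, 96, 96)
print(sr.shape)
```
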
  11. In summary, the paper really proposes three key points, and they mainly improve efficiency; accuracy is not greatly improved:
    1. wider activation
    2. linear low-rank convolution
    3. weight normalization (WN)