SRCNN Reading Notes


ssf-yasuo



Category: Paper Reading Notes. Tags: deep learning

Article link: https://blog.csdn.net/weixin_44326452/article/details/96604212


Link to the original paper:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7115171&tag=1

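First comes pre-processing: the low-resolution image is upscaled by interpolation (bicubic in the paper) until it has as many pixels as the high-resolution image X. Call the interpolated image Y; the goal of SRCNN is then to find a function F such that F(Y) is as close to X as possible.
1) Next is “Patch extraction and representation. This operation extracts (overlapping) patches from the low-resolution image Y and represents each patch as a high-dimensional vector. These vectors comprise a set of feature maps, of which the number equals to the dimensionality of the vectors.” In practice this is an ordinary convolutional layer followed by a ReLU.
2) In the middle is “Non-linear mapping. This operation nonlinearly maps each high-dimensional vector onto another high-dimensional vector. Each mapped vector is conceptually the representation of a high-resolution patch. These vectors comprise another set of feature maps.” This is also an ordinary convolutional layer followed by a ReLU.
3) Last is “Reconstruction. This operation aggregates the above high-resolution patch-wise representations to generate the final high-resolution image. This image is expected to be similar to the ground truth X.” This is yet another ordinary convolutional layer, but without a ReLU. The paper's reasoning: traditional methods usually just average the overlapping patches, and averaging can be viewed as a special pre-defined filter, so it might as well be a convolutional layer that is learned. If averaging is the right thing to do, the layer can learn to become a mean filter.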
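The three steps above can be sketched as a forward pass in NumPy. This is only an illustration, not the authors' code: the filter sizes (9, 1, 5), the filter counts (64, 32), and the Gaussian initialization with standard deviation 1e-3 are assumed here as what I recall to be the paper's basic setting, and the helper names `conv2d_valid`, `init_params`, and `srcnn_forward` are made up for this note.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x, w, b):
    """Unpadded ("valid") convolution as used in deep learning (cross-correlation).

    x: (C_in, H, W), w: (C_out, C_in, f, f), b: (C_out,)
    Returns an array of shape (C_out, H - f + 1, W - f + 1).
    """
    f = w.shape[-1]
    # windows has shape (C_in, H - f + 1, W - f + 1, f, f)
    windows = sliding_window_view(x, (f, f), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, w) + b[:, None, None]


def init_params(rng, channels=1, f1=9, f2=1, f3=5, n1=64, n2=32):
    """Assumed setting: 9-1-5 filters, 64 and 32 feature maps, N(0, 1e-3) weights."""
    shapes = [(n1, channels, f1, f1), (n2, n1, f2, f2), (channels, n2, f3, f3)]
    return [(rng.normal(0.0, 1e-3, s), np.zeros(s[0])) for s in shapes]


def srcnn_forward(y, params):
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = np.maximum(0.0, conv2d_valid(y, w1, b1))   # 1) patch extraction + ReLU
    h2 = np.maximum(0.0, conv2d_valid(h1, w2, b2))  # 2) non-linear mapping + ReLU
    return conv2d_valid(h2, w3, b3)                 # 3) reconstruction, no ReLU


rng = np.random.default_rng(0)
y = rng.random((1, 33, 33))          # stand-in for an interpolated patch Y
out = srcnn_forward(y, init_params(rng))
print(out.shape)
```

Setting every weight of a single-channel 3x3 filter to 1/9 turns `conv2d_valid` into a mean filter, which is the special case the reconstruction layer is free to learn.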
(Figure from the original paper, not reproduced here.)
The loss is MSE; the paper notes that this loss favors a high PSNR.
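The link between the two is that PSNR is a decreasing function of MSE, so minimizing MSE directly pushes PSNR up. A minimal sketch, assuming 8-bit images with a peak value of 255 (the function names are mine):

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two images of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def psnr(pred, target, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE). Assumes peak=255 for 8-bit images."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))


target = np.zeros((4, 4))
print(psnr(np.full((4, 4), 25.5), target))  # larger error, lower PSNR
print(psnr(np.full((4, 4), 2.55), target))  # smaller error, higher PSNR
```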
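The optimizer used in the paper is SGD, with a learning rate of 1e-4 for the first two layers and 1e-5 for the last layer (the network has only three layers). The paper says: “We empirically find that a smaller learning rate in the last layer is important for the network to converge.”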
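Per-layer learning rates amount to scaling each layer's gradient by its own step size. The sketch below uses plain SGD without momentum, which is a simplification on my part rather than the paper's exact update rule; `LAYER_LR` and `sgd_step` are hypothetical names.

```python
import numpy as np

# One learning rate per layer: first two layers 1e-4, last layer 1e-5.
LAYER_LR = (1e-4, 1e-4, 1e-5)


def sgd_step(params, grads, layer_lr=LAYER_LR):
    """Return updated parameters, one array per layer (plain SGD, no momentum)."""
    return [p - lr * g for p, g, lr in zip(params, grads, layer_lr)]


params = [np.ones(2), np.ones(2), np.ones(2)]
grads = [np.ones(2), np.ones(2), np.ones(2)]
updated = sgd_step(params, grads)
# With identical gradients, the last layer moves ten times less than the others.
print([1.0 - u[0] for u in updated])
```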
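To avoid border effects, the convolutions use no padding during training, so the feature maps shrink at every layer and the output is smaller than the input.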
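How much the output shrinks follows from the filter sizes: each unpadded f x f convolution removes f - 1 pixels per dimension. A small sketch, assuming the 9-1-5 filter sizes and 33x33 training patches that I recall from the paper (neither number appears in these notes):

```python
def valid_output_size(input_size, filter_sizes):
    """Spatial size after a stack of unpadded, stride-1 convolutions."""
    size = input_size
    for f in filter_sizes:
        size -= f - 1
    return size


def border_to_crop(filter_sizes):
    """Pixels to crop from each side of the label so it matches the output."""
    return sum(f - 1 for f in filter_sizes) // 2


print(valid_output_size(33, (9, 1, 5)))  # 33 - 8 - 0 - 4 = 21
print(border_to_crop((9, 1, 5)))         # (8 + 0 + 4) // 2 = 6
```

The ground-truth patch therefore has to be center-cropped by the same border before the loss is computed.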
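A larger training set helps SRCNN, but the gain is not as pronounced as it is for high-level vision tasks.
Increasing the number of filters improves performance; reducing it improves speed, and the results are still better than those of traditional methods.
Increasing the filter size improves performance slightly.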
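The speed side of these trade-offs can be seen by counting weights, since both more filters and larger filters increase the count. The configurations below (9-1-5 with 64 and 32 filters as the baseline) are assumed from my reading of the paper, and `srcnn_weight_count` is a name invented for this note; biases are not counted.

```python
def srcnn_weight_count(filter_sizes, n1, n2, channels=1):
    """Number of convolution weights in a three-layer SRCNN (biases excluded)."""
    f1, f2, f3 = filter_sizes
    return (f1 * f1 * channels * n1      # patch extraction
            + f2 * f2 * n1 * n2          # non-linear mapping
            + f3 * f3 * n2 * channels)   # reconstruction


print(srcnn_weight_count((9, 1, 5), 64, 32))    # baseline: 8032
print(srcnn_weight_count((9, 1, 5), 128, 64))   # more filters: 20160
print(srcnn_weight_count((9, 1, 5), 32, 16))    # fewer filters: 3504
print(srcnn_weight_count((9, 3, 5), 64, 32))    # larger middle filter: 24416
```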
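In the paper's experiments, adding more layers did not improve performance; on the contrary, it made it worse.
Because the network has no fully connected or pooling layers, it is strongly affected by initialization and can be hard to get training.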
On color images, the paper reports: “The results are shown in Table 5, where we have the following observations.
(i) If we directly train on the YCbCr channels, the results are even worse than that of bicubic interpolation.
(ii) If we pre-train on the Y or Cb, Cr channels, the performance finally improves, but is still not better than “Y only” on the color image.
(iii) We observe that the Cb, Cr channels have higher PSNR values for “Y pre-train” than for “CbCr pre-train”.
(iv) Training on the RGB channels achieves the best result on the color image.
(v) It is also worth noting that the improvement compared with the single-channel network is not that significant.”
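For readers unfamiliar with the channel names, here is a sketch of splitting an RGB image into Y, Cb, and Cr so that a "Y only" network can be fed the luminance channel. It assumes the full-range BT.601 (JPEG/JFIF) conversion; the paper's own pipeline may use a different variant, and the function names are mine.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert (..., 3) RGB values in [0, 255] to YCbCr (full-range BT.601, assumed)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def luminance(rgb):
    """The Y channel only, i.e. the input of a single-channel network."""
    return rgb_to_ycbcr(rgb)[..., 0]


image = np.full((2, 2, 3), 255.0)   # a tiny all-white RGB image
print(rgb_to_ycbcr(image)[0, 0])    # white has full luminance and neutral chroma
print(luminance(image).shape)
```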