Brief Notes on EnhanceNet

_JoeZoe

Article link: https://blog.csdn.net/qq_25196865/article/details/78777220

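These are notes on EnhanceNet, focusing on how single-image super-resolution is achieved through automated texture synthesis. To address the mismatch between pixel-wise reconstruction measures and visual perception in conventional methods, EnhanceNet innovates on the loss function by introducing perceptual similarity and texture matching, and uses residual blocks to speed up network convergence. The training and loss function section covers MSE, perceptual similarity, texture matching, and adversarial training, all aimed at creating realistic textures.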

Paper: EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis
Venue: ICCV 2017. Authors: Mehdi S. M. Sajjadi et al.

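1. Difficulties (questions I ran into while reading the paper):

How are E/P/T/A combined? Are they simply added together, or combined in some other way?
(E: MSE, P: Perceptual similarity, T: Texture matching, A: Adversarial training)
How exactly are T and A carried out?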

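2. Problem:

Conventional methods are built on pixel-wise reconstruction measures such as PSNR, and the images that score well on these measures do not match human visual perception.
(That is, even when the "score" is high and the generated image is judged to be good, it still looks over-smoothed to us and has lost some of its high-frequency information.)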
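To make the mismatch concrete, here is a minimal sketch of PSNR on a toy 1x4 image. The three images are made up for illustration and do not come from the paper. A blurred edge scores higher than an equally sharp edge that is shifted by one pixel, even though the blurred one is the version that looks over-smoothed.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count

def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

# Toy data (hypothetical): a sharp edge, a blurred copy, and a sharp but shifted copy.
sharp = [[0, 0, 255, 255]]
blurred = [[0, 85, 170, 255]]
shifted = [[0, 255, 255, 255]]

print(psnr(sharp, blurred))  # about 12.55 dB
print(psnr(sharp, shifted))  # about 6.02 dB
```

The pixel-wise measure rewards the blurred result because its per-pixel errors are small, which is exactly the behaviour that pushes MSE-trained networks toward smooth outputs.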

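3. Proposed improvements:

Work on the loss function so that it creates realistic textures (as the title says, "Through Automated Texture Synthesis").
For performance evaluation, use object recognition performance in place of conventional measures such as PSNR and SSIM.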

4. Method:

4.1 Network Architecture

[Figure: network architecture]

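The authors make specific remarks about several aspects of the network structure:

(1). The main body of the network uses residual blocks. The reason is that they converge faster than stacked convolution layers.

Reference: residual learning was proposed in [2]; residuals were first used for SR in [3].

(2). The authors discuss why they chose nearest neighbor upsampling.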

A. Bicubic interpolation introduces redundancies to the input image and leads to higher computational cost.

B. Convolution transpose layers (which upsample the feature activations inside the network) can produce checkerboard artifacts in the output. These have to be corrected with an additional regularization term, which adds computational cost.

C. NN upsampling + Conv can be used in place of transposed convolutional layers. Checkerboard artifacts can still appear in certain specific models, but most complex models do not need an additional regularization term.

Reference: use of bicubic interpolation [4]; use of convolution transpose layers [5]; nearest neighbor upsampling [6].
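Option C can be sketched in a few lines. This is only an illustration in plain Python on a single channel: the kernel is a fixed, hypothetical one, whereas in a real network it would be learned, and an array library would normally replace the nested loops.

```python
def nearest_upsample(image, factor=2):
    """Repeat every pixel `factor` times along both axes."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def conv2d_same(image, kernel):
    """Single-channel 2-D cross-correlation with zero padding (odd-sized kernel)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

low_res = [[1, 2],
           [3, 4]]
up = nearest_upsample(low_res, factor=2)
# up == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]

# Hypothetical kernel: an identity filter, so the output equals the upsampled input.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
refined = conv2d_same(up, identity)
```

Because every output pixel of the upsampling step is covered by the convolution in the same way, this arrangement avoids the uneven kernel overlap that is usually blamed for checkerboard patterns in transposed convolutions.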

(3). The input is the low-resolution image and the output is a residual image. Purpose: the network does not need to learn the identity function for I_LR.
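A minimal sketch of what predicting a residual means: the final estimate is an interpolated copy of I_LR plus the network output. The interpolation used as the baseline here (nearest neighbor, for brevity) and the residual values are assumptions of this sketch, not something taken from the paper.

```python
def nearest_upsample(image, factor):
    """Repeat every pixel `factor` times along both axes."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def add_residual(baseline, residual):
    """Element-wise sum of the interpolated input and the predicted residual."""
    return [[b + r for b, r in zip(row_b, row_r)]
            for row_b, row_r in zip(baseline, residual)]

i_lr = [[10, 20],
        [30, 40]]
baseline = nearest_upsample(i_lr, 2)

# Hypothetical network output: a residual that only touches one pixel.
residual = [[0, 0, 0, 0],
            [0, 5, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
i_est = add_residual(baseline, residual)
```

With an all-zero residual the estimate is just the interpolated input, so the network starts from a sensible image and only has to learn the missing detail.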

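4.2 Training and loss functions (the key part):

Pixel-wise loss in the image-space: the conventional MSE-based approach.

Perceptual loss in feature space: map the generated image into a feature space, then compute the MSE there.

Texture matching loss: mapping into a feature space is still not enough, so fine-grained texture matching is performed as well.

Adversarial training: under a given discriminative model, make the generated image impossible to identify as generated.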
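The difference between the perceptual and texture terms can be sketched as follows. This is a toy illustration under stated assumptions: `toy_features` is a hypothetical stand-in (simple image gradients) for the feature maps of a pretrained network, and texture matching is assumed to compare Gram matrices of those features, a common way of capturing texture statistics.

```python
def toy_features(image):
    """Hypothetical feature extractor: horizontal and vertical forward differences."""
    h, w = len(image), len(image[0])
    dx, dy = [], []
    for y in range(h - 1):
        for x in range(w - 1):
            dx.append(image[y][x + 1] - image[y][x])
            dy.append(image[y + 1][x] - image[y][x])
    return [dx, dy]

def mean_squared_diff(a, b):
    """MSE between two equally shaped lists of lists."""
    pairs = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum((p - q) ** 2 for p, q in pairs) / len(pairs)

def gram_matrix(features):
    """Channel-by-channel correlations, averaged over all spatial positions."""
    n = len(features[0])
    return [[sum(p * q for p, q in zip(fi, fj)) / n for fj in features]
            for fi in features]

def perceptual_loss(est, ref):
    """MSE between feature maps: sensitive to where each feature occurs."""
    return mean_squared_diff(toy_features(est), toy_features(ref))

def texture_loss(est, ref):
    """MSE between Gram matrices: sensitive only to feature statistics."""
    return mean_squared_diff(gram_matrix(toy_features(est)),
                             gram_matrix(toy_features(ref)))

# Toy data: vertical stripes and the same stripes shifted by one pixel.
stripes = [[0, 1, 0, 1] for _ in range(4)]
shifted = [[1, 0, 1, 0] for _ in range(4)]

print(perceptual_loss(shifted, stripes))  # 2.0
print(texture_loss(shifted, stripes))     # 0.0
```

The shifted stripes have the same texture as the reference, so the Gram-based term is zero, while the feature-space MSE still penalises the misalignment. That is the sense in which the texture term matches texture statistics rather than exact positions.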

(1) The conventional MSE-based loss function: