IRN: Experimental Verification of Invertible Image Rescaling

ytao_wang

Article link: https://blog.csdn.net/weixin_46773169/article/details/123944765

paper, code

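IRN (ECCV 2020) is an invertible neural network. The HR image is decomposed by a wavelet transform into low-frequency and high-frequency components, which serve as the network input. The network outputs an LR image together with a latent variable z that follows a fixed distribution, and the inverse transform then reconstructs the HR image from them. The reported performance gain is very large.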
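To make the wavelet step concrete, here is a minimal pure-Python sketch of a one-level 2D Haar transform on a single-channel image. The function names `haar_forward` and `haar_inverse` and the 1/2 normalization are assumptions chosen for illustration, not the IRN implementation. The point is only that one channel of size 2H x 2W becomes four channels of size H x W (one low-frequency band plus three high-frequency detail bands) and that the step can be undone exactly.

```python
def haar_forward(img):
    """Split a 2H x 2W image (list of rows) into four H x W sub-bands."""
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w for _ in range(h)]  # low frequency: scaled sum of each 2x2 block
    lh = [[0.0] * w for _ in range(h)]  # left column minus right column
    hl = [[0.0] * w for _ in range(h)]  # top row minus bottom row
    hh = [[0.0] * w for _ in range(h)]  # diagonal difference
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2
            lh[i][j] = (a - b + c - d) / 2
            hl[i][j] = (a + b - c - d) / 2
            hh[i][j] = (a - b - c + d) / 2
    return ll, lh, hl, hh


def haar_inverse(ll, lh, hl, hh):
    """Rebuild the 2H x 2W image from the four H x W sub-bands."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


image = [[1.0, 2.0, 5.0, 6.0],
         [3.0, 4.0, 7.0, 8.0]]
bands = haar_forward(image)
print(bands[0])                        # low-frequency band: [[5.0, 13.0]]
print(haar_inverse(*bands) == image)   # True
```

Applied to each of the three color channels, this is how a (3, 2H, 2W) image turns into a (12, H, W) tensor before the invertible blocks.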

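Because IRN's reported gain over super-resolution methods is so large, this post checks whether it holds up. It turns out that the results in the paper's tables were obtained with the real HR image as input; when only an LR image is used as input, performance drops.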

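The paper mainly models the downscaling and upscaling that an image undergoes during transmission: high-resolution digital images (HR images) are often downscaled (to LR images) to fit various displays or to save storage and bandwidth, and the end device then upscales them to restore the original resolution or enlarge details. This forms an invertible HR $\rightarrow$ LR $\rightarrow$ HR process, so in this setting the HR image is available from the start. In a super-resolution task, by contrast, the only input in practice is a single LR image, and the HR image is not available.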

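However, the super-resolution results given by the authors are obtained by feeding the HR image into the model, generating an LR image with the forward network, and then feeding that LR image into the reverse network to reconstruct the final high-resolution result (the SR image). The HR image is required here. When I simulated the realistic case where only an LR input is available, the measured performance was much lower.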

The authors' SR test procedure (HR input):

import torch

# Forward pass (HR -> LR)
input = img_HR                                              # shape: (1, 3, 2H, 2W)
img_LR = net(input, rev=False)[:, :3, :, :]                 # shape: (1, 12, H, W) --> (1, 3, H, W)

# Reverse pass (LR -> SR)
# The latent variable z is a randomly generated tensor that pads the network input to 12 channels
input = torch.cat((img_LR, torch.randn(z_shape)), dim=1)    # shape: (1, 12, H, W), z_shape: (1, 9, H, W)
img_SR = net(input, rev=True)[:, :3, :, :]                  # shape: (1, 3, 2H, 2W)


# net(rev=False), forward: HaarDownSampling + InvBlockExp (x8)
#                                                  |
#                                                  --> out_shape: (1, 12, H, W)

# net(rev=True), reverse: InvBlockExp (x8) + inverse HaarDownSampling
#                                                  |
#                                                  --> out_shape: (1, 3, 2H, 2W)
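The shape bookkeeping of this flow can be checked with a small pure-Python helper. The function names below are hypothetical and only model tensor shapes (there is no actual network); the x2 scale with a single Haar step and 3 image channels is assumed from the flow described here.

```python
def haar_down_shape(shape):
    """Shape after one Haar downsampling step: 4x channels, half height and width."""
    n, c, h, w = shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    return (n, 4 * c, h // 2, w // 2)


def haar_up_shape(shape):
    """Shape after the inverse Haar step: channels / 4, double height and width."""
    n, c, h, w = shape
    assert c % 4 == 0, "channel count must be divisible by 4"
    return (n, c // 4, 2 * h, 2 * w)


def rescale_shapes(hr_shape, img_channels=3):
    """Trace shapes through forward (HR -> LR, z) and reverse (LR, z -> SR)."""
    n, c, h, w = haar_down_shape(hr_shape)   # the invertible blocks keep this shape
    lr_shape = (n, img_channels, h, w)       # first 3 channels are taken as the LR image
    z_shape = (n, c - img_channels, h, w)    # remaining channels are the latent z
    sr_shape = haar_up_shape((n, c, h, w))   # the reverse pass ends with the inverse Haar step
    return lr_shape, z_shape, sr_shape


lr, z, sr = rescale_shapes((1, 3, 192, 192))
print(lr, z, sr)  # (1, 3, 96, 96) (1, 9, 96, 96) (1, 3, 192, 192)
```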

My simulation of a practical SR procedure (LR input only):

import torch

# img_LR is given directly; shape: (1, 3, H, W)
# The latent variable z is a randomly generated tensor that pads the network input to 12 channels
input = torch.cat((img_LR, torch.randn(z_shape)), dim=1)    # shape: (1, 12, H, W), z_shape: (1, 9, H, W)
img_SR = net(input, rev=True)[:, :3, :, :]                  # shape: (1, 3, 2H, 2W)


# net(rev=False), forward: not used

# net(rev=True), reverse: InvBlockExp (x8) + inverse HaarDownSampling
#                                                  |
#                                                  --> out_shape: (1, 3, 2H, 2W)
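A toy example may help explain why the two procedures behave so differently. The sketch below uses a scalar coupling layer with made-up functions `phi`, `rho` and `eta`. It is a hypothetical, untrained stand-in and not IRN's InvBlockExp, and it does not model IRN's resampling of z. It only shows that the reverse pass recovers the input when it is given what the forward pass produced, and that feeding it a low-resolution value from a different source (here simply the untouched `x1`) gives a different reconstruction.

```python
import math


def phi(v):
    return 0.5 * v   # hypothetical stand-ins for learned sub-networks


def rho(v):
    return 0.1 * v


def eta(v):
    return 0.2 * v


def forward(x1, x2):
    """x1 plays the low-frequency part, x2 the high-frequency part."""
    y1 = x1 + phi(x2)                       # "LR" produced by the model itself
    y2 = x2 * math.exp(rho(y1)) + eta(y1)   # latent part
    return y1, y2


def reverse(y1, y2):
    """Exact inverse of forward()."""
    x2 = (y2 - eta(y1)) / math.exp(rho(y1))
    x1 = y1 - phi(x2)
    return x1, x2


x1, x2 = 1.0, 0.4
y1, y2 = forward(x1, x2)

# Case 1: the reverse pass receives the model's own output and recovers the input.
rec_x1, rec_x2 = reverse(y1, y2)
print(abs(rec_x1 - x1) < 1e-9, abs(rec_x2 - x2) < 1e-9)   # True True

# Case 2: the low-resolution value comes from elsewhere (x1 itself, not y1).
mis_x1, mis_x2 = reverse(x1, y2)
print(abs(mis_x1 - x1) > 0.1)                              # True
```

In this toy, the low-resolution value emitted by the forward pass is matched to the reverse pass, while an independently produced one is not. That is consistent with the gap measured below, but it is only an illustration of the mechanism.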

The results are as follows (PSNR, dB):

| Input | Set5 | Set14 | B100 | Urban | Div2K (val) |
|---|---|---|---|---|---|
| HR | 43.9994 | 40.7885 | 41.2929 | 39.9002 | 44.3248 |
| LR | 35.4581 | 31.0439 | 30.8835 | 28.4020 | 33.6007 |
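For reference, PSNR is usually computed as 10 * log10(MAX^2 / MSE). The sketch below is a generic pure-Python version for 8-bit pixel values. Whether the numbers above were computed on RGB or on the Y channel, and with what border cropping, is not stated here, so treat it as an illustration of the metric rather than the evaluation script that produced the table.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


hr_pixels = [52, 60, 61, 70]
sr_pixels = [50, 61, 63, 69]
print(round(psnr(hr_pixels, sr_pixels), 2))  # 44.15
```

Because PSNR is logarithmic in the MSE, the 8.54 dB gap on Set5 (43.9994 vs. 35.4581) corresponds to roughly a 7x difference in mean squared error.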

Computational cost:

| Setting | input_shape | params | MAdd | FLOPs | Memory |
|---|---|---|---|---|---|
| Input: HR | (3, 192, 192) | 1,668,000 | 30.68G | 15.38G | 465.36M |
| Input: LR | (3, 96, 96) | 1,668,000 | 7.67G | 3.84G | 121.11M |
| RCAN | (3, 96, 96) | 15,444,667 | 282.26G | 141.44G | 2.77G |

MAdd: multiply-accumulate operations.

FLOPs: floating-point operations, i.e. the amount of computation, used to measure the complexity of an algorithm or model.
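The rows of the cost table can be sanity-checked with simple arithmetic: the cost of a convolutional network scales with the number of spatial positions, so quadrupling the pixel count (96 x 96 to 192 x 192) should roughly quadruple MAdd while leaving the parameter count unchanged. The snippet below only recomputes ratios from the numbers reported above; the dictionary keys are labels chosen for this illustration.

```python
# Values copied from the cost table; MAdd is in units of G.
table = {
    "IRN, input HR": {"pixels": 192 * 192, "params": 1_668_000, "madd_g": 30.68},
    "IRN, input LR": {"pixels": 96 * 96, "params": 1_668_000, "madd_g": 7.67},
    "RCAN": {"pixels": 96 * 96, "params": 15_444_667, "madd_g": 282.26},
}

hr_row = table["IRN, input HR"]
lr_row = table["IRN, input LR"]
rcan_row = table["RCAN"]

pixel_ratio = hr_row["pixels"] / lr_row["pixels"]        # HR input has 4x the pixels
madd_ratio = hr_row["madd_g"] / lr_row["madd_g"]         # and about 4x the MAdd
param_ratio = rcan_row["params"] / lr_row["params"]      # RCAN vs. IRN parameters
rcan_madd_ratio = rcan_row["madd_g"] / lr_row["madd_g"]  # RCAN vs. IRN MAdd, same input size

print(pixel_ratio, round(madd_ratio, 2), round(param_ratio, 2), round(rcan_madd_ratio, 1))
# 4.0 4.0 9.26 36.8
```

At the same 96 x 96 input, RCAN has about 9 times the parameters and about 37 times the MAdd of the network measured here.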

The authors' own explanation is as follows:

Image rescaling is a different task from super-resolution (see ‘Difference from SR’ in the paper). IRN downscales HR images and reconstruct them from the downscaled LR images, while the ultimate goal of super-resolution is to upscale arbitrary LR images. So in our test code, we only need HR images to verify the performance.

If we just use the architecture of IRN for paired training of bicubic-downscaled LR images and HR images (latent variable z as padding 0), which is the setting of many sr methods, the performance is not as good as them. Reasons include that our invertible architecture is not mainly designed for prior learning, and the parameters are fewer. The improvement of IRN comes from our invertible modeling for downscaling and upscaling.

For details, see GitHub issue #4.

In short:

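Image rescaling and super-resolution are different tasks. IRN downscales an HR image to an LR image and then reconstructs the HR image from that LR image, whereas the goal of SR is to upscale arbitrary LR images (as this blogger understands it).
If the IRN architecture is simply trained on pairs of bicubic-downscaled LR images and HR images (with the latent variable z as zero padding), which is the setting of many SR methods, its performance is not as good as theirs. The reasons include that IRN is not primarily designed for prior learning and that it has fewer parameters. IRN's improvement comes from its invertible modeling of downscaling and upscaling.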