[ACM MM 2024] GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution

Phoenixtree_DongZhao

Published 2024-08-15 21:14:27


Article link: https://blog.csdn.net/u014546828/article/details/141231588


GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution

GitHub - sisrformer/GRFormer: ACM MM 2024

https://arxiv.org/pdf/2408.07484

Abstract

Previous works have shown that reducing parameter overhead and computations for transformer-based single image super-resolution (SISR) models (e.g., SwinIR) usually leads to a reduction in performance. In this paper, we present GRFormer, an efficient and lightweight method that not only reduces the parameter overhead and computations but also greatly improves performance. The core of GRFormer is Grouped Residual Self-Attention (GRSA), which is specifically oriented towards two fundamental components. Firstly, it introduces a novel grouped residual layer (GRL) to replace the Query, Key, Value (QKV) linear layer in self-attention, aiming to reduce parameter overhead, computations, and performance loss at the same time. Secondly, it integrates a compact Exponential-Space Relative Position Bias (ES-RPB) as a substitute for the original relative position bias, improving the ability to represent position information while further minimizing the parameter count. Extensive experimental results demonstrate that GRFormer outperforms state-of-the-art transformer-based methods on ×2, ×3 and ×4 SISR tasks, notably outperforming the SOTA by up to 0.23 dB in PSNR when trained on the DIV2K dataset, while reducing the number of parameters and MACs in the self-attention module alone by about 60% and 49%, respectively. We hope that our simple and effective method, which can be easily applied to SR models based on window-division self-attention, can serve as a useful tool for further research in image super-resolution.

Introduction

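Single image super-resolution (SISR) aims to increase image resolution by reconstructing a high-resolution image from a low-resolution one. With the development of super-resolution (SR) models based on convolutional neural networks (CNNs) and Transformers, SISR has seen a series of advances. For both CNN- and Transformer-based models, however, the performance gained by adding layers and widening feature dimensions comes with more parameters and more computation. A natural question follows: can a Transformer-based SR model reduce parameters and computation while improving performance? Starting from this question, the paper examines the self-attention mechanism in depth and poses three sub-questions about it: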

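RQ1: Is there redundancy inside self-attention?
RQ2: Can the representational capacity of self-attention be further improved?
RQ3: Is there a better way than relative position bias (RPB) to represent position information?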

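For RQ1, the paper analyzes how the number of parameters and the amount of computation (MACs) in the self-attention module relate to performance, and finds that SwinIR's self-attention leaves room for improvement in parameter and computational efficiency. It therefore proposes a novel grouping scheme for the Q, K, V linear layers to reduce parameter overhead and computational complexity.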
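To make the RQ1 trade-off concrete, here is a small parameter/MAC counter for a QKV projection that maps C channels to 3C (Q, K and V together). It is a sketch under assumptions: a bias-carrying linear layer applied independently to each token, MACs counted as one multiply-accumulate per weight per token, and a grouped variant that splits the channels into equal groups. The function name `qkv_linear_cost`, the width C = 60 and the 64-token window are illustrative choices, not values taken from the paper, and the count covers only the projection, not the whole self-attention module behind the abstract's 60%/49% figures.

```python
def qkv_linear_cost(channels: int, groups: int = 1, tokens: int = 1, bias: bool = True):
    """Return (params, macs) for a QKV projection mapping C -> 3C channels.

    With groups > 1, the C input channels are split into equal groups and each
    group has its own (C/g -> 3C/g) linear layer, i.e. a block-diagonal weight.
    """
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    c_in = channels // groups               # input channels per group
    c_out = 3 * channels // groups          # Q, K and V output channels per group
    weights = groups * c_in * c_out         # = 3 * C * C / groups
    biases = groups * c_out if bias else 0  # = 3 * C, independent of grouping
    macs = tokens * weights                 # one multiply-accumulate per weight per token
    return weights + biases, macs


for g in (1, 2):
    params, macs = qkv_linear_cost(60, groups=g, tokens=64)  # 64 tokens = one 8x8 window
    print(f"groups={g}: params={params}, MACs per window={macs}")
```

Under these assumptions two groups halve the weight count and the projection MACs (10980 vs 5580 parameters, 691200 vs 345600 MACs per window), while the 3C biases are unchanged.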

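For RQ2, given the success of residual learning in improving network performance, the paper explores adding residual connections directly to the QKV linear layers of self-attention to strengthen their representational capacity in deep networks.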

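For RQ3, the paper identifies four critical flaws of the conventional relative position bias (RPB) (Fig. 2) and designs an Exponential-Space Relative Position Bias (ES-RPB) to replace it and represent position information more efficiently.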

Figure 2: Comparison between RPB and ES-RPB. Subfigures (a) and (b) show the relative position bias (RPB) of the SwinIR model and of a GRFormer model in which RPB is replaced with ES-RPB, respectively. The subfigure in the i-th row and j-th column corresponds to the relative position bias of the j-th GRSAB in the i-th GRSAB group of the network. These figures specifically highlight the horizontal evolution of the relative position bias values. The x-axis extends from 0 to 62, while the y-axis shows the values taken at the 7th point on the x-axis of an RPB matrix of size 15×63.
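For reference, the conventional RPB that ES-RPB replaces works as in Swin/SwinIR: in a window of h×w tokens every query/key pair has a relative offset (Δr, Δc) with Δr in [−(h−1), h−1] and Δc in [−(w−1), w−1], so each head learns a (2h−1)×(2w−1) table of biases that is looked up and added to the attention logits. The plain-Python sketch below builds that lookup index. The 8×32 window in the example is an assumption: the post does not state GRFormer's window size, but 8×32 is consistent with the 15×63 RPB matrix in the Fig. 2 caption. The function name is hypothetical.

```python
def relative_position_index(win_h: int, win_w: int):
    """Swin-style RPB lookup for a win_h x win_w attention window.

    Returns (index, table_shape). index[i][j] is the flat (row-major) position,
    in a bias table of shape (2*win_h - 1, 2*win_w - 1), of the bias added to the
    logit of query token i attending to key token j.
    """
    table_h, table_w = 2 * win_h - 1, 2 * win_w - 1
    coords = [(r, c) for r in range(win_h) for c in range(win_w)]  # token positions
    index = []
    for r1, c1 in coords:              # query token
        row = []
        for r2, c2 in coords:          # key token
            dr = r1 - r2 + win_h - 1   # vertical offset shifted to [0, 2*win_h - 2]
            dc = c1 - c2 + win_w - 1   # horizontal offset shifted to [0, 2*win_w - 2]
            row.append(dr * table_w + dc)
        index.append(row)
    return index, (table_h, table_w)


index, shape = relative_position_index(8, 32)
print(shape, len(index), len(index[0]))  # (15, 63) 256 256
```

With this layout, table row 7 holds zero vertical offset and column 31 holds zero horizontal offset, so a 1-D slice along row 7 (63 values for horizontal offsets −31 to 31, stored at positions 0 to 62) is presumably what the caption's 0 to 62 axis refers to. Each head stores 15 × 63 = 945 learnable biases in this scheme.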

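Combining these three points, the paper proposes Grouped Residual Self-Attention (GRSA) and builds the lightweight super-resolution network GRFormer on top of it.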

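Experiments on five commonly used datasets show that GRFormer achieves significant gains on almost all benchmarks. In particular, for the ×2 SR task trained on DIV2K, GRFormer reaches a PSNR of 33.17 dB on the challenging Urban100 dataset, well above the recent SwinIR-light (32.76 dB) and the current state-of-the-art lightweight SR model (32.94 dB). The same improvement is consistently observed for ×3 and ×4. Comprehensive experiments show that GRFormer not only outperforms previous lightweight SISR models but also uses about 20% fewer parameters in the overall architecture than SwinIR with the same hyperparameters (1000k parameters).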
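The Urban100 ×2 numbers above can be checked against the abstract's claim of a maximum PSNR gain of 0.23 dB over the state of the art:

```latex
\Delta_{\text{SOTA}} = 33.17 - 32.94 = 0.23\ \text{dB},
\qquad
\Delta_{\text{SwinIR-light}} = 33.17 - 32.76 = 0.41\ \text{dB}
```

This suggests the 0.23 dB maximum quoted in the abstract comes from this Urban100 ×2 setting, although the post does not say so explicitly.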

Method

Overall Architecture:

Grouped Residual Self-Attention Block:

Grouped Residual Self-Attention:

Grouped Residual Linear:

Grouping scheme for the QKV linear layers in self-attention

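Redundancy exists both in generating Q (query), K (key) and V (value) and in the matrix multiplication of Q and K. To reduce it, the paper adopts grouping: given an input X, it splits X into two equal halves along the channel dimension and uses two independent linear layers to produce Q, K and V. This grouping does not significantly reduce the interaction between pixel features, because the matrix multiplication of Q and K in self-attention compensates for it to some extent.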
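A minimal NumPy sketch of the grouping scheme, under stated assumptions: tokens are the rows of an (N, C) matrix, there are exactly two groups, each group's linear layer maps C/2 channels to 3C/2 (its share of Q, K and V), and biases are omitted. `grouped_qkv` is a hypothetical helper, not the official GRFormer code. The last check illustrates one reading of the claim that the Q·Kᵀ product compensates for the grouping: each attention score is the sum of a dot product over group 1's channels and one over group 2's, so every score still depends on both halves of X.

```python
import numpy as np


def grouped_qkv(x: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Two-group QKV projection (hypothetical sketch, no biases).

    x: (N, C) tokens; w1, w2: (C/2, 3C/2) weights of two independent linear layers.
    Returns q, k, v, each of shape (N, C).
    """
    half = x.shape[1] // 2
    x1, x2 = x[:, :half], x[:, half:]          # split channels into two equal halves
    q1, k1, v1 = np.split(x1 @ w1, 3, axis=1)  # group 1 -> its part of Q, K, V
    q2, k2, v2 = np.split(x2 @ w2, 3, axis=1)  # group 2 -> its part of Q, K, V
    q = np.concatenate([q1, q2], axis=1)
    k = np.concatenate([k1, k2], axis=1)
    v = np.concatenate([v1, v2], axis=1)
    return q, k, v


rng = np.random.default_rng(0)
n, c = 16, 8
x = rng.standard_normal((n, c))
w1 = rng.standard_normal((c // 2, 3 * c // 2))
w2 = rng.standard_normal((c // 2, 3 * c // 2))
q, k, v = grouped_qkv(x, w1, w2)

# Two (C/2 x 3C/2) weights hold 3C^2/2 values, half of a full (C x 3C) layer.
print(w1.size + w2.size, c * 3 * c)  # 96 192

# Each score sums a group-1 term (built from x1) and a group-2 term (built from x2).
scores = q @ k.T
h = c // 2
print(np.allclose(scores, q[:, :h] @ k[:, :h].T + q[:, h:] @ k[:, h:].T))  # True
```

This decomposition holds for a single head attending over all C channels; with multi-head attention, whether a head mixes both groups depends on how its channel slice lines up with the group boundary.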

Residual connections in the Q, K, V linear layers

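Residual connections allow training in a residual space, which lets the network search for an optimal solution in that space. The paper therefore adds residual connections to the QKV linear layers, shifting their training space from a linear space to a residual space and thereby strengthening their feature-learning ability.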
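A minimal NumPy sketch of what adding a residual connection to a grouped projection could look like. The post does not give the exact formulation, so the following are assumptions: an identity shortcut needs matching shapes, so each of Q, K and V is assumed to come from its own grouped layer whose per-group weight is square (C/g × C/g), giving y_i = x_i + x_i W_i + b_i for group i. `grouped_residual_linear` and `init_params` are hypothetical names. With all weights and biases at zero the layer is exactly the identity, which is one way to read training "in a residual space": the weights only have to learn a correction to the input rather than the whole mapping.

```python
import numpy as np


def grouped_residual_linear(x, weights, biases):
    """Hypothetical grouped residual linear layer (not the official GRFormer code).

    x: (N, C). The channels are split into len(weights) equal groups; group i is
    projected by its own square (C/g, C/g) weight W_i and bias b_i, and an identity
    shortcut adds the input back: y_i = x_i + x_i @ W_i + b_i.
    """
    groups = np.split(x, len(weights), axis=1)  # raises if C is not divisible by g
    outs = [xi + xi @ w + b for xi, w, b in zip(groups, weights, biases)]
    return np.concatenate(outs, axis=1)


def init_params(c, g, rng, scale=0.02):
    """Small random weights and zero biases for one g-group layer (illustrative init)."""
    ws = [scale * rng.standard_normal((c // g, c // g)) for _ in range(g)]
    bs = [np.zeros(c // g) for _ in range(g)]
    return ws, bs


rng = np.random.default_rng(0)
n, c, g = 16, 8, 2
x = rng.standard_normal((n, c))

# One layer each for Q, K and V (an assumption about how the pieces fit together).
q, k, v = (grouped_residual_linear(x, *init_params(c, g, rng)) for _ in range(3))
print(q.shape, k.shape, v.shape)  # (16, 8) (16, 8) (16, 8)
```

For g = 2 each such layer holds g·(C/g)² + C = C²/2 + C parameters, so the three layers together hold 3C²/2 + 3C, the same count as a two-group QKV projection without shortcuts; under these assumptions the shortcut adds no parameters.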

Exponential-Space Relative Position Bias

Explanation of the effectiveness of GRL

Explanation of the effectiveness of ES-RPB

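The three sections above cover the Exponential-Space Relative Position Bias, the explanation of GRL's effectiveness, and the explanation of ES-RPB's effectiveness. See the original paper for details; they are not described here and can be studied when needed.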
