Reproducing "Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation"

小夭。

Last modified: 2024-06-05 16:32:38

First published: 2023-07-19 01:42:01

Article link: https://blog.csdn.net/m0_47146037/article/details/131799550

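This post documents the code study and experimental reproduction of the Asymmetric Gained Deep Image Compression model. It covers the model framework, the gain units, the forward pass, the implementation of discrete variable rate, and the method for continuous variable rate. The original implementation trains on a subset of the OpenImages dataset; this reproduction trains on ImageNet subsets and discusses problems met when choosing the dataset, such as images that are too small. The evaluation section presents results for different amounts of training.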

(Summary generated automatically by CSDN.)

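Preface

Please read the paper on your own. This post mainly records my study of the code and my reproduction of the experiments.
GitHub repository
This code is not the official release; it is a private, third-party implementation.
This blog post is only a study record.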

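1 Code walkthrough

1.1 Overall framework

[Figure: overall framework of the model]

The main codec and the Gaussian entropy modeling follow the same design as the joint autoregressive and hierarchical priors model ("joint"). The main change is the addition of the gain units.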

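1.2 Gain units

The gain units correspond to the "gain" blocks in the framework. The basic idea is to pick a set of different lambda values; the reference values given in the paper are as follows.

[Figure: reference lambda values from the paper]
Each lambda is paired with its own gain vector ms. A single ms holds one scaling factor per channel, so with 192 channels it contains 192 scaling factors. Because every lambda has its own ms, the full ms tensor has shape [len(lambda), channel].
[Figure: gain matrix ms]
Gain units built from ms tensors of the matching shape can be attached to both the main codec and the hyperprior codec, and their parameters are learned during training.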
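To make the shapes concrete, here is a minimal NumPy sketch of channel-wise gain scaling. It is not the repository's code: the number of levels, the random values, the latent size, and the helper name apply_gain are all assumptions for illustration, and the exact reciprocal is used as the inverse gain only so the round trip is easy to check (in the model the inverse gain is a learned parameter).

```python
import numpy as np

# Hypothetical sizes: 6 lambda values (rate points) and 192 latent channels.
num_levels, channels = 6, 192
rng = np.random.default_rng(0)

# One row of per-channel scaling factors per lambda: shape [len(lambda), channel].
# In the real model these are learned; random stand-ins are used here.
gain = rng.uniform(0.5, 2.0, size=(num_levels, channels))
inverse_gain = 1.0 / gain  # stand-in only, see the note above


def apply_gain(y, gain_matrix, s):
    """Scale a latent y of shape [C, H, W] channel-wise with the gain vector of level s."""
    return y * gain_matrix[s][:, None, None]


y = rng.normal(size=(channels, 16, 16))               # stand-in for the encoder output
y_scaled = apply_gain(y, gain, s=2)                   # applied before quantization
y_restored = apply_gain(y_scaled, inverse_gain, s=2)  # applied before decoding
```

Quantization is left out on purpose, so y_restored equals y up to floating-point error.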

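1.3 Forward pass

[Figure: forward pass]
The forward pass matches the framework diagram and is largely the same as in the joint autoregressive model. The main difference is that the gain and inverse gain units are learned as well.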

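1.4 Rate-distortion loss (implementing discrete variable rate)

As I understand it, discrete variable rate is achieved mainly by modifying the loss function.
[Figure: rate-distortion loss from the paper]
Lambda is a set of values, and each lambda corresponds to its own rate point. The loss sums the rate-distortion terms over all lambdas and is then minimized. After training, one set of lambdas therefore yields one set of rate points, which is what makes the rate discretely variable.

In my view the code conflicts with the paper here. In the code, lambda is still a set of values and the lambda*D part agrees with the paper: it computes the sum of the products of each lambda and its distortion. The rate term, however, is taken at a single rate point rather than summed, which looks like a problem.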
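The difference between the two readings can be sketched in a few lines of plain Python. Both function names and all numbers are hypothetical; this only illustrates the summed loss as described above and the single-rate variant I believe the code computes, not the repository's actual loss class.

```python
def gained_rd_loss(bpps, distortions, lambdas):
    """Paper-style reading: sum of R_s + lambda_s * D_s over all rate points s."""
    if not (len(bpps) == len(distortions) == len(lambdas)):
        raise ValueError("need one rate and one distortion per lambda")
    return sum(r + lam * d for r, d, lam in zip(bpps, distortions, lambdas))


def single_rate_loss(bpp, distortions, lambdas):
    """Variant discussed above: summed lambda * D terms but only one rate term."""
    if len(distortions) != len(lambdas):
        raise ValueError("need one distortion per lambda")
    return bpp + sum(lam * d for d, lam in zip(distortions, lambdas))


# Hypothetical values for three rate points (not measured results).
lambdas = [0.01, 0.02, 0.04]
bpps = [0.3, 0.5, 0.8]
distortions = [20.0, 12.0, 7.0]

full = gained_rd_loss(bpps, distortions, lambdas)
partial = single_rate_loss(bpps[0], distortions, lambdas)
```

With the same distortions, the two losses differ exactly by the rate terms that the single-rate variant leaves out.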

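1.5 Continuous variable rate

Continuous variable rate is not part of training; it is realized at compression time. Training provides a few discrete rate points. Given those, we introduce a variable l in the range (0, 1) that interpolates between the gain vectors of two adjacent rate points.

This interpolation yields rate points between two discrete ones, which makes the rate continuously adjustable.
[Figure: gain vector interpolation]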
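As a rough illustration, the interpolation can be written as a geometric blend of two adjacent gain vectors. The function name interpolate_gain and the sample vectors are assumptions, and so is the orientation of l: it is chosen here so that l = 0 reproduces level s and l = 1 reproduces level s + 1, which matches how s + l is reported in the evaluation log. The paper or repository may define l the other way round.

```python
import numpy as np


def interpolate_gain(gain_s, gain_next, l):
    """Geometric blend of two gain vectors: l=0 returns gain_s, l=1 returns gain_next."""
    if not 0.0 <= l <= 1.0:
        raise ValueError("l must lie in [0, 1]")
    # abs() keeps the fractional powers real even if a learned gain is negative.
    return np.abs(gain_s) ** (1.0 - l) * np.abs(gain_next) ** l


# Hypothetical per-channel gains of two adjacent rate points (3 channels only).
gain_s = np.array([2.0, 1.0, 0.5])
gain_next = np.array([1.0, 0.5, 0.25])

mid = interpolate_gain(gain_s, gain_next, 0.5)  # halfway between the two rate points
```

At l = 0.5 every channel gets the geometric mean of its two gains, so the blended gain changes smoothly as l moves from 0 to 1.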

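2 Dataset

The original repository describes its training data as follows:
I use a part of the OpenImages Dataset to train the models (train06, train07, train08, about 540k images). You can download it from here: Download OpenImages.
Maybe train08 (140k images) is enough.
The original author's link does not appear to be freely downloadable, so we used our own dataset: the 100k images of the ImageNet test set.
ImageNet dataset download
This dataset later caused a problem: some images are too small (see the problems encountered below). We therefore switched to the 1.2M-image ImageNet training set. After sorting, we took the first 100k images whose width and height are both greater than 256 as the training set, and the following 10k images as the test set.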
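The selection step could look roughly like the sketch below. This is not the script I actually used: the function names, the sort order, and the injectable get_size argument are assumptions made so the logic can be tried without touching a real dataset. The default size reader needs Pillow.

```python
def read_size_with_pillow(path):
    """Return (width, height) of an image file; requires Pillow."""
    from PIL import Image  # imported lazily so the selection logic has no hard dependency

    with Image.open(path) as img:
        return img.size


def split_large_images(paths, n_train, n_test, min_side=256, get_size=read_size_with_pillow):
    """Keep images whose width and height both exceed min_side, then split in sorted order."""
    kept = []
    for path in sorted(paths):
        width, height = get_size(path)
        if width > min_side and height > min_side:
            kept.append(path)
        if len(kept) == n_train + n_test:
            break  # enough images collected, stop reading files
    return kept[:n_train], kept[n_train:n_train + n_test]


# Tiny in-memory example with made-up file names and sizes.
sizes = {"a.jpg": (500, 300), "b.jpg": (200, 300), "c.jpg": (257, 257),
         "d.jpg": (256, 400), "e.jpg": (640, 480)}
train, test = split_large_images(sizes.keys(), 2, 1, get_size=lambda p: sizes[p])
```

On a real dataset one would pass something like pathlib.Path("/path/to/imagenet/train").rglob("*.JPEG") as paths (the directory is a placeholder) with n_train=100_000 and n_test=10_000.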

3 Environment setup

The installed packages are as follows.

alabaster==0.7.12
appdirs==1.4.4
appnope==0.1.2
argon2-cffi==20.1.0
astroid==2.4.2
async-generator==1.10
attrs==20.3.0
Babel==2.9.0
backcall==0.2.0
black==20.8b1
bleach==3.3.0
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
coverage==5.4
cycler==0.10.0
decorator==4.4.2
defusedxml==0.6.0
docutils==0.16
entrypoints==0.3
idna==2.10
imagesize==1.2.0
iniconfig==1.1.1
ipykernel==5.4.3
ipython==7.20.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
isort==5.7.0
jedi==0.18.0
Jinja2==2.11.3
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
lazy-object-proxy==1.4.3
MarkupSafe==1.1.1
matplotlib==3.3.4
mccabe==0.6.1
mistune==0.8.4
mypy-extensions==0.4.3
nbclient==0.5.2
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
notebook==6.2.0
numpy==1.20.1
packaging==20.9
pandocfilters==1.4.3
parso==0.8.1
pathspec==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.0
pluggy==0.13.1
prometheus-client==0.9.0
prompt-toolkit==3.0.16
ptyprocess==0.7.0
py==1.10.0
pycparser==2.20
Pygments==2.7.4
pylint==2.6.0
pyparsing==2.4.7
pyrsistent==0.17.3
pytest==6.2.2
pytest-cov==2.11.1
python-dateutil==2.8.1
pytz==2021.1
pyzmq==22.0.2
qtconsole==5.0.2
QtPy==1.9.0
regex==2020.11.13
requests==2.25.1
scipy==1.6.0
Send2Trash==1.5.0
six==1.15.0
snowballstemmer==2.1.0
Sphinx==3.4.3
sphinx-rtd-theme==0.5.1
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
terminado==0.9.2
testpath==0.4.4
toml==0.10.2
torch==1.7.1
torchvision==0.8.2
tornado==6.1
traitlets==5.0.5
typed-ast==1.4.2
typing-extensions==3.7.4.3
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.12.1

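The code still runs mainly on top of compressai, so the gain unit code has to be added to compressai.
[Figure: the two places in compressai where code is added]

Copy the corresponding code for these two parts from the GitHub repository.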

4 Evaluating the model

4.1 Evaluation code walkthrough

The core code is as follows.

from pathlib import Path

import numpy as np
from PIL import Image
from torchvision import transforms


def evalGain(model, dataset_path, logfile):
    '''
        Eval for continuous variable rate model. Channel Gain Module support vbr capability.
    '''
    device = next(model.parameters()).device
    dataset_path = Path(dataset_path)
    if not dataset_path.is_dir():
        raise RuntimeError(f'Invalid directory "{dataset_path}"')

    l_step = 0.1
    # s selects a pair of adjacent discrete rate points (s, s + 1).
    for s in range(0, model.levels - 1):
        # l interpolates between the two rate points.
        for l in np.arange(0.0, 1.0 + l_step, l_step):
            # l == 1.0 repeats l == 0.0 of the next s, so keep it only for the last s.
            if l == 1.0 and s != model.levels - 2:
                continue
            print(f"--------------------Testing s:{s} l:{l:.2f}--------------------------")
            PSNR, MSSSIM, BPP, Esti_Bpp = [], [], [], []
            for img_path in dataset_path.iterdir():
                image = Image.open(img_path)
                image = transforms.ToTensor()(image).to(device)

                # inference_gain comes from the repository's evaluation code (not shown here).
                out = inference_gain(model, image, s=s, l=l)

                PSNR.append(out["psnr"])
                MSSSIM.append(out["ms-ssim"])
                BPP.append(out["bpp"])
                Esti_Bpp.append(out["estimate_bpp"])

            # Average each metric over the whole dataset and log one line per (s, l).
            PSNR = np.array(PSNR)
            MSSSIM = np.array(MSSSIM)
            BPP = np.array(BPP)
            Esti_Bpp = np.array(Esti_Bpp)
            logfile.write(f's+l = {s+l:.2f}  ')
            logfile.write(f'PSNR_AVE: {PSNR.mean():.3f}  MS-SSIM_AVE: {MSSSIM.mean():.3f}  BPP_AVE: {BPP.mean():.3f} Est_BPP_AVE: {Esti_Bpp.mean():.3f}\n')

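Evaluation procedure

Determine which device the code runs on.
Check that the validation dataset path is valid.
s selects the model for a given lambda. It is discrete and ranges from 0 to model.levels - 2.
l is the parameter that controls the continuous rate change. It ranges from 0 to 1 in steps of l_step, currently l_step = 0.1 (adjust as needed).
Special case: if l == 1.0 and s != model.levels - 2, the iteration is skipped (continue), because l = 1.0 at level s is the same rate point as l = 0.0 at level s + 1.
Open each image, run it through the model's compression and reconstruction, and evaluate performance by comparing the original with the reconstruction.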
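The (s, l) grid visited by the loop can be enumerated on its own, which makes the special case easier to see. This is a sketch, not the repository's code: rate_grid is a hypothetical helper, and it uses integer step counts instead of np.arange so that the l == 1 check does not rely on exact floating-point equality.

```python
def rate_grid(levels, steps_per_interval=10):
    """List the (s, l) pairs visited by the evaluation loop, without duplicates."""
    points = []
    for s in range(levels - 1):
        for k in range(steps_per_interval + 1):
            l = k / steps_per_interval
            # l = 1 at level s is the same point as l = 0 at level s + 1,
            # so it is kept only for the last interval.
            if k == steps_per_interval and s != levels - 2:
                continue
            points.append((s, l))
    return points


grid = rate_grid(levels=3)  # hypothetical model with 3 discrete rate points
```

For a model with levels discrete rate points this gives (levels - 1) * steps_per_interval + 1 evaluation points in total.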

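4.2 Running the evaluation

When training finishes, it leaves model checkpoints like these.
[Figure: saved model checkpoints]
Evaluation command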

python3 eval_gain.py -d /path/to/your/image/dataset/ --checkpoint /path/to/your/model.pth --logpath /path/to/save/result/log/ --cuda --mode (gain/scgain)

My evaluation command

python3 ./train/eval_gain.py -d /home/ll/datasets/kodak --checkpoint /home/ll/code/VariableRate/gainVAE/train_models/gainVAEcheckpoint_gainmshp_epoch198.pth --logpath /home/ll/code/VariableRate/gainVAE/eval_result/eval.out --cuda --mode gain > eval.out


5 Experimental results (dataset of 100k training and 10k test images)

5.1 gain_01 (100k/10k dataset, 198 epochs)

gain_01
Eval Time: 2023-08-30 02:12:51
Eval model:/home/ll/code/VariableRate/gainVAE/train_models/gainVAEcheckpoint_gainmshp_epoch198.pth
s+l = 0.00  PSNR_AVE: 35.226  MS-SSIM_AVE: 0.985  BPP_AVE: 0.881 Est_BPP_AVE: 0.878
s+l = 0.10  PSNR_AVE: 35.153  MS-SSIM_AVE: 0.985  BPP_AVE: 0.866 Est_BPP_AVE: 0.878
s+l = 0.20  PSNR_AVE: 35.068  MS-SSIM_AVE: 0.985  BPP_AVE: 0.852 Est_BPP_AVE: 0.878
s+l = 0.30  PSNR_AVE: 34.980  MS-SSIM_AVE: 0.984  BPP_AVE: 0.837 Est_BPP_AVE: 0.878
s+l = 0.40  PSNR_AVE: 34.895  MS-SSIM_AVE: 0.984  BPP_AVE: 0.823 Est_BPP_AVE: 0.878
s+l = 0.50  PSNR_AVE: 34.814  MS-SSIM_AVE: 0.984  BPP_AVE: 0.808 Est_BPP_AVE: 0.878
s+l = 0.60  PSNR_AVE: 34.738  MS-SSIM_AVE: 0.983  BPP_AVE: 0.794 Est_BPP_AVE: 0.878
s+l = 0.70  PSNR_AVE: 34.657  MS-SSIM_AVE: 0.983  BPP_AVE: 0.778 Est_BPP_AVE: 0.878
s+l = 0.80  PSNR_AVE: 34.573  MS-SSIM_AVE: 0.983  BPP_AVE: 0.762 Est_BPP_AVE: 0.878
s+l = 0.90  PSNR_AVE: 34.479  MS-SSIM_AVE: 0.982  BPP_AVE: 0.744 Est_BPP_AVE: 0.878
s+l = 1.00  PSNR_AVE: 34.376  MS-SSIM_AVE: 0.982  BPP_AVE: 0.727 Est_BPP_AVE: 0.726
s+l = 1.10  PSNR_AVE: 34.316  MS-SSIM_AVE: 0.981  BPP_AVE: 0.718 Est_BPP_AVE: 0.726
s+l