Deep Learning Study Notes 1 [End-to-End Compression / compressai / Autoencoders]

mastermind13

Last modified: 2024-08-05 16:33:56

First published: 2024-08-05 16:19:42

Article link: https://blog.csdn.net/m0_52392211/article/details/140656994

1. Installing the compressai library

1.1 Installation steps

Installing the project inside a virtual environment is recommended.

git clone https://github.com/InterDigitalInc/CompressAI compressai
cd compressai
pip install -U pip && pip install -e .

1.2 Errors and fixes

1.2.1 No module named 'compressai._CXX'

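Source of the error: when running train.py, the import from compressai._CXX fails because the two .pyd files in the compressai folder, _CXX.cp39-win_amd64.pyd and ans.cp39-win_amd64.pyd, cannot be loaded. The model trained fine locally in PyCharm; the error appeared only after the code was uploaded to the server. The file names show that these extension modules were compiled for CPython 3.9 on 64-bit Windows, so a server running a different OS or Python version cannot import them.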

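Fix: reinstall the compressai library on the server so that the extensions are built there. After that the model trains normally and the error no longer appears.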
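Before reinstalling, this kind of mismatch can be confirmed by comparing a compiled file's name against the extension suffixes an interpreter accepts. The following is a minimal sketch using only the standard library; the helper name `is_loadable_here` and the example Linux suffix list are illustrative assumptions, not part of compressai.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def is_loadable_here(filename, suffixes=EXTENSION_SUFFIXES):
    """Return True if `filename` ends with one of the extension-module suffixes.

    By default `suffixes` is the list accepted by the running interpreter.
    This is a name-based check only; it does not try to import the file.
    """
    return any(filename.endswith(suffix) for suffix in suffixes)


# Assumed example: suffixes of a CPython 3.9 interpreter on x86-64 Linux.
linux_cp39 = [".cpython-39-x86_64-linux-gnu.so", ".abi3.so", ".so"]

for name in ("_CXX.cp39-win_amd64.pyd", "_CXX.cpython-39-x86_64-linux-gnu.so"):
    print(name, "->", is_loadable_here(name, linux_cp39))
```

Calling `is_loadable_here(name)` without the second argument checks against the interpreter that is actually running, which is the more useful form on the server itself.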

1.2.2 For C++-related errors, upgrading gcc/g++ in the project's environment is generally recommended (Linux)

Approach: install whichever GCC version each environment needs, and make sure the different versions can coexist.

1. Install gcc/g++

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-9 g++-9

If the first line fails with an error like the one below, you can run the third line directly:

Cannot add PPA: 'ppa:~jonathonf/ubuntu/gcc-9.4'.
The user named '~jonathonf' has no PPA named 'ubuntu/gcc-9.4'

To check whether the apt sources provide the package version you want:

sudo apt-cache search gcc  # replace gcc with any other package name to search for it
sudo apt-cache show gcc    # show package details, including the version number

2. Verify the installation

Run gcc --version and g++ --version. If both report the newly installed version, the installation succeeded.

If they still report the old version, use the method in step 3 (setting version priorities).
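The same check can be scripted, for example at the start of a build. The sketch below parses the first line of `gcc --version`. The helper names and the sample output string are illustrative assumptions; the real output differs between distributions.

```python
import re
import shutil
import subprocess


def parse_gcc_version(version_output):
    """Extract the version tuple from the first line of `gcc --version` output."""
    first_line = version_output.splitlines()[0]
    # The compiler version is taken to be the last dotted number on the line;
    # earlier ones may belong to the distribution's package string.
    candidates = re.findall(r"\d+(?:\.\d+)+", first_line)
    if not candidates:
        raise ValueError(f"no version number found in: {first_line!r}")
    return tuple(int(part) for part in candidates[-1].split("."))


def installed_version(compiler):
    """Return the version tuple of `compiler`, or None if it cannot be queried."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run([compiler, "--version"], capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return parse_gcc_version(result.stdout)


# Assumed sample of what gcc 9 prints on Ubuntu 20.04.
sample = "gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0\nCopyright (C) 2019 Free Software Foundation, Inc."
print(parse_gcc_version(sample))

for compiler in ("gcc", "g++"):
    print(compiler, installed_version(compiler))
```

If gcc and g++ report different major versions, that is a sign the two were not switched together.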

3. Set the default version

List the installed versions: dpkg -l | grep gcc

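Use update-alternatives to manage the installed versions and make the new one the default through its priority. In the commands below, --install registers /usr/bin/gcc-9 as an alternative for the link /usr/bin/gcc (group name gcc) with priority 20; a larger number means a higher priority. The --slave entry follows the master entry, so g++ switches together with gcc and both compilers stay at the same version.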

# Register the alternatives (creates the symlinks)
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 20 --slave /usr/bin/g++ g++ /usr/bin/g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 10 --slave /usr/bin/g++ g++ /usr/bin/g++-7
# Remove an alternative
sudo update-alternatives --remove gcc /usr/bin/gcc-9

List the selectable gcc versions: sudo update-alternatives --config gcc

The entry marked with * is the current version. Enter the number from the Selection column to switch to the version you want.
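The priority rule can be illustrated with a toy model: in automatic mode the registered alternative with the highest priority becomes the default. The dictionary and function below are purely hypothetical and do not reflect how update-alternatives stores its state.

```python
# Hypothetical in-memory model of the "gcc" alternatives group:
# path of each registered alternative -> its priority.
alternatives = {
    "/usr/bin/gcc-9": 20,
    "/usr/bin/gcc-7": 10,
}


def auto_choice(group):
    """Return the path with the highest priority, as automatic mode would pick."""
    return max(group, key=group.get)


print(auto_choice(alternatives))
```

A manual choice made with --config takes precedence over this rule until the group is switched back to automatic mode.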

2. Model training (examples)

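The examples folder contains training code for three kinds of data (image, pointcloud, video); this project only deals with image. The compressai library ships several compression models, each built on the base class CompressionModel by adding compression and reconstruction modules that together form an autoencoder. Taking bmshj2018-factorized as an example: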

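The docstring diagram shows the model architecture: the input image x passes through the encoder g_a, which maps it into a lower-dimensional latent space and produces the code (latent representation) y. Quantization Q and the entropy bottleneck EB then yield y_hat, and finally the decoder g_s produces the reconstruction x_hat.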

class FactorizedPrior(CompressionModel):
    r"""Factorized Prior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang,
    N. Johnston: `"Variational Image Compression with a Scale Hyperprior"
    <https://arxiv.org/abs/1802.01436>`_, Int Conf. on Learning Representations
    (ICLR), 2018.

    .. code-block:: none

                  ┌───┐    y
            x ──►─┤g_a├──►─┐
                  └───┘    │
                           ▼
                         ┌─┴─┐
                         │ Q │
                         └─┬─┘
                           │
                     y_hat ▼
                           │
                           ·
                        EB :
                           ·
                           │
                     y_hat ▼
                           │
                  ┌───┐    │
        x_hat ──◄─┤g_s├────┘
                  └───┘

        EB = Entropy bottleneck

    Args:
        N (int): Number of channels
        M (int): Number of channels in the expansion layers (last layer of the
            encoder and last layer of the hyperprior decoder)
    """

    def __init__(self, N, M, **kwargs):
        super().__init__(**kwargs)

        self.entropy_bottleneck = EntropyBottleneck(M)
        # Encoder (analysis transform)
        self.g_a = nn.Sequential(
            conv(3, N),
            GDN(N),
            conv(N, N),
            GDN(N),
            conv(N, N),
            GDN(N),
            conv(N, M),
        )
        # Decoder (synthesis transform)
        self.g_s = nn.Sequential(
            deconv(M, N),
            GDN(N, inverse=True),
            deconv(N, N),
            GDN(N, inverse=True),
            deconv(N, N),
            GDN(N, inverse=True),
            deconv(N, 3),
        )

        self.N = N
        self.M = M
    # Downsampling factor: each of the four conv layers in g_a halves the height and width
    @property
    def downsampling_factor(self) -> int:
        return 2**4
    # Forward pass
    def forward(self, x):
        y = self.g_a(x)
        y_hat, y_likelihoods = self.entropy_bottleneck(y)
        x_hat = self.g_s(y_hat)

        return {
            "x_hat": x_hat,
            "likelihoods": {
                "y": y_likelihoods,
            },
        }
    # Infer the channel counts N and M from a pretrained state_dict
    @classmethod
    def from_state_dict(cls, state_dict):
        """Return a new model instance from `state_dict`."""
        N = state_dict["g_a.0.weight"].size(0)
        M = state_dict["g_a.6.weight"].size(0)
        net = cls(N, M)
        net.load_state_dict(state_dict)
        return net
    # Compress: entropy-code the latent into byte strings
    def compress(self, x):
        y = self.g_a(x)
        y_strings = self.entropy_bottleneck.compress(y)
        return {"strings": [y_strings], "shape": y.size()[-2:]}
    # Decompress: reconstruct the image from the byte strings
    def decompress(self, strings, shape):
        assert isinstance(strings, list) and len(strings) == 1
        y_hat = self.entropy_bottleneck.decompress(strings[0], shape)
        x_hat = self.g_s(y_hat).clamp_(0, 1)
        return {"x_hat": x_hat}
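To make the downsampling factor concrete, the sketch below computes the latent shape that g_a would produce for a given image size, using only the 2**4 factor and the channel count M from the class above. The function name `latent_shape` and the divisibility check are my own additions for illustration.

```python
def latent_shape(height, width, M, downsampling_factor=2 ** 4):
    """Return (channels, height, width) of y for an RGB input of the given size.

    Assumes the input size is a multiple of the downsampling factor, which is
    why evaluation scripts usually pad images first.
    """
    if height % downsampling_factor or width % downsampling_factor:
        raise ValueError("pad the image to a multiple of the downsampling factor first")
    return (M, height // downsampling_factor, width // downsampling_factor)


c, h, w = latent_shape(512, 768, M=192)
print((c, h, w))

# Number of values in the image versus in the latent.
print(3 * 512 * 768, c * h * w)
```

With N=128 and M=192, a 512x768 RGB image (1,179,648 values) maps to a 192x32x48 latent (294,912 values), four times fewer values before quantization and entropy coding.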

2.1 Model training:

Training only runs the model's forward pass; compress and decompress are not called.

python train.py -m model -d dataset/path --cuda --save

The pretrained models that come with compressai.zoo can also be used. In the main function this looks like:

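# args.model is the model name; image_models maps it to the factory functions in compressai.zoo rather than the classes in compressai.models
# quality ranges from 1 to 8, metric is the optimization metric (mse / ms-ssim), pretrained selects whether to load pretrained weights
# quality determines the channel counts of the network and which pretrained weights are loaded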
net = image_models[args.model](quality, metric="mse", pretrained=False, progress=True)

2.2 Model evaluation (for reference):

import torch.nn.functional as F
from torchvision import transforms
import pandas as pd
import torch
import os
import sys
import math
import argparse
import time
from pytorch_msssim import ms_ssim
from PIL import Image
from net import FactorizedPrior

print(torch.cuda.is_available())


def compute_psnr(a, b):
    mse = torch.mean((a - b) ** 2).item()
    return -10 * math.log10(mse)


def compute_msssim(a, b):
    return -10 * math.log10(1 - ms_ssim(a, b, data_range=1.).item())


def compute_bpp(out_net):
    size = out_net['x_hat'].size()
    num_pixels = size[0] * size[2] * size[3]
    return sum(torch.log(likelihoods).sum() / (-math.log(2) * num_pixels)
               for likelihoods in out_net['likelihoods'].values()).item()


def pad(x, p):
    h, w = x.size(2), x.size(3)
    new_h = (h + p - 1) // p * p
    new_w = (w + p - 1) // p * p
    padding_left = (new_w - w) // 2
    padding_right = new_w - w - padding_left
    padding_top = (new_h - h) // 2
    padding_bottom = new_h - h - padding_top
    x_padded = F.pad(
        x,
        (padding_left, padding_right, padding_top, padding_bottom),
        mode="constant",
        value=0,
    )
    return x_padded, (padding_left, padding_right, padding_top, padding_bottom)


def crop(x, padding):
    return F.pad(
        x,
        (-padding[0], -padding[1], -padding[2], -padding[3]),
    )


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Example testing script.")
    parser.add_argument("--cuda", action="store_true", help="Use cuda")
    parser.add_argument(
        "--clip_max_norm",
        default=1.0,
        type=float,
        help="gradient clipping max norm (default: %(default)s)",
    )
    parser.add_argument("--checkpoint", type=str, help="Path to a checkpoint")
    parser.add_argument("--data", type=str, help="Path to dataset")
    # --real is off unless passed; it switches from forward() to actual entropy coding
    parser.add_argument(
        "--real", action="store_true", help="Run compress/decompress instead of forward"
    )
    args = parser.parse_args(argv)
    return args


def main(argv):
    args = parse_args(argv)
    p = 128
    path = args.data
    img_list = []
    for file in os.listdir(path):
        if file.lower().endswith((".jpg", ".jpeg", ".png")):
            img_list.append(file)
    if args.cuda:
        device = 'cuda:0'
    else:
        device = 'cpu'
    net = FactorizedPrior(64,128)
    net = net.to(device)
    net.eval()
    count = 0
    PSNR = 0
    Bit_rate = 0
    MS_SSIM = 0
    total_time = 0
    dictory = {}
    if args.checkpoint:  # load from previous checkpoint
        print("Loading", args.checkpoint)
        checkpoint = torch.load(args.checkpoint, map_location=device)
        # Drop the "module." prefix that DataParallel-style wrappers add to parameter names
        for k, v in checkpoint["state_dict"].items():
            dictory[k.replace("module.", "")] = v
        net.load_state_dict(dictory)
    if args.real:
        # Refresh the entropy bottleneck's CDF tables; compress()/decompress() need them
        net.update()
        for img_name in img_list:
            img_path = os.path.join(path, img_name)
            img = transforms.ToTensor()(Image.open(img_path).convert('RGB')).to(device)
            x = img.unsqueeze(0)
            x_padded, padding = pad(x, p)
            count += 1
            with torch.no_grad():
                if args.cuda:
                    torch.cuda.synchronize()
                s = time.time()
                out_enc = net.compress(x_padded)
                out_dec = net.decompress(out_enc["strings"], out_enc["shape"])
                if args.cuda:
                    torch.cuda.synchronize()
                e = time.time()
                total_time += (e - s)
                out_dec["x_hat"] = crop(out_dec["x_hat"], padding)
                num_pixels = x.size(0) * x.size(2) * x.size(3)
                print(f'Bitrate: {(sum(len(s[0]) for s in out_enc["strings"]) * 8.0 / num_pixels):.3f}bpp')
                print(f'MS-SSIM: {compute_msssim(x, out_dec["x_hat"]):.2f}dB')
                print(f'PSNR: {compute_psnr(x, out_dec["x_hat"]):.2f}dB')
                Bit_rate += sum(len(s[0]) for s in out_enc["strings"]) * 8.0 / num_pixels
                PSNR += compute_psnr(x, out_dec["x_hat"])
                MS_SSIM += compute_msssim(x, out_dec["x_hat"])

    else:
        for img_name in img_list:
            img_path = os.path.join(path, img_name)
            img = Image.open(img_path).convert('RGB')
            x = transforms.ToTensor()(img).unsqueeze(0).to(device)
            x_padded, padding = pad(x, p)
            count += 1
            with torch.no_grad():
                if args.cuda:
                    torch.cuda.synchronize()
                s = time.time()
                out_net = net.forward(x_padded)
                if args.cuda:
                    torch.cuda.synchronize()
                e = time.time()
                total_time += (e - s)
                out_net['x_hat'].clamp_(0, 1)
                out_net["x_hat"] = crop(out_net["x_hat"], padding)
                print(f'PSNR: {compute_psnr(x, out_net["x_hat"]):.2f}dB')
                print(f'MS-SSIM: {compute_msssim(x, out_net["x_hat"]):.2f}dB')
                print(f'Bit-rate: {compute_bpp(out_net):.3f}bpp')
                PSNR += compute_psnr(x, out_net["x_hat"])
                MS_SSIM += compute_msssim(x, out_net["x_hat"])
                Bit_rate += compute_bpp(out_net)
    PSNR = PSNR / count
    MS_SSIM = MS_SSIM / count
    Bit_rate = Bit_rate / count
    total_time = total_time / count
    print(f'average_PSNR: {PSNR:.2f}dB')
    print(f'average_MS-SSIM: {MS_SSIM:.2f}dB')
    print(f'average_Bit-rate: {Bit_rate:.3f} bpp')
    print(f'average_time: {total_time:.3f} s')


if __name__ == "__main__":
    print(torch.cuda.is_available())
    main(sys.argv[1:])
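The arithmetic inside pad() is easier to follow on plain numbers. The sketch below repeats the same computation without tensors; the function name `pad_amounts` is introduced here for illustration only. Note that the script pads to multiples of p = 128, which is stricter than the factor of 16 that FactorizedPrior itself needs.

```python
def pad_amounts(h, w, p):
    """Return the padded size and the (left, right, top, bottom) padding.

    Mirrors pad(): round h and w up to the next multiple of p and split the
    extra pixels as evenly as possible between the two sides.
    """
    new_h = (h + p - 1) // p * p
    new_w = (w + p - 1) // p * p
    left = (new_w - w) // 2
    right = new_w - w - left
    top = (new_h - h) // 2
    bottom = new_h - h - top
    return (new_h, new_w), (left, right, top, bottom)


print(pad_amounts(500, 700, 128))
print(pad_amounts(512, 768, 128))
```

crop() undoes this by passing the same four numbers to F.pad with negative signs, so the metrics are computed on the original image area only.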

2.3 Model inference: a single image

import math
import torch
from torchvision import transforms

from PIL import Image

from pytorch_msssim import ms_ssim
from compressai.zoo import bmshj2018_factorized
from compressai.models.google import FactorizedPrior

def compute_psnr(a, b):
    mse = torch.mean((a - b) ** 2).item()
    return -10 * math.log10(mse)


def compute_msssim(a, b):
    return ms_ssim(a, b, data_range=1.).item()


def compute_bpp(out_net):
    size = out_net['x_hat'].size()
    num_pixels = size[0] * size[2] * size[3]
    return sum(torch.log(likelihoods).sum() / (-math.log(2) * num_pixels)
               for likelihoods in out_net['likelihoods'].values()).item()


if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(device)
    checkpoint_path = 'checkpoint_loss_best.pth.tar'
    checkpoint = torch.load(checkpoint_path, map_location=device)
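    net = FactorizedPrior(128, 192)
    net.load_state_dict(checkpoint['state_dict'])
    # Move the model to the same device as the input and switch to evaluation mode
    net = net.to(device).eval()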
    print(f'Parameters: {sum(p.numel() for p in net.parameters())}')

    img = Image.open('Kodak24/kodim01.png').convert('RGB')
    transform = transforms.Compose([
        transforms.Resize((512, 512)),
        transforms.ToTensor(),
    ])
    x = transform(img).unsqueeze(0).to(device)

    with torch.no_grad():
        out_net = net.forward(x)
        out_net['x_hat'].clamp_(0, 1)
        print(out_net.keys())

    print(f'PSNR: {compute_psnr(x, out_net["x_hat"]):.2f}dB')
    print(f'MS-SSIM: {compute_msssim(x, out_net["x_hat"]):.4f}')
    print(f'Bit-rate: {compute_bpp(out_net):.3f} bpp')
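The three metrics used in sections 2.2 and 2.3 reduce to short formulas, shown below on plain numbers instead of tensors. Pixel values are in [0, 1], so the peak value is 1 and PSNR simplifies to -10·log10(MSE). The function names are illustrative, and the input numbers are made up for the example rather than measured results.

```python
import math


def psnr_from_mse(mse):
    """PSNR in dB for images scaled to [0, 1]."""
    return -10 * math.log10(mse)


def msssim_to_db(msssim):
    """MS-SSIM on the dB scale used in section 2.2: -10 * log10(1 - MS-SSIM)."""
    return -10 * math.log10(1 - msssim)


def bpp_from_likelihoods(likelihoods, num_pixels):
    """Estimated bits per pixel: total of -log2(p) over all latent symbols,
    divided by the number of image pixels."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels


print(psnr_from_mse(1e-3))
print(msssim_to_db(0.99))
# Four latent symbols describing a 4-pixel image.
print(bpp_from_likelihoods([0.5, 0.25, 0.5, 0.25], num_pixels=4))
```

Section 2.2 reports MS-SSIM in dB while section 2.3 reports the raw value, so the two scripts' MS-SSIM numbers are not directly comparable without this conversion.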