[Experiment Log] Running the Stable Diffusion based lossy image compression example locally

Original article by the author: https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202

Link to the original notebook: Stable Diffusion based lossy image compression.ipynb - Colab (google.com)

Disclaimer

This post is my personal record of trying to deploy the image compression project offline; it is for reference only. Almost all files were downloaded elsewhere and then copied onto the server, so if you run the original notebook exactly as published, there is no need to read this.

Prerequisite experiment

Following the example, first get Stable Diffusion running locally: Stable Diffusion 跑通总结_sd-v1-4.ckpt 下载-CSDN博客

My own troubleshooting notes from that prerequisite experiment: 【排错记录】在Ubuntu上部署stable diffusion(非webui)-CSDN博客

Preparation

Set up Jupyter Notebook on a server that can already run Stable Diffusion.

My experiment environment is the ldm1 conda environment configured in the prerequisite experiment. Before installing anything, activate it first:

conda activate ldm1

and then install Jupyter Notebook inside this environment.

After installing, I added the code auto-completion plugins (personal habit: I have them on my own machine, and having them on the server makes me happier too). Plugin installation reference: Jupyter Notebook实现自动补全功能(并且可以让Jupyter Notebook界面更好看)_jupyter notebook代码自动补全-CSDN博客

Change the notebook's default directory to wherever you prefer. Reference: Ubuntu更改jupyter-notebook的默认文件目录_为什么ubuntu的jupyter文件路径-CSDN博客
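For the record, that change boils down to a single line in the Jupyter configuration file. A minimal sketch, assuming the classic notebook interface (generate ~/.jupyter/jupyter_notebook_config.py with "jupyter notebook --generate-config" first); the directory is simply my own choice from the preparation above:

# in ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.notebook_dir = '/home/user/Documents/jupyter_files'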

Download the ipynb file and put it on the local machine. Preparation is now done, and the experiment begins below.

Experiment steps (these went wrong; the working source code is given later, this section only records my failed path)

cell1

Run the first cell to install the missing packages and complete the environment.

Then read through the later cells to get a clear picture of how the code works.

cell2

Cell 2 configures the input/output paths, the Hugging Face access token, and the path to the pretrained model.

Since the earlier experiment had already downloaded the v1-4 model locally and txt2img was already working, I used my own settings here. My settings:

Under the Jupyter default directory (which I had already changed to the Documents/jupyter_files folder) I created two new folders, input and output, as the input and output paths. The token is left unset because I do not need to download anything from Hugging Face here, and the model path is set to the absolute path of the local model.

{Edited while working on cell 7: running cell 7 later actually complains that files cannot be found, so both input and output must be given as absolute paths here.}
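For reference, my cell 2 ends up looking like the lines below (the same settings appear again in the full source at the end of this post; the paths are specific to my machine):

input_folder = '/home/user/Documents/jupyter_files/input'
output_folder = '/home/user/Documents/jupyter_files/output'
# huggingface_token = ''  # unset, nothing is downloaded from Hugging Face
pretrained_model_name_or_path = "/home/user/Documents/sd-v1.4-for-compression/"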

cell3

Cell 3 mounts Google Drive and loads the test images to be compressed.

Source of the test images used in this experiment: https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202

Open the author's article, download the original jpg files by hand, and put them into the input_folder path configured in cell 2.

Here I chose to simply comment the cell out rather than run it (commenting it out guards against someone clicking Jupyter's "restart kernel and run all cells").

cell4 

Purpose: log in to Hugging Face and download the required models. Since the earlier experiment already set up the model locally, and cell 2 already points the model path at the local absolute path, I again commented the whole cell out, for the same reason as above.

 cell5

Defines the helper functions. I tried running it directly and got an error:

Cause: import torch lives in cell 4, which was commented out entirely. Adding an extra cell that imports torch before running cell 5 fixes it.

{Edited after working through cell 7: the final run failed again, because cell 4 also provides the autocast import, so one more import is needed, as in the extra cell below.}
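The extra cell I added before cell 5 simply restores the two imports that the commented-out cell 4 would otherwise have provided:

import torch
from torch.cuda.amp import autocast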

cell6

Defines the compression scheme; just run it as-is.

cell7

Loops over the images in the input folder, resizes each one to 512x512, and then compresses it.

From reading the code and the explanatory markdown cell:

A folder named rescaled is created inside the input folder to hold the images rescaled to 512x512, and the images in that folder are then compressed.

The original notebook asks you to create a folder named "compression_test" in the root of your drive (in my case, my own local input path), create an "input" folder inside compression_test, put some JPG and PNG test images into it, and then run the cell below, which creates an output folder and displays the compression results.

One small catch: the input path in cell 2 needs to be an absolute path, so it had to be corrected (already corrected in cell 2 above).

Running it produced an error:

Cause: the following line from cell 4 was never executed:

from torch.cuda.amp import autocast

After importing autocast I tried again and hit yet another error:

Well, it turns out the download stage in cell 4 cannot simply be bypassed to get the compression working. For several reasons, I instead downloaded everything elsewhere and imported it onto the machine, loading all required pretrained weights from local files.

Second attempt

The entire v1-4 repository from Hugging Face is now stored locally, so the crux of the problem becomes how to handle the code in cell 4. To understand what the key functions in cell 4 actually do, there was no way around consulting the API documentation.

After customizing a few things I tried to run cell 4's code and found that it fails already at the import stage.

Some searching revealed that the imports fail because the installed diffusers and transformers versions are incompatible; after updating transformers, diffusers imports normally again and this problem is solved.
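The update itself is just the usual upgrade command run as a notebook cell (a minimal sketch; pin an explicit version instead if your diffusers release requires one):

!pip install -U transformers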

Next problem: 'PNDMScheduler' object has no attribute 'set_format'

Fix: delete the .set_format("pt") call.
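In other words, the scheduler construction in cell 4 changes roughly as follows (newer diffusers releases dropped set_format, so the trailing call is simply removed):

# before (original notebook):
# scheduler = PNDMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
#                           num_train_timesteps=1000, skip_prk_steps=True).set_format("pt")
# after:
scheduler = PNDMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
                          num_train_timesteps=1000, skip_prk_steps=True)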

Trying again, the result:

ValueError                                Traceback (most recent call last)
Cell In[12], line 26
     24 f = os.path.join(rescaled_folder, filename)
     25 if os.path.isfile(f):
---> 26   compress_input(f, os.path.splitext(os.path.join(output_folder, filename))[0])
     27   time.sleep(0.1) # sleep so execution can be interrupted

Cell In[11], line 49, in compress_input(input_file, output_path)
     47 display(img_from_latents)
     48 print('VAE roundtrip')
---> 49 print_metrics(gt_img, img_from_latents)
     51 # Quantize latent representation and save as lossless webp image
     52 quantized = quantize(latents)

Cell In[10], line 50, in print_metrics(gt, img)
     48 img = np.array(img)
     49 print('PSNR: ' + str(get_psnr(gt, img)))
---> 50 print('SSIM: ' + str(get_ssim(gt, img, multichannel=True, data_range=img.max() - img.min())))

File ~/anaconda3/envs/ldm1/lib/python3.8/site-packages/skimage/metrics/_structural_similarity.py:178, in structural_similarity(im1, im2, win_size, gradient, data_range, channel_axis, gaussian_weights, full, **kwargs)
    175         win_size = 7   # backwards compatibility
    177 if np.any((np.asarray(im1.shape) - win_size) < 0):
--> 178     raise ValueError(
    179         'win_size exceeds image extent. '
    180         'Either ensure that your images are '
    181         'at least 7x7; or pass win_size explicitly '
    182         'in the function call, with an odd value '
    183         'less than or equal to the smaller side of your '
    184         'images. If your images are multichannel '
    185         '(with color channels), set channel_axis to '
    186         'the axis number corresponding to the channels.')
    188 if not (win_size % 2 == 1):
    189     raise ValueError('Window size must be odd.')

ValueError: win_size exceeds image extent. Either ensure that your images are at least 7x7; or pass win_size explicitly in the function call, with an odd value less than or equal to the smaller side of your images. If your images are multichannel (with color channels), set channel_axis to the axis number corresponding to the channels.

Solution:

Set win_size manually:

Change the last line of cell 5 to:

print('SSIM: ' + str(get_ssim(gt, img, multichannel=True, data_range=img.max() - img.min(), win_size=3)))

Result:

Input:

Output:

It still looks quite blurry, nowhere near the effect in the original article.

Debugging (this took several days)

On Ubuntu, double-clicking only displays jpg images, not webp images, yet the final output is actually in webp format. This misled me: looking only at the jpg, I saw the compression but none of the refinement.
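Pillow has no trouble with webp, so the output files can also be inspected inside the notebook; a minimal sketch, where the file name is just a placeholder for one of the webp files written to the output folder:

from PIL import Image
display(Image.open('/home/user/Documents/jupyter_files/output/example_sd_quantized_latents.webp'))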

Even so, the results were still far from those in the original article, where the processing is genuinely detailed; even the keychain is rendered in fine detail.

Going back to the original article, I realized that the source data I had downloaded had already been compressed to jpg. The input was already very blurry, and the output can only resemble its input, so it is blurry too.

Further edit:

The jpg and webp outputs are only there as a comparison experiment: comparing PSNR and SSIM values, plus a visual check, to show that the SD-based compression scheme is the better one, with smaller files and higher perceived quality.

Experiment results

Overall the pipeline produces images with a small footprint and fairly good quality. However, this small result is not stored as an image: it sits in the output folder as a .bin binary file (in the camel example, the binary file occupies 5.1 kB on disk, while the computed size is 4.77 kB). Converting it to jpg or jpeg gives about 50 kB; saving as png takes 480 kB; saving as webp takes 45.6 kB.

Repeated experiments show that the reconstructed image can never be sharper than the original. This is not simply Stable Diffusion's img2img; it is a pipeline whose internal steps push the final result toward the original input.

So if the uploaded source photo is already blurry, the result will not be sharp either.

The final result after U-Net de-noising is displayed in the Jupyter notebook rather than being saved locally as an image file.

Modified source code (paste cell by cell)

The source below is adapted to loading the required models manually from local files. If you'd rather not bother, just run the original notebook as published.

# In a fresh environment, run this cell once to install the required packages; comment it out on later runs

!pip install -qq diffusers["training"] transformers ftfy
!pip install -qq libimagequant
!pip install -qq mozjpeg-lossless-optimization
!pip install -qq scikit-image
!pip install Pillow -U

Point the input and output folders at local paths. You can download the Hugging Face repository to a local directory in advance and put that directory on the fourth line. The third line is commented out because I have already downloaded everything locally and no longer need a token to download.

input_folder = '/home/user/Documents/jupyter_files/input'
output_folder = '/home/user/Documents/jupyter_files/output'
# huggingface_token = ''
pretrained_model_name_or_path = "/home/user/Documents/sd-v1.4-for-compression/"
from diffusers import AutoencoderKL, UNet2DConditionModel, UNet2DModel, StableDiffusionImg2ImgPipeline
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from diffusers.schedulers import DDIMScheduler, LMSDiscreteScheduler, PNDMScheduler
import torch
from torch.cuda.amp import autocast

#torch_device = "cpu"
torch_device = "cuda"

vae = AutoencoderKL.from_pretrained(
    pretrained_model_name_or_path, subfolder="vae",local_files_only=True
).to(torch_device)

unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path, subfolder="unet",local_files_only=True
).to(torch_device)

scheduler = PNDMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    num_train_timesteps=1000, skip_prk_steps=True
)
# scheduler = scheduler.to(torch_device)

text_encoder = CLIPTextModel.from_pretrained(
    pretrained_model_name_or_path, subfolder="text_encoder",local_files_only=True
)

tokenizer = CLIPTokenizer.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="tokenizer",
    local_files_only=True
    #revision=pretrained_model_revision, torch_dtype=torch.float16
)

uncond_input = tokenizer([""], padding="max_length", max_length=tokenizer.model_max_length, return_tensors="pt")
with torch.no_grad():
  uncond_embeddings = text_encoder(uncond_input.input_ids)[0].to(torch_device)

Note: in the code below I added the parameter win_size=3 on the last line of print_metrics; with the default value of 7 it would not run on my machine.

import PIL
from PIL import Image
import numpy as np
import inspect
import io
import libimagequant as liq
import zlib
import gc
import time
import mozjpeg_lossless_optimization
from skimage.metrics import structural_similarity as get_ssim
from skimage.metrics import peak_signal_noise_ratio as get_psnr

@torch.no_grad()
def to_latents(img:Image):
  np_img = (np.array(img).astype(np.float32) / 255.0) * 2.0 - 1.0
  np_img = np_img[None].transpose(0, 3, 1, 2)
  torch_img = torch.from_numpy(np_img)
  with autocast():
    generator = torch.Generator("cuda").manual_seed(0)
    latents = vae.encode(torch_img.to(vae.dtype).to(torch_device)).latent_dist.sample(generator=generator)
  return latents

@torch.no_grad()
def to_img(latents):
  with autocast():
    torch_img = vae.decode(latents.to(vae.dtype).to(torch_device)).sample
  torch_img = (torch_img / 2 + 0.5).clamp(0, 1)
  np_img = torch_img.cpu().permute(0, 2, 3, 1).detach().numpy()[0]
  np_img = (np_img * 255.0).astype(np.uint8)
  img = Image.fromarray(np_img)
  return img

def resize_to_512(input_file, output_file):
  img = Image.open(input_file).convert('RGB')
  #center crop image
  maxdim = max(img.width, img.height)
  mindim = min(img.width, img.height)
  left = max(0, (img.width - img.height) // 2 - 1)
  top = max(0, (img.height - img.width) // 2 - 1)
  img = img.crop((left, top, left + mindim - 1, top + mindim - 1))
  #resize
  img = img.resize((512,512), Image.LANCZOS)
  img.save(output_file, lossless = True, quality = 100)

def print_metrics(gt, img):
  gt = np.array(gt)
  img = np.array(img)
  print('PSNR: ' + str(get_psnr(gt, img)))
  print('SSIM: ' + str(get_ssim(gt, img, multichannel=True, data_range=img.max() - img.min(), win_size=3)))
def quantize(latents):
  quantized_latents = (latents / (255 * 0.18215) + 0.5).clamp(0,1)
  quantized = quantized_latents.cpu().permute(0, 2, 3, 1).detach().numpy()[0]
  quantized = (quantized * 255.0 + 0.5).astype(np.uint8)
  return quantized

def unquantize(quantized):
  unquantized = quantized.astype(np.float32) / 255.0
  unquantized = unquantized[None].transpose(0, 3, 1, 2)
  unquantized_latents = (unquantized - 0.5) * (255 * 0.18215)
  unquantized_latents = torch.from_numpy(unquantized_latents)
  return unquantized_latents.to(torch_device)

@torch.no_grad()
def denoise(latents):
  latents = latents * 0.18215
  step_size = 15
  num_inference_steps = scheduler.config.get("num_train_timesteps", 1000) // step_size
  strength = 0.04
  scheduler.set_timesteps(num_inference_steps)
  offset = scheduler.config.get("steps_offset", 0)
  init_timestep = int(num_inference_steps * strength) + offset
  init_timestep = min(init_timestep, num_inference_steps)
  timesteps = scheduler.timesteps[-init_timestep]
  timesteps = torch.tensor([timesteps], dtype=torch.long, device=torch_device)
  extra_step_kwargs = {}
  if "eta" in set(inspect.signature(scheduler.step).parameters.keys()):
    extra_step_kwargs["eta"] = 0.9
  latents = latents.to(unet.dtype).to(torch_device)
  t_start = max(num_inference_steps - init_timestep + offset, 0)
  with autocast():
    for i, t in enumerate(scheduler.timesteps[t_start:]):
      noise_pred = unet(latents, t, encoder_hidden_states=uncond_embeddings).sample
      latents = scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
  #reset scheduler to free cached noise predictions
  scheduler.set_timesteps(1)
  return latents / 0.18215

def compress_input(input_file, output_path):
  gt_img = Image.open(input_file)
  display(gt_img)
  print('Ground Truth')

  # Display VAE roundtrip image
  latents = to_latents(gt_img)
  img_from_latents = to_img(latents)
  display(img_from_latents)
  print('VAE roundtrip')
  print_metrics(gt_img, img_from_latents)

  # Quantize latent representation and save as lossless webp image
  quantized = quantize(latents)
  del latents
  quantized_img = Image.fromarray(quantized)
  quantized_img.save(output_path + "_sd_quantized_latents.webp", lossless=True, quality=100)

  # Display VAE decoded image from 8-bit quantized latents
  unquantized_latents = unquantize(quantized)
  unquantized_img = to_img(unquantized_latents)
  display(unquantized_img)
  del unquantized_latents
  print('VAE decoded from 8-bit quantized latents')
  print_metrics(gt_img, unquantized_img)

  # further quantize to palette. Use libimagequant for Dithering
  attr = liq.Attr()
  attr.speed = 1
  attr.max_colors = 256
  input_image = attr.create_rgba(quantized.flatten('C').tobytes(),
                                 quantized_img.width,
                                 quantized_img.height,
                                 0)
  quantization_result = input_image.quantize(attr)
  quantization_result.dithering_level = 1.0
  # Get the quantization result
  out_pixels = quantization_result.remap_image(input_image)
  out_palette = quantization_result.get_palette()
  np_indices = np.frombuffer(out_pixels, np.uint8)
  np_palette = np.array([c for color in out_palette for c in color], dtype=np.uint8)

  sd_palettized_bytes = io.BytesIO()
  np.savez_compressed(sd_palettized_bytes, w=64, h=64, i=np_indices.flatten(), p=np_palette)
  with open(output_path + ".npz", "wb") as f:
    f.write(sd_palettized_bytes.getbuffer())

  # Compress the dithered 8-bit latents using zlib and save them to disk
  compressed_bytes = zlib.compress(
      np.concatenate((np_palette, np_indices), dtype=np.uint8).tobytes(),
      level=9
      )
  with open(output_path + ".bin", "wb") as f:
    f.write(compressed_bytes)
  sd_bytes = len(compressed_bytes)

  # Display VAE decoding of dithered 8-bit latents
  np_indices = np_indices.reshape((64,64))
  palettized_latent_img = Image.fromarray(np_indices, mode='P')
  palettized_latent_img.putpalette(np_palette, rawmode='RGBA')
  latents = np.array(palettized_latent_img.convert('RGBA'))
  latents = unquantize(latents)
  palettized_img = to_img(latents)
  display(palettized_img)
  print('VAE decoding of palettized and dithered 8-bit latents')
  print_metrics(gt_img, palettized_img)

  # Use Stable Diffusion U-Net to de-noise the dithered latents
  latents = denoise(latents)
  denoised_img = to_img(latents)
  display(denoised_img)
  del latents
  print('VAE decoding of de-noised dithered 8-bit latents')
  print('size: {}b = {}kB'.format(sd_bytes, sd_bytes/1024.0))
  print_metrics(gt_img, denoised_img)

  # The four save formats below can each be commented out if not needed.
  # Saving under output_path keeps results from different input images from overwriting each other.

  denoised_img.save(output_path + '_denoised.png')   # save as PNG
  print('Denoised image saved as {}_denoised.png'.format(output_path))

  denoised_img.save(output_path + '_denoised.jpg')   # save as JPG
  print('Denoised image saved as {}_denoised.jpg'.format(output_path))

  denoised_img.save(output_path + '_denoised.webp')  # save as WebP
  print('Denoised image saved as {}_denoised.webp'.format(output_path))

  denoised_img.save(output_path + '_denoised.jpeg')  # save as JPEG
  print('Denoised image saved as {}_denoised.jpeg'.format(output_path))


  # Everything below can be commented out entirely; it only serves as a comparison,
  # producing the JPG-compressed and WebP-compressed results respectively.
    
#   # Find JPG compression settings that result in closest data size that is larger than SD compressed data
#   jpg_bytes = io.BytesIO()
#   q = 0
#   while jpg_bytes.getbuffer().nbytes < sd_bytes:
#     jpg_bytes = io.BytesIO()
#     gt_img.save(jpg_bytes, format="JPEG", quality=q, optimize=True, subsampling=1)
#     jpg_bytes.flush()
#     jpg_bytes.seek(0)
#     jpg_bytes = io.BytesIO(mozjpeg_lossless_optimization.optimize(jpg_bytes.read()))
#     jpg_bytes.flush()
#     q += 1

#   with open(output_path + ".jpg", "wb") as f:
#     f.write(jpg_bytes.getbuffer())
#   jpg = Image.open(jpg_bytes)
#   try:
#     display(jpg)
#     print('JPG compressed with quality setting: {}'.format(q))
#     print('size: {}b = {}kB'.format(jpg_bytes.getbuffer().nbytes, jpg_bytes.getbuffer().nbytes / 1024.0))
#     print_metrics(gt_img, jpg)
#   except:
#     print('something went wrong compressing {}.jpg'.format(output_path))

#   webp_bytes = io.BytesIO()
#   q = 0
#   while webp_bytes.getbuffer().nbytes < sd_bytes:
#     webp_bytes = io.BytesIO()
#     gt_img.save(webp_bytes, format="WEBP", quality=q, method=6)
#     webp_bytes.flush()
#     q += 1

#   with open(output_path + ".webp", "wb") as f:
#     f.write(webp_bytes.getbuffer())
#   try:
#     webp = Image.open(webp_bytes)
#     display(webp)
#     print('WebP compressed with quality setting: {}'.format(q))
#     print('size: {}b = {}kB'.format(webp_bytes.getbuffer().nbytes, webp_bytes.getbuffer().nbytes / 1024.0))
#     print_metrics(gt_img, webp)
#   except:
#     print('something went wrong compressing {}.webp'.format(output_path))
import os
import shutil
import time
from tqdm import tqdm

rescaled_folder = input_folder + "/rescaled"

if not os.path.isdir(rescaled_folder):
  os.mkdir(rescaled_folder)
print('rescaling images to 512x512')
for i, filename in tqdm(enumerate(os.listdir(input_folder))):
  f_in = os.path.join(input_folder, filename)
  f_out = os.path.join(rescaled_folder, os.path.splitext(filename)[0] + ".png")
  if os.path.isfile(f_in) and not os.path.isfile(f_out):
    try:
      resize_to_512(f_in, f_out)
    except:
      print("skipping {} because the file could not be opened.".format(filename))

if os.path.isdir(output_folder):
  shutil.rmtree(output_folder)
os.mkdir(output_folder)
for filename in os.listdir(rescaled_folder):
  f = os.path.join(rescaled_folder, filename)
  if os.path.isfile(f):
    compress_input(f, os.path.splitext(os.path.join(output_folder, filename))[0])
    time.sleep(0.1) # sleep so execution can be interrupted

Run the cells above one by one and the output appears in the Jupyter notebook.

Remaining issues

1. The source code forces every image down to 512x512, so parts of both sides of rectangular images are cropped away and the full picture cannot be recovered.

2. The small-file, high-fidelity result shown in the example is an idealized one: it is the image decoded from the binary file and displayed in the notebook, and the reported file size is computed from the binary file, not from the four image formats I additionally save.

Next steps

1. Modify the source code so that it can handle images of arbitrary resolution.

2. Re-read the source code and work out exactly how the binary file is turned back into an image (a rough decode sketch is attached after this list).

3. For transmission, only the small binary file needs to be sent; figure out how the receiving end should be set up so that incoming binary files are automatically decoded into images and saved.
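Based on how compress_input writes the .bin file above (a zlib stream containing the palette bytes followed by the 64x64 indices), decoding on the receiving side should look roughly like the sketch below. This is an untested sketch, not part of the original notebook: the function name decompress_bin is my own, it assumes the vae, unet, scheduler and the helpers unquantize, denoise and to_img from the cells above are already loaded, and it recovers the palette as everything before the final 64*64 index bytes.

import zlib
import numpy as np
from PIL import Image

def decompress_bin(bin_path, output_image_path):
  # read and zlib-decompress the payload written by compress_input
  with open(bin_path, 'rb') as f:
    data = zlib.decompress(f.read())

  # the payload is the RGBA palette followed by the 64x64 palette indices
  num_index_bytes = 64 * 64
  np_palette = np.frombuffer(data[:-num_index_bytes], dtype=np.uint8)
  np_indices = np.frombuffer(data[-num_index_bytes:], dtype=np.uint8).reshape((64, 64))

  # rebuild the palettized latent image, mirroring the display code in compress_input
  palettized_latent_img = Image.fromarray(np_indices, mode='P')
  palettized_latent_img.putpalette(np_palette, rawmode='RGBA')
  latents = np.array(palettized_latent_img.convert('RGBA'))

  # undo the 8-bit quantization, de-noise with the U-Net, decode with the VAE, and save
  latents = unquantize(latents)
  latents = denoise(latents)
  img = to_img(latents)
  img.save(output_image_path)
  return img

A call like decompress_bin('/path/to/received.bin', '/path/to/restored.png') would then cover goal 3, saving the reconstructed image automatically on the receiving end.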
