END-TO-END OPTIMIZED IMAGE COMPRESSION: error log and notes (tensorflow-gpu 2.5, tensorflow-compression 2.2)

This post records running the end-to-end image compression code under tensorflow-gpu 2.5, covering training, GPU resource usage, a JSON parsing issue, and the compression pipeline. Challenges encountered include a GPU/CUDA version mismatch, out-of-memory errors, and the "No algorithm worked" error, ultimately resolved by adjusting the CUDA and cuDNN versions. The post also details how to compute the average PSNR over the Kodak dataset at different bit rates.

Corresponding GitHub repository:
GitHub
First, install tensorflow-compression:

pip install tensorflow-compression==2.2

Then install tensorflow-datasets:

pip install tensorflow-datasets
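Since the issues below turned out to be CUDA/cuDNN related, a quick sanity check of the TensorFlow version and GPU visibility is worthwhile before training (my own snippet, not part of the repository):

import tensorflow as tf

print(tf.__version__)                          # expect 2.5.x
print(tf.config.list_physical_devices("GPU"))  # should list at least one GPU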

The dataset was originally the ImageNet validation set; it was later switched to the CLIC dataset.

Preface

A record of some useful commands.
(1) Command for training the model by running the Python script (specific to this codebase):

python bls2017.py --verbose train --train_glob="/home/ll/END-TO-END-OPTIMIZED-IMAGE-COMPRESSION/image/*JPEG"

If instead the default CLIC dataset should be downloaded:

python bls2017.py -V train

(2) Extracting the dataset archives on Linux
tar

tar -xvf ILSVRC2012_img_val.tar -C image

zip

unzip kodak.zip

(3) screen
screen -S <session name>          create a new session

screen -ls                        list existing sessions

screen -r <session name/ID>       reattach to a session

Ctrl+a, then d                    detach; the session keeps running in the background

screen -S <session ID> -X quit    kill the session with the given ID

exit (run inside a session)       terminate the session

For a more detailed tutorial, see the linked guide on basic screen usage.
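A typical workflow for a long training run (my own example; the session name is arbitrary):

screen -S compress
python bls2017.py -V train

Then press Ctrl+a, then d to detach, and reattach later with screen -r compress.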
(4) Saving the training log
See the linked post on redirecting training output to a log file.
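For example (one common approach, with an arbitrary file name), pipe stdout and stderr through tee so the log is both displayed and saved:

python bls2017.py -V train 2>&1 | tee train.log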

Training

1. Training with the default command

Using the default CLIC dataset download:

python bls2017.py -V train

This ran without errors. Part of the log after training completed successfully:

2022-03-30 19:01:41.058725: W tensorflow/core/lib/png/png_io.cc:88] PNG warning: iCCP: known incorrect sRGB profile
2022-03-30 19:01:41.059903: W tensorflow/core/lib/png/png_io.cc:88] PNG warning: iCCP: known incorrect sRGB profile
2022-03-30 19:01:41.063225: W tensorflow/core/lib/png/png_io.cc:88] PNG warning: iCCP: known incorrect sRGB profile
1000/1000 [==============================] - 540s 540ms/step - loss: 0.9337 - bpp: 0.4624 - mse: 47.1311 - val_loss: 0.8934 - val_bpp: 0.3957 - val_mse: 49.7729
bls2017 args.model_path
W0330 19:01:48.329876 140611135039296 continuous_batched.py:276] Computing quantization offsets using offset heuristic within a tf.function. Ideally, the offset heuristic should only be used to determine offsets once after training. Depending on the prior, estimating the offset might be computationally expensive.
W0330 19:01:49.131051 140611135039296 continuous_batched.py:276] Computing quantization offsets using offset heuristic within a tf.function. Ideally, the offset heuristic should only be used to determine offsets once after training. Depending on the prior, estimating the offset might be computationally expensive.
W0330 19:01:49.467364 140611135039296 continuous_batched.py:276] Computing quantization offsets using offset heuristic within a tf.function. Ideally, the offset heuristic should only be used to determine offsets once after training. Depending on the prior, estimating the offset might be computationally expensive.
W0330 19:01:49.860946 140611135039296 continuous_batched.py:276] Computing quantization offsets using offset heuristic within a tf.function. Ideally, the offset heuristic should only be used to determine offsets once after training. Depending on the prior, estimating the offset might be computationally expensive.
W0330 19:01:52.657727 140611135039296 save.py:265] Found untraced functions such as gdn_0_layer_call_fn, gdn_0_layer_call_and_return_conditional_losses, gdn_1_layer_call_fn, gdn_1_layer_call_and_return_conditional_losses, igdn_0_layer_call_fn while saving (showing 5 of 8). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: bls2017/assets
I0330 19:01:54.141711 140611135039296 builder_impl.py:780] Assets written to: bls2017/assets

2. Walking through the training flow

Stepping through the code with breakpoints to trace the execution, I saw that the training code only calls model.compile and model.fit, while the actual loss, together with the bpp and mse computations, lives in the model class's call method; I could not find where call is invoked.

def train(args):
  """Instantiates and trains the model.实例化并训练模型。"""
  if args.check_numerics:
    tf.debugging.enable_check_numerics() # check tensors for NaN/Inf values

  model = BLS2017Model(args.lmbda, args.num_filters)
  # Configure training; the loss is the rate-distortion objective bpp + lmbda * mse.
  model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), # Adam optimizer with a fixed learning rate
  )

  if args.train_glob: # a dataset path was given (crop directly, without size filtering)
    train_dataset = get_custom_dataset("train", args)
    validation_dataset = get_custom_dataset("validation", args)
  else: # no dataset path given: download the default CLIC dataset (filter by size, then crop)
    train_dataset = get_dataset("clic", "train", args)
    validation_dataset = get_dataset("clic", "validation", args)
  validation_dataset = validation_dataset.take(args.max_validation_steps)

  model.fit(
      train_dataset.prefetch(8), # prefetch batches to overlap input loading with training
      epochs=args.epochs,
      steps_per_epoch=args.steps_per_epoch,
      validation_data=validation_dataset.cache(),
      validation_freq=1,
      callbacks=[
          tf.keras.callbacks.TerminateOnNaN(),
          tf.keras.callbacks.TensorBoard(
              log_dir=args.train_path,
              histogram_freq=1, update_freq="epoch"),
          tf.keras.callbacks.experimental.BackupAndRestore(args.train_path),
      ],
      verbose=int(args.verbose), # logging verbosity
  )
  print(args.model_path, 'args.model_path')
  model.save(args.model_path)
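Part of the answer is in this same file: model.fit drives a per-batch train_step, and it is there that self(x, training=True), i.e. tf.keras.Model.__call__, ends up dispatching to call. A sketch along the lines of the upstream bls2017.py (simplified; the exact metric bookkeeping may differ):

def train_step(self, x):
  with tf.GradientTape() as tape:
    # self(...) goes through Model.__call__, which dispatches to call(x, training)
    loss, bpp, mse = self(x, training=True)
  variables = self.trainable_variables
  gradients = tape.gradient(loss, variables)
  self.optimizer.apply_gradients(zip(gradients, variables))
  return {"loss": loss, "bpp": bpp, "mse": mse}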

So I looked into how __init__, build, and call work for custom layers in TensorFlow 2.x, following the post "Tensorflow2自定义Layers之__init__,build和call详解". The code from that post is as follows:

import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
  def __init__(self, num_outputs):
    super(MyDenseLayer, self).__init__()
    print('__init__ executed')
    self.num_outputs = num_outputs
    self.i = 0
    print('Init: this is i', self.i)
    self.i = self.i + 1

  def build(self, input_shape):
    print('build executed')
    print('input_shape', input_shape)
    print('Build: this is i', self.i)
    self.kernel = self.add_weight("kernel",
                                  shape=[int(input_shape[-1]),
                                         self.num_outputs])

  def call(self, input):
    print('call executed')
    return tf.matmul(input, self.kernel)

layer = MyDenseLayer(10)
_ = layer(tf.zeros([10, 5]))  # calling the layer builds it
print([var.name for var in layer.trainable_variables])
_ = layer(tf.ones([10, 5]))
print([var.name for var in layer.trainable_variables])
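For reference, the output below is not copied from the original post but follows directly from the code above (assuming the translated print strings):

__init__ executed
Init: this is i 0
build executed
input_shape (10, 5)
Build: this is i 1
call executed
['my_dense_layer/kernel:0']
call executed
['my_dense_layer/kernel:0']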

So the lifecycle is: __init__ runs once when the class is instantiated; on the layer's first invocation, build and then call each run once; every subsequent invocation re-runs only call.

Applying this to our code:

class BLS2017Model(tf.keras.Model):
  """Main model class."""

  def __init__(self, lmbda, num_filters): # self is the instance being constructed
    super().__init__()
    self.lmbda = lmbda
    self.analysis_transform = AnalysisTransform(num_filters)
    self.synthesis_transform = SynthesisTransform(num_filters)
    self.prior = tfc.NoisyDeepFactorized(batch_shape=(num_filters,)) # prior distribution over the latents
    self.build((None, None, None, 3))

  def call(self, x, training):
    """Computes rate and distortion losses."""
    # The entropy model classes in this library simplify designing rate-distortion
    # optimization code. During training, they behave like likelihood models.
    entropy_model = tfc.ContinuousBatchedEntropyModel(
        self.prior, coding_rank=3, compression=False)
    y = self.analysis_transform(x)
    y_hat, bits = entropy_model(y, training=training)
    x_hat = self.synthesis_transform(y_hat)
    # Total number of bits divided by total number of pixels.
    # tf.reduce_prod computes the product of the elements across the tensor's
    # dimensions (here batch × height × width).
    num_pixels = tf.cast(tf.reduce_prod(tf.shape(x)[:-1]), bits.dtype)
    bpp = tf.reduce_sum(bits) / num_pixels
    # Mean squared error, rescaled to the 0-255 pixel range.
    mse = tf.reduce_mean(tf.math.squared_difference(x, x_hat))
    mse *= 255 ** 2
    # The rate-distortion Lagrangian.
    loss = bpp + self.lmbda * mse
    return loss, bpp, mse
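As the train_step sketch above showed, call is never invoked by name: the model object itself is called, and tf.keras.Model.__call__ dispatches to call on every training and validation step. To watch it fire once directly, something like the following works (a hypothetical snippet; it assumes it is run inside bls2017.py, where AnalysisTransform and SynthesisTransform are defined, and uses illustrative hyperparameter values):

model = BLS2017Model(lmbda=0.01, num_filters=128)
x = tf.random.uniform((8, 256, 256, 3))   # dummy batch of RGB images in [0, 1]
loss, bpp, mse = model(x, training=True)  # Model.__call__ -> call()
print(float(loss), float(bpp), float(mse))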