1. Generative Adversarial Networks for Extreme Learned Image Compression (2019)
Quantization operation:
# [-2., -1., 0., 1., 2.]
centers = tf.cast(tf.range(-2, 3), tf.float32)
L = centers.shape[0]    # number of quantization centers (5)
temperature = 1.0       # softmax sharpness; assumed value, not given in the original snippet
# Partition w into the Voronoi tessellation over the centers
w_stack = tf.stack([w for _ in range(L)], axis=-1)
w_hard = tf.cast(tf.argmin(input=tf.abs(w_stack - centers), axis=-1), tf.float32) + tf.reduce_min(input_tensor=centers)
# Softmax over negative distances: the nearest center gets the largest weight
smx = tf.nn.softmax(-1.0/temperature * tf.abs(w_stack - centers), axis=-1)
# Contract the last dimension: expectation of the centers under the softmax weights
w_soft = tf.einsum('ijklm,m->ijkl', smx, centers)  # w_soft = tf.tensordot(smx, centers, axes=((-1),(0)))
# Forward pass uses w_hard; gradients flow through w_soft
w_bar = tf.round(tf.stop_gradient(w_hard - w_soft) + w_soft)
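The same pipeline can be sketched in plain NumPy (a minimal re-implementation for illustration only; `L` and `temperature` are not defined in the snippet above, so the values here are assumptions):

```python
import numpy as np

centers = np.arange(-2, 3, dtype=np.float32)   # [-2., -1., 0., 1., 2.]
L = len(centers)                               # number of quantization centers
temperature = 1.0                              # softmax sharpness (assumed value)

w = np.arange(-1, 2, 0.5, dtype=np.float32)    # sample latent values

# Distance of every value to every center: shape (len(w), L)
dist = np.abs(w[:, None] - centers[None, :])

# Hard assignment: nearest center (indexing by argmin is equivalent to
# argmin + centers.min() because the centers are consecutive integers)
w_hard = centers[np.argmin(dist, axis=-1)]

# Soft assignment: softmax over negative distances, then the
# expectation over the centers
logits = -dist / temperature
smx = np.exp(logits - logits.max(axis=-1, keepdims=True))
smx /= smx.sum(axis=-1, keepdims=True)
w_soft = smx @ centers

print(w_hard)   # [-1. -1.  0.  0.  1.  1.]
```

The printed `w_hard` matches the REPL session below; `w_soft` is a smooth approximation of it.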
Detailed walkthrough (Python REPL):
>>> import tensorflow as tf
>>> centers=tf.cast(tf.range(-2,3),tf.float32)
>>> centers
<tf.Tensor: shape=(5,), dtype=float32, numpy=array([-2., -1., 0., 1., 2.], dtype=float32)>
>>> w = tf.range(-1, 2, 0.5)
>>> w
<tf.Tensor: shape=(6,), dtype=float32, numpy=array([-1. , -0.5,  0. ,  0.5,  1. ,  1.5], dtype=float32)>
>>> L = 5
>>> w_stack = tf.stack([w for _ in range(L)], axis=-1)
>>> w_stack
<tf.Tensor: shape=(6, 5), dtype=float32, numpy=
array([[-1. , -1. , -1. , -1. , -1. ],
[-0.5, -0.5, -0.5, -0.5, -0.5],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5, 0.5, 0.5],
[ 1. , 1. , 1. , 1. , 1. ],
[ 1.5, 1.5, 1.5, 1.5, 1.5]], dtype=float32)>
>>> w_stack-centers
<tf.Tensor: shape=(6, 5), dtype=float32, numpy=
array([[ 1. , 0. , -1. , -2. , -3. ],
[ 1.5, 0.5, -0.5, -1.5, -2.5],
[ 2. , 1. , 0. , -1. , -2. ],
[ 2.5, 1.5, 0.5, -0.5, -1.5],
[ 3. , 2. , 1. , 0. , -1. ],
[ 3.5, 2.5, 1.5, 0.5, -0.5]], dtype=float32)>
>>> w_hard = tf.cast(tf.argmin(input=tf.abs(w_stack - centers), axis=-1), tf.float32) + tf.reduce_min(input_tensor=centers)
>>> w_hard
<tf.Tensor: shape=(6,), dtype=float32, numpy=array([-1., -1., 0., 0., 1., 1.], dtype=float32)>
>>> temperature = 1.0
>>> smx = tf.nn.softmax(-1.0/temperature * tf.abs(w_stack - centers), axis=-1)
>>> smx
<tf.Tensor: shape=(6, 5), dtype=float32, numpy=
array([[0.191516 , 0.52059436, 0.191516 , 0.07045478, 0.02591887],
[0.12813215, 0.34829926, 0.34829926, 0.12813215, 0.04713718],
[0.06745081, 0.18335032, 0.4983978 , 0.18335032, 0.06745081],
[0.04713718, 0.12813215, 0.34829926, 0.34829926, 0.12813215],
[0.02591887, 0.07045478, 0.19151598, 0.52059436, 0.19151598],
[0.01950138, 0.05301026, 0.14409684, 0.39169577, 0.39169577]],
dtype=float32)>
>>> w_soft = tf.einsum('lm,m->l', smx, centers)
>>> w_soft
<tf.Tensor: shape=(6,), dtype=float32, numpy=
array([-0.7813338 , -0.3821571 , 0. , 0.38215706, 0.7813338 ,
1.0830743 ], dtype=float32)>
>>> w_bar = tf.round(tf.stop_gradient(w_hard - w_soft) + w_soft)
>>> w_bar
<tf.Tensor: shape=(6,), dtype=float32, numpy=array([-1., -1., 0., 0., 1., 1.], dtype=float32)>
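The REPL output shows that with `temperature = 1.0` the soft values are only a loose approximation of the hard ones (e.g. -0.78 instead of -1.0). Lowering the temperature sharpens the softmax toward a one-hot assignment, so `w_soft` approaches `w_hard`. A small NumPy sketch (the sample values here are chosen away from decision boundaries; exactly at a tie point such as 0.5, the soft value converges to the midpoint of the two nearest centers instead):

```python
import numpy as np

centers = np.arange(-2, 3, dtype=np.float64)
w = np.array([-1.0, -0.6, 0.2, 0.9, 1.4])      # sample values, no exact ties
dist = np.abs(w[:, None] - centers[None, :])
w_hard = centers[np.argmin(dist, axis=-1)]

for temperature in (1.0, 0.1, 0.01):
    logits = -dist / temperature
    smx = np.exp(logits - logits.max(axis=-1, keepdims=True))
    smx /= smx.sum(axis=-1, keepdims=True)
    w_soft = smx @ centers
    # Maximum gap between the soft and hard quantization at this temperature
    print(temperature, np.max(np.abs(w_soft - w_hard)))
```

As the temperature shrinks, the printed gap drops toward zero, which is the usual motivation for annealing the temperature during training.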
w_hard = tf.cast(tf.argmin(input=tf.abs(w_stack - centers), axis=-1), tf.float32) + tf.reduce_min(input_tensor=centers)
The first term on the right-hand side finds the index of the center nearest to each value; to turn that index into the quantized value itself, we then add the minimum center value (this works because the centers are consecutive integers).
Since we already have the quantized values at this point, why are the remaining steps needed?
In my understanding, the answer is in this (very clever) line:
w_bar = tf.round(tf.stop_gradient(w_hard - w_soft) + w_soft)
By adding and subtracting w_soft, the forward result is unchanged: it still equals w_hard (the outer tf.round simply snaps away any floating-point cancellation error, since the centers are integers), while gradients can still propagate backward. The non-differentiable part is w_hard, so tf.stop_gradient cuts off the gradient of everything involving w_hard, letting the gradient flow entirely through the differentiable w_soft.
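The effect can be checked numerically: the forward value of w_bar equals the hard quantization, whose finite-difference derivative is zero almost everywhere, while the gradient that actually reaches the encoder is that of the smooth w_soft. A NumPy finite-difference sketch (the function names are illustrative, not from the original code):

```python
import numpy as np

centers = np.arange(-2, 3, dtype=np.float64)
temperature = 1.0

def soft_quantize(w):
    """Softmax-weighted average of the centers (differentiable)."""
    logits = -np.abs(w - centers) / temperature
    smx = np.exp(logits - logits.max())
    smx /= smx.sum()
    return float(smx @ centers)

def hard_quantize(w):
    """Nearest center (piecewise constant, zero gradient almost everywhere)."""
    return float(centers[np.argmin(np.abs(w - centers))])

w, h = 0.7, 1e-5
# Central finite-difference derivatives around w
grad_hard = (hard_quantize(w + h) - hard_quantize(w - h)) / (2 * h)
grad_soft = (soft_quantize(w + h) - soft_quantize(w - h)) / (2 * h)

print(grad_hard)  # 0.0 -> no learning signal through the hard path
print(grad_soft)  # nonzero -> the gradient w_bar passes backward
```

Because tf.stop_gradient(w_hard - w_soft) contributes no gradient, backpropagation sees only the w_soft term, so the encoder keeps receiving a useful training signal even though the forward pass is fully quantized.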
The reference code comes from the community implementation: GAN