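SSIM (structural similarity index) is a metric for measuring how similar two images are.
It was introduced in a 2004 paper in IEEE Transactions on Image Processing (TIP),
titled: Image Quality Assessment: From Error Visibility to Structural Similarity
URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1284395
Like PSNR, SSIM is widely used to evaluate image quality.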
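First, the inputs to SSIM
SSIM takes two images whose similarity we want to measure. One is the uncompressed, distortion-free reference image (the ground truth); the other is the image you have restored. This is why SSIM can serve as a quality metric for super-resolution.
Suppose the two input images are x and y. Then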
$$\mathrm{SSIM}(x,y)=[l(x,y)]^\alpha[c(x,y)]^\beta[s(x,y)]^\gamma \tag{1}$$
with $\alpha>0$, $\beta>0$, and $\gamma>0$.
Equation (1) is the mathematical definition of SSIM, where:
$$l(x,y)=\frac{2\mu_x\mu_y+c_1}{\mu_x^2+\mu_y^2+c_1},$$
$$c(x,y)=\frac{2\sigma_x\sigma_y+c_2}{\sigma_x^2+\sigma_y^2+c_2},$$
$$s(x,y)=\frac{\sigma_{xy}+c_3}{\sigma_x\sigma_y+c_3}$$
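Here $l(x,y)$ is the luminance comparison, $c(x,y)$ is the contrast comparison, and $s(x,y)$ is the structure comparison.
$\mu_x$ and $\mu_y$ are the means of x and y, and $\sigma_x$ and $\sigma_y$ are their standard deviations.
$\sigma_{xy}$ is the covariance of x and y. $c_1$, $c_2$, and $c_3$ are constants that keep the result stable when a denominator would otherwise be zero or close to zero.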
In practice, we usually set $\alpha=\beta=\gamma=1$ and $c_3=c_2/2$, which simplifies SSIM to:
$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$$
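The simplification comes from the contrast and structure terms collapsing into one fraction. The short derivation below is a worked sketch, assuming the contrast term has numerator $2\sigma_x\sigma_y+c_2$ as in the Wang et al. paper, and using $c_3=c_2/2$:

```latex
\begin{aligned}
c(x,y)\,s(x,y)
  &= \frac{2\sigma_x\sigma_y+c_2}{\sigma_x^2+\sigma_y^2+c_2}
     \cdot \frac{\sigma_{xy}+c_2/2}{\sigma_x\sigma_y+c_2/2} \\
  &= \frac{2\sigma_x\sigma_y+c_2}{\sigma_x^2+\sigma_y^2+c_2}
     \cdot \frac{2\sigma_{xy}+c_2}{2\sigma_x\sigma_y+c_2} \\
  &= \frac{2\sigma_{xy}+c_2}{\sigma_x^2+\sigma_y^2+c_2}
\end{aligned}
```

Multiplying this by $l(x,y)$ gives a two-factor form whose second numerator is $2\sigma_{xy}+c_2$.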
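Summary:
- SSIM is symmetric: SSIM(x,y)=SSIM(y,x).
- SSIM is at most 1, and equals 1 when the two images are identical. For natural images it usually falls between 0 and 1, but it can be negative when the covariance $\sigma_{xy}$ is negative. The larger the value, the smaller the gap between the output image and the distortion-free image, i.e. the better the image quality.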
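These properties are easy to check numerically. The sketch below uses the two-factor form of SSIM with numerator $(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)$, computed over a single global window. That is an assumption made for brevity and not how library implementations work, and `ssim_global` is a hypothetical name. It also shows that an intensity-inverted copy of a signal yields a negative value, so the score is not confined to the interval from 0 to 1.

```python
def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Two-factor SSIM over one global window (illustrative only)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and the same length")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_val) ** 2
    c2 = (k2 * max_val) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


x = [10, 60, 120, 180, 240]          # made-up pixel values
inverted = [255 - v for v in x]      # same structure with opposite sign

print(ssim_global(x, x))                                  # identical inputs
print(ssim_global(x, inverted), ssim_global(inverted, x)) # symmetric pair
print(ssim_global(x, inverted) < 0)                       # negative covariance
```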
As with PSNR, a commonly used function like SSIM is built into TensorFlow, so we only need to call ssim in tf:
tf.image.ssim(x, y, 255)
The source code is as follows:
def ssim(img1, img2, max_val):
  """Computes SSIM index between img1 and img2.

  This function is based on the standard SSIM implementation from:
  Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image
  quality assessment: from error visibility to structural similarity. IEEE
  transactions on image processing.

  Note: The true SSIM is only defined on grayscale. This function does not
  perform any colorspace transform. (If input is already YUV, then it will
  compute YUV SSIM average.)

  Details:
    - 11x11 Gaussian filter of width 1.5 is used.
    - k1 = 0.01, k2 = 0.03 as in the original paper.

  The image sizes must be at least 11x11 because of the filter size.

  Example:
      # Read images from file.
      im1 = tf.decode_png('path/to/im1.png')
      im2 = tf.decode_png('path/to/im2.png')
      # Compute SSIM over tf.uint8 Tensors.
      ssim1 = tf.image.ssim(im1, im2, max_val=255)
      # Compute SSIM over tf.float32 Tensors.
      im1 = tf.image.convert_image_dtype(im1, tf.float32)
      im2 = tf.image.convert_image_dtype(im2, tf.float32)
      ssim2 = tf.image.ssim(im1, im2, max_val=1.0)
      # ssim1 and ssim2 both have type tf.float32 and are almost equal.

  Args:
    img1: First image batch.
    img2: Second image batch.
    max_val: The dynamic range of the images (i.e., the difference between the
      maximum the and minimum allowed values).

  Returns:
    A tensor containing an SSIM value for each image in batch. Returned SSIM
    values are in range (-1, 1], when pixel values are non-negative. Returns
    a tensor with shape: broadcast(img1.shape[:-3], img2.shape[:-3]).
  """
  _, _, checks = _verify_compatible_image_shapes(img1, img2)
  with ops.control_dependencies(checks):
    img1 = array_ops.identity(img1)

  # Need to convert the images to float32. Scale max_val accordingly so that
  # SSIM is computed correctly.
  max_val = math_ops.cast(max_val, img1.dtype)
  max_val = convert_image_dtype(max_val, dtypes.float32)
  img1 = convert_image_dtype(img1, dtypes.float32)
  img2 = convert_image_dtype(img2, dtypes.float32)
  ssim_per_channel, _ = _ssim_per_channel(img1, img2, max_val)
  # Compute average over color channels.
  return math_ops.reduce_mean(ssim_per_channel, [-1])
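The docstring mentions an 11x11 Gaussian filter of width 1.5. In windowed SSIM, the means, variances, and covariance are weighted local statistics, with the weights given by such a filter. The sketch below builds one in plain Python, assuming that "width" means the standard deviation of the Gaussian. It is not TensorFlow's code, and the names `gaussian_kernel_2d` and `weighted_mean` are hypothetical.

```python
import math


def gaussian_kernel_2d(size=11, sigma=1.5):
    """Return a size x size Gaussian kernel (list of rows) that sums to 1."""
    half = (size - 1) / 2
    g1d = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(g1d)
    g1d = [v / total for v in g1d]
    # A 2-D Gaussian is separable: outer product of the 1-D kernel with itself.
    return [[a * b for b in g1d] for a in g1d]


def weighted_mean(patch, kernel):
    """Gaussian-weighted mean of a patch with the same shape as the kernel."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel)
        for p, w in zip(patch_row, kernel_row)
    )


kernel = gaussian_kernel_2d()
flat_patch = [[7.0] * 11 for _ in range(11)]   # constant 11x11 patch
print(weighted_mean(flat_patch, kernel))       # local mean of a flat patch
```

Because the window is 11 pixels wide, at least an 11x11 image is needed for even one complete window, which matches the size requirement stated in the docstring.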