All About Structural Similarity Index (SSIM): Theory + Code in PyTorch

This article explores the theory behind the Structural Similarity Index (SSIM) and its implementation in PyTorch. Translated from a Medium article, it explains how SSIM measures image quality and how it can be applied in PyTorch.

Recently, while implementing a depth estimation paper, I came across the term Structural Similarity Index (SSIM). SSIM is used as a metric to measure the similarity between two given images. As this technique has been around since 2004, a lot of material exists explaining the theory behind SSIM, but very few resources go deep into the details, especially for a gradient-based implementation, since SSIM is often used as a loss function. Hence, this article is my humble attempt to plug this gap!


The objective of this article is two-fold,


  • To explain the theory and intuition behind SSIM and explore some of its applications in current cutting-edge Deep Learning.

  • Go deep into a PyTorch implementation. You can skip to the code here. The full implementation can be found as a standalone notebook here. Just click on the “Open in Colab” link to start running the code!


So let’s begin!


The Theory

SSIM was first introduced in the 2004 IEEE paper, Image Quality Assessment: From Error Visibility to Structural Similarity. The abstract provides a good intuition into the idea behind the proposed system,


Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information.


Summary: The authors make 2 essential points,


  • Most image quality assessment techniques rely on quantifying errors between a reference and a sample image. A common approach is to quantify the difference in the values of each of the corresponding pixels between the sample and the reference images (by using, for example, Mean Squared Error).


  • The human visual perception system is highly capable of identifying structural information from a scene, and hence of identifying the differences between the information extracted from a reference and a sample scene. Therefore, a metric that replicates this behavior will perform better on tasks that involve differentiating between a sample and a reference image.


The Structural Similarity Index (SSIM) metric extracts 3 key features from an image:


  • Luminance


  • Contrast


  • Structure


The comparison between the two images is performed on the basis of these 3 features.


Fig 1 below shows the arrangement and flow of the Structural Similarity Measurement system. Signal X and Signal Y refer to the reference and sample images.


Fig 1: The Structural Similarity Measurement System. Source: https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf

But what does this metric calculate?

This system calculates the Structural Similarity Index between two given images, which is a value between -1 and +1. A value of +1 indicates that the two given images are very similar or the same, while a value of -1 indicates that the two given images are very different. Often these values are adjusted to be in the range [0, 1], where the extremes hold the same meaning.

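One simple way to perform that adjustment (a one-line sketch; the exact rescaling is an implementation choice rather than something prescribed by the paper) is an affine remap of the score:

ssim_01 = (ssim_score + 1) / 2  # maps -1 -> 0 (very different) and +1 -> 1 (identical)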

Now, let’s explore briefly, how these features are represented mathematically, and how they contribute to the final SSIM score.


  • Luminance: Luminance is measured by averaging over all the pixel values. It is denoted by μ (mu) and the formula is given below,


μ(x) = (1/N) · Σᵢ xᵢ,  for i = 1, …, N

where xᵢ is the i-th pixel value of the image x and N is the total number of pixels. Source: https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf
  • Contrast: It is measured by taking the standard deviation (square root of variance) of all the pixel values. It is denoted by σ (sigma) and represented by the formula below,


σ(x) = [ (1/(N−1)) · Σᵢ (xᵢ − μ(x))² ]^(1/2)

where μ(x) is the mean of the pixel values of the image x (the same definition holds for the second image y). Source: https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf
  • Structure: The structural comparison is done by using a consolidated formula (more on that later), but in essence, we divide the input signal by its standard deviation so that the result has unit standard deviation, which allows for a more robust comparison.


structure(x) = (x − μ(x)) / σ(x)

where x is the input image.

So now we have established the mathematical intuition behind the three parameters. But hold on! We are not yet done with the math; there is a little bit more. What we lack now are comparison functions that can compare the two given images on these parameters, and finally, a combination function that combines them all. Here, we define the comparison functions and finally the combination function that yields the similarity index value.


  • Luminance comparison function: It is defined by the function l(x, y) shown below. μ (mu) represents the mean of a given image; x and y are the two images being compared.


l(x, y) = (2 · μ(x) · μ(y) + C1) / (μ(x)² + μ(y)² + C1)

where C1 is a constant to ensure stability when the denominator becomes 0. C1 is given by,


C1 = (K1 · L)²,  with K1 = 0.01 and L the dynamic range of the pixel values (255 for 8-bit images).
  • Contrast comparison function: It is defined by a function c(x, y) which is shown below. σ denotes the standard deviation of a given image. x and y are the two images being compared.


c(x, y) = (2 · σ(x) · σ(y) + C2) / (σ(x)² + σ(y)² + C2)

where C2 is given by,


C2 = (K2 · L)²,  with K2 = 0.03 and L the same dynamic range as above.
  • Structure comparison function: It is defined by the function s(x, y) which is shown below. σ denotes the standard deviation of a given image. x and y are the two images being compared.


s(x, y) = (σ(xy) + C3) / (σ(x) · σ(y) + C3)

where σ(xy) is defined as,


σ(xy) = (1/(N−1)) · Σᵢ (xᵢ − μ(x)) · (yᵢ − μ(y))

And finally, the SSIM score is given by,


SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ

where α > 0, β > 0, γ > 0 denote the relative importance of each of the metrics. To simplify the expression, if we assume α = β = γ = 1 and C3 = C2/2, we get,


SSIM(x, y) = [ (2 · μ(x) · μ(y) + C1) · (2 · σ(xy) + C2) ] / [ (μ(x)² + μ(y)² + C1) · (σ(x)² + σ(y)² + C2) ]
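As a quick sanity check of this simplified formula, here is a minimal sketch that computes a single global SSIM score for two grayscale image tensors (the tensors x and y below are hypothetical; C1 and C2 use K1 = 0.01, K2 = 0.03 and L = 255 from the paper, and population statistics are used for brevity). The windowed PyTorch implementation discussed later is what you would use in practice.

import torch

def global_ssim(x, y, L=255.0, K1=0.01, K2=0.03):
    # x, y: float tensors of the same shape holding pixel values in [0, L]
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x_sq = ((x - mu_x) ** 2).mean()
    sigma_y_sq = ((y - mu_y) ** 2).mean()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x_sq + sigma_y_sq + C2))

x = torch.rand(64, 64) * 255
print(global_ssim(x, x))        # identical images -> tensor(1.)
print(global_ssim(x, 255 - x))  # inverted image -> a much lower score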

But there's a plot twist!

While you would be able to implement SSIM using the above formulas, chances are it won't be as good as the ready-to-use implementations available, as the authors explain:


For image quality assessment, it is useful to apply the SSIM index locally rather than globally. First, image statistical features are usually highly spatially nonstationary. Second, image distortions, which may or may not depend on the local image statistics, may also be space-variant. Third, at typical viewing distances, only a local area in the image can be perceived with high resolution by the human observer at one time instance (because of the foveation feature of the HVS [49], [50]). And finally, localized quality measurement can provide a spatially varying quality map of the image, which delivers more information about the quality degradation of the image and may be useful in some applications.


Summary: Instead of applying the above metrics globally (i.e. all over the image at once), it's better to apply the metrics regionally (i.e. in small sections of the image) and take the mean over all of them.


This method is often referred to as the Mean Structural Similarity Index.


Due to this change in approach, our formulas also need to be modified to reflect it (note that this approach is more common and will be used to explain the code).


(Note: If the content below seems a bit overwhelming, no worries! If you get the gist of it, then going through the code will give you a much clearer idea.)


The authors use an 11x11 circular-symmetric Gaussian weighting function (basically, an 11x11 matrix whose values follow a Gaussian distribution) which moves pixel-by-pixel over the entire image. At each step, the local statistics and the SSIM index are calculated within the local window. Since we are now calculating the metrics locally, our formulas are revised as,


μ(x) = Σᵢ wᵢ · xᵢ
σ(x) = [ Σᵢ wᵢ · (xᵢ − μ(x))² ]^(1/2)
σ(xy) = Σᵢ wᵢ · (xᵢ − μ(x)) · (yᵢ − μ(y))

where wᵢ is the Gaussian weighting function (the weights sum to 1).

If you found this a bit unintuitive, no worries! It suffices to imagine wi as a multiplicand that is used to calculate the required values with the help of some mathematical tricks.

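To make this concrete, here is a small sketch (my own illustration, not part of the original article) showing that convolving an image with a normalized Gaussian kernel computes exactly the weighted sum Σᵢ wᵢ · xᵢ over each local window — this is precisely how the PyTorch code in the next section obtains all of the local statistics.

import torch
import torch.nn.functional as F

# build a normalized 5x5 Gaussian kernel (the weights w_i sum to 1)
coords = torch.arange(5, dtype=torch.float32) - 2
g = torch.exp(-(coords ** 2) / (2 * 1.5 ** 2))
kernel = torch.outer(g, g)
kernel = kernel / kernel.sum()

img = torch.rand(1, 1, 32, 32)                     # (batch, channel, height, width)
local_mu = F.conv2d(img, kernel.view(1, 1, 5, 5))  # local weighted means

# the top-left output equals the manual weighted sum over the first 5x5 patch
manual = (img[0, 0, :5, :5] * kernel).sum()
print(torch.allclose(local_mu[0, 0, 0, 0], manual))  # True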

Once computations are performed all over the image, we simply take the mean of all the local SSIM values and arrive at the global SSIM value.


MSSIM(X, Y) = (1/M) · Σⱼ SSIM(xⱼ, yⱼ)

where X and Y are the full images, xⱼ and yⱼ are the image contents at the j-th local window, and M is the total number of local windows.

Finally done with the theory! Now onto the code!


The Code

Before we plunge into the code, it's important to note that we won't be going through every line, but we will explore the essential ones in depth. Let's get started!


The full code can be found as a standalone notebook here. Just click on the “Open in Colab” button to start running the code! The explanation in this section will be referring to the notebook mentioned above.


First, let's explore a couple of utility functions that perform some essential tasks.


Function #1: gaussian(window_size, sigma)

This function essentially generates a 1D tensor of length window_size whose values follow a Gaussian profile and are normalized so that they sum to 1. sigma is the standard deviation of the Gaussian distribution.


Note: This is used to generate the 11x11 gaussian window mentioned above.


Example:


Code:
gauss_dis = gaussian(11, 1.5)
print("Distribution: ", gauss_dis)
print("Sum of Gauss Distribution:", torch.sum(gauss_dis))

Output:
Distribution: tensor([0.0010, 0.0076, 0.0360, 0.1094, 0.2130, 0.2660, 0.2130, 0.1094, 0.0360, 0.0076, 0.0010])
Sum of Gauss Distribution: tensor(1.)

Function #2: create_window(window_size, channel)

While we generated a 1D tensor of Gaussian values, the 1D tensor itself is of no use to us. Hence, we need to convert it to a 2D tensor (the 11x11 tensor we talked about earlier). The steps taken in this function are as follows,


  • Generate the 1D tensor using the gaussian function.


  • Convert it to a 2D tensor by multiplying it with its transpose, i.e. an outer product (this preserves the Gaussian character).

  • Add two extra dimensions to convert it to 4D. (This is only needed when SSIM is used as a loss function in computer vision.)

  • Reshape it to adhere to PyTorch's convolution weight format.

Code:
window = create_window(11, 3)
print(window.shape)

Output:
torch.Size([3, 1, 11, 11])
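Again, the notebook's exact implementation is not reproduced here; a sketch consistent with the four steps above and the printed shape would be:

def create_window(window_size, channel=1):
    # 1D Gaussian -> 2D Gaussian via an outer product, then expanded to conv-weight shape
    _1d = gaussian(window_size, 1.5).unsqueeze(1)      # (window_size, 1)
    _2d = _1d.mm(_1d.t()).unsqueeze(0).unsqueeze(0)    # (1, 1, window_size, window_size)
    window = _2d.expand(channel, 1, window_size, window_size).contiguous()
    return window

Note that the 2D window still sums to 1 in each channel, so convolving an image with it yields Gaussian-weighted local averages.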

Now that we have explored the two utility functions, let’s go through the main code! The core SSIM is implemented through the ssim() function which is explored below.


Function #3: ssim(img1, img2, val_range, window_size=11, window=None, size_average=True, full=False)

Before we move on to the essentials, let us explore what happens in the function before the SSIM metrics are calculated,


  • We set the maximum value of the normalized pixels (an implementation detail; no need to worry about it).

  • We initialize the Gaussian window by means of the create_window() function if a window was not provided in the function call.


Once these steps are completed, we go about calculating the various values (the sigmas and the mus of the world) which are needed to arrive at the final SSIM score.


Note: Since we are calculating local statistics and we need to make it computationally efficient, the formulas used are slightly different (they are just rearrangements of the formulas discussed above; the relevant mathematical material is provided in the appendix).


  • We first calculate μ(x), μ(y), their squares, and their product μ(x)·μ(y). channels here stores the number of color channels of the input image. The groups parameter makes the convolution depthwise, so that each input channel is filtered with its own copy of the Gaussian window. More information regarding groups can be found in the PyTorch documentation.


_, channels, height, width = img1.size()

mu1 = F.conv2d(img1, window, padding=pad, groups=channels)
mu2 = F.conv2d(img2, window, padding=pad, groups=channels)

mu1_sq = mu1 ** 2
mu2_sq = mu2 ** 2
mu12 = mu1 * mu2
  • We then go on to calculate the squares of σ(x) and σ(y), and the covariance σ(xy). For more math, check Appendix 1.1.


sigma1_sq = F.conv2d(img1 * img1, window, padding=pad, groups=channels) - mu1_sq
sigma2_sq = F.conv2d(img2 * img2, window, padding=pad, groups=channels) - mu2_sq
sigma12 = F.conv2d(img1 * img2, window, padding=pad, groups=channels) - mu12
  • Thirdly, we calculate the contrast metric, which corresponds to the combined contrast-structure term of the simplified formula above,


contrast_metric = (2.0 * sigma12 + C2) / (sigma1_sq + sigma2_sq + C2)
contrast_metric = torch.mean(contrast_metric)
  • Finally, we calculate the SSIM score according to the simplified formula above and return its mean. (A complete sketch of the whole function is given right after this list.)


numerator1 = 2 * mu12 + C1
numerator2 = 2 * sigma12 + C2
denominator1 = mu1_sq + mu2_sq + C1
denominator2 = sigma1_sq + sigma2_sq + C2

ssim_score = (numerator1 * numerator2) / (denominator1 * denominator2)

return ssim_score.mean()
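Putting the pieces together, a complete sketch of ssim() assembled from the snippets above (the notebook's version may differ in small details, such as how size_average and full are handled) looks like this:

import torch
import torch.nn.functional as F

def ssim(img1, img2, val_range, window_size=11, window=None, size_average=True, full=False):
    L = val_range                  # dynamic range of the pixel values (255 for 8-bit images)
    pad = window_size // 2

    _, channels, height, width = img1.size()

    # build the Gaussian window if one was not supplied in the call
    if window is None:
        window = create_window(window_size, channel=channels).to(img1.device)

    # local means
    mu1 = F.conv2d(img1, window, padding=pad, groups=channels)
    mu2 = F.conv2d(img2, window, padding=pad, groups=channels)
    mu1_sq, mu2_sq, mu12 = mu1 ** 2, mu2 ** 2, mu1 * mu2

    # local variances and covariance (see Appendix 1.1)
    sigma1_sq = F.conv2d(img1 * img1, window, padding=pad, groups=channels) - mu1_sq
    sigma2_sq = F.conv2d(img2 * img2, window, padding=pad, groups=channels) - mu2_sq
    sigma12 = F.conv2d(img1 * img2, window, padding=pad, groups=channels) - mu12

    # stability constants from the paper
    C1 = (0.01 * L) ** 2
    C2 = (0.03 * L) ** 2

    # combined contrast-structure term
    contrast_metric = (2.0 * sigma12 + C2) / (sigma1_sq + sigma2_sq + C2)
    contrast_metric = torch.mean(contrast_metric)

    # simplified SSIM formula, evaluated per local window
    numerator1 = 2 * mu12 + C1
    numerator2 = 2 * sigma12 + C2
    denominator1 = mu1_sq + mu2_sq + C1
    denominator2 = sigma1_sq + sigma2_sq + C2
    ssim_map = (numerator1 * numerator2) / (denominator1 * denominator2)

    ssim_score = ssim_map.mean() if size_average else ssim_map.mean(dim=(1, 2, 3))
    return (ssim_score, contrast_metric) if full else ssim_score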

That was a lot! Now let's see how the code performs!

We are going to test the code in three cases to check how it performs. Let's get going!


  • Case #1: True Image vs False Image


In the first scenario, we are going to run two very different images through SSIM. One of them is considered the True Image while the other is considered the False Image. (Since we are measuring the difference, the Truth and Falsity labels are essentially interchangeable; they are being used only as reference points.)


The images are,


False Image (left), True Image (right)

The code below is for representation purposes only, although it is not much different from the code in the notebook. For more detail and visualization, check the notebook.


Code:
img1 = load_images("img1.jpg")  # helper function to load images
img2 = load_images("img2.jpg")

_img1 = tensorify(img1)  # helper function to convert cv2 images to tensors
_img2 = tensorify(img2)

ssim_score = ssim(_img1, _img2, 225)
print("True vs False Image SSIM Score: ", ssim_score)

Output:
True vs False Image SSIM Score: tensor(0.3385)
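load_images() and tensorify() are helper functions defined in the notebook rather than in this article; a plausible minimal sketch of them (the actual helpers may differ) is:

import cv2
import numpy as np
import torch

def load_images(path):
    # read an image with OpenCV, convert BGR -> RGB and cast to float pixel values
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    return img.astype(np.float32)

def tensorify(img):
    # (H, W, C) numpy array -> (1, C, H, W) float tensor, the layout ssim() expects
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float()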
  • Case #2: True Image vs True Image with Gaussian Noise


In this scenario, we compare the true image and a heavily noised version of it. The images are shown below,


The noised True Image (left), The True Image (right)

On running the same piece of code as above we get,


Code:
noise = np.random.randint(0, 255, (640, 480, 3)).astype(np.float32)
noisy_img = img1 + noise

_img1 = tensorify(img1)
_img2 = tensorify(noisy_img)

true_vs_false = ssim(_img1, _img2, val_range=255)
print("True vs Noised True Image SSIM Score:", true_vs_false)

Output:
True vs Noised True Image SSIM Score: tensor(0.0185)
  • Case #3: True Image vs True Image


In the final case, we compare the True Image against itself, i.e. the image shown below is compared to itself. If our SSIM code is working correctly, the score should be one.


The True Image

On running the piece of code shown below, we can confirm that the SSIM score for this given scenario is indeed one.


Code:
_img1 = tensorify(img1)
true_vs_false = ssim(_img1, _img1, val_range=255)
print("True vs True Image SSIM Score:", true_vs_false)

Output:
True vs True Image SSIM Score: tensor(1.)

Conclusion

Finally, we are here! In this article, we covered the theory behind SSIM and the code that goes into implementing it. In the References, some additional materials are provided including links to Computer Vision literature where SSIM is used in some form.


Hope understanding SSIM was much easier for you than it was for me :). I tried to focus on the areas that I personally found complicated and difficult to understand, hoping to not only consolidate my learnings but also in the process, help somebody else stumbling along the same path ;).


Would really appreciate feedback positive or negative. You can drop it in the comments section or reach out to me at pranjaldatta99@gmail.com.


Appendix

1.0: Since we are using local statistics, the formula for the mean (μ) changes from

μ(x) = (1/N) · Σᵢ xᵢ

to the weighted form

μ(x) = Σᵢ wᵢ · xᵢ

1.1: The variance (square of standard deviation) formula used in the Python code can be derived as,

Var(x) = E[(x − μ(x))²]
       = E[x² − 2·μ(x)·x + μ(x)²]
       = E[x²] − 2·μ(x)·E[x] + μ(x)²
       = E[x²] − μ(x)²

While the derivation above shows the general method, in our context the final formula becomes,

σ(x)² = Σᵢ wᵢ · xᵢ² − μ(x)²

which is exactly what the code computes: the Gaussian-weighted local mean of the squared image (F.conv2d(img1 * img1, window, ...)) minus the square of the local mean (mu1_sq).
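A quick numerical check of this identity (my own sketch, reusing the create_window() sketch from earlier) confirms that the expression used in the code matches the definition of a weighted variance:

import torch

window = create_window(11, channel=1)[0, 0]       # (11, 11) normalized Gaussian weights w_i
patch = torch.rand(11, 11)                        # one local window of pixel values

mu = (window * patch).sum()                       # weighted mean:      sum(w_i * x_i)
var_def = (window * (patch - mu) ** 2).sum()      # by definition:      sum(w_i * (x_i - mu)^2)
var_code = (window * patch ** 2).sum() - mu ** 2  # form used in code:  sum(w_i * x_i^2) - mu^2

print(torch.allclose(var_def, var_code))          # True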

Translated from: https://medium.com/srm-mic/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e
