Paper Quick Read: Improved Precision and Recall Metric for Assessing Generative Models


Miao kristoff


Article link: https://blog.csdn.net/weixin_45372906/article/details/135742666

Year: 2019
Paper Link: link
Github Link: link


Summary

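The authors are the team behind StyleGAN. They propose a new definition and computation method for precision and recall that evaluates two aspects of a generative model separately: precision measures the quality of the generated images (how realistic they are, i.e., how close they are to real images), and recall measures their variation, i.e., how much of the diversity of the real images the generator covers. This is a more detailed evaluation than single-number metrics such as FID, IS, and KID. With P and R, one can see that FID places more weight on variation than on image quality. The metric also helps analyze model modifications, i.e., whether a given change improves quality or variation.
Which generative model to choose depends on the user's needs: some applications require high image quality, others require diverse outputs. Independent of any particular application, P and R usually trade off against each other, and the best model is one with both high P and high R.
The paper also analyzes truncation methods, and the metric can also be used to evaluate a single image. See the original paper for details.
To compute P and R, the paper estimates the manifold of each image-feature set from nearest-neighbor distances, following the approach of [25]. See the original paper for the exact computation.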
[25] M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, and S. Gelly. Assessing generative models via precision and recall. CoRR, abs/1806.00035, 2018.

Contribution

Our primary contribution is an improved precision and recall metric (Section 2) which provides explicit visibility of the tradeoff between sample quality and variety. We demonstrate the effectiveness of our metric using two recent generative models (Section 3), StyleGAN [13] and BigGAN [4].
We then use our metric to analyze several variants of StyleGAN (Section 4) to better understand the design decisions that determine result quality, and identify new variants that improve the state-of-the-art.
We also perform the first principled analysis of truncation methods.
Finally, we extend our metric to estimate the quality of individual generated samples (Section 5), offering a way to measure the quality of latent space interpolations.
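Section 5 scores individual samples with a realism score. As we read the paper, the score of a generated feature φ_g is the maximum, over real features φ_r, of the ratio ‖φ_r − NN_k(φ_r, Φ_r)‖₂ / ‖φ_g − φ_r‖₂, so a value of at least 1 means φ_g lies inside some real hypersphere. The paper also reduces the influence of real outliers by discarding the largest hyperspheres; the sketch below approximates this by keeping only the half with the smallest radii, which is our assumption about the exact rule. `realism_score` is a hypothetical helper and the features are random stand-ins for VGG features.

```python
import numpy as np


def realism_score(fake_feat: np.ndarray, real_feats: np.ndarray, k: int = 3) -> float:
    """Hypothetical helper: realism of one generated feature vector w.r.t. a real set.

    A score >= 1 means the sample lies inside at least one kept real hypersphere.
    """
    # k-NN radius of every real feature (column 0 of each sorted row is the point itself)
    d_rr = np.linalg.norm(real_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    radii = np.sort(d_rr, axis=1)[:, k]
    # Assumption: keep only the half of the hyperspheres with the smallest radii
    keep = np.argsort(radii)[: max(1, len(radii) // 2)]
    # Distance from the generated sample to each kept real feature
    d = np.linalg.norm(real_feats[keep] - fake_feat[None, :], axis=1)
    # Max ratio of hypersphere radius to distance; the epsilon avoids division by zero
    return float(np.max(radii[keep] / np.maximum(d, 1e-12)))


# Usage with random stand-in features
rng = np.random.default_rng(0)
real = rng.normal(size=(100, 4))
print(realism_score(real[0] + 0.1, real))
print(realism_score(np.full(4, 50.0), real))  # a point far from all real data
```

Building the full real-to-real distance matrix on every call keeps the sketch short; for many queries one would compute `radii` once and reuse it.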

Motivation

Sajjadi et al. [25] introduce the classic concepts of precision and recall to the study of generative models, motivated by the observation that FID and related density metrics cannot be used for making conclusions about precision and recall: a low FID may indicate high precision (realistic images), high recall (large amount of variation), or anything in between.

Improved precision and recall metric using k-nearest neighbors

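Key idea: map the generated image set and the real image set into a high-dimensional feature space with a VGG network, and estimate the manifold of each set from the k-nearest-neighbor distance of every feature vector: each vector is the center of a hypersphere whose radius is the distance to its k-th nearest neighbor. A binary function then decides whether a given feature vector lies inside the manifold, i.e., inside at least one of these hyperspheres.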
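Written out in the paper's notation as we understand it (Φ is a set of feature vectors, and NN_k(φ', Φ) returns the k-th nearest neighbor of φ' within Φ), the binary manifold indicator is:

```latex
f(\phi, \Phi) =
\begin{cases}
1, & \text{if } \lVert \phi - \phi' \rVert_2 \le \lVert \phi' - \mathrm{NN}_k(\phi', \Phi) \rVert_2 \text{ for at least one } \phi' \in \Phi,\\
0, & \text{otherwise.}
\end{cases}
```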

Precision and recall are then defined from this binary function:

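Precision: query whether each generated image lies in the manifold of the real images; precision is the fraction that do.
Recall, conversely: query whether each real image lies in the manifold of the generated images; recall is the fraction that do.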
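With Φ_r the real features, Φ_g the generated features, and f the binary k-NN manifold indicator, the two fractions can be written as (our transcription of the definitions):

```latex
\mathrm{precision}(\Phi_r, \Phi_g) = \frac{1}{|\Phi_g|} \sum_{\phi_g \in \Phi_g} f(\phi_g, \Phi_r),
\qquad
\mathrm{recall}(\Phi_r, \Phi_g) = \frac{1}{|\Phi_r|} \sum_{\phi_r \in \Phi_r} f(\phi_r, \Phi_g)
```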

See the paper for the remaining details.
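A minimal NumPy sketch of the k-NN precision and recall described above. Assumptions: features are already extracted (the paper uses a VGG network; random vectors stand in here), all function names are hypothetical, and k=3 is simply a default chosen for this sketch. The full pairwise distance matrix is O(N²) in memory, so large feature sets would need batching.

```python
import numpy as np


def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance between every row of `a` and every row of `b`."""
    sq = (a * a).sum(axis=1)[:, None] + (b * b).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negatives caused by rounding


def knn_radii(feats: np.ndarray, k: int) -> np.ndarray:
    """Hypersphere radius of each feature: distance to its k-th nearest neighbor in the same set."""
    d = np.sort(pairwise_distances(feats, feats), axis=1)
    return d[:, k]  # column 0 is the point itself (distance ~0)


def in_manifold(queries: np.ndarray, feats: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """Binary indicator f: True if a query lies inside at least one hypersphere of `feats`."""
    return (pairwise_distances(queries, feats) <= radii[None, :]).any(axis=1)


def precision_recall(real: np.ndarray, fake: np.ndarray, k: int = 3) -> tuple[float, float]:
    """Precision: fake inside the real manifold. Recall: real inside the fake manifold."""
    precision = in_manifold(fake, real, knn_radii(real, k)).mean()
    recall = in_manifold(real, fake, knn_radii(fake, k)).mean()
    return float(precision), float(recall)


# Usage with random stand-in features
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
fake = 0.5 * rng.normal(size=(500, 16))  # a generator whose samples are more concentrated
print(precision_recall(real, fake))
```

Note the asymmetry: precision tests generated samples against the real hyperspheres, recall tests real samples against the generated ones, so changing the spread of `fake` moves the two numbers differently.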

Experiments

Looking at FID, setups B and D appear almost equally good, illustrating how much weight FID places on variation compared to image quality, also evidenced by the high FID of setup A. The ideal tradeoff between quality and variation depends on the intended application, but it is unclear which application might favor setup D where practically all images are broken over setup B that produces high-quality samples at a lower variation. (haha)
Our metric provides explicit visibility on this tradeoff and allows quantifying the suitability of a given model for a particular application.

Figure 5 applies gradually stronger truncation [18, 4, 14, 13] on precision and recall using a single StyleGAN generator.
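Truncation can be illustrated with the linear form popularized by StyleGAN: a latent w is pulled toward the average latent w̄ as w' = w̄ + ψ(w − w̄), with ψ ≤ 1. The paper compares several truncation methods, so treat this as one representative variant rather than the exact procedure behind Figure 5; the helper name `truncate` and the random latents are hypothetical. Smaller ψ shrinks the spread of the latents, which is expected to trade recall for precision.

```python
import numpy as np


def truncate(w: np.ndarray, w_avg: np.ndarray, psi: float) -> np.ndarray:
    """Linear truncation toward the mean latent (hypothetical helper).

    psi = 1 returns w unchanged; psi = 0 maps every latent onto w_avg.
    """
    return w_avg + psi * (w - w_avg)


# Usage: stand-in latents; in a real StyleGAN, w_avg is estimated from many mapped samples
rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 512))
w_avg = w.mean(axis=0)
for psi in (1.0, 0.7, 0.5):
    spread = truncate(w, w_avg, psi).std(axis=0).mean()
    print(f"psi={psi}: mean per-dimension std = {spread:.3f}")
```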

Using precision and recall to analyze and improve StyleGAN

Generative models have seen rapid improvements recently, and FID has risen as the de facto standard for determining whether a proposed technique is considered beneficial or not. However, as we have shown in Section 3, relying on FID alone may hide important qualitative differences in the results and it may inadvertently favor a particular tradeoff between precision and recall that is not necessarily aligned with the actual goals.
To avoid making assumptions about the desired tradeoff, we identify the Pareto frontier, i.e., the minimal subset of snapshots that is guaranteed to contain the optimal choice for any given tradeoff.
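The Pareto frontier over (precision, recall) pairs can be found by keeping only the snapshots that no other snapshot matches or beats on both axes while being strictly better on at least one. A minimal sketch; `pareto_frontier` is a hypothetical helper and the snapshot names and numbers in the usage lines are made up for illustration:

```python
from typing import List, Tuple

Snapshot = Tuple[str, float, float]  # (name, precision, recall)


def pareto_frontier(points: List[Snapshot]) -> List[Snapshot]:
    """Return the snapshots that are not dominated in (precision, recall)."""
    frontier = []
    for name, p, r in points:
        # Dominated: some snapshot is >= on both axes and > on at least one
        dominated = any(
            p2 >= p and r2 >= r and (p2 > p or r2 > r)
            for _, p2, r2 in points
        )
        if not dominated:
            frontier.append((name, p, r))
    return frontier


# Usage with made-up numbers
snaps = [("s1", 0.7, 0.4), ("s2", 0.6, 0.5), ("s3", 0.5, 0.3), ("s4", 0.8, 0.2)]
print(pareto_frontier(snaps))
```

Here "s3" drops out because "s1" is better on both axes; the remaining snapshots each win on some tradeoff, so choosing among them is left to the application.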

Conclusion

We have demonstrated through several experiments that the separate assessment of precision and recall can reveal interesting insights about generative models and can help to improve them further. We believe that the separate quantification of precision can also be useful in the context of image-to-image translation [34], where the quality of individual images is of great interest.

Using our metric, we have identified previously unknown training configuration-related effects in Section 4.1, raising the question whether truncation is really necessary (Wow!) if similar tradeoffs can be achieved by modifying the training configuration appropriately. We leave the in-depth study of these effects for future work.

Things I did not know

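Pareto frontier: the set of options that no other option beats on every criterion at once.
"i.e." abbreviates the Latin phrase "id est", meaning "that is". It is used to explain, clarify, or restate something mentioned just before it in the sentence.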
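Counting all PNG images in a folder under Linux

ls -1 /path/to/your/folder/*.png 2>/dev/null | wc -l
find /path/to/your/folder -maxdepth 1 -type f -name '*.png' | wc -l   # alternative; avoids "Argument list too long" in huge folders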

Extracting and creating archives with tar

tar -xzvf test.tar.gz        # extract test.tar.gz
tar -czvf test.tar.gz a.c    # compress a.c into test.tar.gz

Words I did not know

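uninformative: providing no useful information
manifold: a manifold (in the mathematical sense)
visibility: how clearly something can be seen or observed in an environment, or how widely a concept, issue, or phenomenon is recognized or noticed
ambiguity: vagueness; the state of admitting more than one interpretation
extrema: extreme values (maxima and minima)
disentangling: separating distinct features or attributes in data to better understand its structure and patterns
qualitative: relating to qualities rather than quantities
inadvertently: unintentionally; happening through oversight
consecutive: successive, adjacent
amortize: to pay off (a debt or cost) in installments; to spread a cost out over time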