Paper Quick Read: Improved Precision and Recall Metric for Assessing Generative Models

Improved Precision and Recall Metric for Assessing Generative Models

Year: 2019
Paper Link: link
Github Link: link

Summary

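  • The authors are the StyleGAN team. They propose a new definition and computation of precision and recall that evaluates a generative model along two separate axes: first, the quality of generated images (how realistic they are, i.e. how close they are to real images), and second, the diversity of generated images. This is a finer-grained evaluation than single-number metrics such as FID, IS, and KID. With P and R available, one can see that FID places more weight on diversity. P and R also make it possible to analyze a model improvement further, i.e. whether it raised quality or diversity.
  • The generative model should be chosen according to the user's needs: some applications need high image quality, others need diverse outputs. In practice P and R tend to trade off against each other; user needs aside, the best model is one where both P and R are high.
  • The paper analyzes truncation methods, and the metric can also be used to score a single image. See the original paper for details.
  • To compute P and R, the paper estimates the manifold of each image feature set using nearest-neighbor distances, following the approach of earlier work [25]. See the original paper for the exact computation.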
    [25] M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, and S. Gelly. Assessing generative models via precision and recall. CoRR, abs/1806.00035, 2018.

Contribution

  • Our primary contribution is an improved precision and recall metric (Section 2) which provides explicit visibility of the tradeoff between sample quality and variety. We demonstrate the effectiveness of our metric using two recent generative models (Section 3), StyleGAN [13] and BigGAN [4].
  • We then use our metric to analyze several variants of StyleGAN (Section 4) to better understand the design decisions that determine result quality, and identify new variants that improve the state-of-the-art.
  • We also perform the first principled analysis of truncation methods.
  • Finally, we extend our metric to estimate the quality of individual generated samples (Section 5), offering a way to measure the quality of latent space interpolations.
Motivation

Sajjadi et al. [25] introduce the classic concepts of precision and recall to the study of generative models, motivated by the observation that FID and related density metrics cannot be used for making conclusions about precision and recall: a low FID may indicate high precision (realistic images), high recall (large amount of variation), or anything in between.

Improved precision and recall metric using k-nearest neighbors

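Basic idea: the generated image set and the real image set are mapped into a high-dimensional feature space with VGG, and the manifold of each set is estimated from the nearest neighbors of every feature vector (the paper places a hypersphere around each feature whose radius is the distance to its k-th nearest neighbor). A binary function then decides whether a given feature lies inside a manifold.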
Precision and recall are then defined in terms of this binary function.
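That is, precision is obtained by querying, for each generated image, whether it lies inside the manifold of the real images.
Correspondingly, recall is obtained by querying, for each real image, whether it lies inside the manifold of the generated images.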
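
The two queries above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes the feature vectors have already been extracted (for example with VGG), the function names are made up for this note, and k is a free parameter (k=3 is used here as an assumed default). The brute-force distance matrix is only practical for small sets.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_radii(features, k=3):
    """Distance from each feature to its k-th nearest neighbour in the same set."""
    d = pairwise_distances(features, features)
    # After sorting a row, column 0 is the point itself (distance 0),
    # so column k holds the k-th nearest neighbour.
    return np.sort(d, axis=1)[:, k]

def in_manifold(queries, reference, k=3):
    """Binary function: is each query inside at least one reference hypersphere?"""
    radii = knn_radii(reference, k)
    d = pairwise_distances(queries, reference)
    return (d <= radii[None, :]).any(axis=1)

def precision_recall(real, fake, k=3):
    """Precision: share of fake features inside the real manifold.
    Recall: share of real features inside the fake manifold."""
    precision = in_manifold(fake, real, k).mean()
    recall = in_manifold(real, fake, k).mean()
    return float(precision), float(recall)

# Toy example (hypothetical 2-D "features"): the fake set collapses onto one corner.
real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
fake = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [0.01, 0.01]])
print(precision_recall(real, fake, k=1))
```

In the toy example every fake point sits close to a real point, so precision is high, while three of the four real points are far from any fake point, so recall is low. This is the mode-collapse pattern that a single number like FID cannot separate out.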

See the paper for the remaining details.

Experiments

Looking at FID, setups B and D appear almost equally good, illustrating how much weight FID places on variation compared to image quality, also evidenced by the high FID of setup A. The ideal tradeoff between quality and variation depends on the intended application, but it is unclear which application might favor setup D where practically all images are broken over setup B that produces high-quality samples at a lower variation. (hhh)
Our metric provides explicit visibility on this tradeoff and allows quantifying the suitability of a given model for a particular application.

Figure 5 applies gradually stronger truncation [18, 4, 14, 13] on precision and recall using a single StyleGAN generator.
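
The note does not spell out how truncation works. As a rough sketch, assuming the StyleGAN-style truncation trick (linear interpolation of a latent vector toward the mean latent, with strength psi), it could look like the following. The function name and the variables `w`, `w_mean`, and `psi` are illustrative, not taken from the paper's code.

```python
import numpy as np

def truncate(w, w_mean, psi):
    """Pull latent vectors toward the mean latent.

    psi = 1 leaves w unchanged (no truncation);
    psi = 0 collapses every latent onto w_mean (maximum truncation).
    """
    return w_mean + psi * (w - w_mean)

# Hypothetical latents: one 2-D latent vector and a mean latent.
w = np.array([[2.0, 4.0]])
w_mean = np.array([1.0, 1.0])
print(truncate(w, w_mean, psi=0.5))
```

Smaller psi concentrates samples near the mean latent, which would typically be expected to raise precision and lower recall; measuring that tradeoff is what the figure is about.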

Using precision and recall to analyze and improve StyleGAN

Generative models have seen rapid improvements recently, and FID has risen as the de facto standard for determining whether a proposed technique is considered beneficial or not. However, as we have shown in Section 3, relying on FID alone may hide important qualitative differences in the results and it may inadvertently favor a particular tradeoff between precision and recall that is not necessarily aligned with the actual goals.
To avoid making assumptions about the desired tradeoff, we identify the Pareto frontier, i.e., the minimal subset of snapshots that is guaranteed to contain the optimal choice for any given tradeoff.
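
To make the Pareto frontier idea concrete, here is a minimal sketch that keeps only the snapshots not dominated by any other snapshot. It assumes each snapshot is summarized as a (precision, recall) pair where higher is better on both; the function name and the numbers are made up for illustration.

```python
def pareto_frontier(points):
    """Return the (precision, recall) pairs that no other pair dominates.

    A pair is dominated if another pair is at least as good on both
    metrics and strictly better on at least one.
    """
    frontier = []
    for i, (p, r) in enumerate(points):
        dominated = any(
            (p2 >= p and r2 >= r) and (p2 > p or r2 > r)
            for j, (p2, r2) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append((p, r))
    return sorted(frontier)

# Hypothetical snapshots as (precision, recall) pairs.
snapshots = [(0.9, 0.2), (0.7, 0.5), (0.6, 0.4), (0.3, 0.8), (0.3, 0.7)]
print(pareto_frontier(snapshots))
```

Here (0.6, 0.4) drops out because (0.7, 0.5) beats it on both metrics, and (0.3, 0.7) drops out because (0.3, 0.8) matches its precision with better recall. Whatever tradeoff a user prefers, the best snapshot for it is among the remaining ones.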

Conclusion

We have demonstrated through several experiments that the separate assessment of precision and recall can reveal interesting insights about generative models and can help to improve them further. We believe that the separate quantification of precision can also be useful in the context of image-to-image translation [34], where the quality of individual images is of great interest.

Using our metric, we have identified previously unknown training configuration-related effects in Section 4.1, raising the question whether truncation is really necessary (Wow!) if similar tradeoffs can be achieved by modifying the training configuration appropriately. We leave the in-depth study of these effects for future work.

Things I did not know

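  • Pareto frontier: the set of options that no other option beats on every criterion at once.
  • "i.e." is the abbreviation of the Latin phrase "id est", meaning "that is" or "in other words". It is used to explain, clarify, or restate what was just mentioned.
  • Counting all images in a folder on Linux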
ls -1 /path/to/your/folder/*.png 2>/dev/null | wc -l
  • Extracting and compressing with tar
tar -xzvf test.tar.gz  # extract
tar -czvf test.tar.gz a.c  # compress a.c into test.tar.gz

Words I did not know

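uninformative: providing no useful information
manifold: a space that locally resembles Euclidean space; here, the region of feature space covered by a set of images
visibility: how clearly something can be seen or observed in an environment, or how widely a concept, problem, or phenomenon is recognized or noticed
ambiguity: vagueness, or the state of having more than one possible interpretation
extrema: extreme values (maxima and minima)
disentangling: decoupling; separating distinct features or attributes in the data to better understand its structure and patterns
qualitative: relating to quality rather than quantity
inadvertently: unintentionally, through oversight
consecutive: following one after another, adjacent
amortize: to pay off (a debt, a cost, etc.) in installments; to spread a cost over time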
