Notes: T2I Diffusion Models Are Zero-Shot Classifiers

Latest recommended article published 2024-10-18 17:48:19

umbrellazg


Article link: https://blog.csdn.net/m0_51576139/article/details/136959932

1 Title

Text-to-Image Diffusion Models are Zero-Shot Classifiers (Kevin Clark, Priyank Jaini) [NeurIPS Proceedings 2023]

2 Conclusion

This study investigates diffusion models by proposing a method for evaluating them as zero-shot classifiers. The key idea is using a diffusion model’s ability to denoise a noised image given a text description of a label as a proxy for that label’s likelihood.
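The "proxy for likelihood" idea can be written out in standard DDPM notation. This is a sketch: the symbols below (\bar\alpha_t, \hat{x}_\theta, w(t), \mathrm{prompt}(y)) are our own shorthand and may not match the paper's exact formulation or weighting. With a suitable weighting w(t), this weighted denoising error matches the diffusion part of the negative ELBO up to constants, which is why a lower score can stand in for a higher likelihood of label y.

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
s(x, y) &= \mathbb{E}_{t,\epsilon}\Big[\, w(t)\,\big\| x - \hat{x}_\theta\big(x_t,\, t,\, \mathrm{prompt}(y)\big) \big\|_2^2 \Big] \\
\hat{y} &= \arg\min_{y}\; s(x, y)
\end{aligned}
```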

3 Good Sentences

1. We show text-to-image diffusion models can be used as effective zero-shot classifiers. While using too much compute to be very practical on downstream tasks, the method provides a way of quantitatively studying what the models learn. (The main contribution and the remaining shortcoming)
2. More specifically, the method repeatedly noises and denoises the input image while conditioning the model on a different text prompt for each possible class. The class whose text prompt results in the best denoising ability is predicted. This procedure is expensive because it requires denoising many times per class (with different noise levels). (The essence of the method and its shortcoming: it is very expensive)
3. Our paper is complementary to concurrent work from Li et al. (2023), who use Stable Diffusion as a zero-shot classifier and explore some different tasks like relational reasoning. While their approach is similar to ours, they perform different analysis, and their results are slightly worse than ours due to them using a simple hand-tuned class pruning method and no timestep weighting. (The advance of this study over concurrent work)
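The noise-and-denoise loop described in item 2 can be sketched with a toy model. Everything here is hypothetical: `toy_denoiser` is a stand-in for a real text-conditioned diffusion model, each class "prompt" is replaced by a prototype vector, and the helper names are our own. Reusing the same noise sample across all classes is a design choice we made for a fairer comparison between classes, not something stated in the source.

```python
import math
import random

def noise_image(x, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * ei for xi, ei in zip(x, eps)]

def toy_denoiser(x_t, alpha_bar, prototype):
    """HYPOTHETICAL stand-in for a text-conditioned denoiser predicting the clean image.

    Blends the rescaled noisy input with the class prototype (which plays the role
    of the text prompt), trusting the input more when the noise level is low.
    Requires 0 < alpha_bar <= 1.
    """
    rescaled = [v / math.sqrt(alpha_bar) for v in x_t]
    return [alpha_bar * r + (1.0 - alpha_bar) * p for r, p in zip(rescaled, prototype)]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def classify(x, prototypes, alpha_bars, n_samples=4, seed=0):
    """Predict the class whose conditioning gives the lowest total denoising error."""
    rng = random.Random(seed)
    errors = [0.0] * len(prototypes)
    for alpha_bar in alpha_bars:          # several noise levels
        for _ in range(n_samples):        # several noise draws per level
            eps = [rng.gauss(0.0, 1.0) for _ in x]
            x_t = noise_image(x, eps, alpha_bar)   # same noisy input for every class
            for c, proto in enumerate(prototypes):
                x_hat = toy_denoiser(x_t, alpha_bar, proto)
                errors[c] += mse(x, x_hat)
    best = min(range(len(prototypes)), key=lambda c: errors[c])
    return best, errors

if __name__ == "__main__":
    x = [1.0, 1.0, 1.0, 1.0]
    prototypes = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
    pred, errs = classify(x, prototypes, alpha_bars=[0.9, 0.5, 0.1])
    print(pred, errs)
```

The cost problem is visible in the loop structure: the denoiser runs `len(alpha_bars) * n_samples * len(prototypes)` times per image, so it grows linearly with the number of classes.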

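Large models pretrained on internet-scale data can be adapted effectively to many downstream tasks (for example CLIP for images and GPT-3 for text), and more and more of them are being used for zero-shot classification. This paper applies diffusion models to zero-shot classification, with results close to CLIP-2, but the approach requires a large amount of compute.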

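First, a denoising score is computed for each label prompt at multiple timesteps, producing a score matrix. Then the image is classified by aggregating each class's scores over the timesteps with a weighting function. The image is assigned to the class with the lowest total score.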
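The aggregation step can be sketched as a weighted sum over the rows of the score matrix followed by an argmin. The score values and the weights below are made-up placeholders; how the paper actually chooses its timestep weighting is not reproduced here, and the function names are hypothetical.

```python
def aggregate_scores(score_matrix, weights):
    """Weighted sum over timesteps for each class.

    score_matrix[c][t]: denoising score (lower is better) of class c's prompt at timestep t.
    weights[t]: HYPOTHETICAL per-timestep weight w(t).
    """
    for row in score_matrix:
        if len(row) != len(weights):
            raise ValueError("each row needs one score per weighted timestep")
    return [sum(w * s for w, s in zip(weights, row)) for row in score_matrix]

def predict(score_matrix, weights):
    """Assign the image to the class with the lowest weighted total score."""
    totals = aggregate_scores(score_matrix, weights)
    best = min(range(len(totals)), key=totals.__getitem__)
    return best, totals

if __name__ == "__main__":
    # Made-up 2-class x 2-timestep score matrix.
    scores = [[0.10, 0.40],
              [0.30, 0.25]]
    print(predict(scores, [1.0, 1.0]))  # uniform weights: totals about [0.50, 0.55], class 0
    print(predict(scores, [0.2, 1.0]))  # down-weight timestep 0: totals about [0.42, 0.31], class 1
```

With uniform weights class 0 wins (0.50 vs 0.55), while down-weighting the first timestep flips the decision to class 1 (0.42 vs 0.31). This is a small illustration of why item 3 above credits part of the accuracy gap with Li et al. to timestep weighting.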