[Prompt + Image Editing] Prompt-to-Prompt Image Editing with Cross Attention Control


2022.08

Paper link
Code link
Related reading: Prompt-to-prompt: keeping generated images consistent
Related reading: [AIGC Part 6] Prompt-to-Prompt: an image-editing technique based on cross-attention control

Abstract

Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans who are used to verbally describe their intent. Therefore, it is only natural to extend the text-driven image synthesis to text-driven image editing. Editing is challenging for these generative models, since an innate property of an editing technique is to preserve most of the original image, while in the text-based models, even a small modification of the text prompt often leads to a completely different outcome. State-of-the-art methods mitigate this by requiring the users to provide a spatial mask to localize the edit, hence, ignoring the original structure and content within the masked region. In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. To this end, we analyze a text-conditioned model in depth and observe that the cross-attention layers are the key to controlling the relation between the spatial layout of the image to each word in the prompt. With this observation, we present several applications which monitor the image synthesis by editing the textual prompt only. This includes localized editing by replacing a word, global editing by adding a specification, and even delicately controlling the extent to which a word is reflected in the image. We present our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.

Recent large-scale text-driven synthesis models have drawn wide attention for their ability to generate highly diverse images from a given text prompt. Such text-based synthesis is especially appealing to humans, who are used to describing their intent verbally. It is therefore natural to extend text-driven image synthesis to text-driven image editing.

Editing is challenging for these generative models: an editing technique must, by nature, preserve most of the original image, yet in text-based models even a small change to the prompt often produces a completely different result.

State-of-the-art methods mitigate this by asking the user to provide a spatial mask that localizes the edit, which discards the original structure and content inside the masked region.

In this paper, we pursue an intuitive prompt-to-prompt editing framework, in which edits are controlled by text alone.

To this end, we analyze a text-conditioned model in depth and find that the cross-attention layers are the key to controlling the relation between the spatial layout of the image and each word in the prompt. Based on this observation, we present several applications that steer image synthesis by editing the text prompt alone, including:

  • localized editing by replacing a word
  • global editing by adding a specification
  • fine-grained control over the extent to which a word is reflected in the image

We present results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.

Method

See the links at the top of this post and the paper for details.
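To make the core idea concrete, below is a minimal, self-contained sketch of the cross-attention control described in the abstract. It is not the paper's implementation (which hooks the cross-attention layers of a diffusion U-Net at each denoising step); here a single toy attention layer with random numpy tensors stands in for one such layer, and the shapes, variable names, and the `cross_attention` helper are all illustrative assumptions. The sketch shows the two edits the abstract mentions: injecting the source prompt's attention maps while decoding an edited prompt (word swap), and rescaling one token's attention column (re-weighting).

```python
import numpy as np

def cross_attention(query, key, value, attn_override=None, token_weights=None):
    """Toy cross-attention: queries come from image features,
    keys/values from text-token embeddings.

    attn_override: if given, this attention map is used instead of the
                   computed one (the Prompt-to-Prompt "injection" idea).
    token_weights: optional per-token scale on attention columns
                   (the re-weighting edit). Columns are renormalized.
    Returns (output, attention_map).
    """
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)               # (pixels, tokens)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax over tokens
    if attn_override is not None:                     # word-swap edit: reuse
        attn = attn_override                          # the source prompt's maps
    if token_weights is not None:                     # re-weighting edit:
        attn = attn * token_weights[None, :]          # scale a token's influence
        attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ value, attn

# Hypothetical shapes: 16 "pixels", 4 prompt tokens, feature dim 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k_src, v_src = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
k_edit, v_edit = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))

# 1) Run the source prompt and record its attention maps.
_, attn_src = cross_attention(q, k_src, v_src)

# 2) Word swap: decode with the edited prompt's values, but inject the
#    source maps, so the source image's spatial layout is preserved.
out_swap, attn_used = cross_attention(q, k_edit, v_edit, attn_override=attn_src)

# 3) Re-weighting: amplify the influence of token 2 of the source prompt.
weights = np.array([1.0, 1.0, 3.0, 1.0])
_, attn_rw = cross_attention(q, k_src, v_src, token_weights=weights)
```

In the real method this substitution is performed inside every cross-attention layer for a chosen fraction of the diffusion steps, which is what lets a single word change edit the image locally while keeping the rest intact.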
