[Text-to-Image, Style Preservation] InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation


2024.4.3

Paper link
Code link


Principle

Separating Content from Image. Benefiting from the good characterization of CLIP global features, subtracting the content text features from the image features explicitly decouples style from content. Although simple, this strategy is quite effective in mitigating content leakage.
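The subtraction above can be sketched as follows, assuming the CLIP global image feature and the content-text feature have already been computed (e.g. with a CLIP model whose projection heads map images and text into one shared space). The function name is illustrative, not taken from the official repo:

```python
# Minimal sketch of InstantStyle's content subtraction. Both inputs are
# assumed to be projected CLIP embeddings of shape (batch, dim), so they
# live in the same space and subtraction is meaningful.
import torch

def decouple_style(image_feat: torch.Tensor, content_text_feat: torch.Tensor) -> torch.Tensor:
    """Subtract the content-text feature from the global image feature,
    leaving an embedding that (loosely) keeps style cues such as color,
    material, and atmosphere while suppressing the described content."""
    return image_feat - content_text_feat
```

The resulting embedding is then fed to the image adapter in place of the raw CLIP image feature.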

Injecting into Style Blocks Only. Empirically, each layer of a deep network captures different semantic information. The key observation in this work is that there exist two specific attention layers handling style: up_blocks.0.attentions.1 captures style (color, material, atmosphere), while down_blocks.2.attentions.1 captures spatial layout (structure, composition).
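In practice this selective injection can be expressed as per-block IP-Adapter scales: zero everywhere except the style block (and optionally the layout block). A hedged sketch of such a configuration, using the nested-dict form that recent diffusers versions accept in `set_ip_adapter_scale` (block indices mirror the paper's naming; the helper function itself is hypothetical):

```python
# Build per-block IP-Adapter scales for InstantStyle-style injection.
# "up" / "block_0" index up_blocks.0; the middle 1.0 activates its second
# attention module (up_blocks.0.attentions.1, the style block).

def instantstyle_scales(style_only: bool = True) -> dict:
    """Return scales activating only the style block, and optionally the
    layout block down_blocks.2.attentions.1 as well."""
    scales = {"up": {"block_0": [0.0, 1.0, 0.0]}}  # style: color, material, atmosphere
    if not style_only:
        scales["down"] = {"block_2": [0.0, 1.0]}   # layout: structure, composition
    return scales

# Usage (assumes an SDXL pipeline with an IP-Adapter already loaded):
#   pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
#                        weight_name="ip-adapter_sdxl.bin")
#   pipe.set_ip_adapter_scale(instantstyle_scales(style_only=True))
```

Setting every other block's scale to zero is what avoids the per-image weight tuning that full-strength adapter injection usually requires.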

Abstract

Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization. However, despite this notable progress, current models continue to grapple with several complex challenges in producing style-consistent image generation. Firstly, the concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure, among others. Secondly, inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details. Lastly, adapter-based approaches frequently require meticulous weight tuning for each reference image to achieve a balance between style intensity and text controllability. In this paper, we commence by examining several compelling yet frequently overlooked observations. We then proceed to introduce InstantStyle, a framework designed to address these issues through the implementation of two key strategies: 1) A straightforward mechanism that decouples style and content from reference images within the feature space, predicated on the assumption that features within the same space can be either added to or subtracted from one another. 2) The injection of reference image features exclusively into style-specific blocks, thereby preventing style leaks and eschewing the need for cumbersome weight tuning, which often characterizes more parameter-heavy designs. Our work demonstrates superior visual stylization outcomes, striking an optimal balance between the intensity of style and the controllability of textual elements. Our codes will be available at https://github.com/InstantStyle/InstantStyle.


Demos

Stylized Synthesis


Image-based Stylized Synthesis


Comparison with Previous Works

