Deep Learning Series 43: Attention-based SAGAN/BigGAN and big_sleep

IE06

Last modified 2023-10-15 13:47:40


Category: Deep Learning Series. Tags: deep learning, pytorch, artificial intelligence

First published 2022-06-14 17:08:30

Article link: https://blog.csdn.net/kittyzc/article/details/125273079



1. From SAGAN to BigGAN

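SAGAN stands for Self-Attention Generative Adversarial Networks.
Motivation: an ordinary DCGAN (deep convolutional GAN) is good at texture-heavy classes such as sky and landscapes, but it handles structure poorly and often fails to generate faces or limbs correctly. The reason is that a convolution kernel only sees a small local region and cannot cover long-range dependencies. SAGAN therefore adds an attention mechanism.
In the figure below, f, g and h play roles similar to the key, query and value (k, q, v) of an attention mechanism.
[Figure: self-attention module with the f, g and h branches]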

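The convolutional feature maps have shape [C, W, H].
After f and g the shape is [C/8, W*H]; after h it is still [C, W, H].
The attention map has shape [W*H, W*H].
Multiplying the attention map with h (flattened to [C, W*H]) and reshaping gives o, with shape [C, W, H].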
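The shape bookkeeping above can be checked with a small sketch. This is a NumPy illustration written for this article, not the SAGAN implementation: the 1x1 convolutions are replaced by plain matrix multiplications over the channel axis, the weight matrices are random stand-ins, and the learnable residual scale used in the paper is omitted. The function name `self_attention` and all sizes are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_f, w_g, w_h):
    """x: feature map [C, W, H]; w_f, w_g: [C/8, C]; w_h: [C, C]."""
    C, W, H = x.shape
    flat = x.reshape(C, W * H)      # [C, N] with N = W*H
    f = w_f @ flat                  # [C/8, N], key-like
    g = w_g @ flat                  # [C/8, N], query-like
    h = w_h @ flat                  # [C, N], value-like
    scores = f.T @ g                # [N, N], scores[i, j] = f(x_i) . g(x_j)
    attn = softmax(scores, axis=0)  # each column j sums to 1 over positions i
    o = h @ attn                    # [C, N], o_j = sum_i attn[i, j] * h(x_i)
    return o.reshape(C, W, H), attn

rng = np.random.default_rng(0)
C, W, H = 16, 4, 4
x = rng.standard_normal((C, W, H))
o, attn = self_attention(
    x,
    rng.standard_normal((C // 8, C)),
    rng.standard_normal((C // 8, C)),
    rng.standard_normal((C, C)),
)
print(o.shape, attn.shape)  # (16, 4, 4) (16, 16)
```

Because every output position mixes the values of all W*H positions, a single layer can relate distant parts of the image, which is what a small convolution kernel cannot do.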

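BigGAN is a scaled-up version of SAGAN. The changes include:

batch size ×8
parameters ×2 to 4
noise truncation: sample z from a truncated prior, which trades some diversity for fewer bad images.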
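Noise truncation can be sketched as follows. This is a minimal NumPy illustration that assumes truncation is done by resampling every component of z whose magnitude exceeds a threshold; the function name `truncated_noise`, the dimension 128 and the threshold 0.5 are illustrative choices, not values taken from this article.

```python
import numpy as np

def truncated_noise(batch_size, dim, threshold, seed=None):
    """Sample z ~ N(0, 1) and resample entries with |z| > threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((batch_size, dim))
    mask = np.abs(z) > threshold
    while mask.any():
        # Redraw only the out-of-range entries until none remain.
        z[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(z) > threshold
    return z

z = truncated_noise(batch_size=4, dim=128, threshold=0.5, seed=0)
print(z.shape, bool(np.abs(z).max() <= 0.5))  # (4, 128) True
```

A smaller threshold keeps z closer to the mode of the prior, so samples tend to look cleaner but more alike; a larger threshold restores variety.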

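2. Introducing the big_sleep model

big_sleep is a multimodal version of BigGAN that is combined with CLIP. The repository is at https://github.com/lucidrains/big-sleep.
Since there is no paper, we take a brief look at the code: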

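# Imports are not shown in the original excerpt; the BigGAN import path is an assumption
import torch
import clip
from pytorch_pretrained_biggan import BigGAN

# Generate an image with BigGAN
model = BigGAN.from_pretrained('biggan-deep-512')
# lats is not defined in this excerpt; it is assumed to return the trainable latent and class vectors
out = model(*lats(), 1)

# Use CLIP to compute a loss between the image and the text, and add it to the objective being optimized
perceptor, preprocess = clip.load('ViT-B/32')
tx = clip.tokenize('''a cityscape in the style of Van Gogh''')
t = perceptor.encode_text(tx.cuda())
i = perceptor.encode_image(out)
# loss1: loss on the latents (not shown in this excerpt)
# loss2: classification loss (not shown in this excerpt)
loss3 = -100 * torch.cosine_similarity(t, i, dim=-1).mean()  # image-text similarity loss

# The remaining steps are the same as in BigGAN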
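To make the similarity term concrete, here is a small NumPy sketch of the same formula as loss3, applied to hand-made vectors instead of real CLIP embeddings. The function name `clip_similarity_loss` and the toy vectors are assumptions made for this illustration.

```python
import numpy as np

def clip_similarity_loss(text_embed, image_embed, scale=100.0):
    """Return -scale * mean cosine similarity between matching rows."""
    t = text_embed / np.linalg.norm(text_embed, axis=-1, keepdims=True)
    i = image_embed / np.linalg.norm(image_embed, axis=-1, keepdims=True)
    return -scale * np.sum(t * i, axis=-1).mean()

t = np.array([[1.0, 0.0, 0.0]])          # stand-in text embedding
aligned = np.array([[2.0, 0.0, 0.0]])    # same direction as t
opposite = np.array([[-1.0, 0.0, 0.0]])  # opposite direction

print(clip_similarity_loss(t, aligned))   # -100.0
print(clip_similarity_loss(t, opposite))  # 100.0
```

Minimizing this value pushes the image embedding toward the direction of the text embedding, and the vector lengths do not matter.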

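3. Usage

On a machine with a GPU, run pip install big-sleep,

then run $ dream "a pyramid made of ice" to get an image. After three and a half hours on Colab, the generated image looks like this:
[Figure: the generated image]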

If you have enough memory, you can use the larger model: $ dream "storm clouds rolling in over a white barnyard" --larger-model
To save intermediate results, add these options: $ dream "a bowl of apples next to the fireplace" --save-progress --save-every 100
Or keep only the best image: $ dream "a room with a view of the ocean" --save-best

From Python, it can be called as follows:

from big_sleep import Imagine

dream = Imagine(
    text = "fire in the sky",
    lr = 5e-2,
    save_every = 25,
    save_progress = True
)

dream()

Advanced usage: train on several phrases at once by separating them with |:

from big_sleep import Imagine

dream = Imagine(
    text = "an armchair in the form of pikachu|an armchair imitating pikachu|abstract",
    lr = 5e-2,
    save_every = 25,
    save_progress = True
)

dream()

To steer away from blur and zoom, add the text_min parameter, which lists phrases to penalize:

from big_sleep import Imagine

dream = Imagine(
    text = "an armchair in the form of pikachu|an armchair imitating pikachu|abstract",
    text_min = "blur|zoom",
)
dream()
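One way to picture how several phrases and penalized phrases could be combined is sketched below. This is a guess at the general idea, not the code of big_sleep: it assumes that each phrase in text lowers the loss through its similarity to the image, and that each phrase in text_min raises it. The names `cosine` and `combined_loss` are hypothetical, and the two-dimensional vectors stand in for CLIP embeddings.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_loss(image_embed, text_embeds, text_min_embeds=(), scale=100.0):
    """Hypothetical objective: reward wanted phrases, penalize unwanted ones."""
    loss = 0.0
    for t in text_embeds:
        loss -= scale * cosine(image_embed, t)  # pull toward wanted phrases
    for t in text_min_embeds:
        loss += scale * cosine(image_embed, t)  # push away from penalized phrases
    return loss

# The | delimiter splits one string into separate phrases.
phrases = "an armchair in the form of pikachu|an armchair imitating pikachu|abstract".split("|")
penalized = "blur|zoom".split("|")

image = np.array([1.0, 0.0])     # stand-in image embedding
wanted = [np.array([1.0, 0.0])]  # stand-in embedding of a wanted phrase
blurry = [np.array([1.0, 0.0])]  # stand-in embedding of a penalized phrase

loss_plain = combined_loss(image, wanted)
loss_penalized = combined_loss(image, wanted, blurry)
print(len(phrases), penalized)  # 3 ['blur', 'zoom']
print(loss_plain, loss_penalized)  # -100.0 0.0
```

In this toy case the image matches the penalized phrase exactly, so the penalty cancels the reward, and the optimizer would have to move away from the penalized direction to lower the loss again.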

If you are on a Windows machine, there is also a graphical interface:
https://softology.pro/voc.htm