Self-Attention Generative Adversarial Networks (SAGAN): Paper Model Reproduction

Dic0k

Category: Deep Learning. Tags: SAGAN, GAN, model reproduction

Article link: https://blog.csdn.net/dickdick111/article/details/90110758

Lab midterm report: Self-Attention Generative Adversarial Networks

Paper Title: Self-Attention Generative Adversarial Networks

Paper Authors: Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

Year: 2018

Student: Liang Yinglin-16340132

Description of the problem

Image synthesis is an important problem in computer vision, and the emergence of Generative Adversarial Networks (GANs) has brought remarkable progress. However, convolutional GANs excel at synthesizing image classes with few structural constraints but fail to capture geometric or structural patterns that occur consistently in some classes. Because the convolution operator has a local receptive field, long-range dependencies can only be processed after passing through several convolutional layers.

One disadvantage of GANs is that, after training on large datasets containing many image categories, they cannot clearly distinguish between categories and struggle to capture the structure, texture and details of the images. As a result, a single GAN cannot easily generate large numbers of high-quality images across different categories.

On the other hand, although increasing the convolution kernel size enlarges the receptive field and lets the network retain more representations, it comes at the expense of computational efficiency.

Introduction of the method

The authors propose Self-Attention Generative Adversarial Networks (SAGANs), which introduce a self-attention mechanism into convolutional GANs.

The self-attention module is complementary to convolutions and helps with modeling long range, multi-level dependencies across image regions. Armed with self-attention, the generator can draw images in which fine details at every location are carefully coordinated with fine details in distant portions of the image. Moreover, the discriminator can also more accurately enforce complicated geometric constraints on the global image structure.
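
As a rough illustration of the mechanism described above, the following is a minimal PyTorch sketch of a self-attention block over a feature map. It assumes the layout visible in the printed model later in this report (1x1 convolutions for query, key and value, with the query/key channels reduced to C/8) and a learnable scale gamma initialised to zero. The class name SelfAttention2d and the details of the forward pass are illustrative assumptions, not necessarily the exact code used in the experiment.

```python
import torch
import torch.nn as nn


class SelfAttention2d(nn.Module):
    """Illustrative self-attention block for (B, C, H, W) feature maps."""

    def __init__(self, in_dim):
        super().__init__()
        # 1x1 convolutions; query/key use C // 8 channels (assumed reduction).
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)
        # gamma starts at 0, so the block is initially an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.size()
        n = h * w  # number of spatial positions
        query = self.query_conv(x).view(b, -1, n).permute(0, 2, 1)  # (B, N, C//8)
        key = self.key_conv(x).view(b, -1, n)                       # (B, C//8, N)
        # Row i holds the attention weights of position i over all positions.
        attention = self.softmax(torch.bmm(query, key))             # (B, N, N)
        value = self.value_conv(x).view(b, -1, n)                   # (B, C, N)
        out = torch.bmm(value, attention.permute(0, 2, 1))          # (B, C, N)
        out = out.view(b, c, h, w)
        # Residual connection scaled by the learnable gamma.
        return self.gamma * out + x, attention


attn = SelfAttention2d(128)
features = torch.randn(2, 128, 16, 16)
out, attention_map = attn(features)
print(out.shape, attention_map.shape)
```

Because every position attends to every other position, the attention map grows quadratically with the number of spatial positions, which is one reason such blocks are usually placed on intermediate feature maps rather than at full image resolution.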

In addition to self-attention, the authors propose enforcing good conditioning of the GAN generator using spectral normalization, a technique that had previously been applied only to the discriminator.
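
Spectral normalization divides a layer's weight matrix by an estimate of its largest singular value, usually obtained by power iteration. The sketch below illustrates that idea on a tiny hand-written matrix using only the standard library. The helper names are hypothetical, and practical implementations typically run a single power-iteration step per training update rather than iterating to convergence as done here.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]


def spectral_norm_estimate(w, n_iters=50):
    """Estimate the largest singular value of w by power iteration."""
    wt = transpose(w)
    u = l2_normalize([1.0] * len(w))  # fixed start vector for reproducibility
    for _ in range(n_iters):
        v = l2_normalize(matvec(wt, u))
        u = l2_normalize(matvec(w, v))
    # sigma = u^T W v
    return sum(a * b for a, b in zip(u, matvec(w, v)))


def spectral_normalize(w, n_iters=50):
    """Return w / sigma(w), whose largest singular value is about 1."""
    sigma = spectral_norm_estimate(w, n_iters)
    return [[a / sigma for a in row] for row in w]


w = [[3.0, 0.0], [0.0, 1.0]]  # singular values are 3 and 1
sigma = spectral_norm_estimate(w)
w_sn = spectral_normalize(w)
print(round(sigma, 4), w_sn)
```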

As a result, SAGAN significantly outperforms the previous state of the art in image synthesis on ImageNet, raising the best reported Inception score from 36.8 to 52.52 and reducing the Fréchet Inception distance from 27.62 to 18.65.

Preliminary results of the experiment

The structure of the SAGAN model

The generator includes five layers and two self-attention layers:

Layer one
- x = ConvTranspose2d(128, 512, kernel_size=(4, 4), stride=(1, 1))
- x = SpectralNorm(x)
- x = BatchNorm2d(512)
- x = ReLU(x)
Layer two
- x = ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = BatchNorm2d(256)
- x = ReLU(x)
Layer three
- x = ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = BatchNorm2d(128)
- x = ReLU(x)
Self_Attn(128)
Layer four
- x = ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = BatchNorm2d(64)
- x = ReLU(x)
Self_Attn(64)
Layer five
- x = ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = Tanh()

Generator(
  (l1): Sequential(
    (0): SpectralNorm(
      (module): ConvTranspose2d(128, 512, kernel_size=(4, 4), stride=(1, 1))
    )
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (l2): Sequential(
    (0): SpectralNorm(
      (module): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (l3): Sequential(
    (0): SpectralNorm(
      (module): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (l4): Sequential(
    (0): SpectralNorm(
      (module): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (last): Sequential(
    (0): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): Tanh()
  )
  (attn1): Self_Attn(
    (query_conv): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
    (key_conv): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
    (value_conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
    (softmax): Softmax()
  )
  (attn2): Self_Attn(
    (query_conv): Conv2d(64, 8, kernel_size=(1, 1), stride=(1, 1))
    (key_conv): Conv2d(64, 8, kernel_size=(1, 1), stride=(1, 1))
    (value_conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
    (softmax): Softmax()
  )
)
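
To sanity-check the generator layout, the spatial size after each transposed convolution can be traced with the standard output-size formula out = (in - 1) * stride - 2 * padding + kernel (dilation 1, no output padding). The sketch assumes the 128-dimensional latent vector is viewed as a 1x1 feature map before layer one, which the report does not state explicitly. The helper name is hypothetical.

```python
def conv_transpose_out(size, kernel, stride, padding=0):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for layers one to five, read from the printed model.
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # assumed: latent vector viewed as a 1x1 feature map
sizes = []
for kernel, stride, padding in generator_layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]

# Self-attention compares every position with every other one, so the
# attention map after a layer with side length s has (s * s) x (s * s) entries.
attn1_positions = sizes[2] ** 2  # after layer three
attn2_positions = sizes[3] ** 2  # after layer four
print(attn1_positions, attn2_positions)  # 256 1024
```

Under that assumption the final output is 64x64, and the two attention blocks operate on 16x16 and 32x32 feature maps respectively.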

The discriminator also includes five layers and two self-attention layers:

Layer one
- x = Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = LeakyReLU(negative_slope=0.1)
Layer two
- x = Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = LeakyReLU(negative_slope=0.1)
Layer three
- x = Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = LeakyReLU(negative_slope=0.1)
Self_Attn(256)
Layer four
- x = Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
- x = SpectralNorm(x)
- x = LeakyReLU(negative_slope=0.1)
Self_Attn(512)
Layer five
- x = Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1))

Discriminator(
  (l1): Sequential(
    (0): SpectralNorm(
      (module): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): LeakyReLU(negative_slope=0.1)
  )
  (l2): Sequential(
    (0): SpectralNorm(
      (module): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): LeakyReLU(negative_slope=0.1)
  )
  (l3): Sequential(
    (0): SpectralNorm(
      (module): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): LeakyReLU(negative_slope=0.1)
  )
  (l4): Sequential(
    (0): SpectralNorm(
      (module): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    )
    (1): LeakyReLU(negative_slope=0.1)
  )
  (last): Sequential(
    (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1))
  )
  (attn1): Self_Attn(
    (query_conv): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
    (key_conv): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
    (value_conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (softmax): Softmax()
  )
  (attn2): Self_Attn(
    (query_conv): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    (key_conv): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    (value_conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
    (softmax): Softmax()
  )
)
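
The discriminator mirrors the generator: each strided convolution halves the spatial size, and the final 4x4 convolution reduces the 4x4 map to a single score per image. A small sketch of that arithmetic, using the usual formula out = floor((in + 2 * padding - kernel) / stride) + 1 and a hypothetical helper name:

```python
def conv_out(size, kernel, stride, padding=0):
    """Output size of a standard convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for layers one to five, read from the printed model.
discriminator_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64  # input image side length used in this experiment
d_sizes = []
for kernel, stride, padding in discriminator_layers:
    size = conv_out(size, kernel, stride, padding)
    d_sizes.append(size)

print(d_sizes)  # [32, 16, 8, 4, 1]

# Query/key widths in the printed Self_Attn blocks equal in_channels // 8.
qk_channels = {channels: channels // 8 for channels in (256, 512)}
print(qk_channels)  # {256: 32, 512: 64}
```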

The hyperparameters of the SAGAN model

batch_size = 64
g_lr = 0.0001
d_lr = 0.0004
lr_decay = 0.95
imsize = 64
total_step = 100000
optimizer = 'Adam'
beta1 = 0.0
beta2 = 0.9
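
The settings above correspond to two Adam optimizers with different learning rates, the discriminator's being four times the generator's. A minimal sketch of how they might be wired up in PyTorch follows. The two nn.Linear modules are stand-ins for the real networks, and applying lr_decay through ExponentialLR is an assumption, since the report does not say how or when the decay is applied.

```python
import torch
import torch.nn as nn

# Stand-in modules; the real generator and discriminator are described above.
generator = nn.Linear(128, 8)
discriminator = nn.Linear(8, 1)

g_lr, d_lr = 0.0001, 0.0004
beta1, beta2 = 0.0, 0.9
lr_decay = 0.95

g_optimizer = torch.optim.Adam(generator.parameters(), lr=g_lr, betas=(beta1, beta2))
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=d_lr, betas=(beta1, beta2))

# Assumed schedule: each scheduler.step() multiplies the learning rate by lr_decay.
g_scheduler = torch.optim.lr_scheduler.ExponentialLR(g_optimizer, gamma=lr_decay)
d_scheduler = torch.optim.lr_scheduler.ExponentialLR(d_optimizer, gamma=lr_decay)

print(g_optimizer.param_groups[0]["lr"], d_optimizer.param_groups[0]["lr"])
```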

The training set

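import torchvision.datasets as dsets
from torchvision import transforms

# Method of the data loader class; self.imsize and self.path are set elsewhere.
def load_lsun(self, classes='church_outdoor_train'):
    # Resize to imsize x imsize, convert to a tensor in [0, 1], then scale to [-1, 1].
    lsun_transforms = transforms.Compose([
        transforms.Resize((self.imsize, self.imsize)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # self.path is expected to be the LSUN root directory holding the lmdb folders.
    dataset = dsets.LSUN(self.path, classes=[classes], transform=lsun_transforms)
    return dataset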
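
The Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) step applies (x - mean) / std per channel, which maps the [0, 1] range produced by ToTensor to [-1, 1], the same range as the generator's Tanh output. A small standard-library sketch of that arithmetic, with a hypothetical inverse helper for viewing generated samples:

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization, as applied by transforms.Normalize."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Hypothetical inverse, useful for mapping Tanh outputs back to [0, 1]."""
    return y * std + mean


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
restored = [denormalize(v) for v in normalized]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```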

the ground truth

the generated photos

after 1000 steps:

Elapsed [0:09:07.832233], G_step [1000/100000], D_step[1000/100000], d_out_real: 1.1565,  ave_gamma_l3: -0.0323, ave_gamma_l4: -0.0486

after 10000 steps:

Elapsed [0:45:39.616833], G_step [10000/100000], D_step[10000/100000], d_out_real: 0.7750,  ave_gamma_l3: -0.1495, ave_gamma_l4: -0.2459

after 35000 steps:

Elapsed [2:32:44.314908], G_step [35000/100000], D_step[35000/100000], d_out_real: 0.2414,  ave_gamma_l3: -0.2588, ave_gamma_l4: -0.3762

The planned work

Compare spectral normalization with other normalization methods in this experiment.
Use the two-timescale update rule (TTUR) to compensate for slow learning in a regularized discriminator, making it possible to use fewer discriminator steps per generator step.
Verify the effect of the self-attention module on the experimental results.
Adjust the hyperparameters used to train the model.