InfoGAN Notes

昨日啊萌

Category: GAN

Article link: https://blog.csdn.net/qq_37395293/article/details/104839369

InfoGAN study notes

Algorithm structure: [figure]

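Compared with an ordinary GAN, InfoGAN splits the input vector z into two parts, c and z'. c represents certain explicit features, while z' can be understood as uninterpretable noise or features. By constraining the relationship between c and the output, the algorithm aims to make each dimension of c correspond to a semantic feature of the output, such as a character's hair length or how far the face is turned to the side.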
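The split of the input into c and z' can be sketched as follows. This is a minimal illustration, not the author's code: the sizes are hypothetical, and it assumes c consists of a one-hot categorical part plus a continuous part drawn uniformly from [-1, 1], which matches the `n_classes` and `code_dim` options used in the listings further down.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, chosen for illustration only
latent_dim, n_classes, code_dim = 62, 10, 2
batch_size = 4

# z': unstructured noise
z = torch.randn(batch_size, latent_dim)

# Discrete part of c: one-hot category labels (assumption)
label_ids = torch.randint(0, n_classes, (batch_size,))
labels = F.one_hot(label_ids, n_classes).float()

# Continuous part of c: uniform in [-1, 1] (assumption)
code = torch.rand(batch_size, code_dim) * 2 - 1

# The generator sees all three parts as one concatenated vector
gen_input = torch.cat((z, labels, code), dim=-1)
print(gen_input.shape)  # torch.Size([4, 74])
```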

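Each dimension of c has a clear effect on the generated image. The interesting point is that the dimensions of c are not assigned to features in advance; instead, whatever features the classifier learns to recover from the output become the meaning of c.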
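For reference, the constraint between c and the output is usually written as a mutual-information term added to the GAN objective. The formula below follows the InfoGAN paper (Chen et al., 2016) rather than anything stated in this post; Q is the auxiliary network that predicts c from a generated image, and λ is a weighting hyperparameter.

```latex
\min_{G,Q}\,\max_{D}\; V_{\mathrm{InfoGAN}}(D,G,Q) = V(D,G) - \lambda\, L_I(G,Q),
\qquad
L_I(G,Q) = \mathbb{E}_{c \sim P(c),\; x \sim G(z',c)}\big[\log Q(c \mid x)\big] + H(c) \;\le\; I\big(c;\, G(z',c)\big)
```

Since H(c) is constant for a fixed prior over c, maximizing L_I amounts to training Q (and G) so that c can be predicted back from the generated image.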

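Below is part of the code, listing only the more important pieces. The code comes from a GAN-master repository found online; I modified it to find the feature parameters of anime avatars. It is provided here for reference and study only.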

Generator:

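import torch.nn as nn

# `opt` is the options namespace defined elsewhere in the script (not shown here)
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Input size = noise dims + categorical dims + continuous code dims
        input_dim = opt.latent_dim + opt.n_classes + opt.code_dim

        self.init_size = opt.img_size // 4  # Initial size before upsampling
        self.l1 = nn.Sequential(nn.Linear(input_dim, 128 * self.init_size ** 2))

        # Note: the second positional argument of BatchNorm2d is eps,
        # so the 0.8 below sets eps, not momentum.
        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),  # init_size -> 2 * init_size
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.BatchNorm2d(128, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),  # 2 * init_size -> 4 * init_size
            nn.Conv2d(128, 64, 3, stride=1, padding=1),
            nn.BatchNorm2d(64, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, opt.channels, 3, stride=1, padding=1),
            nn.Tanh(),  # outputs in [-1, 1]
        )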
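The listing above omits the forward pass. The sketch below shows one plausible way the three input parts flow through such a generator; it is an assumption about the omitted code, not a copy of it. The `opt` values are hypothetical stand-ins for the script's options, and the conv stack is shortened to keep the example small.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical settings standing in for the script's `opt`
opt = SimpleNamespace(latent_dim=62, n_classes=10, code_dim=2, img_size=32, channels=1)


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        input_dim = opt.latent_dim + opt.n_classes + opt.code_dim
        self.init_size = opt.img_size // 4
        self.l1 = nn.Linear(input_dim, 128 * self.init_size ** 2)
        # Shortened stack: two 2x upsamplings, as in the listing above
        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, opt.channels, 3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, labels, code):
        # Concatenate z', the categorical part and the continuous part of c
        gen_input = torch.cat((noise, labels, code), dim=-1)
        out = self.l1(gen_input)
        # Reshape the flat vector into a 128-channel feature map
        out = out.view(out.shape[0], 128, self.init_size, self.init_size)
        return self.conv_blocks(out)


generator = Generator()
noise = torch.randn(4, opt.latent_dim)
labels = F.one_hot(torch.randint(0, opt.n_classes, (4,)), opt.n_classes).float()
code = torch.rand(4, opt.code_dim) * 2 - 1
imgs = generator(noise, labels, code)
print(imgs.shape)  # torch.Size([4, 1, 32, 32])
```

With `img_size=32`, the linear layer produces an 8x8 feature map, and the two upsampling steps bring it back to 32x32.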

Discriminator:

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, bn=True):
            """Returns layers of each discriminator block"""
            # Each block halves the spatial size (stride-2 convolution)
            block = [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True), nn.Dropout2d(0.25)]
            if bn:
                # As in the generator, 0.8 is passed positionally and therefore sets eps
                block.append(nn.BatchNorm2d(out_filters, 0.8))
            return block

        self.conv_blocks = nn.Sequential(
            *discriminator_block(opt.channels, 16, bn=False),
            *discriminator_block(16, 32),
            *discriminator_block(32, 64),
            *discriminator_block(64, 128),
        )

        # The height and width of downsampled image
        ds_size = opt.img_size // 2 ** 4

        # Output layers
        self.adv_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, 1))
        # dim=1 makes the softmax axis explicit (omitting dim is deprecated)
        self.aux_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, opt.n_classes), nn.Softmax(dim=1))
        self.latent_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, opt.code_dim))

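The first output layer (adv_layer) measures how well the generator fools the discriminator.
The second output layer (aux_layer) classifies the generated images into categories.
The third (latent_layer) presumably produces the code for c.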
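To make the three heads concrete, here is a reduced sketch of a discriminator whose forward pass returns all three outputs. The forward method is an assumption (it is not shown in the post), the `opt` values are hypothetical, and dropout and batch norm are left out for brevity.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn

# Hypothetical settings standing in for the script's `opt`
opt = SimpleNamespace(n_classes=10, code_dim=2, img_size=32, channels=1)


class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()

        def block(in_filters, out_filters):
            # Stride-2 convolution halves height and width
            return [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True)]

        self.conv_blocks = nn.Sequential(
            *block(opt.channels, 16), *block(16, 32), *block(32, 64), *block(64, 128)
        )
        ds_size = opt.img_size // 2 ** 4
        n_features = 128 * ds_size ** 2
        self.adv_layer = nn.Linear(n_features, 1)
        self.aux_layer = nn.Sequential(nn.Linear(n_features, opt.n_classes), nn.Softmax(dim=1))
        self.latent_layer = nn.Linear(n_features, opt.code_dim)

    def forward(self, img):
        out = self.conv_blocks(img)
        out = out.view(out.shape[0], -1)  # flatten the shared features
        validity = self.adv_layer(out)        # real/fake score
        label_probs = self.aux_layer(out)     # category probabilities
        latent_code = self.latent_layer(out)  # predicted continuous code
        return validity, label_probs, latent_code


discriminator = Discriminator()
fake_batch = torch.randn(4, opt.channels, opt.img_size, opt.img_size)
validity, label_probs, latent_code = discriminator(fake_batch)
print(validity.shape, label_probs.shape, latent_code.shape)
```

All three heads share the same convolutional features, so recovering c costs only two extra linear layers on top of an ordinary discriminator.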
Losses:

# Loss functions
adversarial_loss = torch.nn.MSELoss()
categorical_loss = torch.nn.CrossEntropyLoss()
continuous_loss = torch.nn.MSELoss()
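A rough sketch of how these three losses could be combined is shown below. The weights `lambda_cat` and `lambda_con` and the random stand-in predictions are hypothetical; the post does not show this step. Note that `CrossEntropyLoss` applies log-softmax internally and expects raw scores, so this sketch feeds it unnormalized predictions; with the `Softmax` in `aux_layer` above, softmax would effectively be applied twice, which may be worth revisiting.

```python
import torch

adversarial_loss = torch.nn.MSELoss()
categorical_loss = torch.nn.CrossEntropyLoss()
continuous_loss = torch.nn.MSELoss()

# Hypothetical weights for the two information terms
lambda_cat, lambda_con = 1.0, 0.1

batch_size, n_classes, code_dim = 4, 10, 2

# What was fed to the generator: class indices and continuous codes
gt_labels = torch.randint(0, n_classes, (batch_size,))
gt_code = torch.rand(batch_size, code_dim) * 2 - 1

# Random stand-ins for the discriminator's three outputs on generated images
pred_validity = torch.randn(batch_size, 1, requires_grad=True)
pred_label = torch.randn(batch_size, n_classes, requires_grad=True)  # raw scores
pred_code = torch.randn(batch_size, code_dim, requires_grad=True)

# Generator's adversarial term: generated images should be scored as real (1)
valid = torch.ones(batch_size, 1)
g_loss = adversarial_loss(pred_validity, valid)

# Information term: the heads should recover the c that was fed in
info_loss = (lambda_cat * categorical_loss(pred_label, gt_labels)
             + lambda_con * continuous_loss(pred_code, gt_code))
info_loss.backward()
```

In a full training loop the information term would be backpropagated through both the generator and the discriminator, since both have to cooperate for c to be recoverable.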

Loading the image data:

from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

data_dir = 'D:/data/faces/'
train_augs = transforms.Compose([
        transforms.Grayscale(1),  # convert three channels to one to reduce computation
        transforms.Resize(opt.img_size),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),  # map pixel values from [0, 1] to [-1, 1]
    ])
# ImageFolder automatically indexes the subfolders of data_dir and loads images from them
train_imgs = ImageFolder(data_dir, transform=train_augs)
dataloader = DataLoader(train_imgs, batch_size=128)

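After that comes the iterative training.
Below is the result after an afternoon of training. The faces are fairly clearly separated by features such as how far the face is turned to the side, hair length, eye size, and whether the head is tilted up or down.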

[figure: generated results]
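One common way to inspect what a code dimension has learned is to hold the noise and category fixed and sweep a single continuous code. The sketch below only builds the inputs for such a sweep; the sizes are hypothetical, and `generator` in the final comment refers to a trained model that is not defined here.

```python
import torch

# Hypothetical sizes, chosen for illustration only
latent_dim, n_classes, code_dim = 62, 10, 2
n_steps = 8

# One fixed noise vector, repeated for every step of the sweep
z = torch.randn(1, latent_dim).repeat(n_steps, 1)

# One fixed category (class 0) for every step
labels = torch.zeros(n_steps, n_classes)
labels[:, 0] = 1.0

# Sweep the first continuous code from -1 to 1 and hold the second at 0
code = torch.zeros(n_steps, code_dim)
code[:, 0] = torch.linspace(-1, 1, n_steps)

gen_input = torch.cat((z, labels, code), dim=-1)
print(gen_input.shape)  # torch.Size([8, 74])

# With a trained generator (not defined here), the sweep would be rendered as:
# imgs = generator(z, labels, code)
```

If a code dimension has captured a feature such as head tilt, the row of images from such a sweep should change smoothly along that feature while everything else stays roughly constant.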
