Deep Learning: Generative Adversarial Networks (3) DCGAN in Practice


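 In this section we build a generator of anime avatar images, following the DCGAN network architecture: the discriminator D is built from ordinary convolutional layers and the generator G from transposed convolutional layers. The network structure is shown in the figure below: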

Figure: DCGAN network architecture

1. Anime image dataset

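 This section uses a dataset of anime avatars [1][2] containing 51,223 images with no labels. The subject of each image has been cropped, aligned, and resized to 96×96. Some samples are shown in the figure below: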

Figure: Samples from the anime avatar dataset

[1] Dataset compiled from: https://github.com/chenyuntc/pytorch-book
[2] Dataset download reference: https://zhuanlan.zhihu.com/p/351083489

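 A custom dataset requires you to handle data loading and preprocessing yourself. Since the focus here is the GAN algorithm itself, and a later chapter on custom datasets covers loading your own data in detail, we simply call the prewritten make_anime_dataset function, which returns a ready-to-use dataset object. The code is as follows: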

import glob
import ssl

from tensorflow import keras
from Chapter13.dataset import make_anime_dataset

# Disable HTTPS certificate verification for any downloads
ssl._create_default_https_context = ssl._create_unverified_context
batch_size = 64  # batch size
# Dataset path
img_path = glob.glob(r'/Users/XXX/Documents/faces_test/*.jpg')
print('images num:', len(img_path))
# Build the dataset object; returns the tf.data.Dataset and the image shape
dataset, img_shape, _ = make_anime_dataset(img_path, batch_size, resize=64)
print(dataset, img_shape)


Here dataset is a tf.data.Dataset instance that has already been shuffled, preprocessed, and batched, and img_shape is the image size after preprocessing. The output is as follows:

<PrefetchDataset shapes: (64, 64, 64, 3), types: tf.float32> (64, 64, 3)
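The preprocessing inside make_anime_dataset scales pixels with `img / 127.5 - 1`, which matters later because the generator ends in tanh. As a minimal sketch in plain Python (the helper names `to_model_range` and `to_pixel` are illustrative and not part of the original code), the mapping and its inverse look like this. Rounding on the way back is a choice made here to guard against floating-point error.

```python
def to_model_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1], as the dataset preprocessing does."""
    return pixel / 127.5 - 1.0


def to_pixel(value):
    """Map a value in [-1, 1] back to an integer pixel in [0, 255]."""
    return int(round((value + 1.0) * 127.5))


# The two ends of the pixel range land exactly on -1 and 1.
print(to_model_range(0), to_model_range(255))
# A round trip through both helpers recovers the original pixel value.
print(to_pixel(to_model_range(200)))
```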

2. Generator

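 The generator G is a stack of five transposed convolutional layers that progressively enlarge the height and width of the feature map while reducing its channel count. The latent vector $\boldsymbol z$ of length 100 is first reshaped into a 4-D tensor of shape $[b,1,1,100]$ and then passed through the transposed convolutional layers in turn, which enlarge the height and width and reduce the number of channels, finally yielding a color image with height and width 64 and 3 channels. A BN layer is inserted between successive convolutional layers to improve training stability, and the convolutional layers use no bias vector. The generator class is implemented as follows: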

class Generator(keras.Model):
    # Generator network
    def __init__(self):
        super(Generator, self).__init__()
        filter = 64
        # Transposed conv layer 1: filter*8 output channels, kernel size 4, stride 1, no padding, no bias
        self.conv1 = layers.Conv2DTranspose(filter*8, 4, 1, 'valid', use_bias=False)
        self.bn1 = layers.BatchNormalization()
        # Transposed conv layer 2
        self.conv2 = layers.Conv2DTranspose(filter*4, 4, 2, 'same', use_bias=False)
        self.bn2 = layers.BatchNormalization()
        # Transposed conv layer 3
        self.conv3 = layers.Conv2DTranspose(filter*2, 4, 2, 'same', use_bias=False)
        self.bn3 = layers.BatchNormalization()
        # Transposed conv layer 4
        self.conv4 = layers.Conv2DTranspose(filter*1, 4, 2, 'same', use_bias=False)
        self.bn4 = layers.BatchNormalization()
        # Transposed conv layer 5
        self.conv5 = layers.Conv2DTranspose(3, 4, 2, 'same', use_bias=False)


 The forward pass of the generator G is implemented as follows:

def call(self, inputs, training=None):
    x = inputs  # [b, 100]
    # Reshape to a 4-D tensor for the transposed convolutions: (b, 1, 1, 100)
    x = tf.reshape(x, (x.shape[0], 1, 1, x.shape[1]))
    x = tf.nn.relu(x)  # activation function
    # Transposed conv, BN, activation: (b, 4, 4, 512)
    x = tf.nn.relu(self.bn1(self.conv1(x), training=training))
    # Transposed conv, BN, activation: (b, 8, 8, 256)
    x = tf.nn.relu(self.bn2(self.conv2(x), training=training))
    # Transposed conv, BN, activation: (b, 16, 16, 128)
    x = tf.nn.relu(self.bn3(self.conv3(x), training=training))
    # Transposed conv, BN, activation: (b, 32, 32, 64)
    x = tf.nn.relu(self.bn4(self.conv4(x), training=training))
    # Transposed conv, activation: (b, 64, 64, 3)
    x = self.conv5(x)
    x = tf.tanh(x)  # output in the range -1~1, matching the preprocessing

    return x


The generator outputs an image tensor of shape $[b,64,64,3]$ with values in the range $-1\sim1$.
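The shape comments in the forward pass can be checked with a little arithmetic. The following is a sketch in plain Python, assuming the usual Keras output-size rule for transposed convolutions without output padding ('same' gives `size * stride`, 'valid' gives `size * stride + max(kernel - stride, 0)`); the names `conv_transpose_out`, `GENERATOR_LAYERS`, and `trace_generator` are made up for this illustration.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output height/width of a transposed convolution (no output padding)."""
    if padding == 'same':
        return size * stride
    if padding == 'valid':
        return size * stride + max(kernel - stride, 0)
    raise ValueError('unknown padding: %r' % (padding,))


# (output channels, kernel size, stride, padding) for the five generator layers
GENERATOR_LAYERS = [
    (512, 4, 1, 'valid'),
    (256, 4, 2, 'same'),
    (128, 4, 2, 'same'),
    (64, 4, 2, 'same'),
    (3, 4, 2, 'same'),
]


def trace_generator(size=1):
    """Return the (height, width, channels) after each layer, starting from a 1x1 input."""
    shapes = []
    for channels, kernel, stride, padding in GENERATOR_LAYERS:
        size = conv_transpose_out(size, kernel, stride, padding)
        shapes.append((size, size, channels))
    return shapes


for shape in trace_generator():
    print(shape)
```

The trace runs 1 → 4 → 8 → 16 → 32 → 64 in height and width, which agrees with the comments in the listing.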


3. Discriminator

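 The discriminator D is the same as an ordinary classification network. It takes an image tensor of shape $[b,64,64,3]$ and extracts features through five successive convolutional layers, whose final output has shape $[b,2,2,1024]$. A GlobalAveragePooling2D layer then reduces the features to shape $[b,1024]$, and a final fully connected layer produces the output (a logit) for the binary classification task. The discriminator class is implemented as follows: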

class Discriminator(keras.Model):
    # Discriminator
    def __init__(self):
        super(Discriminator, self).__init__()
        filter = 64
        # Conv layer 1
        self.conv1 = layers.Conv2D(filter, 4, 2, 'valid', use_bias=False)
        self.bn1 = layers.BatchNormalization()
        # Conv layer 2
        self.conv2 = layers.Conv2D(filter*2, 4, 2, 'valid', use_bias=False)
        self.bn2 = layers.BatchNormalization()
        # Conv layer 3
        self.conv3 = layers.Conv2D(filter*4, 4, 2, 'valid', use_bias=False)
        self.bn3 = layers.BatchNormalization()
        # Conv layer 4
        self.conv4 = layers.Conv2D(filter*8, 3, 1, 'valid', use_bias=False)
        self.bn4 = layers.BatchNormalization()
        # Conv layer 5
        self.conv5 = layers.Conv2D(filter*16, 3, 1, 'valid', use_bias=False)
        self.bn5 = layers.BatchNormalization()
        # Global pooling layer
        self.pool = layers.GlobalAveragePooling2D()
        # Flatten the features
        self.flatten = layers.Flatten()
        # Fully connected layer for binary classification
        self.fc = layers.Dense(1)


The forward pass of the discriminator D is implemented as follows:

def call(self, inputs, training=None):
    # Conv, BN, activation: (4, 31, 31, 64)
    x = tf.nn.leaky_relu(self.bn1(self.conv1(inputs), training=training))
    # Conv, BN, activation: (4, 14, 14, 128)
    x = tf.nn.leaky_relu(self.bn2(self.conv2(x), training=training))
    # Conv, BN, activation: (4, 6, 6, 256)
    x = tf.nn.leaky_relu(self.bn3(self.conv3(x), training=training))
    # Conv, BN, activation: (4, 4, 4, 512)
    x = tf.nn.leaky_relu(self.bn4(self.conv4(x), training=training))
    # Conv, BN, activation: (4, 2, 2, 1024)
    x = tf.nn.leaky_relu(self.bn5(self.conv5(x), training=training))
    # Global average pooling: (4, 1024)
    x = self.pool(x)
    # Flatten (the pooled features are already 2-D, so the shape is unchanged)
    x = self.flatten(x)
    # Output: [b, 1024] => [b, 1]
    logits = self.fc(x)

    return logits
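The feature-map sizes in the comments above follow from the standard output-size rule for a convolution with 'valid' padding, `(size - kernel) // stride + 1`. A small sketch in plain Python (the names `conv_out`, `DISCRIMINATOR_LAYERS`, and `trace_discriminator` are illustrative only) reproduces them:

```python
def conv_out(size, kernel, stride):
    """Output height/width of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


# (output channels, kernel size, stride) for the five discriminator layers
DISCRIMINATOR_LAYERS = [
    (64, 4, 2),
    (128, 4, 2),
    (256, 4, 2),
    (512, 3, 1),
    (1024, 3, 1),
]


def trace_discriminator(size=64):
    """Return the (height, width, channels) after each layer for a size x size input."""
    shapes = []
    for channels, kernel, stride in DISCRIMINATOR_LAYERS:
        size = conv_out(size, kernel, stride)
        shapes.append((size, size, channels))
    return shapes


for shape in trace_discriminator():
    print(shape)
```

Starting from 64, the height and width shrink to 31, 14, 6, 4, and finally 2, after which global average pooling removes the two spatial dimensions.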


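The discriminator's output has shape $[b,1]$. The class does not apply a Sigmoid activation internally; passing the output through Sigmoid gives the probability that each of the $b$ samples is real.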
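To make the logit-to-probability step concrete, here is a minimal sketch in plain Python. The logit values are made up for illustration, and the `sigmoid` helper is written here rather than taken from the original code.

```python
import math


def sigmoid(logit):
    """Logistic function, written to avoid overflow for large negative inputs."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Hypothetical discriminator outputs for three samples
logits = [-2.0, 0.0, 3.0]
probs = [sigmoid(a) for a in logits]
print(probs)
```

A logit of 0 corresponds to a probability of exactly 0.5; negative logits mean the discriminator leans toward "generated", positive ones toward "real".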


4. Training and visualization

Discriminator

 According to:
$$\begin{aligned}\min_{\phi}\,\max_{\theta}\,\mathcal L(D,G)&=\mathbb E_{\boldsymbol x_r\sim p_r(\cdot)}\log D_\theta(\boldsymbol x_r)+\mathbb E_{\boldsymbol x_f\sim p_g(\cdot)}\log(1-D_\theta(\boldsymbol x_f))\\&=\mathbb E_{\boldsymbol x\sim p_r(\cdot)}\log D_\theta(\boldsymbol x)+\mathbb E_{\boldsymbol z\sim p_z(\cdot)}\log(1-D_\theta(G_\phi(\boldsymbol z)))\end{aligned}$$
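The discriminator's training objective is to maximize $\mathcal L(D,G)$, so that the predicted probability of being real approaches 1 for real samples and 0 for generated samples. The discriminator's loss is implemented in the d_loss_fn function: all real samples are labeled 1, all generated samples are labeled 0, and maximizing $\mathcal L(D,G)$ is achieved by minimizing the corresponding cross-entropy loss. d_loss_fn is implemented as follows: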

def d_loss_fn(generator, discriminator, batch_z, batch_x, is_training):
    # Compute the discriminator loss
    # Generate images from the sampled latent vectors
    fake_image = generator(batch_z, is_training)
    # Score the generated images
    d_fake_logits = discriminator(fake_image, is_training)
    # Score the real images
    d_real_logits = discriminator(batch_x, is_training)
    # Loss between real images and label 1
    d_loss_real = celoss_ones(d_real_logits)
    # Loss between generated images and label 0
    d_loss_fake = celoss_zeros(d_fake_logits)
    # Combine the two losses
    loss = d_loss_fake + d_loss_real

    return loss


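The celoss_ones function computes the cross-entropy loss between the current predictions (given as logits) and the label 1. The code is as follows: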

def celoss_ones(logits):
    # Cross-entropy against the label 1
    y = tf.ones_like(logits)
    loss = keras.losses.binary_crossentropy(y, logits, from_logits=True)
    return tf.reduce_mean(loss)


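The celoss_zeros function computes the cross-entropy loss between the current predictions (given as logits) and the label 0. The code is as follows: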

def celoss_zeros(logits):
    # Cross-entropy against the label 0
    y = tf.zeros_like(logits)
    loss = keras.losses.binary_crossentropy(y, logits, from_logits=True)
    return tf.reduce_mean(loss)
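Why minimizing these two cross-entropy terms maximizes $\mathcal L(D,G)$ can be checked on a single pair of samples. The sketch below is plain Python rather than TensorFlow; the logit values are made up, and `sigmoid` and `bce_with_logits` are illustrative helpers that follow the textbook definition of binary cross-entropy on logits, not the Keras implementation.

```python
import math


def sigmoid(logit):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-logit))


def bce_with_logits(label, logit):
    """Binary cross-entropy for one sample, computed from the logit in a stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Hypothetical discriminator logits for one real and one generated image
real_logit, fake_logit = 1.2, -0.7

# What d_loss_fn adds up: real vs. label 1, fake vs. label 0
d_loss = bce_with_logits(1.0, real_logit) + bce_with_logits(0.0, fake_logit)

# The per-sample value of L(D, G): log D(x_r) + log(1 - D(x_f))
objective = math.log(sigmoid(real_logit)) + math.log(1.0 - sigmoid(fake_logit))

# The two agree up to sign, so minimizing d_loss maximizes the objective.
print(d_loss, -objective)
```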

Generator

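 The generator's training objective is to minimize $\mathcal L(D,G)$. Since the real samples do not depend on the generator, the loss only needs to minimize the term $\mathbb E_{\boldsymbol z\sim p_z(\cdot)}\log(1-D_\theta(G_\phi(\boldsymbol z)))$. In practice this is done by labeling the generated samples as 1 and minimizing the resulting cross-entropy, that is, minimizing $-\log D_\theta(G_\phi(\boldsymbol z))$, which pushes $D_\theta(G_\phi(\boldsymbol z))$ toward 1 just as minimizing the original term does. Note that during backpropagation the discriminator also takes part in building the computation graph, but at this stage only the generator's parameters are updated, not the discriminator's. The generator's loss function is implemented as follows: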

def g_loss_fn(generator, discriminator, batch_z, is_training):
    # Generate images from the sampled latent vectors
    fake_image = generator(batch_z, is_training)
    # When training the generator, push the generated images to be judged as real
    d_fake_logits = discriminator(fake_image, is_training)
    # Loss between generated images and label 1
    loss = celoss_ones(d_fake_logits)

    return loss
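The cross-entropy against label 1 is not literally the term $\log(1-D_\theta(G_\phi(\boldsymbol z)))$, and the difference shows up in the gradients. As a rough sketch in plain Python (the function names are illustrative, and the derivatives are taken with respect to the discriminator logit $a$ of a generated image), both losses push the logit upward, but they differ in strength when the discriminator confidently rejects the sample:

```python
import math


def sigmoid(logit):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-logit))


def grad_saturating(logit):
    """d/da of log(1 - sigmoid(a)), the generator term of L(D, G)."""
    return -sigmoid(logit)


def grad_non_saturating(logit):
    """d/da of -log(sigmoid(a)), the cross-entropy against label 1."""
    return sigmoid(logit) - 1.0


# From "confidently fake" (a = -5) to "confidently real" (a = 5)
for a in (-5.0, 0.0, 5.0):
    print(a, grad_saturating(a), grad_non_saturating(a))
```

At a strongly negative logit the first gradient is close to 0 while the second is close to -1, so the label-1 cross-entropy still gives the generator a useful signal early in training, when its images are easy to reject.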

Training the networks

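 In each epoch (here one epoch means one training step on one batch), we first sample latent vectors from the prior distribution $p_z(\cdot)$ and real images from the dataset, compute the discriminator loss using both the generator and the discriminator, and optimize the discriminator parameters $\theta$. Training the generator relies on the discriminator to compute the loss, but only the generator's gradients are computed and only $\phi$ is updated. The setting described here trains the discriminator $k=5$ times for every generator update; note that the training listing uses range(1), i.e. $k=1$, so adjust the loop bound to change $k$.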

 First create the generator and discriminator networks, and an optimizer for each. The code is as follows:

generator = Generator()  # create the generator
generator.build(input_shape=(4, z_dim))
discriminator = Discriminator()  # create the discriminator
discriminator.build(input_shape=(4, 64, 64, 3))
# Create separate optimizers for the generator and the discriminator
g_optimizer = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.5)
d_optimizer = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.5)


 The main training loop is implemented as follows:

for epoch in range(epochs):  # run `epochs` training steps
    # 1. Train the discriminator
    for _ in range(1):
        # Sample latent vectors
        batch_z = tf.random.normal([batch_size, z_dim])
        batch_x = next(db_iter)  # sample real images
        # Discriminator forward pass
        with tf.GradientTape() as tape:
            d_loss = d_loss_fn(generator, discriminator, batch_z, batch_x, is_training)
        grads = tape.gradient(d_loss, discriminator.trainable_variables)
        d_optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))
    # 2. Train the generator
    # Sample latent vectors
    batch_z = tf.random.normal([batch_size, z_dim])
    batch_x = next(db_iter)  # sample real images (not used by the generator loss)
    # Generator forward pass
    with tf.GradientTape() as tape:
        g_loss = g_loss_fn(generator, discriminator, batch_z, is_training)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_optimizer.apply_gradients(zip(grads, generator.trainable_variables))
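The alternation between discriminator and generator updates can be isolated from the TensorFlow details. The following plain-Python sketch shows only the schedule; `train`, `d_step`, and `g_step` are hypothetical names, with the two callbacks standing in for the GradientTape update blocks in the listing above.

```python
def train(steps, k, d_step, g_step):
    """Run k discriminator updates followed by one generator update, `steps` times."""
    for _ in range(steps):
        for _ in range(k):
            d_step()
        g_step()


# Record the order of updates instead of doing real gradient steps
log = []
train(steps=2, k=5,
      d_step=lambda: log.append('D'),
      g_step=lambda: log.append('G'))
print(''.join(log))  # prints DDDDDGDDDDDG
```

With k=1 the schedule reduces to strictly alternating updates, which is what a loop bound of range(1) gives.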


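Every 100 epochs we run an image generation test: latent vectors are sampled from the prior distribution, fed to the generator, and the generated images are saved to a file.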
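Saving a batch as one picture amounts to tiling the images into a grid. The sketch below is one way to do this with NumPy reshapes, offered as an alternative illustration rather than the routine used in the complete code; `make_grid` and `to_uint8` are names invented here, and random values stand in for generator output.

```python
import numpy as np


def to_uint8(images):
    """Map values in [-1, 1] to uint8 pixels in [0, 255]."""
    return np.clip((images + 1.0) * 127.5, 0, 255).astype(np.uint8)


def make_grid(images, cols):
    """Tile images of shape (n, h, w, c) into one (rows*h, cols*w, c) array, row by row."""
    n, h, w, c = images.shape
    if n % cols != 0:
        raise ValueError('number of images must be a multiple of cols')
    rows = n // cols
    grid = images.reshape(rows, cols, h, w, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, h, cols, w, c)
    return grid.reshape(rows * h, cols * w, c)


# Stand-in for generator output: 100 images in the tanh range
fake = np.random.uniform(-1.0, 1.0, size=(100, 64, 64, 3)).astype(np.float32)
grid = make_grid(to_uint8(fake), cols=10)
print(grid.shape, grid.dtype)
```

The resulting array can be passed to an image library for saving; image number b ends up in grid row `b // cols` and grid column `b % cols`.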

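 The figure below shows samples saved by the DCGAN model during training. Most images have a clear subject, realistic colors, and reasonable diversity, and they come fairly close to the real images in the dataset. A small number of generated images are still corrupted, with no subject recognizable to the human eye.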

Figure: Images generated by DCGAN

5. Complete code

dataset

import multiprocessing

import tensorflow as tf


def make_anime_dataset(img_paths, batch_size, resize=64, drop_remainder=True, shuffle=True, repeat=1):

    # @tf.function
    def _map_fn(img):
        img = tf.image.resize(img, [resize, resize])
        # img = tf.image.random_crop(img,[resize, resize])
        # img = tf.image.random_flip_left_right(img)
        # img = tf.image.random_flip_up_down(img)
        img = tf.clip_by_value(img, 0, 255)
        img = img / 127.5 - 1  # -1~1
        return img

    dataset = disk_image_batch_dataset(img_paths,
                                          batch_size,
                                          drop_remainder=drop_remainder,
                                          map_fn=_map_fn,
                                          shuffle=shuffle,
                                          repeat=repeat)
    img_shape = (resize, resize, 3)
    len_dataset = len(img_paths) // batch_size

    return dataset, img_shape, len_dataset


def batch_dataset(dataset,
                  batch_size,
                  drop_remainder=True,
                  n_prefetch_batch=1,
                  filter_fn=None,
                  map_fn=None,
                  n_map_threads=None,
                  filter_after_map=False,
                  shuffle=True,
                  shuffle_buffer_size=None,
                  repeat=None):
    # set defaults
    if n_map_threads is None:
        n_map_threads = multiprocessing.cpu_count()
    if shuffle and shuffle_buffer_size is None:
        shuffle_buffer_size = max(batch_size * 128, 2048)  # set the minimum buffer size as 2048

    # [*] it is efficient to conduct `shuffle` before `map`/`filter` because `map`/`filter` is sometimes costly
    if shuffle:
        dataset = dataset.shuffle(shuffle_buffer_size)

    if not filter_after_map:
        if filter_fn:
            dataset = dataset.filter(filter_fn)

        if map_fn:
            dataset = dataset.map(map_fn, num_parallel_calls=n_map_threads)

    else:  # [*] this is slower
        if map_fn:
            dataset = dataset.map(map_fn, num_parallel_calls=n_map_threads)

        if filter_fn:
            dataset = dataset.filter(filter_fn)

    dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)

    dataset = dataset.repeat(repeat).prefetch(n_prefetch_batch)

    return dataset


def memory_data_batch_dataset(memory_data,
                              batch_size,
                              drop_remainder=True,
                              n_prefetch_batch=1,
                              filter_fn=None,
                              map_fn=None,
                              n_map_threads=None,
                              filter_after_map=False,
                              shuffle=True,
                              shuffle_buffer_size=None,
                              repeat=None):
    """Batch dataset of memory data.

    Parameters
    ----------
    memory_data : nested structure of tensors/ndarrays/lists

    """
    dataset = tf.data.Dataset.from_tensor_slices(memory_data)
    dataset = batch_dataset(dataset,
                            batch_size,
                            drop_remainder=drop_remainder,
                            n_prefetch_batch=n_prefetch_batch,
                            filter_fn=filter_fn,
                            map_fn=map_fn,
                            n_map_threads=n_map_threads,
                            filter_after_map=filter_after_map,
                            shuffle=shuffle,
                            shuffle_buffer_size=shuffle_buffer_size,
                            repeat=repeat)
    return dataset


def disk_image_batch_dataset(img_paths,
                             batch_size,
                             labels=None,
                             drop_remainder=True,
                             n_prefetch_batch=1,
                             filter_fn=None,
                             map_fn=None,
                             n_map_threads=None,
                             filter_after_map=False,
                             shuffle=True,
                             shuffle_buffer_size=None,
                             repeat=None):
    """Batch dataset of disk image for PNG and JPEG.

    Parameters
    ----------
        img_paths : 1d-tensor/ndarray/list of str
        labels : nested structure of tensors/ndarrays/lists

    """
    if labels is None:
        memory_data = img_paths
    else:
        memory_data = (img_paths, labels)

    def parse_fn(path, *label):
        img = tf.io.read_file(path)
        img = tf.image.decode_jpeg(img, channels=3)  # fix channels to 3
        return (img,) + label

    if map_fn:  # fuse `map_fn` and `parse_fn`
        def map_fn_(*args):
            return map_fn(*parse_fn(*args))
    else:
        map_fn_ = parse_fn

    dataset = memory_data_batch_dataset(memory_data,
                                        batch_size,
                                        drop_remainder=drop_remainder,
                                        n_prefetch_batch=n_prefetch_batch,
                                        filter_fn=filter_fn,
                                        map_fn=map_fn_,
                                        n_map_threads=n_map_threads,
                                        filter_after_map=filter_after_map,
                                        shuffle=shuffle,
                                        shuffle_buffer_size=shuffle_buffer_size,
                                        repeat=repeat)

    return dataset

GAN

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class Generator(keras.Model):
    # Generator network
    def __init__(self):
        super(Generator, self).__init__()
        filter = 64
        # Transposed conv layer 1: filter*8 output channels, kernel size 4, stride 1, no padding, no bias
        self.conv1 = layers.Conv2DTranspose(filter*8, 4, 1, 'valid', use_bias=False)
        self.bn1 = layers.BatchNormalization()
        # Transposed conv layer 2
        self.conv2 = layers.Conv2DTranspose(filter*4, 4, 2, 'same', use_bias=False)
        self.bn2 = layers.BatchNormalization()
        # Transposed conv layer 3
        self.conv3 = layers.Conv2DTranspose(filter*2, 4, 2, 'same', use_bias=False)
        self.bn3 = layers.BatchNormalization()
        # Transposed conv layer 4
        self.conv4 = layers.Conv2DTranspose(filter*1, 4, 2, 'same', use_bias=False)
        self.bn4 = layers.BatchNormalization()
        # Transposed conv layer 5
        self.conv5 = layers.Conv2DTranspose(3, 4, 2, 'same', use_bias=False)

    def call(self, inputs, training=None):
        x = inputs  # [b, 100]
        # Reshape to a 4-D tensor for the transposed convolutions: (b, 1, 1, 100)
        x = tf.reshape(x, (x.shape[0], 1, 1, x.shape[1]))
        x = tf.nn.relu(x)  # activation function
        # Transposed conv, BN, activation: (b, 4, 4, 512)
        x = tf.nn.relu(self.bn1(self.conv1(x), training=training))
        # Transposed conv, BN, activation: (b, 8, 8, 256)
        x = tf.nn.relu(self.bn2(self.conv2(x), training=training))
        # Transposed conv, BN, activation: (b, 16, 16, 128)
        x = tf.nn.relu(self.bn3(self.conv3(x), training=training))
        # Transposed conv, BN, activation: (b, 32, 32, 64)
        x = tf.nn.relu(self.bn4(self.conv4(x), training=training))
        # Transposed conv, activation: (b, 64, 64, 3)
        x = self.conv5(x)
        x = tf.tanh(x)  # output in the range -1~1, matching the preprocessing

        return x


class Discriminator(keras.Model):
    # Discriminator
    def __init__(self):
        super(Discriminator, self).__init__()
        filter = 64
        # Conv layer 1
        self.conv1 = layers.Conv2D(filter, 4, 2, 'valid', use_bias=False)
        self.bn1 = layers.BatchNormalization()
        # Conv layer 2
        self.conv2 = layers.Conv2D(filter*2, 4, 2, 'valid', use_bias=False)
        self.bn2 = layers.BatchNormalization()
        # Conv layer 3
        self.conv3 = layers.Conv2D(filter*4, 4, 2, 'valid', use_bias=False)
        self.bn3 = layers.BatchNormalization()
        # Conv layer 4
        self.conv4 = layers.Conv2D(filter*8, 3, 1, 'valid', use_bias=False)
        self.bn4 = layers.BatchNormalization()
        # Conv layer 5
        self.conv5 = layers.Conv2D(filter*16, 3, 1, 'valid', use_bias=False)
        self.bn5 = layers.BatchNormalization()
        # Global pooling layer
        self.pool = layers.GlobalAveragePooling2D()
        # Flatten the features
        self.flatten = layers.Flatten()
        # Fully connected layer for binary classification
        self.fc = layers.Dense(1)

    def call(self, inputs, training=None):
        # Conv, BN, activation: (4, 31, 31, 64)
        x = tf.nn.leaky_relu(self.bn1(self.conv1(inputs), training=training))
        # Conv, BN, activation: (4, 14, 14, 128)
        x = tf.nn.leaky_relu(self.bn2(self.conv2(x), training=training))
        # Conv, BN, activation: (4, 6, 6, 256)
        x = tf.nn.leaky_relu(self.bn3(self.conv3(x), training=training))
        # Conv, BN, activation: (4, 4, 4, 512)
        x = tf.nn.leaky_relu(self.bn4(self.conv4(x), training=training))
        # Conv, BN, activation: (4, 2, 2, 1024)
        x = tf.nn.leaky_relu(self.bn5(self.conv5(x), training=training))
        # Global average pooling: (4, 1024)
        x = self.pool(x)
        # Flatten (the pooled features are already 2-D, so the shape is unchanged)
        x = self.flatten(x)
        # Output: [b, 1024] => [b, 1]
        logits = self.fc(x)

        return logits


def main():

    d = Discriminator()
    g = Generator()

    x = tf.random.normal([2, 64, 64, 3])
    z = tf.random.normal([2, 100])

    prob = d(x)
    print(prob)
    x_hat = g(z)
    print(x_hat.shape)


if __name__ == '__main__':
    main()

GAN_train

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
# from scipy.misc import toimage
from PIL import Image
import glob
from Chapter13.GAN import Generator, Discriminator
from Chapter13.dataset import make_anime_dataset


def save_result(val_out, val_block_size, image_path, color_mode):
    def preprocess(img):
        img = ((img + 1.0) * 127.5).astype(np.uint8)
        # img = img.astype(np.uint8)
        return img

    preprocesed = preprocess(val_out)
    final_image = np.array([])
    single_row = np.array([])
    for b in range(val_out.shape[0]):
        # concat image into a row
        if single_row.size == 0:
            single_row = preprocesed[b, :, :, :]
        else:
            single_row = np.concatenate((single_row, preprocesed[b, :, :, :]), axis=1)

        # concat image row to final_image
        if (b + 1) % val_block_size == 0:
            if final_image.size == 0:
                final_image = single_row
            else:
                final_image = np.concatenate((final_image, single_row), axis=0)

            # reset single row
            single_row = np.array([])

    if final_image.shape[2] == 1:
        final_image = np.squeeze(final_image, axis=2)
    Image.fromarray(final_image).save(image_path)


def celoss_ones(logits):
    # Cross-entropy against the label 1
    y = tf.ones_like(logits)
    loss = keras.losses.binary_crossentropy(y, logits, from_logits=True)
    return tf.reduce_mean(loss)


def celoss_zeros(logits):
    # Cross-entropy against the label 0
    y = tf.zeros_like(logits)
    loss = keras.losses.binary_crossentropy(y, logits, from_logits=True)
    return tf.reduce_mean(loss)


def d_loss_fn(generator, discriminator, batch_z, batch_x, is_training):
    # Compute the discriminator loss
    # Generate images from the sampled latent vectors
    fake_image = generator(batch_z, is_training)
    # Score the generated images
    d_fake_logits = discriminator(fake_image, is_training)
    # Score the real images
    d_real_logits = discriminator(batch_x, is_training)
    # Loss between real images and label 1
    d_loss_real = celoss_ones(d_real_logits)
    # Loss between generated images and label 0
    d_loss_fake = celoss_zeros(d_fake_logits)
    # Combine the two losses
    loss = d_loss_fake + d_loss_real

    return loss


def g_loss_fn(generator, discriminator, batch_z, is_training):
    # Generate images from the sampled latent vectors
    fake_image = generator(batch_z, is_training)
    # When training the generator, push the generated images to be judged as real
    d_fake_logits = discriminator(fake_image, is_training)
    # Loss between generated images and label 1
    loss = celoss_ones(d_fake_logits)

    return loss


def main():
    tf.random.set_seed(3333)
    np.random.seed(3333)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    z_dim = 100  # length of the latent vector z
    epochs = 3000000  # number of training steps
    batch_size = 64  # batch size
    learning_rate = 0.0002
    is_training = True

    # Locate the dataset
    # C:\Users\z390\Downloads\anime-faces
    # r'C:\Users\z390\Downloads\faces\*.jpg'
    # img_path = glob.glob(r'/Users/XXX/Documents/faces_test\*\*.jpg') + \
    #            glob.glob(r'/Users/XXX/Documents/faces_test\*\*.png')
    # Dataset path
    img_path = glob.glob(r'/Users/XXX/Documents/faces_test/*.jpg')
    # img_path = glob.glob(r'C:\Users\z390\Downloads\getchu_aligned_with_label\GetChu_aligned2\*.jpg')
    # img_path.extend(img_path2)
    print('images num:', len(img_path))
    # Build the dataset object; returns the tf.data.Dataset and the image shape
    dataset, img_shape, _ = make_anime_dataset(img_path, batch_size, resize=64)
    print(dataset, img_shape)
    sample = next(iter(dataset))  # draw one batch
    print(sample.shape, tf.reduce_max(sample).numpy(),
          tf.reduce_min(sample).numpy())
    dataset = dataset.repeat()  # repeat indefinitely so the iterator never runs out
    db_iter = iter(dataset)

    generator = Generator()  # create the generator
    generator.build(input_shape=(4, z_dim))
    discriminator = Discriminator()  # create the discriminator
    discriminator.build(input_shape=(4, 64, 64, 3))
    # Create separate optimizers for the generator and the discriminator
    g_optimizer = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.5)
    d_optimizer = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.5)

    # generator.load_weights('generator.ckpt')
    # discriminator.load_weights('discriminator.ckpt')
    # print('Loaded chpt!!')

    # Make sure the output directory for sample images exists
    os.makedirs('GAN_images_test', exist_ok=True)

    d_losses, g_losses = [], []
    for epoch in range(epochs):  # run `epochs` training steps
        # 1. Train the discriminator
        for _ in range(1):
            # Sample latent vectors
            batch_z = tf.random.normal([batch_size, z_dim])
            batch_x = next(db_iter)  # sample real images
            # Discriminator forward pass
            with tf.GradientTape() as tape:
                d_loss = d_loss_fn(generator, discriminator, batch_z, batch_x, is_training)
            grads = tape.gradient(d_loss, discriminator.trainable_variables)
            d_optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))
        # 2. Train the generator
        # Sample latent vectors
        batch_z = tf.random.normal([batch_size, z_dim])
        batch_x = next(db_iter)  # sample real images (not used by the generator loss)
        # Generator forward pass
        with tf.GradientTape() as tape:
            g_loss = g_loss_fn(generator, discriminator, batch_z, is_training)
        grads = tape.gradient(g_loss, generator.trainable_variables)
        g_optimizer.apply_gradients(zip(grads, generator.trainable_variables))

        if epoch % 100 == 0:
            print(epoch, 'd-loss:', float(d_loss), 'g-loss:', float(g_loss))
            # Visualization: generate 100 images and save them as a 10x10 grid
            z = tf.random.normal([100, z_dim])
            fake_image = generator(z, training=False)
            out_path = os.path.join('GAN_images_test', 'gan-%d.png' % epoch)
            save_result(fake_image.numpy(), 10, out_path, color_mode='P')

            d_losses.append(float(d_loss))
            g_losses.append(float(g_loss))

            # Save checkpoints every 10000 steps
            if epoch % 10000 == 0:
                # print(d_losses)
                # print(g_losses)
                generator.save_weights('generator.ckpt')
                discriminator.save_weights('discriminator.ckpt')


if __name__ == '__main__':
    main()