Deep Learning with Python, Chapter 12: Generative Deep Learning
Overview
Chapter 12 explores generative deep learning: text generation, DeepDream, neural style transfer, variational autoencoders (VAEs), and generative adversarial networks (GANs). It shows how these techniques can be applied to creative work, from generating new text to transferring image styles and synthesizing entirely new images.
Main Topics
- Text generation
  - Sequence generation: use a recurrent neural network (RNN) or a Transformer to predict the next word or character in a sequence.
  - Sampling strategy: control the randomness of the generated text by adjusting the softmax temperature (see the sketch after this list).
  - Language models: train a model to predict the next token in a sequence, then generate new text by repeatedly sampling from its predictions.
- DeepDream
  - How it works: run a convolutional neural network "in reverse", maximizing the activations of chosen layers to produce dream-like, artistic images.
  - Implementation: start from a pretrained InceptionV3 model and apply gradient ascent to the input image to maximize those activations.
- Neural style transfer
  - Content and style losses: extract content and style features of images with a pretrained convolutional network (such as VGG19).
  - Optimization: minimize the combined content and style losses via gradient descent, producing a new image that keeps the target image's content in the style of the reference image.
- Variational autoencoders (VAEs)
  - How they work: an encoder maps an input image to the parameters of a distribution over a latent space; a decoder reconstructs the image from that space.
  - Sampling layer: new images are generated by sampling points from the latent space.
  - Applications: image generation and editing, for example generating MNIST handwritten digits.
- Generative adversarial networks (GANs)
  - How they work: a generator produces images while a discriminator tries to tell real images from generated ones.
  - Training: adversarial training gradually pushes the generator's output distribution toward the distribution of real images.
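To make the softmax-temperature point in the text-generation item concrete, here is a minimal sketch of distribution reweighting in the spirit of the chapter's reweight_distribution: divide the log-probabilities by a temperature and renormalize. A higher temperature flattens the distribution (more surprising samples); a lower one sharpens it (more predictable samples).

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution: 1D array of probabilities summing to 1.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    # Renormalize so the reweighted probabilities sum to 1 again.
    return distribution / np.sum(distribution)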
Key Code and Algorithms
1.1 Text-Generation Callback
# Shared imports assumed by all code listings in this section.
import tensorflow as tf
from tensorflow import keras

class TextGenerator(keras.callbacks.Callback):
    def __init__(self, prompt, generate_length, model_input_length,
                 temperatures=(1.,), print_freq=1):
        self.prompt = prompt                        # Seed text to start generation from
        self.generate_length = generate_length      # Number of tokens to generate
        self.model_input_length = model_input_length
        self.temperatures = temperatures            # Sampling temperatures to try
        self.print_freq = print_freq

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.print_freq != 0:
            return
        for temperature in self.temperatures:
            print(f"== Generating with temperature {temperature}")
            sentence = self.prompt
            for i in range(self.generate_length):
                # text_vectorization and tokens_index are defined elsewhere
                # in the chapter (see the sketch after this listing).
                tokenized_sentence = text_vectorization([sentence])
                predictions = self.model(tokenized_sentence)
                # Pass the current temperature so the outer loop takes effect.
                next_token = sample_next(predictions[0, i, :], temperature)
                sampled_token = tokens_index[next_token]
                sentence += " " + sampled_token
            print(sentence)
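The callback above depends on helpers defined elsewhere in the chapter: a fitted text_vectorization layer, a tokens_index mapping from token IDs back to words (e.g., built from text_vectorization.get_vocabulary()), and a sample_next function. A plausible sketch of the sampler, assuming predictions is a softmax probability vector:

import numpy as np

def sample_next(predictions, temperature=1.0):
    # Reweight the distribution by the temperature, then renormalize.
    predictions = np.asarray(predictions).astype("float64")
    predictions = np.log(predictions) / temperature
    exp_preds = np.exp(predictions)
    predictions = exp_preds / np.sum(exp_preds)
    # Draw a single token index from the reweighted distribution.
    probas = np.random.multinomial(1, predictions, 1)
    return np.argmax(probas)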
1.2 DeepDream Implementation
def gradient_ascent_step(image, learning_rate):
    with tf.GradientTape() as tape:
        tape.watch(image)                   # image is a plain tensor, so watch it explicitly
        loss = compute_loss(image)          # How strongly the chosen layers activate
    grads = tape.gradient(loss, image)
    grads = tf.math.l2_normalize(grads)     # Normalize gradients for stable steps
    image += learning_rate * grads          # Gradient *ascent*: increase the loss
    return loss, image

def gradient_ascent_loop(image, iterations, learning_rate, max_loss=None):
    for i in range(iterations):
        loss, image = gradient_ascent_step(image, learning_rate)
        if max_loss is not None and loss > max_loss:
            break                           # Stop before artifacts get too strong
        print(f"... Loss value at step {i}: {float(loss):.2f}")
    return image
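The compute_loss called by gradient_ascent_step is defined separately in the chapter. A minimal sketch, assuming feature_extractor is a Keras model that returns a dict of InceptionV3 layer activations and layer_settings maps those layer names to contribution weights:

def compute_loss(input_image):
    features = feature_extractor(input_image)
    losses = []
    for name in features.keys():
        coeff = layer_settings[name]        # Weight of this layer's contribution
        activation = features[name]
        # Exclude border pixels to avoid edge artifacts in the loss.
        loss = coeff * tf.reduce_mean(tf.square(activation[:, 2:-2, 2:-2, :]))
        losses.append(loss)
    return tf.reduce_sum(losses)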
1.3 Neural Style Transfer
def compute_loss(combination_image, base_image, style_reference_image):
    # Run base, style, and combination images through the feature extractor
    # in one batch: index 0 = base, 1 = style reference, 2 = combination.
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], axis=0)
    features = feature_extractor(input_tensor)
    loss = tf.zeros(shape=())
    # Content loss: compare base and combination features at one high-level layer.
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(
        base_image_features, combination_features)
    # Style loss: compare style-reference and combination features across layers.
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        style_loss_value = style_loss(style_reference_features, combination_features)
        loss += (style_weight / len(style_layer_names)) * style_loss_value
    # Total variation loss keeps the generated image spatially smooth.
    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss
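The three component losses used above are defined separately in the chapter. Sketches of the standard formulations, assuming img_height and img_width globals hold the working image size: content loss is a squared feature distance, style loss compares Gram matrices (channel-wise feature correlations), and total variation loss penalizes differences between neighboring pixels to keep the result smooth.

def gram_matrix(x):
    # Correlations between feature channels of a (height, width, channels) map.
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    return tf.matmul(features, tf.transpose(features))

def content_loss(base_img, combination_img):
    return tf.reduce_sum(tf.square(combination_img - base_img))

def style_loss(style_img, combination_img):
    S = gram_matrix(style_img)
    C = gram_matrix(combination_img)
    channels = 3
    size = img_height * img_width
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

def total_variation_loss(x):
    a = tf.square(x[:, : img_height - 1, : img_width - 1, :] -
                  x[:, 1:, : img_width - 1, :])
    b = tf.square(x[:, : img_height - 1, : img_width - 1, :] -
                  x[:, : img_height - 1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))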
1.4 VAE Implementation
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.sampler = Sampler()   # Draws z from the predicted latent distribution
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        # Listing the trackers here lets Keras reset them at each epoch.
        return [self.total_loss_tracker,
                self.reconstruction_loss_tracker,
                self.kl_loss_tracker]

    def train_step(self, data):
        with tf.GradientTape() as tape:
            # The encoder maps inputs to the parameters of a latent distribution.
            z_mean, z_log_var = self.encoder(data)
            z = self.sampler(z_mean, z_log_var)
            reconstruction = self.decoder(z)
            # Reconstruction loss: summed over spatial dims, averaged over the batch.
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(
                    keras.losses.binary_crossentropy(data, reconstruction),
                    axis=(1, 2)
                )
            )
            # KL divergence regularizes the latent space toward a standard normal.
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            total_loss = reconstruction_loss + tf.reduce_mean(kl_loss)
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "total_loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }
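The Sampler used in __init__ implements the reparameterization trick: z is computed as z_mean + exp(0.5 * z_log_var) * epsilon with epsilon drawn from a standard normal, so the sampling step stays differentiable with respect to the encoder's outputs. A minimal sketch:

class Sampler(keras.layers.Layer):
    def call(self, z_mean, z_log_var):
        batch_size = tf.shape(z_mean)[0]
        z_size = tf.shape(z_mean)[1]
        # The randomness lives in epsilon, outside the learned parameters,
        # so gradients can flow through z_mean and z_log_var.
        epsilon = tf.random.normal(shape=(batch_size, z_size))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon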
1.5 GAN Implementation
class GAN(keras.Model):
    def __init__(self, discriminator, generator, latent_dim):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        self.d_loss_metric = keras.metrics.Mean(name="d_loss")
        self.g_loss_metric = keras.metrics.Mean(name="g_loss")

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super().compile()
        self.d_optimizer = d_optimizer   # Separate optimizers for the two networks
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]

    def train_step(self, real_images):
        # --- Train the discriminator ---
        batch_size = tf.shape(real_images)[0]
        random_latent_vectors = tf.random.normal(
            shape=(batch_size, self.latent_dim))
        generated_images = self.generator(random_latent_vectors)
        combined_images = tf.concat([generated_images, real_images], axis=0)
        # Label generated images 1 and real images 0.
        labels = tf.concat(
            [tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))], axis=0)
        # Add random noise to the labels (an important stabilization trick).
        labels += 0.05 * tf.random.uniform(tf.shape(labels))
        with tf.GradientTape() as tape:
            predictions = self.discriminator(combined_images)
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(
            zip(grads, self.discriminator.trainable_weights))

        # --- Train the generator: gradients are taken only with respect to
        # the generator's weights, so the discriminator is not updated here. ---
        random_latent_vectors = tf.random.normal(
            shape=(batch_size, self.latent_dim))
        # "Misleading" labels claim all generated images are real.
        misleading_labels = tf.zeros((batch_size, 1))
        with tf.GradientTape() as tape:
            predictions = self.discriminator(
                self.generator(random_latent_vectors))
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(
            zip(grads, self.generator.trainable_weights))

        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)
        return {"d_loss": self.d_loss_metric.result(),
                "g_loss": self.g_loss_metric.result()}
Notable Quotes
- "The potential of artificial intelligence to emulate human thought processes goes beyond passive tasks such as object recognition and mostly reactive tasks such as driving a car. It extends well into creative activities."
  - Why it matters: generative deep learning applies not just to passive or reactive tasks but to creative work.
- "Language models are all form and no substance."
  - Why it matters: deep-learning language models capture the statistical structure of language rather than its meaning; they have no true semantic understanding.
- "VAEs result in highly structured, continuous latent representations. For this reason, they work well for doing all sorts of image editing in latent space."
  - Why it matters: because a VAE learns a continuous, structured latent space, images can be edited by moving through that space.
- "GANs enable the generation of fairly realistic synthetic images by forcing the generated images to be statistically almost indistinguishable from real ones."
  - Why it matters: this is the core idea of GANs: adversarial pressure makes generated images realistic.
- "Training a GAN is a dynamic process rather than a simple gradient descent process with a fixed loss landscape."
  - Why it matters: GAN training requires balancing the generator and the discriminator, which makes it notably harder than ordinary optimization.
Summary
This chapter covers the core techniques of generative deep learning: text generation, DeepDream, neural style transfer, VAEs, and GANs. Together they provide powerful tools for artistic creation and content generation, and they demonstrate deep learning's potential in creative domains.