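ML, DL, CNN Study Notes 5
- VAEs are well suited to learning well-structured latent spaces, in which specific directions encode meaningful axes of variation in the data.
- Images generated by GANs can be very realistic, but the latent space may lack good structure and sufficient continuity.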
VAE
Variational Autoencoder
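Image generation:
- The key idea of image generation is to find a low-dimensional latent space of representations (itself a vector space) in which any vector can be mapped to an image.
- The module that performs this mapping, taking a latent point as input and outputting an image (a grid of pixels), is called a generator (for GANs) or a decoder (for VAEs).
- Once such a latent space has been found, points can be sampled from it, deliberately or at random, and mapped to image space to generate new images.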
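- An ordinary Autoencoder is a multi-layer network whose intermediate feature is a fixed vector, whereas the intermediate feature of a VAE is a distribution.
- In other words, a VAE does not compress the input image into a fixed code in the latent space; it converts the image into the parameters of a statistical distribution, namely a mean and a variance.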
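After encoding, each image is represented by two vectors: a mean vector and a standard-deviation vector (the code in these notes stores the log-variance instead of the standard deviation).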
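To make the "two vectors" idea concrete, here is a minimal NumPy sketch of how a latent point can be drawn from an encoded mean and log-variance. The numbers in `z_mean` and `z_log_var` are made up for illustration and are not produced by any trained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder output for one image: a 2-D mean and log-variance.
z_mean = np.array([0.5, -1.0])
z_log_var = np.array([-2.0, 0.0])

# The standard deviation is exp(0.5 * log-variance).
z_std = np.exp(0.5 * z_log_var)

# Sample z = mean + std * epsilon, with epsilon drawn from N(0, I).
epsilon = rng.standard_normal(size=(10000, 2))
z_samples = z_mean + z_std * epsilon

# The empirical statistics should be close to the encoded parameters.
print(z_samples.mean(axis=0))
print(z_samples.std(axis=0))
```

Writing the sample as `mean + std * epsilon` keeps the randomness in `epsilon`, so gradients can still flow through the mean and the log-variance during training.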
The flow when the network is used:
VAE workflow
Basic structure of a VAE:
Loss function:
$$L(x, \overline{x}) + \sum_j KL(q_j(z|x) \,\|\, p(z))$$
$q_j(z|x)$: the distribution over the latent variable that the encoder produces for input $x$
$p(z)$: the prior over the latent variable, a standard normal distribution
The KL divergence measures how far one distribution is from another
$L(x, \overline{x})$: reconstruction loss
$\sum_j KL(q_j(z|x) \,\|\, p(z))$: regularization term
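The regularization term has a closed form under a common assumption: the encoder outputs a Gaussian with diagonal covariance, $q(z|x) = \mathcal{N}(\mu, \sigma^2)$, and the prior is $\mathcal{N}(0, I)$. The symbols $\mu$, $\sigma$ and the latent dimension $d$ are introduced here for illustration. With `z_log_var` standing for $\log \sigma^2$, this is the expression inside `kl_loss` in the code later in these notes, up to scaling.

```latex
\mathrm{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big)
  = -\frac{1}{2} \sum_{k=1}^{d} \left( 1 + \log \sigma_k^2 - \mu_k^2 - \sigma_k^2 \right)
```

The term is zero exactly when $\mu_k = 0$ and $\sigma_k^2 = 1$ for every $k$, which is why minimizing it pulls the encoded distribution towards mean 0 and variance 1.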
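The VAE's parameters are trained with two loss functions:
- The reconstruction loss, which forces the decoded samples to match the initial inputs.
- The regularization loss. Compared with a regular autoencoder, a VAE adds Gaussian noise to the encoder's result (the output of the network that computes the mean), so the decoder becomes robust to noise; the extra KL loss pushes the encoded distribution towards mean 0 and variance 1.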
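The two losses can be sketched in plain NumPy as follows. This is only an illustration under stated assumptions: `vae_loss_numpy` and its `kl_weight` parameter are hypothetical names, the reconstruction term is a pixel-wise binary cross-entropy, and the KL term assumes a diagonal Gaussian encoder with a standard normal prior.

```python
import numpy as np


def vae_loss_numpy(x, x_recon, z_mean, z_log_var, kl_weight=1.0):
    """Return (total, reconstruction, kl) for a batch. Hypothetical helper."""
    eps = 1e-7
    # Clip to avoid log(0) in the cross-entropy.
    x_recon = np.clip(x_recon, eps, 1.0 - eps)
    # Reconstruction term: mean binary cross-entropy over all pixels.
    recon = -np.mean(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon))
    # KL term: closed form against N(0, I), summed over latent dimensions.
    kl_per_sample = -0.5 * np.sum(
        1.0 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=-1
    )
    kl = np.mean(kl_per_sample)
    return recon + kl_weight * kl, recon, kl


# Made-up 4-pixel "image" and reconstruction.
x = np.array([[0.0, 1.0, 1.0, 0.0]])
x_recon = np.array([[0.1, 0.9, 0.8, 0.2]])

# Mean 0 and log-variance 0 match the prior, so the KL term vanishes.
_, recon, kl_zero = vae_loss_numpy(x, x_recon, np.zeros((1, 2)), np.zeros((1, 2)))
# Shifting one mean component away from 0 makes the KL term positive.
_, _, kl_shifted = vae_loss_numpy(x, x_recon, np.array([[1.0, 0.0]]), np.zeros((1, 2)))
print(recon, kl_zero, kl_shifted)
```

The Keras code in these notes averages the KL term and scales it by a small constant instead of using a weight of 1, so the absolute values differ, but the structure of the loss is the same.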
VAE Details
Training a VAE on the MNIST dataset
Code and explanation
# coding: utf-8
# # Encode the input into a mean and a log-variance parameter
# z_mean, z_log_variance = encoder(input_img)
# # Draw a latent point using a small random epsilon;
# # exp(0.5 * log-variance) is the standard deviation
# z = z_mean + exp(0.5 * z_log_variance) * epsilon
# # Then decode z back to an image
# reconstructed_img = decoder(z)
# # Instantiate a model
# model = Model(input_img, reconstructed_img)
# # Then train the model using 2 losses:
# # a reconstruction loss and a regularization loss
import keras
import tensorflow
# Dataset
from keras.datasets import mnist
from keras import layers
from keras.layers import Conv2D, Flatten, Dense, Lambda, Reshape, Conv2DTranspose
# Backend API docs: https://keras.io/zh/backend/
from keras import backend as K
# Functional-API model class
from keras.models import Model
import numpy as np
import matplotlib.pyplot as plt
# Normal distribution, used for norm.ppf when plotting
from scipy.stats import norm
# With TensorFlow 2.0+, the code raises an error unless eager execution is disabled
tensorflow.compat.v1.disable_eager_execution()
# ————————————————————————————————————————————————————
# Model construction (encoder first, then decoder)
# ————————————————————————————————————————————————————
# MNIST images have shape 28*28*1
img_shape = (28, 28, 1)
# Process 16 samples per batch
batch_size = 16
latent_dim = 2 # Dimensionality of the latent space: a plane
# ========================================================================
# Encoder: produces the mean and the log-variance
# ========================================================================
# Input layer
input_img = keras.Input(shape=img_shape)
# Convolutions
x = Conv2D(32, 3, padding='same', activation='relu')(input_img)
# Stride 2 with padding='same' halves the spatial size: 28*28 ==> 14*14
x = Conv2D(64, 3, padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(64, 3, padding='same', activation='relu')(x)
x = Conv2D(64, 3, padding='same', activation='relu')(x)
# K.int_shape returns the shape of a tensor or variable as a tuple of int or None entries.
# Here x.shape = (?, 14, 14, 64), so shape_before_flattening = (None, 14, 14, 64)
shape_before_flattening = K.int_shape(x)  # x.shape
x = Flatten()(x)
# Reduce dimensionality: 14*14*64 = 12544 -> 32
x = Dense(32, activation='relu')(x)
# 32 -> 2: mean of the latent distribution
z_mean = Dense(latent_dim)(x)
# 32 -> 2: log-variance of the latent distribution
z_log_var = Dense(latent_dim)(x)
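def sampling(args):
    z_mean, z_log_var = args
    # Draw epsilon from a standard normal distribution
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0., stddev=1.)
    # exp(0.5 * log-variance) is the standard deviation
    return z_mean + K.exp(0.5 * z_log_var) * epsilon


# Lambda wraps an arbitrary function as a model layer;
# here the sampling function becomes a layer whose output is the sampled z
z = Lambda(sampling)([z_mean, z_log_var])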
# ========================================================================
# Decoder
# ========================================================================
# The decoder takes z (the encoded latent vector) as input
# K.int_shape(z)[1:] == (2,), i.e. the latent dimension
decoder_input = layers.Input(K.int_shape(z)[1:])
# shape_before_flattening = (None, 14, 14, 64)
# Upsample: 2 -> 14*14*64 = 12544
x = Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)
# Reshape to 14*14*64
x = Reshape(shape_before_flattening[1:])(x)
# Transposed convolution (deconvolution)
# 14*14*64 -> 28*28*32
x = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(x)
# Convolution down to 28*28*1
x = Conv2D(1, 3, padding='same', activation='sigmoid')(x)
# Functional model (earlier notes used the Sequential model)
# Arguments: (input, output)
decoder = Model(decoder_input, x)
# Feed the sampled z into the decoder
z_decoded = decoder(z)
# ========================================================================
# Loss layer: reconstruction cross-entropy plus KL regularization
# ========================================================================
class CustomVariationalLayer(layers.Layer):
    # Loss computation (cross-entropy plus KL term)
    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        # Derived from the KL divergence to N(0, I), scaled by a small weight
        kl_loss = -5e-4 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        # Mean of the total loss
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        # The returned value is not used; the layer exists to register the loss
        return x


# Inputs to the loss layer: the original image and the decoder output
y = CustomVariationalLayer()([input_img, z_decoded])
# y carries the loss; build the functional model
vae = Model(input_img, y)
# The loss is added inside the layer, so loss is set to None
vae.compile(optimizer='rmsprop', loss=None)
vae.summary()
# ========================================================================
# Train the model
# ========================================================================
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.astype('float32') / 255.
x_test = x_test.reshape(x_test.shape + (1,))
vae.fit(x=x_train, y=None, shuffle=True, epochs=10, batch_size=batch_size, validation_data=(x_test, None))
# ========================================================================
# Display
# ========================================================================
n = 15
digit_size = 28
# Size of the output figure
figure = np.zeros((digit_size * n, digit_size * n))
# norm.ppf(p) takes a probability p and returns the point on the x-axis where the
# normal CDF equals p (the inverse of the CDF, not the probability density f(x)).
# The resulting grid points are fed to the decoder as sampled latent vectors.
# https://blog.csdn.net/kudou1994/article/details/94012482?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param
grid_x = norm.ppf(np.linspace(0.05, 0.95, n))
grid_y = norm.ppf(np.linspace(0.05, 0.95, n))
for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        # Latent sample
        z_sample = np.array([[xi, yi]])
        # np.tile repeats an array: np.tile([1, 2, 3], 3) = [1 2 3 1 2 3 1 2 3]
        z_sample = np.tile(z_sample, batch_size).reshape(batch_size, 2)
        # Decode the batch
        x_decoded = decoder.predict(z_sample, batch_size=batch_size)
        # Take the first decoded image
        digit = x_decoded[0].reshape(digit_size, digit_size)
        # Place it in the figure grid
        figure[i * digit_size: (i + 1) * digit_size, j * digit_size: (j + 1) * digit_size] = digit
# Plot
plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='Greys_r')
plt.show()
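The grid used for display relies on the inverse CDF of the standard normal distribution. As a small illustration of what `norm.ppf` computes, the sketch below uses the standard library's `statistics.NormalDist().inv_cdf`, which is a swapped-in equivalent for the standard normal case, not the call the code above makes. A grid of 5 points is an arbitrary choice to keep the output short.

```python
from statistics import NormalDist

n = 5
# Evenly spaced probabilities between 0.05 and 0.95.
probs = [0.05 + i * (0.95 - 0.05) / (n - 1) for i in range(n)]
# Inverse CDF (quantile function) of the standard normal distribution.
grid = [NormalDist().inv_cdf(p) for p in probs]
print([round(g, 3) for g in grid])
```

The end points land near -1.645 and 1.645, and the points are denser around 0, so the decoder is queried more often where the prior puts most of its probability mass.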
Model
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 28, 28, 1)]  0
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 28, 28, 32)   320         input_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 14, 14, 64)   18496       conv2d[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 14, 14, 64)   36928       conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 14, 14, 64)   36928       conv2d_2[0][0]
__________________________________________________________________________________________________
flatten (Flatten)               (None, 12544)        0           conv2d_3[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 32)           401440      flatten[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2)            66          dense[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 2)            66          dense[0][0]
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 2)            0           dense_1[0][0]
                                                                 dense_2[0][0]
__________________________________________________________________________________________________
model (Model)                   (None, 28, 28, 1)    56385       lambda[0][0]
__________________________________________________________________________________________________
custom_variational_layer (Custo (None, 28, 28, 1)    0           input_1[0][0]
                                                                 model[1][0]
==================================================================================================
Total params: 550,629
Trainable params: 550,629
Non-trainable params: 0
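The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer definitions in the code; the helper functions and the decoder's layer names are hypothetical, since the summary only shows the decoder as a single `model` row.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # kernel*kernel weights per (input, output) channel pair, plus one bias per output channel.
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    # One weight per (input, output) pair, plus one bias per output unit.
    return in_units * out_units + out_units


encoder = {
    "conv2d": conv2d_params(3, 1, 32),
    "conv2d_1": conv2d_params(3, 32, 64),
    "conv2d_2": conv2d_params(3, 64, 64),
    "conv2d_3": conv2d_params(3, 64, 64),
    "dense": dense_params(14 * 14 * 64, 32),
    "dense_1": dense_params(32, 2),
    "dense_2": dense_params(32, 2),
}
# Decoder layers (names assumed): Dense, Conv2DTranspose, Conv2D.
decoder = {
    "dense": dense_params(2, 14 * 14 * 64),
    "conv2d_transpose": conv2d_params(3, 64, 32),
    "conv2d": conv2d_params(3, 32, 1),
}

total = sum(encoder.values()) + sum(decoder.values())
print(encoder)
print(sum(decoder.values()), total)
```

A transposed convolution has the same number of weights as a regular convolution with the same kernel size and channel counts, which is why `conv2d_params` is reused for it.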
Output
GAN
GAN(Generative Adversarial Network)
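So far, GANs have mainly been applied to image generation, face transformation, high-quality image generation, scene generation, semi-supervised modeling, image blending, image inpainting, ReID, super-resolution reconstruction, occlusion removal, semantic segmentation, object detection, keypoint detection, video prediction and synthesis, and texture and style transfer.
How GANs work
A small note:
A baseline is usually obtained first with a classic deep model, and a GAN is then used to improve on it.
CRNN (mostly used for text recognition)
CNN (feature extraction) -> RNN (slice-by-slice processing)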
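The hand-off from the CNN to the RNN can be pictured as cutting the feature map into vertical slices, one per time step. The NumPy sketch below shows only that reshaping step; the feature-map size (height 1, width 25, 512 channels) is an assumed example, and no actual CNN or RNN is involved.

```python
import numpy as np

# Hypothetical CNN output for one text-line image: (height, width, channels).
feature_map = np.random.rand(1, 25, 512)

# Each column of the feature map becomes one time step for the RNN.
h, w, c = feature_map.shape
sequence = feature_map.transpose(1, 0, 2).reshape(w, h * c)

print(sequence.shape)  # (time steps, features per step)
```

Reading the slices left to right lets the RNN process the text in reading order, using each slice's features as one input step.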