Notes
The Keras GitHub repo has a VAE example; I use it here to study how Keras models are saved and loaded.
- Split the encoder and decoder into two classes (each with its own `Input` layer), hoping to decouple them and make them easy to reuse (e.g. reloading a model in another file);
- For training, the encoder and decoder are glued together into a complete VAE and trained as a whole, to verify that gradients back-propagate correctly through this setup (my understanding was that a Keras `Input` layer is similar to a TensorFlow `placeholder`, but a `placeholder` seems to cut the computation graph — so would the graph break at the decoder's `Input`?);
- The VAE and the encoder share one `Input` layer, so the `Encoder` class provides a method that returns its `Input` layer;
- I first tried not calling the `Model` function inside the encoder/decoder classes and not creating `Input` layers at all, instead feeding the received `x` or `z` straight into the classes' `Dense` layers, but ran into a problem: `Dense` (like other layers) expects a tensor, not a `numpy.ndarray`, while `mnist.load_data()` returns ndarrays.
  I tried converting the ndarrays with `tensorflow.convert_to_tensor`, but then hit a problem later when reusing the decoder to generate images for matplotlib: the `Dense` layer's output is also a tensor, while matplotlib needs an ndarray…
  In the end I called `Model` inside the classes after all, and used the model's `predict` method when generating images, since it outputs ndarrays;
- The VAE itself is not written as a class; when saving, the encoder and decoder are saved separately;
- Save a model with `model.save(path)`, load it with `model = load_model(path)`.
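The shared-`Input` idea above can be sketched minimally. This is an illustration, written against the `tf.keras` API rather than standalone `keras`; the toy dimensions (4 and 3) and model names are made up:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

# Toy encoder/decoder, each with its own Input layer.
x_in = Input([4])
enc = Model(x_in, Dense(3, activation='relu')(x_in), name='enc')

z_in = Input([3])
dec = Model(z_in, Dense(4, activation='sigmoid')(z_in), name='dec')

# Calling dec on enc's *tensor* output bypasses dec's own Input
# layer and yields one connected graph that reuses both sets of weights.
full = Model(x_in, dec(enc(x_in)), name='full')
```

`full` shares `x_in` with `enc`, and its trainable weights are exactly the union of `enc`'s and `dec`'s, so training `full` updates both sub-models.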
Code
General model abstraction
```python
import os

SAVE_P = ...

class _Model:
    def __init__(self):
        self.model = None

    def __call__(self, inputs):
        return self.model(inputs)

    def predict(self, inputs):
        return self.model.predict(inputs)

    def save_weights(self, f_name):
        self.model.save_weights(os.path.join(SAVE_P, f_name))

    def load_weights(self, f_name):
        self.model.load_weights(os.path.join(SAVE_P, f_name))
```
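A hypothetical subclass showing how the abstraction is meant to be used (sketched with `tf.keras`; `Classifier`, its 4-feature/3-class shapes, and `SAVE_P = '.'` are all made up for illustration):

```python
import os
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

SAVE_P = '.'  # assumption: save into the current directory

class _Model:
    def __init__(self):
        self.model = None
    def __call__(self, inputs):
        return self.model(inputs)
    def predict(self, inputs):
        return self.model.predict(inputs)
    def save_weights(self, f_name):
        self.model.save_weights(os.path.join(SAVE_P, f_name))
    def load_weights(self, f_name):
        self.model.load_weights(os.path.join(SAVE_P, f_name))

class Classifier(_Model):
    """hypothetical subclass: 4 features -> 3 classes"""
    def __init__(self):
        x_in = Input([4])
        y = Dense(3, activation='softmax')(x_in)
        self.model = Model(x_in, y)

clf_a, clf_b = Classifier(), Classifier()
clf_a.save_weights('clf.weights.h5')
clf_b.load_weights('clf.weights.h5')  # clf_b now predicts identically to clf_a
```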
VAE example
```python
# -*- coding: utf8 -*-
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
import keras
import keras.backend as K
from keras.datasets import mnist, fashion_mnist
from keras.losses import binary_crossentropy
from keras.layers import Input, Lambda, Dense
from keras.models import Model, load_model


class Encoder:
    """encoder"""

    def __init__(self, x_dim, h_dim, z_dim):
        self.x_in = Input([x_dim])
        h = Dense(h_dim, activation='relu')(self.x_in)
        self.z_mean = Dense(z_dim)(h)
        self.z_logvar = Dense(z_dim)(h)

        def reparameterize(args):
            mean, logvar = args
            eps = K.random_normal(
                [K.shape(mean)[0], z_dim], mean=0.0, stddev=1.0)
            return mean + eps * K.exp(logvar / 2.)

        self.z = Lambda(reparameterize, output_shape=[z_dim])(
            [self.z_mean, self.z_logvar])
        self.model = Model(
            self.x_in, [self.z_mean, self.z_logvar, self.z], name='VAE_encoder')

    def __call__(self, x_in):
        """x -> z
        x_in is a tensor; so is the output
        """
        return self.model(x_in)

    def input(self):
        """return the Input layer"""
        return self.x_in


class Decoder:
    """decoder"""

    def __init__(self, x_dim, h_dim, z_dim):
        self.z_in = Input([z_dim])
        h_hat = Dense(h_dim, activation='relu')(self.z_in)
        self.x_hat = Dense(x_dim, activation='sigmoid')(h_hat)
        self.model = Model(self.z_in, self.x_hat, name='VAE_decoder')

    def __call__(self, z_in):
        """z -> x^
        z_in is a tensor; so is the output
        """
        return self.model(z_in)

    def predict(self, z_in):
        """z -> x^
        z_in is an ndarray; so is the output
        """
        return self.model.predict(z_in)

    def save(self, path):
        """save the model"""
        self.model.save(path)

    def load_model(self, path):
        """load the model"""
        # if self.model is not None:
        #     del self.model
        self.model = load_model(path)


def show(name, G):
    """generate images to inspect the result"""
    PIXEL = 28
    N_PICT = 30
    grid_x = norm.ppf(np.linspace(0.05, 0.95, N_PICT))
    grid_y = grid_x
    figure = np.zeros([N_PICT * PIXEL, N_PICT * PIXEL])
    for i, xi in enumerate(grid_x):
        for j, yj in enumerate(grid_y):
            noise = np.array([[xi, yj]])  # must be rank 2: note the double brackets
            x_gen = G.predict(noise)
            x_gen = x_gen[0].reshape([PIXEL, PIXEL])
            figure[i * PIXEL: (i + 1) * PIXEL,
                   j * PIXEL: (j + 1) * PIXEL] = x_gen
    fig = plt.figure(figsize=(10, 10))
    plt.imshow(figure, cmap='Greys_r')
    fig.savefig(name)
    plt.show()


BATCH = 128
N_CLASS = 10
EPOCH = 25
X_DIM = 28 * 28
H_DIM = 128
Z_DIM = 2

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], -1).astype('float32') / 255.
x_test = x_test.reshape(x_test.shape[0], -1).astype('float32') / 255.

enc = Encoder(X_DIM, H_DIM, Z_DIM)
dec = Decoder(X_DIM, H_DIM, Z_DIM)

# VAE
vae_in = enc.input()  # shares the Input layer with the encoder
z_mean, z_logvar, z = enc(vae_in)
vae_out = dec(z)  # the experiment shows the graph IS connected here, even though dec has its own Input layer
vae = Model(vae_in, vae_out, name='VAE')  # assemble the complete VAE

# KL = -1/2 * sum{ 1 + log(var) - mean^2 - var }
# summed along the dimensions of the z vector
loss_kl = 0.5 * K.sum(K.square(z_mean) +
                      K.exp(z_logvar) - 1. - z_logvar, axis=1)
loss_recon = binary_crossentropy(vae_in, vae_out) * X_DIM
loss_vae = K.mean(loss_kl + loss_recon)

vae.add_loss(loss_vae)
vae.compile(optimizer='rmsprop')
vae.summary()

# before training -> generated images are garbage
show('./before.png', dec)
vae.fit(x_train,  # y_train,  # must NOT pass y_train
        batch_size=BATCH,
        epochs=EPOCH,
        verbose=1,
        validation_data=(x_test, None))

# save the model
dec.save('./decoder.h5')
# with open('./decoder.json', 'w') as f:
#     f.write(dec.model.to_json())
# with open('./decoder.yaml', 'w') as f:
#     f.write(dec.model.to_yaml())

# delete the model
# (pretend we are switching to another training file)
del dec

# (pretend this is another file)
# rebuild the decoder
dec = Decoder(X_DIM, H_DIM, Z_DIM)
# reuse the saved decoder's architecture and weights
dec.load_model('./decoder.h5')
# after training -> generated images are meaningful
show('./after.png', dec)
```
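As a sanity check on the KL term above, here is a small numpy sketch (toy values, not from the experiment) comparing the closed form 0.5 · Σ(μ² + e^logvar − 1 − logvar) against a Monte-Carlo estimate of KL(N(μ, σ²) ‖ N(0, 1)):

```python
import numpy as np

rng = np.random.RandomState(0)
mu, logvar = 1.0, np.log(0.25)  # toy values: sigma^2 = 0.25
sigma = np.exp(logvar / 2.)

# closed form, the same expression as loss_kl in the VAE code
kl_closed = 0.5 * (mu**2 + np.exp(logvar) - 1. - logvar)

# Monte-Carlo estimate: E_q[log q(z) - log p(z)] with z ~ N(mu, sigma^2)
z = mu + sigma * rng.randn(200000)
log_q = -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu)**2 / (2 * sigma**2)
log_p = -0.5 * np.log(2 * np.pi) - z**2 / 2.
kl_mc = np.mean(log_q - log_p)
```

The two values agree to a few decimal places, which is a quick way to convince yourself the sign conventions in `loss_kl` are right.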
Conclusion
- Another way to check whether the computation graph is correct is visualization, e.g. TensorBoard; Keras also has `keras.utils.plot_model` (used in the official example);
- Although the decoder in this example has its own `Input` layer, the experiment shows training works normally and gradients back-propagate, i.e. the computation graph is connected. So an `Input` layer is not the same thing as a `placeholder`: creating an `Input` layer returns a `tf.Tensor`! My current understanding (a guess): as long as tensors are passed between layers, the graph stays connected and gradients can flow through; passing an ndarray (as with a `placeholder` feed) breaks the graph, so gradients cannot flow back. Perhaps the framework needs the Tensor class to do automatic differentiation and the like? Pure speculation.
Future Work
Extend the VAE into a Conditional VAE; there, the decoder's condition input must likewise be shared with the overall cVAE model.
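One possible shape for the conditional decoder, sketched with `tf.keras` (the dimensions and names are placeholders, not a worked implementation); as with `x_in` and the encoder above, the condition `Input` object would then be shared between this decoder and the full cVAE:

```python
import numpy as np
from tensorflow.keras.layers import Concatenate, Dense, Input
from tensorflow.keras.models import Model

Z_DIM, C_DIM, H_DIM, X_DIM = 2, 10, 128, 28 * 28

z_in = Input([Z_DIM])
c_in = Input([C_DIM])  # condition, e.g. a one-hot class label
h = Dense(H_DIM, activation='relu')(Concatenate()([z_in, c_in]))
x_hat = Dense(X_DIM, activation='sigmoid')(h)
cond_dec = Model([z_in, c_in], x_hat, name='cVAE_decoder')
```

At generation time you would feed both a latent sample and the desired label, so one decoder can be steered to draw a specific digit.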
Reference
- How to save a Keras model
- keras/examples/variational_autoencoder.py → the official example; worth reading side by side with the code above
- Keras pretrained model applications (3): extracting features from any layer of VGG19 → reference for the shared-input-layer pattern
- nnormandin/Conditional_VAE
- CSDN Markdown concise tutorial 4: UML diagrams
- markdown - drawing diagrams