这周的主要任务是先实现论文中使用的方法。
使用的是微软的COCO数据集,大小为17G.
首先来构建训练网络。
1. 首先利用yml文件 来进行输入,使用yml文件可以实现快速输入数据。在yml文件中要加入自己想要迁移风格的style图片的信息,包括图片的位置,大小等:
style_image: img/shuimo.jpg
naming: "shuimo"
model_path: models
2.按照论文中的方法利用了VGG16作为损失网络。通过输入traing的图片,按照论文里的方法经过卷积之后,在4_1层得到可以最好表示风格的输出,构建这样一个网络,如下。
with tf.variable_scope('conv1'):
conv1 = relu(instance_norm(conv2d(image, 3, 32, 9, 1)))
with tf.variable_scope('conv2'):
conv2 = relu(instance_norm(conv2d(conv1, 32, 64, 3, 2)))
with tf.variable_scope('conv3'):
conv3 = relu(instance_norm(conv2d(conv2, 64, 128, 3, 2)))
with tf.variable_scope('res1'):
res1 = residual(conv3, 128, 3, 1)
with tf.variable_scope('res2'):
res2 = residual(res1, 128, 3, 1)
with tf.variable_scope('res3'):
res3 = residual(res2, 128, 3, 1)
with tf.variable_scope('res4'):
res4 = residual(res3, 128, 3, 1)
with tf.variable_scope('res5'):
res5 = residual(res4, 128, 3, 1)
# print(res5.get_shape())
with tf.variable_scope('deconv1'):
# deconv1 = relu(instance_norm(conv2d_transpose(res5, 128, 64, 3, 2)))
deconv1 = relu(instance_norm(resize_conv2d(res5, 128, 64, 3, 2, training)))
with tf.variable_scope('deconv2'):
# deconv2 = relu(instance_norm(conv2d_transpose(deconv1, 64, 32, 3, 2)))
deconv2 = relu(instance_norm(resize_conv2d(deconv1, 64, 32, 3, 2, training)))
with tf.variable_scope('deconv3'):
# deconv_test = relu(instance_norm(conv2d(deconv2, 32, 32, 2, 1)))
deconv3 = tf.nn.tanh(instance_norm(conv2d(deconv2, 32, 3, 9, 1)))
y = (deconv3 + 1) * 127.5
# Remove border effect reducing padding.
height = tf.shape(y)[1]
width = tf.shape(y)[2]
y = tf.slice(y, [0, 10, 10, 0], tf.stack([-1, height - 20, width - 20, -1]))
3.激活函数选择了Relu:
def relu(input):
relu = tf.nn.relu(input)
nan_to_zero = tf.where(tf.equal(relu, relu), relu, tf.zeros_like(relu))
return nan_to_zero
4.batch大小选择了50,没进行50次更新一次网络中的权重和偏置。
beta = tf.Variable(tf.zeros([size]), name='beta')
scale = tf.Variable(tf.ones([size]), name='scale')
pop_mean = tf.Variable(tf.zeros([size]))
pop_var = tf.Variable(tf.ones([size]))
epsilon = 1e-3
batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2])
train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
5.接下来定义了style图片的损失函数和内容图片的损失函数,由于采用了快速风格迁移的方法,因此使用了分开计算:
def style_loss(endpoints_dict, style_features_t, style_layers):
style_loss = 0
style_loss_summary = {}
for style_gram, layer in zip(style_features_t, style_layers):
generated_images, _ = tf.split(endpoints_dict[layer], 2, 0)
size = tf.size(generated_images)
layer_style_loss = tf.nn.l2_loss(gram(generated_images) - style_gram) * 2 / tf.to_float(size)
style_loss_summary[layer] = layer_style_loss
style_loss += layer_style_loss
return style_loss, style_loss_summary
def content_loss(endpoints_dict, content_layers):
content_loss = 0
for layer in content_layers:
generated_images, content_images = tf.split(endpoints_dict[layer], 2, 0)
size = tf.size(generated_images)
content_loss += tf.nn.l2_loss(generated_images - content_images) * 2 / tf.to_float(size) # remain the same as in the paper
return content_loss
6.按照论文中的方法来对两个损失函数一起进行约束,
def total_variation_loss(layer):
shape = tf.shape(layer)
height = shape[1]
width = shape[2]
y = tf.slice(layer, [0, 0, 0, 0], tf.stack([-1, height - 1, -1, -1])) - tf.slice(layer, [0, 1, 0, 0], [-1, -1, -1, -1])
x = tf.slice(layer, [0, 0, 0, 0], tf.stack([-1, -1, width - 1, -1])) - tf.slice(layer, [0, 0, 1, 0], [-1, -1, -1, -1])
loss = tf.nn.l2_loss(x) / tf.to_float(tf.size(x)) + tf.nn.l2_loss(y) / tf.to_float(tf.size(y))
return loss
7.网络基本构建完成,接下来就需要传训练集图片的数据,因为训练集中的图片都有编号,因此可以直接顺次读取相应的图片。利用tensorflow内置的decode_png方法实现对图片的读取。
def get_image(path, height, width, preprocess_fn):
png = path.lower().endswith('png')
img_bytes = tf.read_file(path)
image = tf.image.decode_png(img_bytes, channels=3) if png else tf.image.decode_jpeg(img_bytes, channels=3)
return preprocess_fn(image, height, width)
这里测试了一下,训练是没有问题的。接下来就要看结果了。