1. Overview
Style transfer is a rather interesting application of CNNs; what makes it special is that the object being trained is an image rather than the network weights. A detailed introduction can be found in the following two blog posts:
Simply put, we have two images: a content image and a style image. What we need to do is find an image whose content is as close as possible to the content image and whose style is as close as possible to the style image. The measures are the content loss and the style loss, whose weighted sum is the total loss; we train the network to minimize the total loss.
This is the second major assignment of Stanford's TensorFlow course, and it mainly involves three files:
style_transfer.py: the main file, used to build and train the model. To be completed.
load_vgg.py: loads the pretrained VGG model. To be completed.
util.py: essential utility functions. No modification needed.
Complete reference code: style_transfer
2. Implementation
STEP 0: Initialize
def __init__(self, content_img, style_img, img_width, img_height):
    '''
    img_width and img_height are the dimensions we expect from the generated image.
    We will resize input content image and input style image to match this dimension.
    Feel free to alter any hyperparameter here and see how it affects your training.
    '''
    self.img_width = img_width
    self.img_height = img_height
    self.content_img = utils.get_resized_image(content_img, img_width, img_height)
    self.style_img = utils.get_resized_image(style_img, img_width, img_height)
    self.initial_img = utils.generate_noise_image(self.content_img, img_width, img_height)

    ###############################
    ## TO DO
    ## create global step (gstep) and hyperparameters for the model
    self.content_layer = 'conv4_2'
    self.style_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
    self.content_w = 0.01
    self.style_w = 1
    self.style_layer_w = [0.5, 1.0, 1.5, 3.0, 4.0]
    self.gstep = tf.Variable(0, dtype=tf.int32,
                             trainable=False, name='global_step')
    self.lr = 2.0
    ###############################
STEP 1: Define Inference
We use a CNN in which every convolutional layer is followed by a ReLU, and pooling is done with average pooling.
def conv2d_relu(self, prev_layer, layer_idx, layer_name):
    """ Return the Conv2D layer with RELU using the weights,
    biases from the VGG model at 'layer_idx'.
    Don't forget to apply relu to the output from the convolution.
    Inputs:
        prev_layer: the output tensor from the previous layer
        layer_idx: the index to current layer in vgg_layers
        layer_name: the string that is the name of the current layer.
                    It's used to specify variable_scope.
    Note that you first need to obtain W and b from the corresponding VGG's layer
    using the function _weights() defined above.
    W and b returned from _weights() are numpy arrays, so you have
    to convert them to TF tensors. One way to do it is with tf.constant.
    Hint for choosing strides size:
        for small images, you probably don't want to skip any pixel
    """
    ###############################
    ## TO DO
    with tf.variable_scope(layer_name) as scope:
        W, b = self._weights(layer_idx, layer_name)
        W = tf.constant(W, name='weights')
        b = tf.constant(b, name='bias')
        conv2d = tf.nn.conv2d(prev_layer,
                              filter=W,
                              strides=[1, 1, 1, 1],
                              padding='SAME')
        out = tf.nn.relu(conv2d + b)
    ###############################
    setattr(self, layer_name, out)
Here we read the pretrained weights and biases from the VGG model. Note that they are not what we are training, so they are set as constants. In conv2d, the stride is set to the common [1, 1, 1, 1], and padding is set to 'SAME', i.e., zero padding. For the difference between 'SAME' and 'VALID' and how the output sizes are computed, see TensorFlow's documentation on padding.
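As a quick illustration of the two padding modes, here is a minimal sketch with toy shapes (not part of the assignment): with a 3×3 filter and stride 1, 'SAME' pads with zeros so the spatial size is preserved, while 'VALID' uses no padding and the output shrinks.
import tensorflow as tf

x = tf.ones([1, 5, 5, 1])  # one 5x5 single-channel image
w = tf.ones([3, 3, 1, 1])  # one 3x3 filter
same = tf.nn.conv2d(x, filter=w, strides=[1, 1, 1, 1], padding='SAME')
valid = tf.nn.conv2d(x, filter=w, strides=[1, 1, 1, 1], padding='VALID')
print(same.shape)   # (1, 5, 5, 1) -- zero padding keeps 5x5
print(valid.shape)  # (1, 3, 3, 1) -- (5 - 3 + 1) = 3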
def avgpool(self, prev_layer, layer_name):
    """ Return the average pooling layer. The paper suggests that
    average pooling works better than max pooling.
    Input:
        prev_layer: the output tensor from the previous layer
        layer_name: the string that you want to name the layer.
                    It's used to specify variable_scope.
    Hint for choosing strides and ksize: choose what you feel appropriate
    """
    ###############################
    ## TO DO
    with tf.variable_scope(layer_name):
        out = tf.nn.avg_pool(prev_layer,
                             ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1],
                             padding='SAME')
    ###############################
    setattr(self, layer_name, out)
Both ksize (the pooling window size) and strides (the window stride) are set to [1, 2, 2, 1], so each pooling layer halves the spatial resolution, as the quick check below shows.
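A one-line check of that shape arithmetic (toy input, not from the assignment):
import tensorflow as tf

x = tf.ones([1, 224, 224, 64])
pooled = tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pooled.shape)  # (1, 112, 112, 64)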
STEP 2: Create Loss functions
The content loss is computed as:
$$L_{content} = \frac{1}{4s}\sum_{i,j}\left(F_{ij} - P_{ij}\right)^2$$
The content loss is essentially the mean squared error between F and P, where F is the feature map of the generated image and P is the feature map of the content image; the paper suggests taking them from 'conv4_2'. Note that we use 1/(4s) as the coefficient instead of the paper's 1/2, where s is the product of the three dimensions of P.
def _content_loss(self, P, F):
    ''' Calculate the loss between the feature representation of the
    content image and the generated image.
    Inputs:
        P: content representation of the content image
        F: content representation of the generated image
    Read the assignment handout for more details
    Note: Don't use the coefficient 0.5 as defined in the paper.
    Use the coefficient defined in the assignment handout.
    '''
    ###############################
    ## TO DO
    self.content_loss = tf.reduce_sum((F - P) ** 2) / (4.0 * P.size)
    ###############################
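As a sanity check, here is the same computation in plain numpy (made-up feature maps; P.size is the product of all of P's dimensions):
import numpy as np

P = np.random.rand(1, 4, 4, 3).astype(np.float32)  # toy content-image features
F = np.random.rand(1, 4, 4, 3).astype(np.float32)  # toy generated-image features
loss = np.sum((F - P) ** 2) / (4.0 * P.size)       # P.size = 1 * 4 * 4 * 3 = 48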
The style loss is computed as:
$$E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}\left(G_{ij}^l - A_{ij}^l\right)^2, \qquad L_{style} = \sum_l w_l E_l$$
N is the third dimension of the feature map and M is the product of its first two dimensions. Note that in TensorFlow, to use tf.nn.conv2d we added an extra batch dimension, making the feature map 4-D, so the "third" dimension is actually the fourth.
G and A are the Gram matrices of the generated image and the style image, respectively. To obtain a Gram matrix, we first take the feature map produced by a given layer, reshape it into an M×N 2-D tensor, and multiply it by its transpose. We will use the feature maps of the following suggested layers:
['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
After computing E_l for each layer, we sum them with weights to obtain the final style loss; deeper layers should be given larger weights.
def _gram_matrix(self, F, N, M):
    """ Create and return the gram matrix for tensor F
    Hint: you'll first have to reshape F
    """
    ###############################
    ## TO DO
    F = tf.reshape(F, (M, N))
    return tf.matmul(tf.transpose(F), F)
    ###############################
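To make the reshape concrete, here is the same computation in plain numpy with made-up shapes (one image, 4×4 feature maps, 3 filters):
import numpy as np

f = np.random.rand(1, 4, 4, 3)   # feature map: batch x height x width x filters
N = f.shape[3]                   # number of filters (3)
M = f.shape[1] * f.shape[2]      # height * width (16)
F = f.reshape(M, N)              # each column is one flattened filter response
G = F.T @ F                      # Gram matrix, shape (N, N): filter co-activations
print(G.shape)                   # (3, 3)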
def _single_style_loss(self, a, g):
    """ Calculate the style loss at a certain layer
    Inputs:
        a is the feature representation of the style image at that layer
        g is the feature representation of the generated image at that layer
    Output:
        the style loss at a certain layer (which is E_l in the paper)
    Hint: 1. you'll have to use the function _gram_matrix()
          2. we'll use the same coefficient for style loss as in the paper
          3. a and g are feature representation, not gram matrices
    """
    ###############################
    ## TO DO
    N = a.shape[3]               # number of filters
    M = a.shape[1] * a.shape[2]  # height times width of the feature map
    A = self._gram_matrix(a, N, M)
    G = self._gram_matrix(g, N, M)
    return tf.reduce_sum((G - A) ** 2 / ((2 * N * M) ** 2))
    ###############################
def _style_loss(self, A):
    """ Calculate the total style loss as a weighted sum
    of style losses at all style layers
    Hint: you'll have to use _single_style_loss()
    """
    n_layers = len(A)
    E = [self._single_style_loss(A[i], getattr(self.vgg, self.style_layers[i]))
         for i in range(n_layers)]
    ###############################
    ## TO DO
    self.style_loss = sum([self.style_layer_w[i] * E[i] for i in range(n_layers)])
    ###############################
Finally, we obtain the total loss:
$$L_{total} = \alpha L_{content} + \beta L_{style}$$
The ratio α/β can be set to 0.001 or 0.0001; ratios of 1/20 or 1/50 also produce good results. In the code, these weights are self.content_w and self.style_w from __init__, which give α/β = 0.01 here.
def losses(self):
    with tf.variable_scope('losses') as scope:
        with tf.Session() as sess:
            # assign content image to the input variable
            sess.run(self.input_img.assign(self.content_img))
            gen_img_content = getattr(self.vgg, self.content_layer)
            content_img_content = sess.run(gen_img_content)
        self._content_loss(content_img_content, gen_img_content)

        with tf.Session() as sess:
            sess.run(self.input_img.assign(self.style_img))
            style_layers = sess.run([getattr(self.vgg, layer) for layer in self.style_layers])
        self._style_loss(style_layers)

        ##########################################
        ## TO DO: create total loss.
        ## Hint: don't forget the weights for the content loss and style loss
        self.total_loss = self.content_w * self.content_loss + self.style_w * self.style_loss
        ##########################################
STEP 3: Create Optimizer
def optimize(self):
    ###############################
    ## TO DO: create optimizer
    self.opt = tf.train.AdamOptimizer(self.lr).minimize(self.total_loss,
                                                        global_step=self.gstep)
    ###############################
Note that the learning rate of 2.0 set in __init__ is far larger than what we would normally use when training weights; it works here because the variable being optimized is the input image itself, not the network parameters.
STEP 4: Create Summary
def create_summary(self):
    ###############################
    ## TO DO: create summaries for all the losses
    ## Hint: don't forget to merge them
    with tf.name_scope('summaries'):
        tf.summary.scalar('content loss', self.content_loss)
        tf.summary.scalar('style loss', self.style_loss)
        tf.summary.scalar('total loss', self.total_loss)
        self.summary_op = tf.summary.merge_all()
    ###############################
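Once training has started writing event files (STEP 5 below writes them to graphs/style_transfer), the three loss curves can be inspected with TensorBoard:
tensorboard --logdir graphs/style_transfer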
STEP 5: Train Your Model
def train(self, n_iters):
    skip_step = 1
    with tf.Session() as sess:
        ###############################
        ## TO DO:
        ## 1. initialize your variables
        ## 2. create writer to write your graph
        sess.run(tf.global_variables_initializer())
        writer = tf.summary.FileWriter('graphs/style_transfer', sess.graph)
        ###############################
        sess.run(self.input_img.assign(self.initial_img))

        ###############################
        ## TO DO:
        ## 1. create a saver object
        ## 2. check if a checkpoint exists, restore the variables
        saver = tf.train.Saver()
        ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/style_transfer/checkpoint'))
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        ##############################

        initial_step = self.gstep.eval()
        start_time = time.time()
        for index in range(initial_step, n_iters):
            if index >= 5 and index < 20:
                skip_step = 10
            elif index >= 20:
                skip_step = 20

            sess.run(self.opt)
            if (index + 1) % skip_step == 0:
                ###############################
                ## TO DO: obtain generated image, loss, and summary
                gen_image, total_loss, summary = sess.run([self.input_img,
                                                           self.total_loss,
                                                           self.summary_op])
                ###############################
                # add back the mean pixels we subtracted before
                gen_image = gen_image + self.vgg.mean_pixels
                writer.add_summary(summary, global_step=index)
                print('Step {}\n   Sum: {:5.1f}'.format(index + 1, np.sum(gen_image)))
                print('   Loss: {:5.1f}'.format(total_loss))
                print('   Took: {} seconds'.format(time.time() - start_time))
                start_time = time.time()

                filename = 'outputs/%d.png' % (index)
                utils.save_image(filename, gen_image)

                if (index + 1) % 20 == 0:
                    ###############################
                    ## TO DO: save the variables into a checkpoint
                    saver.save(sess, 'checkpoints/style_transfer/style_transfer', index)
                    ###############################
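Finally, a hypothetical usage sketch showing how these pieces could be wired together. The class name StyleTransfer, the image paths, and the iteration count are assumptions for illustration; in the assignment, the graph-building steps that create self.input_img and self.vgg live in style_transfer.py and load_vgg.py:
# Hypothetical driver -- names and paths are assumptions, not from the skeleton.
machine = StyleTransfer('content/content.jpg', 'styles/style.jpg', 333, 250)
# ... create the input variable and load the VGG weights here (load_vgg.py) ...
machine.losses()
machine.optimize()
machine.create_summary()
machine.train(300)  # number of training iterations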