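The paper's goal is to take a given style template and transfer its style onto an input image. Style transfer uses the feature maps of conv layers 1-5 of VGG; the network structure is as follows:
In the figure, the images along the bottom are, from left to right, the style image, the result image (after style transfer), and the content image (before style transfer). The network on the left makes the style features of the style image and the result image as similar as possible. Style features are represented here by the correlations between the feature maps of a conv layer, which gives the following style loss for each conv layer: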
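$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G_{ij}^l - A_{ij}^l\right)^2$$
where G^l and A^l are the feature-map correlation (Gram) matrices of the result image and the style image at layer l, N_l is the number of filters in that layer, and M_l is the number of spatial positions in each feature map.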
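To make the per-layer style loss concrete, here is a minimal NumPy sketch. It assumes Python 3 and a feature map stored as (height, width, channels); the names `gram_matrix` and `layer_style_loss` are illustrative and do not come from the paper or the repository. The random arrays merely stand in for real VGG activations.

```python
import numpy as np

def gram_matrix(feature_map):
    """Correlations between the channels of one layer's feature map.

    feature_map has shape (height, width, channels); the result has
    shape (channels, channels).
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)  # M_l positions x N_l filters
    return flat.T @ flat

def layer_style_loss(result_features, style_features):
    """E_l = 1 / (4 N_l^2 M_l^2) * sum_ij (G_ij - A_ij)^2."""
    h, w, c = result_features.shape
    n_l, m_l = c, h * w
    g = gram_matrix(result_features)
    a = gram_matrix(style_features)
    return np.sum((g - a) ** 2) / (4.0 * n_l ** 2 * m_l ** 2)

rng = np.random.default_rng(0)
style = rng.normal(size=(8, 8, 4))   # hypothetical activations, not real VGG output
result = rng.normal(size=(8, 8, 4))

# Identical features give zero loss; different features give a positive loss.
same_loss = layer_style_loss(style, style)
diff_loss = layer_style_loss(result, style)

# Shuffling the spatial positions leaves the Gram matrix unchanged,
# which is why it describes style rather than layout.
shuffled = style.reshape(-1, 4)[rng.permutation(64)].reshape(8, 8, 4)
layout_invariant = np.allclose(gram_matrix(style), gram_matrix(shuffled))
```

The last check shows the design idea behind using correlations: the Gram matrix sums over all positions, so it discards where a feature occurs and keeps only which features occur together.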
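The network on the right makes the features of the result image and the content image as similar as possible, that is, the feature maps of the two images should match at every conv layer. The content loss for each layer is therefore: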
$$l_{content}(l) = \frac{1}{2} \sum_{i,j} \left(F_{ij}^l - P_{ij}^l\right)^2$$
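The content loss is simply half the sum of squared differences between two feature maps. A small worked sketch in NumPy, assuming Python 3; the function name `layer_content_loss` and the 2x2 arrays are made up for illustration:

```python
import numpy as np

def layer_content_loss(result_features, content_features):
    """0.5 * sum_ij (F_ij - P_ij)^2 for one layer."""
    diff = result_features - content_features
    return 0.5 * np.sum(diff ** 2)

# Tiny hand-made "feature maps" that differ in one entry by 2.
content = np.array([[1.0, 2.0], [3.0, 4.0]])
result = np.array([[1.0, 2.0], [3.0, 6.0]])

loss_value = layer_content_loss(result, content)
print(loss_value)  # 0.5 * 2^2 = 2.0
```

This is the same half-sum-of-squares form that `tf.nn.l2_loss` computes in the TensorFlow listing.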
The paper uses conv layers 1-5 of VGG. Summing loss_style and loss_content over the five layers gives the total loss:
$$l_{total} = \alpha \cdot l_{content} + \beta \cdot l_{style}$$
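Training the network and solving for the result image:
In typical CNN optimization, supervised training updates the parameters w and b of every layer. Here the network's w and b are no longer the parameters being optimized. Instead, the result image is initialized as a noise image x (it can also be initialized as the content image), and gradient descent finds the result image that minimizes the loss. That image is the final style-transferred result.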
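The idea of optimizing the input instead of the weights can be shown with a toy example. This is only a sketch under strong assumptions: a single fixed random linear layer `W` stands in for the pretrained VGG, the "images" are 4-element vectors, and plain gradient descent is used where the TensorFlow listing uses `AdamOptimizer`. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the fixed, pretrained network: its weights are never updated.
W = rng.normal(size=(20, 4))
W_before = W.copy()

content = rng.normal(size=4)      # plays the role of the content image
target_features = W @ content     # features the result should reproduce

x = rng.normal(size=4)            # result image, initialized as noise

def loss(image):
    """Half the squared distance between the two feature vectors."""
    return 0.5 * np.sum((W @ image - target_features) ** 2)

initial_loss = loss(x)

# Step size 1 / (largest singular value of W)^2 keeps gradient descent stable.
step = 1.0 / np.linalg.norm(W, 2) ** 2

for _ in range(1000):
    grad_x = W.T @ (W @ x - target_features)  # gradient with respect to the input
    x -= step * grad_x                        # only x changes, W does not

final_loss = loss(x)
print(initial_loss, final_loss)
```

After the loop, `x` has moved toward an input whose features match the target, while `W` is exactly what it was at the start. In the real setting the loss also includes the style term, and the gradient flows back through all VGG layers to the pixels.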
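After reading this paper, I think the key points to understand are:
The correlations between feature maps can represent the style of an image;
During optimization, the parameters being updated are no longer the network's w and b: the weights stay fixed, and minimizing the loss updates the output image itself.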
With the principle covered, the code is easy to follow. Only the main code for the paper is pasted here to aid understanding:
https://github.com/woodrush/neural-art-tf/blob/master/neural-art-tf.py
import tensorflow as tf
import numpy as np
from models import VGG16, I2V
from utils import read_image, save_image, parseArgs, getModel, add_mean
import argparse
import time
content_image_path, style_image_path, params_path, modeltype, width, alpha, beta, num_iters, device, args = parseArgs()
# The actual calculation
print("Read images...")
content_image = read_image(content_image_path, width)
style_image = read_image(style_image_path, width)
g = tf.Graph()
with g.device(device), g.as_default(), tf.Session(graph=g, config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print("Load content values...")
    image = tf.constant(content_image)  # compute the feature maps of the content image
    model = getModel(image, params_path, modeltype)
    content_image_y_val = [sess.run(y_l) for y_l in model.y()]  # sess.run(y_l) is a constant numpy array

    print("Load style values...")
    image = tf.constant(style_image)
    model = getModel(image, params_path, modeltype)
    y = model.y()  # feature maps of the style image
    style_image_st_val = []
    for l in range(len(y)):
        num_filters = content_image_y_val[l].shape[3]
        st_shape = [-1, num_filters]
        st_ = tf.reshape(y[l], st_shape)
        st = tf.matmul(tf.transpose(st_), st_)  # style features (Gram matrix) of the style image at this layer
        style_image_st_val.append(sess.run(st))  # sess.run(st) is a constant numpy array

    print("Construct graph...")
    # Start from white noise
    # gen_image = tf.Variable(tf.truncated_normal(content_image.shape, stddev=20), trainable=True, name='gen_image')
    # Start from the original image: initialize the result image as the content image
    gen_image = tf.Variable(tf.constant(np.array(content_image, dtype=np.float32)), trainable=True, name='gen_image')
    model = getModel(gen_image, params_path, modeltype)
    y = model.y()
    L_content = 0.0
    L_style = 0.0
    for l in range(len(y)):
        # Content loss: feature difference between the result image and the content image
        L_content += model.alpha[l]*tf.nn.l2_loss(y[l] - content_image_y_val[l])
        # Style loss
        num_filters = content_image_y_val[l].shape[3]
        st_shape = [-1, num_filters]
        st_ = tf.reshape(y[l], st_shape)
        st = tf.matmul(tf.transpose(st_), st_)  # style features (Gram matrix) of the result image at this layer
        N = np.prod(content_image_y_val[l].shape).astype(np.float32)
        # Style difference between the result image and the style image
        L_style += model.beta[l]*tf.nn.l2_loss(st - style_image_st_val[l])/N**2/len(y)
    # The loss: total loss
    L = alpha * L_content + beta * L_style
    # The optimizer: optimizes the result image by gradient descent (Adam)
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(learning_rate=2.0, global_step=global_step, decay_steps=100, decay_rate=0.94, staircase=True)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(L, global_step=global_step)
    # A more simple optimizer
    # train_step = tf.train.AdamOptimizer(learning_rate=2.0).minimize(L)

    print("Start calculation...")
    # The optimizer has variables that require initialization as well
    # (later TensorFlow 1.x releases deprecate this call in favor of tf.global_variables_initializer)
    sess.run(tf.initialize_all_variables())
    for i in range(num_iters):
        if i % 10 == 0:
            # Save the current result image and report both losses every 10 iterations
            gen_image_val = sess.run(gen_image)
            save_image(gen_image_val, i, args.out_dir)
            print("L_content, L_style: %s %s" % (sess.run(L_content), sess.run(L_style)))
        print("Iter: %d" % i)
        sess.run(train_step)