Preface
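The series of articles on the basic principles of deep learning is nearing its end, and the final topic is image style transfer. I find image style transfer very interesting: the theory behind the algorithm is fairly simple, but making any algorithm produce truly refined results is never easy, so this article is only a small demonstration.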
Algorithm Overview
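The algorithm runs as follows:
- Start with a content image and a style image, plus an input image filled with random pixels.
- Pass all three images through the VGG16 network to extract their feature maps.
- Compute losses between pairs of feature maps (style vs. random, content vs. random) and minimize them. Iterating this yields an image that has the content of the content image and, at the same time, the style of the style image.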
As a formula:
$$\text{Feature map} = \mathrm{VGG16}(\text{style image},\ \text{content image},\ \text{random image})$$
Of course, other network architectures could also be used to extract the image features here.
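In the code further down, the three images are not sent through VGG16 one by one: they are concatenated into a single batch (index 0 = content, 1 = style, 2 = generated) and each image's feature map is sliced out of a layer's output by its batch index. The sketch below illustrates only that batching and slicing idea. `toy_feature_extractor` is a hypothetical stand-in (2×2 average pooling), not VGG16, and the toy image sizes are arbitrary.

```python
import numpy as np


def toy_feature_extractor(batch):
    """Hypothetical stand-in for VGG16: 2x2 average pooling,
    (n, h, w, c) -> (n, h/2, w/2, c). Works on any batch size."""
    n, h, w, c = batch.shape
    return batch.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))


rng = np.random.default_rng(0)
content = rng.random((1, 8, 8, 3))
style = rng.random((1, 8, 8, 3))
generated = rng.random((1, 8, 8, 3))  # the random image that will be optimized

# One forward pass over a batch of three, in the same order as the code:
# 0 = content, 1 = style, 2 = generated
features = toy_feature_extractor(np.concatenate([content, style, generated], axis=0))
content_feats, style_feats, generated_feats = features[0], features[1], features[2]
print(features.shape)  # (3, 4, 4, 3)
```

Because the extractor treats each batch element independently, slicing the batched output gives the same features as running each image alone.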
Compute the style loss:
$$Loss_{style} = L_{style}(\text{Feature map}_{style},\ \text{Feature map}_{random})$$
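A NumPy sketch of the style loss that the code builds with Keras backend ops. It mirrors `gram_matrix` and `style_loss` from the code, including its constant `channels = 3` normalization, but runs on small random arrays instead of real VGG16 activations (the toy shapes are assumptions for illustration). The last lines show why a Gram matrix works as a "style" descriptor: it ignores where in the image each feature occurs.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of an (h, w, c) feature map: a (c, c) matrix of channel inner products."""
    h, w, c = features.shape
    # (c, h*w): same layout as K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    flat = features.transpose(2, 0, 1).reshape(c, h * w)
    return flat @ flat.T


def style_loss(style_features, comb_features, target_size, channels=3):
    """Squared difference of Gram matrices, scaled like style_loss in the Keras code."""
    S = gram_matrix(style_features)
    C = gram_matrix(comb_features)
    size = target_size[0] * target_size[1]
    return np.sum((S - C) ** 2) / (4.0 * channels ** 2 * size ** 2)


rng = np.random.default_rng(0)
style = rng.random((4, 5, 8))  # toy feature map: h=4, w=5, 8 channels
comb = rng.random((4, 5, 8))
print(style_loss(style, style, (4, 5)))  # 0.0
print(style_loss(style, comb, (4, 5)) > 0)  # True

# Shuffling pixel positions leaves the Gram matrix unchanged: spatial layout is ignored
shuffled = style.reshape(20, 8)[rng.permutation(20)].reshape(4, 5, 8)
print(np.allclose(gram_matrix(shuffled), gram_matrix(style)))  # True
```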
Compute the content loss:
$$Loss_{content} = L_{content}(\text{Feature map}_{content},\ \text{Feature map}_{random})$$
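The content loss in the code is a plain sum of squared differences between two feature maps of the same layer. A minimal NumPy version, on toy arrays rather than real VGG16 features, shows that unlike the Gram-based style loss it does depend on spatial layout: shifting the content changes the loss.

```python
import numpy as np


def content_loss(base_features, comb_features):
    """Sum of squared differences between two feature maps of the same shape."""
    return np.sum((comb_features - base_features) ** 2)


base = np.zeros((2, 2, 3))
comb = np.ones((2, 2, 3))
print(content_loss(base, comb))  # 12.0  (2*2*3 elements, each contributing 1)

# Shifting the features by one row changes the content loss
rng = np.random.default_rng(1)
feats = rng.random((6, 6, 4))
print(content_loss(feats, np.roll(feats, 1, axis=0)) > 0)  # True
```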
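The style loss and the content loss are then combined into a single total loss. To keep the generated image locally coherent (that is, smoother), the code below also adds a third term, the total variation loss.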
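The total variation term used by `total_variation_loss` in the code penalizes the squared difference between each pixel and its neighbours below and to the right, raised to the power 1.25. The NumPy sketch below reproduces that slicing on a (1, h, w, 3) channels-last array; unlike the code's version it takes the height and width from the array's shape instead of a separate `size` argument.

```python
import numpy as np


def total_variation_loss(x):
    """Total variation loss on a (1, h, w, 3) image batch, same slicing as the code."""
    a = (x[:, :-1, :-1, :] - x[:, 1:, :-1, :]) ** 2  # difference to the pixel below
    b = (x[:, :-1, :-1, :] - x[:, :-1, 1:, :]) ** 2  # difference to the pixel on the right
    return np.sum((a + b) ** 1.25)


flat = np.full((1, 8, 8, 3), 100.0)  # perfectly smooth image
noisy = flat + np.random.default_rng(0).normal(0, 10, flat.shape)
print(total_variation_loss(flat))  # 0.0
print(total_variation_loss(noisy) > total_variation_loss(flat))  # True
```

Minimizing this term pushes neighbouring pixels toward each other, which suppresses high-frequency noise in the generated image.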
The overall objective can be written as follows.
Known:
- the style image
- the content image
- the VGG16 neural network with ImageNet weights
Randomly initialized (drawn from some distribution):
- the random image
Express what we want to do as a single function:
$$Loss_{total} = L_{style}(\text{Feature map}_{style},\ \text{Feature map}_{random}) + L_{content}(\text{Feature map}_{content},\ \text{Feature map}_{random})$$
Since $\text{Feature map}_{style}$ and $\text{Feature map}_{content}$ are both known (they are fixed once computed), they do not need to be written out, so the function can be expressed as:
$$Loss_{total} = L_{style}(\mathrm{VGG}(\text{random image})) + L_{content}(\mathrm{VGG}(\text{random image}))$$
Furthermore, since $L_{style}$ and $L_{content}$ are both functions of the random image, the expression above can be rewritten as:
$$Loss_{total} = L_{style+content}(\text{random image})$$
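For reference, `main()` in the code does not simply add the terms with equal weight. Reading its weights and layer choices, the objective it actually minimizes can be written as below. The symbols $\alpha$, $\beta$, $\gamma$, $\mathcal{S}$ and $F^{l}$ are my own shorthand: $\alpha$ = `content_weight` (0.025), $\beta$ = `style_weight` (1.0), $\gamma$ = `total_variation_weight` (1.0), $\mathcal{S}$ is the set of five style layers, $F^{l}$ is the feature map at layer $l$, and $x$ is the random (generated) image.

$$
Loss_{total}(x) = \alpha\, L_{content}\big(F^{\text{block4\_conv2}}_{content},\, F^{\text{block4\_conv2}}_{x}\big) + \frac{\beta}{|\mathcal{S}|}\sum_{l \in \mathcal{S}} L_{style}\big(F^{l}_{style},\, F^{l}_{x}\big) + \gamma\, L_{TV}(x)
$$

$$
\mathcal{S} = \{\text{block1\_conv1},\ \text{block2\_conv1},\ \text{block3\_conv1},\ \text{block4\_conv1},\ \text{block5\_conv1}\},\qquad \frac{\beta}{|\mathcal{S}|} = \frac{1.0}{5} = 0.2
$$

Changing the ratio $\alpha/\beta$ shifts the result toward preserving the content or toward reproducing the style.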
Compare this with the loss functions we used before:
$$\begin{aligned} Loss &= y - \hat{y} \\ &= \mathrm{Activation}(wx + b) - \hat{y} \\ &= f(w) \end{aligned}$$
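For the models we trained before, we supplied a dataset (x, y). Most library functions build the loss from the given x and y, initialize the parameters w, update w iteratively, and save the learned values into the model, so they can be reused directly whenever needed.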
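Style transfer turns this around. Looking at $Loss_{total}$, the "weight parameter" being optimized is the random image itself, while the VGG16 weights stay fixed. Since the input has become the parameter to update, we have to build the optimization step ourselves.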
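The difference between the two settings can be seen on a toy problem. In the sketch below, a hypothetical frozen linear "network" `w·x + b` plays the role of the fixed VGG16 weights, and gradient descent updates the input `x` instead of `w`. The weights, target value and learning rate are arbitrary illustrative choices, and the real code uses L-BFGS rather than this plain gradient step.

```python
import numpy as np

# Frozen "network" weights (illustrative values), playing the role of the fixed VGG16 weights
w = np.array([2.0, -1.0, 0.5])
b = 1.0
target = 4.0  # the output we want the network to produce


def loss_and_grad(x):
    """Loss = (w.x + b - target)^2 and its gradient with respect to the INPUT x."""
    residual = w @ x + b - target
    return residual ** 2, 2.0 * residual * w


x = np.zeros(3)  # stands in for the random initial image
lr = 0.05
for step in range(200):
    loss, grad = loss_and_grad(x)
    x -= lr * grad  # update the input; w and b are never touched

print(loss_and_grad(x)[0])  # a value very close to 0
```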
Code Walkthrough
```python
import time

import numpy as np
from PIL import Image  # scipy.misc.imsave was removed in SciPy 1.2; PIL is already required by load_img
from scipy.optimize import fmin_l_bfgs_b
from keras.preprocessing.image import load_img, img_to_array
from keras.applications import vgg16
from keras import backend as K

# Note: K.placeholder / K.gradients / K.function need standalone Keras 2.x on a
# graph-mode (TensorFlow 1.x style) backend; they do not work under TF2 eager execution.


def preprocess_image(img_path, img_size):
    '''
    Preprocess an input image: open it, resize it to the given size,
    and format it as the tensor that Keras' VGG16 expects.
    '''
    img = load_img(img_path, target_size=img_size)
    img = img_to_array(img)            # h * w * 3
    img = np.expand_dims(img, axis=0)  # 1 * h * w * 3
    img = vgg16.preprocess_input(img)  # RGB -> BGR, then subtract the ImageNet mean pixel
    return img                         # 1 * h * w * 3


def deprocess_image(x, target_size):
    '''
    Convert the optimized array back into a displayable image.
    '''
    x = x.reshape((target_size[0], target_size[1], 3))
    # Undo zero-centering by adding back the mean pixel (BGR order)
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR' -> 'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x


def gram_matrix(x):
    '''
    Gram matrix of a feature map: inner products between its channels.
    It captures each channel's own statistics and the correlations between channels.
    '''
    assert K.ndim(x) == 3
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))  # (channels, h*w)
    gram = K.dot(features, K.transpose(features))
    return gram


def style_loss(style_features, comb_features, target_size):
    '''
    The style loss keeps the style of the reference image in the generated image.
    Style is defined via the Gram matrices of the style and generated feature maps.
    '''
    assert K.ndim(style_features) == 3
    assert K.ndim(comb_features) == 3
    channels = 3  # constant normalization factor (image channels, not the layer's channel count)
    S = gram_matrix(style_features)
    C = gram_matrix(comb_features)
    size = target_size[0] * target_size[1]  # h * w
    return K.sum(K.square(S - C)) / (4. * pow(channels, 2) * pow(size, 2))


def content_loss(base_features, comb_features):
    '''
    The content loss keeps the content of the base image in the generated image.
    '''
    return K.sum(K.square(comb_features - base_features))


# The third loss function, the total variation loss,
# is designed to keep the generated image locally coherent
def total_variation_loss(x, size):
    assert K.ndim(x) == 4
    row = size[0] - 1
    col = size[1] - 1
    a = K.square(x[:, :row, :col, :] - x[:, 1:, :col, :])  # difference to the pixel below
    b = K.square(x[:, :row, :col, :] - x[:, :row, 1:, :])  # difference to the pixel on the right
    return K.sum(K.pow(a + b, 1.25))


def eval_loss_and_grads(x, size, f_outputs):
    '''
    Compute the loss and the gradient in a single pass.
    '''
    x = x.reshape((1, size[0], size[1], 3))
    outs = f_outputs([x])
    loss_value = outs[0]
    if len(outs[1:]) == 1:
        grad_values = outs[1].flatten().astype('float64')
    else:
        grad_values = np.array(outs[1:]).flatten().astype('float64')
    return loss_value, grad_values


class Evaluator(object):
    '''
    fmin_l_bfgs_b asks for the loss and the gradient through two separate callbacks.
    Evaluator computes both in one pass, returns the loss from loss(),
    and hands the cached gradient back from grads().
    '''

    def __init__(self, size, f_outputs):
        self.loss_value = None
        self.grad_values = None  # fixed: was misspelled as grads_values
        self.size = size
        self.f_outputs = f_outputs

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(
            x, self.size, self.f_outputs)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values


def main():
    base_img_path = "base.jpg"
    style_img_path = "style_img.jpg"
    target_prefix = "target"
    iterations = 10

    # Weights of the different loss terms
    total_variation_weight = 1.0  # total variation loss weight
    style_weight = 1.0            # style loss weight
    content_weight = 0.025        # content loss weight

    # Size of the generated image: nrows = h, ncols = w (keeps the content image's aspect ratio)
    img_nrows = 400
    width, height = load_img(base_img_path).size
    img_ncols = int(width * img_nrows / height)
    target_size = (img_nrows, img_ncols)

    # Content and style images as constant Keras tensors -> shape=(1, h, w, 3)
    base_img = K.variable(preprocess_image(base_img_path, target_size))
    style_img = K.variable(preprocess_image(style_img_path, target_size))
    # The generated (combination) image as a Keras placeholder -> shape=(1, h, w, 3)
    comb_img = K.placeholder((1, img_nrows, img_ncols, 3))
    # Concatenate the three images into one batch -> shape=(3, h, w, 3)
    input_tensor = K.concatenate([base_img, style_img, comb_img], axis=0)

    # Build VGG16 with base_img, style_img and comb_img as its input,
    # loading the pre-trained ImageNet weights (without the fully connected top)
    model = vgg16.VGG16(input_tensor=input_tensor,
                        weights='imagenet', include_top=False)
    for layer in model.layers:
        print("[layer name]: %14s, [Output]: %s" % (layer.name, layer.output))
    print("Model loaded")

    # Map each VGG layer name to its output tensor
    vgg_dict = dict([(layer.name, layer.output) for layer in model.layers])

    # Content loss: use the features after block4_conv2
    loss = K.variable(0.)
    layer_features = vgg_dict['block4_conv2']  # shape=(3, h/8, w/8, 512)
    base_img_features = layer_features[0, :, :, :]
    comb_features = layer_features[2, :, :, :]
    loss += content_weight * content_loss(base_img_features, comb_features)

    # Style loss: iterate over feature_layers and compare each layer's feature maps
    feature_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                      'block4_conv1', 'block5_conv1']
    for layer_name in feature_layers:
        layer_features = vgg_dict[layer_name]
        style_features = layer_features[1, :, :, :]
        comb_features = layer_features[2, :, :, :]
        sl = style_loss(style_features, comb_features, target_size)
        loss += (style_weight / len(feature_layers)) * sl  # 1.0 / 5 = 0.2 per layer

    # Total variation loss (local smoothness of the generated image)
    loss += total_variation_weight * \
        total_variation_loss(comb_img, target_size)

    # Gradient of the loss with respect to comb_img
    grads = K.gradients(loss, comb_img)
    outputs = [loss]
    outputs += grads
    f_outputs = K.function([comb_img], outputs)  # Keras function: comb_img -> [loss, grads]
    evaluator = Evaluator(target_size, f_outputs)

    # Random initial (1, h, w, 3) image, roughly zero-centered
    x = np.random.uniform(0, 255, (1, target_size[0], target_size[1], 3))
    x -= 128.

    # Run scipy-based optimization (L-BFGS) over the pixels of the generated image
    # so as to minimize the neural style loss
    for i in range(iterations):
        print('Start iteration', i)
        start_time = time.time()
        cur_loss = evaluator.loss
        cur_grads = evaluator.grads
        # SciPy's L-BFGS minimizes the loss (at most 20 function evaluations per call)
        x, min_val, info = fmin_l_bfgs_b(
            cur_loss, x.flatten(), fprime=cur_grads, maxfun=20)
        end_time = time.time()
        img = deprocess_image(x.copy(), target_size)
        fname = target_prefix + '_%d.png' % i
        Image.fromarray(img).save(fname)
        print('Current loss value:', min_val)
        print('Image saved as', fname)
        print('Iteration %d completed in %ds' % (i, end_time - start_time))


if __name__ == '__main__':
    main()
```
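`deprocess_image` only works because it exactly inverts what `preprocess_image` does. The NumPy sketch below checks that round trip. `caffe_style_preprocess` is my own re-implementation of what, as far as I know, `vgg16.preprocess_input` does in its default "caffe" mode (flip RGB to BGR, subtract the per-channel ImageNet mean); it is not the Keras function itself. One deliberate deviation from the code: `np.rint` is applied before the `uint8` cast, because float round-off can turn 255 into 254.99999999999997, which a plain cast truncates to 254.

```python
import numpy as np

# ImageNet mean pixel in BGR order (same constants as deprocess_image)
MEAN_BGR = np.array([103.939, 116.779, 123.68])


def caffe_style_preprocess(rgb):
    """Assumed equivalent of vgg16.preprocess_input ('caffe' mode):
    RGB -> BGR, then subtract the per-channel mean."""
    return rgb[..., ::-1].astype('float64') - MEAN_BGR


def deprocess_image(x, target_size):
    """deprocess_image from the code, plus np.rint before the uint8 cast."""
    x = x.reshape((target_size[0], target_size[1], 3)).copy()
    x += MEAN_BGR      # undo zero-centering (BGR order)
    x = x[:, :, ::-1]  # BGR -> RGB
    return np.clip(np.rint(x), 0, 255).astype('uint8')


rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3)).astype('uint8')
batch = caffe_style_preprocess(rgb)[np.newaxis]      # (1, 4, 6, 3), like preprocess_image
restored = deprocess_image(batch.flatten(), (4, 6))  # L-BFGS works on flat vectors
print(np.array_equal(restored, rgb))  # True
```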
I drew the architecture of the code as a diagram:
From the formulas above, we already know:
$$Loss_{total} = L_{style+content}(\text{random image})$$
Since every value other than the random image is already known, the gradient with respect to the random image can be computed.
So, starting from the earlier expression:
$$Loss_{total} = L_{style+content}(\text{random image})$$
we compute the gradient:
$$Gradient = \frac{\partial Loss_{total}}{\partial\, \text{random image}}$$
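In the real code, `K.gradients` applies the chain rule back through the frozen VGG16 network automatically. The toy below makes that step concrete: a hypothetical frozen linear "feature extractor" `W` replaces VGG16, the loss is a content-style squared difference of features, and the analytic gradient with respect to the input is checked against central finite differences. All arrays are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))     # hypothetical frozen linear "feature extractor" (stand-in for VGG16)
x_content = rng.normal(size=4)  # toy "content image" (flattened)
target = W @ x_content          # its fixed feature map


def loss(x):
    """Content-style loss on the features of x."""
    return np.sum((W @ x - target) ** 2)


def grad(x):
    # Chain rule: dL/dx = W^T . dL/dF, with dL/dF = 2 (W x - target)
    return 2.0 * W.T @ (W @ x - target)


def numerical_grad(f, x, eps=1e-6):
    """Central finite differences, one input coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g


x = rng.normal(size=4)  # the "random image"
print(np.allclose(grad(x), numerical_grad(loss, x), atol=1e-5))  # True
```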
So we can define a new expression:
$$Loss_{total},\ Gradient = \texttt{K.function}(\text{random image})$$
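This is exactly the function `f_outputs` built with `K.function` in the code. The optimization loop then looks like this (pseudocode):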
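```python
# Pseudocode: init(), learning_rate and save_image are placeholders
random_image = init()  # random noise
for i in range(iterations):
    loss, gradient = f_outputs([random_image])
    # Plain gradient descent step; the actual code uses L-BFGS (fmin_l_bfgs_b) instead
    random_image = random_image - learning_rate * gradient
    print(loss)
    save_image(random_image)
```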
The loss value is mainly there so we can monitor how the optimization is progressing.
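The `Evaluator` plus `fmin_l_bfgs_b` pattern from the code can be tried without Keras. Below, a toy quadratic loss (an assumption, standing in for `f_outputs`) is minimized with the same `fmin_l_bfgs_b(..., fprime=..., maxfun=20)` call the code uses. One deliberate difference from the code's `Evaluator`: this version caches on the value of `x` instead of asserting a strict loss-then-grads call order, so it does not depend on how SciPy sequences the two callbacks.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

target = np.array([1.0, -2.0, 3.0])  # illustrative optimum of the toy loss


def eval_loss_and_grads(x):
    """Toy stand-in for f_outputs: loss and gradient from ONE evaluation."""
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff


class Evaluator:
    """fmin_l_bfgs_b wants the loss and the gradient as two separate callbacks.
    Computing them separately would run the expensive pass twice, so this
    cache evaluates once per distinct x and serves both callbacks from it."""

    def __init__(self):
        self.x = None
        self.loss_value = None
        self.grad_values = None
        self.calls = 0  # number of real (expensive) evaluations

    def _update(self, x):
        if self.x is None or not np.array_equal(x, self.x):
            self.x = np.copy(x)
            self.loss_value, self.grad_values = eval_loss_and_grads(self.x)
            self.calls += 1

    def loss(self, x):
        self._update(x)
        return self.loss_value

    def grads(self, x):
        self._update(x)
        return np.copy(self.grad_values)


evaluator = Evaluator()
x = np.zeros(3)  # stands in for the random initial image
for i in range(3):
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    print(i, min_val, evaluator.calls)
```

As in the code, each outer iteration restarts L-BFGS from the previous result, which is also a natural point to save an intermediate image.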
The figure below shows the result:
My own photo combined with a Chinese landscape painting. (The result is not bad, right?)