Neural Style Transfer: A Detailed Walkthrough

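Neural Style Transfer (NST) is one of the more entertaining applications of deep learning. It merges a "content" image (C) with a "style" image (S) to produce a "generated" image (G) that combines the content of C with the style of S. This post explains how the algorithm works and walks through a TensorFlow implementation.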

How the Algorithm Works

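The basic idea is to extract content features from the content image and style features from the style image, then recombine the two into a target image. The target image is reconstructed iteratively, driven by how far the generated image is from the content image and from the style image.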

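The content loss is the (squared) Euclidean distance between the features a VGG network extracts from the two images. The style loss is the (squared) Euclidean distance between the Gram matrices of the VGG features of the two images. The exact definitions of the content features, the style features, and the loss functions are given later in this post.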

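One point deserves attention: the parameters being optimized are not the kernel weights of the model. They are the pixels of the input image x, which starts as the content image plus noise. We differentiate the content and style losses with respect to x and update its pixels directly, so the initial image changes step by step until it becomes the style-transferred result.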
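
The point about optimizing the input rather than the weights can be illustrated with a toy problem. The sketch below is not the VGG pipeline: it assumes a frozen random linear map `W` as a stand-in feature extractor and uses plain gradient descent on a feature-matching loss. All names here are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "network": a fixed linear feature extractor standing in for VGG (assumption).
W = rng.normal(size=(8, 16))
W_before = W.copy()

content = rng.normal(size=16)            # stand-in for the content image pixels
target_features = W @ content            # features the generated image should match

# Initialize x as the content image plus noise.
x = content + rng.normal(size=16)

def loss(x):
    diff = W @ x - target_features
    return float(np.sum(diff ** 2))

initial_loss = loss(x)
learning_rate = 0.005
for _ in range(200):
    # Gradient of the loss with respect to the input x; W is never updated.
    grad_x = 2.0 * W.T @ (W @ x - target_features)
    x = x - learning_rate * grad_x
final_loss = loss(x)

print("loss before:", initial_loss, "after:", final_loss)
```

Only `x` changes during the loop, which mirrors how NST leaves the pretrained weights untouched and moves the image pixels instead.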

Transfer Learning

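NST builds on a pretrained model; here we use VGG19.
In Python the model is stored as a dictionary: each key is a variable name, and the corresponding value is the tensor holding that variable. To run an image through the network, you assign it to the model's input. In TensorFlow this is done with tf.assign; the code later in this post calls model['input'].assign(image).
To read the activations of a particular layer at run time (for example conv4_2), run a TensorFlow session on that layer's tensor, as in sess.run(model['conv4_2']).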

Neural Style Transfer

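The NST algorithm has three steps: build the content cost $J_{content}(C,G)$, build the style cost $J_{style}(S,G)$, and combine the two into the total cost $J(G)$.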
Computing the content cost
How do we make sure the generated image G matches the content of C? This is one of the key questions to understand.

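In a convolutional network, shallow layers tend to detect low-level features, while deeper layers extract higher-level ones. We want G to have content similar to C, so we pick the activations of some layer to represent the content of an image. In practice, a layer in the middle of the network (neither too shallow nor too deep) gives the most pleasing results.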

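Suppose we have picked one particular hidden layer. Feed C into the VGG network and run forward propagation; let $a^{(C)}$ be the activations of the chosen layer, a tensor of shape $n_H \times n_W \times n_C$. Do the same with G as input, and let $a^{(G)}$ be the corresponding activations. The content cost is defined as:

$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C} \sum_{\text{all entries}} \left(a^{(C)} - a^{(G)}\right)^2 \tag{1}$$

Here $n_H$, $n_W$, and $n_C$ are the height, width, and number of channels of the chosen hidden layer. Each channel holds the output of one filter, and unrolling the 3D activation volume into a 2D matrix makes the later steps easier to follow.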

Exercise: compute the content cost with TensorFlow.

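import tensorflow as tf  # the code in this post uses the TensorFlow 1.x API (tf.Session, tf.reset_default_graph)

# GRADED FUNCTION: compute_content_cost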

def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C 
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns: 
    J_content -- scalar that you compute using equation 1 above.
    """

    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.reshape(a_C,[n_H*n_W,n_C])
    a_G_unrolled = tf.reshape(a_G,[n_H*n_W,n_C])

    # compute the cost with tensorflow (≈1 line)
    J_content = tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled,a_G_unrolled)))/(4*n_H*n_W*n_C)
    ### END CODE HERE ###

    return J_content

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    a_C = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_content = compute_content_cost(a_C, a_G)
    print("J_content = " + str(J_content.eval()))

J_content = 6.76559

Points to remember:
The content cost takes the activations of one hidden layer of the network and measures how different $a^{(C)}$ and $a^{(G)}$ are.
Minimizing the content cost makes the content of G similar to that of C.
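
As a cross-check that does not need a TensorFlow 1.x session, equation (1) can be sketched in NumPy. The function name `content_cost_np` is hypothetical; the formula is the one used in compute_content_cost above.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """Content cost of equation (1) for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

rng = np.random.default_rng(1)
a_C = rng.normal(loc=1.0, scale=4.0, size=(1, 4, 4, 3))
a_G = rng.normal(loc=1.0, scale=4.0, size=(1, 4, 4, 3))

cost_same = content_cost_np(a_C, a_C)        # identical activations
cost_different = content_cost_np(a_C, a_G)   # unrelated activations

# Hand-checkable case: four entries that each differ by 1, divided by 4*2*2*1.
cost_tiny = content_cost_np(np.ones((1, 2, 2, 1)), np.zeros((1, 2, 2, 1)))

print(cost_same, cost_different, cost_tiny)
```

The cost is zero when the two activation tensors agree and grows as they drift apart, which is why minimizing it pulls the content of G toward C.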

Computing the style cost
Next, let us define the "style" cost function $J_{style}(S,G)$.

Style matrix
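The style matrix is also called a Gram matrix. For a set of vectors $(v_1, \dots, v_n)$, its entries are the dot products $G_{ij} = v_i^T v_j$.
$G_{ij}$ measures how similar $v_i$ is to $v_j$: if they are highly similar, their dot product is large, and so is $G_{ij}$.
In NST, the style matrix is computed by multiplying the unrolled filter matrix by its transpose. The result is an $n_C \times n_C$ matrix, where $n_C$ is the number of filters. $G_{ij}$ measures how similar the activations of filter i are to those of filter j. The diagonal entry $G_{ii}$ measures how active filter i is: if filter i detects vertical texture, for instance, $G_{ii}$ reflects how common vertical texture is in the image.

Together, the $G_{ii}$ and $G_{ij}$ entries capture the style of an image.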
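
A small worked example may make the entries concrete. The sketch below uses a made-up 3-filter activation matrix (the numbers are illustrative only) and computes its Gram matrix with NumPy.

```python
import numpy as np

# Each row is one filter's activations, unrolled over n_H*n_W = 4 positions.
A = np.array([
    [1.0, 0.0, 1.0, 0.0],   # filter 0
    [2.0, 0.0, 2.0, 0.0],   # filter 1: fires at the same positions as filter 0
    [0.0, 3.0, 0.0, 0.0],   # filter 2: fires only where the other two are silent
])

G = A @ A.T   # shape (n_C, n_C) = (3, 3)
print(G)
```

Filters 0 and 1 fire together, so their off-diagonal entry is large (4), while filter 2 never overlaps with them and its off-diagonal entries are 0. The diagonal (2, 8, 9) reflects how strongly each filter fires overall.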

Exercise:

# GRADED FUNCTION: gram_matrix

def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """

    ### START CODE HERE ### (≈1 line)
    GA = tf.matmul(A,tf.transpose(A))
    ### END CODE HERE ###

    return GA

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    A = tf.random_normal([3, 2*1], mean=1, stddev=4)
    GA = gram_matrix(A)

    print("GA = " + str(GA.eval()))

Style cost
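$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times n_C^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left(G^{(S)}_{ij} - G^{(G)}_{ij}\right)^2 \tag{2}$$

Here $l$ denotes the particular hidden layer, and $G^{(S)}$ and $G^{(G)}$ are the Gram matrices of the "style" image and the "generated" image, respectively, computed from that layer's activations.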

# GRADED FUNCTION: compute_layer_style_cost

def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S 
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns: 
    J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
    """

    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll the activations to shape (n_H*n_W, n_C); they are transposed to (n_C, n_H*n_W) below (≈2 lines)
    a_S = tf.reshape(a_S,[n_H*n_W,n_C])
    a_G = tf.reshape(a_G,[n_H*n_W,n_C])

    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(tf.transpose(a_S))
    GG = gram_matrix(tf.transpose(a_G))

    # Computing the loss (≈1 line)
    J_style_layer=tf.reduce_sum(tf.square(tf.subtract(GS,GG)))/(4*tf.square(tf.to_float(n_C))*tf.square(tf.to_float(n_H*n_W)))

    ### END CODE HERE ###

    return J_style_layer

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    a_S = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_style_layer = compute_layer_style_cost(a_S, a_G)

    print("J_style_layer = " + str(J_style_layer.eval()))

J_style_layer = 9.19028
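
Equation (2) can also be sketched in NumPy, which makes one property of the Gram matrix easy to see: it sums over spatial positions, so it ignores where in the image a feature occurs. The function name `layer_style_cost_np` is hypothetical, and the random activations are placeholders rather than real VGG outputs.

```python
import numpy as np

def layer_style_cost_np(a_S, a_G):
    """Style cost of equation (2) for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_C, n_H*n_W) so that each row holds one filter's activations.
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    GS = S @ S.T
    GG = G @ G.T
    return np.sum((GS - GG) ** 2) / (4 * n_C**2 * (n_H * n_W) ** 2)

rng = np.random.default_rng(0)
a_S = rng.normal(loc=1.0, scale=4.0, size=(1, 4, 4, 3))
a_G = rng.normal(loc=1.0, scale=4.0, size=(1, 4, 4, 3))

# Shuffle the 16 spatial positions of a_S while keeping each channel vector intact.
perm = rng.permutation(16)
a_shuffled = a_S.reshape(16, 3)[perm].reshape(1, 4, 4, 3)

cost_random = layer_style_cost_np(a_S, a_G)
cost_shuffled = layer_style_cost_np(a_S, a_shuffled)
print(cost_random, cost_shuffled)
```

The shuffled tensor has a very different layout from `a_S`, yet its style cost is zero up to rounding, because both share the same Gram matrix.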

Style Weights
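So far we have captured the style from a single layer. Combining the style costs of several layers gives better results. Feel free to change the layers and their weights to see how the output changes.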

STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

def compute_style_cost(model, STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers

    Arguments:
    model -- our tensorflow model
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them

    Returns: 
    J_style -- tensor representing a scalar value, style cost defined above by equation (2)
    """

    # initialize the overall style cost
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:

        # Select the output tensor of the currently selected layer
        out = model[layer_name]

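        # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out.
        # This relies on the global `sess` (created further down with tf.InteractiveSession()) and assumes
        # the style image has already been assigned to model['input'].
        a_S = sess.run(out)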

        # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name] 
        # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
        # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
        a_G = out

        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        # Add coeff * J_style_layer of this layer to overall style cost
        J_style += coeff * J_style_layer

    return J_style

Minimizing the style cost makes G follow the style of S.
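
The weighted sum over layers can be sketched without TensorFlow. In the example below the layer names and the 0.2 weights come from STYLE_LAYERS above, while the activation shapes and random values are hypothetical placeholders (real VGG19 activations are far larger).

```python
import numpy as np

def layer_style_cost_np(a_S, a_G):
    """Per-layer style cost, following equation (2)."""
    _, n_H, n_W, n_C = a_G.shape
    F_S = a_S.reshape(n_H * n_W, n_C)
    F_G = a_G.reshape(n_H * n_W, n_C)
    GS = F_S.T @ F_S          # (n_C, n_C) Gram matrix of the style activations
    GG = F_G.T @ F_G          # (n_C, n_C) Gram matrix of the generated activations
    return np.sum((GS - GG) ** 2) / (4 * n_C**2 * (n_H * n_W) ** 2)

STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

# Hypothetical activation shapes, chosen small so the example runs instantly.
shapes = {
    'conv1_1': (1, 8, 8, 4),
    'conv2_1': (1, 4, 4, 8),
    'conv3_1': (1, 2, 2, 16),
    'conv4_1': (1, 2, 2, 16),
    'conv5_1': (1, 1, 1, 16),
}

rng = np.random.default_rng(0)
acts_S = {name: rng.normal(size=shape) for name, shape in shapes.items()}
acts_G = {name: rng.normal(size=shape) for name, shape in shapes.items()}

per_layer = {name: layer_style_cost_np(acts_S[name], acts_G[name])
             for name, _ in STYLE_LAYERS}
J_style = sum(coeff * per_layer[name] for name, coeff in STYLE_LAYERS)
print(per_layer, J_style)
```

With five equal weights of 0.2, the total is simply the mean of the per-layer costs; unequal weights would shift the emphasis toward finer or coarser textures.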

Defining the total cost to optimize

The final cost function is defined as:

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$

Exercise: Implement the total cost function which includes both the content cost and the style cost.

# GRADED FUNCTION: total_cost

def total_cost(J_content, J_style, alpha = 10.0, beta = 40.0):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """

    ### START CODE HERE ### (≈1 line)
    J = alpha*J_content+beta*J_style
    ### END CODE HERE ###

    return J
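
To get a feel for the weighting, here is the arithmetic with the sample values printed by the two test snippets earlier in this post (J_content = 6.76559 and J_style_layer = 9.19028). Those numbers come from random tensors, so this is only an illustration of the formula, not a realistic cost.

```python
# Sample values printed earlier in this post (from random test tensors).
J_content = 6.76559
J_style = 9.19028

alpha, beta = 10.0, 40.0
J = alpha * J_content + beta * J_style

content_part = alpha * J_content   # contribution of the content term
style_part = beta * J_style        # contribution of the style term
print("J =", J, "style share =", style_part / J)
```

With these defaults the style term contributes roughly 84% of the total, so raising alpha relative to beta would keep G closer to the content image.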

Solving the optimization problem

Finally, we put everything together to implement NST.
The rest of this section goes through each step of the program in detail.

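We have already defined the total cost J(G). Now we set up TensorFlow to optimize it with respect to G. To do so, the program must reset the graph and use an "Interactive Session".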

# Reset the graph
tf.reset_default_graph()

# Start interactive session
sess = tf.InteractiveSession()

Load, reshape, and normalize the content and style images:

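# Note: scipy.misc.imread was removed in SciPy 1.2; on newer installs, imageio.imread is a common replacement.
content_image = scipy.misc.imread("images/louvre_small.jpg")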
content_image = reshape_and_normalize_image(content_image)
style_image = scipy.misc.imread("images/monet.jpg")
style_image = reshape_and_normalize_image(style_image)
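
The helper reshape_and_normalize_image is not shown in this post. A minimal sketch of what such a helper might do, assuming it adds a batch axis and subtracts the per-channel RGB means commonly used with VGG, is given below. The function name, the mean values, and the dummy image are assumptions for illustration, not the author's code.

```python
import numpy as np

# Per-channel RGB means commonly used with VGG models (assumed; check your own helper).
VGG_MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def reshape_and_normalize_image_sketch(image):
    """Hypothetical stand-in: (H, W, 3) image -> mean-centered (1, H, W, 3) array."""
    image = np.asarray(image, dtype=np.float64)
    batched = image[np.newaxis, ...]    # add the batch axis expected by the model input
    return batched - VGG_MEANS          # center each channel around zero

# Dummy gray image in place of a real photo.
dummy = np.full((6, 8, 3), 128, dtype=np.uint8)
normalized = reshape_and_normalize_image_sketch(dummy)
print(normalized.shape)
```

The real helper may also resize or crop the image so that it matches the input size the loaded VGG graph expects.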

Now we initialize the generated image G as a noisy version of the content image, which helps G match the content of C more quickly.

generated_image = generate_noise_image(content_image)
#imshow(generated_image[0])
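
The helper generate_noise_image is also not shown here. One plausible sketch, assuming it blends uniform noise with the content image, follows. The function name, the `noise_ratio` default of 0.6, and the noise range of ±20 are assumptions made for illustration.

```python
import numpy as np

def generate_noise_image_sketch(content_image, noise_ratio=0.6, seed=None):
    """Hypothetical stand-in: blend uniform noise with the content image."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-20.0, 20.0, size=content_image.shape)
    # Weighted average: noise_ratio of noise, the remainder from the content image.
    return noise * noise_ratio + content_image * (1.0 - noise_ratio)

content = np.zeros((1, 6, 8, 3))           # dummy mean-centered content image
noisy = generate_noise_image_sketch(content, seed=0)
unchanged = generate_noise_image_sketch(content + 5.0, noise_ratio=0.0)
print(noisy.shape, float(np.abs(noisy).max()))
```

A lower `noise_ratio` starts G closer to the content image, while a higher one leaves more room for the style term to reshape it.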

Load the VGG model:

model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

We use conv4_2 to compute the content cost. The steps are:

# Assign the content image to be the input of the VGG model.  
sess.run(model['input'].assign(content_image))

# Select the output tensor of layer conv4_2
out = model['conv4_2']

# Set a_C to be the hidden layer activation from the layer we have selected
a_C = sess.run(out)

# Set a_G to be the hidden layer activation from same layer. Here, a_G references model['conv4_2'] 
# and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
# when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
a_G = out

# Compute the content cost
J_content = compute_content_cost(a_C, a_G)

Note: at this point a_G is a tensor that has not been evaluated yet. It is evaluated and updated on each iteration, when we run the TensorFlow graph.

# Assign the input of the model to be the "style" image 
sess.run(model['input'].assign(style_image))

# Compute the style cost
J_style = compute_style_cost(model, STYLE_LAYERS)

Exercise: Now that you have J_content and J_style, compute the total cost J by calling total_cost(). Use alpha = 10 and beta = 40.

### START CODE HERE ### (1 line)
J = total_cost(J_content,J_style,alpha=10,beta=40)
### END CODE HERE ###

# define optimizer (1 line)
optimizer = tf.train.AdamOptimizer(2.0)

# define train_step (1 line)
train_step = optimizer.minimize(J)
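
A learning rate of 2.0 looks large, but Adam normalizes each step by the gradient magnitude. The sketch below implements the textbook Adam update in NumPy, with the default beta1, beta2, and epsilon values of tf.train.AdamOptimizer, and applies one step to a few made-up pixel values. TensorFlow's implementation differs in minor details such as how epsilon enters, so treat this as an approximation.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=2.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    x_new = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x_new, m, v

pixels = np.array([10.0, -3.0, 120.0])            # made-up pixel values
grad = np.array([0.5, -40.0, 1e-3])               # made-up gradients of very different size
m = np.zeros_like(pixels)
v = np.zeros_like(pixels)

new_pixels, m, v = adam_step(pixels, grad, m, v, t=1)
step = new_pixels - pixels
print(step)
```

On the first step every pixel moves by about 2.0 in the direction opposite to its gradient, regardless of how large the gradient is. On a pixel scale of roughly 0 to 255 that is a modest change per iteration.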

Exercise: the model_nn() function initializes the variables, assigns G as the input of the VGG model, and runs train_step.

def model_nn(sess, input_image, num_iterations = 200):

    # Initialize global variables (you need to run the session on the initializer)
    ### START CODE HERE ### (1 line)
    sess.run(tf.global_variables_initializer())
    ### END CODE HERE ###

    # Run the noisy input image (initial generated image) through the model. Use assign().
    ### START CODE HERE ### (1 line)
    sess.run(model["input"].assign(input_image))
    ### END CODE HERE ###

    for i in range(num_iterations):

        # Run the session on the train_step to minimize the total cost
        ### START CODE HERE ### (1 line)
        sess.run(train_step)
        ### END CODE HERE ###

        # Compute the generated image by running the session on the current model['input']
        ### START CODE HERE ### (1 line)
        generated_image = sess.run(model["input"])
        ### END CODE HERE ###

        # Print every 20 iteration.
        if i%20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration " + str(i) + " :")
            print("total cost = " + str(Jt))
            print("content cost = " + str(Jc))
            print("style cost = " + str(Js))

            # save current generated image in the "/output" directory
            save_image("output/" + str(i) + ".png", generated_image)

    # save last generated image
    save_image('output/generated_image.jpg', generated_image)

    return generated_image