
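Many readers will have heard of Prisma, the well-known app that turns an ordinary photo into images in a variety of artistic styles. This post explains the algorithm behind Prisma, which comes from a paper published at CVPR 2016:

"Image Style Transfer Using Convolutional Neural Networks"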



In short, the method relies on a pre-trained convolutional neural network, VGG-19, which has already been trained on ImageNet.

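We are given a style image $a$ and an ordinary (content) image $p$. Passing the style image through VGG-19 produces many feature maps at each convolutional layer; these form a collection $A$. Likewise, passing the content image $p$ through VGG-19 produces feature maps that form a collection $P$. We then generate a random noise image $x$. Passing $x$ through VGG-19 also produces feature maps, which form the collections $G$ and $F$, corresponding to $A$ and $P$ respectively. The optimization adjusts $x$ so that it ends up preserving the content of the image $p$ while taking on the style of the style image $a$.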

content representation

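Before building the objective function, we need a few definitions. In a CNN, suppose layer $l$ has $N_l$ filters. It then produces $N_l$ feature maps, each of size $M_l$, where $M_l$ is the product of the feature map's height and width. The feature maps of layer $l$ can therefore be written as a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the $i$-th filter at position $j$.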
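As a small illustration of this notation, the sketch below reshapes one layer's activation into the matrix $F^l$. The `(H, W, N_l)` array layout and the helper name `to_feature_matrix` are assumptions made for the example, not part of the original post.

```python
import numpy as np

def to_feature_matrix(activation):
    """Reshape an (H, W, N_l) activation into F^l with shape (N_l, M_l)."""
    height, width, n_filters = activation.shape
    # Flatten the spatial grid into M_l = H * W positions, one row per filter.
    return activation.reshape(height * width, n_filters).T

# Hypothetical activation: a 4 x 5 spatial grid with 3 filters.
activation = np.arange(4 * 5 * 3, dtype=np.float32).reshape(4, 5, 3)
F = to_feature_matrix(activation)
print(F.shape)
```

Here position $j$ runs over the flattened grid, so the pixel at row 1, column 2 of a width-5 map becomes $j = 1 \cdot 5 + 2 = 7$.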

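This gives the content cost function, where $P^l$ and $F^l$ are the layer-$l$ feature maps of the content image $p$ and of the generated image $x$:

$$L_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$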
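A minimal NumPy sketch of the content cost, i.e. half the sum of squared differences between the feature matrices of the generated image and the content image. The function name and the tiny example matrices are made up for illustration.

```python
import numpy as np

def content_loss(F, P):
    """Half the sum of squared differences between two (N_l, M_l) matrices."""
    return 0.5 * np.sum((F - P) ** 2)

# Hypothetical 2 x 2 feature matrices.
P = np.array([[1.0, 2.0], [3.0, 4.0]])
F = np.array([[1.0, 0.0], [3.0, 6.0]])
loss = content_loss(F, P)
print(loss)
```

Note that the TensorFlow listings in this post scale the same sum by `1 / (2 * N * M)` instead of `1 / 2`, which only changes the effective weight of the content term.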


style representation

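To build a style representation, we first use a Gram matrix to describe the relationships between the feature maps within each layer: $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product of the vectorized feature maps $i$ and $j$:

$$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}$$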
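The Gram matrix is just the product of the feature matrix with its own transpose. A minimal NumPy sketch, assuming features are stored as an `(N_l, M_l)` matrix (the example values are made up):

```python
import numpy as np

def gram_matrix(F):
    """G[i, j] is the inner product of rows i and j of the (N_l, M_l) matrix F."""
    return F @ F.T

# Hypothetical layer with 2 filters and 3 spatial positions.
F = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
G = gram_matrix(F)
print(G)
```

The TensorFlow listings in this post store the features transposed, as `(M, N)`, so they compute `transpose(x1) @ x1`; the resulting $N_l \times N_l$ matrix is the same.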


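With the Gram matrix we can define a style cost for each layer, where $A^l$ and $G^l$ are the Gram matrices of the style image $a$ and of the generated image $x$, and then sum over the layers with weights $w_l$:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

$$L_{style}(a, x) = \sum_{l} w_l E_l$$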
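The per-layer style cost and its weighted sum over layers can be sketched in NumPy as follows. The function names and the toy inputs are assumptions for illustration; features are again taken to be `(N_l, M_l)` matrices.

```python
import numpy as np

def gram_matrix(F):
    return F @ F.T

def style_layer_loss(F_generated, F_style):
    """E_l = sum((G - A)^2) / (4 * N^2 * M^2) for (N, M) feature matrices."""
    n_filters, n_positions = F_style.shape
    G = gram_matrix(F_generated)
    A = gram_matrix(F_style)
    return np.sum((G - A) ** 2) / (4.0 * n_filters ** 2 * n_positions ** 2)

def style_loss(generated_layers, style_layers, layer_weights):
    """Weighted sum of the per-layer style losses."""
    return sum(w * style_layer_loss(Fg, Fs)
               for w, Fg, Fs in zip(layer_weights, generated_layers, style_layers))

# Hypothetical single layer with 2 filters and 2 positions.
F_style = np.array([[1.0, 0.0], [0.0, 1.0]])
F_gen = np.array([[2.0, 0.0], [0.0, 1.0]])
E = style_layer_loss(F_gen, F_style)
print(E)
```

Because the Gram matrix sums over positions, reordering the spatial positions of a feature matrix leaves the style cost unchanged, which is why this term captures texture rather than layout.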




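Finally, combining the content cost and the style cost gives the total objective:

$$L_{total}(p, a, x) = \alpha L_{content}(p, x) + \beta L_{style}(a, x)$$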


$\alpha$ and $\beta$ are weights. $L_{content}$ is built from the conv4_2 layer of VGG-19, while $L_{style}$ uses conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1.
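To make the optimization concrete, here is a toy sketch that minimizes $\alpha L_{content} + \beta L_{style}$ by plain gradient descent. It is not the real pipeline: as a simplifying assumption the "network" is the identity, so the features are optimized directly, and the random matrices merely stand in for VGG-19 activations. The gradients are the analytic ones, $F - P$ for the content term and $(G - A)F / (N^2 M^2)$ for the style term.

```python
import numpy as np

ALPHA, BETA = 1.0, 500.0  # same weights as alpha and beta in the post's code

def gram(F):
    return F @ F.T

def total_loss_and_grad(F, P, A):
    """Total loss and its gradient with respect to the (N, M) matrix F."""
    n, m = F.shape
    diff_content = F - P
    diff_style = gram(F) - A
    content = 0.5 * np.sum(diff_content ** 2)
    style = np.sum(diff_style ** 2) / (4.0 * n ** 2 * m ** 2)
    grad_content = diff_content
    grad_style = diff_style @ F / (n ** 2 * m ** 2)
    return (ALPHA * content + BETA * style,
            ALPHA * grad_content + BETA * grad_style)

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 16))        # stand-in for the content features
A = gram(rng.normal(size=(3, 16)))  # stand-in for the style Gram matrix
F = rng.normal(size=(3, 16))        # the "noise image" being optimized

losses = []
for _ in range(200):
    loss, grad = total_loss_and_grad(F, P, A)
    losses.append(loss)
    F = F - 0.01 * grad             # plain gradient descent step

print(losses[0], losses[-1])
```

In the real method the gradient is backpropagated through VGG-19 down to the pixels of $x$, and the network weights stay fixed throughout.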

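We focus on a TensorFlow implementation. To run the program below you need to install TensorFlow, NumPy and SciPy, and download the VGG-19 model. The code uses the TensorFlow 1.x graph API (`tf.Session`).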


import os
import sys
import numpy as np
import scipy.io
import scipy.misc
import tensorflow as tf

# Output folder for the images.
OUTPUT_DIR = 'output/'
# Style image to use.
STYLE_IMAGE = '/images/ocean.jpg'
# Content image to use.
CONTENT_IMAGE = '/images/Taipei101.jpg'
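# Image dimensions constants.
# NOTE: the original values were lost from this listing; the ones below are
# assumptions. load_image() does not resize, so both input images must
# already be IMAGE_HEIGHT x IMAGE_WIDTH.
IMAGE_WIDTH = 800
IMAGE_HEIGHT = 600
COLOR_CHANNELS = 3

# Algorithm constants
# Ratio of random noise to content image in the initial image (assumed value).
NOISE_RATIO = 0.6
# Number of iterations (assumed value).
ITERATIONS = 1000
# Weights of the content loss and the style loss.
alpha = 1
beta = 500
# Path to the VGG-19 model, and the per-channel mean to subtract.
VGG_Model = 'Downloads/imagenet-vgg-verydeep-19.mat'
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))
# Convolutional layers to use, each with its weight.
CONTENT_LAYERS = [('conv4_2', 1.)]
STYLE_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2), ('conv4_1', 0.2), ('conv5_1', 0.2)]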

# Generate a random noise image and blend it with the content image at a given ratio.
def generate_noise_image(content_image, noise_ratio = NOISE_RATIO):
    """Returns a noise image intermixed with the content image at a certain ratio."""
    noise_image = np.random.uniform(
            -20, 20,
            (1, IMAGE_HEIGHT, IMAGE_WIDTH, COLOR_CHANNELS)).astype('float32')
    # White noise image from the content representation. Take a weighted average
    # of the values
    img = noise_image * noise_ratio + content_image * (1 - noise_ratio)
    return img

def load_image(path):
    image = scipy.misc.imread(path)
    # Resize the image for convnet input, there is no change but just
    # add an extra dimension.
    image = np.reshape(image, ((1,) + image.shape))
    # Input to the VGG net expects the mean to be subtracted.
    image = image - MEAN_VALUES
    return image

def save_image(path, image):
    # Output should add back the mean.
    image = image + MEAN_VALUES
    # Get rid of the first useless dimension, what remains is the image.
    image = image[0]
    image = np.clip(image, 0, 255).astype('uint8')
    scipy.misc.imsave(path, image)

def build_net(ntype, nin, nwb=None):
    if ntype == 'conv':
        return tf.nn.relu(tf.nn.conv2d(nin, nwb[0], strides=[1, 1, 1, 1], padding='SAME') + nwb[1])
    elif ntype == 'pool':
        return tf.nn.avg_pool(nin, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

def get_weight_bias(vgg_layers, i):
    weights = vgg_layers[i][0][0][2][0][0]
    weights = tf.constant(weights)
    bias = vgg_layers[i][0][0][2][0][1]
    bias = tf.constant(np.reshape(bias, (bias.size)))
    return weights, bias

def build_vgg19(path):
    net = {}
    vgg_rawnet = scipy.io.loadmat(path)
    vgg_layers = vgg_rawnet['layers'][0]
    net['input'] = tf.Variable(np.zeros((1, IMAGE_HEIGHT, IMAGE_WIDTH, 3)).astype('float32'))
    net['conv1_1'] = build_net('conv', net['input'], get_weight_bias(vgg_layers, 0))
    net['conv1_2'] = build_net('conv', net['conv1_1'], get_weight_bias(vgg_layers, 2))
    net['pool1'] = build_net('pool', net['conv1_2'])
    net['conv2_1'] = build_net('conv', net['pool1'], get_weight_bias(vgg_layers, 5))
    net['conv2_2'] = build_net('conv', net['conv2_1'], get_weight_bias(vgg_layers, 7))
    net['pool2'] = build_net('pool', net['conv2_2'])
    net['conv3_1'] = build_net('conv', net['pool2'], get_weight_bias(vgg_layers, 10))
    net['conv3_2'] = build_net('conv', net['conv3_1'], get_weight_bias(vgg_layers, 12))
    net['conv3_3'] = build_net('conv', net['conv3_2'], get_weight_bias(vgg_layers, 14))
    net['conv3_4'] = build_net('conv', net['conv3_3'], get_weight_bias(vgg_layers, 16))
    net['pool3'] = build_net('pool', net['conv3_4'])
    net['conv4_1'] = build_net('conv', net['pool3'], get_weight_bias(vgg_layers, 19))
    net['conv4_2'] = build_net('conv', net['conv4_1'], get_weight_bias(vgg_layers, 21))
    net['conv4_3'] = build_net('conv', net['conv4_2'], get_weight_bias(vgg_layers, 23))
    net['conv4_4'] = build_net('conv', net['conv4_3'], get_weight_bias(vgg_layers, 25))
    net['pool4'] = build_net('pool', net['conv4_4'])
    net['conv5_1'] = build_net('conv', net['pool4'], get_weight_bias(vgg_layers, 28))
    net['conv5_2'] = build_net('conv', net['conv5_1'], get_weight_bias(vgg_layers, 30))
    net['conv5_3'] = build_net('conv', net['conv5_2'], get_weight_bias(vgg_layers, 32))
    net['conv5_4'] = build_net('conv', net['conv5_3'], get_weight_bias(vgg_layers, 34))
    net['pool5'] = build_net('pool', net['conv5_4'])
    return net

def content_layer_loss(p, x):

    M = p.shape[1] * p.shape[2]
    N = p.shape[3]
    loss = (1. / (2 * N * M)) * tf.reduce_sum(tf.pow((x - p), 2))
    return loss

def content_loss_func(sess, net):

    layers = CONTENT_LAYERS
    total_content_loss = 0.0
    for layer_name, weight in layers:
        p = sess.run(net[layer_name])
        x = net[layer_name]
        total_content_loss += content_layer_loss(p, x)*weight

    total_content_loss /= float(len(layers))
    return total_content_loss

def gram_matrix(x, area, depth):

    x1 = tf.reshape(x, (area, depth))
    g = tf.matmul(tf.transpose(x1), x1)
    return g

def style_layer_loss(a, x):

    M = a.shape[1] * a.shape[2]
    N = a.shape[3]
    A = gram_matrix(a, M, N)
    G = gram_matrix(x, M, N)
    loss = (1. / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow((G - A), 2))
    return loss

def style_loss_func(sess, net):

    layers = STYLE_LAYERS
    total_style_loss = 0.0
    for layer_name, weight in layers:
        a = sess.run(net[layer_name])
        x = net[layer_name]
        total_style_loss += style_layer_loss(a, x) * weight
    total_style_loss /= float(len(layers))
    return total_style_loss

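def main():
    net = build_vgg19(VGG_Model)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    content_img = load_image(CONTENT_IMAGE)
    style_img = load_image(STYLE_IMAGE)

    # Feed the content image so content_loss_func can record its features.
    sess.run(net['input'].assign(content_img))
    cost_content = content_loss_func(sess, net)

    # Feed the style image so style_loss_func can record its features.
    sess.run(net['input'].assign(style_img))
    cost_style = style_loss_func(sess, net)

    total_loss = alpha * cost_content + beta * cost_style
    optimizer = tf.train.AdamOptimizer(2.0)

    init_img = generate_noise_image(content_img)

    train_op = optimizer.minimize(total_loss)
    # Initialize Adam's slot variables, then start from the noisy content image.
    sess.run(tf.global_variables_initializer())
    sess.run(net['input'].assign(init_img))

    if not os.path.exists(OUTPUT_DIR):
        os.mkdir(OUTPUT_DIR)

    for it in range(ITERATIONS):
        sess.run(train_op)
        if it % 100 == 0:
            # Print every 100 iteration.
            mixed_image = sess.run(net['input'])
            print('Iteration %d' % (it))
            print('sum : ', np.sum(mixed_image))
            print('cost: ', sess.run(total_loss))

            filename = os.path.join(OUTPUT_DIR, '%d.png' % (it))
            save_image(filename, mixed_image)

if __name__ == '__main__':
    main()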

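The next version makes a few changes to the previous code. It adds an image resize step so that input images of any size can be handled, it tries the L-BFGS optimizer in place of Adam, and it modifies the convolution and pooling layer functions.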
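For readers unfamiliar with L-BFGS, the sketch below runs SciPy's `L-BFGS-B` method on a toy quadratic. This is only an illustration of the optimizer interface: the `target` vector and the loss are made up, and in the style transfer code the loss and gradient come from the TensorFlow graph instead.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical target; stands in for the image that minimizes the loss.
target = np.array([1.0, -2.0, 3.0])

def loss_and_grad(x):
    """Return the quadratic loss ||x - target||^2 and its gradient."""
    diff = x - target
    return np.sum(diff ** 2), 2.0 * diff

# jac=True tells SciPy that the function returns (loss, gradient).
result = minimize(loss_and_grad, x0=np.zeros(3), jac=True,
                  method='L-BFGS-B', options={'maxiter': 100})
print(result.x)
```

Unlike Adam, L-BFGS has no learning rate to tune: it builds a low-memory approximation of curvature from recent gradients and picks step sizes with a line search.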

import numpy as np
import scipy.io
import scipy.misc
from scipy.misc import imresize, imread
import tensorflow as tf

# Constants for the image input and output.

# Output folder for the images.
OUTPUT_DIR = 'output/'
# Style image to use.
STYLE_IMAGE = 'images/the_scream.jpg'
# Content image to use.
CONTENT_IMAGE = 'images/Taipei101.jpg'
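# Image dimensions constants.
# NOTE: the original values were lost from this listing; the ones below are
# assumptions. load_image() resizes every input to IMAGE_HEIGHT x IMAGE_WIDTH.
IMAGE_WIDTH = 800
IMAGE_HEIGHT = 600
COLOR_CHANNELS = 3

# Algorithm constants
# Noise ratio. Percentage of weight of the noise for intermixing with the
# content image (assumed value).
NOISE_RATIO = 0.6
# Number of iterations to run (assumed value).
ITERATIONS = 1000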
# Constant to put more emphasis on content loss.
alpha = 1
# Constant to put more emphasis on style loss.
beta = 500
VGG_Model = 'imagenet-vgg-verydeep-19.mat'
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

CONTENT_LAYERS = [('conv4_2', 1.)]
STYLE_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2), ('conv4_1', 0.2), ('conv5_1', 0.2)]

def generate_noise_image(content_image, noise_ratio = NOISE_RATIO):
    """Returns a noise image intermixed with the content image at a certain ratio."""
    noise_image = np.random.uniform(
            -20, 20,
            (1, IMAGE_HEIGHT, IMAGE_WIDTH, COLOR_CHANNELS)).astype('float32')
    # White noise image from the content representation. Take a weighted average
    # of the values
    img = noise_image * noise_ratio + content_image * (1 - noise_ratio)
    return img

def load_image(path):
    image = imread(path)

    image = imresize(image, (IMAGE_HEIGHT, IMAGE_WIDTH))

    image = np.reshape(image, ((1,) + image.shape))
    # Input to the VGG net expects the mean to be subtracted.
    image = image - MEAN_VALUES
    return image

def save_image(path, image):
    # Output should add back the mean.
    image = image + MEAN_VALUES
    # Get rid of the first useless dimension, what remains is the image.
    image = image[0]
    image = np.clip(image, 0, 255).astype('uint8')
    scipy.misc.imsave(path, image)

def get_weight_bias(vgg_layers, layer_i):
    weights = vgg_layers[layer_i][0][0][2][0][0]
    w = tf.constant(weights)
    bias = vgg_layers[layer_i][0][0][2][0][1]
    b = tf.constant(np.reshape(bias, (bias.size)))
    layer_name = vgg_layers[layer_i][0][0][0]
    print(layer_name)
    return w, b

def conv_relu_layer(layer_input, nwb):

    conv_val = tf.nn.conv2d(layer_input, nwb[0], strides=[1, 1, 1, 1], padding='SAME')
    relu_val = tf.nn.relu(conv_val + nwb[1])

    return relu_val

def pool_layer(pool_style, layer_input):
    if pool_style == 'avg':
        return tf.nn.avg_pool(layer_input, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')
    elif pool_style == 'max':
        return  tf.nn.max_pool(layer_input, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

def build_vgg19(path):
    net = {}
    vgg_rawnet = scipy.io.loadmat(path)
    vgg_layers = vgg_rawnet['layers'][0]
    net['input'] = tf.Variable(np.zeros((1, IMAGE_HEIGHT, IMAGE_WIDTH, 3)).astype('float32'))
    net['conv1_1'] = conv_relu_layer(net['input'], get_weight_bias(vgg_layers, 0))
    net['conv1_2'] = conv_relu_layer(net['conv1_1'], get_weight_bias(vgg_layers, 2))
    net['pool1'] = pool_layer('avg', net['conv1_2'])
    net['conv2_1'] = conv_relu_layer(net['pool1'], get_weight_bias(vgg_layers, 5))
    net['conv2_2'] = conv_relu_layer(net['conv2_1'], get_weight_bias(vgg_layers, 7))
    net['pool2'] = pool_layer('max', net['conv2_2'])
    net['conv3_1'] = conv_relu_layer(net['pool2'], get_weight_bias(vgg_layers, 10))
    net['conv3_2'] = conv_relu_layer(net['conv3_1'], get_weight_bias(vgg_layers, 12))
    net['conv3_3'] = conv_relu_layer(net['conv3_2'], get_weight_bias(vgg_layers, 14))
    net['conv3_4'] = conv_relu_layer(net['conv3_3'], get_weight_bias(vgg_layers, 16))
    net['pool3'] = pool_layer('avg', net['conv3_4'])
    net['conv4_1'] = conv_relu_layer(net['pool3'], get_weight_bias(vgg_layers, 19))
    net['conv4_2'] = conv_relu_layer(net['conv4_1'], get_weight_bias(vgg_layers, 21))
    net['conv4_3'] = conv_relu_layer(net['conv4_2'], get_weight_bias(vgg_layers, 23))
    net['conv4_4'] = conv_relu_layer(net['conv4_3'], get_weight_bias(vgg_layers, 25))
    net['pool4'] = pool_layer('max', net['conv4_4'])
    net['conv5_1'] = conv_relu_layer(net['pool4'], get_weight_bias(vgg_layers, 28))
    net['conv5_2'] = conv_relu_layer(net['conv5_1'], get_weight_bias(vgg_layers, 30))
    net['conv5_3'] = conv_relu_layer(net['conv5_2'], get_weight_bias(vgg_layers, 32))
    net['conv5_4'] = conv_relu_layer(net['conv5_3'], get_weight_bias(vgg_layers, 34))
    net['pool5'] = pool_layer('avg', net['conv5_4'])
    return net

def content_layer_loss(p, x):

    M = p.shape[1] * p.shape[2]
    N = p.shape[3]
    loss = (1. / (2 * N * M)) * tf.reduce_sum(tf.pow((x - p), 2))
    return loss

def content_loss_func(sess, net):

    layers = CONTENT_LAYERS
    total_content_loss = 0.0
    for layer_name, weight in layers:
        p = sess.run(net[layer_name])
        x = net[layer_name]
        total_content_loss += content_layer_loss(p, x)*weight

    total_content_loss /= float(len(layers))
    return total_content_loss

def gram_matrix(x, area, depth):

    x1 = tf.reshape(x, (area, depth))
    g = tf.matmul(tf.transpose(x1), x1)
    return g

def style_layer_loss(a, x):

    M = a.shape[1] * a.shape[2]
    N = a.shape[3]
    A = gram_matrix(a, M, N)
    G = gram_matrix(x, M, N)
    loss = (1. / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow((G - A), 2))

    return loss

def style_loss_func(sess, net):

    layers = STYLE_LAYERS
    total_style_loss = 0.0

    for layer_name, weight in layers:
        a = sess.run(net[layer_name])
        x = net[layer_name]
        total_style_loss += style_layer_loss(a, x) * weight

    total_style_loss /= float(len(layers))

    return total_style_loss

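def main():
    net = build_vgg19(VGG_Model)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    content_img = load_image(CONTENT_IMAGE)
    style_img = load_image(STYLE_IMAGE)

    # Feed the content image so content_loss_func can record its features.
    sess.run(net['input'].assign(content_img))
    cost_content = content_loss_func(sess, net)

    # Feed the style image so style_loss_func can record its features.
    sess.run(net['input'].assign(style_img))
    cost_style = style_loss_func(sess, net)

    total_loss = alpha * cost_content + beta * cost_style

    optimizer = tf.contrib.opt.ScipyOptimizerInterface(
        total_loss, method='L-BFGS-B',
        options={'maxiter': ITERATIONS,
                 'disp': 0})

    init_img = generate_noise_image(content_img)

    # Start from the noisy content image and run L-BFGS to completion.
    sess.run(net['input'].assign(init_img))
    optimizer.minimize(sess)

    mixed_img = sess.run(net['input'])

    # The output directory must already exist.
    filename = 'output/out.png'
    save_image(filename, mixed_img)

if __name__ == '__main__':
    main()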

The following code implements image style transfer with a convolutional neural network, this time using the TensorFlow 2 Keras API:

```python
import tensorflow as tf
import numpy as np
import PIL.Image
import time

# Constants
IMAGE_WIDTH = 512
IMAGE_HEIGHT = 512
CONTENT_WEIGHT = 1e3
STYLE_WEIGHT = 1e-2
TOTAL_VARIATION_WEIGHT = 30

# Load an image, scale its longer side to max_dim, and add a batch dimension
def load_image(image_path):
    max_dim = 512
    img = tf.io.read_file(image_path)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = max(shape)
    scale = max_dim / long_dim
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    img = img[tf.newaxis, :]
    return img

# Display an image
def show_image(image):
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)
    img = PIL.Image.fromarray(np.array(image * 255, dtype=np.uint8))
    img.show()

# Load the model
def load_model():
    # Load the pre-trained VGG19 model
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False
    # Layers we need
    content_layers = ['block5_conv2']
    style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                    'block4_conv1', 'block5_conv1']
    # Build the model: style outputs first, then content outputs
    style_outputs = [vgg.get_layer(name).output for name in style_layers]
    content_outputs = [vgg.get_layer(name).output for name in content_layers]
    outputs = style_outputs + content_outputs
    model = tf.keras.models.Model(vgg.input, outputs)
    return model, style_layers, content_layers

# Run the model on an image with pixel values in [0, 1]
def extract_features(model, image):
    # VGG19 expects preprocessed pixels in the [0, 255] range
    preprocessed = tf.keras.applications.vgg19.preprocess_input(image * 255.0)
    return model(preprocessed)

# Content loss
def calculate_content_loss(content_output, target_content):
    return tf.reduce_mean(tf.square(content_output - target_content))

# Gram matrix
def gram_matrix(input_tensor):
    channels = int(input_tensor.shape[-1])
    a = tf.reshape(input_tensor, [-1, channels])
    n = tf.shape(a)[0]
    gram = tf.matmul(a, a, transpose_a=True)
    return gram / tf.cast(n, tf.float32)

# Style loss for one layer, given the precomputed Gram matrix of the style image
def calculate_style_loss(style_output, target_gram):
    return tf.reduce_mean(tf.square(gram_matrix(style_output) - target_gram))

# Total variation loss
def calculate_total_variation_loss(image):
    y_deltas, x_deltas = tf.image.image_gradients(image)
    return tf.reduce_mean(tf.abs(x_deltas)) + tf.reduce_mean(tf.abs(y_deltas))

# Total loss
def calculate_total_loss(model, loss_weights, init_image, gram_style_features, content_features):
    style_weight, content_weight, tv_weight = loss_weights
    model_outputs = extract_features(model, init_image)
    num_style = len(gram_style_features)
    style_output_features = model_outputs[:num_style]
    content_output_features = model_outputs[num_style:]

    content_loss = 0
    for target_content, content_output in zip(content_features, content_output_features):
        content_loss += calculate_content_loss(content_output, target_content)
    content_loss *= content_weight / len(content_features)

    style_loss = 0
    for target_gram, style_output in zip(gram_style_features, style_output_features):
        style_loss += calculate_style_loss(style_output, target_gram)
    style_loss *= style_weight / len(gram_style_features)

    tv_loss = calculate_total_variation_loss(init_image)
    tv_loss *= tv_weight

    total_loss = content_loss + style_loss + tv_loss
    return total_loss

# Gradients of the total loss with respect to the generated image
@tf.function
def calculate_gradients(model, loss_weights, init_image, gram_style_features, content_features):
    with tf.GradientTape() as tape:
        loss = calculate_total_loss(model, loss_weights, init_image,
                                    gram_style_features, content_features)
    gradients = tape.gradient(loss, init_image)
    return gradients, loss

# Run style transfer
def style_transfer(content_image_path, style_image_path, epochs=10, steps_per_epoch=100, learning_rate=0.01):
    # Load the model
    model, style_layers, content_layers = load_model()
    # Load the images and extract their features
    content_image = load_image(content_image_path)
    style_image = load_image(style_image_path)
    num_style = len(style_layers)
    content_features = extract_features(model, content_image)[num_style:]
    gram_style_features = [gram_matrix(feature)
                           for feature in extract_features(model, style_image)[:num_style]]
    # Initialize the generated image from the content image
    init_image = tf.Variable(content_image)
    # Loss weights
    loss_weights = (STYLE_WEIGHT, CONTENT_WEIGHT, TOTAL_VARIATION_WEIGHT)
    # Optimizer
    optimizer = tf.optimizers.Adam(learning_rate=learning_rate)
    # Record the start time
    start_time = time.time()
    # Optimization loop
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            gradients, loss = calculate_gradients(model, loss_weights, init_image,
                                                  gram_style_features, content_features)
            optimizer.apply_gradients([(gradients, init_image)])
            init_image.assign(tf.clip_by_value(init_image, 0, 1))
            print(".", end='')
        print("Train step: {}".format(epoch))
    end_time = time.time()
    print("Total time: {:.1f}".format(end_time - start_time))
    # Show the generated image
    show_image(init_image.numpy())
```

Usage:

```python
style_transfer("content.jpg", "style.jpg")
```

Here `content.jpg` is the path to the original image and `style.jpg` is the path to the image whose style you want to transfer. By default the transfer runs for 10 epochs of 100 steps each, with a learning rate of 0.01. Adjust these as needed.
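The code above adds a total variation term, which penalizes differences between neighbouring pixels so that the result looks smoother. A minimal NumPy sketch of the idea follows, assuming an `(H, W, C)` image. It is close to, but not identical to, the TensorFlow version, since `tf.image.image_gradients` pads its outputs with zeros to keep the input shape.

```python
import numpy as np

def total_variation_loss(image):
    """Mean absolute difference between neighbouring pixels of an (H, W, C) image."""
    dy = image[1:, :, :] - image[:-1, :, :]   # vertical neighbours
    dx = image[:, 1:, :] - image[:, :-1, :]   # horizontal neighbours
    return np.mean(np.abs(dy)) + np.mean(np.abs(dx))

# Hypothetical 2 x 2 single-channel images.
checker = np.array([[0.0, 1.0], [1.0, 0.0]]).reshape(2, 2, 1)
flat = np.full((2, 2, 1), 0.5)
print(total_variation_loss(checker), total_variation_loss(flat))
```

A constant image has zero total variation, while a high-frequency pattern such as a checkerboard scores high, so raising `TOTAL_VARIATION_WEIGHT` trades fine texture for smoothness.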




