CNTK API Documentation Translation (19): Artistic Style Transfer


阔活洵信


This tutorial shows how to transfer the style of one image onto another. This lets us render an ordinary photo in the style of a famous painting.

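Beyond producing a good-looking picture, this tutorial teaches you how to load a pretrained VGG model in CNTK, how to get the gradient of a function with respect to an input variable, and how to use that gradient outside of CNTK.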

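We follow the approach proposed by Leon A. Gatys et al., with some of the improvements introduced by Novak and Nikulin. Faster techniques exist, but they are limited to transferring one particular style.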

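We start by importing the modules we need. Besides the usual ones (numpy, scipy and cntk), we need PIL to process images, requests to download the pretrained model, h5py to read the weights stored in the pretrained model file, and matplotlib to display images.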

from __future__ import print_function
import numpy as np
from scipy import optimize as opt
import cntk as C
from PIL import Image
import requests
import h5py
import os
import matplotlib.pyplot as plt

# Select the right target device when this notebook is being tested:
if 'TEST_DEVICE' in os.environ:
    if os.environ['TEST_DEVICE'] == 'cpu':
        C.device.try_set_default_device(C.device.cpu())
    else:
        C.device.try_set_default_device(C.device.gpu(0))

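The pretrained model is a VGG network that we obtained from https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3. We (Microsoft) host a copy on our own servers so that it is easy to download. The code below downloads the model if it is not available locally and then loads its weights as a list of numpy arrays.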

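def download(url, filename):
    # Stream the file to disk in 1 MB chunks so the ~0.5GB model never sits fully in memory
    response = requests.get(url, stream=True)
    response.raise_for_status()  # fail early instead of saving an HTML error page as the model
    with open(filename, 'wb') as handle:
        for data in response.iter_content(chunk_size=2**20):
            if data:
                handle.write(data)


def load_vgg(path):
    # The file stores one group per layer ('layer_0', 'layer_1', ...),
    # each holding 'nb_params' arrays named 'param_0', 'param_1', ...
    layers = []
    with h5py.File(path, 'r') as f:  # open read-only; older h5py versions defaulted to append mode
        for k in range(f.attrs['nb_layers']):
            g = f['layer_{}'.format(k)]
            n = g.attrs['nb_params']
            layers.append([g['param_{}'.format(p)][:] for p in range(n)])
    return layers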

# Check for an environment variable defined in CNTK's test infrastructure
envvar = 'CNTK_EXTERNAL_TESTDATA_SOURCE_DIRECTORY'
def is_test(): return envvar in os.environ

path = 'vgg16_weights.bin'
url = 'https://cntk.ai/jup/models/vgg16_weights.bin'
# We check for the model locally
if not os.path.exists(path):
    # If not there we might be running in CNTK's test infrastructure
    if is_test():
        path = os.path.join(os.environ[envvar],'PreTrainedModels','Vgg16','v0',path)
    else:
        #If neither is true we download the file from the web
        print('downloading VGG model (~0.5GB)')
        download(url, path)
layers = load_vgg(path)
print('loaded VGG model')

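Next we define the VGG network in CNTK. Only the convolutional and pooling layers are built; the fully connected layers are not needed.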

# A convolutional layer in the VGG network
def vggblock(x, arrays, layer_map, name):
    f = arrays[0]
    b = arrays[1]
    k = C.constant(value=f)
    t = C.constant(value=np.reshape(b, (-1, 1, 1)))
    y = C.relu(C.convolution(k, x, auto_padding=[False, True, True]) + t)
    layer_map[name] = y
    return y

# A pooling layer in the VGG network
def vggpool(x):
    return C.pooling(x, C.AVG_POOLING, (2, 2), (2, 2))


# Build the graph for the VGG network (excluding fully connected layers)
def model(x, layers):
    model_layers = {}
    def convolutional(z): return len(z) == 2 and len(z[0].shape) == 4
    conv = [layer for layer in layers if convolutional(layer)]
    cnt = 0
    num_convs = {1: 2, 2: 2, 3: 3, 4: 3, 5: 3}
    for outer in range(1,6):
        for inner in range(num_convs[outer]):
            x = vggblock(x, conv[cnt], model_layers, 'conv%d_%d' % (outer, 1+inner))
            cnt += 1
        x = vggpool(x)

    return x, C.combine([model_layers[k] for k in sorted(model_layers.keys())])
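The loop in `model` is easier to follow once the layer layout is spelled out. The pure-Python sketch below (no CNTK needed) mimics that loop: it assumes the 13 convolutional weight/bias pairs arrive in order, reproduces the `conv<block>_<index>` names stored in `model_layers`, and tracks how the spatial size shrinks through the five 2×2, stride-2 pooling layers. The helper `vgg_layout` is hypothetical, and the size arithmetic assumes unpadded pooling (a side of length n becomes n // 2) while the padded convolutions keep the size unchanged.

```python
# Mirrors the loop in model(): block number -> number of conv layers in that block
NUM_CONVS = {1: 2, 2: 2, 3: 3, 4: 3, 5: 3}

def vgg_layout(size):
    """Return conv layer names in creation order and the spatial size after each block.

    Assumes each pooling layer is 2x2 with stride 2 and no padding (n -> n // 2),
    and that the convolutions (auto_padding on the spatial axes) keep the size.
    """
    names = []
    sizes = []
    for outer in range(1, 6):
        for inner in range(NUM_CONVS[outer]):
            # Same naming scheme as vggblock(..., 'conv%d_%d' % (outer, 1 + inner))
            names.append('conv%d_%d' % (outer, 1 + inner))
        size = size // 2  # only the pooling layer changes the spatial size
        sizes.append(size)
    return names, sizes

names, sizes = vgg_layout(300)
print(len(names), names[0], names[-1])  # 13 conv1_1 conv5_3
print(sizes)                            # [150, 75, 37, 18, 9]
```

Because every block and layer index is a single digit, `sorted(model_layers.keys())` in `model` returns the names in this same creation order, so the outputs of `intermediate_layers` run from `conv1_1` to `conv5_3`.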

Defining the loss function

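The interesting part of this tutorial is the definition of a loss function that, once optimized, produces an image whose content resembles one image and whose style resembles another. The loss consists of several terms, some of which are defined in terms of the VGG network. Concretely, the loss takes the candidate image x and computes a weighted sum of the content loss, the style loss and the total variation loss: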

$L(x)=\alpha C(x) + \beta S(x) + T(x)$

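where $\alpha$ and $\beta$ are the weights of the content loss and the style loss, respectively. We have normalized the weights so that the weight in front of the total variation loss is 1. How is each term computed?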

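The total variation loss $T(x)$ is the easiest to understand: it measures the average of the squared differences between adjacent pixel values, and making it smaller makes the image smoother (less sharp). We compute it by convolving the image with a kernel containing (-1, 1) both horizontally and vertically, squaring the results, and averaging them.
The content loss $C(x)$ measures the squared difference between the content image and x. We can compute this difference either on the raw pixels or on the outputs of the various layers of the VGG network. The content loss depends on the content image, but since the content image is fixed we leave it out of the notation.
The style loss $S(x)$ is similar to the content loss in that it also depends on another image, the style image. Gatys et al. define style as the correlations among the activations of the network's nodes, and measure the style loss as the squared difference between these correlations. Concretely, for a given layer we compute the covariance matrix of the output channels, averaged over all spatial positions. The style loss is then the normalized squared error between the covariance matrix of the style image and that of x. We have deliberately not said which layer this is: implementations differ on this point, and below we use a weighted sum of the style losses of all layers.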
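To make the total variation term concrete, here is a NumPy sketch of the same computation. It assumes that convolving with the kernel (-1, 1) without padding is, up to sign, the same as differencing adjacent pixels, which is what `np.diff` does; the function name `np_total_variation_loss` is hypothetical and is meant for building intuition, not as part of the tutorial code.

```python
import numpy as np

def np_total_variation_loss(x):
    """Average squared difference between neighbouring pixels, over both directions.

    x is a (channels, height, width) array. np.diff along an axis gives the
    differences of adjacent pixels, i.e. a valid-mode convolution with [-1, 1]
    up to sign, which disappears after squaring.
    """
    dh = np.diff(x, axis=-1)  # horizontal neighbours: shape (C, H, W-1)
    dv = np.diff(x, axis=-2)  # vertical neighbours:   shape (C, H-1, W)
    return 0.5 * (np.mean(dv ** 2) + np.mean(dh ** 2))

flat = np.full((3, 4, 4), 7.0, dtype=np.float32)
stripes = np.tile(np.array([0.0, 1.0], dtype=np.float32), (3, 4, 2))  # alternating columns
print(np_total_variation_loss(flat))     # 0.0
print(np_total_variation_loss(stripes))  # 0.5
```

A constant image has zero total variation, while alternating columns are penalized, which is why this term pushes the result toward smooth images.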

def flatten(x):
    assert len(x.shape) >= 3
    return C.reshape(x, (x.shape[-3], x.shape[-2] * x.shape[-1]))


def gram(x):
    features = C.minus(flatten(x), C.reduce_mean(x))
    return C.times_transpose(features, features)


def npgram(x):
    features = np.reshape(x, (-1, x.shape[-2]*x.shape[-1])) - np.mean(x)
    return features.dot(features.T)


def style_loss(a, b):
    channels, x, y = a.shape
    assert x == y
    A = gram(a)
    B = npgram(b)
    return C.squared_error(A, B)/(channels**2 * x**4)


def content_loss(a,b):
    channels, x, y = a.shape
    return C.squared_error(a, b)/(channels*x*y)


def total_variation_loss(x):
    xx = C.reshape(x, (1,)+x.shape)
    delta = np.array([-1, 1], dtype=np.float32)
    kh = C.constant(value=delta.reshape(1, 1, 1, 1, 2))
    kv = C.constant(value=delta.reshape(1, 1, 1, 2, 1))
    dh = C.convolution(kh, xx, auto_padding=[False])
    dv = C.convolution(kv, xx, auto_padding=[False])
    avg = 0.5 * (C.reduce_mean(C.square(dv)) + C.reduce_mean(C.square(dh)))
    return avg
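The NumPy sketch below mirrors `gram`/`npgram`, `style_loss` and `content_loss`, so that the normalization can be checked outside CNTK. It assumes `C.squared_error` sums the squared differences over all elements, and the `np_`-prefixed names are hypothetical. It also shows why the Gram matrix captures style rather than content: rearranging spatial positions (here, flipping the rows) leaves the Gram matrix unchanged but changes the content loss.

```python
import numpy as np

def np_gram(x):
    # Like npgram: one row per channel, global mean subtracted, then features @ features.T
    features = np.reshape(x, (x.shape[0], -1)) - np.mean(x)
    return features.dot(features.T)  # shape (channels, channels)

def np_style_loss(a, b):
    # Squared error between Gram matrices, divided by channels**2 * side**4 as in style_loss
    channels, h, w = a.shape
    assert h == w, 'style_loss assumes square feature maps'
    diff = np_gram(a) - np_gram(b)
    return np.sum(diff ** 2) / (channels ** 2 * h ** 4)

def np_content_loss(a, b):
    # Squared error divided by the number of elements, as in content_loss
    channels, h, w = a.shape
    return np.sum((a - b) ** 2) / (channels * h * w)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5, 5))
flipped = a[:, ::-1, :]  # same activations, rows in reverse order
print(np_content_loss(a, a), np_style_loss(a, a))   # 0.0 0.0
print(np_content_loss(a, flipped) > 0)              # True
print(np.isclose(np_style_loss(a, flipped), 0.0))   # True
```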

Computing the loss

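We are now ready to compute the loss for two images. We will use a photo of a Polish landscape and Van Gogh's The Starry Night. First we define a few tuning parameters, explained below: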

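Image size and iteration counts: when the code runs in CNTK's test infrastructure (is_test() returns True) we use 64×64 images and only a few optimization iterations; otherwise we use 300×300 images and more iterations. Keeping these small speeds up the process and makes experimenting easier. For better results you can use larger images, but if you only have a CPU this will take a while.
The content weight and the style weight are the main parameters that shape the resulting image.
The decay factor is a number in (0, 1) that determines how much each layer contributes. Following Novak and Nikulin, every layer contributes to both the content loss and the style loss. For the content loss, the input of the VGG network has the largest weight, and the weights of the following layers decrease exponentially as we move away from the input. For the style loss, the output of the VGG network has the largest weight, and the weights of earlier layers decrease exponentially with their distance from the output. As in Novak and Nikulin's paper, we set the decay factor to 0.5.
The inner and outer parameters control how we obtain the final result: while searching for the image that minimizes the loss, we take outer snapshots, one after every inner optimization iterations.
Finally, a very important detail is how the pretrained model was trained. During training, a constant vector was subtracted from the red, green and blue channels of every example in the training set, which centers the input data around 0 and makes training easier. If we do not subtract the same vector from our own images, our data will look different from the training data and the results will be poor. This vector is SHIFT below.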
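The decay scheme can be checked in isolation. The sketch below reproduces the multipliers that the loss code further down applies to each content and style term (before multiplying by `content_weight` and `style_weight`). It assumes `n = 13` intermediate outputs, one per convolutional layer; the helper name `layer_weights` is hypothetical. Dividing by `total`, the geometric series $1 + d + \dots + d^n$, makes both sets of weights sum to 1 for any decay, which is why changing the decay does not change the overall magnitude of the content and style terms.

```python
def layer_weights(n, decay):
    """Per-term multipliers for the content and style losses.

    content[0] is for the raw input image and content[1..n] for the n conv layers.
    style[0..n-1] is for the n conv layers and style[n] for the final output z.
    """
    total = (1 - decay ** (n + 1)) / (1 - decay)  # 1 + decay + ... + decay**n
    content = [1.0 / total] + [decay ** (i + 1) / total for i in range(n)]
    style = [decay ** (n - i) / total for i in range(n)] + [1.0 / total]
    return content, style

content_w, style_w = layer_weights(n=13, decay=0.5)
print(round(sum(content_w), 12), round(sum(style_w), 12))  # 1.0 1.0
print(content_w[0] > content_w[1] > content_w[-1])         # True: the input dominates content
print(style_w[-1] > style_w[-2] > style_w[0])              # True: the output dominates style
```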

style_path = 'style.jpg'
content_path = 'content.jpg'

start_from_random = False
content_weight = 5.0
style_weight = 1.0
decay = 0.5

if is_test():
    outer = 2
    inner = 2
    SIZE = 64
else:
    outer = 10
    inner = 20
    SIZE = 300

SHIFT = np.reshape([103.939, 116.779, 123.68], (3, 1, 1)).astype('f')

def load_image(path):
    with Image.open(path) as pic:
        hw = pic.size[0] / 2
        hh = pic.size[1] / 2
        mh = min(hw,hh)
        cropped = pic.crop((hw - mh, hh - mh, hw + mh, hh + mh))
        array = np.array(cropped.resize((SIZE,SIZE), Image.BICUBIC), dtype=np.float32)
        return np.ascontiguousarray(np.transpose(array, (2,0,1)))-SHIFT

def save_image(img, path):
    sanitized_img = np.maximum(0, np.minimum(255, img+SHIFT))
    pic = Image.fromarray(np.uint8(np.transpose(sanitized_img, (1, 2, 0))))
    pic.save(path)

def ordered_outputs(f, binding):
    _, output_dict = f.forward(binding, f.outputs)
    return [np.squeeze(output_dict[out]) for out in f.outputs]

# download the images if they are not available locally
for local_path in content_path, style_path:
    if not os.path.exists(local_path):
        download('https://cntk.ai/jup/%s' % local_path, local_path)

# Load the images
style   = load_image(style_path)
content = load_image(content_path)

# Display the images
for img in content, style:
    plt.figure()
    plt.imshow(np.asarray(np.transpose(img+SHIFT, (1, 2, 0)), dtype=np.uint8))

# Push the images through the VGG network 
# First define the input and the output
y = C.input_variable((3, SIZE, SIZE), needs_gradient=True)
z, intermediate_layers = model(y, layers)
# Now get the activations for the two images
content_activations = ordered_outputs(intermediate_layers, {y: [[content]]})
style_activations = ordered_outputs(intermediate_layers, {y: [[style]]})
style_output = np.squeeze(z.eval({y: [[style]]}))

# Finally define the loss
n = len(content_activations)
# makes sure that changing the decay does not affect the magnitude of content/style
total = (1-decay**(n+1))/(1-decay)
loss = (1.0/total * content_weight * content_loss(y, content) 
         + 1.0/total * style_weight * style_loss(z, style_output) 
         + total_variation_loss(y))

for i in range(n):
    loss = (loss 
        + decay**(i+1)/total * content_weight * content_loss(intermediate_layers.outputs[i], content_activations[i])
        + decay**(n-i)/total * style_weight   *   style_loss(intermediate_layers.outputs[i], style_activations[i]))
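The image handling in `load_image` and `save_image` boils down to a center crop, a channel-first transpose and the SHIFT subtraction. The NumPy-only sketch below reproduces those steps on a synthetic array so the round trip can be checked without PIL. It skips the bicubic resize, and the names `center_square_box`, `preprocess` and `postprocess` are hypothetical. One deliberate difference: it rounds before casting to `uint8`, whereas `save_image` truncates, which can lose one intensity level to floating-point error.

```python
import numpy as np

# Same per-channel constants as SHIFT in the tutorial, shaped to broadcast over (3, H, W)
SHIFT = np.reshape([103.939, 116.779, 123.68], (3, 1, 1)).astype('f')

def center_square_box(width, height):
    # Largest centered square, computed as in load_image: (left, upper, right, lower)
    hw, hh = width / 2, height / 2
    mh = min(hw, hh)
    return (hw - mh, hh - mh, hw + mh, hh + mh)

def preprocess(hwc):
    # (H, W, 3) pixels in [0, 255] -> channel-first float32 with SHIFT subtracted
    chw = np.transpose(hwc.astype(np.float32), (2, 0, 1))
    return np.ascontiguousarray(chw) - SHIFT

def postprocess(chw):
    # Inverse of preprocess: add SHIFT back, clip to [0, 255], round, back to (H, W, 3)
    clipped = np.clip(chw + SHIFT, 0, 255)
    return np.uint8(np.rint(np.transpose(clipped, (1, 2, 0))))

print(center_square_box(400, 300))  # (50.0, 0.0, 350.0, 300.0)
pixels = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3)).astype(np.uint8)
x = preprocess(pixels)
print(x.shape)                                 # (3, 4, 4)
print(np.array_equal(postprocess(x), pixels))  # True
```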

Optimizing the loss

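Now we are ready to find the image that minimizes the loss. We will use scipy's optimization package, specifically the L-BFGS method (in the code, its bound-constrained variant L-BFGS-B). L-BFGS is an excellent optimizer and is very popular whenever computing the full gradient is feasible, as it is in our case.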

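Note that we compute the gradient with respect to the input, which is very different from computing gradients with respect to the network parameters as we did before. By default, CNTK input variables do not need gradients, but we defined our input variable as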

y = C.input_variable((3, SIZE, SIZE), needs_gradient=True)

which tells CNTK to also compute gradients with respect to this input variable.
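What `needs_gradient=True` provides is $\partial L / \partial x$, a gradient with the same shape as the input image. As a NumPy illustration (not CNTK code), the sketch below does this for the simplest term, the content loss on raw pixels $C(x) = \sum (x - c)^2 / N$, whose gradient is $2(x - c)/N$, and checks it against central finite differences. The helper names `content_loss_and_grad` and `numerical_grad` are hypothetical.

```python
import numpy as np

def content_loss_and_grad(x, c):
    # C(x) = sum((x - c)**2) / N, so dC/dx = 2 * (x - c) / N, same shape as x
    n = x.size
    diff = x - c
    return np.sum(diff ** 2) / n, 2.0 * diff / n

def numerical_grad(f, x, eps=1e-6):
    # Central differences: perturb one input element at a time (modifies x temporarily)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        up = f(x)
        x[idx] = old - eps
        down = f(x)
        x[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
c = rng.standard_normal((3, 4, 4))
x = rng.standard_normal((3, 4, 4))
value, analytic = content_loss_and_grad(x, c)
numeric = numerical_grad(lambda z: content_loss_and_grad(z, c)[0], x.copy())
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Because the gradient has the image's (3, H, W) shape, it has to be flattened before it can be handed to a vector-based optimizer such as scipy's.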

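The rest of the code is straightforward; most of the complexity comes from interfacing with the scipy optimization package:

The optimizer only works with vectors of doubles, so img2vec takes a (3,SIZE,SIZE) image and converts it to a vector of doubles.
CNTK needs its input as an image, but scipy calls our objective function with a vector, so vec2img converts that vector back into an image.
The gradient computed by CNTK is also an image, but scipy expects the gradient as a vector, so we convert it with img2vec as well.

Apart from these details, we start from the content image (or from random noise when start_from_random is True), run the optimization, and display the final image.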
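The plumbing between scipy and the loss can be tried out on its own. The sketch below keeps `vec2img`, `img2vec` and an objective that returns `(value, gradient)` for `jac=True`, but swaps the CNTK style-transfer loss for a toy quadratic loss (the mean squared distance to a random target image) so that it runs with only NumPy and SciPy. `SIZE = 8`, the target image and the bounds are assumptions for illustration only.

```python
import numpy as np
from scipy import optimize as opt

SIZE = 8  # tiny image so the example runs instantly (not the tutorial's SIZE)

def vec2img(v):
    # scipy hands us a flat float64 vector; reshape it to a (3, SIZE, SIZE) float32 image
    d = int(round(np.sqrt(v.size / 3)))
    return np.reshape(v.astype(np.float32), (3, d, d))

def img2vec(img):
    # Flatten back to the float64 vector that L-BFGS-B works with
    return img.flatten().astype(np.float64)

rng = np.random.default_rng(0)
target = rng.uniform(-50, 50, size=(3, SIZE, SIZE)).astype(np.float32)

def objfun(v):
    # Toy stand-in for the CNTK loss: mean squared distance to the target image
    img = vec2img(v)
    diff = img.astype(np.float64) - target
    value = float(np.sum(diff ** 2)) / img.size
    grad = img2vec(2.0 * diff / img.size)
    return value, grad  # with jac=True, scipy expects (value, gradient) together

x0 = np.zeros((3, SIZE, SIZE), dtype=np.float32)
bounds = [(-100.0, 100.0)] * x0.size
result = opt.minimize(objfun, img2vec(x0), method='L-BFGS-B', jac=True,
                      bounds=bounds, options={'maxiter': 50})
xstar = vec2img(result.x)
print('final objective: %g' % float(np.squeeze(result.fun)))
```

Returning a plain Python float as the objective value keeps the interface unambiguous, since scipy treats the objective as a scalar.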

# utility to convert a vector to an image
def vec2img(x):
    d = np.round(np.sqrt(x.size / 3)).astype('i')
    return np.reshape(x.astype(np.float32), (3, d, d))

# utility to convert an image to a vector
def img2vec(img):
    return img.flatten().astype(np.float64)

# utility to compute the value and the gradient of f at a particular place defined by binding
def value_and_grads(f, binding):
    if len(f.outputs) != 1:
        raise ValueError('function must return a single tensor')
    df, valdict = f.forward(binding, [f.output], set([f.output]))
    value = list(valdict.values())[0]
    grads = f.backward(df, {f.output: np.ones_like(value)}, set(binding.keys()))
    return value, grads

# an objective function that scipy will be happy with
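def objfun(x, loss):
    y = vec2img(x)
    v, g = value_and_grads(loss, {loss.arguments[0]: [[y]]})
    v = np.asarray(v).item()          # scipy expects a scalar objective value
    g = img2vec(list(g.values())[0])  # and a float64 gradient vector the same size as x
    return v, g

# the actual optimization procedure
def optimize(loss, x0, inner, outer):
    # keep pixels in a range that stays within [0, 255] for every channel once SHIFT is added back
    bounds = [(-np.min(SHIFT), 255-np.max(SHIFT))]*x0.size
    for i in range(outer):
        s = opt.minimize(objfun, img2vec(x0), args=(loss,), method='L-BFGS-B', 
                         bounds=bounds, options={'maxiter': inner}, jac=True)
        print('objective : %s' % np.squeeze(s.fun))  # works whether s.fun is a float or a 1-element array
        x0 = vec2img(s.x)
        path = 'output_%d.jpg' % i
        save_image(x0, path)  # one snapshot after every `inner` iterations
    return x0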

np.random.seed(98052)
if start_from_random:
    x0 = np.random.randn(3, SIZE, SIZE).astype(np.float32)
else:
    x0 = content
xstar = optimize(loss, x0, inner, outer)
plt.imshow(np.asarray(np.transpose(xstar+SHIFT, (1, 2, 0)), dtype=np.uint8))