How Can We Use AI to Create Our Own Along the River During the Qingming Festival?

CSDN News

Article link: https://blog.csdn.net/csdnnews/article/details/103105306

Author | Li Qiujian (李秋键)

Editor | Liu Jing (刘静)

Produced by | CSDN (ID: CSDNnews)

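Along the River During the Qingming Festival is one of the representative works of Chinese painting and one of China's Ten Great Traditional Paintings. A Northern Song genre painting and the only known surviving work by the Northern Song painter Zhang Zeduan, it is classed as a national-treasure cultural relic and is held in the Palace Museum in Beijing.

The painting measures 24.8 cm by 528.7 cm and is painted in color on silk. Executed as a handscroll with a scattered-perspective (multi-point) composition, it vividly records the cityscape of Dongjing (also called Bianjing, present-day Kaifeng, Henan), the Northern Song capital in the 12th century, and the lives of people from every social class at the time. It bears witness to the prosperity of Bianjing and portrays the urban economy of the Northern Song period.

The work is unique in the history of painting, in China and worldwide. Across more than five meters of scroll it depicts a great number of people of every kind; livestock such as oxen, mules and donkeys; carts, sedan chairs and boats large and small; and distinctive houses, bridges and city towers that reflect the features of Song architecture. It has great historical and artistic value. Although the scene is lively, it does not simply celebrate a prosperous market: it is a "picture of peril in a flourishing age" imbued with a sense of concern, showing idle soldiers and heavy taxation.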

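In today's project we adapt an existing algorithm so that we can create our own version of Along the River During the Qingming Festival.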

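Below, we use a pretrained VGG19 model to render an image in the style of the painting (neural style transfer). The detailed steps follow, and I have commented each piece of code for easy reading: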

First, import the relevant libraries:

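# Written for TensorFlow 1.x (tf.Session, tf.train) and SciPy < 1.2,
# which still provides scipy.misc.imread / imresize / imsave
import tensorflow as tf
import numpy as np
import scipy.io
import scipy.misc
import os
import time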

Next, define a few variables for convenience: the content image, the style image and the output directory:

CONTENT_IMG = '1.png'
STYLE_IMG = 'sty.jpg'
OUTPUT_DIR = 'neural_style_transfer_tensorflow/'

Then create a directory to save the generated images in:

if not os.path.exists(OUTPUT_DIR):
   os.mkdir(OUTPUT_DIR)

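Define the width, height and channel count of the generated image, along with the noise ratio and the loss weights: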

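IMAGE_W = 400      # output width in pixels
IMAGE_H = 300      # output height in pixels
COLOR_C = 3        # RGB channels

NOISE_RATIO = 0.7  # share of random noise in the initial image
BETA = 5           # weight of the content loss
ALPHA = 100        # weight of the style loss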

Next, define the path to the pretrained VGG19 model file:

VGG_MODEL = 'imagenet-vgg-verydeep-19.mat'

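Define the per-channel (R, G, B) mean pixel values of the ImageNet data VGG19 was trained on; they are subtracted from each input image during preprocessing and added back when the result is saved: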

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))
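Subtracting `MEAN_VALUES` relies on NumPy broadcasting: an array of shape (1, H, W, 3) minus one of shape (1, 1, 1, 3) subtracts each channel's mean from every pixel. A minimal sketch on a hypothetical 2×2 image whose pixels are all (200, 150, 100):

```python
import numpy as np

# Same per-channel (R, G, B) means as MEAN_VALUES above
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

# Hypothetical 2x2 RGB image with a batch dimension: every pixel is (200, 150, 100)
image = np.full((1, 2, 2, 3), [200.0, 150.0, 100.0])

# Broadcasting: (1, 2, 2, 3) - (1, 1, 1, 3) -> (1, 2, 2, 3)
centered = image - MEAN_VALUES
print(centered.shape)      # (1, 2, 2, 3)
print(centered[0, 0, 0])   # about [76.32, 33.221, -3.939]

# Adding the means back restores the original values (up to float rounding)
print(np.allclose(centered + MEAN_VALUES, image))  # True
```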

Next, define the function that loads the model; the code is annotated below:

def load_vgg_model(path):
  '''
  Details of the VGG19 model:
  - 0 is conv1_1 (3, 3, 3, 64)
  - 1 is relu
  - 2 is conv1_2 (3, 3, 64, 64)
  - 3 is relu    
  - 4 is maxpool
  - 5 is conv2_1 (3, 3, 64, 128)
  - 6 is relu
  - 7 is conv2_2 (3, 3, 128, 128)
  - 8 is relu
  - 9 is maxpool
  - 10 is conv3_1 (3, 3, 128, 256)
  - 11 is relu
  - 12 is conv3_2 (3, 3, 256, 256)
  - 13 is relu
  - 14 is conv3_3 (3, 3, 256, 256)
  - 15 is relu
  - 16 is conv3_4 (3, 3, 256, 256)
  - 17 is relu
  - 18 is maxpool
  - 19 is conv4_1 (3, 3, 256, 512)
  - 20 is relu
  - 21 is conv4_2 (3, 3, 512, 512)
  - 22 is relu
  - 23 is conv4_3 (3, 3, 512, 512)
  - 24 is relu
  - 25 is conv4_4 (3, 3, 512, 512)
  - 26 is relu
  - 27 is maxpool
  - 28 is conv5_1 (3, 3, 512, 512)
  - 29 is relu
  - 30 is conv5_2 (3, 3, 512, 512)
  - 31 is relu
  - 32 is conv5_3 (3, 3, 512, 512)
  - 33 is relu
  - 34 is conv5_4 (3, 3, 512, 512)
  - 35 is relu
  - 36 is maxpool
  - 37 is fullyconnected (7, 7, 512, 4096)
  - 38 is relu
  - 39 is fullyconnected (1, 1, 4096, 4096)
  - 40 is relu
  - 41 is fullyconnected (1, 1, 4096, 1000)
  - 42 is softmax
  '''
  vgg = scipy.io.loadmat(path)
  vgg_layers = vgg['layers']
  # Fetch one layer's weights, bias and name from the loaded .mat data
  def _weights(layer, expected_layer_name):
     W = vgg_layers[0][layer][0][0][2][0][0]
     b = vgg_layers[0][layer][0][0][2][0][1]
     layer_name = vgg_layers[0][layer][0][0][0][0]
     assert layer_name == expected_layer_name
     return W, b
  # Wrap the loaded weights as TF constants; return the ReLU output of the convolution
  def _conv2d_relu(prev_layer, layer, layer_name):
     W, b = _weights(layer, layer_name)
     W = tf.constant(W)
     b = tf.constant(np.reshape(b, (b.size)))
     return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)
  # Pooling layer: average pooling is used in place of VGG's original max pooling
  def _avgpool(prev_layer):
     return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
  # Build the network as a dict mapping layer names to their output tensors
  graph = {}
  graph['input']    = tf.Variable(np.zeros((1, IMAGE_H, IMAGE_W, COLOR_C)), dtype='float32')
  # 'conv1_1' uses VGG layer 0 and takes graph['input'] as its input
  graph['conv1_1']  = _conv2d_relu(graph['input'], 0, 'conv1_1')
  graph['conv1_2']  = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
  graph['avgpool1'] = _avgpool(graph['conv1_2'])
  graph['conv2_1']  = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
  graph['conv2_2']  = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
  graph['avgpool2'] = _avgpool(graph['conv2_2'])
  graph['conv3_1']  = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
  graph['conv3_2']  = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
  graph['conv3_3']  = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
  graph['conv3_4']  = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
  graph['avgpool3'] = _avgpool(graph['conv3_4'])
  graph['conv4_1']  = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
  graph['conv4_2']  = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
  graph['conv4_3']  = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
  graph['conv4_4']  = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
  graph['avgpool4'] = _avgpool(graph['conv4_4'])
  graph['conv5_1']  = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
  graph['conv5_2']  = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
  graph['conv5_3']  = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
  graph['conv5_4']  = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
  graph['avgpool5'] = _avgpool(graph['conv5_4'])

  return graph
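Each `_avgpool` call uses stride 2 with `padding='SAME'`, which in TensorFlow gives an output size of ceil(input / 2). The short sketch below (the helper `pooled_size` is hypothetical) traces the spatial size of the 300×400 input through the first three pooling layers, which is the resolution the `conv4_x` layers, including `conv4_2` used by the content loss, operate at:

```python
import math

def pooled_size(h, w, num_pools):
    """Spatial size after num_pools stride-2 pooling layers with 'SAME' padding."""
    for _ in range(num_pools):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return h, w

IMAGE_H, IMAGE_W = 300, 400
# conv4_x layers come after avgpool1, avgpool2 and avgpool3
print(pooled_size(IMAGE_H, IMAGE_W, 3))  # (38, 50)
print(pooled_size(600, 800, 3))          # (75, 100)
```

So for the 300×400 images configured above, `conv4_2` has shape (1, 38, 50, 512); a shape of (1, 75, 100, 512) would correspond to a 600×800 input.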

Next, define the loss functions:

# Content loss: arguments are the TF session and the VGG model graph; returns a loss tensor
def content_loss_func(sess, model):
  # p: conv4_2 activations of the content image (a NumPy array from sess.run)
  # x: the conv4_2 tensor, evaluated on the image being optimized
  def _content_loss(p, x):
     # For a 300x400 input, conv4_2 has shape (1, 38, 50, 512):
     # N = 512 is the number of filters (channels), M = 38 * 50 is the feature-map area
     N = p.shape[3]
     M = p.shape[1] * p.shape[2]
     return (1 / (4 * N * M)) * tf.reduce_sum(tf.pow(x - p, 2))
  return _content_loss(sess.run(model['conv4_2']), model['conv4_2'])

STYLE_LAYERS = [('conv1_1', 0.5), ('conv2_1', 1.0), ('conv3_1', 1.5), ('conv4_1', 3.0), ('conv5_1', 4.0)]
# Returns the sum of _style_loss over these layers, weighted by 0.5, 1.0, 1.5, 3.0 and 4.0
def style_loss_func(sess, model):
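  # Gram matrix: reshape to (M, N) and compute F^T F, an N x N channel-correlation matrix
  def _gram_matrix(F, N, M):
     Ft = tf.reshape(F, (M, N))
     return tf.matmul(tf.transpose(Ft), Ft)
  # a: activations of the style image (NumPy array); x: the same layer's tensor for the
  # generated image; evaluated for each of conv1_1, conv2_1, conv3_1, conv4_1, conv5_1
  def _style_loss(a, x):
     # N and M are defined as in the content loss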
     N = a.shape[3]
     M = a.shape[1] * a.shape[2]
     A = _gram_matrix(a, N, M)
     G = _gram_matrix(x, N, M)
     return (1 / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow(G - A, 2))

  return sum([_style_loss(sess.run(model[layer_name]), model[layer_name]) * w for layer_name, w in STYLE_LAYERS])
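Written out, the content loss is 1/(4NM) · Σ(x − p)², and the style loss of one layer is 1/(4N²M²) · Σ(G − A)², where A and G are the Gram matrices FᵀF of the style and generated feature maps reshaped to M×N. The NumPy sketch below re-implements both formulas on small random arrays so they can be checked without TensorFlow; the function names, shapes and the two-layer dictionary are illustrative assumptions, not part of the original code:

```python
import numpy as np

def content_loss_np(p, x):
    """Same formula as _content_loss, for arrays shaped (1, H, W, N)."""
    N = p.shape[3]                # number of filters (channels)
    M = p.shape[1] * p.shape[2]   # feature-map area H * W
    return np.sum((x - p) ** 2) / (4 * N * M)

def gram_matrix(F):
    """Gram matrix of a (1, H, W, N) feature map: an N x N array."""
    N = F.shape[3]
    M = F.shape[1] * F.shape[2]
    Ft = F.reshape(M, N)          # one row per spatial position
    return Ft.T @ Ft

def style_loss_np(a, x):
    """Same formula as _style_loss: a = style features, x = generated features."""
    N = a.shape[3]
    M = a.shape[1] * a.shape[2]
    A, G = gram_matrix(a), gram_matrix(x)
    return np.sum((G - A) ** 2) / (4 * N ** 2 * M ** 2)

rng = np.random.default_rng(0)
p = rng.random((1, 4, 5, 8))      # stand-in content activations
x = rng.random((1, 4, 5, 8))      # stand-in generated activations
print(content_loss_np(p, p), content_loss_np(p, x))  # first value is 0.0

# Hypothetical (style, generated) activation pairs for two layers,
# weighted with the first two weights from STYLE_LAYERS
layers = {
    'conv1_1': (rng.random((1, 8, 8, 4)), rng.random((1, 8, 8, 4))),
    'conv2_1': (rng.random((1, 4, 4, 6)), rng.random((1, 4, 4, 6))),
}
weights = {'conv1_1': 0.5, 'conv2_1': 1.0}
style_total = sum(style_loss_np(a, g) * weights[name]
                  for name, (a, g) in layers.items())
print(gram_matrix(layers['conv1_1'][0]).shape)  # (4, 4)
print(style_total)
```

Because the Gram matrix sums over spatial positions, it ignores where a feature occurs and keeps only which channels fire together; that is why it captures texture and style rather than layout.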

Next, define the functions that create the initial image, load images and save images:

# Create the initial image: random noise blended with the content image
def generate_noise_image(content_image, noise_ratio=NOISE_RATIO):
   # Uniform random noise in [-20, 20), same shape as the network input
   noise_image = np.random.uniform(-20, 20, (1, IMAGE_H, IMAGE_W, COLOR_C)).astype('float32')
   # Weighted blend: noise_ratio of noise plus (1 - noise_ratio) of the content image
   input_image = noise_image * noise_ratio + content_image * (1 - noise_ratio)
   return input_image
# Read an image, resize it, add a batch dimension and subtract the mean values
def load_image(path):
   # mode='RGB' forces 3 channels (e.g. drops the alpha channel of a PNG)
   image = scipy.misc.imread(path, mode='RGB')
   image = scipy.misc.imresize(image, (IMAGE_H, IMAGE_W))
   # image.shape is (300, 400, 3), so (1, ) + image.shape is (1, 300, 400, 3)
   image = np.reshape(image, ((1, ) + image.shape))
   # MEAN_VALUES has shape (1, 1, 1, 3), so broadcasting subtracts
   # each channel's mean from every pixel
   image = image - MEAN_VALUES
   return image
# Save an image
def save_image(path, image):
   image = image + MEAN_VALUES
   # Drop the batch dimension added in load_image
   image = image[0]
   # The optimized values can leave the valid pixel range, so clip them to 0-255
   image = np.clip(image, 0, 255).astype('uint8')
   # Write the file
   scipy.misc.imsave(path, image)
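`load_image` and `save_image` are inverses apart from resizing, and `generate_noise_image` blends noise into the already preprocessed content image. The NumPy-only sketch below checks both on a tiny random array instead of real files; `preprocess` and `postprocess` are hypothetical helpers mirroring the two functions, and unlike `save_image` the sketch rounds before casting to uint8 (a plain `astype` truncates, so a value such as 4.9999999 would become 4):

```python
import numpy as np

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))
NOISE_RATIO = 0.7

def preprocess(image):
    """(H, W, 3) uint8 -> (1, H, W, 3) float, mean-subtracted, like load_image."""
    return np.reshape(image, (1,) + image.shape) - MEAN_VALUES

def postprocess(batch):
    """(1, H, W, 3) float -> (H, W, 3) uint8, like save_image without the file write."""
    image = (batch + MEAN_VALUES)[0]
    return np.clip(np.rint(image), 0, 255).astype('uint8')

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(3, 4, 3), dtype=np.uint8)
content = preprocess(original)

# Round trip: preprocessing followed by postprocessing restores the pixels
restored = postprocess(content)
print(np.array_equal(original, restored))  # True

# Same blend as generate_noise_image: 70% uniform noise + 30% content
noise = rng.uniform(-20, 20, content.shape)
start = noise * NOISE_RATIO + content * (1 - NOISE_RATIO)
print(start.shape)  # (1, 3, 4, 3)

# Values pushed far outside the pixel range are clipped to 255
print(postprocess(content + 1000).max())  # 255
```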

Finally, build the losses and run the optimization:

# Start the TensorFlow session
with tf.Session() as sess:
   # Read the images; each is mean-subtracted with shape (1, 300, 400, 3)
   content_image = load_image(CONTENT_IMG)
   style_image = load_image(STYLE_IMG)
   # Build the VGG19 graph: a dict mapping layer names to tensors
   model = load_vgg_model(VGG_MODEL)
   # Initial image: random noise blended with the content image
   input_image = generate_noise_image(content_image)
   # Initialize variables (at this point the only one is model['input'])
   sess.run(tf.global_variables_initializer())
   # Feed the content image through the network to fix the content target (conv4_2)
   sess.run(model['input'].assign(content_image))
   content_loss = content_loss_func(sess, model)
   # Feed the style image through the network to fix the style targets
   sess.run(model['input'].assign(style_image))
   style_loss = style_loss_func(sess, model)
   # Total loss: weighted sum of the content loss and the style loss
   total_loss = BETA * content_loss + ALPHA * style_loss
   # Adam optimizer with learning rate 2.0
   optimizer = tf.train.AdamOptimizer(2.0)
   # Minimize the total loss; the VGG weights are constants, so only the
   # input image is updated
   train = optimizer.minimize(total_loss)

   # Initialize again so the variables added by Adam are initialized too
   sess.run(tf.global_variables_initializer())
   # Start the optimization from the noisy initial image
   sess.run(model['input'].assign(input_image))

   ITERATIONS = 2000
   # Run 2000 optimization steps
   for i in range(ITERATIONS):
      # One Adam step; the loss is fetched in the same run instead of a second sess.run
      _, cost = sess.run([train, total_loss])
      print('Iteration %d, cost: %s' % (i, cost))
      if i % 100 == 0:
         # Every 100 iterations, save the current image
         output_image = sess.run(model['input'])
         save_image(os.path.join(OUTPUT_DIR, 'output_%d.jpg' % i), output_image)

The final result is shown below:

[Figure: the original image (left) and the generated image (right)]

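On the left is an image taken from television; on the right is the generated image, and the result looks reasonably convincing. The main idea of the program is as follows. We start from an image made of random noise blended with the content image and pass it through the pretrained VGG19 network, whose weights stay fixed. Its deep features are compared with those of the content image (conv4_2 activations) and of the style image (Gram matrices of conv1_1 through conv5_1), giving a content loss and a style loss. The optimizer then adjusts the pixel values of the image itself to reduce the weighted sum of the two losses; as the iterations accumulate, the image gradually takes on the content of one picture and the style of the other, and the result is saved as an image file.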
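The key point is that the trainable variable is the image, not the network. The toy sketch below illustrates this idea; it is not the article's code: a fixed random matrix `W` stands in for VGG19, half the squared feature difference stands in for the content loss, and plain gradient descent replaces Adam. All names, sizes and the step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))   # fixed "network": never updated
p = rng.normal(size=5)        # target "image" whose features we want to match
x = rng.normal(size=5)        # the "image" being optimized

def loss(v):
    """Half the squared distance between the features of v and those of p."""
    diff = W @ v - W @ p
    return 0.5 * diff @ diff

initial = loss(x)
lr = 0.01                     # step size (assumed; small enough to be stable here)
for step in range(1000):
    grad = W.T @ (W @ x - W @ p)  # gradient of the loss with respect to x only
    x -= lr * grad                # update the input; W stays untouched

print(initial, loss(x))           # the loss has decreased
```

In the TensorFlow code the same role is played by `graph['input']`, the only `tf.Variable` in the graph, which is why `optimizer.minimize(total_loss)` changes the picture while the VGG19 weights stay as loaded.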

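About the author: Li Qiujian (李秋键), CSDN blog expert and CSDN Daren Course author, is a master's student at China University of Mining and Technology. He has developed an Android martial-arts (wuxia) game, a VIP video parsing tool and a text-rewriting writing robot, among other projects, has published several papers, and has won prizes in advanced mathematics competitions several times.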

Statement: This article is an original submission by the author; do not reproduce it without permission.
