Image Style Transfer Paper: Core Ideas and an Explanation of the Code (Python)

jianafeng

Category column: image_style_transfer_CNN. Tags: machine learning, deep learning, python, artificial intelligence, computer vision

Article link: https://blog.csdn.net/Jiana_Feng/article/details/111659872

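Preface

This post implements image style transfer following the paper by Gatys et al., "A Neural Algorithm of Artistic Style" (https://arxiv.org/pdf/1508.06576v2.pdf).
The basic idea is to extract the content of one photo and the style of another, then combine them into a single, interesting image. Prisma, an app that was very popular a few years ago, is a productized version of style transfer. In addition, the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution", co-authored by Fei-Fei Li [1], can be drawn on to speed up the transformation and make industrial use feasible.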

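Part 1

First, see my previous post: https://blog.csdn.net/Jiana_Feng/article/details/110069450
That post explains the ideas and methods of the paper in detail.
The paper uses a pretrained VGG model. Content is extracted from one of the later layers. The paper argues that after an image passes through the deep layers of the pretrained model, the activations mostly capture the content of the image, whereas the shallow layers mostly capture color and shape. Below we use Python to check this idea.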

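Python verification

Below we use a white image as the canvas and a photo of a dog as the source. If the dog's content shows up on the white image, content extraction worked. If the background style of the dog photo shows up on the white image, style extraction worked.

Content extraction

Import packages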

# Import packages (the first line is a Jupyter magic for inline plots)
%matplotlib inline
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import IPython.display as display

Load the pretrained VGG19

vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet') 
vgg.trainable = False

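Build the content model

# Choose which deep layer to extract content from
content_layers = ['block5_conv2']
# Build the model
model_content = tf.keras.Model([vgg.input], vgg.get_layer(content_layers[0]).output)

The model is now built. Next comes image preprocessing, with the dog photo as the image to extract from.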

content_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')

Image preprocessing

First, process the content image

def image_process(path_to_img):
  img = tf.io.read_file(path_to_img)
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)  # scales pixel values into the range 0-1
  img = tf.image.resize(img, (512,512))  # resize to 512x512
  img = img[tf.newaxis, :]  # add a batch dimension so the model accepts it
  return img

Apply the VGG19 input normalization so the input matches what the pretrained network expects

image = image_process(content_path)
image *= 255
image = tf.keras.applications.vgg19.preprocess_input(image)
# Feed it through the content model
image_update = model_content(image)
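For reference, here is a minimal NumPy sketch of what `tf.keras.applications.vgg19.preprocess_input` does to a 0-255 RGB image in its default mode: it reorders the channels from RGB to BGR and subtracts a per-channel ImageNet mean, with no further scaling. The helper name `vgg_preprocess_sketch` is hypothetical, and the mean values are the ones commonly quoted for Caffe-style VGG preprocessing, so treat this as an illustration rather than the library's own code.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values, as commonly
# quoted for Caffe-style VGG preprocessing).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_preprocess_sketch(rgb_0_255):
    """Hypothetical helper: RGB -> BGR, then subtract the channel means."""
    x = np.asarray(rgb_0_255, dtype=np.float32)
    bgr = x[..., ::-1]               # reverse the last (channel) axis
    return bgr - IMAGENET_MEAN_BGR   # zero-center, no scaling

# A 2x2 pure-white image with a batch dimension, like the white canvas.
white = np.full((1, 2, 2, 3), 255.0, dtype=np.float32)
out = vgg_preprocess_sketch(white)
print(out[0, 0, 0])
```

This is why the code multiplies the 0-1 image by 255 first: the mean subtraction assumes pixel values on the 0-255 scale.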

Next, process the canvas image

path = './white.jpeg'
target = image_process(path)
# Wrap it in a tf.Variable so it can be updated iteratively
obj = tf.Variable(target, name = 'var')

Define the loss function

def get_loss_function(img):
  # Apply the same normalization to the target image, then feed it through the content model
  target = img*255
  target = tf.keras.applications.vgg19.preprocess_input(target)
  target_update = model_content(target)
  # Content loss: mean squared error between the two feature maps
  loss = tf.reduce_mean(((image_update[0]) - target_update[0])**2)
  return loss
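The content loss above is just a mean squared error between two feature maps. A minimal NumPy sketch with tiny made-up arrays (the function name `content_loss_sketch` and the numbers are illustrative assumptions, not values from VGG19) makes the arithmetic explicit:

```python
import numpy as np

def content_loss_sketch(source_features, target_features):
    """Hypothetical helper: mean squared difference of two feature maps."""
    source = np.asarray(source_features, dtype=np.float64)
    target = np.asarray(target_features, dtype=np.float64)
    return float(np.mean((source - target) ** 2))

# Toy 2x2 "feature maps"; real ones come from block5_conv2.
source = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [3.0, 2.0]])

# Squared differences are 0, 4, 0, 4, so the mean is 2.0.
loss_value = content_loss_sketch(source, target)
print(loss_value)
```

The loss reaches zero only when the canvas produces the same deep-layer activations as the dog photo, which is what the optimization pushes toward.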

Update the image iteratively according to the loss

# Define the optimizer
opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
# Keep pixel values within 0-1
def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)

# Define a single training step
def train_step(image):
  with tf.GradientTape() as tape:
    loss = get_loss_function(image)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))

# Run 1500 iterations
iteration = 1500
for i in range(iteration):
  print(i)
  train_step(obj)
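Note that the loop optimizes the pixels of the image, not the weights of the network. The following NumPy sketch shows the same pattern on a toy problem, under loud assumptions: a random linear map `W` stands in for the frozen VGG19 layers, plain gradient descent stands in for Adam, and all names (`W`, `x_source`, `loss_fn`, `grad_fn`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))           # stand-in for the frozen feature extractor (assumption)
x_source = rng.uniform(0.0, 1.0, 16)   # the "photo" whose features we want to match
f_source = W @ x_source                # its fixed target features

def loss_fn(x):
    # Mean squared error between the canvas features and the source features.
    return float(np.mean((W @ x - f_source) ** 2))

def grad_fn(x):
    # Analytic gradient of loss_fn with respect to the "pixels" x.
    return 2.0 / W.shape[0] * W.T @ (W @ x - f_source)

# Step size 1/L, where L is the Lipschitz constant of the gradient.
lr = W.shape[0] / (2.0 * np.linalg.norm(W, 2) ** 2)

x = np.ones(16)                        # start from a "white" canvas
initial_loss = loss_fn(x)
for _ in range(500):
    x = x - lr * grad_fn(x)            # update the pixels, not W
    x = np.clip(x, 0.0, 1.0)           # same role as clip_0_1
final_loss = loss_fn(x)
print(initial_loss, final_loss)
```

The clipping step after each update plays the same role as `clip_0_1`: it keeps the canvas a valid image throughout the optimization.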

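Visualization

# Show the image
plt.figure(figsize=(8,10))
img = tf.squeeze(obj,axis=0)
plt.imshow(img)

[Image: the white canvas after content optimization]
As the figure shows, the dog's content features were successfully extracted onto the blank canvas, and the dog is visible. Next, let's see how to extract the style of a photo, again using the dog photo as the source and the blank image as the canvas.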

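Style extraction

Everything else stays the same, but the model has to be rebuilt, and a Gram matrix is added to the loss function as the style target for the iterations.

Rebuild the model: style model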

style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1', 
                'block4_conv1', 
                'block5_conv1']
# Build the style model
outputs = [vgg.get_layer(name).output for name in style_layers]
model_style = tf.keras.Model([vgg.input], outputs)
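To get a feel for what these five layers output, the sketch below computes the expected feature-map shape of each one for a 512x512 input. It assumes the standard VGG19 layout (same-padded convolutions, a 2x2 max pooling at the end of every block, and 64/128/256/512/512 channels per block); the helper `vgg19_feature_shape` is hypothetical and does not call TensorFlow.

```python
# Channel count of each VGG19 convolution block (standard architecture).
BLOCK_CHANNELS = {1: 64, 2: 128, 3: 256, 4: 512, 5: 512}

def vgg19_feature_shape(layer_name, height=512, width=512):
    """Hypothetical helper: (H, W, C) of a 'blockN_convM' layer output."""
    block = int(layer_name.split('_')[0].replace('block', ''))
    scale = 2 ** (block - 1)   # each earlier block halves height and width
    return (height // scale, width // scale, BLOCK_CHANNELS[block])

style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']
for name in style_layers:
    print(name, vgg19_feature_shape(name))
```

The shallow layers keep the full spatial resolution with few channels, while the deep layers are spatially coarse with many channels, which matches the intuition that they encode different kinds of information.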

Define the Gram matrix

def gram_matrix(input_tensor):
  channels = int(input_tensor.shape[-1])
  a = tf.reshape(input_tensor, [-1, channels])
  n = tf.shape(a)[0]
  gram = tf.matmul(a, a, transpose_a=True)
  return gram / tf.cast(n, tf.float32)
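As a sanity check, here is a NumPy sketch of the same computation on a tiny feature map that can be verified by hand. The name `gram_matrix_np` is hypothetical; it is meant to mirror `gram_matrix` above, not to replace it.

```python
import numpy as np

def gram_matrix_np(features):
    """Hypothetical NumPy mirror of gram_matrix: (C, C) channel correlations."""
    features = np.asarray(features, dtype=np.float64)
    channels = features.shape[-1]
    a = features.reshape(-1, channels)   # one row per spatial position
    return a.T @ a / a.shape[0]          # average over positions

# A 1x2x2x2 feature map: 4 positions, 2 channels.
# Rows after reshape: [0,1], [2,3], [4,5], [6,7].
features = np.arange(8, dtype=np.float64).reshape(1, 2, 2, 2)
gram = gram_matrix_np(features)
print(gram)  # [[14. 17.] [17. 21.]]
```

Entry (i, j) measures how strongly channels i and j fire together, averaged over all positions, so the result has one row and one column per channel regardless of the image size.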

Compute the Gram matrices of the source photo

style_outputs = model_style(image)
style_G = [gram_matrix(style_output) for style_output in style_outputs]

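The loss function also has to be rewritten

def get_loss_function(img):
  target = img*255
  target = tf.keras.applications.vgg19.preprocess_input(target)
  # Run the target through the style model to get its feature maps
  target_outputs = model_style(target)
  target_G = [gram_matrix(target_output) for target_output in target_outputs]
  # Average the per-layer mean squared differences between Gram matrices
  loss = tf.add_n([(1/len(target_G)) * tf.reduce_mean((target_G[i]-style_G[i])**2) for i in range(len(target_G))])
  return loss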
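The style loss gives every layer the same weight, 1/len(target_G), and sums the per-layer mean squared differences. A small NumPy sketch with made-up Gram matrices (the helper `style_loss_sketch` and the inputs are illustrative assumptions) shows the arithmetic:

```python
import numpy as np

def style_loss_sketch(target_grams, style_grams):
    """Hypothetical helper: equally weighted MSE over per-layer Gram matrices."""
    n = len(target_grams)
    per_layer = [np.mean((np.asarray(t) - np.asarray(s)) ** 2)
                 for t, s in zip(target_grams, style_grams)]
    return float(sum(per_layer) / n)

# Two toy "layers" with Gram matrices of different sizes.
style_grams = [np.eye(2), np.ones((3, 3))]
target_grams = [np.zeros((2, 2)), np.ones((3, 3))]

# Layer 1: mean of [1, 0, 0, 1] = 0.5; layer 2: 0. Total: 0.5 / 2 = 0.25.
style_loss_value = style_loss_sketch(target_grams, style_grams)
print(style_loss_value)
```

Because each layer's difference is averaged before weighting, layers with larger Gram matrices do not dominate the sum simply by having more entries.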

Start iterating

opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)

def train_step(image):
  with tf.GradientTape() as tape:
    loss = get_loss_function(image)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image)) 
obj = tf.Variable(target, name = 'var')
iteration = 1500
for i in range(iteration):
  print(i)
  train_step(obj)

Visualization

plt.figure(figsize=(6,6))
img = tf.squeeze(obj,axis=0)
plt.imshow(img)

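[Image: the white canvas after style optimization]

As the figure shows, the content is no longer visible; what appears is mainly the background of the photo. The grass is clearly recognizable, and the background patches are scrambled at random.
Comparing with the original photo shows that both extraction methods work.
The original image:
[Image: the original dog photo]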
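One plausible reason the style result looks like scrambled patches is that a Gram matrix discards spatial layout: it sums over all positions, so shuffling the positions of a feature map leaves it unchanged. The NumPy sketch below checks this on random data; `gram_matrix_np` is a hypothetical mirror of the `gram_matrix` function, and the random features are an assumption standing in for real VGG19 activations.

```python
import numpy as np

def gram_matrix_np(features):
    """Hypothetical NumPy mirror of gram_matrix: (C, C) channel correlations."""
    features = np.asarray(features, dtype=np.float64)
    channels = features.shape[-1]
    a = features.reshape(-1, channels)
    return a.T @ a / a.shape[0]

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 4, 4, 3))   # toy feature map: 16 positions, 3 channels

# Shuffle the 16 spatial positions while keeping each position's channel vector intact.
flat = features.reshape(-1, 3)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(1, 4, 4, 3)

same_gram = np.allclose(gram_matrix_np(features), gram_matrix_np(shuffled))
same_layout = np.allclose(features, shuffled)
print(same_gram, same_layout)
```

The two feature maps differ in layout but have the same Gram matrix, so a loss built only on Gram matrices cannot tell them apart. This holds exactly for shuffled feature positions; for image pixels it is only an intuition, since convolutions mix neighboring pixels.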

So we combine the two: extract the style of one photo and the content of another, and merge them into an interesting image. That is the job of Part 2.

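Part 2

First choose the two photos. The content photo is the dog photo, from which we want to extract the dog's appearance. The style photo is Kandinsky's Composition 7, from which we want to extract its chaotic, colorful background.

Choose the two photos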

content_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
style_path = tf.keras.utils.get_file('kandinsky5.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg')

Process the photos with the same image_process function as above

style = image_process(style_path)
style = style*255
style =  tf.keras.applications.vgg19.preprocess_input(style)

content = image_process(content_path)
content = content*255
content =  tf.keras.applications.vgg19.preprocess_input(content)
# Use the content photo as the starting canvas
target = image_process(content_path)

Define the two models

#(1)vgg pretrain model 
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet') 
vgg.trainable = False

#(2)define which layer to extract 
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1', 
                'block4_conv1', 
                'block5_conv1']

content_layers = ['block5_conv2']

#(3)extract the defined layers and create into models 

# 3.1 build model for image style 
outputs = [vgg.get_layer(name).output for name in style_layers]
model_style = tf.keras.Model([vgg.input], outputs)
# 3.2 build model for image content 
model_content = tf.keras.Model([vgg.input], vgg.get_layer(content_layers[0]).output)

The Gram matrix function is exactly the same as above.

#get the style gram values 
style_outputs = model_style(style)
style_G = [gram_matrix(style_output) for style_output in style_outputs]    # list of 5 matrices, each of shape (filters_num, filters_num)

#get the content after content_model 
content_update = model_content(content)

Now that the outputs of both models are available, define the total loss function

Define the loss function

def get_loss_function(img, alpha=1e4, beta=0.01, style_weights=None ):
  '''
  -param: alpha is for content weights in total loss 
  -param: beta is for style weights in total loss 
  -param: style_weights is to tune gram matrix in style 
  '''

  #do the same thing on target (both gram matrix and content model)  
  target = img*255
  target=  tf.keras.applications.vgg19.preprocess_input(target)
  target_outputs = model_style(target)

  target_G = [gram_matrix(target_output) for target_output in target_outputs] 

  #get the style loss 
  if style_weights is None:
    #take the mean weight 
    loss_style = tf.add_n([(1/len(target_G)) * tf.reduce_mean((target_G[i]-style_G[i])**2) for i in range(len(target_G))])
  else:
    #use style weights we set
    loss_style = tf.add_n([style_weights[i] * tf.reduce_sum((style_G[i] - target_G[i])**2) for i in range(len(target_G))])

  #get the content loss 
  target_update = model_content(target)
  loss_content = tf.reduce_mean((content_update[0] - target_update[0])**2)
  #get the total loss 
  loss_total = alpha * loss_content + beta*loss_style
  return loss_total

Start the iterative updates

opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)

# alpha and beta default to the same values as in get_loss_function
def train_step(image, alpha=1e4, beta=0.01):
  with tf.GradientTape() as tape:
    loss = get_loss_function(image,alpha,beta)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))

obj = tf.Variable(target, name = 'var')
iteration = 1000
for i in range(iteration):
  print(i)
  train_step(obj)

plt.figure(figsize=(6,6))
img = tf.squeeze(obj,axis=0)
plt.imshow(img)

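The resulting image:
[Image: the stylized result]
For comparison, the two original images:

[Image: the content photo and the style photo]

Combining these two images gives the result above.

Supplement

Adding a variation loss makes the combined image smoother and more vivid. This is easy to do, because tf already provides a total variation loss function; only this part needs to change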

def train_step(image, variation_weight=30):
  with tf.GradientTape() as tape:
    loss = get_loss_function(image)
    #=====================================================================
    # Add this term and its weight; keep it a tensor so the gradient flows through it
    loss += variation_weight * tf.image.total_variation(image)[0]
    #=====================================================================

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))
  return loss
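To see what the total variation term penalizes, here is a NumPy sketch that sums the absolute differences between vertically and horizontally neighboring pixels, which is how `tf.image.total_variation` is documented to behave for a batch of images. The helper `total_variation_np` and the two toy images are illustrative assumptions.

```python
import numpy as np

def total_variation_np(images):
    """Hypothetical helper: per-image sum of absolute neighbor differences."""
    images = np.asarray(images, dtype=np.float64)          # shape (batch, H, W, C)
    dh = images[:, 1:, :, :] - images[:, :-1, :, :]        # vertical neighbors
    dw = images[:, :, 1:, :] - images[:, :, :-1, :]        # horizontal neighbors
    return np.abs(dh).sum(axis=(1, 2, 3)) + np.abs(dw).sum(axis=(1, 2, 3))

# A perfectly smooth image and a 4x4 checkerboard, both with 3 channels.
flat = np.full((1, 4, 4, 3), 0.5)
checker = np.indices((4, 4)).sum(axis=0) % 2
checker = np.repeat(checker[np.newaxis, :, :, np.newaxis], 3, axis=3).astype(np.float64)

tv_flat = total_variation_np(flat)
tv_checker = total_variation_np(checker)
print(tv_flat, tv_checker)  # [0.] [72.]
```

A smooth image scores zero while a noisy, high-frequency one scores high, so adding this term with a positive weight nudges the optimizer toward smoother results.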