TensorFlow (Neural Networks) Study Notes (4): Convolutional Image Style Transfer (Notes)


cmzz


Columns: Machine Learning, Notes. Tags: tensorflow, study notes

Article link: https://blog.csdn.net/qq_40965177/article/details/102884191


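Applications of convolutional neural networks

Supervised: image classification, image recognition
Unsupervised: image style transfer, image inpainting, face swapping, image super-resolution, image-to-image translation, text-to-image generation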

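What a convolutional neural network learns

The first layer of the network learns color.
The second layer of the network learns edge information.

The deeper the layer, the more the network has learned, and the more abstract its features become.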

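Image style transfer, V1

Content features: the activations of a given CNN layer when the image is fed through the network.
Style features (what images of one style have in common): the correlations between the activations of a given CNN layer.
Style transfer

When an image is reconstructed from content features, the higher the layer the features come from, the worse the reconstruction. For style features the opposite holds.
Fij is the activation output by layer j of the network when image i is fed in, and Pij is the activation of the other image at the same layer j. Minimizing a loss built from the difference Fij - Pij (the script below uses the mean of (Fij - Pij)^2) makes the content of the two images similar.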
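As a minimal sketch of the content loss just described, the NumPy snippet below computes the mean squared difference between two activation tensors. The function name `content_loss` and the tiny all-zeros and all-ones activations are assumptions made up for illustration; the script further down does the same computation on real VGG activations with `tf.reduce_mean`.

```python
import numpy as np

def content_loss(generated_features, content_features):
    """Mean squared difference between two activation tensors of equal shape."""
    generated_features = np.asarray(generated_features, dtype=np.float64)
    content_features = np.asarray(content_features, dtype=np.float64)
    return float(np.mean((generated_features - content_features) ** 2))

# Hypothetical activations with shape (batch, height, width, channels).
content_act = np.zeros((1, 2, 2, 3))
generated_act = np.ones((1, 2, 2, 3))

loss_far = content_loss(generated_act, content_act)   # every entry differs by 1
loss_same = content_loss(content_act, content_act)    # identical activations
```

The loss is zero only when the two sets of activations agree everywhere, which is what pushes the generated image toward the content image.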

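Style features are computed from correlations, using the Gram matrix. Fik and Fjk are the outputs of two of the layer's convolution kernels (channels i and j, evaluated at position k); taking the inner product of every pair of channels gives the style matrix G_ij = sum over k of F_ik * F_jk. Aij is the style of the image to be generated, and the difference between the two matrices is then minimized.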
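The Gram matrix computation can be sketched in NumPy as follows. This is an illustrative version under stated assumptions, not the author's code: `gram_matrix_np` and `style_loss` are hypothetical names, the random activations stand in for real VGG features, and the division by `h * w * ch` simply mirrors the normalization used by `gram_matrix` in the script further down.

```python
import numpy as np

def gram_matrix_np(features):
    """Channel-by-channel Gram matrix of a [height, width, channels] array."""
    h, w, ch = features.shape
    flat = features.reshape(h * w, ch).astype(np.float64)  # one row per position
    return flat.T @ flat / (h * w * ch)                     # shape [ch, ch]

def style_loss(generated_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix_np(generated_features) - gram_matrix_np(style_features)
    return float(np.mean(diff ** 2))

# Random stand-ins for the activations of one layer (4 x 4 positions, 3 channels).
rng = np.random.default_rng(0)
style_act = rng.normal(size=(4, 4, 3))
generated_act = rng.normal(size=(4, 4, 3))

g = gram_matrix_np(style_act)
```

Entry `g[i, j]` is the normalized inner product of channel i with channel j, so the matrix is symmetric and its size depends only on the number of channels, not on the image size.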
Total loss: Ltotal = α * Lcontent (the mean content loss) + β * Lstyle (the mean style loss), that is, a weighted sum of the two.
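A small worked example may help show how the two weights trade off. The weights 0.1 and 500 are the `lambda_c` and `lambda_s` values from the script further down; the two loss values are hypothetical numbers chosen only so that the arithmetic is easy to follow.

```python
def total_loss(content_loss, style_loss, alpha, beta):
    """Weighted sum of the content and style losses."""
    return alpha * content_loss + beta * style_loss

# Hypothetical loss values: content losses on raw activations tend to be much
# larger than Gram-matrix style losses, hence the small alpha and the large beta.
combined = total_loss(content_loss=2000.0, style_loss=0.4, alpha=0.1, beta=500)
content_part = 0.1 * 2000.0
style_part = 500 * 0.4
```

With these numbers both terms contribute equally to the total; raising β relative to α would make the optimizer favor style over content.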

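The overall model is as follows:

Implementing the image style transfer V1 algorithm in TensorFlow
The implementation uses the existing pretrained VGG16 model.
The VGG structure is as follows:

VGG16 network architecture diagram

Here fc denotes a fully connected layer and conv denotes a convolution layer.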
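Since the architecture diagram is not reproduced here, the following sketch walks through the feature map shapes of the VGG16 convolutional part for a 224 x 224 input. It assumes the standard VGG16 layout (3x3 convolutions with stride 1 and SAME padding, 2x2 max pooling with stride 2), which matches the `conv_layer` and `pooling_layer` methods in the script below; the helper name `vgg16_feature_shapes` is made up for this illustration.

```python
import math

# (number of conv layers, output channels) for the five VGG16 conv blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_feature_shapes(height=224, width=224):
    """Return {layer_name: (height, width, channels)} for the conv and pool layers."""
    shapes = {}
    for block, (num_convs, channels) in enumerate(VGG16_BLOCKS, start=1):
        for conv in range(1, num_convs + 1):
            # Stride-1 SAME convolution keeps the spatial size unchanged.
            shapes["conv{}_{}".format(block, conv)] = (height, width, channels)
        # Stride-2 SAME pooling halves the spatial size (rounding up).
        height, width = math.ceil(height / 2), math.ceil(width / 2)
        shapes["pool{}".format(block)] = (height, width, channels)
    return shapes

shapes = vgg16_feature_shapes()
for name in ("conv1_2", "conv2_2", "conv4_3", "pool5"):
    print(name, shapes[name])
```

The layers printed are the ones the script uses for content (`conv1_2`, `conv2_2`) and style (`conv4_3`), plus the final pooling output that would feed the fully connected layers.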


import os
import math
import numpy as np
import tensorflow as tf
from PIL import Image
import time
# Per-channel means of the original model's training data, in BGR order
VGG_MEAN = [103.939, 116.779, 123.68]
class VGGNet:
    """Builds VGG-16 net structure,
        load parameters from pre-train models.    
    """
    def __init__(self, data_dict):
        self.data_dict = data_dict
    # Fetch the convolution kernel (filter weights) of a layer
    def get_conv_filter(self, name):
        return tf.constant(self.data_dict[name][0], name='conv')
    # Fetch the weights of a fully connected layer
    def get_fc_weight(self, name):
        return tf.constant(self.data_dict[name][0], name='fc')
    # Fetch the bias of a layer
    def get_bias(self, name):
        return tf.constant(self.data_dict[name][1], name='bias')
    
    # Convolution layer
    def conv_layer(self, x, name):
        """Builds convolution layer."""
        with tf.name_scope(name):
            conv_w = self.get_conv_filter(name)
            conv_b = self.get_bias(name)
            h = tf.nn.conv2d(x, conv_w, [1,1,1,1], padding='SAME')
            h = tf.nn.bias_add(h, conv_b)
            h = tf.nn.relu(h)
            return h
    # Pooling layer
    def pooling_layer(self, x, name):
        """Builds pooling layer"""
        return tf.nn.max_pool(x,
                              ksize=[1,2,2,1],
                              strides=[1,2,2,1],
                              padding='SAME',
                              name=name
                              )
    # Fully connected layer
    def fc_layser(self, x, name, activation=tf.nn.relu):
        """Builds fully connected layer."""
        with tf.name_scope(name):
            fc_w = self.get_fc_weight(name)
            fc_b = self.get_bias(name)
            h = tf.matmul(x, fc_w)
            h = tf.nn.bias_add(h, fc_b)
            if activation is None:
                return h
            else:
                return activation(h)
    # Flatten the feature map to [batch_size, features]
    def flatten_layer(self, x, name):
        with tf.name_scope(name):
            # x = [batch_size, image_width, image_height, channel]
            x_shape = x.get_shape().as_list()
            dim = 1
            for d in x_shape[1:]:
                dim *= d
            x = tf.reshape(x, [-1, dim])
            return x   
    def build(self, x_rgb):
        """
        Build VGG16 network structure.
        :param x_rgb: [1, 224, 224, 3]
        :return: 
        """
        start_time = time.time()
        print('building model ....')
        r, g, b = tf.split(x_rgb, [1,1,1], axis=3)
        x_bgr = tf.concat(
            [b - VGG_MEAN[0],
             g - VGG_MEAN[1],
             r - VGG_MEAN[2],],
            axis=3
        )
        
        assert x_bgr.get_shape().as_list()[1:] == [224, 224, 3]
        
        self.conv1_1 = self.conv_layer(x_bgr, 'conv1_1')
        self.conv1_2 = self.conv_layer(self.conv1_1, 'conv1_2')
        self.pool1 = self.pooling_layer(self.conv1_2, 'pool1')
        
                
        self.conv2_1 = self.conv_layer(self.pool1, 'conv2_1')
        self.conv2_2 = self.conv_layer(self.conv2_1, 'conv2_2')
        self.pool2 = self.pooling_layer(self.conv2_2, 'pool2')
        
                
        self.conv3_1 = self.conv_layer(self.pool2, 'conv3_1')
        self.conv3_2 = self.conv_layer(self.conv3_1, 'conv3_2')
        self.conv3_3 = self.conv_layer(self.conv3_2, 'conv3_3')
        self.pool3 = self.pooling_layer(self.conv3_3, 'pool3')        
                
        self.conv4_1 = self.conv_layer(self.pool3, 'conv4_1')
        self.conv4_2 = self.conv_layer(self.conv4_1, 'conv4_2')
        self.conv4_3 = self.conv_layer(self.conv4_2, 'conv4_3')
        self.pool4 = self.pooling_layer(self.conv4_3, 'pool4')        
                
        self.conv5_1 = self.conv_layer(self.pool4, 'conv5_1')
        self.conv5_2 = self.conv_layer(self.conv5_1, 'conv5_2')
        self.conv5_3 = self.conv_layer(self.conv5_2, 'conv5_3')
        self.pool5 = self.pooling_layer(self.conv5_3, 'pool5')
        
        '''
        self.flatten5 = self.flatten_layer(self.pool5, 'flatten')
        self.fc6 = self.fc_layser(self.flatten5, 'fc6')
        self.fc7 = self.fc_layser(self.fc6, 'fc7')
        self.fc8 = self.fc_layser(self.fc7, 'fc8', activation=None)
        self.prob = tf.nn.softmax(self.fc8, name='prob')
        '''
        print('Building model finished:{}'.format(time.time()- start_time))
# gugon = r'.\deep_learn\images\gugong.jpg'
# gugong = Image.open(gugon)
# gugong_resize = gugong.resize((224, 224), Image.ANTIALIAS)
# gugong_resize.save(r'.\deep_learn\images\gugong_resize.jpg')
#%%

vgg_npy_path = r'.\deep_learn\vgg16.npy'
# data_dict = np.load(vgg_npy_path, encoding='latin1').item()
# vgg16_for_result = VGGNet(data_dict)
# content = tf.placeholder(tf.float32, shape=[1, 224, 224, 3])
# vgg16_for_result.build(content)
style_img_path = r'.\deep_learn\images\xingyueye_resize.jpeg'
content_img_path = r'.\deep_learn\images\gugong_resize.jpg'

num_steps = 100
learning_rate = 10
lambda_c = 0.1
lambda_s = 500

output_dir = r'./run_style_trainsfer'

if not os.path.exists(output_dir):
    os.mkdir(output_dir)



def initial_result(shape, mean, stddev):
    # Start the generated image from truncated Gaussian noise around `mean`
    initial = tf.truncated_normal(shape, mean, stddev)
    return tf.Variable(initial)

def read_img(img_name):
    img = Image.open(img_name)
    np_img = np.array(img)  # (224, 224, 3)
    np_img = np.asarray([np_img], dtype=np.int32) # (1, 224,224, 3)
    print(np_img.shape)
    return np_img
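# Compute the Gram matrix: pairwise correlations between the channels of a feature map
def gram_matrix(x):
    """Calculates gram matrix
    :param x: features extracted from VGG Net. shape: [1, width, height, ch]
    :return: gram matrix of shape [1, ch, ch]
    """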
    b, w, h, ch = x.get_shape().as_list()
    features = tf.reshape(x, [b, h*w, ch]) 
    gram = tf.matmul(features, features, adjoint_a=True) \
        / tf.constant(ch * w * h, tf.float32)
    return gram
    
    
result = initial_result((1, 224, 224, 3), 127.5, 20)
content_val = read_img(content_img_path)
style_val = read_img(style_img_path)

content = tf.placeholder(tf.float32, shape=[1, 224, 224, 3])
style = tf.placeholder(tf.float32, shape=[1, 224, 224, 3])

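# allow_pickle=True is needed on newer NumPy releases, where np.load refuses pickled objects by default
data_dict = np.load(vgg_npy_path, encoding='latin1', allow_pickle=True).item()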
vgg_for_content = VGGNet(data_dict)
vgg_for_style = VGGNet(data_dict)
vgg_for_result = VGGNet(data_dict)

vgg_for_content.build(content)
vgg_for_style.build(style)
vgg_for_result.build(result)

# Pick conv layers at different depths of VGGNet; layers at different depths affect the result differently
content_features = [
    vgg_for_content.conv1_2,
    vgg_for_content.conv2_2,
    # vgg_for_content.conv3_3,
    # vgg_for_content.conv4_3,
    # vgg_for_content.conv5_3,
]

result_content_features = [
    vgg_for_result.conv1_2,
    vgg_for_result.conv2_2,
    # vgg_for_result.conv3_3,
    # vgg_for_result.conv4_3,
    # vgg_for_result.conv5_3,
    
]
style_features = [
    # vgg_for_style.conv1_2,
    # vgg_for_style.conv2_2,
    # vgg_for_style.conv3_3,
    vgg_for_style.conv4_3,
    # vgg_for_style.conv5_3,
]
style_gram = [gram_matrix(feature) for feature in style_features]

result_style_features = [
    # vgg_for_result.conv1_2,
    # vgg_for_result.conv2_2,
    # vgg_for_result.conv3_3,
    vgg_for_result.conv4_3,
    # vgg_for_result.conv5_3,
]
result_style_gram = [gram_matrix(feature) for feature in result_style_features]
# Content loss
content_loss = tf.zeros(1, tf.float32)
for c, c_ in zip(content_features, result_content_features):
    content_loss += tf.reduce_mean((c - c_) ** 2, [1, 2, 3])

style_loss = tf.zeros(1, tf.float32)
for s, s_ in zip(style_gram, result_style_gram):
    style_loss += tf.reduce_mean((s - s_) ** 2, [1, 2])
# Weighted sum of the content and style losses

loss = content_loss * lambda_c + style_loss * lambda_s

train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)


init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    for step in range(num_steps):
        loss_value, content_loss_value, style_loss_value, _ = \
        sess.run([loss, content_loss, style_loss, train_op],
                 feed_dict = {
                     content:content_val,
                     style:style_val
                 })
        print('step: {}, loss_value: {}, content_loss: {} style_loss: {}' \
              .format(step+1,
                      loss_value[0],
                      content_loss_value[0],
                      style_loss_value[0],
                      ))
        result_img_path = os.path.join(
            output_dir, 'result-{}.jpg'.format(step+1)
        )
        result_val = result.eval(sess)[0]
        result_val = np.clip(result_val, 0, 255)
        img_arr = np.asarray(result_val, np.uint8)
        img = Image.fromarray(img_arr)
        img.save(result_img_path)
    # content = read_img(content_img_path)
    # style = read_img(style_img_path)
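The channel handling at the start of `build` is easy to misread, so here is a NumPy sketch of the same preprocessing step: the RGB channels are reordered to BGR and the per-channel mean is subtracted. The function name `rgb_to_vgg_input` and the single red test pixel are assumptions for illustration; the mean values are the `VGG_MEAN` constants from the script above.

```python
import numpy as np

# Same values as VGG_MEAN in the script above, in BGR order.
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def rgb_to_vgg_input(image_rgb):
    """Convert an [..., 3] RGB array to a mean-subtracted BGR float array."""
    image = np.asarray(image_rgb, dtype=np.float64)
    image_bgr = image[..., ::-1]        # reverse the channel axis: RGB -> BGR
    return image_bgr - VGG_MEAN_BGR     # broadcast over all pixels

# A single pure-red pixel, shape (1, 1, 3).
red_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
vgg_pixel = rgb_to_vgg_input(red_pixel)
```

After the conversion the red value sits in the last channel, and each channel has had its own mean removed, which is the input convention the pretrained weights expect.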

The style image (first cropped to 224 x 224):
The content image:

The blended result (output image 15):

Output image 99:

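Optimization

The same setup can also be turned into an image super-resolution converter: remove Ys, let X be the low-resolution image and ŷ the high-resolution image, and let the network learn the mapping from low resolution to high resolution.

Both V1 and V2 rely on a style loss: at the output of a convolution layer, similarity is computed between every pair of channels, which gives the Gram matrix. It works, but why it works is not understood; in both versions the style loss remains something that cannot be rigorously defined.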

def gram_matrix(x):
    """Calculates gram matrix
    :param x: features extracted from VGG Net. shape: [1, width, height, ch]
    :return: 
    """
    b, w, h, ch = x.get_shape().as_list()
    features = tf.reshape(x, [b, h*w, ch]) 
    gram = tf.matmul(features, features, adjoint_a=True) \
        / tf.constant(ch * w * h, tf.float32)
    return gram
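One property of the Gram matrix that may offer part of the intuition (the source does not claim this, so treat it as a hedged observation rather than an explanation) is that it sums over spatial positions and therefore ignores where in the image a feature occurs. The NumPy sketch below checks this on random stand-in activations: shuffling the positions changes the feature map but leaves the Gram matrix unchanged. The name `gram_matrix_np` is a hypothetical NumPy counterpart of the `gram_matrix` function above.

```python
import numpy as np

def gram_matrix_np(features):
    """Channel-by-channel Gram matrix of a [height, width, channels] array."""
    h, w, ch = features.shape
    flat = features.reshape(h * w, ch)
    return flat.T @ flat / (h * w * ch)

rng = np.random.default_rng(42)
features = rng.normal(size=(8, 8, 4))

# Shuffle the 64 spatial positions while keeping each position's channel vector intact.
flat = features.reshape(-1, 4)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(8, 8, 4)

same_gram = np.allclose(gram_matrix_np(features), gram_matrix_np(shuffled))
same_layout = np.allclose(features, shuffled)
print(same_gram, same_layout)
```

This is consistent with the way the notes use the Gram matrix for style and raw activations for content: the former keeps channel statistics and discards layout, while the latter keeps layout.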