TensorFlow Study Notes

AI吃大瓜

This is an original article by the author (AI吃大瓜); do not repost without the author's permission.

Article link: https://blog.csdn.net/guyuealian/article/details/83011655

I. The TF-Slim Library

1. Using TF-Slim

Official documentation: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim

Chinese translation: https://blog.csdn.net/guvcolie/article/details/77686555

import tensorflow.contrib.slim as slim  
import tensorflow as tf
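"""
For example, to create a weight variable that is initialized with truncated_normal,
regularized with an L2 loss, and placed on the CPU, the following definition is all we need.
"""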
weights = slim.variable('weights',  
                             shape=[10, 10, 3 , 3],  
                             initializer=tf.truncated_normal_initializer(stddev=0.1),  
                             regularizer=slim.l2_regularizer(0.05),  
                             device='/CPU:0')  



"""
This is how a layer is usually defined in plain TensorFlow.
"""
input = ...  
with tf.name_scope('conv1_1') as scope:  
  kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,  
                                           stddev=1e-1), name='weights')  
  conv = tf.nn.conv2d(input, kernel, [1, 1, 1, 1], padding='SAME')  
  biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),  
                       trainable=True, name='biases')  
  bias = tf.nn.bias_add(conv, biases)  
  conv1 = tf.nn.relu(bias, name=scope) 
  
"""
TF-Slim provides many convenient operators at the more abstract level of neural-network layers.
Compare the code above with the corresponding TF-Slim call:
"""
input = ...  
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')      

"""
For example, a fragment of the VGG network, in which many convolutional layers are stacked between two pooling layers:
"""
net = ...  
net = slim.conv2d(net, 256, [3, 3], scope='conv3_1')  
net = slim.conv2d(net, 256, [3, 3], scope='conv3_2')  
net = slim.conv2d(net, 256, [3, 3], scope='conv3_3')  
net = slim.max_pool2d(net, [2, 2], scope='pool2')  

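"""
With TF-Slim's repeat operator the code is more concise.

slim.repeat not only applies the same arguments on a single line, it also unrolls the scopes:
the scope of each successive slim.conv2d call gets an underscore and the iteration number appended.
Concretely, the scopes in the code below are 'conv3/conv3_1', 'conv3/conv3_2' and 'conv3/conv3_3'.
"""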
net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')  
net = slim.max_pool2d(net, [2, 2], scope='pool2')  

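"""
slim.stack calls the same operator repeatedly with different arguments, feeding the output of
each layer into the next one. In the example below the kernels alternate between 3x3 and 1x1
and the number of output channels goes from 32 to 64.
In the same way, stack can simplify a tower of several convolutional layers.
"""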
# Verbose way:  
x = slim.conv2d(x, 32, [3, 3], scope='core/core_1')  
x = slim.conv2d(x, 32, [1, 1], scope='core/core_2')  
x = slim.conv2d(x, 64, [3, 3], scope='core/core_3')  
x = slim.conv2d(x, 64, [1, 1], scope='core/core_4')  
  
# Using stack:  
slim.stack(x, slim.conv2d, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], scope='core') 

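"""
TF-Slim adds a scoping mechanism called arg_scope.
An arg_scope lets you name one or more operators together with a set of arguments, and those
arguments are then passed to every one of the named operators inside the scope.
arg_scope makes the code shorter and easier to maintain. Values set in an arg_scope can still be
overridden locally: below, padding defaults to 'SAME', yet the second convolution overrides it
with 'VALID'.
The first snippet repeats the arguments on every call; the second uses arg_scope.
"""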

net = slim.conv2d(inputs, 64, [11, 11], 4, padding='SAME',  
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),  
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv1')  
net = slim.conv2d(net, 128, [11, 11], padding='VALID',  
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),  
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv2')  
net = slim.conv2d(net, 256, [11, 11], padding='SAME',  
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),  
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv3')  


with slim.arg_scope([slim.conv2d], padding='SAME',  
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.01),  
                      weights_regularizer=slim.l2_regularizer(0.0005)):  
    net = slim.conv2d(inputs, 64, [11, 11], scope='conv1')  
    net = slim.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')  
    net = slim.conv2d(net, 256, [11, 11], scope='conv3')  
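"""
By combining TF-Slim variables, operators and scopes, a network that is normally quite complex
can be written in just a few lines.
For example, the complete VGG architecture is defined by the short snippet below:
"""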

def vgg16(inputs):  
  with slim.arg_scope([slim.conv2d, slim.fully_connected],  
                      activation_fn=tf.nn.relu,  
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),  
                      weights_regularizer=slim.l2_regularizer(0.0005)):  
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')  
    net = slim.max_pool2d(net, [2, 2], scope='pool1')  
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')  
    net = slim.max_pool2d(net, [2, 2], scope='pool2')  
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')  
    net = slim.max_pool2d(net, [2, 2], scope='pool3')  
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')  
    net = slim.max_pool2d(net, [2, 2], scope='pool4')  
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')  
    net = slim.max_pool2d(net, [2, 2], scope='pool5')  
    net = slim.fully_connected(net, 4096, scope='fc6')  
    net = slim.dropout(net, 0.5, scope='dropout6')  
    net = slim.fully_connected(net, 4096, scope='fc7')  
    net = slim.dropout(net, 0.5, scope='dropout7')  
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
  return net
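"""
Training a TensorFlow model requires a network model, a loss function, a way to compute gradients,
and a training routine that iteratively updates the model weights.
TF-Slim provides loss functions as well as a set of helper functions for running training and evaluation.
"""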
import tensorflow as tf  
  
slim = tf.contrib.slim  
vgg = tf.contrib.slim.nets.vgg  
  
...  
  
train_log_dir = ...  
if not tf.gfile.Exists(train_log_dir):  
  tf.gfile.MakeDirs(train_log_dir)  
  
with tf.Graph().as_default():  
  # Set up the data loading:  
  images, labels = ...  
  
  # Define the model:  
  predictions = vgg.vgg16(images, is_training=True)  
  
  # Specify the loss function:  
  slim.losses.softmax_cross_entropy(predictions, labels)  
  
  total_loss = slim.losses.get_total_loss()  
  tf.summary.scalar('losses/total_loss', total_loss)  
  
  # Specify the optimization scheme:  
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)  
  
  # create_train_op that ensures that when we evaluate it to get the loss,  
  # the update_ops are done and the gradient updates are computed.  
  train_tensor = slim.learning.create_train_op(total_loss, optimizer)  
  
  # Actually runs training.  
  slim.learning.train(train_tensor, train_log_dir)  
  
"""
Evaluation
"""
import math

import tensorflow as tf

slim = tf.contrib.slim

# Load the data
images, labels = load_data(...)

# Define the network
predictions = MyModel(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'accuracy': slim.metrics.accuracy(predictions, labels),
    'precision': slim.metrics.precision(predictions, labels),
    'recall': slim.metrics.recall(predictions, labels),
})

# Create the summary ops such that they also print out to std output:
summary_ops = []
for metric_name, metric_value in names_to_values.items():
  op = tf.summary.scalar(metric_name, metric_value)
  op = tf.Print(op, [metric_value], metric_name)
  summary_ops.append(op)

num_examples = 10000
batch_size = 32
num_batches = math.ceil(num_examples / float(batch_size))

# Setup the global step.
slim.get_or_create_global_step()

checkpoint_dir = ... # Where the checkpoints are stored.
log_dir = ... # Where the summaries are stored.
eval_interval_secs = ... # How often to run the evaluation.
slim.evaluation.evaluation_loop(
    'local',
    checkpoint_dir,
    log_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()),
    summary_op=tf.summary.merge(summary_ops),
    eval_interval_secs=eval_interval_secs)
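
The scope expansion that slim.repeat and slim.stack perform, as described earlier, can be mimicked in a few lines of plain Python. This is only a sketch of the naming rule; expanded_scopes is a hypothetical helper, not part of TF-Slim.

```python
def expanded_scopes(scope, count):
    """Return the scope names used when a layer is applied `count` times under `scope`.

    Each call gets the parent scope, then the scope name again with an
    underscore and a 1-based iteration number appended.
    """
    return ["{0}/{0}_{1}".format(scope, i + 1) for i in range(count)]


repeat_names = expanded_scopes("conv3", 3)   # slim.repeat(net, 3, ..., scope='conv3')
stack_names = expanded_scopes("core", 4)     # slim.stack with four argument tuples
print(repeat_names)
print(stack_names)
```

The same rule covers both helpers because slim.stack differs from slim.repeat only in that each call receives its own arguments.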

A general way to define a network

import tensorflow as tf

def resnet(input_image):

    with tf.variable_scope("generator"):

        W1 = weight_variable([9, 9, 3, 64], name="W1"); b1 = bias_variable([64], name="b1");
        c1 = tf.nn.relu(conv2d(input_image, W1) + b1)

        # residual 1

        W2 = weight_variable([3, 3, 64, 64], name="W2"); b2 = bias_variable([64], name="b2");
        c2 = tf.nn.relu(_instance_norm(conv2d(c1, W2) + b2))

        W3 = weight_variable([3, 3, 64, 64], name="W3"); b3 = bias_variable([64], name="b3");
        c3 = tf.nn.relu(_instance_norm(conv2d(c2, W3) + b3)) + c1

        # residual 2

        W4 = weight_variable([3, 3, 64, 64], name="W4"); b4 = bias_variable([64], name="b4");
        c4 = tf.nn.relu(_instance_norm(conv2d(c3, W4) + b4))

        W5 = weight_variable([3, 3, 64, 64], name="W5"); b5 = bias_variable([64], name="b5");
        c5 = tf.nn.relu(_instance_norm(conv2d(c4, W5) + b5)) + c3

        # residual 3

        W6 = weight_variable([3, 3, 64, 64], name="W6"); b6 = bias_variable([64], name="b6");
        c6 = tf.nn.relu(_instance_norm(conv2d(c5, W6) + b6))

        W7 = weight_variable([3, 3, 64, 64], name="W7"); b7 = bias_variable([64], name="b7");
        c7 = tf.nn.relu(_instance_norm(conv2d(c6, W7) + b7)) + c5

        # residual 4

        W8 = weight_variable([3, 3, 64, 64], name="W8"); b8 = bias_variable([64], name="b8");
        c8 = tf.nn.relu(_instance_norm(conv2d(c7, W8) + b8))

        W9 = weight_variable([3, 3, 64, 64], name="W9"); b9 = bias_variable([64], name="b9");
        c9 = tf.nn.relu(_instance_norm(conv2d(c8, W9) + b9)) + c7

        # Convolutional

        W10 = weight_variable([3, 3, 64, 64], name="W10"); b10 = bias_variable([64], name="b10");
        c10 = tf.nn.relu(conv2d(c9, W10) + b10)

        W11 = weight_variable([3, 3, 64, 64], name="W11"); b11 = bias_variable([64], name="b11");
        c11 = tf.nn.relu(conv2d(c10, W11) + b11)

        # Final

        W12 = weight_variable([9, 9, 64, 3], name="W12"); b12 = bias_variable([3], name="b12");
        enhanced = tf.nn.tanh(conv2d(c11, W12) + b12) * 0.58 + 0.5

    return enhanced

def adversarial(image_):

    with tf.variable_scope("discriminator"):

        conv1 = _conv_layer(image_, 48, 11, 4, batch_nn = False)
        conv2 = _conv_layer(conv1, 128, 5, 2)
        conv3 = _conv_layer(conv2, 192, 3, 1)
        conv4 = _conv_layer(conv3, 192, 3, 1)
        conv5 = _conv_layer(conv4, 128, 3, 2)
        
        flat_size = 128 * 7 * 7
        conv5_flat = tf.reshape(conv5, [-1, flat_size])

        W_fc = tf.Variable(tf.truncated_normal([flat_size, 1024], stddev=0.01))
        bias_fc = tf.Variable(tf.constant(0.01, shape=[1024]))

        fc = leaky_relu(tf.matmul(conv5_flat, W_fc) + bias_fc)

        W_out = tf.Variable(tf.truncated_normal([1024, 2], stddev=0.01))
        bias_out = tf.Variable(tf.constant(0.01, shape=[2]))

        adv_out = tf.nn.softmax(tf.matmul(fc, W_out) + bias_out)
    
    return adv_out

def weight_variable(shape, name):

    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):

    initial = tf.constant(0.01, shape=shape)
    return tf.Variable(initial, name=name)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def leaky_relu(x, alpha = 0.2):
    return tf.maximum(alpha * x, x)

def _conv_layer(net, num_filters, filter_size, strides, batch_nn=True):
    
    weights_init = _conv_init_vars(net, num_filters, filter_size)
    strides_shape = [1, strides, strides, 1]
    bias = tf.Variable(tf.constant(0.01, shape=[num_filters]))

    net = tf.nn.conv2d(net, weights_init, strides_shape, padding='SAME') + bias   
    net = leaky_relu(net)

    if batch_nn:
        net = _instance_norm(net)

    return net

def _instance_norm(net):

    batch, rows, cols, channels = [i.value for i in net.get_shape()]
    var_shape = [channels]

    mu, sigma_sq = tf.nn.moments(net, [1,2], keep_dims=True)
    shift = tf.Variable(tf.zeros(var_shape))
    scale = tf.Variable(tf.ones(var_shape))

    epsilon = 1e-3
    normalized = (net-mu)/(sigma_sq + epsilon)**(.5)

    return scale * normalized + shift

def _conv_init_vars(net, out_channels, filter_size, transpose=False):

    _, rows, cols, in_channels = [i.value for i in net.get_shape()]

    if not transpose:
        weights_shape = [filter_size, filter_size, in_channels, out_channels]
    else:
        weights_shape = [filter_size, filter_size, out_channels, in_channels]

    weights_init = tf.Variable(tf.truncated_normal(weights_shape, stddev=0.01, seed=1), dtype=tf.float32)
    return weights_init

2. Implementing common modules

2.1 Residual unit

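The figure shows two forms of residual module. The left one is the basic residual module, made up of two 3×3 convolutional layers; as networks grow deeper, however, this structure turns out to be less effective in practice. The "bottleneck residual block" on the right addresses this: it stacks three convolutional layers, 1×1, 3×3 and 1×1, in that order. The 1×1 convolutions reduce or restore the channel dimension, so the 3×3 convolution runs on a lower-dimensional input, which improves computational efficiency.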
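
To make the efficiency argument concrete, here is a small sketch that counts convolution weights for the two block types. The channel widths (256 for the block, 64 inside the bottleneck) are assumptions chosen for illustration, and biases are ignored.

```python
def conv_params(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


channels = 256     # assumed width of the block's input and output
bottleneck = 64    # assumed reduced width inside the bottleneck block

# Basic block: two 3x3 convolutions at full width.
basic_params = 2 * conv_params(3, channels, channels)

# Bottleneck block: 1x1 reduce, 3x3 at reduced width, 1x1 restore.
bottleneck_params = (conv_params(1, channels, bottleneck)
                     + conv_params(3, bottleneck, bottleneck)
                     + conv_params(1, bottleneck, channels))

print(basic_params, bottleneck_params)
```

Under these assumptions the bottleneck block needs roughly one seventeenth of the weights of the basic block.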

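A simple residual module (bottleneck form):

# Residual module
def residual_block(net, reg, name):
    '''
    Residual module: 1x1 -> 3x3 -> 1x1 convolutions plus an identity shortcut
    :param net: input tensor, shape [batch, height, width, channels]
    :param reg: weights regularizer passed to every convolution
    :param name: variable scope name
    :return: output tensor with the same shape as the input
    '''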
    input_nums = net.get_shape().as_list()[3]
    print("common_network inputs.shape:{}".format(net.get_shape()))
    with tf.variable_scope(name_or_scope=name):
        res = slim.conv2d(inputs=net,
                          num_outputs=64,
                          kernel_size=[1, 1],
                          padding="SAME",
                          scope="conv1",
                          activation_fn=tf.nn.relu,
                          weights_regularizer=reg)  # 1*1
        res = slim.conv2d(inputs=res,
                          num_outputs=64,
                          kernel_size=[3, 3],
                          padding="SAME",
                          scope="conv2",
                          activation_fn=tf.nn.relu,
                          weights_regularizer=reg)  # 3*3
        res = slim.conv2d(inputs=res,
                          num_outputs=input_nums,
                          kernel_size=[1, 1],
                          padding="SAME",
                          scope="conv3",
                          activation_fn=None,
                          weights_regularizer=reg)  # 1*1

        net = tf.nn.relu(tf.add(net, res))
    return net

Other approaches:

'''
Created on 2018-07-02

@author: Administrator
'''
import tensorflow as tf

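'''
x                 input
filters           number of convolution kernels, which is also the number of output channels
bn_training_flag  whether batch normalization runs in training mode
pool_flag         whether to apply pooling
batch_norm_flag   whether to apply batch normalization
lamda             weight-decay coefficient (the L2 regularization strength)
'''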
def ResUnit_block(x,filters,bn_training_flag,pool_flag=False,batch_norm_flag=True,lamda = 0.01):
    # Weight decay
    regularizer = tf.contrib.layers.l2_regularizer(scale = lamda)
    res = x  # keep a reference to the input for the shortcut branch
    if pool_flag:
        x = tf.layers.max_pooling2d(x, pool_size=[2,2], strides=[2,2], padding="SAME")
#         x = max_pool_2x2(x, ksize_2d=[1,2,2,1], pool_strides_2d=[1,2,2,1], padding_str="SAME")
        res = tf.layers.conv2d(res, filters, kernel_size=[1,1], strides=(2, 2), padding='SAME')
    else:
        res = tf.layers.conv2d(res, filters, kernel_size=[1,1], strides=(1, 1), padding='SAME')
    if batch_norm_flag:
        x = tf.layers.batch_normalization(x,training = bn_training_flag)
    out = tf.layers.conv2d(x, filters, kernel_size=[3,3], strides=(1, 1), padding='SAME', activation=tf.nn.relu,kernel_regularizer=regularizer)
    if batch_norm_flag:
        out = tf.layers.batch_normalization(out,training=bn_training_flag)
    out = tf.layers.conv2d(out, filters, kernel_size=[3,3], strides=(1, 1), padding='SAME', activation=None,kernel_regularizer=regularizer)
    out = tf.nn.relu(tf.add(out,res))
    return out

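'''
x                    input
out_channels         number of output channels
bottleneck_channels  number of channels used by the 3*3 convolution inside the residual unit
bn_training_flag     whether batch normalization runs in training mode
pool_flag            whether to apply pooling
batch_norm_flag      whether to apply batch normalization
lamda                weight-decay coefficient (the L2 regularization strength)
'''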
def ResUnit_bottleneck(x,out_channels,bottleneck_channels,bn_training_flag,pool_flag=False,batch_norm_flag=True,lamda = 0.01):
    # Weight decay
    regularizer = tf.contrib.layers.l2_regularizer(scale = lamda)
    res = x  # keep a reference to the input for the shortcut branch
    if pool_flag:
        x = tf.layers.max_pooling2d(x, pool_size=[2,2], strides=[2,2], padding="SAME")
#         x = max_pool_2x2(x, ksize_2d=[1,2,2,1], pool_strides_2d=[1,2,2,1], padding_str="SAME")
        res = tf.layers.conv2d(res, out_channels, kernel_size=[1,1], strides=(2, 2), padding='SAME')
    else:
        res = tf.layers.conv2d(res, out_channels, kernel_size=[1,1], strides=(1, 1), padding='SAME')
    if batch_norm_flag:
        x = tf.layers.batch_normalization(x,training = bn_training_flag)
    out = tf.layers.conv2d(x, bottleneck_channels, kernel_size=[1,1], strides=(1, 1), padding='SAME', activation=tf.nn.relu,kernel_regularizer=regularizer)
    if batch_norm_flag:
        out = tf.layers.batch_normalization(out,training = bn_training_flag)
    out = tf.layers.conv2d(out, bottleneck_channels, kernel_size=[3,3], strides=(1, 1), padding='SAME', activation=tf.nn.relu,kernel_regularizer=regularizer)
    if batch_norm_flag:
        out = tf.layers.batch_normalization(out,training = bn_training_flag)
    out = tf.layers.conv2d(out, out_channels, kernel_size=[1,1], strides=(1, 1), padding='SAME', activation=None,kernel_regularizer=regularizer)
    out = tf.nn.relu(tf.add(out,res))
    return out

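2.2 Dropout layer

In classification networks, dropout is usually added to the fully connected layers to prevent overfitting and improve generalization. It is rarely seen after convolutional layers, mainly because convolutions have few parameters and do not overfit easily. I looked up a few blog posts on the subject today and am recording them here.

First, an English blog post (the whole series is well written): Dropout Regularization For Neural Networks
A Chinese translation is also available: 基于Keras/Python的深度学习模型Dropout正则项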

You can imagine that if neurons are randomly dropped out of the network during training, that other neurons will have to step in and handle the representation required to make predictions for the missing neurons. This is believed to result in multiple independent internal representations being learned by the network.

The effect is that the network becomes less sensitive to the specific weights of neurons. This in turn results in a network that is capable of better generalization and is less likely to overfit the training data.

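An example of using dropout on the CIFAR dataset: 92.45% on CIFAR-10 in Torch
There, dropout is added to both the convolutional and the fully connected layers. But dropout values are usually < 0.5, e.g. 0.1, 0.2, 0.3 for the convolutional layers.

Here is the view from the paper that introduced dropout (the Srivastava/Hinton dropout paper):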

“The additional gain in performance obtained by adding dropout in the convolutional layers (3.02% to 2.55%) is worth noting. One may have presumed that since the convolutional layers don’t have a lot of parameters, overfitting is not a problem and therefore dropout would not have much effect. However, dropout in the lower layers still helps because it provides noisy inputs for the higher fully connected layers which prevents them from overfitting.”
They use 0.7 prob for conv drop out and 0.5 for fully connected

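In short: because convolutional layers have few parameters, one might expect overfitting not to be a problem there and dropout to have little effect. Dropout in the lower layers still helps, though, because it feeds noisy inputs to the higher fully connected layers and so keeps them from overfitting. Reading the values above as keep probabilities, a common choice is 0.7 for convolutional layers and 0.5 for fully connected layers.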
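
The mechanism itself is small enough to sketch in plain Python. The function below is an illustrative "inverted dropout" written for this note, not the slim.dropout implementation: each value is kept with probability keep_prob and the survivors are scaled by 1 / keep_prob so that the expected activation stays the same. The names dropout, activations and rng are assumptions of the sketch.

```python
import random


def dropout(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob; scale survivors by 1 / keep_prob."""
    out = []
    for v in values:
        if rng.random() < keep_prob:
            out.append(v / keep_prob)   # kept: rescaled so the expectation is unchanged
        else:
            out.append(0.0)             # dropped
    return out


rng = random.Random(0)                  # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.5, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(kept_fraction, mean_after)        # both should be close to 0.5 and 1.0
```

At inference time the layer is simply skipped, which is why the rescaling is done during training.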

2.3 Batch normalization (batch_norm)

The slim.batch_norm interface:

def batch_norm(inputs,
               decay=0.999,
               center=True,
               scale=False,
               epsilon=0.001,
               activation_fn=None,
               param_initializers=None,
               param_regularizers=None,
               updates_collections=ops.GraphKeys.UPDATE_OPS,
               is_training=True,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               batch_weights=None,
               fused=False,
               data_format=DATA_FORMAT_NHWC,
               zero_debias_moving_mean=False,
               scope=None,
               renorm=False,
               renorm_clipping=None,
               renorm_decay=0.99):

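You can also define your own:

# Normalization layer
def batch_norm(net, train=True):
    # slim.batch_norm(inputs=net)  # alternatively, use the slim.batch_norm module directly
    # Note: the moments below are taken over the spatial axes [1, 2] of each sample,
    # which is instance normalization; batch normalization would also reduce over the
    # batch axis, i.e. axes [0, 1, 2]. The `train` argument is unused because no
    # moving statistics are kept.
    batch, rows, cols, channels = [i.value for i in net.get_shape()]
    var_shape = [channels]
    mu, sigma_sq = tf.nn.moments(net, [1,2], keep_dims=True)
    shift = tf.Variable(tf.zeros(var_shape))
    scale = tf.Variable(tf.ones(var_shape))
    epsilon = 1e-3
    normalized = (net-mu)/(sigma_sq + epsilon)**(.5)
    return scale * normalized + shift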

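slim.batch_norm keeps two variables, moving_mean and moving_variance, which hold moving averages of the per-batch mean and variance. During training both of them must be updated. The official documentation explains this and shows the usage: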

Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they
need to be added as a dependency to the `train_op`. For example:
```python

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
   # train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
   train_op = optimizer.minimize(loss)

```
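
To see why the moving statistics and the is_training flag matter, here is a minimal pure-Python sketch on a single feature. It is a simplified model written for this note (no scale or shift parameters); bn_train and bn_infer are hypothetical names, and the decay of 0.9 is chosen only to make the update visible.

```python
def mean_var(xs):
    """Population mean and variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


def bn_train(xs, moving_mean, moving_var, decay=0.999, eps=0.001):
    """Normalize with the batch statistics and update the moving averages."""
    m, v = mean_var(xs)
    out = [(x - m) / (v + eps) ** 0.5 for x in xs]
    moving_mean = decay * moving_mean + (1 - decay) * m
    moving_var = decay * moving_var + (1 - decay) * v
    return out, moving_mean, moving_var


def bn_infer(xs, moving_mean, moving_var, eps=0.001):
    """Normalize with the stored moving statistics; nothing is updated."""
    return [(x - moving_mean) / (moving_var + eps) ** 0.5 for x in xs]


out, mm, mv = bn_train([2.0, 4.0, 6.0, 8.0], moving_mean=0.0, moving_var=1.0, decay=0.9)
print(mm, mv)                             # moved from (0, 1) towards the batch's (5, 5)

# A single test example in training mode is normalized against itself and collapses to 0,
# whatever its value; in inference mode it is compared with the moving statistics.
single_train, _, _ = bn_train([5.0], mm, mv)
single_infer = bn_infer([5.0], mm, mv)
print(single_train, single_infer)
```

The collapse in training mode is one way a wrong is_training setting can make every test image look alike.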

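# Placeholders: X is the network input, Y is the ground-truth label
X = tf.placeholder("float", [None, 224, 224, 3])
Y = tf.placeholder("float", [None, 100])
 
# Build the resnet that contains batch_norm; remember is_training=True
logits = model.resnet(X, 100, is_training=True)
# Assumes model.resnet returns softmax probabilities; tf.log of raw logits would not be a valid cross-entropy
cross_entropy = -tf.reduce_sum(Y*tf.log(logits))
 
# Create the train op with slim.learning.create_train_op, which runs the UPDATE_OPS before
# each step; a bare tf.train.MomentumOptimizer.minimize() does not, unless it is wrapped in
# tf.control_dependencies(update_ops) as in the note above
opt = tf.train.MomentumOptimizer(lr_rate, 0.9)
train_op = slim.learning.create_train_op(cross_entropy, opt, global_step=global_step)
 
# Make the loss depend on the update ops, so that evaluating it also updates moving_mean
# and moving_variance (needs: from tensorflow.python.ops import control_flow_ops)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)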

After that, training proceeds as usual. Once the model has been exported, build the same network at test time and make sure is_training is set to False.

logits = model.resnet(X, 100, is_training=False)

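Otherwise you may find that almost every single image ends up classified into the same class, which probably means the training mode was set the wrong way round.

A related note: accuracy should be defined after update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS), as shown below; otherwise the accuracy can come out very low: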

# fluctuates
    loss1 = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=out))
    loss = loss1 #+ tf.reduce_sum(reg_ws)  # loss < 100 without the regularization term, > 10000 with it
    tf.summary.scalar("loss",loss)
    # train_op = tf.train.AdamOptimizer(base_lr).minimize(loss)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(base_lr).minimize(loss)

    # accuracy = tf.reduce_mean(tf.cast(tf.equal(pred, input_labels), tf.float32))
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(out, 1), tf.argmax(input_labels, 1)), tf.float32))

Reference: https://blog.csdn.net/jiruiYang/article/details/77202674

2.4 Global average pooling

# Global average pooling; net has shape [batch_size, height, width, depths]
net = tf.reduce_mean(net,[1,2],keep_dims=True,name="GlobalPool")
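
What that reduce_mean computes can be checked on a tiny example. The sketch below is plain Python over nested lists (no TensorFlow); global_average_pool is a hypothetical helper that mirrors the keep_dims=True output shape [batch_size, 1, 1, depths].

```python
def global_average_pool(batch):
    """Average every channel over height and width.

    `batch` is a nested list shaped [batch_size][height][width][channels];
    the result is shaped [batch_size][1][1][channels].
    """
    pooled = []
    for image in batch:
        height, width, channels = len(image), len(image[0]), len(image[0][0])
        means = [
            sum(image[r][c][k] for r in range(height) for c in range(width)) / (height * width)
            for k in range(channels)
        ]
        pooled.append([[means]])   # singleton height and width axes, like keep_dims=True
    return pooled


# One 2 x 2 image with two channels.
image = [[[1.0, 10.0], [2.0, 20.0]],
         [[3.0, 30.0], [4.0, 40.0]]]
pooled = global_average_pool([image])
print(pooled)
```

Each channel collapses to a single number, which is why this layer is often used in place of a flatten step before the classifier.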

2.5 The tf.losses module

The tf.losses module is essentially a high-level wrapper around the loss functions under tf.nn.

loss = slim.losses.softmax_cross_entropy(predictions, labels)
For multi-task learning, the losses can be combined like this:
# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)
 
# The following two lines have the same effect:
total_loss = classification_loss + sum_of_squares_loss
total_loss = slim.losses.get_total_loss(add_regularization_losses=False)
If you define your own loss but still want to use slim's loss management, do this:
pose_loss = MyCustomLossFunction(pose_predictions, pose_labels)
slim.losses.add_loss(pose_loss) 
total_loss = slim.losses.get_total_loss()
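# total_loss now includes pose_loss

If a custom myloss is built from a function in tf.losses and is then also registered with slim.losses.add_loss(), tf.losses.get_total_loss() counts myloss twice, because every loss created by tf.losses is added to the losses collection automatically. So when you use a loss function from tf.losses there is no need to call add_loss(); doing so adds the loss a second time.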

    # Specify the loss function:
    def myloss(logits, labels):
        '''
        Custom loss
        :param logits: unscaled network outputs
        :param labels: one-hot labels
        :return: the softmax cross-entropy loss
        tf.losses.softmax_cross_entropy already adds its result to the losses collection,
        so the returned loss must not be passed to slim.losses.add_loss() again;
        otherwise tf.losses.get_total_loss() counts it twice.
        '''
        my_loss=tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
        return my_loss
 
    my_loss=myloss(logits=out,labels=y)
    # slim.losses.add_loss(my_loss)  # not needed: this would add my_loss a second time
    # tf.losses.softmax_cross_entropy(onehot_labels=y, logits=out)## 1.609556
    loss = tf.losses.get_total_loss(add_regularization_losses=False)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(out, 1), tf.argmax(y, 1)), tf.float32))
 
    # Specify the optimization scheme:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    # create_train_op that ensures that when we evaluate it to get the loss,
    # the update_ops are done and the gradient updates are computed.
    train_tensor = slim.learning.create_train_op(total_loss=loss,optimizer=optimizer)
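
The double counting described above is easy to reproduce with a toy model of the loss collection. Everything in this sketch is a stand-in written for illustration: losses_collection plays the role of the graph's loss collection, library_loss mimics a tf.losses function that registers its result, and add_loss mimics slim.losses.add_loss.

```python
losses_collection = []   # stands in for the graph's loss collection


def library_loss(value):
    """Mimics a tf.losses function: registers the loss, then returns it."""
    losses_collection.append(value)
    return value


def add_loss(value):
    """Mimics slim.losses.add_loss: registers an already computed loss."""
    losses_collection.append(value)


def get_total_loss():
    """Mimics get_total_loss(add_regularization_losses=False)."""
    return sum(losses_collection)


my_loss = library_loss(1.5)   # registered once, inside the library call
add_loss(my_loss)             # registered again by hand
total = get_total_loss()
print(total)                  # twice the real loss
```

Dropping the add_loss call brings the total back to the value of the single loss.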

2.6 Regularization

Pass a regularizer to the convolutional and fully connected layers: reg = slim.l2_regularizer(scale=weight_decay)

weight_decay=100.0  # regularization coefficient
reg = slim.l2_regularizer(scale=weight_decay)
def NET(inputs,reg):
    net = slim.conv2d(
        inputs=inputs,
        num_outputs=32,
        weights_initializer=tf.truncated_normal_initializer(stddev=0.0001),
        weights_regularizer=reg,
        kernel_size=(3, 3),
        activation_fn=tf.nn.relu,
        stride=(1, 1),
        padding="SAME",
        trainable=True,
        scope="conv_1")
    return net

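Method 1: during training, use tf.get_collection to fetch the regularization losses from tf.GraphKeys.REGULARIZATION_LOSSES (the collection holds one loss tensor per regularized variable, not the variables themselves):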

    weights_list = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    if weights_list and weight_decay is not None and weight_decay > 0:
        print("Regularization losses:")
        for i,rl in enumerate(weights_list):
            print("{}".format(rl.name))
        reg_loss=tf.reduce_sum(weights_list)  # equivalent to sum(weights_list)
        total_loss = loss +tf.reduce_sum(reg_loss)
        #total_loss=reg_loss
    else:
      print("No regularization.")
      total_loss = loss

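Method 2: alternatively, collect tf.GraphKeys.TRAINABLE_VARIABLES and apply the regularizer yourself with slim.apply_regularization(reg, weights_list=weights_list). Note that this collection also contains the biases, so they are regularized as well.

    # tf.GraphKeys.TRAINABLE_VARIABLES also includes the biases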
    weights_list = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    if weights_list and weight_decay is not None and weight_decay > 0:
        print("Regularization losses:")
        for i,rl in enumerate(weights_list):
            print("{}".format(rl.name))
        reg_loss = slim.apply_regularization(reg, weights_list=weights_list)
        total_loss = loss +tf.reduce_sum(reg_loss)
        # total_loss=tf.reduce_sum(reg_loss)
    else:
      print("No regularization.")
      total_loss = loss

Method 3:

A more direct way: if the model is built with slim, call tf.losses.get_regularization_losses() to get the regularization loss terms:

    if  weight_decay is not None and weight_decay > 0:
        reg_loss=tf.losses.get_regularization_losses()
        total_loss = loss +tf.reduce_sum(reg_loss)
        #total_loss=tf.reduce_sum(reg_loss)
    else:
      print("No regularization.")
      total_loss = loss

Under the hood, tf.losses.get_regularization_losses() simply calls get_collection(ops.GraphKeys.REGULARIZATION_LOSSES, scope):

def get_regularization_losses(scope=None):
  """Gets the list of regularization losses.

  Args:
    scope: An optional scope name for filtering the losses to return.

  Returns:
    A list of regularization losses as Tensors.
  """
  return ops.get_collection(ops.GraphKeys.REGULARIZATION_LOSSES, scope)
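
All three methods end in the same arithmetic: one penalty per regularized weight tensor, summed and added to the data loss. The sketch below reproduces it in plain Python, assuming the usual L2 definition scale * sum(w^2) / 2 (to my knowledge this is what slim.l2_regularizer computes, but treat it as an assumption). The weight values and the name l2_regularizer are made up for the example.

```python
def l2_regularizer(scale):
    """Return a function computing scale * sum(w^2) / 2 for a flat list of weights."""
    def regularize(weights):
        return scale * sum(w * w for w in weights) / 2.0
    return regularize


reg = l2_regularizer(0.5)
conv_weights = [1.0, -2.0, 3.0]   # hypothetical weights of one layer
fc_weights = [0.5, 0.5]           # hypothetical weights of another layer

# One entry per regularized tensor, like the REGULARIZATION_LOSSES collection.
regularization_losses = [reg(conv_weights), reg(fc_weights)]

data_loss = 2.0
total_loss = data_loss + sum(regularization_losses)
print(regularization_losses, total_loss)
```

Because the penalty grows linearly with scale, a very large coefficient can make the regularization term dominate the data loss.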

2.7 Moving average

    moving_averages=True  # whether to apply a moving average
    global_step = tf.contrib.framework.get_or_create_global_step()
    train_op = tf.train.AdamOptimizer(base_lr).minimize(total_loss,global_step=global_step)

    # Moving averages of the loss and PSNR, for display
    if moving_averages:
        with tf.name_scope("moving_averages"):
            ema = tf.train.ExponentialMovingAverage(decay=0.99)
            update_ma = ema.apply([total_loss, psnr])
            total_loss = ema.average(total_loss)
            psnr = ema.average(psnr)
            # Training step op: run the optimizer and update the averages together
            train_op = tf.group(train_op, update_ma)
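
The update rule behind an exponential moving average is shadow = decay * shadow + (1 - decay) * value. The plain-Python sketch below applies it to a short list of made-up loss values; starting the shadow at the first observation and using decay 0.5 are assumptions chosen to keep the numbers readable, not the behaviour of tf.train.ExponentialMovingAverage.

```python
def ema_update(shadow, value, decay=0.99):
    """One exponential-moving-average step."""
    return decay * shadow + (1.0 - decay) * value


losses = [4.0, 2.0, 6.0, 3.0]   # made-up per-step loss values
shadow = losses[0]              # assumption: initialise with the first observation
smoothed = []
for loss in losses:
    shadow = ema_update(shadow, loss, decay=0.5)
    smoothed.append(shadow)

print(smoothed)                 # [4.0, 3.0, 4.5, 3.75]
```

A decay closer to 1, such as the 0.99 used above, smooths more strongly and reacts more slowly to new values.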

3. Convolution, transposed convolution, and dilated convolution

tf.nn.conv2d: convolution

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None,
data_format=None, name=None)

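input is a 4-D tensor. filter is the 4-D filter (convolution kernel), usually shaped [height, width, input_dim, output_dim], where height and width are the kernel's height and width, and input_dim and output_dim are the numbers of input and output channels.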

import tensorflow as tf

x1 = tf.constant(1.0, shape=[1, 5, 5, 3])
x2 = tf.constant(1.0, shape=[1, 6, 6, 3])
kernel = tf.constant(1.0, shape=[3, 3, 3, 1])
y1 = tf.nn.conv2d(x1, kernel, strides=[1, 2, 2, 1], padding="SAME")
y2 = tf.nn.conv2d(x2, kernel, strides=[1, 2, 2, 1], padding="SAME")

sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
x1_cov,  x2_cov = sess.run([y1, y2])

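print(x1_cov.shape)  # (1, 3, 3, 1): ceil(5 / 2) = 3 with SAME padding
print(x2_cov.shape)  # (1, 3, 3, 1): ceil(6 / 2) = 3 with SAME padding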

tf.nn.conv2d_transpose: transposed convolution (deconvolution)

Reference: https://blog.csdn.net/silence2015/article/details/78649734

tf.nn.conv2d_transpose(value,
filter,
output_shape,
strides,
padding="SAME",
data_format="NHWC",
name=None)

output_shape is the shape of the output. Because this is the reverse of a convolution, the input and output channel dimensions of filter swap places: [height, width, output_channels, in_channels].

import tensorflow as tf

# Forward convolution

# Input shape: [batch, height, width, in_channels]
inputx=tf.random_normal([100,255,255,3],dtype=tf.float32)
# Kernel: [height, width, in_channels, output_channels]
w=tf.random_normal(shape=[5,5,3,10],dtype=tf.float32)
# Convolution output: (100, 126, 126, 10); note that padding is VALID here
outputy=tf.nn.conv2d(input=inputx,filter=w,strides=[1,2,2,1],padding='VALID')

Computing the output size of a convolution is easy. But given a desired output image size, how should the kernel size and strides of the transposed convolution be chosen? Code first:

# Transposed convolution
# Input value: [batch, height, width, in_channels]
value=tf.random_normal(shape=[100,126,126,10])
# filter: [height, width, output_channels, in_channels]
w=tf.random_normal(shape=[4,4,3,10])
# Result of the transposed convolution
result=tf.nn.conv2d_transpose(value=value,filter=w,output_shape=[100,255,255,3],strides=[1,2,2,1],padding='VALID')



with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # sess.run(outputy)
    # print(outputy.shape)

    sess.run(result)
    print(result.shape)

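There is a simple way to think about this: work out the transposed convolution the same way you would work out the forward convolution. In the code, the goal is a 255*255*3 image (the image enlarged by the transposed convolution). First decide on the padding mode, since each mode implies a different size formula. The code uses VALID, and the input value of the transposed convolution is a 126*126*10 image, so the formula is
$\lceil (255 - \mathrm{kernel} + 1) / \mathrm{stride} \rceil = 126$, where $\lceil \cdot \rceil$ rounds up.
With kernel = 4, stride works out to exactly 2.
With SAME padding the formula becomes
$\lceil 255 / \mathrm{stride} \rceil = 126$, which no integer stride satisfies (stride 2 gives 128). So a 126*126 input cannot be transposed-convolved to 255*255 with SAME padding; with stride 2, an output of 255*255 requires a 128*128 value as input.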
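
The reasoning above can be turned into a small search: compute the forward-convolution output size for the target 255, and keep the settings that give 126. This is a sketch using the two size formulas quoted in the text; forward_size is a hypothetical helper and the searched ranges (kernels 1 to 7, strides 1 to 4) are arbitrary.

```python
import math


def forward_size(size, kernel, stride, padding):
    """Output size of a forward convolution along one spatial dimension."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


target, value = 255, 126   # desired output size and given input size of the transposed conv

# (kernel, stride) pairs for which a VALID forward convolution maps 255 to 126.
valid_pairs = [(k, s) for s in range(1, 5) for k in range(1, 8)
               if forward_size(target, k, s, "VALID") == value]

# Strides for which a SAME forward convolution maps 255 to 126 (kernel size is irrelevant).
same_strides = [s for s in range(1, 5)
                if forward_size(target, 1, s, "SAME") == value]

print(valid_pairs)                           # [(4, 2), (5, 2)]
print(same_strides)                          # []
print(forward_size(target, 4, 2, "SAME"))    # 128
```

Both kernel sizes found for VALID padding match the snippets above: the forward convolution uses a 5*5 kernel and the transposed one a 4*4 kernel, each with stride 2.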

tf.nn.atrous_conv2d: dilated (atrous) convolution

The dilated convolution function is:

tf.nn.atrous_conv2d(value, filters, rate, padding, name=None)

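filters is the filter (convolution kernel), in the same format as for an ordinary convolution: [height, width, input_dim, output_dim]. rate is the sampling stride applied to the input (sample stride).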

x1 = tf.constant(1.0, shape=[1, 5, 5, 3])
kernel = tf.constant(1.0, shape=[3, 3, 3, 1])
y5=tf.nn.atrous_conv2d(x1,kernel,10,'SAME')

y5.shape is (1, 5, 5, 1).

The complete example:
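
To see what a rate of 10 does to a 3*3 kernel on a 5*5 input, here is a single-channel dilated convolution in plain Python with zero padding that keeps the output size. It is an illustrative sketch, not the TensorFlow kernel; dilated_conv2d_same and effective_kernel_size are names invented for it, and it assumes a square kernel with an odd side length.

```python
def effective_kernel_size(kernel, rate):
    """Side length covered by a dilated kernel: the taps are `rate` pixels apart."""
    return kernel + (kernel - 1) * (rate - 1)


def dilated_conv2d_same(image, kernel, rate):
    """Single-channel dilated convolution with zero padding and same-size output."""
    height, width = len(image), len(image[0])
    k = len(kernel)                      # square kernel, odd side length assumed
    half = k // 2
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr = r + (i - half) * rate
                    cc = c + (j - half) * rate
                    # Taps that fall outside the image read zero padding.
                    if 0 <= rr < height and 0 <= cc < width:
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


ones5 = [[1.0] * 5 for _ in range(5)]
ones3 = [[1.0] * 3 for _ in range(3)]

print(effective_kernel_size(3, 10))                   # 21
print(dilated_conv2d_same(ones5, ones3, 1)[2][2])     # 9.0: all nine taps hit the image
print(dilated_conv2d_same(ones5, ones3, 10)[2][2])    # 1.0: only the centre tap hits the image
```

With rate 10 the kernel spans 21 pixels, so on a 5*5 input only the centre tap ever lands inside the image; the output keeps the input's spatial size, matching the shape reported above.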

import tensorflow as tf

x1 = tf.constant(1.0, shape=[1, 5, 5, 3])
x2 = tf.constant(1.0, shape=[1, 6, 6, 3])
kernel = tf.constant(1.0, shape=[3, 3, 3, 1])
y1 = tf.nn.conv2d(x1, kernel, strides=[1, 2, 2, 1], padding="SAME")
y2 = tf.nn.conv2d(x2, kernel, strides=[1, 2, 2, 1], padding="SAME")
y3 = tf.nn.conv2d_transpose(y1,kernel,output_shape=[1,5,5,3],
    strides=[1,2,2,1],padding="SAME")
y4 = tf.nn.conv2d_transpose(y2,kernel,output_shape=[1,6,6,3],
    strides=[1,2,2,1],padding="SAME")

y5=tf.nn.atrous_conv2d(x1,kernel,10,'SAME')
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
x1_cov,  x2_cov,y1_decov,y2_decov,y5_dicov = sess.run([y1, y2,y3,y4,y5])

print(x1_cov.shape)
print(x2_cov.shape)
print(y1_decov.shape)
print(y2_decov.shape)
print(y5_dicov.shape)

II. Network Model Implementations

LeNet5 implemented with slim

import  tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np

from tensorflow.examples.tutorials.mnist import input_data

# Define the LeNet5 network structure with slim
def lenet5(inputs):
    inputs=tf.reshape(inputs,[-1,28,28,1])
    net=slim.conv2d(inputs=inputs,
                    num_outputs=32,
                    kernel_size=[5,5],
                    padding="SAME",
                    scope="conv1")  # SAME pads with zeros; VALID applies no padding
    net=slim.max_pool2d(inputs=net,
                        kernel_size=2,
                        stride=2,
                        scope="pool1")
    net =slim.conv2d(inputs=net,
                     num_outputs=64,
                     kernel_size=[5,5],
                     padding="SAME",
                     scope="conv2")
    net=slim.max_pool2d(inputs=net,
                        kernel_size=2,
                        stride=2,
                        scope="pool2")
    # tf.contrib.layers.flatten(P) flattens each example of the input P into a 1-D tensor
    # while keeping the batch dimension, and returns a tensor of shape [batch_size, k].
    # It is typically applied at the end of a CNN, just before the fully connected layers:
    # a conv output of shape [batch_size, height, width, channel]
    # is flattened to a 2-D tensor of shape [batch_size, height * width * channel].
    net =slim.flatten(inputs=net,scope="flatten")
    net=slim.fully_connected(inputs=net,num_outputs=500,scope="full1")
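    # The output layer returns raw logits, so the default ReLU activation is disabled.
    net=slim.fully_connected(inputs=net,num_outputs=10,activation_fn=None,scope="out")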
    return net

VGG16 with slim

def vgg16(inputs):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
    net = slim.max_pool2d(net, [2, 2], scope='pool5')
    net = slim.fully_connected(net, 4096, scope='fc6')
    net = slim.dropout(net, 0.5, scope='dropout6')
    net = slim.fully_connected(net, 4096, scope='fc7')
    net = slim.dropout(net, 0.5, scope='dropout7')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
  return net

References:

[1]. "Section 24: Using the slim library functions in TensorFlow, and pre-training and transfer learning with the VGG network (with code)" https://www.cnblogs.com/zyly/p/9146787.html

UNet model implementation

# -*-coding: utf-8 -*-
"""
    @Project: triple_path_networks
    @File   : UNet.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2019-01-24 11:18:15
"""
import tensorflow as tf
import tensorflow.contrib.slim as slim


def lrelu(x):
    return tf.maximum(x * 0.2, x)

activation_fn=lrelu

def UNet(inputs, reg):  # Unet
    conv1 = slim.conv2d(inputs, 32, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv1_1', weights_regularizer=reg)
    conv1 = slim.conv2d(conv1, 32, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv1_2',weights_regularizer=reg)
    pool1 = slim.max_pool2d(conv1, [2, 2], padding='SAME')

    conv2 = slim.conv2d(pool1, 64, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv2_1',weights_regularizer=reg)
    conv2 = slim.conv2d(conv2, 64, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv2_2',weights_regularizer=reg)
    pool2 = slim.max_pool2d(conv2, [2, 2], padding='SAME')

    conv3 = slim.conv2d(pool2, 128, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv3_1',weights_regularizer=reg)
    conv3 = slim.conv2d(conv3, 128, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv3_2',weights_regularizer=reg)
    pool3 = slim.max_pool2d(conv3, [2, 2], padding='SAME')

    conv4 = slim.conv2d(pool3, 256, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv4_1',weights_regularizer=reg)
    conv4 = slim.conv2d(conv4, 256, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv4_2',weights_regularizer=reg)
    pool4 = slim.max_pool2d(conv4, [2, 2], padding='SAME')

    conv5 = slim.conv2d(pool4, 512, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv5_1',weights_regularizer=reg)
    conv5 = slim.conv2d(conv5, 512, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv5_2',weights_regularizer=reg)

    up6 = upsample_and_concat(conv5, conv4, 256, 512)
    conv6 = slim.conv2d(up6, 256, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv6_1',weights_regularizer=reg)
    conv6 = slim.conv2d(conv6, 256, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv6_2',weights_regularizer=reg)

    up7 = upsample_and_concat(conv6, conv3, 128, 256)
    conv7 = slim.conv2d(up7, 128, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv7_1',weights_regularizer=reg)
    conv7 = slim.conv2d(conv7, 128, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv7_2',weights_regularizer=reg)

    up8 = upsample_and_concat(conv7, conv2, 64, 128)
    conv8 = slim.conv2d(up8, 64, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv8_1',weights_regularizer=reg)
    conv8 = slim.conv2d(conv8, 64, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv8_2',weights_regularizer=reg)

    up9 = upsample_and_concat(conv8, conv1, 32, 64)
    conv9 = slim.conv2d(up9, 32, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv9_1', weights_regularizer=reg)
    conv9 = slim.conv2d(conv9, 32, [3, 3], rate=1, activation_fn=activation_fn, scope='g_conv9_2',weights_regularizer=reg)
    print("conv9.shape:{}".format(conv9.get_shape()))

    net_type = 'UNet_1X'
    with tf.variable_scope(name_or_scope="output"):
        if net_type == 'UNet_3X':  # UNet with 3x upscaling
            conv10 = slim.conv2d(conv9, 27, [1, 1], rate=1, activation_fn=None, scope='g_conv10',weights_regularizer=reg)
            out = tf.depth_to_space(conv10, 3)
        elif net_type == 'UNet_1X':  # output has the same spatial size as the input
            out = slim.conv2d(conv9, 6, [1, 1], rate=1, activation_fn=None, scope='g_conv10',weights_regularizer=reg)
    return out

def upsample_and_concat(x1, x2, output_channels, in_channels):
    pool_size = 2
    deconv_filter = tf.Variable(tf.truncated_normal([pool_size, pool_size, output_channels, in_channels], stddev=0.02))
    deconv = tf.nn.conv2d_transpose(x1, deconv_filter, tf.shape(x2), strides=[1, pool_size, pool_size, 1])

    deconv_output = tf.concat([deconv, x2], 3)
    deconv_output.set_shape([None, None, None, output_channels * 2])
    return deconv_output

if __name__=="__main__":
    weight_decay=0.001
    reg = slim.l2_regularizer(scale=weight_decay)
    inputs = tf.ones(shape=[4, 100, 200, 3])
    out=UNet(inputs,reg)
    print("net1.shape:{}".format(inputs.get_shape()))
    print("out.shape:{}".format(out.get_shape()))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

III. Common TF functions

Generating fake data with TensorFlow:

import tensorflow as tf
 
def get_faked_data(shape,n_classes,one_hot):
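    '''
    Generate fake training data, typically used to exercise the training pipeline
    :param shape:  shape=[batch_size,height,width,image_channel]
    :param n_classes: number of classes
    :param one_hot: whether to convert labels to one-hot form
    :return: images, labels
    '''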
    images=tf.Variable(tf.random_normal(shape=shape,
                                        mean=0.0,stddev=1.0,dtype=tf.float32))
    labels=tf.Variable(tf.random_uniform(shape=[shape[0]],minval=0,maxval=n_classes,dtype=tf.int32))
    if one_hot:
        labels = tf.one_hot(labels, n_classes, 1, 0)
    return images,labels
 
if __name__=="__main__":
 
    batch_size=10
    height = 224
    width = 224
    image_channel = 3
    n_classes = 10
    # Set up the data loading:
    shape=[batch_size,height,width,image_channel]
    images, labels = get_faked_data(shape=shape,n_classes=n_classes,one_hot=False)
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
        sess.run(init_op)
        print('images',images.get_shape())
        print('labels',sess.run(labels))

tf.reduce_mean()

reduce_mean( input_tensor, axis=None, keep_dims=False, name=None, reduction_indices=None )

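Parameters:

input_tensor: the tensor to reduce. Should have a numeric type.
axis: the dimensions to reduce. If None (the default), all dimensions are reduced. Must be in the range [-rank(input_tensor), rank(input_tensor)).
keep_dims: if true, the reduced dimensions are retained with length 1.
name: a name for the operation (optional).
reduction_indices: the old, deprecated name for axis.

Computes the mean of elements across dimensions of a tensor. input_tensor is reduced along the dimensions given in axis. Unless keep_dims is true, the rank of the tensor is reduced by 1 for each entry in axis; if keep_dims is true, the reduced dimensions are retained with length 1. If axis has no entries, all dimensions are reduced and a tensor with a single element is returned.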

Global average pooling with tf.reduce_mean

# global average pooling, net shape=[batch_size,height,width,depths]
net = tf.reduce_mean(net,[1,2],keep_dims=True,name="GlobalPool")
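
The effect of axis and keep_dims is easy to see on a small array. The following sketch uses NumPy's mean as a stand-in for tf.reduce_mean (an assumption made so the example runs without a session); the array contents are arbitrary.

```python
import numpy as np

# Feature map laid out as [batch, height, width, depth], as in the text.
net = np.arange(2 * 4 * 3 * 2, dtype=np.float32).reshape(2, 4, 3, 2)

# Averaging over the spatial axes (1, 2) is global average pooling.
pooled = net.mean(axis=(1, 2), keepdims=True)   # reduced axes kept with length 1
squeezed = net.mean(axis=(1, 2))                # reduced axes dropped

print(pooled.shape)    # (2, 1, 1, 2)
print(squeezed.shape)  # (2, 2)
```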

References:

[1].https://blog.csdn.net/Hk_john/article/details/78188990?locationNum=6&fps=1

[2].https://www.w3cschool.cn/tensorflow_python/tensorflow_python-hckq2htb.html

spatial_squeeze

This flag controls whether the output is squeezed, that is, whether dimensions of size 1 are removed (for example, 5 x 3 x 1 becomes 5 x 3).

embedding_lookup

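Reference: https://blog.csdn.net/u014595019/article/details/52759104

In an RNN, both the input and the target fed to the model are represented as vocabulary ids. Take the sentence "我/是/学生" ("I / am / student"): if the three words have indices 0, 5 and 3 in the vocabulary, the sentence becomes [0,5,3]. These ids cannot be used directly; they first have to be converted to vectors, that is, to an embedding. The implementation is simple.

Step 1: build a matrix, call it embedding, of size [vocab_size, embedding_size], where the two dimensions are the number of words in the vocabulary and the dimension of the target vectors. Generally, the higher the vector dimension, the more information it can represent.

Step 2: call tf.nn.embedding_lookup(embedding,input_ids). If input_ids has length len, the returned tensor has size [len,embedding_size]. For example:

# Example code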
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

embedding = tf.Variable(np.identity(5,dtype=np.int32))
input_ids = tf.placeholder(dtype=tf.int32,shape=[None])
input_embedding = tf.nn.embedding_lookup(embedding,input_ids)

sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
print(sess.run(embedding))
#[[1 0 0 0 0]
# [0 1 0 0 0]
# [0 0 1 0 0]
# [0 0 0 1 0]
# [0 0 0 0 1]]
print(sess.run(input_embedding,feed_dict={input_ids:[1,2,3,0,3,2,1]}))
#[[0 1 0 0 0]
# [0 0 1 0 0]
# [0 0 0 1 0]
# [1 0 0 0 0]
# [0 0 0 1 0]
# [0 0 1 0 0]
# [0 1 0 0 0]]
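
Conceptually the lookup is just row indexing into the embedding matrix. The sketch below reproduces the example with NumPy fancy indexing; it is an illustration of the idea, assuming a single dense embedding matrix, and not how TensorFlow implements the op.

```python
import numpy as np

embedding = np.identity(5, dtype=np.int32)   # [vocab_size, embedding_size]
input_ids = np.array([1, 2, 3, 0, 3, 2, 1])

# One embedding row per id: shape [len(input_ids), embedding_size].
input_embedding = embedding[input_ids]
print(input_embedding.shape)  # (7, 5)
print(input_embedding[0])     # [0 1 0 0 0]
```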

tf.concat: concatenation

https://blog.csdn.net/LoseInVain/article/details/79638183?utm_source=blogxgwz0

tf.concat is the counterpart of np.concatenate in numpy: it joins tensors along a given dimension (axis), for example:

t1 = [[1, 2, 3], [4, 5, 6]]
t2 = [[7, 8, 9], [10, 11, 12]]
tf.concat([t1, t2], 0) ==> [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
tf.concat([t1, t2], 1) ==> [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]

a = tf.constant([[1,2,3],[3,4,5]]) # shape (2,3)
b = tf.constant([[7,8,9],[10,11,12]]) # shape (2,3)
ab1 = tf.concat([a,b], axis=0) # shape(4,3)
ab2 = tf.concat([a,b], axis=1) # shape(2,6)

import tensorflow as tf
import numpy as np
# 120 numbers in total, arranged as shape=[batch=5,height=4,width=3,depths=2]
batch_inputs = np.reshape(np.arange(5*4*3*2), (5,4,3,2)) 
print("batch_inputs.shape:{}\n batch_inputs:{}".format(batch_inputs.shape,batch_inputs))
x = tf.placeholder(tf.int32,shape=[5, 4, 3,2])
# y=tf.reduce_sum(x, axis=0,keep_dims=True)
# Averaging over axis=[1,2] is global average pooling
# (x is int32, so the mean is truncated to an integer)
y = tf.reduce_mean(x,axis=[1,2],keep_dims=True,name="GlobalPool")
data_cat=tf.concat(values=[y,y],axis=3) # concatenate along the depth axis
with tf.Session() as sess:
    y = sess.run(y,feed_dict={x:batch_inputs})
    print("y.shape:{},\ny:{}".format(y.shape,y))

    data_cat= sess.run(data_cat,feed_dict={x:batch_inputs})
    print("data_cat.shape:{},\ndata_cat:{}".format(data_cat.shape,data_cat))

tf.split: splitting

tf.split is similar to np.array_split

data1,data2=np.array_split(batch_inputs,indices_or_sections=2,axis=3)

batch=4
height=100
width=100
depths=6
batch_inputs = np.reshape(np.arange(batch * height * width * depths), (batch, height, width, depths))
print("batch_inputs.shape:{}\n".format(batch_inputs.shape))
x = tf.placeholder(tf.int32, shape=[batch, height, width, depths])


# Concatenate along the depth axis: [batch, height, width, depths]===>[batch, height, width, depths+depths]
data_cat = tf.concat(values=[x, x], axis=3)

# Split along the depth axis
data_split1,data_split2 = tf.split(value=x,num_or_size_splits=[3,3], axis=3)

with tf.Session() as sess:
    data_cat = sess.run(data_cat, feed_dict={x: batch_inputs})
    print("data_cat.shape:{},\n".format(data_cat.shape))

    data_split1,data_split2 = sess.run([data_split1,data_split2], feed_dict={x: batch_inputs})
    print("data_split1.shape:{},\n".format(data_split1.shape))
    print("data_split2.shape:{},\n".format(data_split2.shape))

Output:

batch_inputs.shape:(4, 100, 100, 6)

data_cat.shape:(4, 100, 100, 12),

data_split1.shape:(4, 100, 100, 3),

data_split2.shape:(4, 100, 100, 3),

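Comparing tf.split with np.split: tf.split takes num_or_size_splits (the size of each piece), whereas np.split takes indices_or_sections (the indices at which to cut). The two splits below are equivalent: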

import numpy as np
import tensorflow as tf

batch = 1
height = 13
width = 13
depths = 3
D5=85
batch_inputs = np.reshape(np.arange(batch * height * width * depths*D5), (batch, height, width, depths,D5))
print("batch_inputs.shape:{}".format(batch_inputs.shape))
print("*********************************")
# tf.split: [2, 2, 1, 80] gives the size of each piece along the split axis
box_centers, box_sizes, conf_logits, prob_logits = tf.split(batch_inputs, [2, 2, 1, 80], axis=-1)  # axis=-1 is the last dimension
print("box_centers.shape:{}".format(box_centers.shape))
print("box_sizes.shape:{}".format(box_sizes.shape))
print("conf_logits.shape:{}".format(conf_logits.shape))
print("prob_logits.shape:{}".format(prob_logits.shape))

# np.split: [2, 4, 5] gives the indices at which to cut
box_centers, box_sizes, conf_logits, prob_logits= np.split(batch_inputs, [2, 4, 5], axis=-1)
print("*********************************")

print("box_centers.shape:{}".format(box_centers.shape))
print("box_sizes.shape:{}".format(box_sizes.shape))
print("conf_logits.shape:{}".format(conf_logits.shape))
print("prob_logits.shape:{}".format(prob_logits.shape))

Output:

batch_inputs.shape:(1, 13, 13, 3, 85)
*********************************
box_centers.shape:(1, 13, 13, 3, 2)
box_sizes.shape:(1, 13, 13, 3, 2)
conf_logits.shape:(1, 13, 13, 3, 1)
prob_logits.shape:(1, 13, 13, 3, 80)
*********************************
box_centers.shape:(1, 13, 13, 3, 2)
box_sizes.shape:(1, 13, 13, 3, 2)
conf_logits.shape:(1, 13, 13, 3, 1)
prob_logits.shape:(1, 13, 13, 3, 80)
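
The two argument conventions are related by a running sum: the split points are the cumulative sizes without the last one. A small sketch of the conversion in both directions follows; the helper names are made up for illustration.

```python
from itertools import accumulate

def sizes_to_indices(sizes):
    """Convert tf.split-style piece sizes to np.split-style split points."""
    return list(accumulate(sizes))[:-1]

def indices_to_sizes(indices, total):
    """Convert split points back to piece sizes for an axis of length `total`."""
    bounds = [0] + list(indices) + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

print(sizes_to_indices([2, 2, 1, 80]))   # [2, 4, 5]
print(indices_to_sizes([2, 4, 5], 85))   # [2, 2, 1, 80]
```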

tf.py_func


import numpy as np
import tensorflow as tf

def my_func(array1,array2):
    return array1 + array2, array1 - array2

if __name__ =='__main__':
    array1 = np.array([[1, 2], [3, 4]])
    array2 = np.array([[1, 2], [3, 4]])

    a1 = tf.placeholder(tf.float32,[2,2],name = 'array1')
    a2 = tf.placeholder(tf.float32,[2,2],name = 'array2')
    # Method 1: compute with TF's built-in ops
    # y1=a1+a2
    # y2=a1-a2
    # Method 2: use tf.py_func to call a third-party library (numpy) for the computation
    y1,y2 = tf.py_func(my_func,inp=[a1,a2],Tout=[tf.float32, tf.float32])

    with tf.Session() as sess:
        y1_,y2_ = sess.run([y1,y2],feed_dict={a1:array1,a2:array2})
        print("y1_:\n{}".format(y1_))
        print('*'*10)
        print("y2_:\n{}".format(y2_))

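IV. TensorFlow bugs and pitfalls

1. Memory keeps growing during training until it is exhausted

While running a program today, memory usage just kept climbing. When the local machine could not cope I moved it to a server, where 62 GB of memory was gone within minutes, and I had no idea where the problem was. After some searching online I finally figured it out. In one sentence: inside the training loop, do not write any tensor expressions, including functions that start with tf. (such as tf.nn.embedding_lookup).

If you really need the computation, define the expression outside the loop body and just run it inside the loop.

Example: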

import tensorflow as tf

a = tf.Variable(tf.truncated_normal(shape=[100,1000]),name='a')
b = tf.Variable(tf.truncated_normal(shape=[100,1000]),name='b')

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    while True:
        print(sess.run(a+b))

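Here the expression a+b appears inside the loop body, and memory grows slowly while the program runs (although for this program the growth is too slow to cause a crash). The reason is that in TensorFlow every tensor expression (function call) is added to the computation graph as a node. If the loop contains such an expression, nodes keep being added to the graph and memory usage rises.

The correct approach is to define the expression outside the loop: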

import tensorflow as tf

a = tf.Variable(tf.truncated_normal(shape=[100,1000]),name='a')
b = tf.Variable(tf.truncated_normal(shape=[100,1000]),name='b')
z=a+b
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    while True:
        print(sess.run(z))

TensorFlow also provides a way to check for this problem:

import tensorflow as tf

a = tf.Variable(tf.truncated_normal(shape=[100,1000]),name='a')
b = tf.Variable(tf.truncated_normal(shape=[100,1000]),name='b')

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    while True:
        print(sess.run(a+b))
        sess.graph.finalize()

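This now raises: RuntimeError: Graph is finalized and cannot be modified.
sess.graph.finalize() tells TensorFlow that the computation graph is complete, so the error is raised on the second pass through the loop, when a+b tries to add a node again.

Another example: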

import tensorflow as tf

a = tf.Variable(tf.truncated_normal(shape=[2, 3]), name='a')
b = tf.Variable(tf.truncated_normal(shape=[2, 3]), name='b')

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.graph.finalize()
    c = tf.concat([a, b], axis=0)
    print(sess.run(c))

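This program also fails, because tf.concat() adds a node to the computation graph after the graph has been declared complete. It also shows that functions starting with tf. add nodes to the graph. The fix is the same as above.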
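
The mechanism can be imitated without TensorFlow. The sketch below is a deliberately tiny toy model, not TensorFlow's implementation: ToyGraph and its methods are invented for illustration, and only mimic the two behaviours described above (every expression adds a node; a finalized graph rejects new nodes).

```python
class ToyGraph:
    """Toy stand-in for a computation graph; NOT TensorFlow's implementation."""

    def __init__(self):
        self.nodes = []
        self.finalized = False

    def add_node(self, name):
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.nodes.append(name)
        return name

    def finalize(self):
        self.finalized = True


graph = ToyGraph()
a = graph.add_node("a")
b = graph.add_node("b")

# Bad: building the expression inside the loop adds a node on every pass.
for _ in range(3):
    graph.add_node("add")
print(len(graph.nodes))  # 5

# Good: build the expression once, finalize, then only "run" inside the loop.
z = graph.add_node("z")
graph.finalize()
try:
    graph.add_node("add")
except RuntimeError as err:
    print(err)  # Graph is finalized and cannot be modified.
```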

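IV. Choosing TF training parameters

batchsize: the batch size, i.e. the number of samples drawn from the training set for each training iteration
epoch: 1 epoch is one pass over all samples in the training set
iteration: the number of iterations; 1 iteration is one training step on batchsize samples. Its relation to batchsize and epoch, with nums the number of training samples, is: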

$iteration=\frac{epoch\times nums}{batchsize}$

For example: 100 epochs is usually enough. With 1000 samples and a batchsize of 128, 100 epochs take 1000*100/128 = 781.25 iterations, so roughly 800 iterations will do.
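
The same numbers can be reproduced in code. This sketch assumes the last, partially filled batch of each epoch is still used for a training step, which is one way to arrive at the "roughly 800" figure; total_iterations is an illustrative helper.

```python
import math

def total_iterations(num_samples, batch_size, epochs):
    """Iterations needed when the last partial batch of each epoch is kept."""
    return epochs * math.ceil(num_samples / batch_size)

print(1000 * 100 / 128)                  # 781.25, straight from the formula
print(total_iterations(1000, 128, 100))  # 800
```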

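V. TF image preprocessing

1. Aspect-preserving resize, crop and pad in TF

Resizing proportionally to height=224 first and then cropping and padding gives much better results than resizing directly, or than cropping and padding straight to 224*224 (resize_image_with_crop_or_pad).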

# -*-coding: utf-8 -*-
"""
    @Project: nlp-learning-tutorials
    @File   : tf_image_processing.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-10-25 10:25:50
"""

import matplotlib.pyplot as plt
import tensorflow as tf
def tf_resize_image_equal_scaling(tf_image,target_height=500):
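    '''
    Aspect-preserving resize in TF
    :param tf_image: image tensor of shape [height, width, depth]
    :param target_height: height of the output image; the width is scaled proportionally
    :return: the resized image as a float32 tensor
    '''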
    with tf.Session() as sess:
        height, width, depth = tf_image.eval().shape
        target_width = int(width * target_height / height)
        tf_dest_image = tf.image.convert_image_dtype(tf_image, dtype=tf.float32)
        tf_dest_image = tf.image.resize_images(tf_dest_image, [target_height, target_width], method=0)
    return tf_dest_image

image_path="D:/tensorflow/2.jpg"
image_raw_data = tf.gfile.FastGFile(image_path,'rb').read()
with tf.Session() as sess:
    plt.ion()    # turn on interactive mode
    plt.figure()
    img_data = tf.image.decode_jpeg(image_raw_data)
    plt.imshow(img_data.eval())

    # Crop or pad directly to 224*224
    croped_img1 = tf.image.resize_image_with_crop_or_pad(img_data,224,224)
    plt.figure()
    plt.imshow(croped_img1.eval())

    # Resize proportionally to height=224
    img_data=tf_resize_image_equal_scaling(img_data, target_height=224)
    plt.figure()
    plt.imshow(img_data.eval())
    # Then crop or pad to 224*224
    croped_img2 = tf.image.resize_image_with_crop_or_pad(img_data,224,224)
    plt.figure()
    plt.imshow(croped_img2.eval())

    plt.ioff()
    plt.show()
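
The size arithmetic behind the two steps can be worked through without loading an image. In this sketch the 480*640 input size is a made-up example, and both helpers are illustrative rather than TensorFlow functions; scaled_size follows the same int(width * target_height / height) rule as tf_resize_image_equal_scaling.

```python
def scaled_size(height, width, target_height):
    """New (height, width) after an aspect-preserving resize to target_height."""
    target_width = int(width * target_height / height)
    return target_height, target_width

def crop_or_pad_amount(size, target):
    """Pixels to crop (positive) or pad (negative) along one axis."""
    return size - target

h, w = scaled_size(480, 640, 224)
print(h, w)                        # 224 298
print(crop_or_pad_amount(w, 224))  # 74: only 74 columns are cropped away

# Cropping the original directly would discard far more of the image.
print(crop_or_pad_amount(480, 224), crop_or_pad_amount(640, 224))  # 256 416
```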

VI. TensorFlow 2.0 and tf.keras

https://blog.csdn.net/guyuealian/article/details/84579227

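VII. TensorFlow optimization and related topics

1. Do data preprocessing on the CPU

Placing the input-pipeline operations on the CPU can improve performance significantly, because it leaves the GPU free to focus on training. To make sure preprocessing runs on the CPU, wrap the preprocessing operations like this: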

with tf.device('/cpu:0'):
  # function to get and process images or data.
  distorted_inputs = load_and_distort_images()

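If you use tf.estimator.Estimator, the Estimator's input function is placed on the CPU automatically.

2. Store data files as TFRecord

Reading large numbers of small files severely hurts I/O performance. For a given set of hardware, one way to get maximum I/O throughput is to convert the input data to TFRecord files (each larger than 100 MB). For small datasets (200 MB to 1 GB), the best approach is to load the whole dataset into memory.

3. The pipeline mechanism

TensorFlow introduced the tf.data.Dataset module, which makes reading data more convenient; its support for multi-threaded (multi-process) operation also improves efficiency to some degree. With the pipeline mechanism of tf.data.Dataset, input data can be processed by multiple CPU threads (reading images and some of the image preprocessing, for example), so the GPU can focus on training while the CPU prepares the data.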

https://blog.csdn.net/guyuealian/article/details/80857228

https://blog.csdn.net/wangdongwei0/article/details/82991048

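4. Use BatchNorm

BN speeds up network convergence and improves training stability.

5. Use an exponentially decaying learning rate

Start with a relatively large learning rate, then decay it gradually.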
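
As a rough illustration of the schedule, the sketch below evaluates the formula learning_rate * decay_rate ** (global_step / decay_steps), which is the rule documented for tf.train.exponential_decay. The function here is a plain-Python stand-in, and the starting rate 0.1, decay_steps 1000 and decay_rate 0.5 are arbitrary example values.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """decayed = learning_rate * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps  # decay in discrete jumps
    return learning_rate * decay_rate ** exponent

for step in (0, 500, 1000, 2000):
    smooth = exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5)
    stepped = exponential_decay(0.1, step, 1000, 0.5, staircase=True)
    print(step, smooth, stepped)
```

With staircase=True the rate stays at 0.1 for the first 1000 steps and then halves; without it the rate shrinks a little at every step.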

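6. Moving averages: making the model more robust on test data

A moving-average model can make the model more robust on test data. When a neural network is trained with stochastic gradient descent, moving averages improve the final model's performance on test data to some extent in many applications. The moving-average model uses a decay rate to control the gap between a parameter's value before and after an update, which damps the change in the parameter (for example, if a parameter is 5 before an update and 4 after it, the moving-average value lies between 4 and 5). If a parameter's value does not change in an update, its moving-average value does not change either.

ExponentialMovingAverage() in TensorFlow applies to the weights and biases, not to the training set. If you want this effect on the training set, you have to write the code yourself. Why apply a moving average to w and b? In a neural network, parameter updates should be neither too large nor too small; an updated parameter is related to its previous value and should not change abruptly. If training produces a "wild" parameter value, the moving average suppresses it and pulls it back in line. This damping of abrupt parameter changes is known as robustness, the ability to resist sudden changes: the more robust the model, the better it withstands bad parameter values.
ExponentialMovingAverage() takes two arguments: the decay rate (decay) and the number of iterations so far (step). Here decay and step correspond to β and num_updates, so a moving-average model is implemented in the following steps: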

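1. Define the training step counter, step
2. Define the moving-average object
3. Tell the object which variables should use the moving average (w and b)
4. Run the op that updates the variables' exponentially weighted averages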

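    # 1. Define the training step counter. trainable=False marks it as not trainable,
    # so that no moving average is computed for it
    global_step = tf.Variable(0, trainable=False)

    # 2. Create the moving-average object from the decay rate and the step counter.
    # Passing the step counter makes the averages update faster early in training
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,
                                                          global_step)
    # 3. tf.trainable_variables() returns the list of all trainable variables, i.e. all w and b;
    # apply the moving average to all of them
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    # After backpropagation updates the parameters, each parameter's moving average is updated too;
    # the code below performs both operations in one step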
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name="train")

Once the moving average is set up, simply call sess.run(train_op) wherever the backpropagation step was run before.
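
The update rule itself fits in a few lines. This is a plain-Python sketch of the rule documented for tf.train.ExponentialMovingAverage (shadow = decay * shadow + (1 − decay) * variable, with decay capped at (1 + num_updates) / (10 + num_updates) when a step count is supplied); ema_update is an illustrative helper, not the library code.

```python
def ema_update(shadow, value, decay, num_updates=None):
    """One moving-average step: shadow = decay * shadow + (1 - decay) * value."""
    if num_updates is not None:
        # With a step count, a smaller decay is used early in training.
        decay = min(decay, (1 + num_updates) / (10 + num_updates))
    return decay * shadow + (1 - decay) * value

# The parameter moves from 5 to 4; its average stays between the two.
print(ema_update(5.0, 4.0, decay=0.99))                 # about 4.99
# At step 0 the effective decay is 0.1, so the average follows much faster.
print(ema_update(5.0, 4.0, decay=0.99, num_updates=0))  # about 4.1
```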

Highly recommended: https://blog.csdn.net/m0_38106113/article/details/81542863

  with tf.control_dependencies([updates]):
      opt = tf.train.AdamOptimizer(args.learning_rate)
      minimize = opt.minimize(opt_loss, name='optimizer', global_step=global_step)

  # Average loss and psnr for display
  with tf.name_scope("moving_averages"):
    ema = tf.train.ExponentialMovingAverage(decay=0.99)
    update_ma = ema.apply([loss, psnr])
    loss = ema.average(loss)
    psnr = ema.average(psnr)

  # Training stepper operation
  train_op = tf.group(minimize, update_ma)  # tf.group() combines multiple ops

7.Grappler API

https://www.imooc.com/article/34403

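The main optimization logic of the tensorflow.grappler.ModelPruner class is pruning the computation graph, removing nodes that are not needed.

8. Model quantization tool: quantization

Put simply, quantization approximates 32-bit floating-point numbers with 8-bit integers for storage and computation. After quantization the model takes 75% less storage, which compresses the model.

A simple example of 8-bit quantization: the parameter values within one layer of a model fall in a fairly small interval, for example [-1,1], so all parameters of that layer can be mapped linearly to the interval [0, 255], like this: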

float | Quantized
------+----------
-1.0  | 0
 1.0  | 255
 0.0  | 128
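
The linear mapping can be written out directly. This is a minimal sketch of uniform 8-bit quantization over a fixed range, assuming round-to-nearest; it is not the algorithm used by the quantize_graph tool, and both function names are illustrative. Note that 0.0 lands on the midpoint 127.5 of [0, 255] before rounding.

```python
def quantize(x, lo=-1.0, hi=1.0, levels=256):
    """Linearly map a float in [lo, hi] to an integer in [0, levels - 1]."""
    scale = (levels - 1) / (hi - lo)
    q = round((x - lo) * scale)
    return max(0, min(levels - 1, q))

def dequantize(q, lo=-1.0, hi=1.0, levels=256):
    """Approximate inverse of quantize."""
    return lo + q * (hi - lo) / (levels - 1)

for x in (-1.0, 0.0, 1.0):
    print(x, quantize(x))

# A round trip recovers the value up to the quantization error.
print(dequantize(quantize(0.3)))
```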

Run the command:

bazel-bin/tensorflow/tools/quantization/quantize_graph \
--input=./tmp/classify_image_graph_def.pb \
--output_node_names="softmax" --output=./tmp/quantized_graph.pb \
--mode=eightbit

Reference: https://blog.csdn.net/gaofeipaopaotang/article/details/81186891

9. TensorFlow Lite quantization

VIII. Visualization tools

Netron, an excellent tool for visualizing model structure (supports tf, caffe, keras, mxnet and other frameworks): https://blog.csdn.net/leviopku/article/details/81980249

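IX. Creating multiple computation graphs (Graph) in TensorFlow

A tf program automatically creates and maintains a default computation graph; a computation graph can be thought of as a programmatic description of the neural network's structure. Unless a graph is specified explicitly, all tensors and Operations are defined in the default graph, and tf.get_default_graph() returns a handle to the current default graph.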

# -*- coding: utf-8 -*-
import tensorflow as tf
 
a=tf.constant([1.0,2.0])
b=tf.constant([1.0,2.0])
 
result = a+b
 
print(a.graph is tf.get_default_graph()) # True: tensor a is defined in the default graph
print(result.graph is tf.get_default_graph()) # True: the result tensor is defined in the default graph
print('a.graph =        {0}'.format(a.graph))
print('default graph =  {0}'.format(tf.get_default_graph()))

Output:

True
True
a.graph =        <tensorflow.python.framework.ops.Graph object at 0x7f0480c9ca90>
default graph =  <tensorflow.python.framework.ops.Graph object at 0x7f0480c9ca90>

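tf allows several computation graphs to be defined; tensors and operations on different graphs are independent of each other and are not shared. Graphs can be used to isolate tensors and computations, and they also provide a mechanism for managing them. A graph can use Graph.device to specify the device a computation runs on, which lets TensorFlow make full use of GPUs and CPUs.

Use g = tf.Graph() to create a new computation graph;
define the tensors and operations that belong to graph g inside a with g.as_default(): block;
in with tf.Session(), pass graph = xxx to specify which graph the session runs;
tensors and operations with no explicitly specified graph belong to the default graph;
one graph can be run in several sessions, but each session is bound to a single graph.

Creating multiple graphs: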

# -*- coding: utf-8 -*-
import tensorflow as tf
 
# Create tensors and operations on the default graph
a=tf.constant([1.0,2.0])
b=tf.constant([2.0,1.0])
result = a+b
 
# Define two computation graphs
g1=tf.Graph()
g2=tf.Graph()
 
# Define tensors and operations in graph g1
with g1.as_default():
    a = tf.constant([1.0, 1.0])
    b = tf.constant([1.0, 1.0])
    result1 = a + b
 
with g2.as_default():
    a = tf.constant([2.0, 2.0])
    b = tf.constant([2.0, 2.0])
    result2 = a + b
 
 
# Create a session on graph g1
with tf.Session(graph=g1) as sess:
    out = sess.run(result1)
    print('with graph g1, result: {0}'.format(out))
 
with tf.Session(graph=g2) as sess:
    out = sess.run(result2)
    print('with graph g2, result: {0}'.format(out))
 
# Create a session on the default graph
with tf.Session(graph=tf.get_default_graph()) as sess:
    out = sess.run(result)
    print('with graph default, result: {0}'.format(out))
 
print(g1.version)  # version number of the graph, which increases as ops are added

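Gram matrix

The Gram matrix can be viewed as the uncentered covariance matrix between features (that is, a covariance matrix without the mean subtracted). In a feature map, every number comes from the convolution of a particular filter at a particular position, so every number represents the strength of one feature. What the Gram matrix computes is the pairwise correlation between features: which two features appear together, which two trade off against each other, and so on. The diagonal elements of the Gram matrix additionally reflect how much of each feature is present in the image, so the Gram matrix helps capture the overall style of the image. With a Gram matrix representing style, measuring the style difference between two images only requires comparing their Gram matrices.

In short, the Gram matrix measures the properties of each dimension as well as the relations between dimensions. In the matrix obtained from the inner products, the diagonal elements carry information about each feature map individually, and the remaining elements carry the correlations between different feature maps. Such a matrix shows both which features are present and how closely different features are related.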

# -*-coding: utf-8 -*-
"""
    @Project: neural_style_tensorflow
    @File   : gram_test.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2019-02-25 15:59:53
"""

import tensorflow as tf

def gram(layer):
    shape = tf.shape(layer)
    num_images = shape[0]
    width = shape[1]
    height = shape[2]
    num_filters = shape[3]
    filters = tf.reshape(layer, tf.stack([num_images, -1, num_filters]))
    print("filters shape:{}".format(filters.shape))

    grams = tf.matmul(filters, filters, transpose_a=True) / tf.to_float(width * height * num_filters)
    return grams

if __name__=="__main__":
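    '''
    Suppose batch_size=8 and some conv layer in VGG has 5 filters; take the feature maps produced by
    that layer's filters and assume each is 32*32. Gram = A*A' is computed by flattening each channel's
    feature map into a vector, so A = 5*1024 and A' = 1024*5, and the result is Gram = 5*5.
    Each Gram entry measures the cross-correlation between the feature maps of channel i and channel j.
    The final loss compares the L2 difference between the Gram matrices.
    '''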
    batch_size=8
    width = 32
    height = 32
    num_feature_map=5  # 5 filters
    feature_map = tf.random_normal([batch_size, width, height, num_feature_map], dtype=tf.float32)
    print("feature_map shape:{}".format(feature_map.shape))

    grams=gram(feature_map)
    print("grams shape:{}".format(grams.shape))
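
For a quick check of the shapes and of the symmetry of the result, the same computation can be sketched in NumPy. gram_np is an illustrative re-implementation that assumes a [batch, height, width, channels] layout and the same scaling as gram() above; it is meant as a reading aid, not a replacement for the TensorFlow version.

```python
import numpy as np

def gram_np(layer):
    """Gram matrix per image for a [batch, h, w, channels] feature map."""
    n, h, w, c = layer.shape
    filters = layer.reshape(n, h * w, c)             # one column per channel
    grams = np.matmul(filters.transpose(0, 2, 1), filters)
    return grams / float(h * w * c)                  # same scaling as gram()

feature_map = np.random.randn(8, 32, 32, 5).astype(np.float32)
grams = gram_np(feature_map)
print(grams.shape)  # (8, 5, 5)

# Each Gram matrix is symmetric: entry (i, j) equals entry (j, i).
print(np.allclose(grams, grams.transpose(0, 2, 1)))
```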