TensorFlow-Slim API Official Tutorial

wanttifa

TensorFlow version: 1.12.0

TF-Slim

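The TF-Slim module is one of the most convenient APIs in TensorFlow, especially the arg_scope, model_variables, repeat, and stack features it introduces.

TF-Slim is a lightweight library for defining, training, and evaluating complex models in TensorFlow. TF-Slim components can be freely mixed with other TensorFlow APIs.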

1. Importing the Slim Module

import tensorflow.contrib.slim as slim

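2. Advantages of the Slim Module

The Slim module simplifies building, training, and evaluating models:

Allows the user to define models compactly. This is achieved mainly through arg_scope and a large set of high-level layers and variables. These tools improve readability and maintainability, reduce the likelihood of errors from copying and pasting hyperparameter values, and simplify hyperparameter tuning.
Simplifies model development by providing commonly used regularizers. Several widely used computer vision models (e.g., VGG, AlexNet) are already implemented in Slim. These models work out of the box and can be extended in various ways (e.g., by adding multiple heads to different internal layers).
Makes it easy to extend complex models and to warm-start training from existing model checkpoints.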

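3. Components of the Slim Module

Slim is composed of several independent modules:

arg_scope: lets the user define default arguments for the operations used within the scope.
data: contains Slim's dataset definition, data providers, parallel_reader, and decoding utilities.
evaluation: routines for evaluating models.
layers: high-level layers for building models.
learning: routines for training models.
losses: commonly used loss functions.
metrics: commonly used evaluation metrics.
nets: popular network definitions (e.g., VGG, AlexNet). Note: the latest Slim no longer includes nets!
queues: provides a simple context manager for starting and closing QueueRunners.
regularizers: commonly used weight regularizers.
variables: provides convenience wrappers for variable creation and manipulation.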

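4. Building Models with Slim

Models can be defined very concisely by combining Slim's variables, layers, and scopes. Each of these is described in detail below.

4.1 Slim Variables

In native TensorFlow, creating a variable requires either a predefined value or an initialization mechanism (e.g., random sampling from a Gaussian). Furthermore, if a variable needs to be created on a specific device, that device must be specified explicitly. To reduce the code required for variable creation, Slim provides a set of thin wrapper functions in variables.py that make defining variables easier.

For example, to create a weights variable, initialize it with a truncated normal distribution, regularize it with an L2 loss, and place it on the CPU, the following declaration is all that is needed: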

weights = slim.variable('weights',
                        shape=[10, 10, 3 , 3],
                        initializer=tf.truncated_normal_initializer(stddev=0.1),
                        regularizer=slim.l2_regularizer(0.05),
                        device='/CPU:0')

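Note: native TensorFlow has two types of variables: regular variables and local (transient) variables. The vast majority of variables are regular variables; once created, they can be saved to disk with a saver. Local variables exist only for the duration of a session and are not saved to disk.

Slim further differentiates variables by defining model variables, which represent the parameters of a model. Model variables are trained or fine-tuned during learning and are loaded from a checkpoint during evaluation or inference (e.g., the variables created by slim.fully_connected and slim.conv2d). Non-model variables are variables that are needed during training or evaluation but not for inference (e.g., global_step is needed for training and evaluation but not for inference). Similarly, moving average variables might mirror model variables, but the moving averages are not themselves model variables.

Both model variables and regular variables can be easily created and retrieved via Slim: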

# Model Variables
weights = slim.model_variable('weights',
                              shape=[10, 10, 3 , 3],
                              initializer=tf.truncated_normal_initializer(stddev=0.1),
                              regularizer=slim.l2_regularizer(0.05),
                              device='/CPU:0')
model_variables = slim.get_model_variables()

# Regular variables
my_var = slim.variable('my_var',
                       shape=[20, 1],
                       initializer=tf.zeros_initializer())
regular_variables_and_model_variables = slim.get_variables()

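How does this work internally? When you create a model variable via Slim's layers or directly via slim.model_variable, Slim adds the variable to the tf.GraphKeys.MODEL_VARIABLES collection. If you have your own custom layers or variable-creation routines but still want Slim to manage or be aware of your model variables, Slim provides a convenience function for adding a model variable to that collection: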

my_model_variable = CreateViaCustomCode()

# Letting TF-Slim know about the additional variable.
slim.add_model_variable(my_model_variable)

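4.2 Slim Layers

Although the set of TensorFlow operations is quite extensive, developers of neural networks typically think of models in terms of higher-level concepts such as layers, losses, metrics, and networks. A layer (e.g., a convolutional, fully connected, or batch-norm layer) is more abstract than a single TensorFlow op and typically involves several ops.
Furthermore, a layer usually (but not always) has variables (tunable parameters) associated with it, unlike most primitive operations. For example, a convolutional layer in a neural network is composed of several low-level ops:

Creating the weight and bias variables
Convolving the weights with the input from the previous layer
Adding the biases to the result of the convolution
Applying an activation function

Using only plain TensorFlow code, this can be rather laborious: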

input = ...
with tf.name_scope('conv1_1') as scope:
  kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,
                                           stddev=1e-1), name='weights')
  conv = tf.nn.conv2d(input, kernel, [1, 1, 1, 1], padding='SAME')
  biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                       trainable=True, name='biases')
  bias = tf.nn.bias_add(conv, biases)
  conv1 = tf.nn.relu(bias, name=scope)

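To avoid duplicating this code, Slim provides a number of convenient operations defined at the more abstract level of neural network layers. For example, here is the Slim code corresponding to the code above: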

input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')

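Slim provides standard implementations for numerous components used to build neural networks. These include, but are not limited to, the ops in the following table: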

Layer	TF-Slim
BiasAdd	slim.bias_add
BatchNorm	slim.batch_norm
Conv2d	slim.conv2d
Conv2dInPlane	slim.conv2d_in_plane
Conv2dTranspose (Deconv)	slim.conv2d_transpose
FullyConnected	slim.fully_connected
AvgPool2D	slim.avg_pool2d
Dropout	slim.dropout
Flatten	slim.flatten
MaxPool2D	slim.max_pool2d
OneHotEncoding	slim.one_hot_encoding
SeparableConv2D	slim.separable_conv2d
UnitNorm	slim.unit_norm

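Slim also provides two meta-operations, repeat and stack. They are also exposed as tf.contrib.layers.repeat and tf.contrib.layers.stack, so ordinary functions can use them too. They let users apply the same operation repeatedly. For example, consider the following snippet from the VGG network, whose layers perform several convolutions in a row between pooling layers: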

net = ...
net = slim.conv2d(net, 256, [3, 3], scope='conv3_1')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_2')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

One way to reduce this code duplication is a for loop:

net = ...
for i in range(3):
  net = slim.conv2d(net, 256, [3, 3], scope='conv3_%d' % (i+1))
net = slim.max_pool2d(net, [2, 2], scope='pool2')

Using slim.repeat makes the code above even cleaner:

net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

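Note: slim.repeat not only applies the same arguments to every repeated unit, it also gives each repeated unit its own scope name (the scope, an underscore, and the iteration number). Concretely, the scopes in the example above are named 'conv3/conv3_1', 'conv3/conv3_2', and 'conv3/conv3_3'.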
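
The naming rule can be illustrated without TensorFlow. The following is a minimal sketch, not Slim's actual implementation: repeat_sketch and fake_conv2d are hypothetical names introduced here, and the fake layer only records the arguments it receives.

```python
def repeat_sketch(inputs, repetitions, layer, *args, scope="repeat", **kwargs):
    """Apply `layer` `repetitions` times, feeding each output into the next call.

    Each call receives the scope '<scope>/<scope>_<i>', with i starting at 1.
    """
    outputs = inputs
    for i in range(repetitions):
        call_scope = "%s/%s_%d" % (scope, scope, i + 1)
        outputs = layer(outputs, *args, scope=call_scope, **kwargs)
    return outputs


def fake_conv2d(inputs, num_outputs, kernel_size, scope=None):
    """Hypothetical stand-in for a layer: appends a description of the call."""
    return inputs + [(scope, num_outputs, tuple(kernel_size))]


net = repeat_sketch([], 3, fake_conv2d, 256, [3, 3], scope="conv3")
scopes = [call[0] for call in net]
print(scopes)
```

Every call gets the same positional arguments (256 and [3, 3]); only the scope differs between iterations.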

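Furthermore, slim.stack lets a caller repeatedly apply the same operation with different arguments to create a stack or tower of layers. slim.stack also creates a new tf.variable_scope for each operation it creates. For example, here is a simple way to create a Multi-Layer Perceptron (MLP):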

# Verbose way:
x = slim.fully_connected(x, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')

# Equivalent, TF-Slim way using slim.stack:
x = slim.stack(x, slim.fully_connected, [32, 64, 128], scope='fc')

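In this example, slim.stack calls slim.fully_connected three times, passing the output of one invocation to the next. However, the number of hidden units in each invocation changes from 32 to 64 to 128. Similarly, stack can be used to simplify a tower of multiple convolutions: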

# Verbose way:
x = slim.conv2d(x, 32, [3, 3], scope='core/core_1')
x = slim.conv2d(x, 32, [1, 1], scope='core/core_2')
x = slim.conv2d(x, 64, [3, 3], scope='core/core_3')
x = slim.conv2d(x, 64, [1, 1], scope='core/core_4')

# Using stack:
x = slim.stack(x, slim.conv2d, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], scope='core')
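
The behaviour of stack can be mimicked in plain Python. This is only a sketch under stated assumptions, not Slim's implementation: stack_sketch and fake_layer are hypothetical names, and the fake layer merely records each call.

```python
def stack_sketch(inputs, layer, stack_args, scope="stack", **kwargs):
    """Call `layer` once per entry of `stack_args`, chaining the outputs."""
    outputs = inputs
    for i, layer_args in enumerate(stack_args, start=1):
        # A bare value such as 32 is treated as a single positional argument.
        if not isinstance(layer_args, (list, tuple)):
            layer_args = (layer_args,)
        call_scope = "%s/%s_%d" % (scope, scope, i)
        outputs = layer(outputs, *layer_args, scope=call_scope, **kwargs)
    return outputs


def fake_layer(inputs, num_outputs, kernel_size=None, scope=None):
    """Hypothetical stand-in for a layer: appends a description of the call."""
    return inputs + [(scope, num_outputs, kernel_size)]


mlp = stack_sketch([], fake_layer, [32, 64, 128], scope="fc")
tower = stack_sketch(
    [], fake_layer,
    [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])],
    scope="core")
print([call[0] for call in mlp])
print([call[0] for call in tower])
```

Unlike repeat, each call here receives its own positional arguments, taken from the corresponding entry of the list.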

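4.3 Slim Scopes

In addition to the scope mechanisms in TensorFlow (name_scope, variable_scope), Slim adds a new scoping mechanism called arg_scope. This scope lets the user specify a set of default arguments for one or more operations; these defaults are passed to each of those operations used inside the arg_scope. This is best illustrated by example. Consider the following snippet: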

net = slim.conv2d(inputs, 64, [11, 11], 4, padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv1')
net = slim.conv2d(net, 128, [11, 11], padding='VALID',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv2')
net = slim.conv2d(net, 256, [11, 11], padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv3')

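Clearly, these three convolutional layers share many of the same hyperparameters. Two have the same padding, and all three have the same weights_initializer and weights_regularizer. This code is hard to read and contains many repeated values. One solution is to specify the default values using variables: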

padding = 'SAME'
initializer = tf.truncated_normal_initializer(stddev=0.01)
regularizer = slim.l2_regularizer(0.0005)
net = slim.conv2d(inputs, 64, [11, 11], 4,
                  padding=padding,
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv1')
net = slim.conv2d(net, 128, [11, 11],
                  padding='VALID',
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv2')
net = slim.conv2d(net, 256, [11, 11],
                  padding=padding,
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv3')

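This solution ensures that all three convolutions share the same parameter values, but the code is still not as clean as it could be. With an arg_scope, we can both ensure that each layer uses the same values and simplify the code: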

with slim.arg_scope([slim.conv2d], padding='SAME',
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
  net = slim.conv2d(inputs, 64, [11, 11], scope='conv1')
  net = slim.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')
  net = slim.conv2d(net, 256, [11, 11], scope='conv3')

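As the example shows, arg_scope makes the code cleaner, simpler, and easier to maintain. Note that an argument passed explicitly to an op inside the arg_scope overrides the scope default. Specifically, while the padding argument defaults to 'SAME' within the scope, the second convolution overrides it with 'VALID'.

arg_scopes can also be nested, and multiple operations can be used in the same scope. For example: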

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
  with slim.arg_scope([slim.conv2d], stride=1, padding='SAME'):
    net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
    net = slim.conv2d(net, 256, [5, 5],
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.03),
                      scope='conv2')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc')

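In this example, the first arg_scope applies the same activation_fn, weights_initializer, and weights_regularizer to the conv2d and fully_connected layers. The second arg_scope specifies additional default arguments for conv2d only.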

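4.4 Working Example: Specifying the VGG16 Layers

By combining Slim's variables, operations, and scopes, we can write a normally very complex network in very few lines of code. For example, the entire VGG architecture can be defined with the following snippet: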

def vgg16(inputs):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
    net = slim.max_pool2d(net, [2, 2], scope='pool5')
    net = slim.fully_connected(net, 4096, scope='fc6')
    net = slim.dropout(net, 0.5, scope='dropout6')
    net = slim.fully_connected(net, 4096, scope='fc7')
    net = slim.dropout(net, 0.5, scope='dropout7')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
  return net

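5. Training Models with Slim

Training a model requires a model, a loss function, a gradient computation, and a training routine that iteratively computes the gradients of the loss with respect to the model weights and updates the weights accordingly. Slim provides both common loss functions and a set of helper functions for running training and evaluation.

5.1 Slim Losses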

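According to the official deprecation notice, the slim.losses module will be removed; use the tf.losses module instead, which offers the same functionality.
The loss function defines a quantity that we want to minimize. For classification problems, this is typically the cross entropy between the true distribution and the predicted probability distribution across classes. For regression problems, it is typically the sum of squared differences between the predicted and true values.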
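
To make the two quantities concrete, here is a small plain-Python sketch. The function names and sample numbers are illustrative assumptions; the slim.losses functions operate on tensors (and softmax_cross_entropy takes logits), so this shows only the underlying arithmetic.

```python
import math


def cross_entropy(true_dist, predicted_dist):
    """Cross entropy between a true and a predicted class distribution."""
    # Terms with zero true probability contribute nothing and are skipped.
    return -sum(t * math.log(p)
                for t, p in zip(true_dist, predicted_dist) if t > 0)


def sum_of_squares(predictions, targets):
    """Sum of squared differences between predicted and true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


# One-hot true class 1, predicted with probability 0.8.
ce = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])
# Two regression outputs, each off by 0.5.
sq = sum_of_squares([1.5, 2.5], [1.0, 2.0])
print(ce, sq)
```

With a one-hot true distribution the cross entropy reduces to the negative log of the probability assigned to the correct class.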

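Certain models, such as multi-task learning models, require several loss functions at once. In other words, the quantity ultimately being minimized is the sum of the individual loss functions. For example, consider a model that predicts both the type of scene in an image and the depth from the camera of each pixel. This model's loss function is the sum of the classification loss and the depth prediction loss.

Slim provides an easy-to-use mechanism for defining and keeping track of loss functions via the losses module. Consider the simple case of training the VGG network: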

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg

# Load the images and labels.
images, labels = ...

# Create the model.
predictions, _ = vgg.vgg_16(images)

# Define the loss functions and get the total loss.
loss = slim.losses.softmax_cross_entropy(predictions, labels)

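In this example, we first create the model (using Slim's VGG implementation, slim.nets.vgg) and add the standard classification loss. Now consider a multi-task model that produces multiple outputs: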

# Load the images and labels.
images, scene_labels, depth_labels = ...

# Create the model.
scene_predictions, depth_predictions = CreateMultiTaskModel(images)

# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)

# The following two lines have the same effect:
total_loss = classification_loss + sum_of_squares_loss
total_loss = slim.losses.get_total_loss(add_regularization_losses=False)

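In this example, we have two losses, created by slim.losses.softmax_cross_entropy and slim.losses.sum_of_squares. The total loss (total_loss) can be obtained either by adding them together or by calling slim.losses.get_total_loss(). How does slim.losses.get_total_loss work? When you create a loss function via Slim, Slim adds the loss to a special collection. This lets you either manage the total loss manually or let Slim manage it for you.

What if you have a custom loss function and want Slim to manage it as well? loss_ops.py also has a function that adds a custom loss to Slim's collection. For example: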

# Load the images and labels.
images, scene_labels, depth_labels, pose_labels = ...

# Create the model.
scene_predictions, depth_predictions, pose_predictions = CreateMultiTaskModel(images)

# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)
pose_loss = MyCustomLossFunction(pose_predictions, pose_labels)
slim.losses.add_loss(pose_loss) # Letting TF-Slim know about the additional loss.

# The following two ways to compute the total loss are equivalent:
regularization_loss = tf.add_n(slim.losses.get_regularization_losses())
total_loss1 = classification_loss + sum_of_squares_loss + pose_loss + regularization_loss

# (Regularization Loss is included in the total loss by default).
total_loss2 = slim.losses.get_total_loss()

In this example, we can again either produce the total loss manually or let Slim know about the additional loss and let Slim handle the losses.
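
The collection idea can be sketched in plain Python. This is an illustration only, not Slim's code: LossRegistry and its methods are hypothetical, and plain floats stand in for loss tensors.

```python
class LossRegistry:
    """Hypothetical stand-in for the collections that hold registered losses."""

    def __init__(self):
        self.losses = []
        self.regularization_losses = []

    def add_loss(self, value):
        """Register a loss so that get_total_loss() includes it."""
        self.losses.append(value)
        return value

    def add_regularization_loss(self, value):
        self.regularization_losses.append(value)
        return value

    def get_total_loss(self, add_regularization_losses=True):
        total = sum(self.losses)
        if add_regularization_losses:
            total += sum(self.regularization_losses)
        return total


registry = LossRegistry()
classification_loss = registry.add_loss(0.5)
sum_of_squares_loss = registry.add_loss(1.25)
pose_loss = registry.add_loss(0.25)            # the custom loss
regularization_loss = registry.add_regularization_loss(0.125)

# Manual bookkeeping and the registry give the same total.
total_loss1 = (classification_loss + sum_of_squares_loss
               + pose_loss + regularization_loss)
total_loss2 = registry.get_total_loss()
print(total_loss1, total_loss2)
```

Registering the custom loss is what makes it show up in the total; a loss that is never added to the registry would have to be summed in by hand.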

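5.2 Slim Training Loop

Slim provides a simple but powerful set of tools for training models (see learning.py). These include a train function that repeatedly measures the loss, computes gradients, and saves the model to disk, as well as several convenience functions for manipulating gradients. For example, once we have specified the model, the loss function, and the optimization scheme, we can call slim.learning.create_train_op and slim.learning.train to perform the optimization: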

g = tf.Graph()

# Create the model and specify the losses...
...

total_loss = slim.losses.get_total_loss()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# create_train_op ensures that each time we ask for the loss, the update_ops
# are run and the gradients being computed are applied too.
train_op = slim.learning.create_train_op(total_loss, optimizer)
logdir = ... # Where checkpoints are stored.

slim.learning.train(
    train_op,
    logdir,
    number_of_steps=1000,
    save_summaries_secs=300,
    save_interval_secs=600)

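In this example, the train_op passed to slim.learning.train does two things: (a) it computes the loss and (b) it applies the gradient step. logdir specifies the directory where checkpoints and event files are stored. The number of gradient steps can be limited; here we ask for only 1000 steps. save_summaries_secs=300 means that summaries are computed every 5 minutes, and save_interval_secs=600 means that a model checkpoint is saved every 10 minutes.

5.3 Working Example: Training the VGG16 Model

To illustrate how Slim is used, let's look at training the VGG network: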

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg

...

train_log_dir = ...
if not tf.gfile.Exists(train_log_dir):
  tf.gfile.MakeDirs(train_log_dir)

with tf.Graph().as_default():
  # Set up the data loading:
  images, labels = ...

  # Define the model:
  # vgg_16 returns a (logits, end_points) pair; only the logits are needed here.
  predictions, _ = vgg.vgg_16(images, is_training=True)

  # Specify the loss function:
  slim.losses.softmax_cross_entropy(predictions, labels)

  total_loss = slim.losses.get_total_loss()
  tf.summary.scalar('losses/total_loss', total_loss)

  # Specify the optimization scheme:
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)

  # create_train_op that ensures that when we evaluate it to get the loss,
  # the update_ops are done and the gradient updates are computed.
  train_tensor = slim.learning.create_train_op(total_loss, optimizer)

  # Actually runs training.
  slim.learning.train(train_tensor, train_log_dir)

6. Fine-Tuning Existing Models

6.1 Brief Recap on Restoring Variables from a Checkpoint

After a model has been trained, its Variables can be restored from a given checkpoint using tf.train.Saver(). In many cases, tf.train.Saver() provides a simple mechanism for restoring all or just a few variables.

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to restore all the variables.
restorer = tf.train.Saver()

# Add ops to restore some variables.
restorer = tf.train.Saver([v1, v2])

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...

6.2 Partially Restoring Models

It is often desirable to fine-tune a pre-trained model on a new dataset or even a new task. In these situations, Slim's helper functions can be used to select a subset of variables to restore:

# Create some variables.
v1 = slim.variable(name="v1", ...)
v2 = slim.variable(name="nested/v2", ...)
...

# Get list of variables to restore (which contains only 'v2'). These are all
# equivalent methods:
variables_to_restore = slim.get_variables_by_name("v2")
# or
variables_to_restore = slim.get_variables_by_suffix("2")
# or
variables_to_restore = slim.get_variables(scope="nested")
# or
variables_to_restore = slim.get_variables_to_restore(include=["nested"])
# or
variables_to_restore = slim.get_variables_to_restore(exclude=["v1"])

# Create the saver which will be used to restore the variables.
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...
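
The selection helpers above can be thought of as filters over variable names. The sketch below works on plain strings and assumes that scope selection is a match from the start of the name; the function names are hypothetical and this is not Slim's implementation.

```python
import re


def select_by_name(names, given_name):
    """Keep names whose last path component equals `given_name`."""
    return [n for n in names if n.split("/")[-1] == given_name]


def select_by_suffix(names, suffix):
    """Keep names that end with `suffix`."""
    return [n for n in names if n.endswith(suffix)]


def select_by_scope(names, scope):
    """Keep names that `scope` matches from the start (assumed prefix match)."""
    return [n for n in names if re.match(scope, n)]


def select_to_restore(names, include=None, exclude=None):
    """Keep names under any `include` scope and under no `exclude` scope."""
    included = names if include is None else [
        n for n in names if any(re.match(s, n) for s in include)]
    excluded = set() if exclude is None else {
        n for n in names if any(re.match(s, n) for s in exclude)}
    return [n for n in included if n not in excluded]


variable_names = ["v1", "nested/v2"]
results = [
    select_by_name(variable_names, "v2"),
    select_by_suffix(variable_names, "2"),
    select_by_scope(variable_names, "nested"),
    select_to_restore(variable_names, include=["nested"]),
    select_to_restore(variable_names, exclude=["v1"]),
]
print(results)
```

All five selections pick out the same single name here, mirroring the five equivalent calls in the example above.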

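6.3 Restoring Models with Different Variable Names

When restoring variables from a checkpoint, the Saver looks up the variable names in the checkpoint file and maps them to variables in the current graph. Above, we created a saver by passing it the variables to restore. In that case, the names to look up in the checkpoint were obtained implicitly from each variable's var.op.name.

This works well when the variable names in the checkpoint file match those in the current graph. Sometimes, however, the variables in the checkpoint have different names from those in the current graph. In this case, we must provide the Saver with a dictionary that maps each checkpoint variable name to a graph variable. In the following example, the checkpoint variable names are obtained with a simple function: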

# Assuming that 'conv1/weights' should be restored from 'vgg16/conv1/weights':
def name_in_checkpoint(var):
  return 'vgg16/' + var.op.name

# Alternatively, assuming that 'conv1/weights' and 'conv1/bias' should be restored
# from 'conv1/params1' and 'conv1/params2' (this definition replaces the one above):
def name_in_checkpoint(var):
  if "weights" in var.op.name:
    return var.op.name.replace("weights", "params1")
  if "bias" in var.op.name:
    return var.op.name.replace("bias", "params2")

variables_to_restore = slim.get_model_variables()
variables_to_restore = {name_in_checkpoint(var):var for var in variables_to_restore}
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
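
The shape of the dictionary passed to the Saver can be seen with plain strings standing in for graph variables. This is a sketch only: add_prefix and rename_params are hypothetical helpers, and keeping unmatched names unchanged is an assumption made here.

```python
def add_prefix(graph_name):
    """Checkpoint name when the checkpoint nests everything under 'vgg16/'."""
    return "vgg16/" + graph_name


def rename_params(graph_name):
    """Checkpoint name when weights and biases are stored as params1/params2."""
    if "weights" in graph_name:
        return graph_name.replace("weights", "params1")
    if "bias" in graph_name:
        return graph_name.replace("bias", "params2")
    return graph_name  # assumption: other names are the same in both places


graph_names = ["conv1/weights", "conv1/bias"]

# Keys are names in the checkpoint, values identify variables in the graph.
prefixed = {add_prefix(name): name for name in graph_names}
renamed = {rename_params(name): name for name in graph_names}
print(prefixed)
print(renamed)
```

The direction of the mapping matters: the key is always the name as stored in the checkpoint, and the value is the object in the current graph that should receive that value.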

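6.4 Fine-Tuning a Model on a Different Task

In the example above, we have a pre-trained VGG16 model. The model was trained on the ImageNet dataset, which has 1000 classes. However, we want to apply it to the Pascal VOC dataset, which has only 20 classes. To do so, we can initialize our new model with the parameters of the pre-trained model, excluding the final layer: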

# Load the Pascal VOC data
image, label = MyPascalVocDataLoader(...)
images, labels = tf.train.batch([image, label], batch_size=32)

# Create the model
# vgg_16 returns a (logits, end_points) pair; only the logits are needed here.
predictions, _ = vgg.vgg_16(images)

train_op = slim.learning.create_train_op(...)

# Specify where the Model, trained on ImageNet, was saved.
model_path = '/path/to/pre_trained_on_imagenet.checkpoint'

# Specify where the new model will live:
log_dir = '/path/to/my_pascal_model_dir/'

# Restore only the convolutional layers:
variables_to_restore = slim.get_variables_to_restore(exclude=['fc6', 'fc7', 'fc8'])
init_fn = slim.assign_from_checkpoint_fn(model_path, variables_to_restore)

# Start training.
slim.learning.train(train_op, log_dir, init_fn=init_fn)

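7. Evaluating Models with Slim

Once we have trained a model (or even while the model is still training), we would like to see how well it performs in practice. This is done by picking a set of evaluation metrics that grade the model's performance, together with evaluation code that actually loads the data, performs inference, compares the results to the ground truth, and records the evaluation scores. This step may be performed once or repeated periodically.

7.1 Slim Metrics

We define a metric as a performance measure that is not a loss function (losses are the quantities optimized during training) but that still matters for evaluating our model. For example, we might want to minimize log loss, while our metrics of interest might be the F1 score (test accuracy) or the Intersection over Union (IoU) score, which are not differentiable and therefore cannot be used as losses.

Slim provides a set of metric operations that make evaluating models easy. Conceptually, computing the value of a metric can be divided into three parts: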

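Initialization: initialize the variables used to compute the metric.
Aggregation: perform the operations (sums, etc.) used to compute the metric.
Finalization: (optional) perform any final operation needed to compute the metric value, for example computing means, mins, or maxes.

For example, to compute mean_absolute_error, two variables, count and total, are initialized to zero. During aggregation, we observe some predictions and labels, compute their absolute differences, and add the sum to total. Each time we observe another value, count is incremented. Finally, during finalization, total is divided by count to obtain the mean absolute error.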
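
The three phases can be sketched as a small plain-Python class. This is an illustration of the idea rather than Slim's implementation: the class name is hypothetical, and returning 0.0 before any data has been seen is an assumption made here.

```python
class StreamingMeanAbsoluteError:
    """Hypothetical streaming metric that keeps a running total and count."""

    def __init__(self):
        # Initialization: both accumulators start at zero.
        self.total = 0.0
        self.count = 0

    def update(self, predictions, labels):
        """Aggregation: fold one batch into the accumulators."""
        for prediction, label in zip(predictions, labels):
            self.total += abs(prediction - label)
            self.count += 1
        return self.value()

    def value(self):
        """Finalization: divide the total by the count. Does not change state."""
        if self.count == 0:
            return 0.0  # assumption: report 0.0 before any data is seen
        return self.total / self.count


mae = StreamingMeanAbsoluteError()
first = mae.update([1.0, 2.0], [1.5, 2.0])   # errors 0.5 and 0.0
second = mae.update([4.0], [3.0])            # error 1.0
print(first, second, mae.value())
```

Calling value() any number of times returns the same result, while each update() call changes the accumulators; this is the same split as between a value_op and an update_op.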

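The following example demonstrates the API for declaring metrics. Because metrics are usually computed on a test set, which differs from the training set (on which the loss is usually computed), we assume we are using test data: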

images, labels = LoadTestData(...)
predictions = MyModel(images)

mae_value_op, mae_update_op = slim.metrics.streaming_mean_absolute_error(predictions, labels)
mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels)
pl_value_op, pl_update_op = slim.metrics.percentage_less(mean_relative_errors, 0.3)

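As the example illustrates, creating a metric returns two values: a value_op and an update_op. The value_op is an idempotent operation that returns the current value of the metric. The update_op performs the aggregation step mentioned above and also returns the value of the metric.

Keeping track of each value_op and update_op can be laborious. To deal with this, Slim provides two convenience functions: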

# Aggregates the value and update ops in two lists:
value_ops, update_ops = slim.metrics.aggregate_metrics(
    slim.metrics.streaming_mean_absolute_error(predictions, labels),
    slim.metrics.streaming_mean_squared_error(predictions, labels))

# Aggregates the value and update ops in two dictionaries:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

7.2 Working Example: Tracking Multiple Metrics

Putting it all together:

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg


# Load the data
images, labels = load_data(...)

# Define the network
# vgg_16 returns a (logits, end_points) pair; only the logits are needed here.
predictions, _ = vgg.vgg_16(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

# Evaluate the model using 1000 batches of data:
num_batches = 1000

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(tf.local_variables_initializer())

  for batch_id in range(num_batches):
    # Convert the dict views to lists so that sess.run accepts them in Python 3.
    sess.run(list(names_to_updates.values()))

  metric_values = sess.run(list(names_to_values.values()))
  for metric, value in zip(names_to_values.keys(), metric_values):
    print('Metric %s has value: %f' % (metric, value))

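Note: metric_ops.py can be used on its own, without layers.py or loss_ops.py.

7.3 Evaluation Loop

Slim provides an evaluation module (evaluation.py) containing helper functions for writing model evaluation scripts that use the metrics from metric_ops.py. These include functions for periodically running evaluations, computing metrics over batches of data, and printing and summarizing metric results. For example: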

import math

import tensorflow as tf

slim = tf.contrib.slim

# Load the data
images, labels = load_data(...)

# Define the network
predictions = MyModel(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'accuracy': slim.metrics.accuracy(predictions, labels),
    'precision': slim.metrics.precision(predictions, labels),
    'recall': slim.metrics.recall(mean_relative_errors, 0.3),
})

# Create the summary ops such that they also print out to std output:
summary_ops = []
for metric_name, metric_value in names_to_values.items():
  op = tf.summary.scalar(metric_name, metric_value)
  op = tf.Print(op, [metric_value], metric_name)
  summary_ops.append(op)

num_examples = 10000
batch_size = 32
num_batches = math.ceil(num_examples / float(batch_size))

# Setup the global step.
slim.get_or_create_global_step()

output_dir = ... # Where the summaries are stored.
eval_interval_secs = ... # How often to run the evaluation.
slim.evaluation.evaluation_loop(
    'local',
    checkpoint_dir,
    log_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()),
    summary_op=tf.summary.merge(summary_ops),
    eval_interval_secs=eval_interval_secs)