TensorFlow-Slim Study Notes

I recently started reading the source code of SSD-Tensorflow. Since that project is built on Slim, I worked through the TF-Slim documentation on GitHub; these are my notes.



TensorFlow-Slim


TF-Slim is a lightweight, high-level library for defining, training, and evaluating complex models in TensorFlow.

Usage


import tensorflow.contrib.slim as slim

What are the various components of TF-Slim?


  • arg_scope: provides a new scope named arg_scope that allows a user to define default arguments for specific operations within that scope.
  • data: contains TF-Slim's dataset definitions, data providers, parallel reader, and decoding utilities.
  • evaluation: contains routines for evaluating models.
  • metrics: contains commonly used evaluation metrics.
  • nets: contains commonly used network definitions such as VGG and AlexNet.
  • queues: provides a context manager for easily and safely starting and closing QueueRunners.
  • regularizers: contains weight regularizers.
  • variables: provides convenient wrappers for variable creation and manipulation.

Defining Models


Variables

To create a weights variable initialized from a truncated normal distribution, regularized with l2 loss, and placed on the CPU, simply declare the following:

weights = slim.variable('weights',
                        shape=[10, 10, 3, 3],
                        initializer=tf.truncated_normal_initializer(stddev=0.1),
                        regularizer=slim.l2_regularizer(0.05),
                        device='/CPU:0')

TensorFlow has two types of variables: regular variables and local variables. The vast majority are regular variables: once created, they can be saved to disk with a saver. Local variables exist only for the duration of a session and are not saved to disk.
TF-Slim further differentiates variables by defining model variables, the variables that represent the parameters of a model. Model variables are trained or fine-tuned during learning and are loaded from a checkpoint during evaluation and inference; examples include the variables created by a slim.fully_connected or slim.conv2d layer. Non-model variables are all other variables used during learning or evaluation that are not actually required for performing inference, such as the global step and moving-average variables.
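For illustration, here is a minimal sketch (not from the Slim docs) of a local variable: it is placed in the local-variables collection, gets its own initializer, and is skipped by a default tf.train.Saver():

# A minimal sketch: a local variable lives only for the session and is not
# saved by a default tf.train.Saver(), which saves regular variables.
counter = tf.Variable(0, name='counter',
                      collections=[tf.GraphKeys.LOCAL_VARIABLES])

with tf.Session() as sess:
  sess.run(tf.local_variables_initializer())  # local variables have their own initializer
  print(sess.run(counter))                    # 0; the value is gone once the session ends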

Both model variables and regular variables can be easily created and retrieved through Slim:

# Model Variables
weights = slim.model_variable('weights',
                              shape=[10, 10, 3 , 3],
                              initializer=tf.truncated_normal_initializer(stddev=0.1),
                              regularizer=slim.l2_regularizer(0.05),
                              device='/CPU:0')
model_variables = slim.get_model_variables()

# Regular variables
my_var = slim.variable('my_var',
                       shape=[20, 1],
                       initializer=tf.zeros_initializer())
regular_variables_and_model_variables = slim.get_variables()

When you create a model variable via TF-Slim's layers or directly via the slim.model_variable function, TF-Slim adds the variable to the tf.GraphKeys.MODEL_VARIABLES collection. If you have your own custom layers or variable creation routine but still want TF-Slim to manage your model variables, you can register them with the following function:

my_model_variable = CreateViaCustomCode()

# Letting TF-Slim know about the additional variable.
slim.add_model_variable(my_model_variable)

Layers

Slim provides a more abstract way of defining neural network layers. For example, a convolutional layer becomes a single call:

input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')
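For comparison, here is a rough sketch of what that one-line call replaces in plain TensorFlow, assuming slim.conv2d's defaults (stride 1, 'SAME' padding, ReLU activation):

# A rough sketch of what slim.conv2d bundles: variable creation, the
# convolution, the bias add, and the ReLU activation (slim's defaults).
in_channels = input.get_shape().as_list()[-1]
with tf.variable_scope('conv1_1'):
  kernel = tf.get_variable('weights', shape=[3, 3, in_channels, 128],
                           initializer=tf.truncated_normal_initializer(stddev=0.1))
  biases = tf.get_variable('biases', shape=[128],
                           initializer=tf.zeros_initializer())
  conv = tf.nn.conv2d(input, kernel, strides=[1, 1, 1, 1], padding='SAME')
  net = tf.nn.relu(tf.nn.bias_add(conv, biases))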

Slim also provides two meta-operations, repeat and stack, that let users perform the same operation repeatedly. For example, consider the following snippet from the VGG network, which performs several convolutions in a row between pooling layers:

net = ...
net = slim.conv2d(net, 256, [3, 3], scope='conv3_1')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_2')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

One way to reduce this code duplication is a for loop:

net = ...
for i in range(3):
  net = slim.conv2d(net, 256, [3, 3], scope='conv3_%d' % (i+1))
net = slim.max_pool2d(net, [2, 2], scope='pool2')

The same thing can be done even more cleanly with Slim's repeat operation:

net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

Note that repeat is actually fairly smart: it appends an underscore and an iteration number to the scope of each successive slim.conv2d call, so in the example above the scopes are named 'conv3/conv3_1', 'conv3/conv3_2' and 'conv3/conv3_3'.
Furthermore, Slim's slim.stack lets the caller repeatedly apply the same operation with different arguments:

# Verbose way:
x = slim.fully_connected(x, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')

# Equivalent, TF-Slim way using slim.stack:
x = slim.stack(x, slim.fully_connected, [32, 64, 128], scope='fc')

Here slim.stack calls slim.fully_connected three times, feeding the output of each call into the next, while the number of hidden units goes from 32 to 64 to 128. Similarly, stack can simplify a tower of multiple convolutions:

# Verbose way:
x = slim.conv2d(x, 32, [3, 3], scope='core/core_1')
x = slim.conv2d(x, 32, [1, 1], scope='core/core_2')
x = slim.conv2d(x, 64, [3, 3], scope='core/core_3')
x = slim.conv2d(x, 64, [1, 1], scope='core/core_4')

# Using stack:
x = slim.stack(x, slim.conv2d, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], scope='core')

Scopes

TF-Slim provides a new scoping mechanism called arg_scope, which allows a user to specify one or more operations and a set of arguments that will be passed to every one of the operations defined within the arg_scope. An example:

net = slim.conv2d(inputs, 64, [11, 11], 4, padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv1')
net = slim.conv2d(net, 128, [11, 11], padding='VALID',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv2')
net = slim.conv2d(net, 256, [11, 11], padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv3')

Clearly these three convolutional layers share many of the same hyperparameters: the code is hard to read and full of repeated values that should be factored out. One solution is to specify default values using variables:

padding = 'SAME'
initializer = tf.truncated_normal_initializer(stddev=0.01)
regularizer = slim.l2_regularizer(0.0005)
net = slim.conv2d(inputs, 64, [11, 11], 4,
                  padding=padding,
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv1')
net = slim.conv2d(net, 128, [11, 11],
                  padding='VALID',
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv2')
net = slim.conv2d(net, 256, [11, 11],
                  padding=padding,
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv3')

This solution ensures that the three convolutions share exactly the same parameter values, but it doesn't fully reduce the clutter. Using an arg_scope simplifies the code further:

with slim.arg_scope([slim.conv2d], padding='SAME',
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
  net = slim.conv2d(inputs, 64, [11, 11], scope='conv1')
  net = slim.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')
  net = slim.conv2d(net, 256, [11, 11], scope='conv3')

As the example shows, arg_scope makes the code cleaner and simpler. Note that although arguments are given default values in the arg_scope, they can still be overridden locally (as padding='VALID' is for conv2 above).
arg_scopes can also be nested, and one scope can cover multiple operations:

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
  with slim.arg_scope([slim.conv2d], stride=1, padding='SAME'):
    net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
    net = slim.conv2d(net, 256, [5, 5],
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.03),
                      scope='conv2')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc')

Working Example: Specifying the VGG Layers

By combining TF-Slim's variables, layers and scopes, a normally quite complex network can be written with very few lines of code. For example, the entire VGG-16 network can be defined as follows:

def vgg16(inputs):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
    net = slim.max_pool2d(net, [2, 2], scope='pool5')
    net = slim.fully_connected(net, 4096, scope='fc6')
    net = slim.dropout(net, 0.5, scope='dropout6')
    net = slim.fully_connected(net, 4096, scope='fc7')
    net = slim.dropout(net, 0.5, scope='dropout7')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
  return net
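A quick usage sketch (the 224x224x3 input shape is an assumption here; it is the standard VGG input size):

# A minimal usage sketch, assuming the standard 224x224 RGB input size.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
logits = vgg16(images)  # unscaled class scores; apply tf.nn.softmax for probabilities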

Training Models


Training a TensorFlow model requires a loss function, the gradient computation, and a training routine that iteratively computes the gradients of the loss with respect to the weights and updates the weights accordingly.

Losses

For classification problems, the loss is typically the cross entropy between the true distribution and the predicted distribution. For regression problems, it is often the sum-of-squares error between the predicted and true values.
Certain models, such as multi-task learning models, minimize a loss that is a combination of several loss functions.
Slim's losses module defines and keeps track of loss functions.

Consider the simple case where we want to train the VGG network:

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
vgg = nets.vgg

# Load the images and labels.
images, labels = ...

# Create the model.
predictions, _ = vgg.vgg_16(images)

# Define the loss functions and get the total loss.
loss = slim.losses.softmax_cross_entropy(predictions, labels)

Now consider a multi-task model that produces multiple outputs:

# Load the images and labels.
images, scene_labels, depth_labels = ...

# Create the model.
scene_predictions, depth_predictions = CreateMultiTaskModel(images)

# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)

# The following two lines have the same effect:
total_loss = classification_loss + sum_of_squares_loss
total_loss = slim.losses.get_total_loss(add_regularization_losses=False)

In this example we have two losses: a cross-entropy loss and a sum-of-squares loss. Adding the two losses directly and calling slim.losses.get_total_loss(add_regularization_losses=False) produce the same total, because when a loss function is created through Slim, TF-Slim adds the loss to a special collection of loss functions.
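As a minimal sketch of that collection bookkeeping (the commented contents are illustrative), the totals can also be recovered by hand from loss_ops.py's accessors:

# Losses created via slim are tracked in a graph collection;
# get_losses() returns the tensors that get_total_loss() sums over.
tracked = slim.losses.get_losses()                  # e.g. [classification_loss, sum_of_squares_loss]
regularization = slim.losses.get_regularization_losses()
total_by_hand = tf.add_n(tracked + regularization)  # matches slim.losses.get_total_loss()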
If you have a custom loss function and want TF-Slim to manage it for you, loss_ops.py also provides a function that adds such a loss to TF-Slim's collection. For example:

# Load the images and labels.
images, scene_labels, depth_labels, pose_labels = ...

# Create the model.
scene_predictions, depth_predictions, pose_predictions = CreateMultiTaskModel(images)

# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)
pose_loss = MyCustomLossFunction(pose_predictions, pose_labels)
slim.losses.add_loss(pose_loss) # Letting TF-Slim know about the additional loss.

# The following two ways to compute the total loss are equivalent:
regularization_loss = tf.add_n(slim.losses.get_regularization_losses())
total_loss1 = classification_loss + sum_of_squares_loss + pose_loss + regularization_loss

# (Regularization Loss is included in the total loss by default).
total_loss2 = slim.losses.get_total_loss()

Training Loop

learning.py provides the tools for training a model. Once we have specified the model, the loss function and the optimization scheme, we can call slim.learning.create_train_op and slim.learning.train to perform the optimization:

g = tf.Graph()

# Create the model and specify the losses...
...

total_loss = slim.losses.get_total_loss()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# create_train_op ensures that each time we ask for the loss, the update_ops
# are run and the gradients being computed are applied too.
train_op = slim.learning.create_train_op(total_loss, optimizer)
logdir = ... # Where checkpoints are stored.

slim.learning.train(
    train_op,
    logdir,
    number_of_steps=1000,
    save_summaries_secs=300,
    save_interval_secs=600)

In this example, slim.learning.train is given the train_op, which is used to compute the loss and apply the gradient updates, and logdir, which specifies where checkpoints are stored. The number of gradient steps is limited to 1000; save_summaries_secs=300 means summaries are computed every 5 minutes, and save_interval_secs=600 means a checkpoint is saved every 10 minutes.

Working Example: Training the VGG16 Model

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg

...

train_log_dir = ...
if not tf.gfile.Exists(train_log_dir):
  tf.gfile.MakeDirs(train_log_dir)

with tf.Graph().as_default():
  # Set up the data loading:
  images, labels = ...

  # Define the model:
  predictions, _ = vgg.vgg_16(images, is_training=True)

  # Specify the loss function:
  slim.losses.softmax_cross_entropy(predictions, labels)

  total_loss = slim.losses.get_total_loss()
  tf.summary.scalar('losses/total_loss', total_loss)

  # Specify the optimization scheme:
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)

  # create_train_op that ensures that when we evaluate it to get the loss,
  # the update_ops are done and the gradient updates are computed.
  train_tensor = slim.learning.create_train_op(total_loss, optimizer)

  # Actually runs training.
  slim.learning.train(train_tensor, train_log_dir)

Fine-Tuning Existing Models


Brief Recap on Restoring Variables from a Checkpoint

After a model has been trained, it can be restored with tf.train.Saver(), which restores variables from a given checkpoint:

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to restore all the variables.
restorer = tf.train.Saver()

# Add ops to restore some variables.
restorer = tf.train.Saver([v1, v2])

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...

Partially Restoring Models

It is often desirable to fine-tune a pre-trained model on a new dataset or a new task. In these situations, TF-Slim's helper functions can select the subset of variables to restore:

# Create some variables.
v1 = slim.variable(name="v1", ...)
v2 = slim.variable(name="nested/v2", ...)
...

# Get list of variables to restore (which contains only 'v2'). These are all
# equivalent methods:
variables_to_restore = slim.get_variables_by_name("v2")
# or
variables_to_restore = slim.get_variables_by_suffix("2")
# or
variables_to_restore = slim.get_variables(scope="nested")
# or
variables_to_restore = slim.get_variables_to_restore(include=["nested"])
# or
variables_to_restore = slim.get_variables_to_restore(exclude=["v1"])

# Create the saver which will be used to restore the variables.
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...

Restoring Models with Different Variable Names

When restoring variables from a checkpoint, the Saver locates the variable names in the checkpoint file and maps them to variables in the current graph. Above, we created a saver by passing it a list of variables; in that case, the name used to look up each variable in the checkpoint file is obtained implicitly from its var.op.name.
This works well when the checkpoint variable names match those in the graph, but sometimes we want to restore from a checkpoint whose variable names differ from the names in the current graph. In this case we must provide the Saver with a dictionary that maps each checkpoint variable name to the corresponding graph variable:

# Assuming that 'conv1/weights' should be restored from 'vgg16/conv1/weights'
def name_in_checkpoint(var):
  return 'vgg16/' + var.op.name

# Assuming that 'conv1/weights' and 'conv1/bias' should be restored from 'conv1/params1' and 'conv1/params2'
def name_in_checkpoint(var):
  if "weights" in var.op.name:
    return var.op.name.replace("weights", "params1")
  if "bias" in var.op.name:
    return var.op.name.replace("bias", "params2")

variables_to_restore = slim.get_model_variables()
variables_to_restore = {name_in_checkpoint(var):var for var in variables_to_restore}
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")

Fine-Tuning a Model on a Different Task

Consider the case where we have a pre-trained VGG16 model trained on ImageNet, which has 1000 classes, and we want to apply it to the Pascal VOC dataset, which has only 20. We can initialize the new model from the pre-trained model, excluding the final layer:

# Load the Pascal VOC data
image, label = MyPascalVocDataLoader(...)
images, labels = tf.train.batch([image, label], batch_size=32)

# Create the model
predictions, _ = vgg.vgg_16(images)

train_op = slim.learning.create_train_op(...)

# Specify where the Model, trained on ImageNet, was saved.
model_path = '/path/to/pre_trained_on_imagenet.checkpoint'

# Specify where the new model will live:
log_dir = '/path/to/my_pascal_model_dir/'

# Restore only the convolutional layers:
variables_to_restore = slim.get_variables_to_restore(exclude=['fc6', 'fc7', 'fc8'])
init_fn = slim.assign_from_checkpoint_fn(model_path, variables_to_restore)

# Start training.
slim.learning.train(train_op, log_dir, init_fn=init_fn)

Evaluating Models


Metrics

TF-Slim provides a set of metric operations that make evaluating models easy. Each metric is split into two ops: a value_op, which idempotently returns the current value of the metric, and an update_op, which aggregates a new batch of data and returns the updated value. For example:

images, labels = LoadTestData(...)
predictions = MyModel(images)

mae_value_op, mae_update_op = slim.metrics.streaming_mean_absolute_error(predictions, labels)
mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels)
mean_relative_errors = ...  # a tensor of per-example relative errors
pl_value_op, pl_update_op = slim.metrics.percentage_less(mean_relative_errors, 0.3)
# Aggregates the value and update ops in two lists:
value_ops, update_ops = slim.metrics.aggregate_metrics(
    slim.metrics.streaming_mean_absolute_error(predictions, labels),
    slim.metrics.streaming_mean_squared_error(predictions, labels))

# Aggregates the value and update ops in two dictionaries:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

Working Example: Tracking Multiple Metrics

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg


# Load the data
images, labels = load_data(...)

# Define the network
predictions, _ = vgg.vgg_16(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

# Evaluate the model using 1000 batches of data:
num_batches = 1000

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(tf.local_variables_initializer())

  for batch_id in range(num_batches):
    sess.run(list(names_to_updates.values()))

  metric_values = sess.run(list(names_to_values.values()))
  for metric, value in zip(names_to_values.keys(), metric_values):
    print('Metric %s has value: %f' % (metric, value))

Evaluation Loop

TF-Slim's evaluation module (evaluation.py) provides helper functions for running evaluations periodically: slim.evaluation.evaluation_loop repeatedly restores the latest checkpoint, runs the metric update ops over num_evals batches, and writes the summaries:

import math
import tensorflow as tf

slim = tf.contrib.slim

# Load the data
images, labels = load_data(...)

# Define the network
predictions = MyModel(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'accuracy': slim.metrics.accuracy(predictions, labels),
    'precision': slim.metrics.precision(predictions, labels),
    'recall': slim.metrics.recall(predictions, labels),
})

# Create the summary ops such that they also print out to std output:
summary_ops = []
for metric_name, metric_value in names_to_values.items():
  op = tf.summary.scalar(metric_name, metric_value)
  op = tf.Print(op, [metric_value], metric_name)
  summary_ops.append(op)

num_examples = 10000
batch_size = 32
num_batches = math.ceil(num_examples / float(batch_size))

# Setup the global step.
slim.get_or_create_global_step()

checkpoint_dir = ... # Where the checkpoints to evaluate are stored.
log_dir = ... # Where the evaluation summaries are stored.
eval_interval_secs = ... # How often to run the evaluation.
slim.evaluation.evaluation_loop(
    'local',
    checkpoint_dir,
    log_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()),
    summary_op=tf.summary.merge(summary_ops),
    eval_interval_secs=eval_interval_secs)