[TensorFlow-Tutorial] ==> Building a CNN with Estimator + L2 regularization + early stopping

Official "CNN with Estimator" tutorial: https://tensorflow.google.cn/tutorials/estimators/cnn

Custom Estimator guide: https://www.tensorflow.org/guide/custom_estimators

Building a neural network model with Estimator generally involves defining five functions:

  1. define_flags(): creates the required command-line flags. It uses the custom flags_core module to create the basic flags a model needs (model_dir, train_epochs, and so on) in one call; if you want to add flags of your own (e.g. dropout_rate), use the absl.flags module.
  2. input_fn(): converts the raw data into a tf.data.Dataset; it is used when run_loop calls the model's three methods. (If your data is in CSV format you also need a feature_column() helper so that networks() can build the input layer; see the referenced article on data loading.)
  3. networks(input, label, params, ...): builds the network layers; its output is the logits produced by the model.
  4. model_fn(features, labels, mode, params): implements the three concrete modes, training (train), evaluation (eval) and prediction (predict), by post-processing the logits returned by networks(). [Note that these four parameter names are fixed; you cannot invent your own.]
  5. run_loop(flags): instantiates the model and calls its train, eval and predict methods. (A minimal skeleton of how these pieces fit together is sketched right after this list.)
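
For orientation, here is a minimal skeleton (a sketch only, with hypothetical names; the real define_flags/input_fn/networks/model_fn are given below) showing how these pieces plug into tf.estimator.Estimator:

import tensorflow as tf

def my_input_fn():
    # Return a tf.data.Dataset that yields (features, labels) batches.
    ...

def my_model_fn(features, labels, mode, params):
    # Build the graph (e.g. via networks()) and return a tf.estimator.EstimatorSpec.
    ...

def run_loop(flags_obj):
    model = tf.estimator.Estimator(model_fn=my_model_fn,
                                   model_dir=flags_obj.model_dir,
                                   params=flags_obj.flag_values_dict())
    model.train(input_fn=my_input_fn)
    print(model.evaluate(input_fn=my_input_fn))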

Required imports:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import absl
from absl import app as absl_app
from absl import flags

from official.utils.flags import core as flags_core
from official.utils.flags._conventions import help_wrap
from official.utils.logs import hooks_helper
from official.utils.misc import model_helpers
from official.mnist import dataset
from tensorflow.contrib.estimator import stop_if_no_decrease_hook

import os

The define_flags() function:

def define_flags():
    flags_core.define_base()
    flags_core.define_image()
    flags_core.define_performance()
    flags.adopt_module_key_flags(flags_core)
    flags.DEFINE_float(name='dropout_rate',default=0.4,help=help_wrap("dropout rate"))
    flags.DEFINE_float(name='learning_rate', default=0.001, help=help_wrap("learning rate"))
    # Set default values for some of the flags
    flags_core.set_defaults(data_dir='D:/pycharm_proj/tensorflow/dataset/mnist_data',
                            model_dir='D:/pycharm_proj/tensorflow/model/mnist_model_estimator_l2_earlyStop',
                            batch_size=100,
                            train_epochs=40,
                            data_format="channels_last",
                            dropout_rate=0.4)

Notes:

Usage of the absl.flags module: https://blog.csdn.net/qq_33757398/article/details/82491411

Source of the flags_core module: https://github.com/tensorflow/models/tree/master/official/utils/flags
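
If you only need absl.flags without the official.utils helpers, a minimal sketch looks like this (the flag names below are just examples):

from absl import app, flags

flags.DEFINE_float('dropout_rate', 0.4, 'Dropout rate used in the dense layer.')
flags.DEFINE_string('model_dir', '/tmp/mnist_model', 'Directory for checkpoints.')
FLAGS = flags.FLAGS

def main(_):
    print(FLAGS.dropout_rate, FLAGS.model_dir)

if __name__ == '__main__':
    app.run(main)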

The input_fn() functions:

    def train_input_fn():  # although this returns a Dataset, it is a Dataset of zipped (features, labels) pairs, so it can be parsed directly into features and labels [exactly the form model.train expects input_fn to return]
        ds = dataset.train(params.data_dir)
        ds = ds.cache().shuffle(buffer_size=50000).batch(params.batch_size)
        ds = ds.repeat(params.epochs_between_evals)
        return ds
    def eval_input_fn():
        return dataset.test(params.data_dir).batch(
            params.batch_size).make_one_shot_iterator().get_next()

Notes:

Source of dataset.train: https://github.com/tensorflow/models/blob/master/official/mnist/dataset.py
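
If the official.mnist.dataset module is not available, an equivalent train_input_fn can be sketched from tf.keras.datasets.mnist instead (a rough, hypothetical substitute for dataset.train, not the code used in this post; it produces the same flattened 784-float features):

import numpy as np
import tensorflow as tf

def train_input_fn_from_keras(batch_size=100):
    # Load MNIST as numpy arrays and wrap them into a (features, labels) Dataset.
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = (x_train / 255.0).astype(np.float32).reshape(-1, 28 * 28)
    y_train = y_train.astype(np.int32)
    ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    return ds.cache().shuffle(buffer_size=50000).batch(batch_size)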

The networks(input, label, params, ...) function:

def networks(features,params,alpha=0,reuse=False,is_train=False):
    with tf.variable_scope('ConvNet', reuse=reuse):
        # tf.logging.INFO('building_model')
        # Convolution and pooling layers -- the "C" in CNN
        # reshape
        input_layer = tf.reshape(features, [-1, 28, 28, 1])  # the data may already have this shape; reshaping here makes the dimensions explicit so the following layers can be built against them
        # conv
        conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=5, padding='same',
                                 activation=tf.nn.relu)  # strides defaults to [1, 1], i.e. the filter moves one pixel at a time, so with 'same' padding the output feature map has the same spatial size as the input
        # pool
        pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
        # conv
        conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
        # pool
        pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
        # Fully connected layers -- the "NN" in CNN
        '''
        The fully connected (FC) layers act as the "classifier" of the whole CNN. If the convolution, pooling
        and activation layers map the raw data into a hidden feature space, the FC layers map the learned
        "distributed feature representation" onto the label space. In practice an FC layer can also be
        implemented as a convolution: an FC layer whose previous layer is already fully connected corresponds
        to a 1x1 convolution, while an FC layer whose previous layer is a conv layer corresponds to a global
        convolution with an h x w kernel, where h and w are the height and width of the previous feature map.

        A fully connected layer is just a matrix multiplication, i.e. a linear transformation of the feature
        space that aggregates all the useful information extracted so far. Together with the non-linearity of
        the activation function, stacked FC layers can in theory approximate any non-linear mapping. The
        obvious drawback is that spatial structure is not preserved.

        One role of FC layers is dimensionality change, in particular projecting high-dimensional features
        down to low dimensions while keeping the useful information. Another role is to express latent
        semantics (an embedding), mapping the raw features onto hidden nodes; for the last FC layer this is an
        explicit expression of the classification. A fully connected layer across channels at the same spatial
        position is equivalent to a 1x1 convolution, and an FC layer with N nodes can be approximated by
        convolving with N templates followed by global average pooling (GAP).
        '''
        # flatten
        # pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])  # with a plain tf.reshape you have to work out the flattened size yourself (see the note after this function)
        pool2_flat = tf.contrib.layers.flatten(pool2)  # tf.contrib.layers.flatten(P) keeps the first (batch) dimension and flattens everything else into a row vector, returning a 2-D tensor of shape (batch_size, ...); it is typically used right before the fully connected layers of a CNN
        # dense
        dense1 = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

        # L2-normalize and rescale before dropout
        if alpha != 0:
            dense1 = alpha * tf.divide(dense1, tf.norm(dense1, ord='euclidean'))  # Euclidean (L2) norm

        # dropout -- shouldn't dropout apply to the whole network? Why is it a single layer here?
        # Answer: in a CNN only the "NN" part is an ordinary neural network; the conv and pool layers belong to the "C" part. In this network the NN part has a single hidden layer (dense1 above; the dense layer producing the logits below is the output layer, and dropout must not be applied to the output layer), so one dense layer is paired with one dropout layer.
        dropout = tf.layers.dropout(inputs=dense1, rate=params['dropout_rate'], training=is_train)
        # logits
        logits = tf.layers.dense(inputs=dropout, units=10)  # the logits layer has no activation function (softmax is applied later); the raw scores are needed here to decide the class (after softmax they become probabilities)

    return logits
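
A quick check of the commented-out manual reshape above: each 2x2 max pool halves the spatial size, so 28 → 14 after pool1 and 14 → 7 after pool2; pool2 therefore has shape [batch, 7, 7, 64], and the flattened width is 7 * 7 * 64 = 3136, which is exactly what tf.contrib.layers.flatten computes for you.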

Notes:

Building the individual CNN layers: https://tensorflow.google.cn/tutorials/estimators/cnn#building_the_cnn_mnist_classifier

Understanding how CNNs work: https://cs231n.github.io/convolutional-networks/#conv

Implementing an L2-constrained softmax loss in TensorFlow: https://blog.csdn.net/CoderPai/article/details/78931377
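
The L2-constrained trick used in networks() above simply rescales the features to a fixed norm alpha before the softmax, i.e. x' = alpha * x / ||x||_2. A self-contained sketch of that step (note that tf.norm without an axis argument, as used above, computes a single norm over the whole batch tensor; for per-sample normalization you would pass axis=1):

import tensorflow as tf

def l2_constrain(features, alpha=30.0):
    # Project the features onto a hypersphere of radius alpha:
    # x' = alpha * x / ||x||_2 (applied at training time only; skipped at eval/predict, as in networks()).
    return alpha * tf.divide(features, tf.norm(features, ord='euclidean'))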

The model_fn(features, labels, mode, params) function:

def cnn_model_fn(features, labels, mode, params):  # the parameter names features, labels, mode and params are fixed; you cannot use other names
    '''
    activation functions live in tf.nn
    loss functions live in tf.losses
    layer constructors live in tf.layers
    accuracy/metric functions live in tf.metrics
    optimizers live in tf.train
    the remaining utilities live directly under tf

    :param features:
    :param labels:
    :param mode:
    :return:
    '''
    # logits
    alpha = 30
    # Because dropout has different behavior at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_l2 = networks(features, params=params, reuse=False, alpha=alpha, is_train=True)

    # At test time we don't need to normalize or scale; it is redundant as per the paper: https://arxiv.org/abs/1703.09507
    logits = networks(features, params=params, reuse=True, alpha=0, is_train=False)  # used for eval and predict

    # prediction
    prediction = {'Pclass': tf.argmax(input=logits, axis=1, name='classes'),
                  'prob': tf.nn.softmax(logits=logits, name='softmax_tensor')}
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=prediction)  # note that the whole prediction dict is passed here, whereas eval below only uses the class entry of the dict
    '''
    EstimatorSpec takes the following arguments:
      def __new__(cls,
              mode,                     # required in all three modes
              predictions=None,         # required for PREDICT
              loss=None,                # required for TRAIN and EVAL
              train_op=None,            # required for TRAIN
              eval_metric_ops=None,     # used for EVAL
              export_outputs=None,
              training_chief_hooks=None,
              training_hooks=None,
              scaffold=None,
              evaluation_hooks=None,
              prediction_hooks=None):
    """Creates a validated `EstimatorSpec` instance.

    Depending on the value of `mode`, different arguments are required. Namely

    * For `mode == ModeKeys.TRAIN`: required fields are `loss` and `train_op`.
    * For `mode == ModeKeys.EVAL`: required field is `loss`.
    * For `mode == ModeKeys.PREDICT`: required fields are `predictions`.

    model_fn can populate all arguments independent of mode. In this case, some
    arguments will be ignored by an `Estimator`. E.g. `train_op` will be
    ignored in eval and infer modes. Example:

    ```python
    def my_model_fn(features, labels, mode):
      predictions = ...
      loss = ...
      train_op = ...
      return tf.estimator.EstimatorSpec(
          mode=mode,
          predictions=predictions,
          loss=loss,
          train_op=train_op)
    ```

    Alternatively, model_fn can just populate the arguments appropriate to the
    given mode. Example:

    ```python
    def my_model_fn(features, labels, mode):
      if (mode == tf.estimator.ModeKeys.TRAIN or
          mode == tf.estimator.ModeKeys.EVAL):
        loss = ...
      else:
        loss = None
      if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = ...
      else:
        train_op = None
      if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = ...
      else:
        predictions = None

      return tf.estimator.EstimatorSpec(
          mode=mode,
          predictions=predictions,
          loss=loss,
          train_op=train_op)
    ```

    Args:
      mode: A `ModeKeys`. Specifies if this is training, evaluation or
        prediction.
      predictions: Predictions `Tensor` or dict of `Tensor`.
      loss: Training loss `Tensor`. Must be either scalar, or with shape `[1]`.
      train_op: Op for the training step.
      eval_metric_ops: Dict of metric results keyed by name.
        The values of the dict can be one of the following:
        (1) instance of `Metric` class.
        (2) Results of calling a metric function, namely a
        `(metric_tensor, update_op)` tuple. `metric_tensor` should be
        evaluated without any impact on state (typically is a pure computation
        results based on variables.). For example, it should not trigger the
        `update_op` or requires any input fetching.
      export_outputs: Describes the output signatures to be exported to
        `SavedModel` and used during serving.
        A dict `{name: output}` where:
        * name: An arbitrary name for this output.
        * output: an `ExportOutput` object such as `ClassificationOutput`,
            `RegressionOutput`, or `PredictOutput`.
        Single-headed models only need to specify one entry in this dictionary.
        Multi-headed models should specify one entry for each head, one of
        which must be named using
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY.
        If no entry is provided, a default `PredictOutput` mapping to
        `predictions` will be created.
      training_chief_hooks: Iterable of `tf.train.SessionRunHook` objects to
        run on the chief worker during training.
      training_hooks: Iterable of `tf.train.SessionRunHook` objects to run
        on all workers during training.
      scaffold: A `tf.train.Scaffold` object that can be used to set
        initialization, saver, and more to be used in training.
      evaluation_hooks: Iterable of `tf.train.SessionRunHook` objects to
        run during evaluation.
      prediction_hooks: Iterable of `tf.train.SessionRunHook` objects to
        run during predictions.

    Returns:
      A validated `EstimatorSpec` object.

    Raises:
      ValueError: If validation fails.
      TypeError: If any of the arguments is not the expected type.
    """
    '''

    #tf.summary.scalar('cross_entropy_test', loss)
    if mode == tf.estimator.ModeKeys.TRAIN:
        loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(labels=labels,
                                                                     logits=logits_l2), name='train_loss')  # "sparse" means the labels are integer class indices rather than one-hot vectors
        opt = tf.train.AdamOptimizer(learning_rate=params['learning_rate'])  # use the learning_rate flag instead of a hard-coded value
        opt_op = opt.minimize(loss=loss, global_step=tf.train.get_global_step())

        # model.train() may be given hooks; in that case the tf.identity calls below must declare where the hooked tensors come from.
        # The hooks decide which values are printed to the console; the tf.identity calls below create named nodes in the graph so those values can be found (without them you get an error, because the tensors the hooks ask for are missing).
        # This only prints to the console; to also see the values in TensorBoard you need tf.summary.scalar().
        # accuracys = tf.metrics.accuracy(
        #     labels=labels, predictions=prediction['Pclass'])
        # tf.identity(params['learning_rate'], 'learning_rate')
        # tf.identity(loss, 'cross_entropy')  # this loss includes the L2 constraint, i.e. the real training loss
        # tf.identity(accuracys[1], name='train_accuracy')  # the accuracy is computed without the L2 constraint, i.e. a genuine prediction on the training data
        # tf.summary.scalar('train_accuracy', accuracys[1])  # the last chart in TensorBoard comes from this call; the other loss/accuracy charts are written automatically
        # tf.summary.scalar('cross_entropy', loss)  # this chart only contains the training loss

        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=opt_op)

    # eval metrics
    eval_metric_ops = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=prediction['Pclass'])}
    loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits_l2), name='eval_loss')  # "sparse" again means integer class labels
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)  # evaluation needs both the accuracy and the loss; the former is supplied by eval_metric_ops, so the predictions argument is not needed here
    # Both train and eval pass a loss, and EstimatorSpec writes them automatically to events.out.tfevents.xxx (the train loss goes into model_dir, the eval loss into model_dir/eval); both can be shown in the same TensorBoard chart, colored differently according to the directory name.
    # In addition, eval passes eval_metric_ops, which is also hooked automatically and written to events.out.tfevents.xxx, so TensorBoard shows a chart named accuracy as well.
    # During training the global_step is recorded too, in a chart named global_step.
'''
def _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks,global_step_tensor, saving_listeners):
    """Train a model with the given Estimator Spec."""
    if self._warm_start_settings:
      logging.info('Warm-starting with WarmStartSettings: %s' %
                   (self._warm_start_settings,))
      warm_starting_util.warm_start(*self._warm_start_settings)
    # Check if the user created a loss summary, and add one if they didn't.
    # We assume here that the summary is called 'loss'. If it is not, we will
    # make another one with the name 'loss' to ensure it shows up in the right
    # graph in TensorBoard.
    if not any([x.op.name == 'loss'
                for x in ops.get_collection(ops.GraphKeys.SUMMARIES)]):
      summary.scalar('loss', estimator_spec.loss)
    ops.add_to_collection(ops.GraphKeys.LOSSES, estimator_spec.loss)
    worker_hooks.extend(hooks)
    worker_hooks.append(
        training.NanTensorHook(estimator_spec.loss)
    )
    if self._config.log_step_count_steps is not None:
      worker_hooks.append(
          training.LoggingTensorHook(
              {
                  'loss': estimator_spec.loss,
                  'step': global_step_tensor
              },
              every_n_iter=self._config.log_step_count_steps)
      )
    worker_hooks.extend(estimator_spec.training_hooks)

    if not (estimator_spec.scaffold.saver or
            ops.get_collection(ops.GraphKeys.SAVERS)):
      ops.add_to_collection(
          ops.GraphKeys.SAVERS,
          training.Saver(
              sharded=True,
              max_to_keep=self._config.keep_checkpoint_max,
              keep_checkpoint_every_n_hours=(
                  self._config.keep_checkpoint_every_n_hours),
              defer_build=True,
              save_relative_paths=True))

    chief_hooks = []
    all_hooks = worker_hooks + list(estimator_spec.training_chief_hooks)
    saver_hooks = [
        h for h in all_hooks if isinstance(h, training.CheckpointSaverHook)]
    if (self._config.save_checkpoints_secs or
        self._config.save_checkpoints_steps):
      if not saver_hooks:
        chief_hooks = [
            training.CheckpointSaverHook(
                self._model_dir,
                save_secs=self._config.save_checkpoints_secs,
                save_steps=self._config.save_checkpoints_steps,
                scaffold=estimator_spec.scaffold)
        ]
        saver_hooks = [chief_hooks[0]]
    if saving_listeners:
      if not saver_hooks:
        raise ValueError(
            'There should be a CheckpointSaverHook to use saving_listeners. '
            'Please set one of the RunConfig.save_checkpoints_steps or '
            'RunConfig.save_checkpoints_secs.')
      else:
        # It is expected to have one CheckpointSaverHook. If multiple, we pick
        # up the first one to add listener.
        saver_hooks[0]._listeners.extend(saving_listeners)  # pylint: disable=protected-access
    with training.MonitoredTrainingSession(
        master=self._config.master,
        is_chief=self._config.is_chief,
        checkpoint_dir=self._model_dir,
        scaffold=estimator_spec.scaffold,
        hooks=worker_hooks,
        chief_only_hooks=(
            tuple(chief_hooks) + tuple(estimator_spec.training_chief_hooks)),
        save_checkpoint_secs=0,  # Saving is handled by a hook.
        save_summaries_steps=self._config.save_summary_steps,
        config=self._session_config,
        log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
      loss = None
      while not mon_sess.should_stop():
        _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
    return loss
'''
    # The tf.identity and tf.summary calls defined in the TRAIN branch of model_fn write additional values into the event file (explicitly, rather than implicitly as above); the chart name is the first argument of the summary call.

Notes:

Estimator is a fairly high-level API: it records the loss and accuracy automatically, so you can view them in TensorBoard.

For tuning hyper-parameters with TensorBoard, see: https://www.jianshu.com/p/d059ffea9ec0

If you want to write summaries for variables of your own choosing (Estimator only records loss, step and accuracy automatically) so that they appear in TensorBoard, first instantiate some hooks in run_loop() (to be passed in when the model's methods are called), then explicitly declare in model_fn where the hooked tensors come from (tf.identity(tensor)) and write them to the event log (tf.summary.scalar(tensor)). A minimal sketch of this pattern is shown below.
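
A minimal sketch of the pattern, assuming a tensor called my_metric inside model_fn (the names here are illustrative, not taken from the code above):

# Inside model_fn: give the tensor a stable name and write it to the event file.
tf.identity(my_metric, name='my_metric')    # lets a LoggingTensorHook find the value by name
tf.summary.scalar('my_metric', my_metric)   # makes it appear as a chart in TensorBoard

# Inside run_loop: print the named tensor to the console every 100 steps.
logging_hook = tf.train.LoggingTensorHook(tensors={'my_metric': 'my_metric'}, every_n_iter=100)
model.train(input_fn=train_input_fn, hooks=[logging_hook])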

The run_loop(flags) function:

def run_mnist(params):
    model_helpers.apply_clean(params)  # clear out old files under model_dir

    # instantiate the Estimator
    paramsdic = params.flag_values_dict()
    model = tf.estimator.Estimator(model_fn=cnn_model_fn, model_dir=params.model_dir, params=paramsdic)  # the Estimator constructor forwards params to model_fn
    # Why not simply params=params? Because params is an absl.flags._flagvalues.FlagValues object; it must be converted to a dict with flag_values_dict() before it can be passed to model_fn.
    # Before the conversion a flag is read as an attribute: params.dropout_rate
    # After the conversion it is read as a dict entry: paramsdic['dropout_rate']

    # instantiate hooks (used to print run-time log messages to the console; which tensors get logged is given by the tensor_to_log dict). The TensorBoard charts do not depend on these hooks.
    tensor_to_log = {'prob': 'softmax_tensor'}  # log 'prob', whose value comes from the tensor named softmax_tensor
    train_hooks = hooks_helper.get_train_hooks(name_list=params.hooks, model_dir=params.model_dir,)  # tensors_to_log=tensor_to_log)

    os.makedirs(model.eval_dir(), exist_ok=True)  # the early-stopping hook reads the eval metrics from this directory
    train_hooks_for_early_stopping = stop_if_no_decrease_hook(model, eval_dir=model.eval_dir(), metric_name='accuracy',
                                                              max_steps_without_decrease=1000, min_steps=100)
    # Note: to monitor the training loss instead, the metric must be named 'loss' rather than 'eval_loss', because 'loss' is the name the Estimator records automatically.
    # (For a metric such as accuracy that is expected to increase, tf.contrib.estimator also provides stop_if_no_increase_hook.)
    # input_fn functions
    def train_input_fn():  # although this returns a Dataset, it is a Dataset of zipped (features, labels) pairs, so it can be parsed directly into features and labels [exactly the form model.train expects input_fn to return]
        ds = dataset.train(params.data_dir)
        ds = ds.cache().shuffle(buffer_size=50000).batch(params.batch_size)
        ds = ds.repeat(params.epochs_between_evals)
        return ds
    def eval_input_fn():
        return dataset.test(params.data_dir).batch(
            params.batch_size).make_one_shot_iterator().get_next()
        # each call to get_next() yields one (features, labels) batch
        # Why does eval's input_fn return an iterator while train's returns the whole Dataset? An input_fn may return either a tf.data.Dataset or a (features, labels) tensor pair; the Estimator accepts both.

    # train and eval
    for i in range(params.train_epochs // params.epochs_between_evals):
        # tf.estimator.train_and_evaluate(model, train_spec=tf.estimator.TrainSpec(train_input_fn, hooks=[train_hooks_for_early_stopping]),
        #                                 eval_spec=tf.estimator.EvalSpec(eval_input_fn))
        model.train(input_fn=train_input_fn, hooks=[train_hooks_for_early_stopping])  # if hooks=train_hooks were passed here instead, the commented-out tf.identity calls in the TRAIN branch of model_fn would have to be uncommented
        if train_hooks_for_early_stopping.stopFlag:  # stopFlag is presumably added by the customized hook described in the early-stopping article linked below
            break
        eval_results = model.evaluate(input_fn=eval_input_fn)
        print('\nEvaluation results:\n\t%s\n' % eval_results)

        if model_helpers.past_stop_threshold(params.stop_threshold,
                                             eval_results['accuracy']):
            break

Notes:

Implementing early stopping with an Estimator by means of hooks: https://blog.csdn.net/zongza/article/details/85017351
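
Without a customized hook, early stopping can also be wired up with the stock contrib hooks and tf.estimator.train_and_evaluate (a sketch of the standard pattern, not the exact code of this post; stop_if_no_increase_hook is used here because accuracy is expected to increase):

from tensorflow.contrib.estimator import stop_if_no_increase_hook

early_stop = stop_if_no_increase_hook(model, metric_name='accuracy',
                                      max_steps_without_increase=1000,
                                      min_steps=100, eval_dir=model.eval_dir())
tf.estimator.train_and_evaluate(
    model,
    train_spec=tf.estimator.TrainSpec(input_fn=train_input_fn, hooks=[early_stop]),
    eval_spec=tf.estimator.EvalSpec(input_fn=eval_input_fn))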

The main function:

def main(_):
    #print(flags.FLAGS)
    #print(type(flags.FLAGS))
    run_mnist(flags.FLAGS)

if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    define_flags()
    absl_app.run(main)

 
