SSD Source Code Walkthrough 3: ssd_model_fn()

业余狙击手19

Category: Object Detection Algorithms

Article link: https://blog.csdn.net/sxlsxl119/article/details/103190116

Related article in this series: SSD Source Code Walkthrough 4: Loss Function (Theory + Source Code)

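References:

cnblogs: Deep Learning Notes (7): SSD Paper Reading Notes (Simplified)

Zhihu: SSD

Zhihu: Object Detection | SSD Principles and Implementation

Zhihu: SSD-TensorFlow Source Code Analysis

TensorFlow Estimator Explained in Detail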

Code analyzed:

Repository of the source code analyzed here

Simplified SSD source code

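I looked at two versions of the code, both linked above.

The simple version is written in the same style as the source code I analyzed earlier and is easier to follow, but it contains only the prediction part and no training part. It is easy to read, yet it has no label processing or loss computation, so even after understanding it I felt I had not really learned anything.

The complex version confused me at first because it is written in a style I had not seen before. I went back and forth several times over whether it was worth the effort, and I also searched GitHub for a more suitable version without finding one, so I bit the bullet and worked through the more complex code. At first I skipped the parts I could not follow and went straight to network construction, anchor generation, loss computation, and data preprocessing, but the overall execution flow was still unclear to me. After reading a bit about TensorFlow's Estimator, things started to make sense, although I still do not understand it thoroughly. The main hurdle is that the Estimator approach takes getting used to. It is easier to accept if you treat it purely as a framework: you pass in the arguments it expects, in the format it expects. The network construction, anchor creation, and loss computation themselves are the same as before.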

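Structure of ssd_model_fn()

This function builds the network, computes the predicted boxes and class probabilities, and computes the loss. The steps are:
1. Build the network, obtain the 6 feature maps, and get cls_pred and location_pred.
2. Select positive and negative samples from the predicted boxes for the loss computation.
3. Compute the loss from the selected samples and the labels: classification cross-entropy loss, box regression loss, and L2 regularization loss.
4. Configure training: learning rate, optimizer, BN, and so on.
5. Pass the resulting tensors and ops to tf.estimator.EstimatorSpec().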

Source code of ssd_model_fn():

0. First, fetch some variables

global_anchor_info is a global variable. It was assigned at the end of input_pipeline(), and here its values are read back.

    shape = labels['shape']
    loc_targets = labels['loc_targets']    # box targets, shape (1,8732,4); 38*38*4+19*19*6+10*10*6+5*5*6+3*3*4+1*1*4=8732
    cls_targets = labels['cls_targets']    # class targets, shape (1,8732)
    match_scores = labels['match_scores']  # match scores, shape (1,8732)

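    # The global statement tells the Python interpreter that this name refers to the module-level (global) variable, so it can be read and assigned from inside this function.
    global global_anchor_info
    # look up the values in the dict by key
    decode_fn = global_anchor_info['decode_fn']   # function that decodes predicted offsets into boxes (ymin, xmin, ymax, xmax)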
    num_anchors_per_layer = global_anchor_info['num_anchors_per_layer']   # [5776, 2166, 600, 150, 36, 4]
    all_num_anchors_depth = global_anchor_info['all_num_anchors_depth']   # [4, 6, 6, 6, 4, 4]
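
The anchor counts in the comments above can be checked with a few lines of plain Python. This is only a sanity check based on the feature-map sizes and per-cell anchor counts quoted in this article; the variable feature_map_sizes is introduced here for illustration and is not part of the original source.

```python
# Feature-map sizes and anchors per cell, as quoted in the comments above.
feature_map_sizes = [38, 19, 10, 5, 3, 1]
all_num_anchors_depth = [4, 6, 6, 6, 4, 4]

# Anchors per layer = height * width * anchors per cell.
num_anchors_per_layer = [
    size * size * depth
    for size, depth in zip(feature_map_sizes, all_num_anchors_depth)
]
total_anchors = sum(num_anchors_per_layer)

print(num_anchors_per_layer)  # [5776, 2166, 600, 150, 36, 4]
print(total_anchors)          # 8732
```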

1. Build the network, obtain the 6 feature maps, and get cls_pred and location_pred

 #=========1. Build the network, obtain the 6 feature maps, and get cls_pred and location_pred================================================
    with tf.variable_scope(params['model_scope'], default_name=None, values=[features], reuse=tf.AUTO_REUSE):
        # build the SSD network, see net/ssd_net.py
        backbone = ssd_net.VGG16Backbone(params['data_format'])
        # get the 6 feature maps, with shapes (1,38,38,512),(1,19,19,1024),(1,10,10,512),(1,5,5,256),(1,3,3,256),(1,1,1,256)
        feature_layers = backbone.forward(features, training=(mode == tf.estimator.ModeKeys.TRAIN))
        #print(feature_layers)
        # predict box offsets and class scores from the 6 feature maps
        # feature_layers: the 6 feature maps from different layers; params['num_classes']: number of classes, 21; all_num_anchors_depth: [4, 6, 6, 6, 4, 4]
        location_pred, cls_pred = ssd_net.multibox_head(feature_layers, params['num_classes'], all_num_anchors_depth, data_format=params['data_format'])

        if params['data_format'] == 'channels_first':  # convert to channels_last
            cls_pred = [tf.transpose(pred, [0, 2, 3, 1]) for pred in cls_pred]
            location_pred = [tf.transpose(pred, [0, 2, 3, 1]) for pred in location_pred]

        # (batch_size,38,38,84) -> (batch_size,-1,21), and likewise for the other feature maps
        cls_pred = [tf.reshape(pred, [tf.shape(features)[0], -1, params['num_classes']]) for pred in cls_pred]
        # (batch_size,38,38,16) -> (batch_size,-1,4), and likewise for the other feature maps
        location_pred = [tf.reshape(pred, [tf.shape(features)[0], -1, 4]) for pred in location_pred]

        cls_pred = tf.concat(cls_pred, axis=1)     # concatenate the 6 tensors in the list into one
        location_pred = tf.concat(location_pred, axis=1)   # concatenate the 6 tensors in the list into one

        cls_pred = tf.reshape(cls_pred, [-1, params['num_classes']])  # (1,8732,21) -> (8732,21)
        location_pred = tf.reshape(location_pred, [-1, 4])    # (1,8732,4) -> (8732,4)
    # =========1. Build the network, obtain the 6 feature maps, and get cls_pred and location_pred================================================
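
The reshape and concat steps above are easier to follow with concrete shapes. The sketch below reproduces them with NumPy instead of TensorFlow, using zero-filled arrays as stand-ins for the outputs of multibox_head in channels_last layout. The batch size of 2 and the names cls_flat and loc_flat are assumptions made for illustration.

```python
import numpy as np

batch_size, num_classes = 2, 21
feature_map_sizes = [38, 19, 10, 5, 3, 1]
all_num_anchors_depth = [4, 6, 6, 6, 4, 4]

# Stand-ins for the per-layer outputs of multibox_head (channels_last layout).
cls_pred = [np.zeros((batch_size, s, s, d * num_classes), dtype=np.float32)
            for s, d in zip(feature_map_sizes, all_num_anchors_depth)]
location_pred = [np.zeros((batch_size, s, s, d * 4), dtype=np.float32)
                 for s, d in zip(feature_map_sizes, all_num_anchors_depth)]

# (B, H, W, A*C) -> (B, H*W*A, C): one row per anchor.
cls_pred = [p.reshape(batch_size, -1, num_classes) for p in cls_pred]
location_pred = [p.reshape(batch_size, -1, 4) for p in location_pred]

# Join the six layers along the anchor axis, then fold the batch axis in.
cls_flat = np.concatenate(cls_pred, axis=1).reshape(-1, num_classes)
loc_flat = np.concatenate(location_pred, axis=1).reshape(-1, 4)

print(cls_flat.shape)  # (17464, 21)
print(loc_flat.shape)  # (17464, 4)
```

With a batch size of 1 the first dimension is 8732, which matches the shapes in the comments above.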

VGG16Backbone() is defined in net/ssd_net.py. I drew a rough diagram of it, so I will not go through it in detail.

2. Select positive and negative samples from the predicted boxes for the loss computation

 # =========2. Select positive and negative samples from the predicted boxes for the loss computation==========================================================
    with tf.device('/cpu:0'):
        with tf.control_dependencies([cls_pred, location_pred]):   # control dependency: the ops below run only after [cls_pred, location_pred] have been computed
            with tf.name_scope('post_forward'):
                # location_pred (8732,4) -> bboxes_pred ((1,5776,4),(1,2166,4),(1,600,4),(1,150,4),(1,36,4),(1,4,4))
                bboxes_pred = tf.map_fn(lambda _preds : decode_fn(_preds),   # converts offsets into actual box coordinates; nothing is executed here, the result is used further down
                                        tf.reshape(location_pred, [tf.shape(features)[0], -1, 4]),
                                        dtype=[tf.float32] * len(num_anchors_per_layer), back_prop=False)

                # bboxes_pred ((1,5776,4),(1,2166,4),(1,600,4),(1,150,4),(1,36,4),(1,4,4)) ->
                # bboxes_pred ((5776,4),(2166,4),(600,4),(150,4),(36,4),(4,4))
                bboxes_pred = [tf.reshape(preds, [-1, 4]) for preds in bboxes_pred]
                # bboxes_pred ((5776,4),(2166,4),(600,4),(150,4),(36,4),(4,4)) -> (8732,4)
                bboxes_pred = tf.concat(bboxes_pred, axis=0)

                # (1,8732)->(8732,)
                flaten_cls_targets = tf.reshape(cls_targets, [-1])
                flaten_match_scores = tf.reshape(match_scores, [-1])
                #(1,8732,4)->(8732,4)
                flaten_loc_targets = tf.reshape(loc_targets, [-1, 4])

                # count the actual positive samples
                # each positive example has one label
                positive_mask = flaten_cls_targets > 0    # positive_mask has shape (8732,): True for positive samples, False otherwise
                n_positives = tf.count_nonzero(positive_mask)   # number of positive samples; tf.count_nonzero() counts all non-zero elements by default, so n_positives is a scalar
                batch_n_positives = tf.count_nonzero(cls_targets, -1)    # cls_targets has shape (1,8732); counts non-zero entries along the last axis, giving a 1-D tensor with one count per image

                # count the actual negative samples
                batch_negtive_mask = tf.equal(cls_targets, 0) # True where the label is 0 and False elsewhere, i.e. the mask of negative samples
                batch_n_negtives = tf.count_nonzero(batch_negtive_mask, -1)   # number of negatives per image (True entries along the last axis)

                # decide how many negatives to select
                batch_n_neg_select = tf.cast(params['negative_ratio'] * tf.cast(batch_n_positives, tf.float32), tf.int32)  # negative_ratio * number of positives, which keeps the positive:negative ratio at 1:3
                batch_n_neg_select = tf.minimum(batch_n_neg_select, tf.cast(batch_n_negtives, tf.int32))   # take the smaller of negative_ratio (3) times the positives and the actual number of negatives, in case there are fewer negatives than that

                # hard negative mining for classification: from the negatives, keep only the hardest ones (lowest predicted background probability)
                predictions_for_bg = tf.nn.softmax(tf.reshape(cls_pred, [tf.shape(features)[0], -1, params['num_classes']]))[:, :, 0]
                # tf.where(cond, a, b): a and b are tensors of the same shape; the result takes the element from a where cond is True and from b elsewhere
                prob_for_negtives = tf.where(batch_negtive_mask,
                                       0. - predictions_for_bg,
                                       # ignore all the positives
                                       0. - tf.ones_like(predictions_for_bg))
                topk_prob_for_bg, _ = tf.nn.top_k(prob_for_negtives, k=tf.shape(prob_for_negtives)[1])
                score_at_k = tf.gather_nd(topk_prob_for_bg, tf.stack([tf.range(tf.shape(features)[0]), batch_n_neg_select - 1], axis=-1))

                selected_neg_mask = prob_for_negtives >= tf.expand_dims(score_at_k, axis=-1)   # mask of the k selected negatives

                # include all positive examples and the selected negative examples
                final_mask = tf.stop_gradient(tf.logical_or(tf.reshape(tf.logical_and(batch_negtive_mask, selected_neg_mask), [-1]), positive_mask))
                total_examples = tf.count_nonzero(final_mask)   # total number of positives plus selected negatives

                cls_pred = tf.boolean_mask(cls_pred, final_mask)  # keep the rows of cls_pred where final_mask is True (classification predictions)
                location_pred = tf.boolean_mask(location_pred, tf.stop_gradient(positive_mask))  # keep the rows of location_pred where positive_mask is True (box regression predictions)
                flaten_cls_targets = tf.boolean_mask(tf.clip_by_value(flaten_cls_targets, 0, params['num_classes']), final_mask)
                flaten_loc_targets = tf.stop_gradient(tf.boolean_mask(flaten_loc_targets, positive_mask))

                # prediction dict
                predictions = {
                            'classes': tf.argmax(cls_pred, axis=-1),
                            'probabilities': tf.reduce_max(tf.nn.softmax(cls_pred, name='softmax_tensor'), axis=-1),
                            'loc_predict': bboxes_pred }

                # the tf.metrics module provides common metrics; here it computes classification accuracy
                cls_accuracy = tf.metrics.accuracy(flaten_cls_targets, predictions['classes'])
                metrics = {'cls_accuracy': cls_accuracy}   # metrics dict

                # Create a tensor named cls_accuracy for logging purposes.
                tf.identity(cls_accuracy[1], name='cls_accuracy')
                tf.summary.scalar('cls_accuracy', cls_accuracy[1])
    # =========2. Select positive and negative samples from the predicted boxes for the loss computation==========================================================
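
The hard negative mining above is compact but hard to read as graph code. The following NumPy sketch mirrors the same idea: score each negative by minus its background probability, find the score of the k-th hardest negative per image, and keep everything at or above that score. The function name select_hard_negatives and the sample numbers are made up for illustration, the -1 label stands for an anchor that is neither positive nor negative, and the guard for images without positives is an addition of this sketch rather than something the original code does.

```python
import numpy as np

def select_hard_negatives(cls_targets, bg_prob, negative_ratio=3.0):
    """Return a (batch, anchors) boolean mask of positives plus mined negatives."""
    positive_mask = cls_targets > 0
    negative_mask = cls_targets == 0
    n_pos = positive_mask.sum(axis=-1)
    n_neg = negative_mask.sum(axis=-1)
    n_neg_select = np.minimum((negative_ratio * n_pos).astype(np.int64), n_neg)

    # Score negatives by -P(background); everything else gets -1 and sorts last.
    score = np.where(negative_mask, -bg_prob, -1.0)
    sorted_desc = -np.sort(-score, axis=-1)
    # Score of the k-th hardest negative in each image, with k = n_neg_select.
    score_at_k = sorted_desc[np.arange(len(score)), n_neg_select - 1]
    selected_neg = negative_mask & (score >= score_at_k[:, None])
    # Guard added in this sketch: select no negatives when k is 0.
    selected_neg &= (n_neg_select > 0)[:, None]
    return positive_mask | selected_neg

cls_targets = np.array([[3, 0, 0, 0, 0, 0, -1, 0]])
bg_prob = np.array([[0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.5, 0.6]])
mask = select_hard_negatives(cls_targets, bg_prob)
print(mask.astype(int))  # [[1 0 1 0 1 0 0 1]]
```

Here one positive allows three negatives, and the three negatives with the lowest background probability (0.2, 0.3, 0.6) are kept. If several negatives tie at the threshold score, the comparison keeps all of them, so slightly more than k can be selected.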

3. Compute the loss from the selected samples and the labels: classification cross-entropy loss, box regression loss, and L2 loss

# =========3. Compute the loss from the selected samples and the labels: classification cross-entropy loss, box regression loss, and L2 loss================================
    # Calculate loss, which includes softmax cross entropy and L2 regularization.

    # classification cross-entropy loss, multiplied by the weighting factor negative_ratio + 1 (3 + 1)
    cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=flaten_cls_targets, logits=cls_pred) * (params['negative_ratio'] + 1.)
    # create a tensor named cross_entropy_loss for logging
    tf.identity(cross_entropy, name='cross_entropy_loss')
    tf.summary.scalar('cross_entropy_loss', cross_entropy)

    #******* smooth L1 loss for box regression
    loc_loss = modified_smooth_l1(location_pred, flaten_loc_targets, sigma=1.)    # both arguments are box offsets at this point
    loc_loss = tf.reduce_mean(tf.reduce_sum(loc_loss, axis=-1), name='location_loss')
    tf.summary.scalar('location_loss', loc_loss)
    tf.losses.add_loss(loc_loss)

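    # the L2 loss is the regularization term of the objective; it penalizes large weights to reduce overfitting
    # sum tf.nn.l2_loss (half the sum of squares, not the true L2 norm, which is the square root of the sum of squares) over all trainable variables except those with '_bn' in the name; conv4_3_scale variables are included with a weight of 0.1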
    l2_loss_vars = []
    for trainable_var in tf.trainable_variables():
        if '_bn' not in trainable_var.name:
            if 'conv4_3_scale' not in trainable_var.name:
                l2_loss_vars.append(tf.nn.l2_loss(trainable_var))   # tf.nn.l2_loss computes output = sum(t**2)/2
            else:
                l2_loss_vars.append(tf.nn.l2_loss(trainable_var) * 0.1)
    # Add weight decay to the loss. We exclude the batch norm variables because doing so leads to a small improvement in accuracy.
    total_loss = tf.add(cross_entropy + loc_loss, tf.multiply(params['weight_decay'], tf.add_n(l2_loss_vars), name='l2_loss'), name='total_loss')
    # =========3. Compute the loss from the selected samples and the labels: classification cross-entropy loss, box regression loss, and L2 loss================================
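
To make the loss terms concrete, here is a small NumPy sketch of the localization loss and of how the three terms are added up. modified_smooth_l1 itself is not shown in this article, so the smooth_l1 function below is an assumption: it is the common sigma-parameterized smooth L1 and may differ in detail from the helper in the analyzed repository. The cross-entropy value, weight_decay, and the weights list are made-up numbers.

```python
import numpy as np

def smooth_l1(pred, target, sigma=1.0):
    """Element-wise smooth L1: quadratic near zero, linear elsewhere."""
    diff = np.abs(pred - target)
    sigma2 = sigma * sigma
    return np.where(diff < 1.0 / sigma2,
                    0.5 * sigma2 * diff ** 2,
                    diff - 0.5 / sigma2)

def l2_loss(t):
    """Same definition as tf.nn.l2_loss: sum(t**2) / 2."""
    return np.sum(np.square(t)) / 2.0

# Two positive anchors with 4 offsets each (made-up numbers).
location_pred = np.array([[0.5, 2.0, 0.0, -1.0], [0.0, 0.0, 0.0, 0.0]])
loc_targets = np.zeros_like(location_pred)

# Sum over the 4 coordinates, then average over the positive anchors.
loc_loss = smooth_l1(location_pred, loc_targets).sum(axis=-1).mean()
print(loc_loss)  # 1.0625

# Hypothetical values for the remaining terms.
cross_entropy = 0.8
weight_decay = 5e-4
weights = [np.array([1.0, 2.0]), np.array([[3.0]])]
total_loss = cross_entropy + loc_loss + weight_decay * sum(l2_loss(w) for w in weights)
```

The sum over the last axis followed by a mean corresponds to the tf.reduce_mean(tf.reduce_sum(...)) call above, and the final line has the same structure as total_loss in the original code.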

4. Training configuration: learning rate, optimizer, BN, and so on

# =========4. Training configuration: learning rate, optimizer, BN, and so on==================================================================
    if mode == tf.estimator.ModeKeys.TRAIN:
        global_step = tf.train.get_or_create_global_step()

        lr_values = [params['learning_rate'] * decay for decay in params['lr_decay_factors']]  # learning rate for each stage, e.g. [0.0001, 0.001, 0.0001, 1e-05]
        # pick the learning rate of the current stage based on global_step
        learning_rate = tf.train.piecewise_constant(tf.cast(global_step, tf.int32),
                                                    [int(_) for _ in params['decay_boundaries']],
                                                    lr_values)
        # take the maximum of the current and the final learning rate so it never drops below end_learning_rate
        truncated_learning_rate = tf.maximum(learning_rate, tf.constant(params['end_learning_rate'], dtype=learning_rate.dtype), name='learning_rate')
        # Create a tensor named learning_rate for logging purposes.
        tf.summary.scalar('learning_rate', truncated_learning_rate)

        optimizer = tf.train.MomentumOptimizer(learning_rate=truncated_learning_rate,    # momentum optimizer
                                                momentum=params['momentum'])
        # *****************************************************************************************
        optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
        # *****************************************************************************************

        # Batch norm requires update_ops to be added as a train_op dependency.
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = optimizer.minimize(total_loss, global_step)
    else:
        train_op = None
    # =========4. Training configuration: learning rate, optimizer, BN, and so on==================================================================
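
The learning rate logic above can be reproduced in plain Python to see which value applies at which step. This is a sketch of the same piecewise-constant schedule with a lower bound, not TensorFlow code. The decay factors are chosen to reproduce the stage values quoted in the comment above, while the boundaries and end_lr are hypothetical values picked for illustration.

```python
import bisect

def learning_rate_at(step, base_lr, decay_factors, boundaries, end_lr):
    """Piecewise-constant schedule with a lower bound, in plain Python."""
    lr_values = [base_lr * factor for factor in decay_factors]
    # lr_values[i] applies while step <= boundaries[i];
    # the last value applies once step exceeds the final boundary.
    stage = bisect.bisect_left(boundaries, step)
    return max(lr_values[stage], end_lr)

# Hypothetical settings; only the resulting stage values come from the article.
schedule = dict(base_lr=0.001, decay_factors=[0.1, 1.0, 0.1, 0.01],
                boundaries=[500, 80000, 100000], end_lr=1e-6)

for step in (0, 500, 501, 80001, 100001):
    print(step, learning_rate_at(step, **schedule))
```

With these settings the first 500 steps act as a warm-up at a tenth of the base rate, after which the rate jumps to the base value and then decays in two stages.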

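5. Everything above only builds the graph. The relevant tensors and ops are then packed into tf.estimator.EstimatorSpec below and handed to the Estimator, which runs the training. To me this feels loosely like passing the ops you want to run to sess.run, except that the Estimator manages the session for you.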

    # *****************************************************************************************
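    '''
    # tf.estimator.EstimatorSpec is a class. An instance is created inside model_fn and returned by it; the Estimator uses that instance to run training, evaluation, or prediction.
    mode:  the required arguments depend on the mode (train, eval, predict):
           For mode==ModeKeys.TRAIN: the required arguments are loss and train_op.
           For mode==ModeKeys.EVAL:  the required argument is loss.
           For mode==ModeKeys.PREDICT: the required argument is predictions.
    predictions: the model's prediction outputs
    loss: the loss
    train_op: the op that performs a training step
    eval_metric_ops: dict of evaluation metrics, here built with tf.metrics.accuracy()
    '''
    return tf.estimator.EstimatorSpec(
                              mode=mode,    # train, eval, or predict
                              predictions=predictions,  # prediction dict: classes, probabilities, predicted boxes
                              loss=total_loss,    # total loss: classification cross-entropy, box regression, and L2 loss
                              train_op=train_op,   # the train_op built in step 4
                              eval_metric_ops=metrics,  # metrics dict (accuracy)
                              scaffold=tf.train.Scaffold(init_fn=get_init_fn()))  # restores variables from a ckpt at initialization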
    # *****************************************************************************************
