TensorFlow---Faster RCNN Network (Details and Selected Functions in the Code) (Part 5)

Categories: Faster R-CNN, tensorflow, object detection. Tags: network, list, convolution, python, deep learning

Article link: https://blog.csdn.net/weixin_42206075/article/details/112237244


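After four blog posts, the Faster RCNN network has been mostly covered, but a few details in the code were still not fully worked out, which is not how I like to leave things~ So this post is dedicated to some of the harder-to-understand points.
Note that most of this post comes from answers I found by searching online, combined with my own understanding, so if anything here is wrong, corrections are very welcome~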

1. utils.convert_collection_to_dict

'''
Location in the original code: Faster_RCNN_TensorFlow-master\rpn_proposal\vggnet.py (vgg16 function)
'''
  with variable_scope.variable_scope(scope, 'vgg_16', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with arg_scope(
        [layers.conv2d, layers_lib.fully_connected, layers_lib.max_pool2d],
        outputs_collections=end_points_collection):
      ...........
      # Convert end_points_collection into a end_point dict.
      end_points = utils.convert_collection_to_dict(end_points_collection)

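**Explanation:** slim.utils.convert_collection_to_dict gathers every output that was added to the collection inside the with scope and returns them as a dict: each key is a layer's name (alias) and each value is that layer's output tensor. Printing the dict therefore shows, for every layer, its name together with the shape and dtype of its output.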
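To make the mechanism concrete, here is a framework-free sketch of the same idea. It is only an analogy, not slim's implementation: the `layer` helper and the list `end_points_collection` are hypothetical stand-ins for slim's `outputs_collections` argument and the graph collection.

```python
from collections import OrderedDict

# Stand-in for the graph collection that slim layers append their outputs to.
end_points_collection = []


def layer(name, inputs, fn):
    """Apply `fn` to `inputs` and record (name, output), mimicking outputs_collections."""
    outputs = fn(inputs)
    end_points_collection.append((name, outputs))
    return outputs


# Two toy "layers" operating on plain Python lists.
net = layer("vgg_16/conv1", [1, 2, 3], lambda xs: [2 * x for x in xs])
net = layer("vgg_16/pool1", net, lambda xs: [max(xs)])

# Analogue of convert_collection_to_dict: name -> output, in creation order.
end_points = OrderedDict(end_points_collection)
print(list(end_points))
print(end_points["vgg_16/pool1"])
```

The point of the dict is that any intermediate layer output can later be fetched by name instead of threading every tensor through return values.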

2. tf.nn.embedding_lookup

'''
Location in the original code: Faster_RCNN_TensorFlow-master\step1_train_rpn.py
'''
cls_logits = tf.concat([tf.nn.embedding_lookup(cls[i], bbox_indxs[i])[tf.newaxis] for i in range(BATCHSIZE)], axis=0) #shape=(1, 256, 2)
reg_logits = tf.concat([tf.nn.embedding_lookup(reg[i], bbox_indxs[i])[tf.newaxis] for i in range(BATCHSIZE)], axis=0) #shape=(1, 256, 4)

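**Explanation:** tf.nn.embedding_lookup picks out the entries of a tensor at the given indices along its first dimension (here, the rows of cls[i] and reg[i] that belong to the sampled anchors).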

import tensorflow as tf  # TensorFlow 1.x API (sessions)
import numpy as np

n_a = np.random.random([5, 1])
a = tf.constant(value=n_a, dtype=tf.float32, shape=[5, 1], name="Const")

res = tf.nn.embedding_lookup(
    params=a,  # the tensor to select from
    ids=[0, 1],  # the indices to select
    partition_strategy="mod",
    name=None,
    validate_indices=True,
    max_norm=None
)

sess = tf.InteractiveSession()

print(sess.run(a))
print('*' * 50)
print(sess.run(res))

Output:
[[0.8681717 ]
 [0.04858067]
 [0.01002295]
 [0.09027253]
 [0.71396136]]
**************************************************
[[0.8681717 ]
 [0.04858067]]
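To tie the toy example back to the RPN training code: there, cls[i] holds the scores of all anchors of image i, and bbox_indxs[i] holds the indices of the anchors sampled for that image. Below is a framework-free sketch of the same per-image row gathering. The values are made up, and `gather_rows` is a hypothetical helper, not a TensorFlow function.

```python
def gather_rows(params, ids):
    """Return the rows of `params` selected by `ids`, in the order given by `ids`."""
    return [params[i] for i in ids]


# Hypothetical mini-batch: 2 images, 4 anchors each, 2 class scores per anchor.
cls = [
    [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3]],
    [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1], [0.3, 0.7]],
]
# Indices of the anchors sampled for each image.
bbox_idxs = [[2, 0], [3, 1]]

# One gather per image; the outer list plays the role of tf.concat over the batch axis.
cls_logits = [gather_rows(cls[i], bbox_idxs[i]) for i in range(len(cls))]
print(cls_logits)
```

Each image gets its own index list, which is why the original code loops over the batch instead of calling the lookup once.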

3. tf.add_n([tf.nn.l2_loss(var) for var in tf.trainable_variables()])

'''
Location in the original code: Faster_RCNN_TensorFlow-master\step1_train_rpn.py
'''
regular = tf.add_n([tf.nn.l2_loss(var) for var in tf.trainable_variables()])

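Explanation: this adds an L2 regularization term over all trainable variables. Steps:
1. Iterate over the trainable variables, pass each one to tf.nn.l2_loss() (which computes sum(var ** 2) / 2), and add the results together;
2. Multiply that sum by weight_decay and add it to base_loss.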

weight_decay = 0.001
 
base_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
l2_loss = weight_decay * tf.add_n([tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])
loss = base_loss + l2_loss

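Note: this applies L2 regularization to every trainable variable, including both the weights w and the biases b. Some argue that regularizing the biases b can lead to underfitting, and that it is usually enough to regularize only the weights w.

tf.add_n([p1, p2, p3, ...]) sums the elements of a list element-wise. The input is a list whose elements can be vectors, matrices, and so on, as long as they all have the same shape and dtype (here each element is the scalar returned by tf.nn.l2_loss).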
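The difference between regularizing everything and regularizing only the weights can be sketched without TensorFlow. The parameter names and values below are made up for illustration, and `l2_loss` is a small re-implementation of the formula sum(x ** 2) / 2, not the TensorFlow op itself.

```python
def l2_loss(values):
    """Half the sum of squares, the same formula tf.nn.l2_loss uses."""
    return sum(v * v for v in values) / 2.0


# Hypothetical flattened parameters, keyed by variable name.
params = {
    "conv1/weights": [1.0, -2.0],
    "conv1/biases": [3.0],
    "fc/weights": [0.5, 0.5],
    "fc/biases": [4.0],
}
weight_decay = 0.001
base_loss = 0.25  # stand-in for the cross-entropy term

# Variant used in the post: every trainable variable is penalized.
l2_all = sum(l2_loss(v) for v in params.values())

# Alternative: skip anything whose name marks it as a bias.
l2_weights_only = sum(
    l2_loss(v) for name, v in params.items() if "biases" not in name
)

loss = base_loss + weight_decay * l2_weights_only
print(l2_all, l2_weights_only, loss)
```

In TF 1.x the same filter could be applied to v.name inside the list comprehension, assuming the bias variables carry "biases" in their names; that naming should be checked against the actual variable list first.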

4. np.union1d and np.setdiff1d

'''
Location in the original code: Faster_RCNN_TensorFlow-master\rpn_proposal\utils.py (generate_minibatch function)
'''
illegal_idx0 = np.union1d(np.where(anchors_x1<0)[0], np.where(anchors_x2>=IMG_W)[0])  # (6612,)
illegal_idx1 = np.union1d(np.where(anchors_y1<0)[0], np.where(anchors_y2>=IMG_H)[0])  # (8500,)
illegal_idx = np.union1d(illegal_idx0, illegal_idx1)  # (11204,), i.e. 11204 of the 17100 anchors extend beyond the image in x or y
# np.setdiff1d: set difference of two arrays; returns the sorted unique values that are in ar1 but not in ar2.
legal_idx = np.setdiff1d(np.arange(nums), illegal_idx)

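Explanation:
1. np.union1d: returns the union of two arrays as a sorted array of unique values.
2. np.setdiff1d: returns the set difference of two arrays, i.e. the sorted unique values that are in ar1 but not in ar2.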

np.union1d([-1, 0, 1], [-2, 0, 2])
Result:
array([-2, -1,  0,  1,  2])

------------------------------------------------------
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([3, 4, 7, 6, 7, 8, 11, 12, 14])
c = np.setdiff1d(a,b)
Result:
array([1, 2, 5, 9])
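The anchor filtering in generate_minibatch can be followed on a tiny example. This is a pure-Python sketch using sets: `union1d_py` and `setdiff1d_py` are hypothetical stand-ins that mirror the sorted-unique behaviour of the NumPy functions, and the six anchors and the image size are invented.

```python
def union1d_py(a, b):
    """Sorted unique values that appear in either input (like np.union1d)."""
    return sorted(set(a) | set(b))


def setdiff1d_py(a, b):
    """Sorted unique values that are in `a` but not in `b` (like np.setdiff1d)."""
    return sorted(set(a) - set(b))


IMG_W, IMG_H = 10, 8
# Made-up anchors as (x1, y1, x2, y2).
anchors = [
    (0, 0, 4, 4),    # fully inside
    (-2, 1, 3, 5),   # x1 < 0
    (6, 2, 10, 6),   # x2 >= IMG_W
    (1, 5, 5, 8),    # y2 >= IMG_H
    (-1, -1, 3, 3),  # outside in both x and y
    (5, 3, 9, 7),    # fully inside
]

# Indices of anchors that leave the image horizontally / vertically.
illegal_x = [i for i, (x1, _, x2, _) in enumerate(anchors) if x1 < 0 or x2 >= IMG_W]
illegal_y = [i for i, (_, y1, _, y2) in enumerate(anchors) if y1 < 0 or y2 >= IMG_H]

illegal_idx = union1d_py(illegal_x, illegal_y)               # anchor 4 is counted once
legal_idx = setdiff1d_py(range(len(anchors)), illegal_idx)   # everything else
print(illegal_idx, legal_idx)
```

The union removes the duplicates between the x and y checks, which is why the counts in the original comments (6612 and 8500) add up to more than the 11204 illegal anchors.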

5. tf.image.non_max_suppression

'''
Location in the original code: step1_train_rpn.py
'''
cls, reg = cls[0], reg[0] #shape=(17100, 2),shape=(17100, 4)
scores = tf.nn.softmax(cls)[:, 1] #shape=(17100,), the probability that each of the 17100 anchor boxes is foreground
anchors = tf.constant(anchors, dtype=tf.float32) #shape=(17100, 4)
normal_bbox, reverse_bbox = offset2bbox(reg, anchors)  # reverse_bbox: [y0, x0, y1, x1]; normal_bbox: [x0, y0, x1, y1], shape=(17100,4)
nms_idxs = tf.image.non_max_suppression(reverse_bbox, scores, max_output_size=2000, iou_threshold=NMS_THRESHOLD)  # non-maximum suppression (NMS) via the TF API

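Explanation:

tf.image.non_max_suppression(
reverse_bbox,  # the boxes to run NMS on, usually of shape (bbox_num, 4), where the 4 values are (y1, x1, y2, x2); this is why reverse_bbox is passed in rather than normal_bbox
scores,  # shape (bbox_num,), the score of each box above
max_output_size=2000,  # the maximum number of boxes to select
iou_threshold=NMS_THRESHOLD)  # IoU threshold: a box whose IoU with an already selected, higher-scoring box exceeds this value is discarded

Returns: a 1-D integer tensor of shape [M] holding the indices of the selected boxes in the boxes tensor, where M <= max_output_size.

Here is an example: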

import tensorflow as tf  # TensorFlow 1.x API (sessions)

bboxes = tf.constant(value=[[1,2,3,4],[2,3,1,4],[2,4,3,1],[2,3,4,5]],
                dtype=tf.float32)
score = tf.constant(value=[0.7,0.8,0.4,0.6],dtype=tf.float32)

sess = tf.InteractiveSession()
# Indices of the boxes that survive NMS, ordered by descending score.
idx = sess.run(tf.image.non_max_suppression(boxes=bboxes,
                        scores=score,
                        max_output_size=3,
                        iou_threshold=0.5,))
# Look up the selected boxes by index.
re = tf.nn.embedding_lookup(params=bboxes,
    ids=idx,
    partition_strategy="mod")

print(idx)
print('-'*80)
print(sess.run(re))
Output:

[1 0 3]
--------------------------------------------------------------------------------
[[2. 3. 1. 4.]
 [1. 2. 3. 4.]
 [2. 3. 4. 5.]]
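What the op does internally can be approximated with a short greedy loop. The following is a pure-Python sketch of standard greedy NMS under the assumption that boxes are well-formed [y1, x1, y2, x2] with y1 <= y2 and x1 <= x2; it is not TensorFlow's implementation, and the function names and boxes are invented for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as [y1, x1, y2, x2] with y1 <= y2 and x1 <= x2."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 9, 9]]
scores = [0.9, 0.8, 0.7, 0.6]
keep = greedy_nms(boxes, scores, max_output_size=3, iou_threshold=0.5)
print(keep)
```

Boxes 1 and 3 overlap box 0 with an IoU above 0.5 and are dropped, while box 2 does not overlap anything, so fewer boxes than max_output_size are returned.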

6. tf.image.crop_and_resize

'''
Location in the original code: Faster_RCNN_TensorFlow-master\fast_rcnn\ops.py (roi_pooling function)
'''
inputs = tf.image.crop_and_resize(inputs, boxes, box_idx, [POOLED_H, POOLED_W])

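Explanation:
In this code, this function plays the role of RoI pooling: it crops each proposal out of the input and resizes the crop to a fixed size (by bilinear interpolation by default, rather than by max pooling inside bins).
tf.image.crop_and_resize(
inputs,  # the input images, a 4-D tensor [batch, height, width, channels] (inside roi_pooling this would be the feature map)
boxes,  # the proposals to crop, shape [num_boxes, 4], in normalized [y1, x1, y2, x2] coordinates
box_idx,  # for each box, the index of the image in the batch it belongs to; e.g. if inputs holds two images, each entry picks one of the two
crop_size)  # output size of each crop, [crop_height, crop_width]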
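One detail that is easy to miss is that boxes must be normalized: a normalized coordinate y is mapped to the pixel position y * (height - 1). The sketch below shows one way to convert pixel-space [x1, y1, x2, y2] proposals into that layout. It assumes the proposals live on the same pixel grid as the given height and width; `normalize_boxes` is a hypothetical helper, and whether the repository converts its boxes this way is not stated in the post.

```python
def normalize_boxes(boxes_xyxy, height, width):
    """Convert pixel [x1, y1, x2, y2] boxes to normalized [y1, x1, y2, x2]."""
    return [
        [y1 / (height - 1), x1 / (width - 1), y2 / (height - 1), x2 / (width - 1)]
        for x1, y1, x2, y2 in boxes_xyxy
    ]


# Made-up example: a single 5 x 9 map and two proposals on it.
feat_h, feat_w = 5, 9
proposals = [[0, 0, 8, 4], [2, 1, 6, 3]]

boxes = normalize_boxes(proposals, feat_h, feat_w)
box_idx = [0] * len(boxes)  # every proposal belongs to image 0 of the batch
print(boxes, box_idx)
```

The first proposal covers the whole map and becomes [0, 0, 1, 1]; the second becomes [0.25, 0.25, 0.75, 0.75]. Note also the coordinate order swap from x-first to y-first.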

7. tf.squeeze

'''Location in the original code: Faster_RCNN_TensorFlow-master\fast_rcnn\vggnet.py'''
net = tf.squeeze(net, axis=[1, 2])

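**Explanation:** tf.squeeze removes dimensions of size 1 from a tensor; with axis=[1, 2], only those two dimensions are removed. The effect here is similar to flattening.
In this code, the tensor coming out of the last convolutional layer has shape (64, 1, 1, 4096), and after tf.squeeze it becomes (64, 4096). This is needed because the code implements the paper's fully connected layers as convolutions: a fully connected layer would output (64, 4096) directly, whereas the convolutional version outputs (64, 1, 1, 4096).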
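Why the code passes axis=[1, 2] explicitly instead of squeezing everything can be seen from the shapes alone. The helper below is a hypothetical pure-Python model of the shape rule (non-negative axes only), not a TensorFlow function.

```python
def squeezed_shape(shape, axis=None):
    """Shape after squeezing: all size-1 dims if axis is None, else only the listed ones."""
    if axis is None:
        return tuple(d for d in shape if d != 1)
    for a in axis:
        if shape[a] != 1:
            raise ValueError("cannot squeeze dimension %d of size %d" % (a, shape[a]))
    return tuple(d for i, d in enumerate(shape) if i not in axis)


# The case from the post: a batch of 64 RoIs.
print(squeezed_shape((64, 1, 1, 4096), axis=[1, 2]))

# With a single RoI, squeezing every size-1 dimension would also drop the batch axis.
print(squeezed_shape((1, 1, 1, 4096)))
print(squeezed_shape((1, 1, 1, 4096), axis=[1, 2]))
```

With the explicit axis list, the batch dimension survives even when it happens to be 1, so the following layers always receive a 2-D (batch, 4096) tensor.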
