Graph-FPN Code Walkthrough (4)


Re-赟

Tags: deep learning, tensorflow, artificial intelligence

Article link: https://blog.csdn.net/weixin_45935290/article/details/129812215


Once all the models and loss functions have been loaded, the next step is to load the dataset.

The COCO dataset is downloaded with the tfds.load function. tfds.load is roughly equivalent to running the following three calls:

builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
builder.download_and_prepare(**download_and_prepare_kwargs)
ds = builder.as_dataset(
    split=split,
    as_supervised=as_supervised,
    shuffle_files=shuffle_files,
    read_config=read_config,
    decoders=decoders,
    **as_dataset_kwargs,
)

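name: the name of the dataset.
split: which split of the dataset to load.
data_dir: the directory used to read and write the dataset.
batch_size: if set, adds a batch dimension to the examples; the default is None, which means no batching.
shuffle_files: whether to shuffle the input files.
download: whether to download the dataset locally; the default is True. Setting it to False skips the builder.download_and_prepare() step of the three calls above.
as_supervised: if True, each example is returned in supervised form as a 2-tuple (input, label); if False, each example is returned as a dictionary such as {feature1: input, feature2: label}.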

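tfds.load returns the dataset as a tf.data.Dataset. We can take a look at the downloaded files.

They are all in tfrecord format. TFRecord is the data format officially recommended by Google and designed specifically for TensorFlow. tf.Example is the basic structure stored in a TFRecord; it is a message defined with Protocol Buffers that represents a mapping from string keys to feature values. The values actually stored in a tfrecord come in three types:

bytes_list: stores string and byte data.
float_list: stores float (float32) and double (float64) data.
int64_list: stores bool, enum, int32, uint32, int64 and uint64 data.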

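Use code to inspect the downloaded dataset files, starting with the type of these files.

Next, print each element. Every element in the tfrecord file is made up of 8 parts: 'objects/id', 'image', 'objects/area', 'image/filename', 'objects/label', 'image/id', 'objects/is_crowd', 'objects/bbox'.
Stepping through with a debugger shows the format of train_dataset.

Concretely, taking one image as an example: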

{'image': <tf.Tensor: shape=(462, 640, 3), dtype=uint8, numpy=
array([[[  5,  11,  11],
        [ 15,  15,  13],
        [ 17,  20,  25],
        ...,
        [ 82,  81,  76],
        [ 88,  86,  91],
        [ 86,  85,  83]]], dtype=uint8)>,
'image/filename': <tf.Tensor: shape=(), dtype=string, numpy=b'000000460139.jpg'>
'image/id': <tf.Tensor: shape=(), dtype=int64, numpy=460139>
'objects': {
		'area': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([17821, 16942,  4344])>
		'bbox': <tf.Tensor: shape=(3, 4), dtype=float32, numpy=array([[0.54380953, 0.13464062, 0.98651516, 0.33742186],[0.50707793, 0.517875  , 0.8044805 , 0.891125  ],[0.3264935 , 0.36971876, 0.65203464, 0.4431875 ]],dtype=float32)>
		'id': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([152282, 155195, 185150])>
		'is_crowd': <tf.Tensor: shape=(3,), dtype=bool, numpy=array([False, False, False])>
		'label': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([3, 3, 0])>
		}}

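Next, the train and val datasets are preprocessed by passing them to the pipeline function.

Looking at the pipeline function:

The first line is dataset.map, which passes every element to the preprocess_data function for processing.
Looking at preprocess_data:
Inspecting the type and format of the incoming sample shows that it is a dictionary: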

{'image': <tf.Tensor 'args_0:0' shape=(None, None, 3) dtype=uint8>, 
'image/filename': <tf.Tensor 'args_1:0' shape=() dtype=string>,
'image/id': <tf.Tensor 'args_2:0' shape=() dtype=int64>, 
'objects': {'area': <tf.Tensor 'args_3:0' shape=(None,) dtype=int64>, 
				'bbox': <tf.Tensor 'args_4:0' shape=(None, 4) dtype=float32>, 
				'id': <tf.Tensor 'args_5:0' shape=(None,) dtype=int64>, 
				'is_crowd': <tf.Tensor 'args_6:0' shape=(None,) dtype=bool>, 
				'label': <tf.Tensor 'args_7:0' shape=(None,) dtype=int64>}}

<class 'dict'>

The next step takes the value of the sample["image"] field, which is a Tensor:

Tensor("args_0:0", shape=(None, None, 3), dtype=uint8)

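swap_xy swaps the order of the x and y coordinates. tf.stack() stacks a list of tensors along a new axis; axis=1 means the new axis is the second dimension, so each row of the result holds the four reordered coordinates of one box.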
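
The swap can be sketched as follows. This is a minimal NumPy stand-in for the TensorFlow code, not the repository's implementation. It assumes the dataset stores normalized boxes as [ymin, xmin, ymax, xmax], and it uses the three boxes of the sample image printed earlier.

```python
import numpy as np


def swap_xy(boxes):
    """Swap x and y: [ymin, xmin, ymax, xmax] <-> [xmin, ymin, xmax, ymax]."""
    return np.stack(
        [boxes[:, 1], boxes[:, 0], boxes[:, 3], boxes[:, 2]], axis=-1
    )


# Boxes of the sample image, assumed to be [ymin, xmin, ymax, xmax].
boxes = np.array(
    [
        [0.54380953, 0.13464062, 0.98651516, 0.33742186],
        [0.50707793, 0.517875, 0.8044805, 0.891125],
        [0.3264935, 0.36971876, 0.65203464, 0.4431875],
    ],
    dtype=np.float32,
)

swapped = swap_xy(boxes)
print("before:", boxes[0])
print("after: ", swapped[0])
```

Applying swap_xy twice returns the original boxes, which is a quick way to check the column order.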

sample["objects"]["label"] is a Tensor holding the class label of each object:

Tensor("args_7:0", shape=(None,), dtype=int64)

random_flip_horizontal flips the image horizontally at random; tf.image.flip_left_right returns the image flipped along its width dimension.
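
When the image is mirrored, the boxes have to be mirrored with it. The sketch below illustrates that idea in NumPy rather than TensorFlow. The function names, the 0.5 flip probability and the normalized [xmin, ymin, xmax, ymax] box format are assumptions made for illustration.

```python
import numpy as np


def flip_horizontal(image, boxes):
    """Mirror an (H, W, C) image left to right and update its boxes.

    boxes are normalized [xmin, ymin, xmax, ymax]. Mirroring maps x to 1 - x,
    so the old xmax becomes the new xmin and the old xmin the new xmax.
    """
    flipped_image = image[:, ::-1, :]
    flipped_boxes = np.stack(
        [1.0 - boxes[:, 2], boxes[:, 1], 1.0 - boxes[:, 0], boxes[:, 3]],
        axis=-1,
    )
    return flipped_image, flipped_boxes


def random_flip_horizontal(image, boxes, rng):
    """Flip with probability 0.5 (the probability is an assumption)."""
    if rng.uniform() > 0.5:
        return flip_horizontal(image, boxes)
    return image, boxes


image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
boxes = np.array([[0.13464062, 0.54380953, 0.33742186, 0.98651516]])
flipped_image, flipped_boxes = flip_horizontal(image, boxes)
print(flipped_boxes)
```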

resize_and_pad_image resizes the image to 224 * 224 and then scales the bbox values in the same way so that the boxes fit the resized image:

Tensor("resize/Squeeze:0", shape=(224, 224, 3), dtype=float32)
(224, 224, 3)

The code that scales the bbox:
bbox = tf.stack(
    [
        bbox[:, 0] * image_shape[1],
        bbox[:, 1] * image_shape[0],
        bbox[:, 2] * image_shape[1],
        bbox[:, 3] * image_shape[0],
    ],
    axis=-1,
)
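
As a worked example of this scaling, the NumPy sketch below multiplies one normalized box by a 224 * 224 image shape. The input is the first box of the sample image after swap_xy and a horizontal flip; that the sample was flipped is an assumption inferred from the numbers, not something the article states.

```python
import numpy as np


def scale_boxes(boxes, image_shape):
    """Convert normalized [xmin, ymin, xmax, ymax] boxes to pixel units.

    image_shape is (height, width): x values scale with the width and
    y values scale with the height.
    """
    height, width = image_shape
    return np.stack(
        [
            boxes[:, 0] * width,
            boxes[:, 1] * height,
            boxes[:, 2] * width,
            boxes[:, 3] * height,
        ],
        axis=-1,
    )


normalized = np.array([[0.66257814, 0.54380953, 0.86535938, 0.98651516]])
pixel_boxes = scale_boxes(normalized, (224, 224))
print(pixel_boxes)
```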

convert_to_xywh converts the boxes to the [center x, center y, width, height] format; the original format is [xmin, ymin, xmax, ymax].
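
A minimal NumPy sketch of this conversion is shown below. The corner values are illustrative pixel coordinates, chosen so that the result lines up with the first row of the bbox tensor in the example that follows.

```python
import numpy as np


def convert_to_xywh(boxes):
    """[xmin, ymin, xmax, ymax] -> [center x, center y, width, height]."""
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    sizes = boxes[:, 2:] - boxes[:, :2]
    return np.concatenate([centers, sizes], axis=-1)


corners = np.array([[148.4175, 121.81333, 193.8405, 220.9794]])
xywh = convert_to_xywh(corners)
print(xywh)
```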

Back in the body of the pipeline function, each element of the dataset now consists of 3 Tensors, corresponding to the returned image, bbox and class_id.

Here is a concrete example:

<tf.Tensor: shape=(224, 224, 3), dtype=float32, numpy=array([[[162.04018  , 160.8616   , 149.4308   ],
        [132.71652  , 128.625    , 121.32367  ],
        [119.41742  , 119.2567   , 107.41072  ],
        ...,
        [150.94882  , 142.01132  , 131.48007  ],
        [165.14725  , 157.14725  , 146.14725  ],
        [156.79443  , 150.26318  , 139.32568  ]]], dtype=float32)>
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=array([[171.129   , 171.39636 ,  45.423004,  99.16606 ],
       [ 66.19199 , 146.89455 ,  83.608   ,  66.61817 ],
       [132.9545  , 109.59515 ,  16.456978,  72.92121 ]], dtype=float32)>
 <tf.Tensor: shape=(3,), dtype=int32, numpy=array([3, 3, 0], dtype=int32)>

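The second line, dataset.shuffle, shuffles the data. At this point batch=1.
The third line, padded_batch, is a very common operation: for variable-length sequences, for example, padding brings every sequence to the same length. After this step every Tensor has one extra dimension, added to represent the batch so that later batch operations are convenient; its size of 1 means that each batch contains a single sample.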
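
The effect of padding and batching can be sketched in NumPy. pad_and_stack is a hypothetical helper written for this illustration, not a TensorFlow API, and the padding value 1e-8 is an assumption.

```python
import numpy as np


def pad_and_stack(arrays, pad_value):
    """Pad arrays along their first axis to a common length, then stack them.

    The stacked result gains a new leading batch dimension.
    """
    max_len = max(a.shape[0] for a in arrays)
    padded = []
    for a in arrays:
        pad_rows = max_len - a.shape[0]
        pad_width = [(0, pad_rows)] + [(0, 0)] * (a.ndim - 1)
        padded.append(np.pad(a, pad_width, constant_values=pad_value))
    return np.stack(padded, axis=0)


boxes_a = np.ones((3, 4), dtype=np.float32)  # an image with 3 objects
boxes_b = np.ones((1, 4), dtype=np.float32)  # an image with 1 object

batch_of_one = pad_and_stack([boxes_a], pad_value=1e-8)
batch_of_two = pad_and_stack([boxes_a, boxes_b], pad_value=1e-8)
print(batch_of_one.shape, batch_of_two.shape)
```

With a batch size of 1 nothing is padded and only the leading dimension is added, which matches the shapes seen in this walkthrough.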

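The fourth line runs another dataset.map, this time with the LabelEncoder().encode_batch function, whose purpose is to build the box and classification targets for a batch.

First comes the initializer of the LabelEncoder class, which creates _anchor_box and _box_variance:

anchor_box: the anchor box generator used to encode the bounding boxes.
box_variance: the scaling factors used to scale the bounding box targets.

Now look at the definition of the AnchorBox() class: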

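aspect_ratios: a list of floats giving the aspect ratios of the anchor boxes at each location on the feature map.
scales: a list of floats giving the scales of the anchor boxes at each location on the feature map.
num_anchors: the number of anchor boxes at each location on the feature map.
areas: a list of floats giving the anchor box areas for each feature map in the feature pyramid.
strides: a list of floats giving the stride of each feature map in the feature pyramid.
anchor_dims: the anchor box dimensions computed for every aspect ratio and scale at each level of the feature pyramid.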

The anchor_dims obtained at the end deserve a closer look.
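
How anchor_dims could be computed is sketched below in NumPy. The concrete values of aspect_ratios, scales and areas are assumptions taken from the commonly used RetinaNet settings; the article only confirms that there are 9 anchors per location and 5 pyramid levels.

```python
import numpy as np

# Assumed settings: 3 aspect ratios x 3 scales = 9 anchors per location.
aspect_ratios = [0.5, 1.0, 2.0]
scales = [2 ** x for x in [0, 1 / 3, 2 / 3]]
areas = [x ** 2 for x in [32.0, 64.0, 128.0, 256.0, 512.0]]


def compute_anchor_dims(areas, aspect_ratios, scales):
    """Return one (num_anchors, 2) array of [width, height] per level."""
    dims_all_levels = []
    for area in areas:
        level_dims = []
        for ratio in aspect_ratios:
            # ratio = width / height and width * height = area
            height = np.sqrt(area / ratio)
            width = area / height
            for scale in scales:
                level_dims.append([scale * width, scale * height])
        dims_all_levels.append(np.array(level_dims, dtype=np.float32))
    return dims_all_levels


anchor_dims = compute_anchor_dims(areas, aspect_ratios, scales)
print(anchor_dims[0])
```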

Next, look at the encode_batch function.

batch_images holds just that one image, because the batch size is set to 1:

Tensor("args_0:0", shape=(1, 224, 224, 3), dtype=float32)

Each label is processed into the targets for the classification and regression tasks, and labels is a TensorArray:

label  Tensor("while/concat_9:0", shape=(None, 5), dtype=float32)
labels  <tensorflow.python.ops.tensor_array_ops.TensorArray object at 0x7f3a9c513a90>

The main processing happens in self._encode_sample, so let us look at that function.

It calls anchor_boxes = self._anchor_box.get_anchors(image_shape[1], image_shape[2]). Here is the definition of get_anchors, where image_shape[1]=224 and image_shape[2]=224:

    def get_anchors(self, image_height, image_width):
        anchors = [
            self._get_anchors(
                tf.math.ceil(image_height / 2 ** i),
                tf.math.ceil(image_width / 2 ** i),
                i,
            )
            for i in range(3, 8)
        ]
        return tf.concat(anchors, axis=0)

    def _get_anchors(self, feature_height, feature_width, level):
        rx = tf.range(feature_width, dtype=tf.float32) + 0.5
        ry = tf.range(feature_height, dtype=tf.float32) + 0.5
        centers = tf.stack(tf.meshgrid(rx, ry), axis=-1) * self._strides[level - 3]
        centers = tf.expand_dims(centers, axis=-2)
        centers = tf.tile(centers, [1, 1, self._num_anchors, 1])
        dims = tf.tile(
            self._anchor_dims[level - 3], [feature_height, feature_width, 1, 1]
        )
        anchors = tf.concat([centers, dims], axis=-1)
        return tf.reshape(
            anchors, [feature_height * feature_width * self._num_anchors, 4]
        )

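Because for i in range(3, 8) loops over i = 3, 4, 5, 6, 7, we pick one iteration to look at.

This is the case i=3, where feature_height and feature_width are both 28. centers and dims hold the centers and the dimensions of the generated anchors, each of size 28 * 28 * 9 * 2. Over the five levels feature_height takes the values 28, 14, 7, 4 and 2; the raw quotients 3.5 and 1.75 at the last two levels are rounded up by tf.math.ceil.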

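centers = tf.stack(tf.meshgrid(rx, ry), axis=-1) * self._strides[level - 3]
tf.meshgrid is a TensorFlow function that builds a grid in N-dimensional space and returns a list of N tensors. tf.stack is a TensorFlow function that stacks several tensors along a new dimension.
For example, because strides is 8 in the first iteration, the first center value is 0.5 * 8 = 4.

centers = tf.expand_dims(centers, axis=-2) adds a dimension.

centers = tf.tile(centers, [1, 1, self._num_anchors, 1])
tf.tile is a TensorFlow function that repeats a tensor along the specified dimensions.

dims = tf.tile( self._anchor_dims[level - 3], [feature_height, feature_width, 1, 1] )
The first argument is the tensor to repeat; the second is a list giving the number of repetitions along each dimension.
anchors = tf.concat([centers, dims], axis=-1)

Finally the result is reshaped to [feature_height * feature_width * self._num_anchors, 4] before being returned.
To sum up one iteration: anchors holds all the anchor boxes of that level, where centers are the center coordinates of the anchor boxes and dims are their dimensions (width and height).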
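
The steps of one iteration can be reproduced in NumPy as a rough check of the shapes. This is a sketch under stated assumptions, not the repository's code: get_level_anchors is a hypothetical name, and level_dims is filled with a placeholder 32 * 32 size instead of the real anchor_dims.

```python
import numpy as np


def get_level_anchors(feature_height, feature_width, stride, level_dims):
    """Build [center x, center y, width, height] anchors for one level."""
    # Cell centers on the feature map, scaled back to image pixels.
    rx = (np.arange(feature_width, dtype=np.float32) + 0.5) * stride
    ry = (np.arange(feature_height, dtype=np.float32) + 0.5) * stride
    centers = np.stack(np.meshgrid(rx, ry), axis=-1)  # (H, W, 2)

    num_anchors = level_dims.shape[0]
    # Repeat every center once per anchor shape: (H, W, A, 2).
    centers = np.tile(centers[:, :, None, :], (1, 1, num_anchors, 1))
    # Repeat the anchor shapes at every location: (H, W, A, 2).
    dims = np.tile(
        level_dims[None, None, :, :], (feature_height, feature_width, 1, 1)
    )
    anchors = np.concatenate([centers, dims], axis=-1)  # (H, W, A, 4)
    return anchors.reshape(-1, 4)


# Placeholder dims: 9 identical 32 x 32 anchors (an assumption).
level_dims = np.full((9, 2), 32.0, dtype=np.float32)
anchors = get_level_anchors(28, 28, 8, level_dims)
print(anchors.shape)
print(anchors[0])
```

The centers come out in image pixels because they are multiplied by the stride, so the first center is at (4, 4) and the next location along the width is at (12, 4).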

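Once all iterations have run we have the complete set of anchors, that is, all the anchors generated at the 5 different feature levels.

Concatenating the anchors of all levels gives 9441 = 7056+1764+441+144+36.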
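
The total can be checked with a few lines of plain Python, assuming a 224 * 224 input and 9 anchors per location as above:

```python
import math

image_size = 224
num_anchors_per_location = 9

counts = []
for level in range(3, 8):
    # Feature map side at this level, rounded up as in get_anchors.
    side = math.ceil(image_size / 2 ** level)
    counts.append(side * side * num_anchors_per_location)

total = sum(counts)
print(counts, total)
```

The sides are 28, 14, 7, 4 and 2, which gives the five terms of the sum.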

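Continuing, cls_ids = tf.cast(cls_ids, dtype=tf.float32) converts the class labels to float32 for use as the classification targets.

matched_gt_idx, positive_mask, ignore_mask = self._match_anchor_boxes(anchor_boxes, gt_boxes) matches the ground-truth boxes to the anchor boxes based on IOU: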

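1. Compute the pairwise IOU between the M anchor_boxes and the N gt_boxes, which gives a matrix of shape (M, N).
2. For each row, if the largest IOU is at least match_iou, the ground-truth box with that IOU is assigned to the corresponding anchor box.
3. If the largest IOU in a row is below ignore_iou, the anchor box is assigned the background class.
4. The remaining anchor boxes, which are assigned no class, are ignored during training.

Arguments:
anchor_boxes: a float tensor of shape (total_anchors, 4) holding all the anchor boxes for the given input image shape, where each anchor box has the format [x, y, width, height].
gt_boxes: a float tensor of shape (num_objects, 4) holding the ground-truth boxes, where each box has the format [x, y, width, height].
match_iou: a float giving the minimum IOU threshold for assigning a ground-truth box to an anchor box.
ignore_iou: a float giving the IOU threshold below which an anchor box is assigned the background class.

Returns:
matched_gt_idx: the index of the matched ground-truth object.
positive_mask: a mask for the anchor boxes that have been assigned a ground-truth box.
ignore_mask: a mask for the anchor boxes that must be ignored during training.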

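iou_matrix = compute_iou(anchor_boxes, gt_boxes) computes the IOU between every anchor box and every ground-truth box.
max_iou: tf.reduce_max is a TensorFlow function that computes the maximum of the elements of a tensor along the specified axis; here it gives the largest IOU in each row.

matched_gt_idx = tf.argmax(iou_matrix, axis=1): tf.argmax is a TensorFlow function that returns the index of the largest value along the specified axis of a tensor.
tf.greater_equal is a TensorFlow function that compares two tensors element-wise for the greater-than-or-equal relation and returns a new boolean tensor. The three masks follow from comparing max_iou with the two thresholds, in line with the steps listed earlier:
positive_mask: anchor boxes whose largest IOU is at least match_iou.
negative_mask: anchor boxes whose largest IOU is below ignore_iou.
ignore_mask: anchor boxes that are neither positive nor negative.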
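
The matching procedure can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the repository's implementation: the threshold defaults match_iou=0.5 and ignore_iou=0.4 are assumed, to_corners is a hypothetical helper, and the four anchors and two ground-truth boxes are made up to cover the positive, ignored and negative cases.

```python
import numpy as np


def to_corners(boxes):
    """[x, y, width, height] -> [xmin, ymin, xmax, ymax]."""
    half = boxes[:, 2:] / 2.0
    return np.concatenate([boxes[:, :2] - half, boxes[:, :2] + half], axis=-1)


def compute_iou(boxes1, boxes2):
    """Pairwise IOU matrix of shape (len(boxes1), len(boxes2))."""
    c1, c2 = to_corners(boxes1), to_corners(boxes2)
    top_left = np.maximum(c1[:, None, :2], c2[None, :, :2])
    bottom_right = np.minimum(c1[:, None, 2:], c2[None, :, 2:])
    inter_wh = np.maximum(bottom_right - top_left, 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    area1 = boxes1[:, 2] * boxes1[:, 3]
    area2 = boxes2[:, 2] * boxes2[:, 3]
    union = np.maximum(area1[:, None] + area2[None, :] - inter, 1e-8)
    return np.clip(inter / union, 0.0, 1.0)


def match_anchor_boxes(anchor_boxes, gt_boxes, match_iou=0.5, ignore_iou=0.4):
    iou_matrix = compute_iou(anchor_boxes, gt_boxes)  # (M, N)
    max_iou = iou_matrix.max(axis=1)
    matched_gt_idx = iou_matrix.argmax(axis=1)
    positive_mask = max_iou >= match_iou
    negative_mask = max_iou < ignore_iou
    ignore_mask = ~(positive_mask | negative_mask)
    return matched_gt_idx, positive_mask, ignore_mask


anchor_boxes = np.array(
    [
        [50.0, 50.0, 40.0, 40.0],    # identical to gt 0, positive
        [65.0, 50.0, 40.0, 40.0],    # IOU about 0.45 with gt 0, ignored
        [200.0, 200.0, 20.0, 20.0],  # identical to gt 1, positive
        [120.0, 120.0, 10.0, 10.0],  # no overlap, background
    ]
)
gt_boxes = np.array([[50.0, 50.0, 40.0, 40.0], [200.0, 200.0, 20.0, 20.0]])

matched_gt_idx, positive_mask, ignore_mask = match_anchor_boxes(
    anchor_boxes, gt_boxes
)
print(matched_gt_idx, positive_mask, ignore_mask)
```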

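matched_gt_boxes = tf.gather(gt_boxes, matched_gt_idx)
tf.gather is a TensorFlow function that collects elements from a tensor according to the given indices. The result has 9441 rows, one per anchor box, each holding the ground-truth box with the largest IOU for that anchor.
box_target = self._compute_box_target(anchor_boxes, matched_gt_boxes)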

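This operation is common in object detection: it computes the offsets and scale factors between the anchor boxes and the ground-truth boxes so that they can be used to train the detection model.
It has two parts:

Part 1: the offsets
(matched_gt_boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
This part computes the offset of the ground-truth center (x, y) relative to the anchor center (x_a, y_a) and divides it by the anchor width (w_a) and height (h_a).
The result is a two-dimensional vector giving the position of the ground-truth box relative to the anchor center, measured in units of the anchor size. It is small for well-matched anchors but is not bounded, because the target is computed for every anchor, including those far from the ground-truth box.

Part 2: the scale factors
tf.math.log(matched_gt_boxes[:, 2:] / anchor_boxes[:, 2:])
This part computes the logarithm of the ratio between the ground-truth width (w) and height (h) and the anchor width and height.
The ratio can be read as the size of the ground-truth box relative to the anchor; its logarithm is positive when the ground-truth box is larger than the anchor and negative when it is smaller.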
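
The two parts can be combined in a short NumPy sketch. The division by box_variance at the end and its values (0.1, 0.1, 0.2, 0.2) are assumptions; the article only says that box_variance scales the box targets. The anchor and ground-truth boxes are made-up values in [x, y, width, height] format.

```python
import numpy as np


def compute_box_target(anchor_boxes, matched_gt_boxes,
                       box_variance=(0.1, 0.1, 0.2, 0.2)):
    """Encode ground-truth boxes relative to their anchors."""
    # Part 1: center offsets in units of the anchor width and height.
    offsets = (matched_gt_boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
    # Part 2: log of the width and height ratios.
    log_scales = np.log(matched_gt_boxes[:, 2:] / anchor_boxes[:, 2:])
    target = np.concatenate([offsets, log_scales], axis=-1)
    # Assumed final step: scale the targets by the box variance.
    return target / np.asarray(box_variance, dtype=target.dtype)


anchor = np.array([[100.0, 100.0, 50.0, 50.0]])
gt = np.array([[110.0, 95.0, 100.0, 25.0]])
target = compute_box_target(anchor, gt)
print(target)
```

Here the offsets are (0.2, -0.1) and the log ratios are (ln 2, -ln 2) before the variance scaling; a ground-truth box identical to its anchor encodes to all zeros.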

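This operation yields box_target.

To summarize:

First, all the ground-truth boxes and the predefined anchors form one large matrix whose entries are their pairwise IOU values; in this example its shape is 9441*18. Taking the maximum IOU along each of the 9441 rows gives max_iou, the maximum value, and matched_gt_idx, the index of that maximum. matched_gt_boxes uses the index to pick the ground-truth box with the largest IOU, giving a tensor of size 9441 * 4. anchor_boxes and matched_gt_boxes are then passed to _compute_box_target, which computes the offsets and scale factors between each anchor box and its largest-IOU ground-truth box.

P.S. One question may come up here: the ground-truth boxes are expressed at the 224 scale, so how do we know the box coordinates of the anchors at the 28 * 28 feature scale?
My understanding is that the anchors generated for the 28 * 28 feature map are themselves expressed in 224-scale image coordinates, since their centers are multiplied by the stride. The IOU between the 224-scale ground-truth boxes and these anchors can therefore be computed directly, and the offsets and scale factors computed against the largest-IOU ground-truth box serve as the targets for the 28 scale.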
After this step each element of the dataset has only 2 components, the batch of images and the encoded labels.

Here is a concrete example:

<tf.Tensor: shape=(1, 224, 224, 3), dtype=float32, numpy=array([[[[ -89.10864  , -100.12945  , -107.383125 ],
         [ -76.200165 ,  -92.90846  ,  -99.85857  ],
         [  94.87351  ,   94.049126 ,   94.54545  ],
         ...,
         [ -30.704102 ,  -41.575592 ,  -48.056915 ],
         [ -28.014801 ,  -40.7499   ,  -46.019203 ],
         [ -24.400917 ,  -40.67811  ,  -45.65058  ]]]], dtype=float32)>,
<tf.Tensor: shape=(1, 9441, 5), dtype=float32, numpy=array([[[ 21.598133  ,  36.98972   ,   3.4842806 ,   3.9224317 ,
          -1.        ],
        [ 17.14245   ,  29.35876   ,   2.329036  ,   2.7671864 ,
          -1.        ],
        [ 13.605972  ,  23.302063  ,   1.1737903 ,   1.611941  ,
          -1.        ],
        ...,
        [ -0.4722148 ,  -1.2458739 , -10.793796  ,  -8.463868  ,
          -1.        ],
        [ -0.3747971 ,  -0.98885083, -11.94904   ,  -9.619114  ,
          -1.        ],
        [ -0.29747665,  -0.7848514 , -13.104286  , -10.774359  ,
          -1.        ]]], dtype=float32)>)

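The following dataset = dataset.apply(tf.data.experimental.ignore_errors()) creates a new Dataset from the existing one and silently ignores any errors.

This essentially completes the data preprocessing.

Back in the main script train.py, two pieces of code remain. The first is tf.keras.callbacks.ModelCheckpoint, a callback that saves the Keras model or its weights at some frequency: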

    callbacks_list = [
        tf.keras.callbacks.ModelCheckpoint(
            filepath=os.path.join(config.root_dir, config.weight, "weights" + "_epoch_{epoch}"),
            monitor='val_loss',
            save_best_only=True,
            save_weights_only=True,
            verbose=1,
            save_freq="epoch"
        )
    ]

The second is model.fit, which runs the training process:

    hist = model.fit(
        train_dataset,
        #validation_data=val_dataset.take(50),
        #train_dataset,
        validation_data=val_dataset,
        epochs=config.num_epochs,
        callbacks=callbacks_list,
        verbose=1,
        batch_size = config.batch_size
    )
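model.fit(training inputs,
          training labels,
          batch_size,  # size of each batch
          epochs,  # number of passes over the training data
          validation_data=(validation inputs, validation labels),
          validation_split=fraction of the training data to hold out for validation,
          validation_freq=number of epochs between validation runs)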
