A Step-by-Step Keras Implementation of Faster R-CNN, Part 12
In the previous article we implemented the RoiPoolingLayer, whose job is to convert ROIs of different sizes into fixed-size feature maps for the steps that follow. After it come the fully connected layers we are already familiar with, which are relatively easy to implement.
1. Defining the Fast R-CNN Network
The RPN also performs one round of classification and regression, but it differs from Fast R-CNN: RPN classification only separates background from objects (a binary classification), and its regression refines the anchor boxes. Fast R-CNN classification has to determine the exact object category of each proposal produced by the RPN (a multi-class classification), and its regression refines the proposal rectangles, i.e. the rectangles output by the Proposal layer.
As mentioned before, Faster R-CNN = RPN + Fast R-CNN. The RPN was completed in the earlier articles, so now we finish the Faster R-CNN network, defined as follows
# Fast R-CNN network (detection head)
# pooled_rois: output of the RoiPooling layer
# cells: number of neurons in each fully connected layer
# num_classes: number of object categories
def fast_rcnn(pooled_rois, cells, num_classes):
    flatten = TimeDistributed(keras.layers.Flatten(), name = "roi_flatten")(pooled_rois)
    fc1 = TimeDistributed(keras.layers.Dense(cells, kernel_initializer = "uniform",
                                             activation = "relu"), name = "fc_1")(flatten)
    fc2 = TimeDistributed(keras.layers.Dense(cells, kernel_initializer = "uniform",
                                             activation = "relu"), name = "fc_2")(fc1)
    # Classification branch
    y_cls = TimeDistributed(keras.layers.Dense(num_classes, kernel_initializer = "uniform",
                                               activation = "softmax"), name = "rcnn_cls")(fc2)
    # Regression branch
    y_reg = TimeDistributed(keras.layers.Dense(4, kernel_initializer = "zero",
                                               activation = "linear"), name = "rcnn_reg")(fc2)
    return y_cls, y_reg
The definition above is simple: two Dense (fully connected) layers followed by two branches, one for classification and one for regression. Comparing it against the Faster R-CNN network structure makes it easy to follow.
What does need explaining is the somewhat magical TimeDistributed wrapper. Rather than worry about its name, let me describe what it does. In the output printed at the end of the previous article, outputs.shape == (4, 256, 7, 7, 512), which is the shape produced by the RoiPooling layer. Its dimensions mean [batch_size, num_rois, pool_size_rows, pool_size_cols, feature_channels]. In the definition above, if we did not wrap the layers with TimeDistributed, there would be a problem when connecting to the fully connected layers. Why?
Suppose we fed pooled_rois directly into keras.layers.Flatten. Flatten simply unrolls its input, turning multi-dimensional data into a 1-D vector, and is commonly used as the transition from convolutional layers to fully connected layers. The operation does not touch the batch_size dimension, so within a batch Flatten would turn [batch_size, num_rois, pool_size_rows, pool_size_cols, feature_channels] into 2-D data of shape [batch_size, num_rois × pool_size_rows × pool_size_cols × feature_channels]. Take outputs.shape == (4, 256, 7, 7, 512) as an example: 4 feature maps, 256 ROIs per image, each ROI of shape (7, 7, 512). After Flatten this becomes [4, 256 × 7 × 7 × 512] = [4, 6422528]. Flatten is followed by a Dense (fully connected) layer; assuming it has 2048 neurons, the number of parameters between Flatten and Dense is
6422528 × 2048 = 13,153,337,344
Stored as float32, that takes about 49 GB of memory, so this single layer alone makes training impossible. How do we solve this?
This is where TimeDistributed comes in. By default it only cares about the 2nd dimension (the time dimension). What is the 2nd dimension of outputs.shape above? It is num_rois, the number of cropped ROIs. TimeDistributed splits pooled_rois apart along that dimension; you can think of it as treating the 2nd dimension as another kind of batch_size. In effect, outputs.shape goes from (4, 256, 7, 7, 512) to (4 × 256, 7, 7, 512), so the "batch_size" is now 1024. After Flatten acts on pooled_rois it becomes (1024, 25088), and the number of parameters connecting to Dense is
25088 × 2048 = 51,380,224
Stored as float32 that is only about 196 MB, which is perfectly manageable.
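As a quick check of these two numbers, here is a small sketch of my own (weight counts only, biases ignored; 2048 is the Dense size used later in this article):
flatten_no_td = 256 * 7 * 7 * 512     # Flatten applied to the whole (num_rois, 7, 7, 512) block
flatten_with_td = 7 * 7 * 512         # Flatten applied per ROI by TimeDistributed
cells = 2048
print(flatten_no_td * cells)                    # 13153337344 weights
print(flatten_no_td * cells * 4 / 1024 ** 3)    # about 49 GB as float32
print(flatten_with_td * cells)                  # 51380224 weights
print(flatten_with_td * cells * 4 / 1024 ** 2)  # about 196 MB as float32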
Now that the behaviour is clear, a look at the parameters of TimeDistributed will make things even clearer.
Function prototype
tf.keras.layers.TimeDistributed(
    layer, **kwargs
)
TimeDistributed is a wrapper that applies a layer to every time step along the temporal dimension of the input tensor. It has the following parameters:
- layer: the layer object to apply. It can be an instance of any Keras layer, e.g. Dense, Conv2D, etc.
- kwargs: optional keyword arguments passed to the wrapper layer itself, such as the name argument in the code above
When using the TimeDistributed wrapper, keep the following in mind (a small shape demo follows this list):
- The input tensor's shape must satisfy TimeDistributed's requirement of being at least 3-dimensional
- The wrapped layer is applied to every time step along the input tensor's temporal dimension
- The TimeDistributed wrapper does not change the shape of the other dimensions of the input tensor
- The output tensor's shape depends on the wrapped layer and on the input tensor's shape
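Here is a minimal shape demo, my own sketch assuming TensorFlow 2.x with tf.keras; the small (2, 8, 7, 7, 512) tensor stands in for pooled_rois:
import tensorflow as tf
from tensorflow import keras

# [batch_size, num_rois, pool_rows, pool_cols, feature_channels]
pooled = tf.zeros((2, 8, 7, 7, 512))
flat = keras.layers.TimeDistributed(keras.layers.Flatten())(pooled)
fc = keras.layers.TimeDistributed(keras.layers.Dense(2048))(flat)
print(flat.shape)   # (2, 8, 25088), Flatten ran once per ROI rather than on the whole batch element
print(fc.shape)     # (2, 8, 2048)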
2. Defining the Faster R-CNN Model
With the Fast R-CNN head above plus the definitions from the earlier articles, we can now define a complete Faster R-CNN model. For now this model can only do forward computation; it cannot be trained yet
# Assemble the Faster R-CNN model
x = keras.layers.Input(shape = (None, None, 3), name = "input")
feature = vgg16_conv(x)
rpn_cls, rpn_reg = rpn(feature)
proposal = ProposalLayer(base_anchors, num_rois = TRAIN_NUM, iou_thres = 0.7,
                         name = "proposal")([x, rpn_cls, rpn_reg])
pooled_rois = RoiPoolingLayer(name = "roi_pooling")([x, feature, proposal])
y_cls, y_reg = fast_rcnn(pooled_rois, cells = 2048, num_classes = len(CATEGORIES))
faster_rcnn = keras.Model(x, [y_cls, y_reg], name = "faster_rcnn")
faster_rcnn.summary()
Model: "faster_rcnn"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) (None, None, None, 3 0
__________________________________________________________________________________________________
vgg16_x1_1 (Conv2D) (None, None, None, 6 1792 input[0][0]
__________________________________________________________________________________________________
vgg16_x1_2 (Conv2D) (None, None, None, 6 36928 vgg16_x1_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 6 0 vgg16_x1_2[0][0]
__________________________________________________________________________________________________
vgg16_x2_1 (Conv2D) (None, None, None, 1 73856 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
vgg16_x2_2 (Conv2D) (None, None, None, 1 147584 vgg16_x2_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 1 0 vgg16_x2_2[0][0]
__________________________________________________________________________________________________
vgg16_x3_1 (Conv2D) (None, None, None, 2 295168 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
vgg16_x3_2 (Conv2D) (None, None, None, 2 590080 vgg16_x3_1[0][0]
__________________________________________________________________________________________________
vgg16_x3_3 (Conv2D) (None, None, None, 2 590080 vgg16_x3_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, None, None, 2 0 vgg16_x3_3[0][0]
__________________________________________________________________________________________________
vgg16_x4_1 (Conv2D) (None, None, None, 5 1180160 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
vgg16_x4_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_1[0][0]
__________________________________________________________________________________________________
vgg16_x4_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, None, None, 5 0 vgg16_x4_3[0][0]
__________________________________________________________________________________________________
vgg16_x5_1 (Conv2D) (None, None, None, 5 2359808 max_pooling2d_4[0][0]
__________________________________________________________________________________________________
vgg16_x5_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_1[0][0]
__________________________________________________________________________________________________
vgg16_x5_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_2[0][0]
__________________________________________________________________________________________________
rpn_conv (Conv2D) (None, None, None, 5 2359808 vgg16_x5_3[0][0]
__________________________________________________________________________________________________
rpn_cls (Conv2D) (None, None, None, 9 4617 rpn_conv[0][0]
__________________________________________________________________________________________________
rpn_reg (Conv2D) (None, None, None, 3 18468 rpn_conv[0][0]
__________________________________________________________________________________________________
proposal (ProposalLayer) (None, 256, 4) 0 input[0][0]
rpn_cls[0][0]
rpn_reg[0][0]
__________________________________________________________________________________________________
roi_pooling (RoiPoolingLayer) (None, 256, 7, 7, 51 0 input[0][0]
vgg16_x5_3[0][0]
proposal[0][0]
__________________________________________________________________________________________________
roi_flatten (TimeDistributed) (None, 256, 25088) 0 roi_pooling[0][0]
__________________________________________________________________________________________________
fc_1 (TimeDistributed) (None, 256, 2048) 51382272 roi_flatten[0][0]
__________________________________________________________________________________________________
fc_2 (TimeDistributed) (None, 256, 2048) 4196352 fc_1[0][0]
__________________________________________________________________________________________________
rcnn_cls (TimeDistributed) (None, 256, 21) 43029 fc_2[0][0]
__________________________________________________________________________________________________
rcnn_reg (TimeDistributed) (None, 256, 4) 8196 fc_2[0][0]
==================================================================================================
Total params: 72,727,430
Trainable params: 72,727,430
Non-trainable params: 0
__________________________________________________________________________________________________
You can see that the classification output has shape (None, 256, 21) and the regression output has shape (None, 256, 4). None is the batch_size, 256 means 256 ROI regions are cropped from each feature map and fed into the classification and regression heads, 21 is the number of classes, and 4 means each proposal has 4 refinement parameters.
With this, the Faster R-CNN model is complete. Later articles will add or modify the necessary functions so that the model can actually be trained.
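If you want to sanity-check the model at this point, a forward pass on a random image should reproduce those shapes. This is just my own sketch (the 608 × 608 input size is an arbitrary choice, and it assumes the model above was built without errors):
import numpy as np

dummy_image = np.random.rand(1, 608, 608, 3).astype("float32")
cls_out, reg_out = faster_rcnn.predict(dummy_image)
print(cls_out.shape)   # expected (1, 256, 21)
print(reg_out.shape)   # expected (1, 256, 4)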
3. Issues You May Hit with Different Keras Versions
If Keras == 2.3.1, the code above works as is, but readers using a newer Keras version may get the following error when defining the model
The last dimension of the inputs to `Dense` should be defined. Found `None`.
Why does this happen? The last dimension of a Dense layer's input shape must be defined. In the model definition above, RoiPoolingLayer connects to the Flatten inside fast_rcnn, and Flatten then connects to Dense, so the Dense input depends on the ProposalLayer output. For comparison, the model defined with Keras == 2.6.0 produces the output below
# Assemble the Faster R-CNN model
x = keras.layers.Input(shape = (None, None, 3), name = "input")
feature = vgg16_conv(x)
rpn_cls, rpn_reg = rpn(feature)
proposal = ProposalLayer(base_anchors, num_rois = TRAIN_NUM, iou_thres = 0.7,
                         name = "proposal")([x, rpn_cls, rpn_reg])
pooled_rois = RoiPoolingLayer(name = "roi_pooling")([x, feature, proposal])
# Commented out so that pooled_rois becomes the model output
# y_cls, y_reg = fast_rcnn(pooled_rois, cells = 2048, num_classes = len(CATEGORIES))
faster_rcnn = keras.Model(x, pooled_rois, name = "faster_rcnn")
faster_rcnn.summary()
To be able to print the model structure, the definition above uses pooled_rois directly as the output layer. The result is as follows
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) [(None, None, None, 0
__________________________________________________________________________________________________
vgg16_x1_1 (Conv2D) (None, None, None, 6 1792 input[0][0]
__________________________________________________________________________________________________
vgg16_x1_2 (Conv2D) (None, None, None, 6 36928 vgg16_x1_1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, None, None, 6 0 vgg16_x1_2[0][0]
__________________________________________________________________________________________________
vgg16_x2_1 (Conv2D) (None, None, None, 1 73856 max_pooling2d[0][0]
__________________________________________________________________________________________________
vgg16_x2_2 (Conv2D) (None, None, None, 1 147584 vgg16_x2_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 1 0 vgg16_x2_2[0][0]
__________________________________________________________________________________________________
vgg16_x3_1 (Conv2D) (None, None, None, 2 295168 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
vgg16_x3_2 (Conv2D) (None, None, None, 2 590080 vgg16_x3_1[0][0]
__________________________________________________________________________________________________
vgg16_x3_3 (Conv2D) (None, None, None, 2 590080 vgg16_x3_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 2 0 vgg16_x3_3[0][0]
__________________________________________________________________________________________________
vgg16_x4_1 (Conv2D) (None, None, None, 5 1180160 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
vgg16_x4_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_1[0][0]
__________________________________________________________________________________________________
vgg16_x4_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, None, None, 5 0 vgg16_x4_3[0][0]
__________________________________________________________________________________________________
vgg16_x5_1 (Conv2D) (None, None, None, 5 2359808 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
vgg16_x5_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_1[0][0]
__________________________________________________________________________________________________
vgg16_x5_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_2[0][0]
__________________________________________________________________________________________________
rpn_conv (Conv2D) (None, None, None, 5 2359808 vgg16_x5_3[0][0]
__________________________________________________________________________________________________
rpn_cls (Conv2D) (None, None, None, 9 4617 rpn_conv[0][0]
__________________________________________________________________________________________________
rpn_reg (Conv2D) (None, None, None, 3 18468 rpn_conv[0][0]
__________________________________________________________________________________________________
proposal (ProposalLayer) (None, None, 4) 0 input[0][0]
rpn_cls[0][0]
rpn_reg[0][0]
__________________________________________________________________________________________________
roi_pooling (RoiPoolingLayer) (None, None, 7, 7, N 0 input[0][0]
vgg16_x5_3[0][0]
proposal[0][0]
==================================================================================================
Total params: 17,097,581
Trainable params: 17,097,581
Non-trainable params: 0
__________________________________________________________________________________________________
You can see that the ProposalLayer output changed from (None, 256, 4) to (None, None, 4), with the last dimension 4 still present, while the RoiPoolingLayer output changed from (None, 256, 7, 7, 512) to (None, None, 7, 7, None). Its last dimension is now undefined, which makes the last dimension of the Dense input undefined and triggers the error.
How do we fix this?
In the previous article, the RoiPoolingLayer definition computed the output dimensions in compute_output_shape, but with the same code under Keras == 2.6.0 the last output dimension becomes None. We need to bring that dimension back.
In the RoiPoolingLayer definition, build simply called the parent class's build. We can add a member variable self.feature_channels there and use build's input_shape argument to pin down the last dimension. For Keras == 2.6.0 the modified RoiPoolingLayer is shown below; see the comments for the changed parts
# Define the RoiPooling Layer
class RoiPoolingLayer(Layer):
    def __init__(self, pool_size = (7, 7), **kwargs):
        self.pool_size = pool_size
        super(RoiPoolingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Keras == 2.6.0 change: remember the feature channel count from the static input shape
        self.feature_channels = input_shape[1][3]
        super(RoiPoolingLayer, self).build(input_shape)

    def call(self, inputs):
        images, features, rois = inputs
        image_shape = tf.shape(images)[1: 3]
        feature_shape = tf.shape(features)
        roi_shape = tf.shape(rois)
        batch_size = feature_shape[0]
        num_rois = roi_shape[1]
        # Keras == 2.6.0 change: use the value saved in build instead of the dynamic shape
        feature_channels = self.feature_channels # feature_shape[3]

        y_scale = 1.0 / tf.cast(image_shape[0] - 1, dtype = tf.float32)
        x_scale = 1.0 / tf.cast(image_shape[1] - 1, dtype = tf.float32)
        y1 = rois[..., 0] * y_scale
        x1 = rois[..., 1] * x_scale
        y2 = rois[..., 2] * y_scale
        x2 = rois[..., 3] * x_scale
        rois = tf.stack([y1, x1, y2, x2], axis = -1)

        # Assign each ROI the index of the feature map it belongs to
        indices = tf.range(batch_size, dtype = tf.int32)
        indices = tf.repeat(indices, num_rois, axis = -1)
        rois = tf.reshape(rois, (-1, roi_shape[-1]))

        crops = tf.image.crop_and_resize(image = features,
                                         boxes = rois,
                                         box_indices = indices,
                                         crop_size = self.pool_size,
                                         method = "bilinear")
        crops = tf.reshape(crops,
                           (batch_size, num_rois,
                            self.pool_size[0], self.pool_size[1], feature_channels))
        return crops

    def compute_output_shape(self, input_shape):
        image_shape, feature_shape, roi_shape = input_shape
        batch_size = image_shape[0]
        num_rois = roi_shape[1]
        feature_channels = feature_shape[3]
        return (batch_size, num_rois, self.pool_size[0], self.pool_size[1], feature_channels)
With the modification above, the model runs normally under Keras == 2.6.0
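To confirm that the last dimension is back, you can call the modified layer on a few dummy tensors and check its output shape. This is a sketch of my own (the sizes and the repeated full-image ROI are arbitrary choices):
import numpy as np
import tensorflow as tf

images = tf.zeros((2, 224, 224, 3))       # dummy input images
features = tf.zeros((2, 14, 14, 512))     # dummy VGG16 feature maps
# 4 identical full-image ROIs per image, in (y1, x1, y2, x2) image coordinates
rois = tf.constant(np.tile([[0.0, 0.0, 223.0, 223.0]], (2, 4, 1)), dtype = tf.float32)

pooled = RoiPoolingLayer(name = "roi_pooling_check")([images, features, rois])
print(pooled.shape)   # expected (2, 4, 7, 7, 512), the channel dimension is fixed again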
4. Code Download
The example code can be downloaded here: Jupyter Notebook example code
Previous article: A Step-by-Step Keras Implementation of Faster R-CNN, Part 11
Next article: A Step-by-Step Keras Implementation of Faster R-CNN, Part 13 (Training)