A Step-by-Step Keras Implementation of Faster R-CNN, Part 12
In the previous article we implemented the RoiPoolingLayer, whose job is to convert ROIs of different sizes into fixed-size feature maps for the steps that follow. After it come the fully connected layers we are already familiar with, which are relatively easy to implement.
1. Defining the Fast R-CNN Network
The RPN also performs one round of classification and regression, but it differs from Fast R-CNN: RPN classification only separates background from objects (a binary classification), and its regression refines the anchor boxes. Fast R-CNN classification has to determine the exact object category of each proposal produced by the RPN (a multi-class classification), and its regression refines the proposal rectangles, i.e. the rectangles output by the Proposal layer.
As mentioned before, Faster R-CNN = RPN + Fast R-CNN. The RPN was completed in the earlier articles, so now we finish the Faster R-CNN network, defined as follows
# Fast R-CNN network (detection head)
# pooled_rois: output of the RoiPooling layer
# cells: number of neurons in each fully connected layer
# num_classes: number of object categories
def fast_rcnn(pooled_rois, cells, num_classes):
    flatten = TimeDistributed(keras.layers.Flatten(), name = "roi_flatten")(pooled_rois)
    fc1 = TimeDistributed(keras.layers.Dense(cells, kernel_initializer = "uniform",
                                             activation = "relu"), name = "fc_1")(flatten)
    fc2 = TimeDistributed(keras.layers.Dense(cells, kernel_initializer = "uniform",
                                             activation = "relu"), name = "fc_2")(fc1)
    # Classification branch
    y_cls = TimeDistributed(keras.layers.Dense(num_classes, kernel_initializer = "uniform",
                                               activation = "softmax"), name = "rcnn_cls")(fc2)
    # Regression branch
    y_reg = TimeDistributed(keras.layers.Dense(4, kernel_initializer = "zero",
                                               activation = "linear"), name = "rcnn_reg")(fc2)
    return y_cls, y_reg
The definition above is simple: two Dense (fully connected) layers followed by two branches, one for classification and one for regression. Comparing it against the Faster R-CNN network structure makes it easy to follow.
What does need explaining is the somewhat magical TimeDistributed wrapper. Rather than worry about its name, let me describe what it does. In the output printed at the end of the previous article, outputs.shape == (4, 256, 7, 7, 512), which is the shape produced by the RoiPooling layer. Its dimensions mean [batch_size, num_rois, pool_size_rows, pool_size_cols, feature_channels]. In the definition above, if we did not wrap the layers with TimeDistributed, there would be a problem when connecting to the fully connected layers. Why?
Suppose we fed pooled_rois directly into keras.layers.Flatten. Flatten simply unrolls its input, turning multi-dimensional data into a 1-D vector, and is commonly used as the transition from convolutional layers to fully connected layers. The operation does not touch the batch_size dimension, so within a batch Flatten would turn [batch_size, num_rois, pool_size_rows, pool_size_cols, feature_channels] into 2-D data of shape [batch_size, num_rois × pool_size_rows × pool_size_cols × feature_channels]. Take outputs.shape == (4, 256, 7, 7, 512) as an example: 4 feature maps, 256 ROIs per image, each ROI of shape (7, 7, 512). After Flatten this becomes [4, 256 × 7 × 7 × 512] = [4, 6422528]. Flatten is followed by a Dense (fully connected) layer; assuming it has 2048 neurons, the number of parameters between Flatten and Dense is
6422528 × 2048 = 13,153,337,344
Stored as float32, that takes about 49 GB of memory, so this single layer alone makes training impossible. How do we solve this?
This is where TimeDistributed comes in. By default it only cares about the 2nd dimension (the time dimension). What is the 2nd dimension of outputs.shape above? It is num_rois, the number of cropped ROIs. TimeDistributed splits pooled_rois apart along that dimension; you can think of it as treating the 2nd dimension as another kind of batch_size. In effect, outputs.shape goes from (4, 256, 7, 7, 512) to (4 × 256, 7, 7, 512), so the "batch_size" is now 1024. After Flatten acts on pooled_rois it becomes (1024, 25088), and the number of parameters connecting to Dense is
25088 × 2048 = 51,380,224
Stored as float32 that is only about 196 MB, which is perfectly manageable.
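As a quick check of these two numbers, here is a small sketch of my own (weight counts only, biases ignored; 2048 is the Dense size used later in this article):
flatten_no_td = 256 * 7 * 7 * 512     # Flatten applied to the whole (num_rois, 7, 7, 512) block
flatten_with_td = 7 * 7 * 512         # Flatten applied per ROI by TimeDistributed
cells = 2048
print(flatten_no_td * cells)                    # 13153337344 weights
print(flatten_no_td * cells * 4 / 1024 ** 3)    # about 49 GB as float32
print(flatten_with_td * cells)                  # 51380224 weights
print(flatten_with_td * cells * 4 / 1024 ** 2)  # about 196 MB as float32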
Now that the behaviour is clear, a look at the parameters of TimeDistributed will make things even clearer.
Function prototype
tf.keras.layers.TimeDistributed(
    layer, **kwargs
)
TimeDistributed is a wrapper that applies a layer to every time step along the temporal dimension of the input tensor. It has the following parameters:
- layer: the layer object to apply. It can be an instance of any Keras layer, e.g. Dense, Conv2D, etc.
- kwargs: optional keyword arguments passed to the wrapper layer itself, such as the name argument in the code above
When using the TimeDistributed wrapper, keep the following in mind (a small shape demo follows this list):
- The input tensor's shape must satisfy TimeDistributed's requirement of being at least 3-dimensional
- The wrapped layer is applied to every time step along the input tensor's temporal dimension
- The TimeDistributed wrapper does not change the shape of the other dimensions of the input tensor
- The output tensor's shape depends on the wrapped layer and on the input tensor's shape
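Here is a minimal shape demo, my own sketch assuming TensorFlow 2.x with tf.keras; the small (2, 8, 7, 7, 512) tensor stands in for pooled_rois:
import tensorflow as tf
from tensorflow import keras

# [batch_size, num_rois, pool_rows, pool_cols, feature_channels]
pooled = tf.zeros((2, 8, 7, 7, 512))
flat = keras.layers.TimeDistributed(keras.layers.Flatten())(pooled)
fc = keras.layers.TimeDistributed(keras.layers.Dense(2048))(flat)
print(flat.shape)   # (2, 8, 25088), Flatten ran once per ROI rather than on the whole batch element
print(fc.shape)     # (2, 8, 2048)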
2. Defining the Faster R-CNN Model
With the Fast R-CNN head above plus the definitions from the earlier articles, we can now define a complete Faster R-CNN model. For now this model can only do forward computation; it cannot be trained yet
# Assemble the Faster R-CNN model
x = keras.layers.Input(shape = (None, None, 3), name = "input")
feature = vgg16_conv(x)
rpn_cls, rpn_reg = rpn(feature)
proposal = ProposalLayer(base_anchors, num_rois = TRAIN_NUM, iou_thres = 0.7,
                         name = "proposal")([x, rpn_cls, rpn_reg])
pooled_rois = RoiPoolingLayer(name = "roi_pooling")([x, feature, proposal])
y_cls, y_reg = fast_rcnn(pooled_rois, cells = 2048, num_classes = len(CATEGORIES))
faster_rcnn = keras.Model(x, [y_cls, y_reg], name = "faster_rcnn")
faster_rcnn.summary()
Model: "faster_rcnn"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) (None, None, None, 3 0
__________________________________________________________________________________________________
vgg16_x1_1 (Conv2D) (None, None, None, 6 1792 input[0][0]
__________________________________________________________________________________________________
vgg16_x1_2 (Conv2D) (None, None, None, 6 36928 vgg16_x1_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 6 0 vgg16_x1_2[0][0]
__________________________________________________________________________________________________
vgg16_x2_1 (Conv2D) (None, None, None, 1 73856 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
vgg16_x2_2 (Conv2D) (None, None, None, 1 147584 vgg16_x2_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 1 0 vgg16_x2_2[0][0]
__________________________________________________________________________________________________
vgg16_x3_1 (Conv2D) (None, None, None, 2 295168 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
vgg16_x3_2 (Conv2D) (None, None, None, 2 590080 vgg16_x3_1[0][0]
__________________________________________________________________________________________________
vgg16_x3_3 (Conv2D) (None, None, None, 2 590080 vgg16_x3_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, None, None, 2 0 vgg16_x3_3[0][0]
__________________________________________________________________________________________________
vgg16_x4_1 (Conv2D) (None, None, None, 5 1180160 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
vgg16_x4_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_1[0][0]
__________________________________________________________________________________________________
vgg16_x4_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, None, None, 5 0 vgg16_x4_3[0][0]
__________________________________________________________________________________________________
vgg16_x5_1 (Conv2D) (None, None, None, 5 2359808 max_pooling2d_4[0][0]
__________________________________________________________________________________________________
vgg16_x5_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_1[0][0]
__________________________________________________________________________________________________
vgg16_x5_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_2[0][0]
__________________________________________________________________________________________________
rpn_conv (Conv2D) (None, None, None, 5 2359808 vgg16_x5_3[0][0]
__________________________________________________________________________________________________
rpn_cls (Conv2D) (None, None, None, 9 4617 rpn_conv[0][0]
__________________________________________________________________________________________________
rpn_reg (Conv2D) (None, None, None, 3 18468 rpn_conv[0][0]
__________________________________________________________________________________________________
proposal (ProposalLayer) (None, 256, 4) 0 input[0][0]
rpn_cls[0][0]
rpn_reg[0][0]
__________________________________________________________________________________________________
roi_pooling (RoiPoolingLayer) (None, 256, 7, 7, 51 0 input[0][0]
vgg16_x5_3[0][0]
proposal[0][0]
__________________________________________________________________________________________________
roi_flatten (TimeDistributed) (None, 256, 25088) 0 roi_pooling[0][0]
__________________________________________________________________________________________________
fc_1 (TimeDistributed) (None, 256, 2048) 51382272 roi_flatten[0][0]
__________________________________________________________________________________________________
fc_2 (TimeDistributed) (None, 256, 2048) 4196352 fc_1[0][0]
__________________________________________________________________________________________________
rcnn_cls (TimeDistributed) (None, 256, 21) 43029 fc_2[0][0]
__________________________________________________________________________________________________
rcnn_reg (TimeDistributed) (None, 256, 4) 8196 fc_2[0][0]
==================================================================================================
Total params: 72,727,430
Trainable params: 72,727,430
Non-trainable params: 0
__________________________________________________________________________________________________
You can see that the classification output has shape (None, 256, 21) and the regression output has shape (None, 256, 4). None is the batch_size, 256 means 256 ROI regions are cropped from each feature map and fed into the classification and regression heads, 21 is the number of classes, and 4 means each proposal has 4 refinement parameters.
With this, the Faster R-CNN model is complete. Later articles will add or modify the necessary functions so that the model can actually be trained.
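If you want to sanity-check the model at this point, a forward pass on a random image should reproduce those shapes. This is just my own sketch (the 608 × 608 input size is an arbitrary choice, and it assumes the model above was built without errors):
import numpy as np

dummy_image = np.random.rand(1, 608, 608, 3).astype("float32")
cls_out, reg_out = faster_rcnn.predict(dummy_image)
print(cls_out.shape)   # expected (1, 256, 21)
print(reg_out.shape)   # expected (1, 256, 4)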
3. Issues You May Hit with Different Keras Versions
If Keras == 2.3.1, the code above works as is, but readers using a newer Keras version may get the following error when defining the model
The last dimension of the inputs to `Dense` should be defined. Found `None`.
Why does this happen? The last dimension of a Dense layer's input shape must be defined. In the model definition above, RoiPoolingLayer connects to the Flatten inside fast_rcnn, and Flatten then connects to Dense, so the Dense input depends on the ProposalLayer output. For comparison, the model defined with Keras == 2.6.0 produces the output below
# Assemble the Faster R-CNN model
x = keras.layers.Input(shape = (None, None, 3), name = "input")
feature = vgg16_conv(x)
rpn_cls, rpn_reg = rpn(feature)
proposal = ProposalLayer(base_anchors, num_rois = TRAIN_NUM, iou_thres = 0.7,
                         name = "proposal")([x, rpn_cls, rpn_reg])
pooled_rois = RoiPoolingLayer(name = "roi_pooling")([x, feature, proposal])
# Commented out so that pooled_rois becomes the model output
# y_cls, y_reg = fast_rcnn(pooled_rois, cells = 2048, num_classes = len(CATEGORIES))
faster_rcnn = keras.Model(x, pooled_rois, name = "faster_rcnn")
faster_rcnn.summary()
To be able to print the model structure, the definition above uses pooled_rois directly as the output layer. The result is as follows
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) [(None, None, None, 0
__________________________________________________________________________________________________
vgg16_x1_1 (Conv2D) (None, None, None, 6 1792 input[0][0]
__________________________________________________________________________________________________
vgg16_x1_2 (Conv2D) (None, None, None, 6 36928 vgg16_x1_1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, None, None, 6 0 vgg16_x1_2[0][0]
__________________________________________________________________________________________________
vgg16_x2_1 (Conv2D) (None, None, None, 1 73856 max_pooling2d[0][0]
__________________________________________________________________________________________________
vgg16_x2_2 (Conv2D) (None, None, None, 1 147584 vgg16_x2_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 1 0 vgg16_x2_2[0][0]
__________________________________________________________________________________________________
vgg16_x3_1 (Conv2D) (None, None, None, 2 295168 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
vgg16_x3_2 (Conv2D) (None, None, None, 2 590080 vgg16_x3_1[0][0]
__________________________________________________________________________________________________
vgg16_x3_3 (Conv2D) (None, None, None, 2 590080 vgg16_x3_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 2 0 vgg16_x3_3[0][0]
__________________________________________________________________________________________________
vgg16_x4_1 (Conv2D) (None, None, None, 5 1180160 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
vgg16_x4_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_1[0][0]
__________________________________________________________________________________________________
vgg16_x4_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x4_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, None, None, 5 0 vgg16_x4_3[0][0]
__________________________________________________________________________________________________
vgg16_x5_1 (Conv2D) (None, None, None, 5 2359808 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
vgg16_x5_2 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_1[0][0]
__________________________________________________________________________________________________
vgg16_x5_3 (Conv2D) (None, None, None, 5 2359808 vgg16_x5_2[0][0]
__________________________________________________________________________________________________
rpn_conv (Conv2D) (None, None, None, 5 2359808 vgg16_x5_3[0][0]
__________________________________________________________________________________________________
rpn_cls (Conv2D) (None, None, None, 9 4617 rpn_conv[0][0]
__________________________________________________________________________________________________
rpn_reg (Conv2D) (None, None, None, 3 18468 rpn_conv[0][0]
__________________________________________________________________________________________________
proposal (ProposalLayer) (None, None, 4) 0 input[0][0]
rpn_cls[0][0]
rpn_reg[0][0]
__________________________________________________________________________________________________
roi_pooling (RoiPoolingLayer) (None, None, 7, 7, N 0 input[0][0]
vgg16_x5_3[0][0]
proposal[0][0]
==================================================================================================
Total params: 17,097,581
Trainable params: 17,097,581
Non-trainable params: 0
__________________________________________________________________________________________________
You can see that the ProposalLayer output changed from (None, 256, 4) to (None, None, 4), with the last dimension 4 still present, while the RoiPoolingLayer output changed from (None, 256, 7, 7, 512) to (None, None, 7, 7, None). Its last dimension is now undefined, which makes the last dimension of the Dense input undefined and triggers the error.
How do we fix this?
In the previous article, the RoiPoolingLayer definition computed the output dimensions in compute_output_shape, but with the same code under Keras == 2.6.0 the last output dimension becomes None. We need to bring that dimension back.
In the RoiPoolingLayer definition, build simply called the parent class's build. We can add a member variable self.feature_channels there and use build's input_shape argument to pin down the last dimension. For Keras == 2.6.0 the modified RoiPoolingLayer is shown below; see the comments for the changed parts
# Define the RoiPooling Layer
class RoiPoolingLayer(Layer):
    def __init__(self, pool_size = (7, 7), **kwargs):
        self.pool_size = pool_size
        super(RoiPoolingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Keras == 2.6.0 change: remember the feature channel count from the static input shape
        self.feature_channels = input_shape[1][3]
        super(RoiPoolingLayer, self).build(input_shape)

    def call(self, inputs):
        images, features, rois = inputs
        image_shape = tf.shape(images)[1: 3]
        feature_shape = tf.shape(features)
        roi_shape = tf.shape(rois)
        batch_size = feature_shape[0]
        num_rois = roi_shape[1]
        # Keras == 2.6.0 change: use the value saved in build instead of the dynamic shape
        feature_channels = self.feature_channels # feature_shape[3]

        y_scale = 1.0 / tf.cast(image_shape[0] - 1, dtype = tf.float32)
        x_scale = 1.0 / tf.cast(image_shape[1] - 1, dtype = tf.float32)
        y1 = rois[..., 0] * y_scale
        x1 = rois[..., 1] * x_scale
        y2 = rois[..., 2] * y_scale
        x2 = rois[..., 3] * x_scale
        rois = tf.stack([y1, x1, y2, x2], axis = -1)

        # Assign each ROI the index of the feature map it belongs to
        indices = tf.range(batch_size, dtype = tf.int32)
        indices = tf.repeat(indices, num_rois, axis = -1)
        rois = tf.reshape(rois, (-1, roi_shape[-1]))

        crops = tf.image.crop_and_resize(image = features,
                                         boxes = rois,
                                         box_indices = indices,
                                         crop_size = self.pool_size,
                                         method = "bilinear")
        crops = tf.reshape(crops,
                           (batch_size, num_rois,
                            self.pool_size[0], self.pool_size[1], feature_channels))
        return crops

    def compute_output_shape(self, input_shape):
        image_shape, feature_shape, roi_shape = input_shape
        batch_size = image_shape[0]
        num_rois = roi_shape[1]
        feature_channels = feature_shape[3]
        return (batch_size, num_rois, self.pool_size[0], self.pool_size[1], feature_channels)
With the modification above, the model runs normally under Keras == 2.6.0
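To confirm that the last dimension is back, you can call the modified layer on a few dummy tensors and check its output shape. This is a sketch of my own (the sizes and the repeated full-image ROI are arbitrary choices):
import numpy as np
import tensorflow as tf

images = tf.zeros((2, 224, 224, 3))       # dummy input images
features = tf.zeros((2, 14, 14, 512))     # dummy VGG16 feature maps
# 4 identical full-image ROIs per image, in (y1, x1, y2, x2) image coordinates
rois = tf.constant(np.tile([[0.0, 0.0, 223.0, 223.0]], (2, 4, 1)), dtype = tf.float32)

pooled = RoiPoolingLayer(name = "roi_pooling_check")([images, features, rois])
print(pooled.shape)   # expected (2, 4, 7, 7, 512), the channel dimension is fixed again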
4. Code Download
The example code can be downloaded here: Jupyter Notebook example code
Previous article: A Step-by-Step Keras Implementation of Faster R-CNN, Part 11
Next article: A Step-by-Step Keras Implementation of Faster R-CNN, Part 13 (Training)