faster RCNN(keras版本)代码讲解博客索引:
1.faster RCNN(keras版本)代码讲解(1)-概述
2.faster RCNN(keras版本)代码讲解(2)-数据准备
3.faster RCNN(keras版本)代码讲解(3)-训练流程详情
4.faster RCNN(keras版本)代码讲解(4)-共享卷积层详情
5.faster RCNN(keras版本)代码讲解(5)-RPN层详情
6.faster RCNN(keras版本)代码讲解(6)-ROI Pooling层详情
一.ROI Pooling的原理和作用
将不同大小的特征图,压缩成相同大小。比如有两个特征图(16*16),压缩成相同大小(4,4),也就是说,对于(16*16)的特征图,每(4*4)大小的框就做一个max pooling;目的也是减少原始图片为了裁剪或者压缩成固定大小带来的信息损失;还有将不同大小图片生成的特征图转换成同等大小特征图,便于后面全连接层或者分类器接收。但这个keras 版本的fast rcnn中,作者在backend 为theano中使用ROI Pooling,然后在backend 为tensorflow中使用ROI Align(mask rcnn 代码),比ROI Pooling更好。
class RoiPoolingConv(Layer):
'''ROI pooling layer for 2D inputs.
See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,
K. He, X. Zhang, S. Ren, J. Sun
# Arguments
pool_size: int
Size of pooling region to use. pool_size = 7 will result in a 7x7 region.
num_rois: number of regions of interest to be used
# Input shape
list of two 4D tensors [X_img,X_roi] with shape:
X_img:
`(1, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape:
`(1, rows, cols, channels)` if dim_ordering='tf'.
X_roi:
`(1,num_rois,4)` list of rois, with ordering (x,y,w,h)
# Output shape
3D tensor with shape:
`(1, num_rois, channels, pool_size, pool_size)`
'''
def __init__(self, pool_size, num_rois, **kwargs):
self.dim_ordering = K.image_dim_ordering()
assert self.dim_ordering in {'tf', 'th'}, 'dim_ordering must be in {tf, th}'
self.pool_size = pool_size
self.num_rois = num_rois
super(RoiPoolingConv, self).__init__(**kwargs)
def build(self, input_shape):
if self.dim_ordering == 'th':
self.nb_channels = input_shape[0][1]
elif self.dim_ordering == 'tf':
self.nb_channels = input_shape[0][3]
def compute_output_shape(self, input_shape):
if self.dim_ordering == 'th':
return None, self.num_rois, self.nb_channels, self.pool_size, self.pool_size
else:
return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels
def call(self, x, mask=None):
assert(len(x) == 2)
#特征图
img = x[0]
#原始图像上框的坐标
rois = x[1]
input_shape = K.shape(img)
outputs = []
for roi_idx in range(self.num_rois):
x = rois[0, roi_idx, 0]
y = rois[0, roi_idx, 1]
w = rois[0, roi_idx, 2]
h = rois[0, roi_idx, 3]
row_length = w / float(self.pool_size)
col_length = h / float(self.pool_size)
num_pool_regions = self.pool_size
#NOTE: the RoiPooling implementation differs between theano and tensorflow due to the lack of a resize op
# in theano. The theano implementation is much less efficient and leads to long compile times
#真的roi pooling
if self.dim_ordering == 'th':
for jy in range(num_pool_regions):
for ix in range(num_pool_regions):
x1 = x + ix * row_length
x2 = x1 + row_length
y1 = y + jy * col_length
y2 = y1 + col_length
x1 = K.cast(x1, 'int32')
x2 = K.cast(x2, 'int32')
y1 = K.cast(y1, 'int32')
y2 = K.cast(y2, 'int32')
x2 = x1 + K.maximum(1,x2-x1)
y2 = y1 + K.maximum(1,y2-y1)
new_shape = [input_shape[0], input_shape[1],
y2 - y1, x2 - x1]
x_crop = img[:, :, y1:y2, x1:x2]
xm = K.reshape(x_crop, new_shape)
pooled_val = K.max(xm, axis=(2, 3))
outputs.append(pooled_val)
elif self.dim_ordering == 'tf':
x = K.cast(x, 'int32')
y = K.cast(y, 'int32')
w = K.cast(w, 'int32')
h = K.cast(h, 'int32')
#使用的是mask rcnn中的ROIAlign,效果更好
rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
# print("resize_images",img[:, y:y+h, x:x+w, :].shape)
# print("resize_result",rs.shape)
outputs.append(rs)
final_output = K.concatenate(outputs, axis=0)
final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))
if self.dim_ordering == 'th':
final_output = K.permute_dimensions(final_output, (0, 1, 4, 2, 3))
else:
final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))
return final_output