(1) Model
- Backbone: base_layers = ResNet50(inputs)
- RPN: rpn = get_rpn(base_layers, num_anchors)
- Fast R-CNN head: classifier = get_classifier(feature_map_input, roi_input, config.num_rois, nb_classes=num_classes, trainable=True)
inputs:[600*600*3]
base_layers.shape:[38,38,1024]
inputs-->ZeroPadding2D+Conv2D-->300*300*64-->
BatchNormalization + Activation + MaxPooling2D-->150*150*64-->
conv_block+identity_block*2-->150*150*256-->
conv_block+identity_block*3-->75*75*512-->
conv_block+identity_block*5-->38*38*1024
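The halving sequence above (600-->300-->150-->75-->38) can be checked with a small size-arithmetic sketch; the kernel/padding values are assumptions matching a standard ResNet50 stem (7x7 stride-2 conv, 3x3 stride-2 pool) plus two strided conv_blocks:

```python
def get_img_output_length(width, height):
    """Sketch of the backbone's output-size arithmetic for the four
    stride-2 stages listed above (values are assumptions)."""
    def get_output_length(n):
        filter_sizes = [7, 3, 1, 1]  # kernel of each stride-2 op
        paddings = [3, 1, 0, 0]
        for k, p in zip(filter_sizes, paddings):
            n = (n + 2 * p - k) // 2 + 1
        return n
    return get_output_length(width), get_output_length(height)

print(get_img_output_length(600, 600))  # (38, 38)
```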
identity_block:
1. 1x1 conv + batch norm + activation on input_tensor: x = (Conv2D+BatchNormalization+Activation)(input_tensor)
2. 3x3 conv + batch norm + activation (spatial size unchanged): x = (Conv2D+BatchNormalization+Activation)(x)
3. 1x1 conv + batch norm to restore the channel count: x = (Conv2D+BatchNormalization)(x)
4. Activation(x + input_tensor)
conv_block:
1. 1x1 conv + batch norm + activation on input_tensor: x = (Conv2D+BatchNormalization+Activation)(input_tensor)
2. Strided conv + batch norm + activation that halves width/height: x = (Conv2D+BatchNormalization+Activation)(x)
3. 1x1 conv + batch norm to restore the channel count: x = (Conv2D+BatchNormalization)(x)
4. Conv + batch norm on the shortcut branch: shortcut = (Conv2D+BatchNormalization)(input_tensor)
5. Activation(shortcut + x)
rpn = get_rpn(base_layers, num_anchors)
inputs : base_layers[-1,38,38,1024], num_anchors=9
outputs :x_class, x_regr, base_layers
1. base_layers-->x=Conv2D(512, (3, 3))-->[-1,38,38,512]
2. x-->x_class = Conv2D(num_anchors, (1, 1))-->[-1,38,38,9]
3. x-->x_regr = Conv2D(num_anchors * 4, (1, 1))-->[-1,38,38,36]
4. x_class--> reshape-->[-1,1]
5. x_regr--> reshape-->[-1,4]
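The two reshapes in steps 4-5 just flatten the 38*38 grid of 9 anchors so that every row corresponds to one anchor; a minimal numpy sketch:

```python
import numpy as np

# Hypothetical RPN head outputs: batch 1, 38x38 grid, 9 anchors per cell.
x_class = np.zeros((1, 38, 38, 9))      # one objectness score per anchor
x_regr = np.zeros((1, 38, 38, 9 * 4))   # four box offsets per anchor

# Flatten the grid: one row per anchor (38*38*9 = 12996 anchors).
x_class = x_class.reshape(1, -1, 1)
x_regr = x_regr.reshape(1, -1, 4)
print(x_class.shape, x_regr.shape)  # (1, 12996, 1) (1, 12996, 4)
```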
classifier = get_classifier(feature_map_input, roi_input, config.num_rois, nb_classes = num_classes, trainable=True)
inputs : feature_map_input [38,38,1024], roi_input (num_rois = 32 proposals)
outputs : P_cls [-1,32,cls+1], P_regr [-1,32,cls*4] (cls = number of object classes, excluding background)
1. out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
1.1 Iterate over the proposals; compute each proposal's top-left corner (x, y) and its width/height (w, h)
1.2 Crop that region from the feature map and resize it to a uniform 14*14
1.3 out_roi_pool.shape[1,32,14,14,1024]
2. out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
2.1 out_roi_pool[1,32,14,14,1024]-->conv_block_td-->[1,32,7,7,2048]
2.2 identity_block_td*2-->[1,32,7,7,2048]
2.3 TimeDistributed(AveragePooling2D))-->[1,32,1,1,2048]
3.out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
3.1 [1,32,1,1,2048]-->[-1,32,cls+1]
4.out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)
4.1 [1,32,1,1,2048]-->[-1,32,cls*4]
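Steps 1.1-1.2 of the RoI pooling above can be sketched in numpy for a single proposal; the nearest-neighbour resize is a simplification of the image-resize op the real layer would use:

```python
import numpy as np

def roi_pool(feature_map, roi, pool_size=14):
    """Crop one RoI from the feature map and resize it to
    pool_size x pool_size by nearest-neighbour sampling (simplified)."""
    x, y, w, h = roi  # top-left corner and width/height, in feature-map cells
    crop = feature_map[y:y + h, x:x + w, :]
    rows = np.arange(pool_size) * h // pool_size
    cols = np.arange(pool_size) * w // pool_size
    return crop[rows][:, cols, :]

fm = np.random.rand(38, 38, 1024)
pooled = roi_pool(fm, (5, 7, 20, 11))
print(pooled.shape)  # (14, 14, 1024)
```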
(2) Prediction
2.1 frcnn = FRCNN()
2.2 r_image = frcnn.detect_image(image)
2.2.1 Preprocess the image and normalize it
2.2.2 preds = self.model_rpn.predict(photo)
2.2.3 anchors = get_anchors(self.get_img_output_length(width,height),width,height)
(1) anchors = generate_anchors()
(2) network_anchors = shift(shape,anchors)
(3) Scale the boxes into the [0,1] range
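Steps (1)-(3) of get_anchors can be sketched as follows; the anchor sizes, aspect ratios, stride 16, and input size 600 are assumptions consistent with the 600-input, 38*38-feature setup above:

```python
import numpy as np

def generate_anchors(sizes=(128, 256, 512), ratios=((1, 1), (1, 2), (2, 1))):
    """9 base anchors (x1, y1, x2, y2) centred at the origin (assumed values)."""
    anchors = []
    for rw, rh in ratios:
        for s in sizes:
            anchors.append([-s * rw / 2, -s * rh / 2, s * rw / 2, s * rh / 2])
    return np.array(anchors)

def shift(shape, anchors, stride=16):
    """Replicate the base anchors over every cell of the feature map."""
    cx = (np.arange(shape[1]) + 0.5) * stride
    cy = (np.arange(shape[0]) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    shifts = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    return (anchors[None, :, :] + shifts[:, None, :]).reshape(-1, 4)

# 38*38 cells * 9 anchors = 12996 anchors, scaled and clipped into [0, 1].
anchors = np.clip(shift((38, 38), generate_anchors()) / 600.0, 0.0, 1.0)
print(anchors.shape)  # (12996, 4)
```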
2.2.4 rpn_results = self.bbox_util.detection_out(preds,anchors,1,confidence_threshold=0.8)
(1) decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
a. Compute each anchor's centre and width/height from its top-left and bottom-right corners
b. Compute the predicted box centre and width/height from the RPN's predicted offsets and the anchors
c. Convert the predicted box centre/width/height back into top-left and bottom-right coordinates
d. Clip the predicted box coordinates to the [0, 1] range
(2) Filter the decoded boxes, keeping those whose score exceeds confidence_threshold
(3) Apply IoU-based non-maximum suppression
(4) Extract the labels, confidences, and boxes
(5) Sort by confidence
(6) Keep the keep_top_k boxes with the highest confidence
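The decode (steps a-d) and the NMS in (3) can be sketched in numpy; the offset-variance scaling that implementations often apply is omitted here for clarity:

```python
import numpy as np

def decode_boxes(mbox_loc, anchors):
    """Steps a-d above: anchor corners -> centre/size, apply predicted
    offsets (tx, ty, tw, th), convert back to corners, clip to [0, 1]."""
    aw = anchors[:, 2] - anchors[:, 0]
    ah = anchors[:, 3] - anchors[:, 1]
    acx = anchors[:, 0] + 0.5 * aw
    acy = anchors[:, 1] + 0.5 * ah
    cx = mbox_loc[:, 0] * aw + acx
    cy = mbox_loc[:, 1] * ah + acy
    w = aw * np.exp(mbox_loc[:, 2])
    h = ah * np.exp(mbox_loc[:, 3])
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return np.clip(boxes, 0.0, 1.0)

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: keep the best-scoring box, drop overlaps, repeat."""
    order = scores.argsort()[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou < iou_threshold]
    return keep
```

With zero offsets, decode_boxes returns the anchors unchanged, which is a quick sanity check on the centre/size round trip.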
2.2.5 Scale the decoded boxes onto the feature map, so values lie between 0 and 38
2.2.6 Feed the proposals to the classifier in batches, padding the final batch up to 32 proposals.
2.2.7 [P_cls, P_regr] = self.model_classifier.predict([base_layer,ROIs])
2.2.8 results = np.array(self.bbox_util.nms_for_out(np.array(labels),np.array(probs),np.array(boxes),self.num_classes-1,0.4))
(1) Iterate over each class
(2) Sort that class's boxes by object probability in descending order and take the index of the highest-probability box
(3) Compute the IoU between the other boxes and that box; keep the boxes whose IoU is below the threshold
(4) Repeat steps (2)-(3) on the remaining boxes until none are left
2.2.9 Map the boxes back to the original image
2.2.10 Draw the boxes
2.3 r_image.show()
(3) Training
3.1 model_rpn, model_classifier,model_all = get_model(config,NUM_CLASSES)
3.1.1 rpn = get_rpn(base_layers, num_anchors)
3.1.2 classifier = get_classifier(base_layers, roi_input, config.num_rois, nb_classes=num_classes, trainable=True)
3.1.3 model_all = Model([inputs, roi_input], rpn[:2] + classifier)
3.2 gen = Generator(bbox_util, lines, NUM_CLASSES, solid=True)
3.3 rpn_train = gen.generate()
3.3.1 Shuffle the data order; convert the ground-truth boxes' top-left and bottom-right coordinates into centre point and width/height
3.3.2 anchors = get_anchors(get_img_output_length(width,height),width,height)
(1) anchors = generate_anchors()
(2) network_anchors = shift(shape,anchors)
(3) Scale the anchors into the [0,1] range
3.3.3 assignment = self.bbox_util.assign_boxes(y,anchors)
(1) ingored_boxes = np.apply_along_axis(self.ignore_box, 1, boxes[:, :4])
(2) encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
3.3.4 Balance the numbers of positive and negative samples
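Step 3.3.4's balancing can be sketched like this; the 256-region budget and the cap of half positives are assumptions, and the label convention (1 = positive, 0 = negative, -1 = ignored) matches anchor_state in the RPN losses below:

```python
import numpy as np

def balance_samples(labels, num_regions=256):
    """Cap positives at num_regions/2, keep just enough negatives to fill
    the budget, and mark the surplus anchors as ignored (-1)."""
    labels = labels.copy()
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    if len(pos) > num_regions // 2:
        drop = np.random.choice(pos, len(pos) - num_regions // 2, replace=False)
        labels[drop] = -1
    keep_neg = num_regions - min(len(pos), num_regions // 2)
    if len(neg) > keep_neg:
        drop = np.random.choice(neg, len(neg) - keep_neg, replace=False)
        labels[drop] = -1
    return labels
```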
3.4 Training setup: TensorBoard, model_rpn.compile, etc.
3.5 loss1: 'regression': smooth_l1(); 'classification': cls_loss()
3.5.1 smooth_l1(): get y_true, y_pred, anchor_state
3.5.2 smooth_l1(): find the positive samples
3.5.3 smooth_l1(): compute the smooth L1 loss: f(x) = 0.5 * sigma^2 * x^2 if |x| < 1/sigma^2, else |x| - 0.5/sigma^2
3.5.4 cls_loss(): get y_true, y_pred, anchor_state
3.5.5 cls_loss(): find the anchors that contain an object and compute the positive-sample loss
3.5.6 cls_loss(): find the anchors that are actually background and compute the negative-sample loss
3.5.7 cls_loss(): count the positive and negative samples and divide each loss by its count
3.5.8 cls_loss(): loss = cls_loss_for_object + cls_loss_for_back
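The piecewise branch in 3.5.3 can be written directly in numpy (quadratic near zero, linear elsewhere):

```python
import numpy as np

def smooth_l1(y_true, y_pred, sigma=1.0):
    """Smooth L1 from 3.5.3: 0.5*sigma^2*x^2 if |x| < 1/sigma^2,
    else |x| - 0.5/sigma^2, where x is the regression error."""
    sigma2 = sigma ** 2
    x = np.abs(y_true - y_pred)
    return np.where(x < 1.0 / sigma2, 0.5 * sigma2 * x ** 2, x - 0.5 / sigma2)

# With sigma = 1 the switch happens at |x| = 1.
print(smooth_l1(np.array([0.0, 0.0]), np.array([0.5, 2.0])))  # [0.125 1.5]
```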
3.6 loss2: class_loss_cls; class_loss_regr(NUM_CLASSES-1)
3.6.1 class_loss_cls: K.mean(categorical_crossentropy(y_true[0, :, :], y_pred[0, :, :]))
3.6.2 class_loss_regr(NUM_CLASSES-1): smooth L1
3.7 X, Y, boxes = next(rpn_train)
3.8 loss_rpn = model_rpn.train_on_batch(X,Y)
3.9 P_rpn = model_rpn.predict_on_batch(X)
3.10 anchors = get_anchors(get_img_output_length(width,height),width,height)
3.11 results = bbox_util.detection_out(P_rpn,anchors,1, confidence_threshold=0)
3.12 R = results[0][:, 2:]
3.13 X2, Y1, Y2, IouS = calc_iou(R, config, boxes[0], width, height, NUM_CLASSES)
3.13.1 Map both the ground-truth boxes and the proposals onto the feature map
3.13.2 For each ground-truth box, compute its IoU with every proposal and select the proposals above the threshold
3.13.3 Keep the proposals matched to a ground-truth box
3.13.4 Set label = -1 for boxes below the IoU threshold; encode the matched ground-truth boxes
3.13.5 Obtain the class labels, box flags, and offset coordinates
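The pairwise IoU used in 3.13.2 can be sketched with numpy broadcasting, one row per ground-truth box and one column per proposal:

```python
import numpy as np

def iou_matrix(gt, proposals):
    """Pairwise IoU between ground-truth boxes and proposals, both in
    (x1, y1, x2, y2) corner format."""
    x1 = np.maximum(gt[:, None, 0], proposals[None, :, 0])
    y1 = np.maximum(gt[:, None, 1], proposals[None, :, 1])
    x2 = np.minimum(gt[:, None, 2], proposals[None, :, 2])
    y2 = np.minimum(gt[:, None, 3], proposals[None, :, 3])
    inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    area_pr = (proposals[:, 2] - proposals[:, 0]) * (proposals[:, 3] - proposals[:, 1])
    return inter / (area_gt[:, None] + area_pr[None, :] - inter)

gt = np.array([[0.0, 0.0, 2.0, 2.0]])
proposals = np.array([[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0], [4.0, 4.0, 5.0, 5.0]])
print(iou_matrix(gt, proposals))  # [[1.         0.14285714 0.        ]]
```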
3.14 loss_class = model_classifier.train_on_batch([X, X2[:, sel_samples, :]], [Y1[:, sel_samples, :], Y2[:, sel_samples, :]])
(4) Evaluation