1.Faster RCNN的训练过程
4-Step Alternating Training. In this paper, we adopt a pragmatic 4-step training algorithm to learn shared features via alternating optimization. In the first step, we train the RPN as described in Section 3.1.3. This network is initialized with an ImageNet-pre-trained model and fine-tuned end-to-end for the region proposal task. In the second step, we train a separate detection network by Fast R-CNN using the proposals generated by the step-1 RPN. This detection network is also initialized by the ImageNet-pre-trained model. At this point the two networks do not share convolutional layers. In the third step, we use the detector network to initialize RPN training, but we
fix the shared convolutional layers and only fine-tune the layers unique to RPN. Now the two networks share convolutional layers. Finally, keeping the shared convolutional layers fixed, we fine-tune the unique layers of Fast R-CNN. As such, both networks share the same convolutional layers and form a unified network. A similar alternating training can be run for more iterations, but we have observed negligible improvements.
Faster R-CNN的训练,是在已经训练好的model的基础上继续进行训练。实际中训练过程分为4个步骤:
- 在已经训练好的model上,训练RPN网络,对应stage1_rpn_train.pt
利用步骤1中训练好的RPN网络,收集proposals,对应rpn_test.pt - 第一次训练Fast RCNN网络,对应stage1_fast_rcnn_train.pt
- 第二训练RPN网络,对应stage2_rpn_train.pt
再次利用步骤3中训练好的RPN网络,收集proposals,对应rpn_test.pt - 第二次训练Fast RCNN网络,对应stage2_fast_rcnn_train.pt
上面的四个步骤每个步骤都对应着一个.pt文件,每个pt文件都有所有的卷积层。这里应用一张别人画的流程图:
还有一种是end-to-end训练,网络只有一种,写下train.prototxt里面(caffe)
2.Faster RCNN的网络搭建
来看一下pytorch中faster rcnn的网络结构是怎么搭建的。
下面这段代码用来初始化网络结构,在trainval_net.py里面.
# initilize the network here.
if args.net == 'vgg16':
fasterRCNN = vgg16(imdb.classes, pretrained=True, class_agnostic=args.class_agnostic)
elif args.net == 'res101':
fasterRCNN = resnet(imdb.classes, 101, pretrained=True, class_agnostic=args.class_agnostic)
elif args.net == 'res50':
fasterRCNN = resnet(imdb.classes, 50, pretrained=True, class_agnostic=args.class_agnostic)
elif args.net == 'res152':
fasterRCNN = resnet(imdb.classes, 152, pretrained=True, class_agnostic=args.class_agnostic)
else:
print("network is not defined")
pdb.set_trace()
fasterRCNN.create_architecture()
以vgg为例,首先初始化vgg16
class vgg16(_fasterRCNN):
def __init__(self, classes, pretrained=False, class_agnostic=False):
self.model_path = 'data/pretrained_model/vgg16_caffe.pth'
self.dout_base_model = 512
self.pretrained = pretrained
self.class_agnostic = class_agnostic
_fasterRCNN.__init__(self, classes, class_agnostic)
vgg16是从fasterRCNN类中继承的,同时调用了父类的构造函数
class _fasterRCNN(nn.Module):
""" faster RCNN """
def __init__(self, classes, class_agnostic):
super(_fasterRCNN, self).__init__()
self.classes = classes
self.n_classes = len(classes)
self.class_agnostic = class_agnostic
# loss
self.RCNN_loss_cls = 0
self.RCNN_loss_bbox = 0
# define rpn
self.RCNN_rpn = _RPN(self.dout_base_model)
self.RCNN_proposal_target = _ProposalTargetLayer(self.n_classes)
self.RCNN_roi_pool = _RoIPooling(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1.0/16.0)
self.RCNN_roi_align = RoIAlignAvg(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1.0/16.0)
self.grid_size = cfg.POOLING_SIZE * 2 if cfg.CROP_RESIZE_WITH_MAX_POOL else cfg.POOLING_SIZE
self.RCNN_roi_crop = _RoICrop()
父类的构造函数定义了rpn,ROIPooling层。
之后执行fasterRCNN.create_architecture(),这个函数在faster_rcnn.py中
def create_architecture(self):
self._init_modules()
self._init_weights()
而_init_modules在vgg16.py里面这段代码定义了前面的卷积层和后面的全连接层,也就是下图的1,2(3是rpn,在之前定义过)
def _init_modules(self):
vgg = models.vgg16()
if self.pretrained:
print("Loading pretrained weights from %s" %(self.model_path))
state_dict = torch.load(self.model_path)
vgg.load_state_dict({k:v for k,v in state_dict.items() if k in vgg.state_dict()})
vgg.classifier = nn.Sequential(*list(vgg.classifier._modules.values())[:-1])
# not using the last maxpool layer
self.RCNN_base = nn.Sequential(*list(vgg.features._modules.values())[:-1])
# Fix the layers before conv3:
for layer in range(10):
for p in self.RCNN_base[layer].parameters(): p.requires_grad = False
# self.RCNN_base = _RCNN_base(vgg.features, self.classes, self.dout_base_model)
self.RCNN_top = vgg.classifier
# not using the last maxpool layer
self.RCNN_cls_score = nn.Linear(4096, self.n_classes)
if self.class_agnostic:
self.RCNN_bbox_pred = nn.Linear(4096, 4)
else:
self.RCNN_bbox_pred = nn.Linear(4096, 4 * self.n_classes)
随后在定义一下学习率、优化方法、是否使用之前的节点、是否用多GPU、是否用tensorboard等等,然后就可以开始训练了。
for epoch in range(args.start_epoch, args.max_epochs + 1):
# setting to train mode
fasterRCNN.train()
loss_temp = 0
start = time.time()
if epoch % (args.lr_decay_step + 1) == 0:
adjust_learning_rate(optimizer, args.lr_decay_gamma)
lr *= args.lr_decay_gamma
data_iter = iter(dataloader)
for step in range(iters_per_epoch):
data = next(data_iter)
im_data.data.resize_(data[0].size()).copy_(data[0])
im_info.data.resize_(data[1].size()).copy_(data[1])
gt_boxes.data.resize_(data[2].size()).copy_(data[2])
num_boxes.data.resize_(data[3].size()).copy_(data[3])
fasterRCNN.zero_grad()
rois, cls_prob, bbox_pred, \
rpn_loss_cls, rpn_loss_box, \
RCNN_loss_cls, RCNN_loss_bbox, \
rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
loss = rpn_loss_cls.mean() + rpn_loss_box.mean() \
+ RCNN_loss_cls.mean() + RCNN_loss_bbox.mean()
loss_temp += loss.item()
# backward
optimizer.zero_grad()
loss.backward()
if args.net == "vgg16":
clip_gradient(fasterRCNN, 10.)
optimizer.step()
每一个epoch计算4个loss,然后用loss.backward反向传播,optimizer进行优化。和之前写过的mnist的训练很相似。
# begin to training
for epoch in range(EPOCH):
for step, (b_x, b_y) in enumerate(train_loder):
output = net(b_x)[0]
# print(output)
loss = loss_func(output, b_y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
但是论文里写faster RCNN的训练分成了四步,这里好像是参考的py-faster-rcnn中end-to-end网络结构。