R-FCN Shared Notes_Xingyu Chen

R-FCN

2017-04-11

Key excerpts from the paper

position-sensitive score maps -- to incorporate translation variance into FCN

a position-sensitive RoI pooling layer -- shepherds information from these score maps

two-stage object detection strategy -- (i) region proposal, and (ii) region classification

Region Proposal Network (RPN) [18] -- a fully convolutional architecture in itself

Others' Notes

Introduction

The "position-sensitive score map" technique brings the FCN into object detection.

With a 101-layer ResNet: mAP 83.6%, 170 ms/frame.

Motivation

Detection approach

1. A shared fully convolutional subnetwork whose computation is independent of RoIs ["fully convolutional" subnetwork independent of RoIs]

2. An RoI-wise subnetwork whose computation is not shared across RoIs [RoI-wise subnetwork]

Classification networks that are fully convolutional by design: ResNets, GoogLeNets

To give the FCN translation variance, the ResNet-based detection pipeline inserts the RoI pooling layer between convolutional layers, so that the layers after it operate on specific regions. In the authors' view this supplies the translation variance that object detection, unlike classification, requires.

Starting point: can we find a way to share network computation to the greatest extent possible, i.e., keep all the heavy layers shared so that, right after the RPN, the classification result can be read off directly?

Network architecture

Two parts, RPN + detection: the RPN proposes regions of interest, and R-FCN classifies each of them.

For each category there are 9 (3×3) position-sensitive score maps in total, numbered 1-9.

Each RoI corresponds to a region of the feature map. With k = 3 as above, that region is split evenly into 9 bins, labeled top-left, top-center, top-right, and so on. To form the per-RoI score map, take the top-left bin from map number 1, the top-center bin from map number 2, etc., and pool each bin (a pooling that selects across channels; the paper uses average pooling). This produces a 3×3 = 9 score map, and these 9 numbers are then voted on, using the average as the voting score, to judge whether the RoI is well localized (see the NumPy sketch after the pooling equation below).

Position-sensitive RoI pooling computes, for the $(i, j)$-th bin and class $c$:

$$r_c(i, j\,|\,\theta) = \frac{1}{n} \sum_{(x, y) \in \mathrm{bin}(i, j)} z_{i, j, c}(x + x_0,\ y + y_0\,|\,\theta)$$

$r_c(i, j|\theta)$: pooled response for class $c$ (one of $C+1$ classes) in bin $(i, j)$, where $0 \le i, j \le k-1$;

$z_{i, j, c}$: the score map dedicated to class $c$ and bin $(i, j)$;

$(x_0, y_0)$: the top-left corner of the proposal;

$n$: the number of pixels in the bin;

$\theta$: all learnable parameters of the network.

Classification: for each class, average-pool (i.e., vote over) the scores of all $k^2$ relative spatial positions of the RoI, then apply a softmax over the $C+1$ categories.
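To make the bin-and-channel bookkeeping concrete, here is a minimal NumPy sketch of position-sensitive RoI pooling with average voting. It is an illustration only, not the actual Caffe PSROIPooling layer: the function name, the bin-major/class-minor channel layout, and the toy shapes are all assumptions.

import numpy as np

def psroi_pool(score_maps, roi, k=3, num_classes=5):
    """Position-sensitive RoI pooling + average voting (NumPy sketch).

    score_maps: (k*k*num_classes, H, W) position-sensitive score maps
    roi:        (x0, y0, x1, y1) in feature-map coordinates
    returns:    (num_classes,) per-class scores for this RoI
    """
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / float(k)
    bin_h = (y1 - y0) / float(k)
    scores = np.zeros((num_classes, k, k))
    for c in range(num_classes):
        for i in range(k):          # bin row (top, middle, bottom)
            for j in range(k):      # bin column (left, center, right)
                ch = (i * k + j) * num_classes + c  # channel for class c, bin (i, j)
                ys0 = int(y0 + i * bin_h)
                xs0 = int(x0 + j * bin_w)
                ys1 = max(int(y0 + (i + 1) * bin_h), ys0 + 1)
                xs1 = max(int(x0 + (j + 1) * bin_w), xs0 + 1)
                # average-pool this bin from its dedicated channel
                scores[c, i, j] = score_maps[ch, ys0:ys1, xs0:xs1].mean()
    return scores.mean(axis=(1, 2))  # vote: average over the k*k bins

# toy usage: k=3 bins, C+1=5 classes, a 40x60 feature map
maps = np.random.rand(3 * 3 * 5, 40, 60)
print psroi_pool(maps, (10, 5, 31, 26))

A softmax over the returned vector then gives the per-class probabilities used for classification.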

Localization

Alongside the $k^2(C+1)$-channel convolutional layer, a sibling $4k^2$-channel convolutional layer is added for bounding-box regression. The $4k^2$ vector produced for each RoI is aggregated by average voting into a single 4-d vector $(t_x, t_y, t_w, t_h)$, using the Faster R-CNN parameterization.
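For reference, that parameterization (standard since R-CNN and reused by Fast/Faster R-CNN) relates a box $(x, y, w, h)$ to its proposal $(x_p, y_p, w_p, h_p)$ by:

$$t_x = (x - x_p)/w_p, \quad t_y = (y - y_p)/h_p, \quad t_w = \log(w/w_p), \quad t_h = \log(h/h_p)$$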

Training

The loss for each RoI is the sum of a cross-entropy classification loss and a bounding-box regression loss; $c^* = 0$ means the RoI's ground-truth label is background.
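In the paper's notation, with $s_{c^*}$ the softmax score of the ground-truth class $c^*$:

$$L(s, t_{x,y,w,h}) = L_{\mathrm{cls}}(s_{c^*}) + \lambda \, [c^* > 0] \, L_{\mathrm{reg}}(t, t^*)$$

where $L_{\mathrm{cls}}(s_{c^*}) = -\log(s_{c^*})$ is the cross-entropy loss, $L_{\mathrm{reg}}$ is the smooth-$L_1$ loss on the box targets $t^*$, $[\cdot]$ equals 1 if $c^* > 0$ and 0 otherwise, and the paper sets $\lambda = 1$.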

References

[6, 7] -- R-CNN

[18] -- Faster R-CNN, Region Proposal Network (RPN)

[9] -- ResNet

[24, 26] -- GoogLeNet

[15] -- FCN

[17, 14] -- region-free detection methods

py-R-FCN: reading the code and training on an underwater dataset

Train

time python2 ./tools/train_net.py --gpu ${GPU_ID} \
    --solver models/${PT_DIR}/${NET}/rfcn_end2end/solver_ohem.prototxt \
    --weights data/imagenet_models/${NET}-model.caffemodel \
    --imdb ${TRAIN_IMDB} \
    --iters ${ITERS} \
    --cfg experiments/cfgs/rfcn_end2end_ohem.yml \
    ${EXTRA_ARGS}
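For the underwater setup described below, the shell variables would be filled in roughly as follows (illustrative values, not taken from the original script):

GPU_ID=0
PT_DIR=underwater
NET=ResNet-50
TRAIN_IMDB=underwater_20170522_trainval  # matches the factory registration below
ITERS=110000                             # placeholder iteration count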

Data preparation

Modify: add a dataset description file, lib/datasets/underwater.py.
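One low-effort way to write underwater.py is to subclass (or copy) pascal_voc and swap in the new class list. A minimal sketch, assuming the underwater annotations follow the VOC directory/XML layout; the class names here are placeholders:

# lib/datasets/underwater.py -- minimal sketch, modeled on pascal_voc.py
from datasets.pascal_voc import pascal_voc

class underwater(pascal_voc):
    def __init__(self, image_set, year, devkit_path=None):
        pascal_voc.__init__(self, image_set, year, devkit_path)
        # 4 foreground classes + background = 5, matching num_classes in the prototxts
        self._classes = ('__background__',  # always index 0
                         'class1', 'class2', 'class3', 'class4')  # placeholder labels
        self._class_to_ind = dict(zip(self._classes, xrange(self.num_classes)))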

Modify: lib/datasets/factory.py

for year in ['20170522']:
    for split in ['train', 'val', 'trainval', 'test']:
        name = 'underwater_{}_{}'.format(year, split)
        __sets[name] = (lambda split=split, year=year: underwater(split, year))

Reading: how the imdb is generated

# lib/datasets/factory.py
def get_imdb(name):
    """Get an imdb (image database) by name."""
    if not __sets.has_key(name):
        raise KeyError('Unknown dataset: {}'.format(name))
    return __sets[name]()

__sets is a dictionary whose keys are dataset names and whose values are lambda expressions (in effect, function pointers). Each lambda actually calls the pascal_voc constructor with split and year as arguments (note: in train_net the name is voc_2007_trainval, so split and year are trainval and 2007 respectively). pascal_voc is a class, so this invokes its constructor and returns an instance of it; the imdb is therefore simply a pascal_voc instance.
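The underwater dataset goes through the same mechanism; with the factory registration above, the following returns an underwater instance:

imdb = get_imdb('underwater_20170522_trainval')  # calls underwater('trainval', '20170522')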

tools/train_net.py

Reading: the key step is obtaining the imdb and roidb data.

def get_roidb(imdb_name):
    imdb = get_imdb(imdb_name)
    print 'Loaded dataset `{:s}` for training'.format(imdb.name)
    imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD)
    print 'Set proposal method: {:s}'.format(cfg.TRAIN.PROPOSAL_METHOD)
    roidb = get_training_roidb(imdb)
    return roidb

# The call above attaches roidb data to the imdb; the function below shows how:

def set_proposal_method(self, method):
    method = eval('self.' + method + '_roidb')
    self.roidb_handler = method

# method is passed in as the string 'gt', so method = eval('self.gt_roidb')

def get_training_roidb(imdb):
    """Returns a roidb (Region of Interest database) for use in training."""
    if cfg.TRAIN.USE_FLIPPED:
        print 'Appending horizontally-flipped training examples...'
        imdb.append_flipped_images()
        print 'done'
    print 'Preparing training data...'
    rdl_roidb.prepare_roidb(imdb)
    print 'done'
    return imdb.roidb

# get_training_roidb(imdb) grows the roidb from 5011 entries to 10022 via horizontal flipping (the counts are for VOC 2007 trainval), and doubles the index list accordingly

def append_flipped_images(self):
    num_images = self.num_images
    widths = self._get_widths()
    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        boxes[:, 0] = widths[i] - oldx2 - 1
        boxes[:, 2] = widths[i] - oldx1 - 1  # both xmin and xmax of the new box must be updated
        assert (boxes[:, 2] >= boxes[:, 0]).all()
        entry = {'boxes' : boxes,
                 'gt_overlaps' : self.roidb[i]['gt_overlaps'],
                 'gt_classes' : self.roidb[i]['gt_classes'],
                 'flipped' : True}
        self.roidb.append(entry)  # append the new, flipped entry to the roidb
    self._image_index = self._image_index * 2  # duplicate and concatenate the index list

# Looking at append_flipped_images: roidb is a member variable of imdb, a list with one element per image; each element is a dictionary with five keys: the boxes, each box's class (gt_classes), the flipped flag, the overlap information gt_overlaps, and seg_areas. The function mirrors each box horizontally and appends the result, so the roidb grows from 5011 elements to 5011*2 = 10022;
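A quick worked example of the flip arithmetic (0-based, inclusive pixel coordinates; the numbers are made up):

width, oldx1, oldx2 = 500, 10, 100
x1_new = width - oldx2 - 1   # 500 - 100 - 1 = 399
x2_new = width - oldx1 - 1   # 500 - 10  - 1 = 489
assert x2_new >= x1_new      # the mirrored box is still well-formed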

def prepare_roidb(imdb):
    """Enrich the imdb's roidb by adding some derived quantities that
    are useful for training. This function precomputes the maximum
    overlap, taken over ground-truth boxes, between each ROI and
    each ground-truth box. The class with maximum overlap is also
    recorded.
    """
    sizes = [PIL.Image.open(imdb.image_path_at(i)).size
             for i in xrange(imdb.num_images)]
    roidb = imdb.roidb
    for i in xrange(len(imdb.image_index)):
        roidb[i]['image'] = imdb.image_path_at(i)
        roidb[i]['width'] = sizes[i][0]
        roidb[i]['height'] = sizes[i][1]
        # need gt_overlaps as a dense array for argmax
        gt_overlaps = roidb[i]['gt_overlaps'].toarray()
        # max overlap with gt over classes (columns)
        max_overlaps = gt_overlaps.max(axis=1)
        # gt class that had the max overlap
        max_classes = gt_overlaps.argmax(axis=1)
        roidb[i]['max_classes'] = max_classes
        roidb[i]['max_overlaps'] = max_overlaps
        # sanity checks
        # max overlap of 0 => class should be zero (background)
        zero_inds = np.where(max_overlaps == 0)[0]
        assert all(max_classes[zero_inds] == 0)
        # max overlap > 0 => class should not be zero (must be a fg class)
        nonzero_inds = np.where(max_overlaps > 0)[0]
        assert all(max_classes[nonzero_inds] != 0)
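For a ground-truth roidb (the 'gt' proposal method above), gt_overlaps is a sparse (num_boxes, num_classes) matrix with a 1.0 in each box's own class column, so max_overlaps comes out all ones and max_classes recovers the labels. A small illustration (the box classes are made up):

import numpy as np
import scipy.sparse

# two gt boxes, of classes 2 and 4, in a 5-class (incl. background) setup
gt_overlaps = scipy.sparse.csr_matrix(
    np.array([[0., 0., 1., 0., 0.],
              [0., 0., 0., 0., 1.]], dtype=np.float32))
dense = gt_overlaps.toarray()
print dense.max(axis=1)     # [ 1.  1.] -> max_overlaps
print dense.argmax(axis=1)  # [2 4]    -> max_classes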

The above merely prepares the roidb metadata. The actual data processing starts in the forward(self, bottom, top) method of class RoIDataLayer(caffe.Layer), found in faster-rcnn-root/lib/roi_data_layer/layer.py.

There, blobs = self._get_next_minibatch() produces the data blobs we need. That call goes to get_minibatch(roidb, num_classes) in minibatch.py, which in turn calls _get_image_blob(roidb, scale_inds), and in that function we finally find cv2.imread: the point where images are actually read from disk into memory.

def _get_image_blob(roidb, scale_inds):
    """Builds an input blob from the images in the roidb at the specified
    scales.
    """
    num_images = len(roidb)
    processed_ims = []
    im_scales = []
    for i in xrange(num_images):
        im = cv2.imread(roidb[i]['image'])  # the image is finally read here
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]
        target_size = cfg.TRAIN.SCALES[scale_inds[i]]
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                        cfg.TRAIN.MAX_SIZE)
        im_scales.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, im_scales
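For completeness, prep_im_for_blob (in lib/utils/blob.py) mean-subtracts and rescales the image so its short side reaches target_size without the long side exceeding max_size; it looks roughly like this:

def prep_im_for_blob(im, pixel_means, target_size, max_size):
    """Mean-subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(target_size) / float(im_size_min)
    # cap the scale so the longer side does not exceed max_size
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
                    interpolation=cv2.INTER_LINEAR)
    return im, im_scale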

lib/fast_rcnn/train.py

def train_net(solver_prototxt, roidb, output_dir,
              pretrained_model=None, max_iters=40000):
    """Train a Fast R-CNN network."""
    roidb = filter_roidb(roidb)  # filter out invalid roidb entries (no bbox)
    sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                       pretrained_model=pretrained_model)  # create the solver
    print 'Solving...'
    model_paths = sw.train_model(max_iters)  # start training
    print 'done solving'
    return model_paths

Notes:

Check whether the tag in the annotation files is spelled difficult or Difficult.

The assert (boxes[:, 2] >= boxes[:, 0]).all() failure during training

(1) Analysis: a bbox coordinate (x, y) may be 0, or the annotated region may overflow the image boundary, while Faster R-CNN subtracts 1 from Xmin, Ymin, Xmax, Ymax; if Xmin is 0, subtracting 1 turns it into 65535.
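The underflow is easy to reproduce; the coordinates are stored as np.uint16, which wraps around instead of going negative:

import numpy as np
x = np.array([0], dtype=np.uint16)
print x - 1   # [65535] -- 0 - 1 wraps to 2**16 - 1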

(2) Fixes

a. Modify the append_flipped_images() function in lib/datasets/imdb.py

Data cleanup: below the line boxes[:, 2] = widths[i] - oldx1 - 1, add:

for b in range(len(boxes)):
    if boxes[b][2] < boxes[b][0]:
        boxes[b][0] = 0

b. Modify the _load_pascal_annotation() function in lib/datasets/pascal_voc.py

Remove the minus-one applied to Xmin, Ymin, Xmax, Ymax, so the loads become:
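In py-faster-rcnn the relevant lines of _load_pascal_annotation look like the following; dropping the trailing - 1 is the whole change:

# before
x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
# after: no off-by-one, so a 0 coordinate can no longer underflow
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)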

c. Optional: if a and b solve the problem, this step is unnecessary. Modify lib/fast_rcnn/config.py so that images are not flipped, changing it to:

# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = False

Network definitions

Modify: create models/underwater/ResNet-50 (copied from pascal_voc)

Modify: ResNet-50/class-aware/train_ohem.prototxt

layer {
  name: 'input-data'
  ...
  param_str: "'num_classes': 5"
}
layer {
  name: 'roi-data'
  ...
  param_str: "'num_classes': 5"
}
layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  ...
  num_output: 245  # 5*(7^2), cls_num*(score_maps_size^2)
}
layer {
  bottom: "conv_new_1"
  top: "rfcn_bbox"
  name: "rfcn_bbox"
  ...
  num_output: 980  # 4*5*(7^2), 4*cls_num*(score_maps_size^2)
}
layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  ...
  output_dim: 5  # cls_num
}
layer {
  bottom: "rfcn_bbox"
  bottom: "rois"
  ...
  output_dim: 20  # 4*cls_num
}

Modify: ResNet-50/class-aware/test.prototxt

layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  ...
  num_output: 245  # 5*(7^2), cls_num*(score_maps_size^2)
}
layer {
  bottom: "conv_new_1"
  top: "rfcn_bbox"
  name: "rfcn_bbox"
  ...
  num_output: 980  # 4*5*(7^2), 4*cls_num*(score_maps_size^2)
}
layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  ...
  output_dim: 5  # cls_num
}
layer {
  bottom: "rfcn_bbox"
  bottom: "rois"
  ...
  output_dim: 20  # 4*cls_num
}
layer {
  name: "cls_prob_reshape"
  ...
  dim: 5  # cls_num
}
layer {
  name: "bbox_pred_reshape"
  ...
  dim: 20  # 4*cls_num
}

Modify: ResNet-50/train_agnostic.prototxt

layer {
  name: 'input-data'
  ...
  param_str: "'num_classes': 5"  # cls_num
}
layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  type: "Convolution"
  convolution_param {
    num_output: 245  # cls_num*(score_maps_size^2)
    ...
  }
}
layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  ...
  output_dim: 5  # cls_num
}

Modify: ResNet-50/train_agnostic_ohem.prototxt

Same as above.

Modify: ResNet-50/test_agnostic.prototxt

layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  ...
  num_output: 245  # cls_num*(score_maps_size^2)
}
layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  ...
  output_dim: 5  # cls_num
}
layer {
  name: "cls_prob_reshape"
  ...
  dim: 5  # cls_num
}

Modify: ResNet-50/solver_ohem.prototxt and ResNet-50/solver.prototxt

train_net: "models/underwater/ResNet-50/rfcn_end2end/train_agnostic[_ohem].prototxt"

Configuration

Modify: lib/fast_rcnn/config.py, changing the model directory:

# Model directory
__C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'underwater'))

Testing

Copy the trained model (the final caffemodel in RFCN_ROOT/output/rfcn_end2end_ohem/voc_0712_trainval) to RFCN_ROOT/data/rfcn_models. Then open RFCN_ROOT/tools/demo_rfcn.py, change CLASSES to your labels, NETS to your model, and im_names to your test images (placed under data/demo). Finally:

cd $RFCN_ROOT
./tools/demo_rfcn.py --net ResNet-50
