R-CNN detection
1.背景
R-CNN是通过Caffe微调模型拟分类区域的检测器,详细内容可以参考项目网址和论文:
Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.
2.实现
2.1.Selective Search
Selective Search是R-CNN的区域拟提。selective_search_ijcv_with_pythonPython模块负责通过selective search MATLAB实现来提取拟提。先下载模块放置入caffe的根目录中(否则找不到模块中的函数)并重命名为selective_search_ijcv_with_python
$ cd ~/selectiveselective_search_ijcv_with_python
$ matlab
在MATLAB中运行demo.m编译必要的函数,并添加PYTHONPATH
$ export PYTHONPATH=/path/to/caffe/selectiveselective_search_ijcv_with_python
2.2.下载Caffe R-CNN ImageNet model
$ ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13
2.3.产生h5文件(包含DataFrame , selected windows, and their detection scores)
$ mkdir -p _temp #建立临时文件
$ echo /path/to/caffe/images/fish-bike.jpg > _temp/det_input.txt #图像路径
$ ./python/detect.py --crop_mode=selective_search --pretrained_model=./models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=./models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
2.4.数据处理
$ jupyter notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
输出:
(1570, 5)
prediction [-2.62247, -2.84579, -2.85122, -3.20838, -1.94...
ymin 79.846
xmin 9.62
ymax 246.31
xmax 339.624
Name: /home/dawin/caffe/examples/images/fish-bike.jpg, dtype: object
注释:1570个区域由selective search的R-CNN配置产生,区域的数量是基于每张图片的内容和大小变化的,因为selective search并不是尺度不变的。
接着,加载ILSVRC13检测类名录和建立一个预测数据表,要使用ILSVRC12辅佐数据,下载ILSVRC12辅佐数据:
$ sudo ./data/ilsvrc12/get_ilsvrc12_aux.sh
with open('./data/ilsvrc12/det_synset_words.txt') as f:
labels_df = pd.DataFrame([
{
'synset_id': l.strip().split(' ')[0],
'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
}
for l in f.readlines()
])
labels_df.sort_values('synset_id')
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
输出:
name
accordion -2.622472
airplane -2.845789
ant -2.851220
antelope -3.208377
apple -1.949951
armadillo -2.472936
artichoke -2.201685
axe -2.327404
baby bed -2.737925
backpack -2.176763
bagel -2.681062
balance beam -2.722539
banana -2.390628
band aid -1.598909
banjo -2.298197
baseball -2.311024
basketball -2.733875
bathing cap -2.800533
beaker -2.405005
bear -3.286877
bee -2.990223
bell pepper -2.543806
bench -1.843543
bicycle -2.775216
binder -2.523010
bird -3.543336
bookshelf -2.767997
bow tie -2.532697
bow -2.003349
bowl -2.873867
...
strawberry -2.559138
stretcher -2.048682
sunglasses -2.313832
swimming trunks -2.062061
swine -2.795033
syringe -2.287062
table -2.234039
tape player -3.081044
tennis ball -2.099031
tick -2.808466
tie -2.337841
tiger -2.615820
toaster -2.382439
traffic light -2.645303
train -3.441731
trombone -2.582362
trumpet -2.352854
turtle -2.360860
tv or monitor -2.761044
unicycle -2.218469
vacuum -1.907717
violin -2.757080
volleyball -2.723690
waffle iron -2.418540
washer -2.408994
water bottle -2.174899
watercraft -2.837426
whale -3.120339
wine bottle -2.772961
zebra -2.742914
Name: 0, dtype: float32
查看激活:
plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')
输出:
<matplotlib.text.Text at 0x114f15f90>
<matplotlib.figure.Figure at 0x114254b50>
在所有窗口取最大值和绘制顶级类:
max_s = predictions_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
输出:
name
person 1.835771
bicycle 0.866110
unicycle 0.057080
motorcycle -0.006123
banjo -0.028209
turtle -0.189833
electric fan -0.206787
cart -0.214236
lizard -0.393519
helmet -0.477942
dtype: float32
注释:实际上顶部的检测是人和自行车。选取好定位是一项进行中的工作,我们选取顶层得分的人和自行车检测。
# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()
# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')
# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.order(ascending=False)[:5])
# Show top detection in red, second-best top detection in blue.
im = plt.imread('examples/images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()
det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))
det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
输出:
Top detection:
name
person 1.835771
swimming trunks -1.150371
rubber eraser -1.231106
turtle -1.266038
plastic bag -1.303266
dtype: float32
Second-best detection:
name
bicycle 0.866110
unicycle -0.359140
scorpion -0.811621
lobster -0.982891
lamp -1.096809
dtype: float32
<matplotlib.patches.Rectangle at 0x7f02e8099d10>
让我们检测所有的 ‘自行车’ 和 用NMS使得他们摆脱重叠窗口:
def nms_detections(dets, overlap=0.3):
"""
Non-maximum suppression: Greedily select high-scoring detections and
skip detections that are significantly covered by a previously
selected detection.
This version is translated from Matlab code by Tomasz Malisiewicz,
who sped up Pedro Felzenszwalb's code.
Parameters
----------
dets: ndarray
each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
overlap: float
minimum overlap ratio (0.3 default)
Output
------
dets: ndarray
remaining after suppression.
"""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
ind = np.argsort(dets[:, 4])
w = x2 - x1
h = y2 - y1
area = (w * h).astype(float)
pick = []
while len(ind) > 0:
i = ind[-1]
pick.append(i)
ind = ind[:-1]
xx1 = np.maximum(x1[i], x1[ind])
yy1 = np.maximum(y1[i], y1[ind])
xx2 = np.minimum(x2[i], x2[ind])
yy2 = np.minimum(y2[i], y2[ind])
w = np.maximum(0., xx2 - xx1)
h = np.maximum(0., yy2 - yy1)
wh = w * h
o = wh / (area[i] + area[ind] - wh)
ind = ind[np.nonzero(o <= overlap)[0]]
return dets[pick, :]
scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)
显示顶层 3 个图像中NMS检测到的 ‘自行车’ ,并注意顶部的得分框 (红色) 和其余的框之间的差距。
plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
currentAxis.add_patch(
plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
fill=False, edgecolor=c, linewidth=5)
)
print 'scores:', nms_dets[:3, 4]
输出:
scores: [ 0.86610961 -0.70051467 -1.34796405]