I. Introduction
1. Sliding-Window Detector
1.1 What Is a Sliding Window
A brute-force approach to object detection: slide a window over the image from left to right and top to bottom (window size and stride are fixed in advance), and run a classifier on each window to identify objects.
The image patch inside each window is sent to a classifier. Many classifiers only accept inputs of a fixed size, so the patches must first be warped to that size. This does not hurt classification accuracy much, because the classifier is trained to handle warped images.
Warping an image patch to a fixed size:
The warped patch is fed into a CNN, which extracts a 4096-dimensional feature vector; an SVM then identifies the class, and a separate linear regressor refines the bounding box.
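A minimal sketch of the warping step, assuming a classifier input size of 224×224 (the size and the random patch below are illustrative choices, not taken from the text); skimage.transform.resize performs the fixed-size warp:

import numpy as np
import skimage.transform

# a hypothetical patch cropped out by one window (height x width x 3)
patch = np.random.rand(57, 93, 3)

# stretch/squash the patch to the classifier's fixed input size
fixed = skimage.transform.resize(patch, (224, 224), anti_aliasing=True)
print(fixed.shape)  # (224, 224, 3)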
Below is pseudocode: we create many windows to detect different objects at different positions. An obvious way to improve performance is to reduce the number of windows.
for window in windows:
    patch = get_patch(image, window)
    results = detector(patch)
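To make this runnable, here is a minimal sketch; gen_windows, get_patch, and the stub detector are hypothetical helpers standing in for the real CNN + SVM pipeline:

import numpy as np

def gen_windows(img_h, img_w, win=64, stride=32):
    # yield (x, y, w, h) windows left-to-right, top-to-bottom
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)

def get_patch(image, window):
    x, y, w, h = window
    return image[y:y + h, x:x + w]

def detector(patch):
    return patch.mean()  # placeholder score, not a real classifier

image = np.random.rand(256, 256, 3)  # dummy image
for window in gen_windows(*image.shape[:2]):
    patch = get_patch(image, window)
    results = detector(patch)

Enlarging the stride or the window size immediately cuts the number of windows, which is exactly the performance lever mentioned above.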
If time permits, I will clean up and upload the real code later. Stay tuned…
2. Selective Search
2.1 What Is Selective Search
Instead of the brute-force approach, a region proposal method is used to create the regions of interest (ROIs) for object detection.
In short: find the regions of an image where objects are likely to exist.
2.2 The Idea (Strategy) Behind Selective Search
- Use an image segmentation method to split the image into small regions (roughly 1k-2k of them);
- Among the current regions, merge the two adjacent regions most likely to belong together according to the merge rules; repeat until the whole image has been merged into a single region;
- Output every region that existed at any point in the process; these are the candidate regions (region proposals).
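A toy 1-D sketch of that greedy merge loop (intervals stand in for image regions, and "most similar" is simply the pair with the smallest combined span; the real 2-D similarity is implemented in Section II):

# toy regions: 1-D intervals (start, end)
regions = [(0, 10), (10, 15), (15, 40), (40, 50)]
proposals = list(regions)  # every region ever seen is a candidate
while len(regions) > 1:
    # pick the most similar adjacent pair (here: smallest merged span)
    i = min(range(len(regions) - 1),
            key=lambda k: regions[k + 1][1] - regions[k][0])
    merged = (regions[i][0], regions[i + 1][1])
    regions[i:i + 2] = [merged]  # replace the pair with its merge
    proposals.append(merged)     # the merge is also a candidate
print(proposals)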
2.3 Features of Selective Search
- Captures all scales (Capture All Scales):
traditional exhaustive search adapts to objects of different scales by varying the window size. Selective search cannot sidestep this problem either; it solves it effectively by combining image segmentation with a hierarchical algorithm.
- Diversification:
no single feature can localize objects on its own, so several strategies (colour, texture, size, and so on) are used when merging the segmented regions.
- Fast.
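In the implementation below, these strategies are combined by simply summing four similarity measures for a pair of neighbouring regions r1 and r2:

s(r1, r2) = s_colour(r1, r2) + s_texture(r1, r2) + s_size(r1, r2) + s_fill(r1, r2)

(the paper allows each term to be weighted; the code in Section II, in _calc_sim, uses equal weights of 1).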
3. Sliding Window vs. Selective Search
- Sliding window:
the complexity is too high and it produces many redundant candidate regions; since it cannot cover every scale, the object locations it yields are also imprecise. It is not practical in real applications.
- Selective search:
effectively removes redundant candidate regions, which greatly reduces the amount of computation.
II. Python Implementation
selectivesearch.py
# -*- coding: utf-8 -*-
import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy
# "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
#
# - Modified version with LBP extractor for texture vectorization
def _generate_segments(im_orig, scale, sigma, min_size):
"""
segment smallest regions by the algorithm of Felzenswalb and
Huttenlocher
"""
    # run Felzenszwalb segmentation; each pixel receives an integer region label
im_mask = skimage.segmentation.felzenszwalb(
skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
min_size=min_size)
# merge mask channel to the image as a 4th channel
im_orig = numpy.append(
im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
im_orig[:, :, 3] = im_mask
return im_orig
def _sim_colour(r1, r2):
"""
calculate the sum of histogram intersection of colour
"""
return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
def _sim_texture(r1, r2):
"""
calculate the sum of histogram intersection of texture
"""
return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
def _sim_size(r1, r2, imsize):
"""
calculate the size similarity over the image
"""
return 1.0 - (r1["size"] + r2["size"]) / imsize
def _sim_fill(r1, r2, imsize):
"""
calculate the fill similarity over the image
"""
bbsize = (
(max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
* (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
)
return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
def _calc_sim(r1, r2, imsize):
return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
+ _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))
def _calc_colour_hist(img):
"""
calculate colour histogram for each region
the size of output histogram will be BINS * COLOUR_CHANNELS(3)
    number of bins is 25, the same as in [uijlings_ijcv2013_draft.pdf]
extract HSV
"""
BINS = 25
hist = numpy.array([])
for colour_channel in (0, 1, 2):
# extracting one colour channel
c = img[:, colour_channel]
        # calculate histogram for each colour channel and join to the result
        # (rgb2hsv output lies in [0, 1], hence the histogram range)
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(c, BINS, (0.0, 1.0))[0]])
# L1 normalize
hist = hist / len(img)
return hist
def _calc_texture_gradient(img):
"""
calculate texture gradient for entire image
The original SelectiveSearch algorithm proposed Gaussian derivative
for 8 orientations, but we use LBP instead.
output will be [height(*)][width(*)]
"""
ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
for colour_channel in (0, 1, 2):
ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
img[:, :, colour_channel], 8, 1.0)
return ret
def _calc_texture_hist(img):
"""
calculate texture histogram for each region
calculate the histogram of gradient for each colours
the size of output histogram will be
BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
"""
BINS = 10
hist = numpy.array([])
for colour_channel in (0, 1, 2):
# mask by the colour channel
fd = img[:, colour_channel]
        # calculate histogram of the LBP codes and join to the result
        # (default-method LBP codes for P=8 lie in [0, 255], hence the range)
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(fd, BINS, (0.0, 255.0))[0]])
# L1 Normalize
hist = hist / len(img)
return hist
def _extract_regions(img):
R = {}
# get hsv image
hsv = skimage.color.rgb2hsv(img[:, :, :3])
# pass 1: count pixel positions
for y, i in enumerate(img):
for x, (r, g, b, l) in enumerate(i):
# initialize a new region
if l not in R:
R[l] = {
"min_x": 0xffff, "min_y": 0xffff,
"max_x": 0, "max_y": 0, "labels": [l]}
# bounding box
if R[l]["min_x"] > x:
R[l]["min_x"] = x
if R[l]["min_y"] > y:
R[l]["min_y"] = y
if R[l]["max_x"] < x:
R[l]["max_x"] = x
if R[l]["max_y"] < y:
R[l]["max_y"] = y
# pass 2: calculate texture gradient
tex_grad = _calc_texture_gradient(img)
# pass 3: calculate colour histogram of each region
for k, v in R.items():
# colour histogram
masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
R[k]["size"] = len(masked_pixels / 4)
R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
# texture histogram
R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
return R
def _extract_neighbours(regions):
def intersect(a, b):
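        # bounding boxes overlap when any corner of b lies strictly inside a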
if (a["min_x"] < b["min_x"] < a["max_x"]
and a["min_y"] < b["min_y"] < a["max_y"]) or (
a["min_x"] < b["max_x"] < a["max_x"]
and a["min_y"] < b["max_y"] < a["max_y"]) or (
a["min_x"] < b["min_x"] < a["max_x"]
and a["min_y"] < b["max_y"] < a["max_y"]) or (
a["min_x"] < b["max_x"] < a["max_x"]
and a["min_y"] < b["min_y"] < a["max_y"]):
return True
return False
    # materialize the region items so they can be indexed and sliced
    R = list(regions.items())
neighbours = []
for cur, a in enumerate(R[:-1]):
for b in R[cur + 1:]:
if intersect(a[1], b[1]):
neighbours.append((a, b))
return neighbours
def _merge_regions(r1, r2):
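    # the merged bounding box is the union of the two boxes; the histograms
    # are size-weighted averages, so they stay L1-normalized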
new_size = r1["size"] + r2["size"]
rt = {
"min_x": min(r1["min_x"], r2["min_x"]),
"min_y": min(r1["min_y"], r2["min_y"]),
"max_x": max(r1["max_x"], r2["max_x"]),
"max_y": max(r1["max_y"], r2["max_y"]),
"size": new_size,
"hist_c": (
r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
"hist_t": (
r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
"labels": r1["labels"] + r2["labels"]
}
return rt
def selective_search(
im_orig, scale=1.0, sigma=0.8, min_size=50):
'''Selective Search
Parameters
----------
im_orig : ndarray
Input image
scale : int
Free parameter. Higher means larger clusters in felzenszwalb segmentation.
sigma : float
Width of Gaussian kernel for felzenszwalb segmentation.
min_size : int
Minimum component size for felzenszwalb segmentation.
Returns
-------
img : ndarray
image with region label
region label is stored in the 4th value of each pixel [r,g,b,(region)]
regions : array of dict
[
{
'rect': (left, top, right, bottom),
'labels': [...]
},
...
]
'''
assert im_orig.shape[2] == 3, "3ch image is expected"
# load image and get smallest regions
# region label is stored in the 4th value of each pixel [r,g,b,(region)]
img = _generate_segments(im_orig, scale, sigma, min_size)
if img is None:
return None, {}
imsize = img.shape[0] * img.shape[1]
R = _extract_regions(img)
# extract neighbouring information
neighbours = _extract_neighbours(R)
# calculate initial similarities
S = {}
for (ai, ar), (bi, br) in neighbours:
S[(ai, bi)] = _calc_sim(ar, br, imsize)
# hierarchal search
    while S:
# get highest similarity
        i, j = sorted(S.items(), key=lambda a: a[1])[-1][0]
# merge corresponding regions
t = max(R.keys()) + 1.0
R[t] = _merge_regions(R[i], R[j])
# mark similarities for regions to be removed
key_to_delete = []
for k, v in S.items():
if (i in k) or (j in k):
key_to_delete.append(k)
# remove old similarities of related regions
for k in key_to_delete:
del S[k]
# calculate similarity set with the new region
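        # every deleted key other than (i, j) itself involved one old
        # neighbour n; that neighbour is now adjacent to the new region t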
for k in filter(lambda a: a != (i, j), key_to_delete):
n = k[1] if k[0] in (i, j) else k[0]
S[(t, n)] = _calc_sim(R[t], R[n], imsize)
regions = []
for k, r in R.items():
regions.append({
'rect': (
r['min_x'], r['min_y'],
r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
'size': r['size'],
'labels': r['labels']
})
return img, regions
Calling selectivesearch.py:
test.py
import matplotlib.pyplot as plt
from PIL import Image
import matplotlib.patches as mpatches
from selectivesearch import selective_search
import numpy as np
import skimage.data
import skimage.io
def main():
# img_path = '6.jpg'
# img = Image.open(img_path)
# img_data = np.asarray(img)
# img = skimage.data.astronaut()
    img_data = skimage.io.imread("6.jpg")
# perform selective search
img_lbl, regions = selective_search(img_data)
    # count how many candidate regions the Selective Search algorithm produced
print('regions:',len(regions))
    # build the candidate box set
    candidates = set()  # a set avoids duplicates; each element is a tuple (top-left x, top-left y, width, height) describing one candidate box
for r in regions:
# excluding same rectangle (with different segments)
        if r['rect'] in candidates:  # skip duplicate candidate regions
continue
# excluding regions smaller than 2000 pixels
        if r['size'] < 2000:  # skip regions smaller than 2000 pixels (the region size, not the bounding-box area)
continue
        # distorted rects: keep only boxes whose aspect ratio is close to square
        x, y, w, h = r['rect']
        if w == 0 or h == 0 or w / h > 1.2 or h / w > 1.2:
continue
candidates.add(r['rect'])
print('candidates:',len(candidates))
# draw rectangles on the original image
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))  # draw the candidate boxes on the original image
# img = plt.imread(img_path)
ax.imshow(img_data)
for x, y, w, h in candidates:
print(x, y, w, h)
rect = mpatches.Rectangle(
(x, y), w, h, fill=False, edgecolor='red', linewidth=1)
ax.add_patch(rect)
plt.savefig('6_ss.jpg', dpi=600)
plt.show()
if __name__ == '__main__':
main()
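Note: the number of proposals depends heavily on the felzenszwalb parameters passed to selective_search. As the docstring says, a larger scale yields larger (and therefore fewer) initial clusters, and a larger min_size suppresses tiny regions; values such as scale=500, sigma=0.9, min_size=10 are a reasonable starting point to experiment with, not fixed constants.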
Run results:
Original image:
Output image:
References:
- R-CNN算法学习(步骤一:候选区域生成): https://blog.csdn.net/m0_37970224/article/details/85238603
- 目标检测之选择性搜索-Selective Search: https://www.cnblogs.com/gezhuangzhuang/p/10451296.html
- Selective Search 选择性搜索算法原理: https://blog.csdn.net/wangc1994/article/details/102548037
- A well-organized Zhihu summary: https://zhuanlan.zhihu.com/p/23006190
- https://github.com/yangxue0827/RCNN/blob/master/selectivesearch.py
- https://github.com/kzktsan/MyTools/blob/master/ss/example/example.py