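[Notes] The Selective Search Algorithm for Images
1) Introduction
In object detection, the R-CNN network uses the selective search algorithm to extract candidate boxes; see the referenced blog post for details.
First, be clear about what selective search is for: it segments the image and extracts candidate regions.
Intuition: a digital image carries a lot of information that can be abstracted into colour, shape, texture and so on, and objects and background sit in a certain hierarchy. Features such as texture and colour can therefore be used to separate objects from the background. Selective search starts from a set of small regions and merges similar regions by a fixed strategy to obtain regions at larger scales, which then serve as candidate object regions.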
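2) The algorithm
1. Algorithm overview:
1) Take an input image and generate the initial region set $R$.
2) Initialise the similarity set $S = \emptyset$.
3) Compute the similarity of every pair of neighbouring regions and add it to $S$.
4) Find the maximum in $S$, i.e. the two most similar regions; merge them into a new region $r_t$ and add it to the region set $R$.
5) Compute the similarity between the new region $r_t$ and its neighbours (the neighbours of the two sub-regions merged into $r_t$ become the neighbours of $r_t$) and add these values to $S$; remove from $S$ every similarity that involves the two sub-regions from step 4.
6) Repeat steps 4 and 5 until $S = \emptyset$; the new region $r_t$ produced by the last step is the whole image.
7) Filter the region set $R$ (drop regions with fewer pixels than some threshold, and regions whose aspect ratio exceeds 1.2), then output the bounding box of each remaining region. These boxes are all the candidate object locations.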
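Steps 2 to 6 can be sketched on a toy example. This is a minimal sketch, not the listing given later: the four regions, their 2-bin histograms and the adjacency list are made up for illustration, and the only similarity used is histogram intersection (the full code sums four measures).

```python
def hist_intersection(h1, h2):
    """Similarity of two normalised histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def merge(r1, r2):
    """Merge two regions; the new histogram is the size-weighted average."""
    size = r1["size"] + r2["size"]
    hist = [(a * r1["size"] + b * r2["size"]) / size
            for a, b in zip(r1["hist"], r2["hist"])]
    return {"size": size, "hist": hist, "labels": r1["labels"] + r2["labels"]}


def hierarchical_grouping(regions, adjacency):
    """regions: {id: region dict}; adjacency: list of (id, id) neighbour pairs."""
    R = dict(regions)
    # Steps 2-3: similarity set over neighbouring pairs
    S = {(i, j): hist_intersection(R[i]["hist"], R[j]["hist"])
         for i, j in adjacency}
    while S:
        # Step 4: merge the most similar pair into a new region t
        i, j = max(S, key=S.get)
        t = max(R) + 1
        R[t] = merge(R[i], R[j])
        # Step 5: drop similarities that involve i or j ...
        stale = [k for k in S if i in k or j in k]
        for k in stale:
            del S[k]
        # ... and connect the former neighbours of i and j to t
        neighbours = {n for k in stale for n in k} - {i, j}
        for n in neighbours:
            S[(t, n)] = hist_intersection(R[t]["hist"], R[n]["hist"])
    return R


# Hypothetical toy input: four regions in a row, 0-1-2-3
toy = {
    0: {"size": 10, "hist": [1.0, 0.0], "labels": [0]},
    1: {"size": 10, "hist": [0.9, 0.1], "labels": [1]},
    2: {"size": 10, "hist": [0.1, 0.9], "labels": [2]},
    3: {"size": 10, "hist": [0.0, 1.0], "labels": [3]},
}
result = hierarchical_grouping(toy, [(0, 1), (1, 2), (2, 3)])
print(len(result), sorted(result[max(result)]["labels"]))
```

With n connected initial regions the loop performs n - 1 merges, so the region set ends with 2n - 1 entries and the last one covers every initial label.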
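Summary:
1) Generating regions requires combining several features, such as colour and texture.
2) During the search the region set R keeps growing. The search is a region-merging process that yields the large-scale regions, while the small regions come from the initial segmentation.
3) Compared with exhaustive sliding windows, this approach wins because the initial region set is far smaller than the exhaustive set, and the merging search is more efficient than enumerating boxes at many different scales.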
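To put rough numbers on point 3, here is a small back-of-the-envelope sketch. The window sizes, the stride and the figure of 300 initial regions are assumptions picked for illustration; only the 640 x 424 image size comes from the shape comments in the code further down.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count window positions over all (w, h) sizes at a fixed stride."""
    total = 0
    for w, h in window_sizes:
        if w > img_w or h > img_h:
            continue  # this window does not fit into the image
        nx = (img_w - w) // stride + 1
        ny = (img_h - h) // stride + 1
        total += nx * ny
    return total


def max_hierarchy_regions(n_initial):
    """Upper bound after merging: n initial regions plus at most n - 1 merges."""
    return 2 * n_initial - 1


sizes = [(64, 64), (128, 128), (256, 256)]  # assumed window sizes
n_windows = count_sliding_windows(640, 424, sizes, stride=8)
n_regions = max_hierarchy_regions(300)  # 300 initial regions is an assumed figure
print(n_windows, n_regions)
```

Even with only three square window sizes the sliding-window count is an order of magnitude above the merge hierarchy, and every extra scale or aspect ratio adds to the window count.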
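2. Initial image segmentation
So how is the image segmented into the initial regions?
Look at the code first; it simply calls the segmentation function from skimage: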
import numpy as np
from skimage import segmentation, util


def _generate_segments(im_orig, scale, sigma, min_size):
    """
    Append the Felzenszwalb label mask to the image as a 4th channel.
    """
    # Segment the image with skimage's Felzenszwalb implementation
    im_mask = segmentation.felzenszwalb(
        util.img_as_float(im_orig), scale=scale, sigma=sigma, min_size=min_size)
    # Add a 4th channel that stores, for every pixel, the label of the
    # initial (smallest) region it belongs to
    im_mask_ = np.zeros(im_orig.shape[:2])[:, :, np.newaxis]  # e.g. (424, 640, 1)
    im_orig = np.append(im_orig, im_mask_, axis=2)  # e.g. (424, 640, 4)
    im_orig[:, :, -1] = im_mask
    return im_orig
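Here im_mask is a matrix of the same height and width as the original image; each entry holds the label of the region that the corresponding pixel belongs to.
For more on image segmentation see: https://www.leiphone.com/news/201902/oIBt9Bj8WAp5rs7f.html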
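To make the label mask concrete, the sketch below takes a tiny hand-written mask (made up for illustration, using plain nested lists instead of a NumPy array) and derives the pixel count and bounding box of each label. The `region_stats` helper is a hypothetical name; it mirrors the idea of the first pass of `_extract_regions` in the full listing.

```python
def region_stats(label_mask):
    """Return {label: size and bounding box} for a 2-D label mask."""
    stats = {}
    for y, row in enumerate(label_mask):
        for x, label in enumerate(row):
            s = stats.setdefault(
                label,
                {"size": 0, "min_x": x, "min_y": y, "max_x": x, "max_y": y})
            s["size"] += 1
            s["min_x"] = min(s["min_x"], x)
            s["min_y"] = min(s["min_y"], y)
            s["max_x"] = max(s["max_x"], x)
            s["max_y"] = max(s["max_y"], y)
    return stats


# Hypothetical 3 x 4 label mask with three regions
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
stats = region_stats(mask)
for label, s in sorted(stats.items()):
    print(label, s)
```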
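3. Overview of image segmentation algorithms
1) Threshold segmentation
This includes supervised threshold segmentation and unsupervised threshold segmentation.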
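As an illustration of the unsupervised case, here is a minimal pure-Python sketch of Otsu's method, one common way to pick a threshold automatically. Choosing Otsu is my assumption for the example, the note itself does not name a method, and the pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises the between-class variance.

    Pixels with value <= t count as background, pixels > t as foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical grey values: four dark pixels, four bright pixels
pixels = [10, 12, 11, 13, 200, 210, 205, 198]
threshold = otsu_threshold(pixels)
foreground = [p > threshold for p in pixels]
print(threshold, foreground)
```

In the supervised case the threshold would simply be chosen by hand instead of being computed from the histogram.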
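2) Contour-based segmentation
3) SLIC (simple linear iterative clustering)
SLIC is built on a machine learning algorithm called k-means. It takes all the pixel values of the image and tries to separate them into a given number of sub-regions.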
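The clustering idea can be sketched with plain k-means on (x, y, intensity) features. This is only a simplified illustration: real SLIC works in CIELAB colour space, weights spatial distance against colour distance with a compactness parameter, and searches only a local window around each cluster centre. The 2 x 4 image and the choice of initial centres are assumptions for the example.

```python
def kmeans(points, centers, iterations=10):
    """Plain Lloyd's k-means on equal-length tuples; returns (labels, centers)."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels[idx] = dists.index(min(dists))
        # Update step: each centre moves to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster where it was
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical 2 x 4 grey image: left half dark, right half bright
image = [
    [10, 12, 200, 205],
    [11, 13, 198, 210],
]
points = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
labels, centers = kmeans(points, centers=[points[0], points[-1]])
label_map = [labels[i * 4:(i + 1) * 4] for i in range(2)]
print(label_map)
```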
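3) Summary
Image segmentation is a very important step in image processing. It is an active research area with a wide range of applications, from computer vision to medical imaging to traffic and video surveillance.
Python's scikit-image is a very powerful library that offers a large number of image processing algorithms. It is free to use without restrictions and has an active community behind it. Its documentation covers the library and its use cases in more detail.
Off topic, off topic!
In short, skimage makes image segmentation convenient, and the underlying implementation is compiled code, which keeps the segmentation fast.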
4) Code
The selective search code is attached below:
# -*- coding: utf-8 -*-
"""
Selective search: Felzenszwalb over-segmentation followed by
hierarchical merging of similar neighbouring regions.
"""
from skimage import segmentation, util, color, feature, io
from matplotlib import patches
import matplotlib.pyplot as plt
import numpy as np
import numpy


def _generate_segments(im_orig, scale, sigma, min_size):
    """
    Append the Felzenszwalb label mask to the image as a 4th channel.
    """
    # Segment the image with skimage's Felzenszwalb implementation
    im_mask = segmentation.felzenszwalb(
        util.img_as_float(im_orig), scale=scale, sigma=sigma, min_size=min_size)
    # merge mask channel to the image as a 4th channel:
    # it stores, for every pixel, the label of its initial region
    im_mask_ = np.zeros(im_orig.shape[:2])[:, :, np.newaxis]  # e.g. (424, 640, 1)
    im_orig = np.append(im_orig, im_mask_, axis=2)  # e.g. (424, 640, 4)
    im_orig[:, :, -1] = im_mask
    return im_orig


def _sim_colour(r1, r2):
    """
    Colour similarity: histogram intersection of the colour histograms.
    """
    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])


def _sim_texture(r1, r2):
    """
    Texture similarity: histogram intersection of the texture histograms.
    """
    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])


def _sim_size(r1, r2, imsize):
    """
    Size similarity: close to 1 when both regions are small
    relative to the image, so small regions merge first.
    """
    return 1.0 - (r1["size"] + r2["size"]) / imsize


def _sim_fill(r1, r2, imsize):
    """
    Fill similarity: how well the two regions fill their joint bounding box.
    """
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize


def _calc_sim(r1, r2, imsize):
    # Equal-weight sum of the four similarity measures
    sim_colour = _sim_colour(r1, r2)
    sim_texture = _sim_texture(r1, r2)
    sim_size = _sim_size(r1, r2, imsize)
    sim_fill = _sim_fill(r1, r2, imsize)
    return (sim_colour + sim_texture + sim_size + sim_fill)


def _calc_texture_gradient(img):
    """
    calculate texture gradient for entire image
    The original Selective Search algorithm proposed Gaussian derivative
    for 8 orientations, but we use LBP instead.
    """
    im_texture = np.zeros(img.shape[:3])  # e.g. (424, 640, 4)
    # LBP with 8 sampling points at radius 1.0, per colour channel;
    # the 4th (label) channel stays zero
    for colour_channel in (0, 1, 2):
        im_texture[:, :, colour_channel] = feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)
    return im_texture


def _calc_colour_hist(img):
    """
    calculate colour histogram for each region
    img holds the HSV pixels of one region, one row per pixel
    the size of output histogram will be BINS * COLOUR_CHANNELS(3)
    number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
    """
    BINS = 25
    hist = np.array([])
    for colour_channel in (0, 1, 2):
        c = img[:, colour_channel]
        # calculate histogram for each colour and join to the result
        hist = np.concatenate(
            [hist] + [np.histogram(c, BINS, (0.0, 255.0))[0]])
    # L1 normalize (divide by the number of pixels in the region)
    hist = hist / len(img)
    return hist


def _calc_texture_hist(img):
    """
    Calculate the texture histogram of one region from the
    per-channel texture values (one row per pixel).
    the size of output histogram will be BINS * COLOUR_CHANNELS(3)
    """
    BINS = 10
    hist = np.array([])
    # extracting colour channel
    for colour_channel in (0, 1, 2):
        fd = img[:, colour_channel]
        # calculate histogram for each channel and join to the result
        hist = np.concatenate(
            [hist] + [np.histogram(fd, BINS, (0.0, 1.0))[0]])
    # L1 Normalize (divide by the number of pixels in the region)
    hist = hist / len(img)
    return hist


def _extract_regions(img):
    """
    Collect the initial regions produced by the Felzenszwalb-Huttenlocher
    segmentation (their labels are stored in the 4th channel of img).
    """
    R = {}
    # pass 1: bounding box of every region, from the per-pixel labels
    for y, i in enumerate(img):  # iter rows
        for x, (r, g, b, l) in enumerate(i):  # iter cols
            # initialize a new region
            if l not in R:
                R[l] = {
                    "min_x": 0xffff, "min_y": 0xffff,
                    "max_x": 0, "max_y": 0, "labels": [l]}
            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y
    # pass 2: calculate texture gradient and hsv
    tex_grad = _calc_texture_gradient(img)
    hsv = color.rgb2hsv(img[:, :, :3])
    # pass 3: calculate colour and texture histograms of each region
    for k, v in list(R.items()):
        # boolean mask of the pixels that belong to region k
        # (a plain array, not a list, so that NumPy indexes with it directly)
        masked = img[:, :, 3] == k
        # colour histogram over the masked HSV pixels
        masked_pixels = hsv[masked]  # shape (n_pixels, 3)
        R[k]["size"] = len(masked_pixels)
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
        # texture histogram over the masked texture values
        masked_texture = tex_grad[masked]  # shape (n_pixels, 4)
        R[k]["hist_t"] = _calc_texture_hist(masked_texture)
    return R


def _extract_neighbours(regions):
    """
    regions: dict of {label: region}
    Return every pair of regions whose bounding boxes overlap.
    """
    def intersect(a, b):
        # True if any corner of b's bounding box lies strictly inside a's
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False

    R = list(regions.items())
    neighbours = []
    for idx, a in enumerate(R[:-1]):
        for b in R[idx + 1:]:
            if intersect(a[1], b[1]):
                neighbours.append((a, b))
    return neighbours


def _merge_regions(r1, r2):
    # Union of the bounding boxes, sum of the sizes,
    # size-weighted average of the histograms
    new_size = r1["size"] + r2["size"]
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]
    }
    return rt


def selective_search(im_orig, scale=1.0, sigma=0.8, min_size=50):
    '''
    Core routine.
    :param im_orig: colour image, shape (height, width, 3)
    :param scale: Felzenszwalb scale parameter (higher means larger segments)
    :param sigma: width of the Gaussian smoothing applied before segmentation
    :param min_size: minimum component size for the Felzenszwalb segmentation
    :return: list of dicts with the keys 'rect' (x, y, w, h), 'size' and 'labels'
    '''
    assert im_orig.shape[2] == 3, "the input should be a colour image"  # 3 channels
    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(im_orig, scale, sigma, min_size)
    # optional: visualise the label mask
    # region_label = img[:,:,-1]
    # plt.imshow(region_label)
    # plt.show()
    if img is None:
        print("ERROR in felzenszwalb")
        return []  # same type as the normal return value
    # img: e.g. (424, 640, 4)
    R = _extract_regions(img)
    imsize = img.shape[0] * img.shape[1]
    # extract neighbouring information
    neighbours = _extract_neighbours(R)
    # calculate initial similarities
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        S[(ai, bi)] = _calc_sim(ar, br, imsize)
    # hierarchical search
    while S != {}:
        # get highest similarity
        highest = sorted(list(S.items()), key=lambda a: a[1])[-1]
        i, j = highest[0]
        # merge corresponding regions
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])
        # mark similarities for regions to be removed
        key_to_delete = []
        for k, v in S.items():
            if (i in k) or (j in k):
                key_to_delete.append(k)
        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]
        # calculate similarity set with the new region
        for k in filter(lambda a: a != (i, j), key_to_delete):
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)
    # collect the bounding box (x, y, w, h) of every region, initial and merged
    regions = []
    for k, r in list(R.items()):
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']
        })
    return regions


def main():
    # read the image with skimage.io
    img = io.imread(r'D:\Projects\selective_search_py\data\naroto.jpg')
    regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
    candidates = set()  # filtered regions
    for r in regions:
        # skip duplicate boxes
        if r['rect'] in candidates:
            continue
        # skip regions with fewer than 100 pixels
        if r['size'] < 100:
            continue
        candidates.add(r['rect'])
    # left: original image, right: image with the candidate boxes
    fig = plt.figure()
    ax1 = fig.add_subplot(1, 2, 1)
    ax1.imshow(img)
    ax2 = fig.add_subplot(1, 2, 2)
    ax2.imshow(img)
    for x, y, w, h in candidates:
        rect = patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax2.add_patch(rect)
    plt.show()


if __name__ == '__main__':
    main()
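Step 7 of the algorithm overview also mentions dropping boxes whose aspect ratio exceeds 1.2, which the filtering loop in main() does not do. A minimal sketch of such a filter follows; `filter_candidates` is a hypothetical helper, not part of the listing above, and it assumes the region dicts have the 'rect' and 'size' keys returned by selective_search. The toy regions are made up.

```python
def filter_candidates(regions, min_size=100, max_aspect=1.2):
    """Keep unique rects that are large enough and not too elongated."""
    candidates = set()
    for r in regions:
        x, y, w, h = r["rect"]
        if r["size"] < min_size:
            continue  # too few pixels
        if w == 0 or h == 0:
            continue  # degenerate box, aspect ratio undefined
        if w / h > max_aspect or h / w > max_aspect:
            continue  # too elongated in either direction
        candidates.add(r["rect"])  # the set removes duplicate boxes
    return candidates


# Hypothetical regions in the format returned by selective_search
toy_regions = [
    {"rect": (0, 0, 50, 50), "size": 2000, "labels": [0]},
    {"rect": (0, 0, 50, 50), "size": 2000, "labels": [1]},   # duplicate box
    {"rect": (5, 5, 100, 20), "size": 1500, "labels": [2]},  # too elongated
    {"rect": (1, 1, 8, 8), "size": 40, "labels": [3]},       # too small
    {"rect": (10, 10, 60, 55), "size": 3000, "labels": [4]},
]
kept = filter_candidates(toy_regions)
print(sorted(kept))
```

The thresholds are parameters, so the same helper can reproduce the size-only filter in main() by passing a very large max_aspect.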