Selective Search: Study Notes

宿魄man

Category: Object Detection

Article link: https://blog.csdn.net/weixin_42234720/article/details/96111777

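Selective search:
• A brute-force approach to object detection slides a window over the image from left to right and top to bottom and classifies the contents of each window. To detect objects of different types at different distances, windows of several sizes and aspect ratios are used; the image patch inside each window goes through feature extraction and is then passed to a classifier. The cost of this approach is clearly too high: it generates far too many redundant candidate regions and is impractical in real applications.
• Exhaustive search can be pruned by using only windows of a fixed size and aspect ratio. This works in some specific application scenarios, but for general object detection the computational cost is still high.
• Selective search was proposed to address these problems in generating region windows. The core question is how to remove redundant candidate regions effectively. Selective search exploits the fact that most redundant candidates overlap one another: it merges adjacent similar regions bottom-up, which reduces the redundancy.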
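To get a feeling for why exhaustive sliding windows are expensive, here is a rough sketch that only counts how many windows a scan would have to classify. The image size, scales, aspect ratios, and stride are hypothetical values picked for illustration, not numbers from the original paper.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count the windows an exhaustive sliding-window scan would classify."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window does not fit into the image at all
        steps_x = (img_w - win_w) // stride + 1
        steps_y = (img_h - win_h) // stride + 1
        total += steps_x * steps_y
    return total


# Hypothetical setup: a 640x480 image, three scales, three aspect ratios.
scales = [64, 128, 256]
aspect_ratios = [(1, 1), (1, 2), (2, 1)]
window_sizes = [(s * a, s * b) for s in scales for a, b in aspect_ratios]
print(count_sliding_windows(640, 480, window_sizes, stride=8))
```

Every counted window means one feature extraction plus one classifier call, which is the cost that selective search tries to avoid.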
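Selective Search workflow

Algorithm steps:
• 1. Split the image into fine-grained small regions using the Felzenszwalb segmentation algorithm;
• 2. Using a greedy strategy, compute the similarity of each pair of adjacent regions and merge the two most similar regions, repeating until only one region covering the whole image remains (merging stops when the similarity set S is empty);
• 3. Collect the regions produced during merging as the final candidate regions.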
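The greedy loop is easier to follow on a toy example. The sketch below runs the same merge-the-most-similar-neighbours idea on a 1-D list of values instead of an image. The similarity used here (negative absolute difference of region means) is a stand-in chosen for illustration and is not the similarity used by selective search.

```python
def hierarchical_grouping(values):
    """Greedy bottom-up grouping on a 1-D toy signal.

    Every element of `values` starts as its own region. Returns every
    region ever created as a (start, end_exclusive) index pair.
    """
    # each region is (start, end_exclusive, mean)
    regions = [(i, i + 1, float(v)) for i, v in enumerate(values)]
    proposals = [(s, e) for s, e, _ in regions]
    while len(regions) > 1:
        # pick the most similar pair of adjacent regions
        best = max(range(len(regions) - 1),
                   key=lambda k: -abs(regions[k][2] - regions[k + 1][2]))
        s1, e1, m1 = regions[best]
        s2, e2, m2 = regions[best + 1]
        n1, n2 = e1 - s1, e2 - s2
        # the merged mean is the size-weighted average of both means
        merged = (s1, e2, (m1 * n1 + m2 * n2) / (n1 + n2))
        regions[best:best + 2] = [merged]
        proposals.append((s1, e2))
    return proposals


print(hierarchical_grouping([1, 1, 9, 9]))
```

The returned list contains the initial regions, every intermediate merge, and finally the region covering the whole signal, which mirrors how candidates at all scales are collected.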
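Selective Search algorithm principles
Region merging:
• Region merging uses diversification strategies. A single strategy easily merges dissimilar regions; for example, if only texture is considered, regions of different colours are likely to be merged by mistake. Selective search therefore uses three diversification strategies to broaden the set of candidate regions:
• Multiple colour spaces: RGB, greyscale, HSV, and others.
• Multiple similarity measures: colour similarity as well as texture, size, and overlap (fill).
• Varying the threshold used to initialise the starting regions: the larger the threshold, the fewer the segmented regions.
• Colour space transformation converts the original colour space into as many as eight colour spaces. The authors use eight different colour representations mainly to account for variation in scene and lighting conditions. This strategy is applied when the image segmentation algorithm generates the initial regions (the similarity of two pixels is computed as their distance in each colour space). The colour spaces are: (1) RGB, (2) grey-level intensity I, (3) Lab, (4) rgI (normalised r and g channels plus intensity), (5) HSV, (6) rgb (normalised RGB), (7) C, (8) H (the hue channel of HSV).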
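As a small illustration of what "the same pixel in several colour spaces" means, the sketch below converts one RGB pixel using only the standard library. It covers six of the eight spaces; Lab and C are left out because they need a more involved conversion. Using the plain channel mean as the intensity I is an assumption made here for simplicity.

```python
import colorsys


def colour_space_views(r, g, b):
    """Return one RGB pixel (components in [0, 1]) in several colour spaces."""
    total = r + g + b
    intensity = total / 3.0  # plain channel mean as grey value (assumption)
    # normalised chromaticities; a black pixel gets equal shares by convention
    if total > 0:
        nr, ng, nb = r / total, g / total, b / total
    else:
        nr = ng = nb = 1.0 / 3.0
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return {
        "RGB": (r, g, b),
        "I": (intensity,),
        "rgI": (nr, ng, intensity),
        "HSV": (h, s, v),
        "rgb": (nr, ng, nb),
        "H": (h,),
    }


for name, value in colour_space_views(0.8, 0.4, 0.2).items():
    print(name, value)
```

Running the segmentation once per colour space gives a different set of initial regions each time, which is where the extra diversity comes from.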

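• Colour similarity:
• For each region, build a 25-bin histogram for each colour channel and normalise it with the L1 norm, which gives a 75-dimensional vector per region (3 × 25). The similarity between two regions is computed by histogram intersection (the sum of the bin-wise minima): a larger value means more similar, a smaller value less similar.
Colour similarity formula (figure)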
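Written out, the histogram intersection and the histogram of a merged region look roughly like this. The symbols are chosen here for illustration: C_i = {c_i^1, ..., c_i^n} is the colour histogram of region r_i, and C_t is the histogram of the region obtained by merging r_i and r_j. This matches what `_sim_colour` and `_merge_regions` do in the code listing of this post.

```latex
\begin{aligned}
s_{\text{colour}}(r_i, r_j) &= \sum_{k=1}^{n} \min\left(c_i^{k}, c_j^{k}\right), \qquad n = 75 \\
C_t &= \frac{\operatorname{size}(r_i)\, C_i + \operatorname{size}(r_j)\, C_j}{\operatorname{size}(r_i) + \operatorname{size}(r_j)}
\end{aligned}
```

Because the merged histogram is a size-weighted average, it can be updated without going back to the pixels.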
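• Texture similarity:
• Texture uses SIFT-like features (SIFT: scale-invariant feature transform). For each colour channel, compute Gaussian derivatives with σ = 1 in 8 orientations; for each channel and orientation, build a 10-bin histogram normalised with the L1 norm. This gives a 240-dimensional vector (3 × 8 × 10). The similarity, and the texture feature of a merged region, are computed in the same way as for colour.
Texture similarity formula (figure)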
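In the same notation as for colour (symbols chosen here for illustration), with T_i = {t_i^1, ..., t_i^n} the texture histogram of region r_i, the texture similarity is again a histogram intersection:

```latex
s_{\text{texture}}(r_i, r_j) = \sum_{k=1}^{n} \min\left(t_i^{k}, t_j^{k}\right), \qquad n = 240
```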
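• If merging relied on colour and texture alone, an already merged region would keep swallowing its neighbours, so the multi-scale behaviour would apply only to one local part of the image rather than globally. Small regions are therefore given more weight, that is, small regions are merged first, which ensures that regions at multiple scales are formed at every location of the image.
Size similarity formula (figure): small regions are merged first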
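The size term can be sketched in a couple of lines. It is one minus the fraction of the image that the two regions cover together, the same expression as `_sim_size` in the code listing of this post. The pixel counts in the example are made up.

```python
def size_similarity(size_a, size_b, image_size):
    """1 minus the share of the image covered by both regions together."""
    return 1.0 - (size_a + size_b) / image_size


image_size = 100 * 100  # hypothetical 100x100 image
small_pair = size_similarity(50, 50, image_size)
large_pair = size_similarity(3000, 4000, image_size)
print(small_pair, large_pair)
```

The pair of small regions scores much higher than the pair of large ones, so small regions are preferred when the greedy step picks the next merge.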
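• Combining the four similarity measures (colour, texture, size, and fill, i.e. how well the two regions fit into each other) gives the similarity of a pair of regions:
Combined region similarity formula (figure)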
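As a sketch in the notation used for the other measures (symbols chosen here for illustration), BB_ij is the tight bounding box around r_i and r_j, and im is the whole image. The fill term and the combination can then be written as below; the code listing in this post corresponds to setting every a_k to 1.

```latex
\begin{aligned}
s_{\text{fill}}(r_i, r_j) &= 1 - \frac{\operatorname{size}(BB_{ij}) - \operatorname{size}(r_i) - \operatorname{size}(r_j)}{\operatorname{size}(im)} \\
s(r_i, r_j) &= a_1\, s_{\text{colour}}(r_i, r_j) + a_2\, s_{\text{texture}}(r_i, r_j) + a_3\, s_{\text{size}}(r_i, r_j) + a_4\, s_{\text{fill}}(r_i, r_j), \qquad a_k \in \{0, 1\}
\end{aligned}
```

Switching individual a_k on or off gives different merging strategies, which is one of the diversification options described earlier.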
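• The candidate regions are then ranked. The region merged first receives a relatively large weight and the region merged last receives the smallest weight coefficient. When regions are produced by the same merging order, the weight is multiplied by a random number; when the same region is produced by several merges, its weights are summed. Sorting by the resulting weights gives the best candidate regions.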
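One possible way to implement this ranking is sketched below. It follows my reading of the description: each region gets a weight equal to its position in the hierarchy (1 for the region merged last) times a random number, and regions are sorted by ascending weight. The sort direction, the function name `rank_proposals`, and the fixed seed are assumptions; summing the weights of duplicate regions is not covered here.

```python
import random


def rank_proposals(proposals_in_merge_order, seed=0):
    """Order proposals by (position in hierarchy) x (random number)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(proposals_in_merge_order)
    scored = []
    for order, box in enumerate(proposals_in_merge_order):
        position = n - order  # last merge -> 1, first region -> n
        scored.append((rng.random() * position, box))
    scored.sort(key=lambda item: item[0])  # smaller weight ranks first
    return [box for _, box in scored]


# Hypothetical (x, y, w, h) boxes listed in the order they were created.
boxes = [(0, 0, 1, 1), (1, 0, 1, 1), (0, 0, 2, 1)]
print(rank_proposals(boxes))
```

The random factor keeps large regions from always dominating the top of the list.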
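• After the candidate regions are merged in this way, an image feature vector is extracted from the image patch of each candidate region and fed into an SVM classifier to perform object detection.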

Reference implementation in Python (scikit-image and NumPy):

# -*- coding: utf-8 -*-

import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy

# "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
#
#  - Modified version with LBP extractor for texture vectorization

def _generate_segments(im_orig, scale, sigma, min_size):
    """
    Segment the smallest regions with the algorithm of Felzenszwalb and
    Huttenlocher.
    """

    # run the segmentation; im_mask holds one region label per pixel
    im_mask = skimage.segmentation.felzenszwalb(
        skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
        min_size=min_size)

    # merge mask channel to the image as a 4th channel
    im_orig = numpy.append(
        im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
    im_orig[:, :, 3] = im_mask

    return im_orig

def _sim_colour(r1, r2):
    """
    Calculate the sum of histogram intersection of colour.
    """
    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])


def _sim_texture(r1, r2):
    """
    Calculate the sum of histogram intersection of texture.
    """
    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])


def _sim_size(r1, r2, imsize):
    """
    Calculate the size similarity over the image.
    """
    return 1.0 - (r1["size"] + r2["size"]) / imsize


def _sim_fill(r1, r2, imsize):
    """
    Calculate the fill similarity over the image.
    """
    # area of the bounding box that encloses both regions
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize


def _calc_sim(r1, r2, imsize):
    # all four measures are summed with equal weight
    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))

def _calc_colour_hist(img):
    """
    Calculate colour histogram for each region.

    The size of output histogram will be BINS * COLOUR_CHANNELS(3).

    Number of bins is 25, the same as in [uijlings_ijcv2013_draft.pdf].

    The input is expected to be the HSV pixels of one region.
    """

    BINS = 25
    hist = numpy.array([])

    for colour_channel in (0, 1, 2):
        # extracting one colour channel
        c = img[:, colour_channel]

        # calculate histogram for each colour and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])

    # L1 normalize (divide by the number of pixels in the region)
    hist = hist / len(img)

    return hist

def _calc_texture_gradient(img):
    """
    Calculate texture gradient for entire image.

    The original SelectiveSearch algorithm proposed Gaussian derivative
    for 8 orientations, but we use LBP instead.

    Output will be [height(*)][width(*)].
    """
    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))

    for colour_channel in (0, 1, 2):
        # local binary pattern with 8 sample points at radius 1
        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)

    return ret

def _calc_texture_hist(img):
    """
    Calculate texture histogram for each region.

    Calculate the histogram of the texture values for each colour channel.
    With the LBP variant used here there is no separate orientation axis,
    so the output has BINS * COLOUR_CHANNELS(3) entries.
    """
    BINS = 10

    hist = numpy.array([])

    for colour_channel in (0, 1, 2):
        # mask by the colour channel
        fd = img[:, colour_channel]

        # calculate histogram for this channel and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])

    # L1 normalize (divide by the number of pixels in the region)
    hist = hist / len(img)

    return hist

def _extract_regions(img):
    R = {}

    # get hsv image
    hsv = skimage.color.rgb2hsv(img[:, :, :3])

    # pass 1: count pixel positions
    for y, i in enumerate(img):

        for x, (r, g, b, l) in enumerate(i):

            # initialize a new region
            if l not in R:
                R[l] = {
                    "min_x": 0xffff, "min_y": 0xffff,
                    "max_x": 0, "max_y": 0, "labels": [l]}

            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y

    # pass 2: calculate texture gradient
    tex_grad = _calc_texture_gradient(img)

    # pass 3: calculate colour histogram of each region
    for k, v in R.items():
        # colour histogram
        masked_pixels = hsv[img[:, :, 3] == k]
        # region size is the number of pixels carrying label k
        R[k]["size"] = len(masked_pixels)
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)

        # texture histogram
        R[k]["hist_t"] = _calc_texture_hist(tex_grad[img[:, :, 3] == k])

    return R

def _extract_neighbours(regions):
    def intersect(a, b):
        # True if any corner of b's bounding box lies strictly inside a's
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False

    R = list(regions.items())
    neighbours = []
    # test every unordered pair of regions once
    for cur, a in enumerate(R[:-1]):
        for b in R[cur + 1:]:
            if intersect(a[1], b[1]):
                neighbours.append((a, b))

    return neighbours

def _merge_regions(r1, r2):
    new_size = r1["size"] + r2["size"]
    # histograms of the merged region are size-weighted averages
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]
    }
    return rt

def selective_search(im_orig, scale=1.0, sigma=0.8, min_size=50):
    '''Selective Search

    Parameters
    ----------
        im_orig : ndarray
            Input image
        scale : float
            Free parameter. Higher means larger clusters in felzenszwalb segmentation.
        sigma : float
            Width of Gaussian kernel for felzenszwalb segmentation.
        min_size : int
            Minimum component size for felzenszwalb segmentation.
    Returns
    -------
        img : ndarray
            image with region label
            region label is stored in the 4th value of each pixel [r,g,b,(region)]
        regions : array of dict
            [
                {
                    'rect': (left, top, width, height),
                    'size': number of pixels in the region,
                    'labels': [...]
                },
                ...
            ]
    '''
    assert im_orig.shape[2] == 3, "3ch image is expected"

    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(im_orig, scale, sigma, min_size)

    if img is None:
        return None, {}

    imsize = img.shape[0] * img.shape[1]
    R = _extract_regions(img)

    # extract neighbouring information
    neighbours = _extract_neighbours(R)

    # calculate initial similarities
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        S[(ai, bi)] = _calc_sim(ar, br, imsize)

    # hierarchical search: merge until no similarities are left
    while S:

        # get highest similarity
        i, j = sorted(list(S.items()), key=lambda a: a[1])[-1][0]

        # merge corresponding regions
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])

        # mark similarities for regions to be removed
        key_to_delete = [k for k in S if (i in k) or (j in k)]

        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]

        # calculate similarity set with the new region
        for k in key_to_delete:
            if k == (i, j):
                continue
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)

    regions = []
    for k, r in R.items():
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']
        })

    return img, regions
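The raw list returned by `selective_search` usually contains duplicates and many tiny or strongly elongated boxes, so it tends to be filtered before drawing or classification. Below is a minimal sketch of such a filter that works on the returned `regions` list. The helper name `filter_regions` and both thresholds are assumptions for illustration and are not part of the listing above.

```python
def filter_regions(regions, min_size=2000, max_aspect=1.2):
    """Keep unique (x, y, w, h) rects that are large and roughly square."""
    seen = set()
    kept = []
    for region in regions:
        rect = region["rect"]
        x, y, w, h = rect
        if rect in seen:
            continue  # same rect already kept
        if region["size"] < min_size:
            continue  # too few pixels
        if w == 0 or h == 0:
            continue  # degenerate box
        if w / h > max_aspect or h / w > max_aspect:
            continue  # too elongated
        seen.add(rect)
        kept.append(rect)
    return kept


# Synthetic input with the same dict layout as the `regions` return value.
# With a real image one would call:
#   img, regions = selective_search(image, scale=500, sigma=0.9, min_size=10)
sample = [
    {"rect": (0, 0, 100, 100), "size": 5000, "labels": [1.0]},
    {"rect": (0, 0, 100, 100), "size": 5000, "labels": [1.0]},
    {"rect": (0, 0, 10, 10), "size": 50, "labels": [2.0]},
    {"rect": (0, 0, 300, 100), "size": 9000, "labels": [3.0]},
]
print(filter_regions(sample))
```

The `scale`, `sigma`, and `min_size` values in the commented call are example settings only; suitable values depend on the image resolution.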

Example result:
Candidate region boxes after Selective Search (figure)