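The strategy of selective search is to first use graph-based image segmentation to obtain small-scale regions, and then merge them step by step into larger and larger ones. The design has three goals:
1. Capture all scales: exhaustive search handles objects of different scales by varying the window size, and selective search has to deal with the same problem. The algorithm addresses it by combining image segmentation with a hierarchical algorithm.
2. Diversification: no single strategy copes with every kind of image, so several strategies based on colour, texture, size and so on are used to merge the segmented regions.
3. Speed: an algorithm, like kung fu, wins by being fast!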
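I. The selective search algorithm (region merging)
The initial regions come from graph-based image segmentation, which splits the image into a large number of small pieces. A greedy strategy is then applied: compute the similarity of every pair of adjacent regions, merge the two most similar ones, and repeat until only a single region covering the whole image is left. Every region produced along the way, including each merged region, is kept, which gives a hierarchical representation of the image.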
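The greedy merging loop described above can be sketched in a few lines. This is only a minimal illustration, not the package's code: the function name `hierarchical_grouping`, the `similarity` and `merge` callbacks and the toy one-dimensional regions (a mean value and a size) are all assumptions made for the example.

```python
from itertools import count


def hierarchical_grouping(regions, neighbours, similarity, merge):
    """Greedy bottom-up grouping (illustrative sketch).

    regions:    dict {region_id: region_data} for the initial segments
    neighbours: iterable of (id_a, id_b) pairs of adjacent regions
    similarity: function(region_a, region_b) -> float
    merge:      function(region_a, region_b) -> merged region_data
    Returns a dict with every region, initial and merged.
    """
    regions = dict(regions)
    next_id = count(max(regions) + 1)
    # similarity of every adjacent pair, keyed by the unordered pair of ids
    sims = {frozenset(p): similarity(regions[p[0]], regions[p[1]])
            for p in neighbours}

    while sims:
        # pick the most similar adjacent pair and merge it
        i, j = tuple(max(sims, key=sims.get))
        t = next(next_id)
        regions[t] = merge(regions[i], regions[j])

        # drop similarities that involve i or j, remembering their neighbours
        touched = [k for k in sims if i in k or j in k]
        new_neighbours = set()
        for k in touched:
            del sims[k]
            new_neighbours |= k
        new_neighbours -= {i, j}

        # the merged region inherits the neighbours of i and j
        for n in new_neighbours:
            sims[frozenset((t, n))] = similarity(regions[t], regions[n])
    return regions


# Toy data (hypothetical): each region is (mean value, size).
def toy_similarity(a, b):
    return -abs(a[0] - b[0])


def toy_merge(a, b):
    size = a[1] + b[1]
    return ((a[0] * a[1] + b[0] * b[1]) / size, size)


result = hierarchical_grouping(
    {0: (1.0, 1), 1: (1.1, 1), 2: (5.0, 1)},
    [(0, 1), (1, 2)],
    toy_similarity,
    toy_merge,
)
```

Three initial regions give two merges, so `result` holds five regions in total; the last one covers everything.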
II. Diversification strategies
1. Colour space conversion
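The post gives no detail under this heading. In the paper by Uijlings et al., the grouping is repeated in several colour spaces with different invariance properties (for example RGB, grey-scale intensity, normalised rgb, HSV and Lab). A rough per-pixel sketch of a few such conversions, using only the standard library, might look like this; the function name `to_colour_spaces`, the simple mean used for intensity and the convention for black pixels are assumptions made for the example.

```python
import colorsys


def to_colour_spaces(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to a few colour spaces."""
    total = r + g + b
    if total > 0:
        # normalised rgb divides out the overall intensity
        norm_rgb = (r / total, g / total, b / total)
    else:
        # convention chosen here for a black pixel (assumption)
        norm_rgb = (1 / 3, 1 / 3, 1 / 3)
    return {
        "rgb": (r, g, b),
        "intensity": total / 3,  # plain mean as a stand-in for grey-scale
        "normalised_rgb": norm_rgb,
        "hsv": colorsys.rgb_to_hsv(r, g, b),
    }


pixel = to_colour_spaces(0.5, 0.25, 0.25)
```

Running the same grouping on each converted image yields a different set of regions, which is where the diversity comes from.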
2. Region similarity measures
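(1) Colour similarity: a 25-bin histogram is computed for each colour channel and L1-normalised, so each channel's histogram sums to 1.0 and the three channels together sum to 3.0. The similarity is the histogram intersection, i.e. the sum over all bins of the smaller of the two regions' values. If regions ri and rj have identical histograms, the colour similarity reaches its maximum of 3.0; the more the histograms differ, the smaller the sum and therefore the lower the similarity.
(2) Texture similarity: SIFT-like features are used. For each colour channel, Gaussian derivatives with σ = 1 are computed in 8 orientations, and a 10-bin, L1-normalised histogram is built for each orientation of each channel. The texture similarity between two regions is then computed in the same way as the colour similarity.
(3) Merging small regions first: if regions were merged on colour and texture alone, a region that has already been merged would tend to keep swallowing its surroundings, so the multi-scale behaviour would show up only in that one part of the image rather than everywhere. Small regions are therefore given more weight, which ensures that merging proceeds at multiple scales at every location in the image.
(4) Fill (how well two regions fit together): if ri is contained in rj, the two should be merged first. Conversely, if ri barely touches rj, merging them would leave a large gap between them, so they should not be merged.
The four similarity measures above are then combined into one.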
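As a rough illustration of the four measures, here is a small pure-Python sketch. The function names, the two-bin toy histograms and the convention that a box is `(min_x, min_y, max_x, max_y)` with an exclusive maximum are assumptions for the example; the sketch also uses a single histogram per measure, so each intersection tops out at 1.0 rather than 3.0.

```python
def l1_normalise(hist):
    """Scale a histogram so that its bins sum to 1.0."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def hist_intersection(h1, h2):
    """Sum over bins of the smaller value; 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def sim_size(size_i, size_j, im_size):
    """Close to 1 when both regions are small relative to the image."""
    return 1.0 - (size_i + size_j) / im_size


def sim_fill(box_i, box_j, size_i, size_j, im_size):
    """Close to 1 when the joint bounding box has little empty space."""
    bb_w = max(box_i[2], box_j[2]) - min(box_i[0], box_j[0])
    bb_h = max(box_i[3], box_j[3]) - min(box_i[1], box_j[1])
    return 1.0 - (bb_w * bb_h - size_i - size_j) / im_size


# Toy example: two regions in a 10 x 10 image (100 pixels).
colour = hist_intersection(l1_normalise([3, 1]), l1_normalise([1, 3]))
texture = hist_intersection(l1_normalise([2, 2]), l1_normalise([2, 2]))
size = sim_size(10, 20, 100)
fill = sim_fill((0, 0, 5, 2), (5, 0, 10, 4), 10, 20, 100)

# Plain sum of the four measures, as in the package code listed further down.
total = colour + texture + size + fill
```

With these toy numbers the colour intersection is 0.5, the texture intersection 1.0, the size term 0.7 and the fill term 0.9.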
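III. Scoring the regions
Regions merged earliest get the largest weight values: the final region covering the whole image has weight 1, the region produced by the second-to-last merge has weight 2, and so on. With many strategies and a lot of diversity, however, too many regions end up sharing the same weight, and the ordering becomes hard to settle. The paper's approach is to multiply each weight by a random number; for a region that shows up several times, the weights are also accumulated, since several strategies agreeing that it is an object is good evidence. This gives an object score for every region, from which as many regions as needed can be picked.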
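A possible sketch of this scoring step follows. Several things here are assumptions rather than the paper's exact procedure: the random number is drawn uniformly from [0, 1), a smaller randomised value is taken to rank higher, and, because the text does not spell out how the weights of repeated regions are combined, the sketch simply keeps the best value of a repeated box and counts how often it was proposed. The name `rank_proposals` is made up for the example.

```python
import random


def rank_proposals(hierarchies, seed=0):
    """Rank boxes coming from several grouping strategies.

    hierarchies: list of lists; each inner list holds the boxes of one
                 strategy, ordered from the top of its hierarchy
                 (whole image, position 1) downwards.
    Returns a list of (box, value, votes), best-ranked first.
    """
    rng = random.Random(seed)
    best_value = {}  # box -> smallest randomised position seen so far
    votes = {}       # box -> number of times the box was proposed
    for boxes in hierarchies:
        for position, box in enumerate(boxes, start=1):
            value = rng.random() * position
            best_value[box] = min(value, best_value.get(box, value))
            votes[box] = votes.get(box, 0) + 1
    ranked = sorted(best_value, key=best_value.get)
    return [(box, best_value[box], votes[box]) for box in ranked]


# Two hypothetical strategies that both propose the whole-image box.
hierarchies = [
    [(0, 0, 10, 10), (0, 0, 5, 5), (5, 5, 10, 10)],
    [(0, 0, 10, 10), (2, 2, 8, 8)],
]
ranked = rank_proposals(hierarchies, seed=1)
top_two = [box for box, _, _ in ranked[:2]]
```

Taking the first N entries of `ranked` gives the N proposals to pass on to the next stage.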
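IV. Evaluating selective search
The more the bounding boxes produced by the algorithm overlap with the ground-truth windows, the better the algorithm performs. The metric used here is the Average Best Overlap (ABO): for each ground-truth object, take its best overlap with any proposed box, then average over the objects of one class. To evaluate performance over all classes, the natural choice is the mean of the per-class ABO values, the Mean Average Best Overlap (MABO).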
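To make the two metrics concrete, here is a small sketch that measures overlap as intersection over union. The function names, the `(x1, y1, x2, y2)` box format and the toy boxes are assumptions for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def average_best_overlap(gt_boxes, proposals):
    """ABO: mean over ground-truth boxes of the best overlap with any proposal."""
    return sum(max(iou(g, p) for p in proposals) for g in gt_boxes) / len(gt_boxes)


def mean_abo(per_class):
    """MABO: mean of the per-class ABO values.

    per_class: dict {class_name: (gt_boxes, proposals)}
    """
    abos = [average_best_overlap(gt, props) for gt, props in per_class.values()]
    return sum(abos) / len(abos)


# Toy data: "cat" is matched perfectly, "dog" only by a half-width box.
abo_cat = average_best_overlap([(0, 0, 10, 10)], [(0, 0, 10, 10), (0, 0, 5, 10)])
abo_dog = average_best_overlap([(0, 0, 10, 10)], [(0, 0, 5, 10)])
mabo = mean_abo({
    "cat": ([(0, 0, 10, 10)], [(0, 0, 10, 10), (0, 0, 5, 10)]),
    "dog": ([(0, 0, 10, 10)], [(0, 0, 5, 10)]),
})
```

Here the ABO is 1.0 for "cat" and 0.5 for "dog", so the MABO is 0.75.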
V. The selective search function
Install the Selective Search package: pip install selectivesearch
Code:
# -*- coding: utf-8 -*-
import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy


# "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
#
#  - Modified version with LBP extractor for texture vectorization


def _generate_segments(im_orig, scale, sigma, min_size):
    """
        segment smallest regions by the algorithm of Felzenszwalb and
        Huttenlocher
    """

    # open the Image
    im_mask = skimage.segmentation.felzenszwalb(
        skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
        min_size=min_size)

    # merge mask channel to the image as a 4th channel
    im_orig = numpy.append(
        im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
    im_orig[:, :, 3] = im_mask

    return im_orig


def _sim_colour(r1, r2):
    """
        calculate the sum of histogram intersection of colour
    """
    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])


def _sim_texture(r1, r2):
    """
        calculate the sum of histogram intersection of texture
    """
    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])


def _sim_size(r1, r2, imsize):
    """
        calculate the size similarity over the image
    """
    return 1.0 - (r1["size"] + r2["size"]) / imsize


def _sim_fill(r1, r2, imsize):
    """
        calculate the fill similarity over the image
    """
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize


def _calc_sim(r1, r2, imsize):
    # overall similarity: plain sum of the four measures
    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))


def _calc_colour_hist(img):
    """
        calculate colour histogram for each region

        the size of output histogram will be BINS * COLOUR_CHANNELS(3)

        number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]

        extract HSV
    """

    BINS = 25
    hist = numpy.array([])

    for colour_channel in (0, 1, 2):

        # extracting one colour channel
        c = img[:, colour_channel]

        # calculate histogram for each colour and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])

    # L1 normalize
    hist = hist / len(img)

    return hist


def _calc_texture_gradient(img):
    """
        calculate texture gradient for entire image

        The original SelectiveSearch algorithm proposed Gaussian derivative
        for 8 orientations, but we use LBP instead.

        output will be [height(*)][width(*)]
    """
    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))

    for colour_channel in (0, 1, 2):
        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)

    return ret


def _calc_texture_hist(img):
    """
        calculate texture histogram for each region

        calculate the histogram of gradient for each colours
        the size of output histogram will be
            BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
    """
    BINS = 10

    hist = numpy.array([])

    for colour_channel in (0, 1, 2):

        # mask by the colour channel
        fd = img[:, colour_channel]

        # calculate histogram for each orientation and concatenate them all
        # and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])

    # L1 Normalize
    hist = hist / len(img)

    return hist


def _extract_regions(img):

    R = {}

    # get hsv image
    hsv = skimage.color.rgb2hsv(img[:, :, :3])

    # pass 1: count pixel positions
    for y, i in enumerate(img):

        for x, (r, g, b, l) in enumerate(i):

            # initialize a new region
            if l not in R:
                R[l] = {
                    "min_x": 0xffff, "min_y": 0xffff,
                    "max_x": 0, "max_y": 0, "labels": [l]}

            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y

    # pass 2: calculate texture gradient
    tex_grad = _calc_texture_gradient(img)

    # pass 3: calculate colour histogram of each region
    for k, v in R.items():

        # colour histogram
        masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
        R[k]["size"] = len(masked_pixels / 4)
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)

        # texture histogram
        R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])

    return R


def _extract_neighbours(regions):

    def intersect(a, b):
        # True if a corner of b's bounding box lies strictly inside a's
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False

    R = list(regions.items())
    neighbours = []
    for cur, a in enumerate(R[:-1]):
        for b in R[cur + 1:]:
            if intersect(a[1], b[1]):
                neighbours.append((a, b))

    return neighbours


def _merge_regions(r1, r2):
    new_size = r1["size"] + r2["size"]
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        # histograms of the merged region: size-weighted average
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]
    }
    return rt


def selective_search(
        im_orig, scale=1.0, sigma=0.8, min_size=50):
    '''Selective Search

    Parameters
    ----------
        im_orig : ndarray
            Input image
        scale : int
            Free parameter. Higher means larger clusters in felzenszwalb segmentation.
        sigma : float
            Width of Gaussian kernel for felzenszwalb segmentation.
        min_size : int
            Minimum component size for felzenszwalb segmentation.
    Returns
    -------
        img : ndarray
            image with region label
            region label is stored in the 4th value of each pixel [r,g,b,(region)]
        regions : array of dict
            [
                {
                    'rect': (left, top, width, height),
                    'size': number of pixels in the region,
                    'labels': [...]
                },
                ...
            ]
    '''
    assert im_orig.shape[2] == 3, "3ch image is expected"

    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(im_orig, scale, sigma, min_size)

    if img is None:
        return None, {}

    imsize = img.shape[0] * img.shape[1]
    R = _extract_regions(img)

    # extract neighbouring information
    neighbours = _extract_neighbours(R)

    # calculate initial similarities
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        S[(ai, bi)] = _calc_sim(ar, br, imsize)

    # hierarchal search
    while S != {}:

        # get highest similarity
        i, j = sorted(list(S.items()), key=lambda a: a[1])[-1][0]

        # merge corresponding regions
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])

        # mark similarities for regions to be removed
        key_to_delete = []
        for k, v in S.items():
            if (i in k) or (j in k):
                key_to_delete.append(k)

        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]

        # calculate similarity set with the new region
        for k in filter(lambda a: a != (i, j), key_to_delete):
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)

    regions = []
    for k, r in R.items():
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']
        })

    return img, regions
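The list returned by `selective_search` usually contains duplicates and many tiny or very elongated boxes, so some post-filtering is common before the proposals are used. The sketch below works on the returned list of dicts (`'rect'` is `(x, y, w, h)` according to the code above). The function name `filter_regions` and the thresholds are illustrative assumptions, and the call to the package is left as a comment because it needs an image and scikit-image installed.

```python
def filter_regions(regions, min_size=2000, max_aspect=1.2):
    """Keep unique, reasonably large and roughly square boxes.

    regions: list of dicts with 'rect' = (x, y, w, h) and 'size' (pixel count)
    Returns a list of rect tuples in their original order.
    """
    seen = set()
    candidates = []
    for r in regions:
        x, y, w, h = r["rect"]
        if r["rect"] in seen:
            continue  # same box already kept
        if r["size"] < min_size:
            continue  # too few pixels
        if w == 0 or h == 0:
            continue  # degenerate box
        if w / h > max_aspect or h / w > max_aspect:
            continue  # too elongated
        seen.add(r["rect"])
        candidates.append(r["rect"])
    return candidates


# With the package installed, the regions would come from something like:
#   img_lbl, regions = selectivesearch.selective_search(
#       img, scale=500, sigma=0.9, min_size=10)
# Here a hand-made list stands in for that output.
fake_regions = [
    {"rect": (0, 0, 100, 100), "size": 5000, "labels": [0]},
    {"rect": (0, 0, 100, 100), "size": 5000, "labels": [0, 1]},  # duplicate
    {"rect": (0, 0, 10, 10), "size": 50, "labels": [2]},         # too small
    {"rect": (0, 0, 200, 50), "size": 6000, "labels": [3]},      # too elongated
]
boxes = filter_regions(fake_regions)
```

Only the first box survives the filter in this toy case; the thresholds should be tuned to the image size and the objects of interest.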