[image retrieval] SURF and comparison with SIFT

XXH2015

original paper of surf:
“SURF (Speeded Up Robust Features) is a robust local feature detector, first presented by Herbert Bay et al. in 2006, that can be used in computer vision tasks like object recognition or 3D reconstruction. It is partly inspired by the SIFT descriptor. The standard version of SURF is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT. SURF is based on sums of 2D Haar wavelet responses and makes an efficient use of integral images. It uses an integer approximation to the determinant of Hessian blob detector, which can be computed extremely quickly with an integral image (3 integer operations). For features, it uses the sum of the Haar wavelet response around the point of interest. Again, these can be computed with the aid of the integral image.”
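To make the "integral image" idea in the quote concrete, here is a minimal pure-Python sketch. The function names `integral_image` and `box_sum` are made up for illustration. It shows that once the table is built, the sum over any rectangle costs three additions/subtractions on four table entries, whatever the rectangle size.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above this row plus the running sum of this row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 from four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 4))  # whole image: 78
print(box_sum(ii, 1, 1, 3, 3))  # 6 + 7 + 10 + 11 = 34
```

Every box filter response is a weighted combination of such rectangle sums, so its cost does not grow with the filter size.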

integral image and box filter

1st order box filter:

$D_{x}^L$, $D_{y}^L$ (for edges of the image)

L is the size of the box: if L = 2 it is a 5*5 matrix, and for L = 1 it is the 3*3 matrix

$D_{x}^1=\begin{bmatrix}1 & 0 & -1\\1 & 0 & -1\\1 & 0 & -1\end{bmatrix}$

2nd order box filter:

$D_{xx}^L$, $D_{yy}^L$ (for lines in the x and y directions),

$D_{xy}^L$ (for lines in the y=x direction)

$D_{xx}^1=\begin{bmatrix}1 & -2 & 1\\1 & -2 & 1\\1 & -2 & 1\end{bmatrix}$

$D_{xy}^1=\begin{bmatrix}-1 & 0 & 1\\0 & 0 & 0\\1 & 0 & -1\end{bmatrix}$
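As a quick sanity check of what each kernel reacts to, the sketch below applies the three 3*3 kernels listed above to tiny hand-made patches. The helper `response` and the test patches are made up for illustration. It uses plain correlation (no kernel flip), which only matters for the sign of the antisymmetric $D_x$ response.

```python
D_X = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
D_XX = [[1, -2, 1], [1, -2, 1], [1, -2, 1]]
D_XY = [[-1, 0, 1], [0, 0, 0], [1, 0, -1]]


def response(img, kernel, y, x):
    """Correlate a 3x3 kernel with the 3x3 patch of img centered at (y, x)."""
    return sum(
        kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    )


ramp = [[2, 1, 0], [2, 1, 0], [2, 1, 0]]  # intensity falling from left to right
line = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # bright vertical line
diag = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # bright diagonal line

for name, patch in (("ramp", ramp), ("line", line), ("diag", diag)):
    print(name, [response(patch, k, 1, 1) for k in (D_X, D_XX, D_XY)])
# ramp [6, 0, 0]
# line [0, -6, 0]
# diag [0, 0, -2]
```

Each patch triggers exactly one of the kernels: the ramp (an edge-like slope) only excites the 1st order filter, while the two lines only excite the matching 2nd order filter.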

3 comparison with Gaussian scale space representation

3.1 scale space representation

linear scale space = Gaussian scale space; the 2D Gaussian kernel is isotropic and separable.

$\sigma$ := scale parameter

Discrete scale space: with the box filter technique, Gaussian smoothing can be approximated discretely by a box filter of size L

the size L is an important parameter

the square box domain $\Gamma=[-\gamma,\gamma]\times[-\gamma,\gamma]$ and $B_{\Gamma}$ define the box,
then the discrete convolution becomes:

$u_{\gamma}:=\frac{1}{(2\gamma+1)^2}B_{\Gamma}*u$

3.2 Box-space sampling
the box length representation is divided into octaves, where a new octave is created for every doubling of the kernel size

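SURF scale space vs. SIFT scale space

for sift

The input image is repeatedly convolved with Gaussian kernels and repeatedly subsampled, but each layer depends on the original image (the current scale may differ a lot from the scale of the original image, in which case convolving the original image again is rather wasteful), and the image has to be resized.

In SIFT, the images within one octave have the same size (the same distance between pixels) but different scales (amounts of blur), while images in different octaves also differ in size, because each octave is obtained by downsampling the previous one. When the Gaussian blur is applied, the size of the SIFT Gaussian kernel stays the same throughout; only the image size changes between octaves.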

for surf

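SURF builds the scale space by changing the size of the box filter rather than the size of the image.

After computing the extrema with the Hessian matrix, SURF performs non-maximum suppression in a 3×3×3 neighborhood. Only a point whose value is larger or smaller than all 26 neighbors (9 in the scale above, 9 in the scale below, and the 8 surrounding points in its own scale) is kept as a candidate feature point. Interpolation in scale space and image space then gives a stable feature point location.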
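The 26-neighbor test described above can be sketched as follows. This is a minimal illustration, assuming the responses are stored as a nested list indexed `[scale][y][x]`; the function name `is_local_extremum` is made up here.

```python
def is_local_extremum(stack, s, y, x):
    """True if stack[s][y][x] is strictly above or strictly below all 26 neighbors."""
    center = stack[s][y][x]
    neighbors = [
        stack[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return all(center > v for v in neighbors) or all(center < v for v in neighbors)


# Three 3x3 response maps: the scale below, the candidate's scale, the scale above
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 5.0
print(is_local_extremum(stack, 1, 1, 1))  # True

stack[2][0][0] = 7.0  # a stronger response in the scale above
print(is_local_extremum(stack, 1, 1, 1))  # False
```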

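"Filtering" is a way of speaking that takes the image to the frequency domain, since waves only make sense there. By the properties of the Fourier transform, the spatial and frequency domains are related as follows: a product in one domain equals a convolution in the other. Filtering in the frequency domain is therefore equivalent to convolution in the spatial domain, which is why convolving the original image with a Gaussian kernel template amounts to filtering.

SURF operates on the integral image, and the convolution only depends on the previous image. Instead of downsampling, it increases the size of the kernel, which is where SIFT and SURF differ in their use of the pyramid principle. SURF allows several layers of the scale space to be processed at the same time without subsampling the image, which improves performance.

In SURF the image size never changes. The images to be examined in different octaves are obtained by changing the size of the Gaussian blur (scale), and of course the images within one octave also use different kernel sizes. This way SURF skips the downsampling step, so its processing speed naturally goes up.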
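To illustrate "the filter grows, the image does not", the sketch below lists box filter side lengths per octave. It is an assumption on my side: the numbers follow the 9, 15, 21, 27 scheme of the original SURF paper by Bay et al., which these notes do not spell out, and they are filter side lengths in pixels rather than the parameter L used above. The function name is made up.

```python
def surf_filter_sizes(n_octaves=4, n_intervals=4):
    """Box filter side lengths per octave, assuming the 9, 15, 21, 27 scheme."""
    return [
        [3 * (2 ** (o + 1) * (i + 1) + 1) for i in range(n_intervals)]
        for o in range(n_octaves)
    ]


for o, sizes in enumerate(surf_filter_sizes(), start=1):
    print(f"octave {o}: {sizes}")
# octave 1: [9, 15, 21, 27]
# octave 2: [15, 27, 39, 51]
# octave 3: [27, 51, 75, 99]
# octave 4: [51, 99, 147, 195]
```

The step between consecutive filters doubles from one octave to the next (6, 12, 24, 48), while every filter is applied to the same full-size integral image.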

Summary of the SURF algorithm: http://www.cnblogs.com/tornadomeet/archive/2012/08/17/2644903.html

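Use non-maximum suppression to get preliminary feature points
if $Det(H_{approx})>0\rightarrow$ the eigenvalues have the same sign $\rightarrow$ the point is classified as an extremum
Locate the extrema precisely
As in SIFT, 3D linear interpolation is used to obtain sub-pixel feature points, and points whose value is below a given threshold are discarded.
Select the dominant orientation of the feature point.
This step also differs a lot from SIFT. SIFT selects the dominant orientation by building a gradient histogram over the neighborhood of the feature point and taking the direction of the largest bin, plus any direction whose bin exceeds 80% of the largest, as dominant orientations. SURF does not build a gradient histogram; it collects Haar wavelet responses in the neighborhood instead. Within the neighborhood (for example, a circle of radius 6s, where s is the scale of the point), the horizontal and vertical Haar wavelet responses of all points inside a 60-degree sector are summed, with a Haar wavelet side length of 4s, so each sector yields one value. The 60-degree sector is then rotated in fixed steps, and the direction of the sector with the largest value is taken as the dominant orientation of the feature point. The process is illustrated in the figure below:
Build the SURF feature descriptor
In SIFT, a 16*16 neighborhood is taken around the feature point and divided into 4*4 sub-regions. Each sub-region accumulates gradients in 8 directions, giving a 4*4*8=128-dimensional vector, which is the SIFT descriptor of the point.
In SURF, a square window is also taken around the feature point, with side length 20s (s is the scale at which the feature point was detected). The window is oriented, and its orientation is of course the dominant orientation found in the previous step. The window is divided into 16 sub-regions, and each sub-region collects the horizontal and vertical Haar wavelet responses of 25 pixels, where horizontal and vertical are relative to the dominant orientation. The Haar wavelet features are the sum of the horizontal responses, the sum of their absolute values, the sum of the vertical responses, and the sum of their absolute values. The process is illustrated in the figure below: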

summary of [an analysis of the surf method]

3.3.3 Scale normalization
- differential operators have to be normalized when applied in a linear scale space in order to achieve scale-invariant detection of local features
- the normalized Gaussian derivatives are mentioned in chapter 2.2 of the book 图像局部不变性特征和描述 (local invariant image features and descriptors)
- concretely: normalize the 2nd derivative operators by multiplying with $\sigma^2$ $\rightarrow$ the DoH operator and the Laplacian operator both have to be multiplied by $\sigma^2$ $\rightarrow$ the box filters have to be normalized too (Frobenius comes up again here). Result: box filter responses should be divided by the scale parameter L to achieve scale-invariant detection
- doesn't SIFT have this issue?

4 interest point detection

4.1 feature filtering
Convolve with the DoH operator. The DoH filter is white in the center and black around it (see Figure 12 of the paper), so a strong response indicates a blob.
Response in SURF:

$DOH^L(u):=\frac{1}{L^4}Det(H_{approx})=\frac{1}{L^4}\left(D_{xx}^Lu\cdot D_{yy}^Lu-(0.9D_{xy}^Lu)^2\right)$
computation using the integral image
note that not every pixel at large scales is tested, which reduces computation time (?)
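Written as code, the response formula above is a one-liner once the three box filter responses at a pixel are known. A minimal sketch, with a made-up function name and made-up input values:

```python
def doh_response(dxx, dyy, dxy, L, w=0.9):
    """Scale-normalized determinant-of-Hessian response from box filter outputs."""
    return (dxx * dyy - (w * dxy) ** 2) / L ** 4


print(doh_response(dxx=-6.0, dyy=-6.0, dxy=0.0, L=1))  # blob-like: 36.0
print(doh_response(dxx=-6.0, dyy=6.0, dxy=0.0, L=1))   # saddle-like: -36.0
print(doh_response(dxx=-6.0, dyy=-6.0, dxy=0.0, L=2))  # same values at L=2: 2.25
```

A positive response means both curvatures have the same sign (a blob), a negative one means a saddle, and the division by $L^4$ keeps responses comparable across scales.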

4.2 feature detection
find the extrema of the response value over the 26-neighborhood, same as SIFT

Besides the (normalized) DoH, other response values can be used: the normalized Laplacian detector, the scale-invariant Harris corner detector, the affine-invariant blob detector

4.3 scale space location refinement

s1 Taylor expansion (discrete -> continuous): optimize the 3-parameter function of x, y, L (scale variable) to get the refined point

s2 the refined point may fall outside the neighborhood of X; if so, discard it.
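Steps s1 and s2 can be sketched with finite differences on the 3×3×3 block of responses around a candidate. This is a sketch under assumptions, not the paper's implementation: the block is indexed `D[scale][y][x]`, the names `det3` and `refine_offset` are made up, and the rejection bound `max_offset=1.0` is a hypothetical choice for "outside the neighborhood".

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def refine_offset(D, max_offset=1.0):
    """Return the (dx, dy, ds) offset of the interpolated extremum, or None."""
    c = D[1][1][1]
    # Gradient by central differences
    g = [
        (D[1][1][2] - D[1][1][0]) / 2.0,
        (D[1][2][1] - D[1][0][1]) / 2.0,
        (D[2][1][1] - D[0][1][1]) / 2.0,
    ]
    # Hessian by finite differences
    hxx = D[1][1][2] - 2.0 * c + D[1][1][0]
    hyy = D[1][2][1] - 2.0 * c + D[1][0][1]
    hss = D[2][1][1] - 2.0 * c + D[0][1][1]
    hxy = (D[1][2][2] - D[1][2][0] - D[1][0][2] + D[1][0][0]) / 4.0
    hxs = (D[2][1][2] - D[2][1][0] - D[0][1][2] + D[0][1][0]) / 4.0
    hys = (D[2][2][1] - D[2][0][1] - D[0][2][1] + D[0][0][1]) / 4.0
    H = [[hxx, hxy, hxs], [hxy, hyy, hys], [hxs, hys, hss]]
    det = det3(H)
    if abs(det) < 1e-12:
        return None  # degenerate Hessian, no unique extremum
    # Solve H * offset = -g with Cramer's rule (the Taylor expansion's stationary point)
    offset = []
    for k in range(3):
        Hk = [row[:] for row in H]
        for r in range(3):
            Hk[r][k] = -g[r]
        offset.append(det3(Hk) / det)
    if max(abs(v) for v in offset) > max_offset:
        return None  # s2: the refined point left the neighborhood, discard it
    return tuple(offset)


def f(x, y, s):
    # Synthetic quadratic response with its maximum at (0.2, -0.1, 0.3)
    return 10.0 - (x - 0.2) ** 2 - 2.0 * (y + 0.1) ** 2 - 0.5 * (s - 0.3) ** 2


D = [[[f(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
print(refine_offset(D))  # close to (0.2, -0.1, 0.3)
```

For a quadratic response the finite differences are exact, so the recovered offset matches the true maximum up to rounding.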

Open question: how does this compare with SIFT (low contrast + edge filtering), and where is the Taylor expansion there?

5 interest point description

For comparison, we need to encode a local descriptor for the neighborhood of each interest point.

The descriptor representing the interest point is expected to be 1. geometrically invariant and 2. robust to various perturbations such as noise, illumination or contrast change.

For 1, scale and translation invariance are offered by the scale space analysis (normalization + ?). As for rotational invariance, a dominant orientation is needed (5.2).

5.1 scale of interest point := scale of the corresponding linear scale space, $\sigma_k=\lfloor0.4L_k\rceil$, where L is the scale parameter of the box-space

5.2 dominant orientation of an interest point
consider the neighborhood := the disk of radius $6\sigma$ centered at the point
compute the gradient at each pixel at the scale of the interest point $\sigma_k$
the gradients need to be weighted by a discrete Gaussian kernel to reduce the impact of remote points
orientation score function: compute a score vector by summing all the weighted gradients in the neighborhood
orientation of the interest point := the global maximum of the orientation score function
sampling strategy: not all the weighted gradients are calculated; they are sampled with a step equal to the scale of the interest point $\sigma_k$
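The orientation score can be sketched with the sliding 60-degree sector. This rests on an assumption: following the original SURF paper, the sector slides over the angles of the response vectors, and the inputs are taken to be already Gaussian-weighted. The function name, the number of sector positions `n_steps`, and the sample responses are made up for illustration.

```python
import math


def dominant_orientation(responses, window=math.pi / 3, n_steps=72):
    """Angle (radians) of the longest summed response vector over all sector positions.

    responses is a list of (dx, dy) pairs, assumed to be already weighted.
    """
    two_pi = 2.0 * math.pi
    best_norm, best_angle = -1.0, 0.0
    for k in range(n_steps):
        start = two_pi * k / n_steps
        sx = sy = 0.0
        for dx, dy in responses:
            ang = math.atan2(dy, dx) % two_pi
            # Keep responses whose direction falls inside [start, start + window)
            if (ang - start) % two_pi < window:
                sx += dx
                sy += dy
        norm = math.hypot(sx, sy)
        if norm > best_norm:
            best_norm, best_angle = norm, math.atan2(sy, sx)
    return best_angle


# Three responses pointing roughly "up" and one outlier pointing "right"
responses = [(0.0, 2.0), (0.2, 1.5), (-0.2, 1.5), (1.0, 0.0)]
print(math.degrees(dominant_orientation(responses)))  # about 90 degrees
```

The outlier is more than 60 degrees away from the other three responses, so no sector contains all four and the upward group wins.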

5.3 surf descriptors

| sift | surf |
| --- | --- |
| | 16*4 vector |
| build a histogram of gradient statistics, weighted by gradient magnitude | compute 1st order statistics on the x and y direction gradient responses, as a compromise between compactness and efficiency |
| | scale-normalized sampling: sample at a step of $\sigma_k$; change of coordinates (normalized coordinates, in chapter 2 of the book?); gradient normalization; gradient sta… The sign of the Laplacian is also computed; it is said to provide some information about the local contrast at the interest point, which accelerates matching |
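The 16*4 layout can be sketched as follows. It assumes the responses have already been sampled on a 20x20 grid relative to the dominant orientation, so that each of the 4*4 sub-regions holds 5*5 = 25 samples. The function name and the synthetic input are made up, and the final scaling to unit length is my reading of "gradient normalization".

```python
import math


def surf_like_descriptor(dx, dy):
    """Build a 64-value descriptor from two 20x20 grids of responses."""
    desc = []
    for by in range(4):  # 4 x 4 sub-regions
        for bx in range(4):
            s_dx = s_adx = s_dy = s_ady = 0.0
            for y in range(5 * by, 5 * by + 5):  # 5 x 5 = 25 samples each
                for x in range(5 * bx, 5 * bx + 5):
                    s_dx += dx[y][x]
                    s_adx += abs(dx[y][x])
                    s_dy += dy[y][x]
                    s_ady += abs(dy[y][x])
            # Four 1st order statistics per sub-region
            desc.extend([s_dx, s_adx, s_dy, s_ady])
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Synthetic responses: constant horizontal response, checkerboard vertical response
dx = [[1.0] * 20 for _ in range(20)]
dy = [[1.0 if (x + y) % 2 == 0 else -1.0 for x in range(20)] for y in range(20)]
desc = surf_like_descriptor(dx, dy)
print(len(desc))  # 64
```

The checkerboard input shows why the absolute sums are kept: the signed vertical sum nearly cancels in every sub-region, while the sum of absolute values still records that there is strong vertical variation.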

5.4 feature matching
for surf: 64-dimensional vector
feature comparison

for any query descriptor Xk in the 1st image -> calculate the distance to all the candidate features Yl from the 2nd image
for surf, to simplify the calculation, first use the sign of the Laplacian to judge whether they can be similar -> different signs, unlikely to be the same feature
other accelerations of the calculation: approximate nearest neighbor search, based on data structures (kd-tree, vector quantization, etc.)

steps for feature matching
- s1 feature comparison
  - exhaustively compare the feature vectors in the query image with the feature vectors in the training set by Euclidean distance (or another distance)
  - note that this step can be sped up by first comparing the sign of the Laplacian of the SURF features (calculated in parallel, as mentioned by the authors of SURF) and discarding pairs with different signs
  - besides, there are other ways (nearest neighbour search based on data structures such as kd-tree and vector quantization)
- s2 matching criterion
  - extract the most significant correspondences among them: calculate the similarity, which is defined by the distance
  - two techniques are introduced: nearest neighbour distance threshold (NNDT) and nearest neighbour distance ratio (NNDR, proposed by Lowe, outperforms NNDT)
  - NNDR: consider the ratio between the distances to the 1st and 2nd nearest neighbors to measure the quality of the match between the query and its first candidate
- s3 geo consistency (geometric verification)
  - reason: image registration (geometric deformation); object detection and recognition
  - method: ORSA algorithm (combines RANSAC + hypothesis testing)
  - general idea: try to find a fundamental matrix that represents the geometric transformation between the query image and the matching image. The fitting step yields the inliers and the error, based on which the images can be re-ranked (possibly combined with the similarity value and weighted by the confidence).
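Steps s1 and s2 can be sketched as a brute-force matcher with the Laplacian sign pre-check and the NNDR test. Several choices here are assumptions: each feature is a `(laplacian_sign, descriptor)` pair, the ratio threshold 0.8 is a commonly quoted value rather than one given in these notes, and a query with fewer than two same-sign candidates is simply left unmatched. The geometric verification of s3 is not covered.

```python
import math


def match_nndr(query, train, ratio=0.8):
    """Return (query_index, train_index) pairs that pass the distance ratio test."""
    matches = []
    for qi, (q_sign, q_desc) in enumerate(query):
        # s1: compare only with candidates whose Laplacian sign agrees
        candidates = sorted(
            (math.dist(q_desc, t_desc), ti)
            for ti, (t_sign, t_desc) in enumerate(train)
            if t_sign == q_sign
        )
        # s2: keep the nearest neighbor only if it clearly beats the second nearest
        if len(candidates) >= 2 and candidates[0][0] < ratio * candidates[1][0]:
            matches.append((qi, candidates[0][1]))
    return matches


query = [(+1, [0.0, 1.0]), (-1, [1.0, 0.0])]
train = [
    (+1, [0.0, 0.9]),   # clear match for query 0
    (+1, [1.0, 1.0]),
    (-1, [1.0, 0.06]),  # two near-identical candidates for query 1 ...
    (-1, [1.0, 0.05]),  # ... so the ratio test rejects the match as ambiguous
    (+1, [1.0, 0.0]),   # identical to query 1 but with the opposite sign, so skipped
]
print(match_nndr(query, train))  # [(0, 0)]
```

For large descriptor sets the exhaustive loop would be replaced by one of the approximate nearest neighbor structures mentioned in the list.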
