MingjaLee's Blog

A Novice's Progress: The First Movement


Distinctive Image Features from Scale-Invariant Keypoints

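This post is a distillation of Lowe's SIFT paper. It marks the points I needed to understand most carefully while reading, so that I do not have to reread the whole paper when I come back to it later. (For others, it is offered as a reference only.)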

1. Introduction

  • Scale-space extrema detection
  • Keypoint localization
  • Orientation assignment
  • Keypoint descriptor


3. Detection of scale-space extrema

Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).

  • Build the scale space
  • Use the DoG as an approximation of the LoG to find keypoints (detect the extrema of the DoG scale space); see Figure 2
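
The two steps above can be sketched in plain Python. This is only a minimal illustration, not Lowe's implementation: the function names are hypothetical, the defaults sigma = 1.6 and s = 3 intervals per octave are the values reported in the paper, and each level is blurred directly from the input rather than incrementally. A DoG image is the difference of two adjacent blurred images, D(x, y, σ) = L(x, y, kσ) - L(x, y, σ), with k = 2^(1/s).

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + i - radius, 0), width - 1)]
             for i, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def dog_octave(image, sigma=1.6, s=3):
    """Return the s + 2 DoG images built from s + 3 blurred images of one octave."""
    k = 2 ** (1.0 / s)
    blurred = [gaussian_blur(image, sigma * k ** i) for i in range(s + 3)]
    return [
        [[hi - lo for lo, hi in zip(row_lo, row_hi)]
         for row_lo, row_hi in zip(img_lo, img_hi)]
        for img_lo, img_hi in zip(blurred, blurred[1:])
    ]
```

The image is assumed to be a list of rows of floats. A full pyramid would then downsample by a factor of 2 and repeat the same construction for the next octave.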

3.1 Local extrema detection

In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below (see Figure 2). It is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low due to the fact that most sample points will be eliminated following the first few checks.
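
As a rough sketch of this 26-neighbor comparison, assume the DoG images of one octave are stored as a nested list dogs[s][y][x] (a layout chosen here for illustration only). The function name is hypothetical. Because all() stops at the first failing comparison, most points are rejected after a few checks, which matches the remark about cost.

```python
from itertools import product

# The 26 offsets (ds, dy, dx) around a sample: 8 in the same image, 9 above, 9 below.
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

def is_local_extremum(dogs, s, y, x):
    """True if dogs[s][y][x] is strictly larger, or strictly smaller, than all 26 neighbors.

    The caller must pass interior indices (at least 1 and at most size - 2 on
    every axis); Python would silently wrap a negative index around.
    """
    value = dogs[s][y][x]

    def neighbors():
        for ds, dy, dx in OFFSETS:
            yield dogs[s + ds][y + dy][x + dx]

    return (all(value > n for n in neighbors())
            or all(value < n for n in neighbors()))
```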

Figure 3

3.2 Frequency of sampling in scale

To summarize, these experiments show that the scale-space difference-of-Gaussian function has a large number of extrema and that it would be very expensive to detect them all. Fortunately, we can detect the most stable and useful subset even with a coarse sampling of scales.

3.3 Frequency of sampling in the spatial domain

Figure 4

Just as we determined the frequency of sampling per octave of scale space, so we must determine the frequency of sampling in the image domain relative to the scale of smoothing. Given that extrema can be arbitrarily close together, there will be a similar trade-off between sampling frequency and rate of detection. Figure 4 shows an experimental determination of the amount of prior smoothing, σ, that is applied to each image level before building the scale space representation for an octave.

Of course, if we pre-smooth the image before extrema detection, we are effectively discarding the highest spatial frequencies. Therefore, to make full use of the input, the image can be expanded to create more sample points than were present in the original. We double the size of the input image using linear interpolation prior to building the first level of the pyramid.
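
The doubling step could look roughly like the following. It is a sketch under assumptions: the image is a list of rows of floats, output pixel (i, j) samples the input at (i / 2, j / 2), and samples past the last row or column are clamped. The paper does not prescribe these border details, and the function name is hypothetical.

```python
def double_size(image):
    """Upsample a 2D list of floats to twice its size with bilinear interpolation."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(2 * height):
        y = i / 2.0
        y0 = int(y)
        y1 = min(y0 + 1, height - 1)   # clamp at the bottom border
        fy = y - y0
        row = []
        for j in range(2 * width):
            x = j / 2.0
            x0 = int(x)
            x1 = min(x0 + 1, width - 1)  # clamp at the right border
            fx = x - x0
            top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
            bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
            row.append((1 - fy) * top + fy * bottom)
        out.append(row)
    return out
```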

4. Accurate keypoint localization

Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.
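
For reference, the fit described in the paper is a second-order Taylor expansion of D around the sample point, written here with x = (x, y, σ)^T as the offset from that point and with D and its derivatives evaluated at the sample point. The formulas below are recalled from Lowe's paper rather than taken from these notes, so check them against the original before relying on them.

```latex
\begin{aligned}
D(\mathbf{x}) &= D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \mathbf{x}, \\
\hat{\mathbf{x}} &= -\left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}}, \\
D(\hat{\mathbf{x}}) &= D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}.
\end{aligned}
```

In the paper, if the offset exceeds 0.5 in any dimension, the extremum lies closer to a different sample point and the fit is repeated there. Extrema with |D(x̂)| below 0.03 (for pixel values in [0, 1]) are discarded as low contrast.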

4.1 Eliminating edge responses

For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-Gaussian function will have a strong response along edges, even if the location along the edge is poorly determined and therefore unstable to small amounts of noise.
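
The paper handles this with the 2x2 Hessian H of D at the keypoint: the keypoint is kept only if Tr(H)^2 / Det(H) < (r + 1)^2 / r, with r = 10. A minimal sketch follows, assuming the DoG image is a list of rows and that the derivatives are estimated by finite differences of neighboring samples. The function name is hypothetical.

```python
def passes_edge_test(dog, y, x, r=10.0):
    """Return True if the ratio of principal curvatures at (y, x) is below r."""
    # Second derivatives estimated from differences of neighboring samples.
    dxx = dog[y][x + 1] - 2 * dog[y][x] + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2 * dog[y][x] + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a flat direction): not a usable extremum.
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

A round blob has two similar curvatures, so the ratio stays near its minimum of 4. Along an edge one curvature is much smaller than the other, the ratio grows, and the point is rejected.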

5. Orientation assignment

By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation. This approach contrasts with the orientation invariant descriptors of Schmid and Mohr (1997), in which each image property is based on a rotationally invariant measure. The disadvantage of that approach is that it limits the descriptors that can be used and discards image information by not requiring all measures to be based on a consistent rotation.

Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy.
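
The peak selection and parabola fit can be sketched as below. The input is assumed to be an already accumulated, circular orientation histogram (the paper uses 36 bins covering 360 degrees). Treating bin i as centered at i * bin_width degrees is a convention chosen for this sketch, and the function name is hypothetical.

```python
def dominant_orientations(hist, peak_ratio=0.8):
    """Return interpolated angles (degrees) of all local peaks within peak_ratio of the maximum."""
    n = len(hist)
    bin_width = 360.0 / n
    threshold = peak_ratio * max(hist)
    angles = []
    for i, center in enumerate(hist):
        left = hist[i - 1]            # index -1 wraps around: the histogram is circular
        right = hist[(i + 1) % n]
        if center < threshold or center <= left or center <= right:
            continue
        # Vertex of the parabola through (-1, left), (0, center), (1, right).
        offset = 0.5 * (left - right) / (left - 2 * center + right)
        angles.append(((i + offset) * bin_width) % 360.0)
    return angles
```

Each returned angle would produce its own keypoint at the same location and scale. A plateau of exactly equal neighboring bins is not reported as a peak in this simplified version.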

Figure 10

Figure 12

6. The local image descriptor - assigning each keypoint a 128-dimensional vector of orientation-histogram values

Figure 13

Figure 15

6.1 Descriptor representation

Figure 7

7. Application to object recognition

7.1 Keypoint matching

Figure 11

7.2 Efficient nearest neighbor indexing

No algorithms are known that can identify the exact nearest neighbors of points in high dimensional spaces that are any more efficient than exhaustive search. Our keypoint descriptor has a 128-dimensional feature vector, and the best algorithms, such as the k-d tree (Friedman et al., 1977) provide no speedup over exhaustive search for more than about 10 dimensional spaces. Therefore, we have used an approximate algorithm, called the Best-Bin-First (BBF) algorithm (Beis and Lowe, 1997).
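
To make the idea concrete, here is a simplified sketch in the spirit of BBF, not the algorithm exactly as published by Beis and Lowe. It builds a k-d tree, then visits unexplored branches in order of their distance from the query using a priority queue, and stops after a fixed number of candidates. The function names and the tree layout are hypothetical; the default of 200 checks is the cutoff reported in Lowe's paper.

```python
import heapq

def build_kdtree(points, depth=0):
    """Build a k-d tree of nested dicts by cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bbf_nearest(root, query, max_checks=200):
    """Approximate nearest neighbor; returns (point, squared distance)."""
    best, best_d = None, float("inf")
    heap = [(0.0, 0, root)]   # (lower bound on squared distance, tie-breaker, node)
    counter, checks = 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break             # no remaining branch can contain a closer point
        while node is not None and checks < max_checks:
            d = sq_dist(node["point"], query)
            checks += 1
            if d < best_d:
                best, best_d = node["point"], d
            diff = query[node["axis"]] - node["point"][node["axis"]]
            near, far = ((node["left"], node["right"]) if diff < 0
                         else (node["right"], node["left"]))
            if far is not None:
                # Queue the far branch, keyed by its distance to the splitting plane.
                heapq.heappush(heap, (diff * diff, counter, far))
                counter += 1
            node = near
    return best, best_d
```

With a very large max_checks the search visits every branch that could hold a closer point and so returns the exact nearest neighbor. The speedup comes from accepting the best candidate found within the budget.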

7.3 Clustering with the Hough transform

To maximize the performance of object recognition for small or highly occluded objects, we wish to identify objects with the fewest possible number of feature matches. We have found that reliable recognition is possible with as few as 3 features.


Copyright notice: this is an original article by the blog author; please credit the source when reposting. https://blog.csdn.net/lijiang1991/article/details/50856251