[Repost] [Machine Vision] Feature detection and matching (I)

Latest recommended article published on 2024-07-02 23:50:51

byliut

Article tags: computer vision, c++

Original link: https://blog.sciencenet.cn/blog-942948-698828.html

Feature detection and matching

1. Points and Patches (Feature detectors, Feature descriptors, Feature matching, Feature tracking, Application: performance-driven animation).
2. Edges (Edge detection, Edge linking, Application: Edge editing and enhancement).
3. Lines (Successive approximation, Hough transform, Vanishing points, Application: Rectangle detection).

Introduction

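A range of feature detectors and descriptors can be used to analyze, describe, and match images. These include point-like interest operators, region-like interest operators, edge operators, and straight-line detectors. Feature detection and matching are an essential component of many computer vision applications. For some image pairs, we want to align the two images so that they can be seamlessly stitched into a composite mosaic. For other image pairs, we want to establish a set of correspondences so that a 3D model can be constructed or an in-between view can be generated. In both cases, distinctive features are detected and matched in order to achieve the alignment or to produce the set of correspondences.

The first kind of feature is a specific location in the image (keypoint features), such as mountain peaks, building corners, doorways, or distinctively shaped patches of snow. These localized features are usually called keypoint features or interest points, and they are typically described by the appearance of the patches of pixels surrounding the point location. The second important kind of feature is edges (edge features), such as the profile of a mountain. Edge features can be matched based on their orientation and local appearance (edge profiles are one example), and they are also good indicators of object boundaries and occlusion events. Edges can be grouped into longer curves and straight line segments, which can be matched directly or analyzed to find vanishing points.

This chapter focuses on practical approaches to detecting these features and discusses how to establish feature correspondences across different images. Keypoint features are used in a wide range of applications, while edges and lines provide information that is well suited to describing object boundaries and man-made objects.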

Points and Patches

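Point features can be used to find a sparse set of corresponding locations in two images. This is often a precursor to computing camera pose, which is in turn a prerequisite for computing a denser set of correspondences using stereo matching. Such correspondences can also be used to align different images, in particular when stitching image mosaics or performing video stabilization. They are also widely used for object instance and category recognition. (So correspondences serve two purposes: aligning images and recognizing objects.) A key advantage of keypoints is that they permit matching even in the presence of clutter (occlusion) and large changes in scale and orientation. (This is a genuinely powerful property.)

Feature-based correspondence techniques have been used since the early days of stereo matching and have more recently become very popular in image stitching and automated 3D modeling. There are two main approaches to finding feature points and their correspondences. The first is to find features in one image that can be accurately tracked using a local search technique such as correlation or least squares. The second is to detect features independently in every image and then match them based on their local appearance. The former is more suitable when images are taken from nearby viewpoints or in rapid succession (for example, video sequences). The latter is more suitable when a large amount of motion or appearance change is expected, for example when stitching together panoramas, establishing correspondences in wide-baseline stereo, or performing object recognition.

The keypoint detection and matching pipeline can be divided into four stages:

1. Feature detection: each image is searched for locations that are likely to match well in other images.
2. Feature description: the region around each detected keypoint location is converted into a more compact and stable descriptor.
3. Feature matching: likely matching candidates are searched for in the other images.
4. Feature tracking: an alternative to the third stage that only searches a small neighborhood around each detected feature.

A good example that demonstrates all of these stages is David Lowe's Scale Invariant Feature Transform (SIFT).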
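
To make the matching stage concrete, here is a minimal sketch of brute-force nearest-neighbor matching between two sets of descriptors. The function name `match_nearest`, the use of squared Euclidean distance, and the random test descriptors are assumptions made for illustration; the text above does not prescribe a particular matching strategy.

```python
import numpy as np

def match_nearest(desc_a, desc_b):
    """For each row of desc_a, return the index of the closest row in desc_b."""
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    return np.argmin(dist2, axis=1)

# Hypothetical data: desc_a is a shuffled, slightly noisy copy of desc_b.
rng = np.random.default_rng(0)
desc_b = rng.normal(size=(5, 8))
perm = np.array([3, 0, 4, 1, 2])
desc_a = desc_b[perm] + 0.01 * rng.normal(size=(5, 8))
matches = match_nearest(desc_a, desc_b)
```

Each entry of `matches` is the index in `desc_b` of the best candidate for the corresponding row of `desc_a`. A practical matcher would usually also reject ambiguous matches, which this sketch does not attempt.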

Feature Detectors

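Which image locations make good features to track, so that correspondences can be found reliably? Textureless patches are nearly impossible to localize. Patches with large gradients are easier to localize, although a straight line segment at a single orientation suffers from the aperture problem: the patch can only be aligned along the direction normal to the edge. Patches with gradients in at least two different orientations are the easiest to localize.

This intuition can be formalized by looking at the simplest possible matching criterion, the summed square difference (SSD). When performing feature detection, we do not know which other image locations the feature will end up being matched against. Therefore, we can only compute how stable this metric is with respect to small variations in position. (This is known as the auto-correlation function or surface.) Note that the auto-correlation surface for the textured flower bed (the red cross in the right-hand figure) exhibits a strong minimum. The correlation surface for the roof edge has a strong ambiguity along one direction, while the correlation surface for the cloud region has no stable minimum.

The classic "Harris" detector uses a [-2, -1, 0, 1, 2] filter, but more modern variants convolve the image with horizontal and vertical derivatives of a Gaussian. In the auto-correlation matrix A, the weighted summations are replaced with discrete convolutions. The matrix can be interpreted as a tensor image, in which the outer products of the gradients are convolved with a weighting function to provide a per-pixel estimate.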
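
The auto-correlation surface described above can be sketched directly as the SSD between a patch and shifted copies of itself. The function name `autocorr_surface`, the patch half-width, the shift range, and the three synthetic test images are assumptions chosen for illustration, and no weighting window is applied here.

```python
import numpy as np

def autocorr_surface(image, row, col, half=3, max_shift=2):
    """SSD between the patch centered at (row, col) and shifted copies of itself."""
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    size = 2 * max_shift + 1
    surface = np.zeros((size, size))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = image[row + dy - half:row + dy + half + 1,
                            col + dx - half:col + dx + half + 1]
            surface[dy + max_shift, dx + max_shift] = np.sum((shifted - patch) ** 2)
    return surface

flat = np.zeros((21, 21))        # textureless region
edge = np.zeros((21, 21))
edge[:, 10:] = 1.0               # vertical step edge
corner = np.zeros((21, 21))
corner[10:, 10:] = 1.0           # corner at (10, 10)

flat_s = autocorr_surface(flat, 10, 10)
edge_s = autocorr_surface(edge, 10, 10)
corner_s = autocorr_surface(corner, 10, 10)
```

The flat patch gives a surface that is zero everywhere (no stable minimum), the edge patch is zero for every shift along the edge (the aperture problem), and only the corner patch has a single isolated minimum at zero shift.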

Harris Corner

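Anandan, and Lucas and Kanade, were the first to analyze the uncertainty structure of the auto-correlation matrix; their work associated uncertainties with optic flow measurements. Harris was among the first to propose using local maxima of a rotationally invariant scalar measure to locate keypoints. These techniques also proposed using a Gaussian weighting window instead of the previously used square patches, which makes the detector response insensitive to in-plane image rotation.

The minimum eigenvalue is not the only quantity that can be used to find keypoints. A simpler quantity, proposed by Harris and Stephens, is det(A) - α trace(A)^2, with α typically around 0.06. Unlike eigenvalue analysis, this quantity does not require the use of square roots, yet it is still rotationally invariant and also downweights edge-like features.

The basic feature detection algorithm has the following steps:

1. Compute the horizontal and vertical derivatives of the image by convolving the original image with derivatives of Gaussians.
2. Compute the three images corresponding to the outer products of these gradients.
3. Convolve each of these images with a larger Gaussian.
4. Compute a scalar interest measure using one of the formulas discussed above.
5. Find local maxima above a certain threshold and report them as detected feature point locations.

An auto-correlation-based keypoint detector can be summarized by the five steps above. The interest operator responses for the classic Harris detector and the Difference of Gaussian (DoG) detector are discussed below.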
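
The detection steps in this section can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the derivative step is approximated by Gaussian smoothing followed by `np.gradient`, the scalar measure is det(A) - alpha * trace(A)^2 with alpha = 0.06, and the helper names, the two sigmas, the threshold, and the synthetic test image are all choices made here rather than values from the text.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with zero padding (kernel truncated at 3 sigma)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def harris_response(img, sigma_d=1.0, sigma_i=2.0, alpha=0.06):
    smoothed = gaussian_blur(img, sigma_d)
    iy, ix = np.gradient(smoothed)            # step 1: image derivatives
    ixx = gaussian_blur(ix * ix, sigma_i)     # steps 2 and 3: outer products,
    ixy = gaussian_blur(ix * iy, sigma_i)     # smoothed with a larger Gaussian
    iyy = gaussian_blur(iy * iy, sigma_i)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - alpha * trace ** 2           # step 4: scalar interest measure

def local_maxima(resp, threshold):
    """Step 5: 3x3 local maxima above the threshold, as (row, col) pairs."""
    peaks = []
    for r in range(1, resp.shape[0] - 1):
        for c in range(1, resp.shape[1] - 1):
            window = resp[r - 1:r + 2, c - 1:c + 2]
            if resp[r, c] > threshold and resp[r, c] == window.max():
                peaks.append((r, c))
    return peaks

img = np.zeros((40, 40))
img[12:28, 12:28] = 1.0                       # bright square with four corners
resp = harris_response(img)
peaks = local_maxima(resp, threshold=0.5 * resp.max())
```

On this synthetic square the response is positive near the four corners and negative along the straight edges, so `peaks` contains one location close to each corner.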

Adaptive Non-maximal Suppression (ANMS) and Measuring Repeatability

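Most feature detectors simply look for local maxima in the interest function. This can lead to an uneven distribution of feature points across the image; for example, points will be denser in regions of higher contrast. To mitigate this problem, Brown only detects features that are both local maxima and whose response value is significantly (10%) greater than that of all of their neighbors within a given radius.

Given the large number of feature detectors that are widely used in computer vision, how can we decide which one to use? Schmid was the first to propose measuring the repeatability of feature detectors. Repeatability is defined as the frequency with which keypoints detected in one image are found within 1.5 pixels of the corresponding location in another image. In that work, planar images are transformed by applying rotations, scale changes, illumination changes, viewpoint changes, and added noise. The information content available at each detected feature point is also measured, defined as the entropy of a set of rotationally invariant local grayscale descriptors.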
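
A rough sketch of ANMS follows. It assumes the usual formulation in which each point gets a suppression radius (the distance to the nearest point whose response is significantly stronger) and the points with the largest radii are kept. The function name `anms`, the parameter `c_robust = 0.9` (encoding the 10% margin), and the toy data are assumptions for illustration.

```python
import numpy as np

def anms(points, responses, num_keep, c_robust=0.9):
    """Return indices of the num_keep points with the largest suppression radius."""
    points = np.asarray(points, dtype=float)
    responses = np.asarray(responses, dtype=float)
    radii = np.full(len(points), np.inf)
    for i in range(len(points)):
        # Points that are significantly stronger than point i.
        stronger = responses[i] < c_robust * responses
        if np.any(stronger):
            dists = np.linalg.norm(points[stronger] - points[i], axis=1)
            radii[i] = dists.min()
    order = np.argsort(-radii, kind="stable")
    return order[:num_keep]

# Three strong responses clustered together and one weak, isolated response.
points = [(0, 0), (1, 0), (0, 1), (50, 50)]
responses = [10.0, 8.0, 7.0, 2.0]
keep = anms(points, responses, num_keep=2)
```

Picking the two strongest responses would return two points from the same cluster. ANMS instead keeps the strongest point and the isolated one, which gives a more even spatial distribution.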

Scale Invariance

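In many situations, detecting features only at the finest stable scale is not appropriate. For example, when matching images with little high-frequency detail, fine-scale features may not exist. One solution to this problem is to extract features at a variety of scales, for example by performing the same operations at multiple resolutions in a pyramid and then matching features at the same level. This kind of approach is suitable when the images being matched do not undergo large scale changes, for example when matching successive aerial images taken from an airplane or stitching panoramas taken with a fixed-focal-length camera.

However, for most object recognition applications, the scale of the object in the image is unknown. Instead of extracting features at many different scales and then matching all of them, it is more efficient to extract features that are stable in both location and scale.

Early investigations into scale selection were carried out by Lindeberg, who first proposed using extrema of the Laplacian of Gaussian (LoG) function. While Lowe's Scale Invariant Feature Transform (SIFT) performs well in practice, it is not based on the same theoretical foundation of maximum spatial stability as the auto-correlation-based detectors. In fact, its detection locations are often complementary to those produced by auto-correlation-based detectors, so the two methods are often used in conjunction. To add a scale selection mechanism to the Harris corner detector, Mikolajczyk evaluates the Laplacian of Gaussian function at each detected Harris point and keeps only those points for which the Laplacian is extremal (larger or smaller than both its coarser and finer-level values). An optional iterative refinement has also been proposed and evaluated.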
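
The idea of selecting a scale from LoG extrema can be illustrated with a small sketch. It assumes the scale-normalized form sigma^2 * laplacian(G_sigma * I), a 5-point discrete Laplacian, and a synthetic Gaussian blob as the test image; the function names and the list of candidate sigmas are made up for this example.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with zero padding (kernel truncated at 3 sigma)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def normalized_log(img, sigma):
    """Scale-normalized Laplacian of Gaussian response of the whole image."""
    s = gaussian_blur(img, sigma)
    lap = np.zeros_like(s)
    lap[1:-1, 1:-1] = (s[:-2, 1:-1] + s[2:, 1:-1] + s[1:-1, :-2] + s[1:-1, 2:]
                       - 4.0 * s[1:-1, 1:-1])
    return sigma ** 2 * lap

def characteristic_scale(img, row, col, sigmas):
    """Pick the sigma whose normalized LoG magnitude at (row, col) is largest."""
    magnitudes = [abs(normalized_log(img, s)[row, col]) for s in sigmas]
    return sigmas[int(np.argmax(magnitudes))]

# Synthetic Gaussian blob with standard deviation 4, centered at (32, 32).
yy, xx = np.mgrid[0:65, 0:65]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2.0 * 4.0 ** 2))
sigmas = [1.0, 2.0, 4.0, 8.0]
best = characteristic_scale(blob, 32, 32, sigmas)
```

For a Gaussian blob, the normalized LoG response at the center is extremal when sigma equals the blob's own standard deviation, so the selected scale tracks the size of the structure.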

Rotational Invariance and orientation estimation

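In addition to dealing with scale changes, most image matching and object recognition algorithms need to deal with in-plane image rotation. One way to do this is to design descriptors that are rotationally invariant. However, such descriptors have poor discriminability, meaning that they map different-looking patches to the same descriptor. A better method is to estimate a dominant orientation at each detected keypoint. Once the local orientation and scale of a keypoint have been estimated, a scaled and oriented patch around the detected point can be extracted and used to form a feature descriptor.

The simplest orientation estimate is the average gradient within a region around the keypoint. If a Gaussian weighting function is used, this average gradient is equivalent to a first-order steerable filter; that is, it can be computed by convolving the image with the horizontal and vertical derivatives of a Gaussian filter. To make this estimate more reliable, it is usually preferable to use an aggregation window that is larger than the detection window. Sometimes, however, the averaged gradient can be small and is therefore an unreliable indicator of orientation. A more reliable technique is to look at the histogram of orientations computed around the keypoint. Lowe computes a 36-bin histogram of edge orientations, weighted by both gradient magnitude and Gaussian distance to the center, and from it computes a more accurate orientation estimate.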
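
As an illustration of the histogram approach, the sketch below builds a 36-bin orientation histogram weighted by gradient magnitude and a Gaussian window, then returns the center of the strongest bin. The function name, the default window width, and the use of the bin center as the estimate (with no further refinement of the peak) are assumptions made here.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36, sigma=None):
    """Center (in degrees) of the strongest gradient-orientation histogram bin."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                       # row and column derivatives
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    h, w = patch.shape
    if sigma is None:
        sigma = 0.5 * max(h, w)                       # assumed window width
    yy, xx = np.mgrid[0:h, 0:w]
    weight = np.exp(-((yy - (h - 1) / 2.0) ** 2 + (xx - (w - 1) / 2.0) ** 2)
                    / (2.0 * sigma ** 2))
    hist, edges = np.histogram(angle, bins=num_bins, range=(0.0, 360.0),
                               weights=magnitude * weight)
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])

# Linear intensity ramps with a known gradient direction (y axis points down).
yy, xx = np.mgrid[0:16, 0:16]
theta_a = dominant_orientation(2 * xx + yy)   # gradient (gx, gy) = (2, 1)
theta_b = dominant_orientation(-xx + yy)      # gradient (gx, gy) = (-1, 1)
```

With 10-degree bins, the first ramp (true angle about 26.6 degrees) falls in the 20 to 30 degree bin and the second (135 degrees) in the 130 to 140 degree bin. Angles here are measured in image coordinates, with rows increasing downward.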

Affine Invariance

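While scale and rotation invariance are highly desirable, for some applications, such as wide-baseline stereo matching or location recognition, full affine invariance is preferred. Affine-invariant detectors respond consistently not only to scale and orientation changes but also to affine deformations such as perspective foreshortening. In fact, for a small enough patch, any continuous image warping can be approximated by an affine deformation. To introduce affine invariance, several authors have proposed fitting an ellipse to the auto-correlation or Hessian matrix and then using the principal axes of this fit as the affine coordinate frame.

Another important affine-invariant region detector is the maximally stable extremal region (MSER) detector. To detect MSERs, binary regions are computed by thresholding the image at all possible gray levels. This operation can be performed efficiently by first sorting all pixels by gray value and then incrementally adding pixels to each connected component as the threshold is changed. The area of each component is monitored as the threshold changes, and regions whose rate of change of area with respect to the threshold is minimal are defined as maximally stable. Such regions are therefore invariant to both geometric and photometric transformations. If desired, an affine coordinate frame can be fit to each detected region.

Feature point detection remains a very active research area, with new work, such as that of Xiao and Shah, appearing every year at the major computer vision conferences. Surveys of popular affine region detectors compare experimentally their invariance to common image transformations, including scaling, rotation, noise, and blur.

Of course, keypoints are not the only features that can be used to register images. Zoghlami uses line segments as well as point-like features to estimate homographies, while Bartoli uses line segments together with local correspondences to extract 3D structure and motion. Van Gool uses affine-invariant regions to detect correspondences. Corso uses a related technique to fit 2D oriented Gaussian kernels to homogeneous regions.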
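
The ellipse-fitting idea can be sketched with an eigenvalue decomposition. The example assumes the ellipse is defined by x^T A x = 1 for a symmetric positive definite 2x2 matrix A (one common convention; authors differ in how they scale the ellipse), and the function name and the example matrix are hypothetical.

```python
import numpy as np

def ellipse_from_matrix(a):
    """Semi-axis lengths and directions of the ellipse x^T A x = 1."""
    eigvals, eigvecs = np.linalg.eigh(a)   # eigenvalues in ascending order
    lengths = 1.0 / np.sqrt(eigvals)       # smaller eigenvalue -> longer axis
    return lengths, eigvecs                # column i is the direction of axis i

# Gradients vary four times more strongly along x than along y.
a = np.array([[4.0, 0.0],
              [0.0, 1.0]])
lengths, axes = ellipse_from_matrix(a)
```

Here the longer axis (length 1.0) points along y and the shorter axis (length 0.5) along x. The two eigenvector columns form the principal axes that would serve as the affine coordinate frame for resampling the patch.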

Feature Descriptor

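After detecting features (keypoints), we must match them; that is, we must determine which features come from corresponding locations in different images. In some situations, such as video sequences or stereo pairs that have been rectified, the local motion around each feature point is mostly translational. In these cases, simple error metrics such as the sum of squared differences (SSD) or normalized cross-correlation can be used to compare the intensities of the small patches around each feature point. Because feature points may not be exactly located, a more accurate matching score can be computed by performing incremental motion refinement, but this can be very time consuming and can sometimes even decrease performance.

In most cases, however, the local appearance of features changes in orientation and scale, and sometimes even undergoes affine deformation. Extracting a local scale, orientation, or affine frame estimate and then using it to resample the patch before forming the feature descriptor is therefore usually preferable. Even after compensating for these changes, the local appearance of image patches usually still varies from image to image. How can we make image descriptors more invariant to such changes while still preserving discriminability between different patches? Mikolajczyk reviews a number of recently developed view-invariant local image descriptors and compares their performance experimentally. Below, we describe a few of these descriptors in more detail.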
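
The two simple error metrics mentioned above can be sketched as follows. The function names `ssd` and `ncc` and the random test patch are illustrative choices; the zero-mean form of normalized cross-correlation is assumed.

```python
import numpy as np

def ssd(p, q):
    """Sum of squared differences between two equally sized patches."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2))

def ncc(p, q):
    """Zero-mean normalized cross-correlation, in the range [-1, 1]."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p = p - p.mean()
    q = q - q.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float(p @ q / denom) if denom > 0 else 0.0

rng = np.random.default_rng(1)
patch = rng.random((8, 8))
brighter = 1.5 * patch + 0.2     # same patch under a gain and bias change
```

SSD is zero only for identical patches and grows under the brightness change, while the normalized cross-correlation of `patch` and `brighter` stays at 1 because it discards the mean and the overall contrast.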

Bias and gain normalization(MOPS)

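For tasks that do not exhibit large amounts of foreshortening, such as image stitching, simple normalized intensity patches perform well and are easy to implement. To compensate for slight inaccuracies in the feature point detector, these multi-scale oriented patches (MOPS) are sampled using a coarser level of the image pyramid to avoid aliasing. To compensate for affine photometric variations, patch intensities are re-scaled so that their mean is zero and their variance is one.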
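
The bias and gain normalization step is short enough to sketch directly. The function name `normalize_patch` and the small `eps` guard against division by zero are assumptions added for this example.

```python
import numpy as np

def normalize_patch(patch, eps=1e-12):
    """Rescale intensities to zero mean and unit variance."""
    patch = np.asarray(patch, dtype=float)
    return (patch - patch.mean()) / (patch.std() + eps)

rng = np.random.default_rng(2)
patch = rng.random((8, 8))
normalized = normalize_patch(patch)
relit = normalize_patch(2.0 * patch + 3.0)   # same patch, different gain and bias
```

After normalization, a patch and its affinely re-exposed copy (positive gain plus bias) map to essentially the same values, which is what makes the descriptor insensitive to such photometric changes.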

Scale Invariant Feature Transform (SIFT)

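SIFT features are formed by computing the gradient at each pixel in a 16×16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid. The gradient magnitudes are downweighted by a Gaussian fall-off function, which reduces the influence of gradients far from the center. In each 4×4 quadrant (one cell of the 4×4 grid that divides the window), a gradient orientation histogram is formed by adding the weighted gradient value to one of eight orientation histogram bins. To reduce the effects of location and dominant orientation misestimation, each weighted gradient magnitude is softly added to 2×2×2 histogram bins. The resulting 128 non-negative values (16 cells × 8 bins) form a raw version of the SIFT descriptor. To reduce the effects of contrast or gain, the 128-D vector is normalized to unit length. To make the descriptor more robust to other photometric variations, the values are clipped to 0.2 and the resulting vector is once again normalized to unit length.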
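
The descriptor layout and the normalize, clip, renormalize step can be sketched as below. This is a simplified illustration, not Lowe's implementation: it uses hard binning instead of distributing each gradient over 2×2×2 bins, it does not rotate the patch to the dominant orientation, and the Gaussian width `sigma = 8.0` and the function names are assumptions.

```python
import numpy as np

def normalize_clip(vec, clip=0.2):
    """Normalize to unit length, clip large values, then renormalize."""
    norm = np.linalg.norm(vec)
    if norm == 0:
        return vec
    vec = np.minimum(vec / norm, clip)
    return vec / np.linalg.norm(vec)

def sift_like_descriptor(patch, clip=0.2):
    """128-D descriptor from a 16x16 patch: 4x4 cells with 8 orientation bins each."""
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)
    angle = np.arctan2(gy, gx) % (2.0 * np.pi)
    yy, xx = np.mgrid[0:16, 0:16]
    sigma = 8.0                                   # assumed Gaussian fall-off width
    fall_off = np.exp(-((yy - 7.5) ** 2 + (xx - 7.5) ** 2) / (2.0 * sigma ** 2))
    magnitude = np.hypot(gx, gy) * fall_off
    bins = np.minimum((angle / (2.0 * np.pi) * 8).astype(int), 7)
    hist = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            hist[r // 4, c // 4, bins[r, c]] += magnitude[r, c]   # hard binning
    return normalize_clip(hist.ravel(), clip)

rng = np.random.default_rng(3)
patch = rng.random((16, 16))
desc = sift_like_descriptor(patch)
desc_gain = sift_like_descriptor(2.0 * patch)     # same patch with doubled contrast
```

The result is a non-negative unit-length vector with 128 entries, and doubling the contrast of the patch leaves it unchanged because of the first normalization.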