Lost in quantization: improving particular object retrieval in large scale image databases

z609425072

Tags: image search, images

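Object retrieval differs from image retrieval in that, given a query image containing a specified object, we need to return results that contain that object. Because the object's appearance varies with illumination, viewpoint and so on, this problem is harder than retrieving whole images.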

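Current retrieval frameworks are all based on the BOW (bag-of-words) model, which maps high-dimensional local descriptors to a discrete vocabulary. The novelty of this paper is to map each descriptor to a weighted set of words instead. The original BOW model can give low recall. Query expansion can address this, but it only works when the initial recall is high enough, and it does not help much when the initial recall is low. The soft-assignment strategy proposed by the authors addresses this problem.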

A quick review of the state of the art (2008)

image description

Affine-invariant regions: MSER, Hessian regions

Descriptor: SIFT

quantization:

k-means, HKM, AKM.

search engine:

tf-idf

spatial verification:

affine homographies

query expansion:

average query expansion

These methods make up the baseline of the paper. So what is soft assignment?

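Many factors can affect the value of a descriptor, such as illumination and image noise. As a result, nearly identical descriptors may be mapped to different visual words. The flaw in the hard-assignment strategy is that it tries to build a framework that is invariant to all of these changes.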
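To make the failure mode of hard assignment concrete, here is a minimal sketch. The 2-D "vocabulary" and the two descriptors are toy values invented for illustration (real SIFT descriptors are 128-dimensional), and `nearest_word` is a hypothetical helper, not code from the paper.

```python
import math

def nearest_word(descriptor, centers):
    """Hard assignment: return the index of the closest cluster center."""
    return min(range(len(centers)),
               key=lambda i: math.dist(descriptor, centers[i]))

# Toy vocabulary with two visual words (hypothetical values).
centers = [(0.0, 0.0), (1.0, 0.0)]

# Two nearly identical descriptors on either side of the cell boundary x = 0.5,
# e.g. the same patch under slightly different noise.
a = (0.49, 0.0)
b = (0.51, 0.0)

word_a = nearest_word(a, centers)
word_b = nearest_word(b, centers)
print(word_a, word_b)  # 0 1
```

Although `a` and `b` are only 0.02 apart, they receive different visual words, so under hard assignment they can never be matched in the index.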

Concretely, there are two approaches:

1. We extract a single descriptor from each image patch and assign it to several visual words nearby in the descriptor space.

2. We extract a set of descriptors from each image patch by synthesizing deformations of the patch in the image space, and assign each descriptor to the nearest visual word.

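For the first approach, the weighted combination is based on the distance d between the descriptor and each cluster center, with weight exp(-d^2/sigma^2). (There are two parameters here: the scale parameter sigma, and the number r of nearest cells to use.)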
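The weighting for the first approach can be sketched as follows. This is only an illustration under stated assumptions: `soft_assign` is a hypothetical helper, the toy centers and the values of `r` and `sigma` are made up, the weight uses the form exp(-d^2/sigma^2) as written in these notes, and normalizing the weights to sum to 1 is my assumption.

```python
import math

def soft_assign(descriptor, centers, r=3, sigma=1.0):
    """Assign a descriptor to its r nearest visual words with distance-based weights."""
    # Keep the r closest cluster centers as (distance, word index) pairs.
    nearest = sorted((math.dist(descriptor, c), i)
                     for i, c in enumerate(centers))[:r]
    # Weight each word by exp(-d^2 / sigma^2), the form used in these notes.
    raw = [(i, math.exp(-d * d / (sigma * sigma))) for d, i in nearest]
    # Assumption: normalize so the weights of one descriptor sum to 1.
    # (If sigma is tiny relative to d, all weights can underflow to 0.)
    total = sum(w for _, w in raw)
    return {i: w / total for i, w in raw}

# Toy vocabulary (hypothetical values); the third word is far away.
centers = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
weights = soft_assign((0.49, 0.0), centers, r=2, sigma=1.0)
print(weights)
```

A descriptor sitting near the boundary between words 0 and 1 now receives nearly equal weight for both, so a slightly perturbed copy of it still shares visual words with it.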

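A standard index records how many times each visual word occurs, so its tf-idf counts are integers. How do we combine that with our weighted combination of visual words? For tf, we use the standard tf. For idf, counting each occurrence of a visual word as 1, no matter how small its weight, works best.

The number of tentative correspondences (defined as the set of features which share at least one visual word assignment) increases greatly. However, because the vocabulary is so specific and so large, the probability that unrelated features are mapped to the same visual word is still very small.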
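One way to read the tf and idf rules above is sketched below. This is my interpretation, not the authors' code: `soft_tf` and `idf_count_as_one` are hypothetical helpers, tf accumulates the soft weights per word and normalizes them, and idf treats a word as present in an image whenever any feature assigns it a non-zero weight.

```python
import math
from collections import defaultdict

def soft_tf(assignments):
    """assignments: one {word: weight} dict per feature in a single image."""
    tf = defaultdict(float)
    for a in assignments:
        for word, w in a.items():
            tf[word] += w  # accumulate soft weights instead of integer counts
    total = sum(tf.values())
    if total == 0:
        return {}
    return {word: v / total for word, v in tf.items()}

def idf_count_as_one(docs):
    """docs: list of images, each a list of {word: weight} dicts."""
    df = defaultdict(int)
    for assignments in docs:
        # A word counts once per image, however small its weight.
        for word in {word for a in assignments for word in a}:
            df[word] += 1
    return {word: math.log(len(docs) / n) for word, n in df.items()}

# Two toy images with one feature each (hypothetical weights).
docs = [[{0: 0.9, 1: 0.1}], [{1: 0.6, 2: 0.4}]]
idf = idf_count_as_one(docs)
tf0 = soft_tf(docs[0])
tfidf0 = {word: tf0[word] * idf[word] for word in tf0}
print(tfidf0)
```

In this toy example word 1 appears in both images, so its idf is log(2/2) = 0 even though its weight in the first image is only 0.1.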

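This set still has to be verified with a RANSAC-like procedure, which requires a scoring function that makes use of the weights rather than simply counting matches. (I do not fully understand this passage either; it seems to mean that a spatial re-ranking step is needed.)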

I did not understand the second approach...

The rest of the paper presents experimental results.