Pipeline of our image retrieval system

Motivation

At present, most image retrieval systems use the bag-of-words (BOW) model to obtain raw matching candidates and re-rank the results by enforcing spatial verification (e.g., an affine transform constraint). The accuracy is determined by the characteristics of the BOW model, while the retrieval efficiency is influenced by the complexity of estimating the geometric parameters between two sets of image features as well as by the length of the rank list. To ensure high retrieval precision, the rank list should be long, which lowers performance. Is it possible to achieve both high precision and high performance simultaneously? Our system is designed to answer this question.

Unlike existing retrieval schemes, our system attempts to explore the relationships between images in the database; that is, we want to know which images contain common objects or parts of a scene. Assuming we have a complete and clear picture of the relationship graph for the entire database, the task of retrieval becomes finding the graph nodes that are highly related to the query image (i.e., satisfy the spatial constraint) and selecting the remaining positive matches through graph traversal. Since the graph can be established offline in advance, the feature point correspondences between any two images are known, and spatial verification can be performed more efficiently. Meanwhile, retrieval precision is high since only relevant images are considered. Our system is implemented in the following steps.


Feature Extraction
We extract SIFT features for each image in the database using the binary code provided by Lowe. Our image database is the Oxford Buildings collection, which contains about five thousand high-resolution color images. We do not use an affine region detector because we found it unstable in our experiments. All images are resized to half of their original size to reduce the number of features per image, which speeds up the feature matching carried out in the next step.
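For illustration, here is a minimal sketch of this step. It substitutes OpenCV's SIFT implementation for Lowe's original binary (an assumption on our part):

    import cv2

    def extract_sift(image_path):
        """Downscale an image to half size and extract SIFT features."""
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Halving the resolution substantially reduces the feature count,
        # which speeds up the pairwise matching in the next step.
        img = cv2.resize(img, (img.shape[1] // 2, img.shape[0] // 2))
        sift = cv2.SIFT_create()
        keypoints, descriptors = sift.detectAndCompute(img, None)
        return keypoints, descriptors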

Link two images
As our goal is to build a graph over the entire image database, the basic operation is finding the relationship between two images. This is done by establishing correspondences between the SIFT features of an image pair. For each image, we run a normal retrieval in the database and several candidate images are returned. The validity of the correspondences is verified by enforcing geometric constraints across the two views, such as an affine transform or the fundamental matrix.

At the first stage, coarse correspondences are found by Lowe's distance-ratio criterion. Many of these correspondences are incorrect due to variations in image quality, illumination, and viewpoint. At the second stage, RANdom SAmple Consensus (RANSAC) is applied to filter out the false matches. We consider both the fundamental matrix and the affine transform and apply RANSAC to estimate their parameters respectively. The experimental results reveal that the affine transform works better than the fundamental matrix with a small number of iterations. This is attributed to the fact that the minimal sample required to estimate a fundamental matrix is larger than that of an affine transform.

Despite the effectiveness of RANSAC, false matches may still be present, which can link two irrelevant images when all of the matches are false yet happen to satisfy the spatial constraint. Such cases can be detected by bi-directional matching, i.e., treating each member of the image pair in turn as the query and finding its correspondences in the other, which serves as the reference. However, this would double the computational cost of the linking process. Instead, we propose a simple yet effective spatial verification rule named the Intersection Counting Rule (ICR) to further eliminate the false matches that survive RANSAC. The idea is illustrated by the following figure (not available now). Correspondences passing both RANSAC and ICR are the final results. For image pairs whose number of correspondences exceeds a specified threshold, the matching information is recorded, providing the foundation for building the graph.
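A minimal sketch of the two-stage verification follows, assuming OpenCV for both the ratio test and the RANSAC affine estimation (the ICR step is omitted, since its description is not available); the threshold min_inliers is a hypothetical parameter:

    import cv2
    import numpy as np

    def match_and_verify(kp1, desc1, kp2, desc2, ratio=0.8, min_inliers=15):
        """Stage 1: Lowe's ratio test; stage 2: RANSAC affine verification."""
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        knn = matcher.knnMatch(desc1, desc2, k=2)
        good = [p[0] for p in knn
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        if len(good) < min_inliers:
            return None
        src = np.float32([kp1[m.queryIdx].pt for m in good])
        dst = np.float32([kp2[m.trainIdx].pt for m in good])
        # RANSAC affine estimation; the inlier mask filters false matches.
        A, mask = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                       ransacReprojThreshold=5.0)
        if A is None or int(mask.sum()) < min_inliers:
            return None
        inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
        return A, inliers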

Obtain matching candidate
The most straightforward way to build the graph for an image database is to match every pair of images. This is feasible for a small database but simply impossible when the database scales up to millions of images. In that scenario, it is necessary to narrow the matching candidates down to a reasonably small set. The BOW model is adopted in our system. We first build a visual vocabulary tree using hierarchical k-means, where each visual word carries an inverted file data structure. Each image in the database is then fed into the BOW search engine as a query and a ranked list of result images is returned. Images at the top of the list are more likely to be relevant to the query image. Feature matching is performed down the list until a specified number of candidates is reached. It is quite possible that some relevant image pairs are missed; this can, however, be compensated for during the graph refinement phase.
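The sketch below shows the inverted-file lookup in its simplest form, with a flat word-to-image posting list standing in for the vocabulary tree (the tree only changes how features are quantized to word ids):

    from collections import Counter, defaultdict

    class InvertedIndex:
        """Toy inverted file: visual word id -> ids of images containing it."""
        def __init__(self):
            self.posting = defaultdict(list)

        def add_image(self, image_id, word_ids):
            for w in set(word_ids):
                self.posting[w].append(image_id)

        def query(self, word_ids, top_k=20):
            # Rank database images by how many query words they share.
            votes = Counter()
            for w in word_ids:
                votes.update(self.posting[w])
            return [img for img, _ in votes.most_common(top_k)]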

Build up Graph

In our graph notation, each image is represented as a vertex and edges connect relevant images. The graph is undirected. Each edge is annotated with the matching information produced in the previous steps.
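A minimal sketch of this structure follows; storing the affine transform and inlier matches on each edge, in both directions, is our assumption about what "matching information" comprises, based on the output of the linking step:

    import numpy as np

    def invert_affine(A):
        """Invert a 2x3 affine matrix by extending it to 3x3."""
        M = np.vstack([A, [0.0, 0.0, 1.0]])
        return np.linalg.inv(M)[:2]

    class ImageGraph:
        """Undirected graph: image id -> {neighbor id: (affine, inliers)}."""
        def __init__(self):
            self.adj = {}

        def add_edge(self, a, b, affine, inliers):
            # Store the transform in both directions so a region can later
            # be warped along either orientation of the edge.
            self.adj.setdefault(a, {})[b] = (affine, inliers)
            self.adj.setdefault(b, {})[a] = (invert_affine(affine), inliers)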

Refine Graph

So far, the graph is incomplete since some relevant image pairs may have been missed by the BOW model. These missing edges can be added by performing matching among the neighboring vertices (images) of each vertex. This process is illustrated by Figure 1. Consider vertex a and its neighbors b, c, and d. The solid edges are produced by BOW, and the dashed edges are potentially present and can be verified by pairwise matching. As illustrated by Figure 2, all missing edges can be found by iterating until no new edge is added.
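A sketch of the refinement loop, reusing ImageGraph from above; try_match(a, b) is a hypothetical helper standing in for the pairwise matching of the linking step, returning (affine, inliers) or None:

    def refine_graph(graph, try_match):
        """Match unlinked neighbor pairs repeatedly until no edge is added."""
        added = True
        while added:
            added = False
            for a in list(graph.adj):
                neighbors = list(graph.adj[a])
                for i in range(len(neighbors)):
                    for j in range(i + 1, len(neighbors)):
                        b, c = neighbors[i], neighbors[j]
                        if c in graph.adj.get(b, {}):
                            continue  # already linked
                        result = try_match(b, c)  # hypothetical helper
                        if result is not None:
                            graph.add_edge(b, c, *result)
                            added = True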

Figure 1. Linking the neighbor vertices of a single vertex.
Figure 2. Adding links for the entire graph iteratively. Dashed lines of identical color are discovered in the same iteration.




Retrieval

The search engine consists of two components: the BOW model and the image graph. The input to the engine is a query image together with regions of interest specified by the user. The objective of retrieval is to return a list of images containing the query regions. Our system accomplishes the retrieval task in two steps.

Firstly, it extracts SIFT features inside the query regions. Each feature is quantized to the nearest visual word in the vocabulary tree (soft assignment may also be used, in which case a feature is quantized to several words). As mentioned previously, each word is accompanied by an inverted file containing image indices; the inverted file entries of the quantized words are merged, resulting in a list of images ranked in descending order of their reference counts. The SIFT features inside the query region are then matched against the images at the top of the rank list with spatial verification enabled. This process is identical to the image linking step. The matching yields pairs of feature correspondences and an affine transform matrix. The query region is a polygon (simply a rectangle in our system) and its corresponding region in the matching image is obtained by affine warping. The warped region is intersected with the matching image region and the area of the intersection is computed. The quality of the matching is assessed in terms of the ratio of the intersection area to the matching image area and the degree of affine deformation (not available now), and can be quantitatively scored. The matching process completes very quickly because a desirable matching candidate is usually found after examining only a few candidates.

Secondly, the candidate selected in the previous step is regarded as a start point, from which other relevant images are retrieved. In our implementation, the images accessible from the start point (those with at least one path linking the two vertices) are explored in breadth-first-search (BFS) fashion. A relevant image may be reachable via multiple paths, and we are interested in those paths that contain image regions corresponding to the query region at each vertex. In other words, the query region can be propagated uninterruptedly along the path until it reaches the destination. Given two images and their feature correspondences, the query region can be warped into the other image by the affine transformation estimated from the feature correspondences confined to the query region, as illustrated by Figure 3. Propagation along a path is illustrated in Figure 4. When two images share a common scene region, it is possible that their feature points cannot be matched directly due to large variations in imaging conditions. However, the common region can still be linked through some intermediate images, as the blue path in Figure 4 indicates. Several example propagation paths are shown in Figure 5.
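The intersection-based score can be sketched as follows; as an assumption on our part, the score here is just the area ratio, since the affine-deformation penalty mentioned above is not yet defined:

    import cv2
    import numpy as np

    def score_match(query_poly, affine, target_shape):
        """Warp the query polygon into the matching image and score the
        match as intersection area over matching image area."""
        h, w = target_shape[:2]
        pts = query_poly.reshape(-1, 1, 2).astype(np.float32)
        warped = cv2.transform(pts, affine).reshape(-1, 2)
        # Rasterizing the warped polygon into an image-sized mask clips it
        # against the image bounds, giving the intersection area directly.
        mask = np.zeros((h, w), np.uint8)
        cv2.fillPoly(mask, [warped.astype(np.int32)], 1)
        return float(mask.sum()) / float(h * w)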

Figure 3. Estimated affine transformation and the warped region.
Figure 4. Propagation of the query region. Dashed lines indicate that feature correspondences are not established.



Figure 5. Several region propagation instances in our system
The procedure for finding relevant images containing the region(s) specified by the query image is summarized as follows; a code sketch of the traversal is given after the algorithm.
 
    Algorithm 1. Query region propagation and ranking
 
  1. Traverse the images accessible from the start vertex in BFS fashion.
  2. All paths leading to an accessible image are termed a path group; they are sorted in ascending order of their lengths.
  3. For each path group, we attempt to find the shortest path along which the query region can be propagated to the destination image.
  4. A threshold on the propagation length must be specified: computing long propagation paths is time-consuming, and the probability of a long path being eligible is low. In our system, only paths whose lengths are no greater than 5 are computed.
  5. All path groups are processed by steps 3 and 4. The accessible images are scored and ranked. Images to which the query region cannot be propagated are placed at the bottom of the rank list.
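The sketch below, reusing ImageGraph from the graph-building step, approximates Algorithm 1: plain BFS reaches each image via a shortest usable path first, so it stands in for the explicit path-group enumeration. Here warp_region(poly, a, b) is a hypothetical helper that warps a polygon from image a to image b using the affine transform estimated from the correspondences inside the polygon, returning None when too few fall inside:

    from collections import deque

    def propagate_query(graph, start, query_poly, max_len=5):
        """Propagate the query region outward from the start vertex."""
        regions = {start: query_poly}  # image id -> propagated region
        depth = {start: 0}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            if depth[a] >= max_len:
                continue  # step 4: cap the propagation length
            for b in graph.adj[a]:
                if b in regions:
                    continue  # already reached via a path at least as short
                warped = warp_region(regions[a], a, b)  # hypothetical helper
                if warped is None:
                    continue  # region cannot cross this edge
                regions[b] = warped
                depth[b] = depth[a] + 1
                queue.append(b)
        return regions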

Issues worth further discussion:
  • How to build the graph within acceptable time when the image database scales up to millions of images?
  • How to avoid missing edges while building the graph?
  • A key concern in retrieval is performance. The propagation scheme adopted at present is not efficient enough because it involves estimating an affine transformation between every pair of neighboring images. Is there an alternative?
  • Co-segmentation could possibly make the search finish in a fraction of a second. Are there any fast and accurate approaches?


from: https://users.soe.ucsc.edu/~manazhao/files/graph_ir/index.html
