Paper Notes: Person Re-Identification Meets Image Search

Latest recommended article published on 2024-07-23 14:40:19

海之崖


Article link: https://blog.csdn.net/zdh2010xyz/article/details/53646804


 
 Person Re-Identification Meets Image Search 

  When reposting, please include the original link: http://blog.csdn.net/zdh2010xyz/article/details/53646804 

 
 Abstract 

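  Building on image search, this paper treats person re-identification as an image search problem. It makes two contributions: 1) an unsupervised Bag-of-Words representation is designed so that image search techniques can be applied to person re-identification; 2) a high-quality dataset is contributed, in which pedestrians are detected with DPM (so it contains a number of false detections, i.e. distractor images), making the dataset close to a realistic setting. Compared with methods that rely on feature-feature matching, the proposed method is two orders of magnitude faster. 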

 
 1. Introduction 

  Given a probe image (query), the task is to search a gallery (database) for images of the same person. 

  This work is motivated by two considerations. First, local-feature-based approaches have proven very effective for person re-identification. Given the query-search mode of the task, image search methods based on the Bag-of-Words (BoW) model are potentially a good fit. Conventional re-id methods mainly rely on brute-force feature-feature matching, which is computationally expensive. In the BoW model, a codebook maps local features to visual words, and each image is represented by a visual word histogram weighted by the TF-IDF scheme. Instead of performing exhaustive visual matching among images, the BoW model aggregates local features into a global vector. To tackle spatial constraints, a number of geometry-aware visual matching methods have been proposed. To further improve search accuracy, some post-processing steps are also used. 

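  Second, there is the issue of datasets. Existing datasets contain a few hundred identities, and each identity is typically captured by only two cameras, so the number of queries and relevant images is limited. In addition, hand-drawn pedestrian bounding boxes are mostly well-aligned. In practice, pedestrians obtained from a pedestrian detector may undergo misalignment or have body parts missing. Because of occlusion or cluttered backgrounds, the detector may also yield false alarms. 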

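  Existing methods are mostly built on an ideal setting, and their accuracy degrades considerably when conditions change. It is therefore necessary to introduce a dataset closer to realistic scenes, together with more robust algorithms. 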

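  Based on these two points, the paper makes two main contributions: 
 First, an unsupervised BoW representation is proposed. After a codebook is generated from the training data, each pedestrian image is represented as a visual word histogram. This involves steps such as root descriptor, negative evidence, and burstiness weighting. To incorporate geometric constraints, the image is partitioned into horizontal stripes. Moreover, multiple queries are merged into one vector to accommodate extensive image variations. Finally, an automatic reranking step refines the initial rank list. 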

 
 Second, a new person re-id dataset, Market-1501, is introduced. It contains 1501 identities captured by 6 cameras on a university campus. The paper presents it as the largest person re-identification dataset, featured by 32643 annotated bounding boxes. It has three advantages: 
 DPM detected bounding boxes, 
 the inclusion of distractor images (false detections), and 
 multi-query, multi-groundtruth per identity. 

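  The paper is organized as follows: Section 2 reviews related work, Section 3 introduces Market-1501, Section 4 describes the image-search-based person re-id algorithm, Section 5 presents the experiments, and Section 6 concludes. 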

 
 2. Related Work 

  There are many supervised and unsupervised methods. Among discriminative models, SVM, RankSVM, and boosting are commonly used, along with recent deep learning approaches. This line of work is beneficial in reducing the impact of multi-view variations, but requires laborious annotation, especially when new cameras are added to the system. Among unsupervised models, Farenzena et al. exploit the symmetry and asymmetry of the pedestrian body and propose Symmetry-Driven Accumulation of Local Features (SDALF). Ma et al. use the Fisher Vector to encode local features into a global vector. To exploit the salience information among pedestrian images, Zhao et al. propose to assign higher weight to rare colors, an idea very similar to the Inverse Document Frequency (IDF) in image search. In this scenario, this paper proposes an unsupervised method which requires a minimal amount of labeling or training effort. 

  On the other hand, image search has advanced considerably thanks to SIFT and the BoW model. Jégou et al. embed binary SIFT features in the inverted file, and visual matching can be refined through index-level feature fusion between complementary descriptors. Because the BoW model does not consider the spatial distribution of local features (also a problem in person re-id), another research direction is to model spatial constraints. 

 The geometry-preserving visual phrases (GVP) and the spatial coding methods both calculate the relative position among features, and check geometric consistency between images by the offset maps. 

  For ranking, an effective reranking step brings further improvement. Liu et al. design a "one shot" feedback optimization scheme which allows a user to quickly refine the search results. In rigid object search, RANSAC is typically used in post-processing. The main issue considered here is that of multiple queries. 

 
 3. The Market-1501 Dataset 

 
 3.1. Description 

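  There are 6 cameras in total, five 1280*1080 HD and one 720*576 SD, yielding 32643 bounding boxes and 1501 identities. Each identity is captured by at most 6 cameras, and the annotation ensures that each identity appears under at least two cameras, so cross-camera search is possible. Under the same camera, one identity can have different appearances. 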

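  The dataset has the following features: 
 First, the bounding boxes are detected by DPM, so some misalignment is present. 
 Second, in addition to the true positive bounding boxes, false alarm detection results are also provided. Since a large number of detected bounding boxes are "bad", each detected bounding box to be annotated is compared with a hand-drawn groundtruth bounding box. If the overlap between the two exceeds 50%, the box is labeled "good"; if it is below 20%, it is labeled "distractor"; otherwise it is labeled "junk" (meaning the image has zero influence on person re-id accuracy). 
 Third, each identity may have multiple images under each camera. 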
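
  The good / junk / distractor rule above can be sketched in a few lines. This is only an illustration, assuming the overlap is measured as intersection area divided by union area and that boxes are given as (x1, y1, x2, y2); the function names are hypothetical and not taken from the dataset tools.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is degenerate."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def overlap_ratio(det, gt):
    """Intersection area divided by union area (assumed overlap measure)."""
    ix1, iy1 = max(det[0], gt[0]), max(det[1], gt[1])
    ix2, iy2 = min(det[2], gt[2]), min(det[3], gt[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(det) + box_area(gt) - inter
    return inter / union if union > 0 else 0.0


def label_detection(det, gt):
    """Apply the 50% / 20% thresholds described in the notes."""
    ratio = overlap_ratio(det, gt)
    if ratio > 0.5:
        return "good"
    if ratio < 0.2:
        return "distractor"
    return "junk"


gt = (0, 0, 64, 128)
print(label_detection((0, 0, 64, 128), gt))    # good
print(label_detection((0, 0, 64, 48), gt))     # junk (overlap 0.375)
print(label_detection((100, 0, 164, 128), gt)) # distractor (no overlap)
```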

 
 3.2. Evaluation Protocol 

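  Existing datasets generally use the Cumulated Matching Characteristics (CMC) curve to evaluate re-id algorithms. Mean average precision (mAP) is used here as well, which also accounts for queries having multiple ground truths. 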
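
  To make the two metrics concrete, here is a minimal sketch. It assumes each query's rank list has already been reduced to booleans (True where the gallery image shares the query's identity, with junk images already removed), and it uses the plain non-interpolated form of average precision, which may differ in detail from the official evaluation code.

```python
def average_precision(ranked_is_match):
    """AP of one query: mean of the precision values at each correct match."""
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked):
    """mAP: average of the per-query AP values."""
    return sum(average_precision(r) for r in all_ranked) / len(all_ranked)


def cmc_curve(all_ranked, max_rank):
    """cmc[k-1] = fraction of queries whose first correct match is in the top k."""
    cmc = [0] * max_rank
    for ranked in all_ranked:
        first_hit = next((i for i, m in enumerate(ranked) if m), None)
        if first_hit is not None:
            for k in range(first_hit, max_rank):
                cmc[k] += 1
    return [c / len(all_ranked) for c in cmc]


rank_lists = [[True, False, True], [False, True]]
print(cmc_curve(rank_lists, 3))            # [0.5, 1.0, 1.0]
print(mean_average_precision(rank_lists))  # (5/6 + 1/2) / 2
```

  CMC only looks at the first correct match, while AP rewards ranking all ground truths early, which is why both are reported.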

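  The dataset is randomly split into training and testing sets, containing 750 and 751 identities respectively. Queries are selected from hand-drawn bounding boxes. There is only one query per identity under each camera, so each identity has at most 6 queries (one per camera). In total there are 3363 queries, 4.5 per identity on average, and each query has 14.8 groundtruth images on average. 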

 
 4. Our Method 

  4.1 The BoW Model 

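  Feature extraction: the Color Names (CN) descriptor is used, i.e. a color description. Images are resized to a fixed 128*64, and CN features are extracted from 4*4 patches with a stride of 4. 

  The number of clusters in BoW is the codebook size, set to 350 in this paper. 

  Quantization: Multiple Assignment (MA) is used, with MA=10. 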
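
  The quantization step can be sketched as follows. This is a toy illustration, assuming the codebook has already been learned (for example by k-means on training descriptors), that nearness is Euclidean, and that each of the MA nearest words receives one equal vote; the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multiple_assignment(descriptor, codebook, ma):
    """Indices of the `ma` codewords nearest to `descriptor`."""
    order = sorted(range(len(codebook)),
                   key=lambda i: squared_distance(descriptor, codebook[i]))
    return order[:ma]


def bow_histogram(descriptors, codebook, ma):
    """Count votes per visual word over all local descriptors of one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        for word in multiple_assignment(d, codebook, ma):
            hist[word] += 1.0
    return hist


# Toy codebook with 4 words; the notes use 350 words and MA=10.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.0, 0.9)]
print(bow_histogram(descriptors, codebook, ma=2))  # [2.0, 1.0, 1.0, 0.0]
```

  Assigning each descriptor to several words softens quantization error: a patch whose color falls between two codewords still contributes to both.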

  TF-IDF: the visual word histogram is weighted by TF-IDF. 

  Similarity function: the dot product is used. 
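
  A minimal sketch of the weighting and similarity steps is given below. It assumes the common IDF form log(N / n_i), where N is the number of gallery images and n_i the number of images containing word i, and it l2-normalizes the vectors before the dot product; both choices are assumptions for illustration, since the notes do not spell them out.

```python
import math


def idf_weights(histograms):
    """IDF per visual word: log(N / n_i); 0 for words that never occur."""
    n_images = len(histograms)
    weights = []
    for i in range(len(histograms[0])):
        df = sum(1 for h in histograms if h[i] > 0)
        weights.append(math.log(n_images / df) if df else 0.0)
    return weights


def tfidf_vector(hist, idf):
    """Weight term frequencies by IDF, then l2-normalize (assumed step)."""
    v = [tf * w for tf, w in zip(hist, idf)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def similarity(q, g):
    """Dot product between two feature vectors."""
    return sum(a * b for a, b in zip(q, g))


gallery = [[2, 1, 0], [1, 0, 0], [1, 0, 3]]
idf = idf_weights(gallery)
vectors = [tfidf_vector(h, idf) for h in gallery]
print(similarity(vectors[0], vectors[2]))
```

  In this toy gallery, word 0 occurs in every image and therefore gets an IDF of zero, so it contributes nothing to any similarity score.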

  4.2 Improvements 

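  Weak Geometric Constraints 

  In classic methods, Adjacency Constrained Search (ACS) partitions the image into blocks, usually horizontal ones, and matches them, but this is computationally very expensive. Inspired by Spatial Pyramid Matching, ACS is integrated into the BoW model. The image is partitioned into $M$ horizontal stripes, and the visual word histogram of stripe $m$ is $d^m = (d_1^m, d_2^m, \ldots, d_k^m)^T$, where $k$ is the codebook size. The feature of the whole image is $f = (d^1, d^2, \ldots, d^M)^T$. 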
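
  The stripe-wise feature can be sketched like this. For simplicity the example assumes each patch has already been hard-assigned to a single visual word and stored in a grid (rows by columns of patches); the function name and the grid layout are hypothetical.

```python
def stripe_histograms(word_grid, num_stripes, codebook_size):
    """Concatenate one visual-word histogram per horizontal stripe.

    word_grid[r][c] is the visual word index of the patch at row r, column c.
    The result has num_stripes * codebook_size entries.
    """
    rows = len(word_grid)
    feature = []
    for m in range(num_stripes):
        top = m * rows // num_stripes
        bottom = (m + 1) * rows // num_stripes
        hist = [0] * codebook_size
        for r in range(top, bottom):
            for word in word_grid[r]:
                hist[word] += 1
        feature.extend(hist)
    return feature


# Toy grid: 4 rows of patches, 2 patches per row, codebook of 3 words.
grid = [[0, 0], [1, 1], [2, 2], [0, 1]]
print(stripe_histograms(grid, num_stripes=2, codebook_size=3))
# [2, 2, 0, 1, 1, 2]
```

  Because a word in the top stripe only matches the same word in the top stripe of another image, the dot product of two such vectors enforces a coarse vertical alignment without any explicit patch-to-patch search.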

  Background Suppression 

  A 2-D Gaussian template is applied, under the assumption that the pedestrian lies at the center of the image. 
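
  One way to picture the template is as a per-pixel weight map that peaks at the image center and decays toward the borders. The sketch below is an assumption-laden illustration: the standard deviations are free parameters chosen here for the example, and how the weights enter the histogram (for instance by scaling each patch's vote) is not specified in the notes.

```python
import math


def gaussian_mask(height, width, sigma_y, sigma_x):
    """2-D Gaussian weights in (0, 1], equal to 1 at the image center."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return [[math.exp(-((y - cy) ** 2) / (2 * sigma_y ** 2)
                      - ((x - cx) ** 2) / (2 * sigma_x ** 2))
             for x in range(width)]
            for y in range(height)]


# Hypothetical sigmas for a 128*64 image; the values are illustrative only.
mask = gaussian_mask(128, 64, sigma_y=40.0, sigma_x=20.0)
print(mask[64][32] > mask[0][0])  # True: center outweighs the corner
```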

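  Multiple Queries 

  For computational speed, multiple queries are merged into one. Two strategies: 1. compute the feature vector of each query and take the average of the vectors (average pooling). 2. take the element-wise maximum (max pooling). 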
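
  Both pooling strategies amount to an element-wise reduction over the query feature vectors, as in this small sketch (function names are hypothetical):

```python
def average_pool(query_vectors):
    """Element-wise mean of several query feature vectors."""
    n = len(query_vectors)
    return [sum(col) / n for col in zip(*query_vectors)]


def max_pool(query_vectors):
    """Element-wise maximum of several query feature vectors."""
    return [max(col) for col in zip(*query_vectors)]


queries = [[1.0, 0.0, 4.0], [3.0, 2.0, 0.0]]
print(average_pool(queries))  # [2.0, 1.0, 2.0]
print(max_pool(queries))      # [3.0, 2.0, 4.0]
```

  Either way the gallery is searched once with a single pooled vector, so the cost stays the same as for a single query.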
