Re-ID: AlignedReID: Surpassing Human-Level Performance in Person Re-Identification (paper walkthrough)

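Just finished reading this paper and it is seriously cool: it uses dynamic programming to find a shortest path for feature alignment, which is a fresh idea, and the accuracy is high too.

Below are my notes on the paper~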


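  • The authors propose a method called AlignedReID. Its highlight: on the Market1501 and CUHK03 datasets, its rank-1 accuracy surpasses human-level performance for the first time.
  • The authors argue: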
    • Traditional approaches have focused on low-level features such as colors, shapes, and local descriptors. With the renaissance of deep learning, the convolutional neural network (CNN) has dominated this field.
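    • Traditional methods mostly relied on low-level features (colors, shapes, local descriptors); with the rise of deep learning, CNNs now dominate the field.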
    • Many CNN-based approaches learn a global feature, without considering the spatial structure of the person. This has a few major drawbacks:
      • inaccurate person detection boxes might impact feature learning.
      • the pose change or non-rigid body deformation makes the metric learning difficult.
      • occluded parts of the human body might introduce irrelevant context into the learned feature.
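      • it is non-trivial to emphasize local differences in a global feature, especially when we have to distinguish two people with very similar appearances.
    • Many CNN-based methods learn only a global feature and ignore the spatial structure of the body, which causes the following problems:
      • inaccurate person detection boxes can hurt feature learning;
      • pose changes and non-rigid body deformation make metric learning harder;
      • occluded body parts can introduce irrelevant context into the feature;
      • it is hard to emphasize local differences in a global feature, especially when telling apart two people who look very similar.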
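    • To address these problems, earlier work focused on part-based, local feature learning. Some studies split the body into several fixed parts without considering the correspondence between those parts, which does not solve the problems above. Others use pose estimation to help align body parts, but that requires extra supervision and a pose estimation step.
  • So the authors propose AlignedReID:
    • In this paper, we propose a new approach, called AlignedReID, which still learns a global feature, but performs an automatic part alignment during the learning, without requiring extra supervision or explicit pose estimation.
    • The proposed method still learns a global feature, but it aligns the parts automatically during training, with no extra supervision and no explicit pose estimation.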
    • In the local branch, we align local parts by introducing a shortest path loss.
    • When learning the local features, alignment is done by computing a shortest path.
    • In the inference stage, we discard the local branch and only extract the global feature.
    • At inference time, only the global feature is used; the local branch is dropped.
    • In other words, the global feature itself, with the aid of local features learning, can greatly address the drawbacks we mentioned above, in our new joint learning framework.
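    • In other words, the global feature, learned jointly with the local features, can largely address the four drawbacks of global-feature CNN methods listed above.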
    • In addition, the form of global feature keeps our approach attractive for the deployment of a large ReID system, without costly local features matching.
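    • The authors add that, because the output is a single global feature, the method stays practical for large-scale person ReID systems, since no costly local feature matching is needed.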
    • We also adopt a mutual learning approach in the metric learning setting, to allow two models to learn better representations from each other.
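    • For metric learning, the authors adopt mutual learning, so that two models learn better representations from each other, and they report good results with it.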
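The shortest-path alignment in the local branch can be sketched as a small dynamic program. This is a minimal sketch, not the authors' code. It assumes each image yields H row-wise local feature vectors, that the stripe-to-stripe distances are squashed into [0, 1) with (e^d - 1)/(e^d + 1), and that the path runs from the top-left to the bottom-right corner of the H×H distance matrix, moving only right or down. The function name `shortest_path_distance` and the normalization are assumptions; the notes above do not spell them out.

```python
import numpy as np

def shortest_path_distance(f, g):
    """Aligned local distance between two images (illustrative sketch).

    f, g: arrays of shape (H, c), one c-dim local feature per horizontal stripe.
    Returns the total cost of the cheapest right/down path through the
    H x H stripe-to-stripe distance matrix.
    """
    # Pairwise Euclidean distances between stripes, shape (H, H).
    diff = f[:, None, :] - g[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed normalization to [0, 1): (e^d - 1) / (e^d + 1) == tanh(d / 2).
    d = np.tanh(d / 2.0)

    H, W = d.shape
    s = np.zeros_like(d)  # s[i, j] = cheapest path cost from (0, 0) to (i, j)
    for i in range(H):
        for j in range(W):
            if i == 0 and j == 0:
                s[i, j] = d[i, j]
            elif i == 0:
                s[i, j] = s[i, j - 1] + d[i, j]
            elif j == 0:
                s[i, j] = s[i - 1, j] + d[i, j]
            else:
                s[i, j] = min(s[i - 1, j], s[i, j - 1]) + d[i, j]
    return float(s[-1, -1])
```

Because the path may only move right or down, stripes are matched in top-to-bottom order, which is what lets a head stripe in one image line up with a slightly lower head stripe in another.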
  • A few concepts need some background first:
    • Metric Learning: Deep metric learning methods transform raw images into embedding features, then compute the feature distances as their similarities. Usually, two images of the same person are defined as a positive pair, whereas two images of different persons are a negative pair. Triplet loss is motivated by the margin enforced between positive and negative pairs. Selecting suitable samples for the training model through hard mining has been shown to be effective. Combining softmax loss with metric learning loss to speed up the convergence is also a popular method.
    • Feature Alignments: Consider the spatial local information when learning features.
    • Mutual Learning: a deep mutual learning strategy in which an ensemble of students learn collaboratively and teach each other throughout the training process.
    • Re-Ranking: After obtaining the image features, most current works choose the L2 Euclidean distance to compute a similarity score for a ranking or retrieval task.
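The metric-learning recipe described above (L2 distances between embeddings, a margin between positive and negative pairs, hard mining) can be sketched as a batch-hard triplet loss. This is a minimal NumPy sketch under assumptions: the function name `batch_hard_triplet_loss`, the margin value 0.3, and the rule of picking the hardest positive and hardest negative per anchor inside the batch are illustrative choices, not details given in these notes.

```python
import numpy as np

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """emb: (N, D) embeddings; labels: (N,) person IDs.

    For each anchor, take its farthest same-ID sample (hardest positive)
    and its closest different-ID sample (hardest negative), then apply
    the hinge max(0, margin + d_pos - d_neg) and average over anchors.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    # Pairwise L2 distances, shape (N, N).
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    # Hardest positive: largest distance among same-ID pairs
    # (the anchor itself is included, but its distance 0 never wins unfairly).
    d_pos = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative: smallest distance among different-ID pairs.
    d_neg = np.where(~same, dist, np.inf).min(axis=1)

    return float(np.maximum(0.0, margin + d_pos - d_neg).mean())
```

The loss is zero once every anchor is closer to all of its positives than to any negative by at least the margin, so only the hard cases contribute gradient in a real training setup.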
  • Next, a closer look at how AlignedReID works:
    • In AlignedReID, we generate a single global feature as the final output of the input image, and use the L2 distance as the similarity metric. However, the global feature is learned jointly with local features in the learning stage.
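    • Re-ID generally has two steps: extract features, then do metric learning. In AlignedReID, the final output for each input image is a single global feature, and that global feature is trained jointly with the local features.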
      • A global feature (a C-d vector) is extracted by directly applying global pooling on the feature map.
      • In other words, the global feature comes from pooling over the entire feature map at once, which yields one C-dimensional vector.
      • For the local features, a horizontal pooling, which is a global pooling in the horizontal direction, is first applied to extract a local feature for each row, and a 1×1 convolution is then applied to reduce the channel number from C to c. In this way, each local feature (a c-d vector) represents a horizontal part of the image of a person.
      • In other words, local features are extracted by applying horizontal pooling to the feature map row by row.
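The two pooling branches described above can be sketched on a single feature map. This is a sketch under assumptions: the feature map has shape (C, H, W), max pooling is the default reduction (the text only says "pooling"), and the 1×1 convolution is written as a plain matrix multiply with a hypothetical bias-free weight matrix `w` of shape (c, C), which is valid because a 1×1 convolution applies the same linear map at every spatial position.

```python
import numpy as np

def global_and_local_features(fmap, w, reduce=np.max):
    """fmap: (C, H, W) feature map from the CNN backbone.
    w: (c, C) weights of the 1x1 convolution (hypothetical, no bias).
    reduce: pooling reduction, e.g. np.max or np.mean (assumed choice).

    Returns (global_feat, local_feats) with shapes (C,) and (H, c).
    """
    # Global pooling: collapse both spatial axes -> one C-dim vector.
    global_feat = reduce(fmap, axis=(1, 2))
    # Horizontal pooling: collapse only the width axis -> (C, H),
    # i.e. one C-dim vector per row of the feature map.
    rows = reduce(fmap, axis=2)
    # 1x1 convolution: the same linear map C -> c applied to every row.
    local_feats = (w @ rows).T
    return global_feat, local_feats
```

Each row of `local_feats` corresponds to one horizontal stripe of the person image, from top to bottom, while `global_feat` summarizes the whole image in a single vector.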