[Paper note] Embedding Deep Metric for Person Re-identification: A Study Against Large Variation

chn13

Category: paper-note  Tags: re-id

Article link: https://blog.csdn.net/chn13/article/details/52921178

ECCV 2016
Authors: Hailin Shi, Yang Yang, Xiangyu Zhu, Shengcai Liao, Zhen Lei, Weishi Zheng, Stan Z. Li

Overview

Re-id research topics:
- Improving discriminative features.
- Good metric for comparison.
- This paper mainly focuses on learning a good metric.
The approach is influenced by face recognition methods (the authors also work on face recognition).
Contributions:
- Moderate Positive Mining, a novel positive sample selection strategy for training a CNN when the data has large intra-class variation.
- Metric weight constraint (combine Euclidean distance with Mahalanobis distance).

Moderate positive mining

Intuitions
- Positive samples with a large distance are harmful.
- Positive samples with too small a distance contribute little to convergence.
- What to do: reduce the intra-class variance while preserving the intrinsic graphical structure of pedestrian data via mining the moderate positive pairs in the local range (picture).
Algorithm for choosing the moderate positive sample (picture)
- Compute the distances from the anchor to all of its positive and negative samples.
- Mine the hardest negative sample (the negative with minimum distance); denote its distance by $d^*$.
- Take the subset of positive samples whose distance is larger than $d^*$.
- In this subset, find the positive pair with minimum distance: this is the moderate positive.
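
The four steps above can be sketched in a few lines of Python. This is only an illustration of the steps as listed in this note, not the authors' code: the function name `mine_moderate_positive`, the use of plain Euclidean distance on toy 2-D features, and the fallback when no positive is farther than $d^*$ are all assumptions of mine. The exact inequality and fallback rule should be checked against the paper's algorithm listing.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_moderate_positive(anchor, positives, negatives):
    """Return (index of moderate positive, index of hardest negative).

    Follows the steps listed in this note; hypothetical helper.
    """
    # Step 1: distances from the anchor to every positive and negative.
    pos_d = [euclidean(anchor, p) for p in positives]
    neg_d = [euclidean(anchor, n) for n in negatives]

    # Step 2: hardest negative = negative with minimum distance (d*).
    hard_neg = min(range(len(neg_d)), key=neg_d.__getitem__)
    d_star = neg_d[hard_neg]

    # Step 3: positives whose distance is larger than d*.
    candidates = [i for i, d in enumerate(pos_d) if d > d_star]
    if not candidates:
        # Assumption (not stated in the note): fall back to all positives.
        candidates = list(range(len(pos_d)))

    # Step 4: among the candidates, pick the one with minimum distance.
    moderate = min(candidates, key=pos_d.__getitem__)
    return moderate, hard_neg


anchor = [0.0, 0.0]
positives = [[0.5, 0.0], [2.0, 0.0], [5.0, 0.0]]  # distances 0.5, 2.0, 5.0
negatives = [[1.0, 0.0], [4.0, 0.0]]              # distances 1.0, 4.0
print(mine_moderate_positive(anchor, positives, negatives))  # -> (1, 0)
```

In the toy example, $d^* = 1.0$, so the positives at distance 2.0 and 5.0 form the subset and the one at 2.0 is selected.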

Metric weight constraint

Euclidean distance shortcomings:
- Sensitive to the scale?
- Blind to the correlation across dimensions
- The Mahalanobis distance is a better choice for a multivariate metric, as argued in other work.
An extra FC layer, applied to the difference between the two feature vectors, yields the Mahalanobis distance.
- Get Mahalanobis distance
  - $d(x_1, x_2)=\sqrt{(x_1-x_2)^\mathbf{T}M(x_1-x_2)}$
  - $M=WW^\mathbf{T}$ (ensures $M$ is a positive semi-definite matrix)
  - $d(x_1, x_2)=||W^\mathbf{T}(x_1-x_2)||_2$
- This can be implemented by an FC layer
  $y=f(W^\mathbf{T}x)$
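
To make the equivalence above concrete, here is a small numerical check that the explicit form $\sqrt{(x_1-x_2)^\mathbf{T}WW^\mathbf{T}(x_1-x_2)}$ matches the norm of the projected difference $||W^\mathbf{T}(x_1-x_2)||_2$. It is a sketch under assumptions: the FC layer is taken to be bias-free with $f$ as the identity, and the matrix `W` and the vectors are made-up toy values.

```python
import math


def project(W, v):
    """Compute W^T v, with W stored as a list of d rows of length k."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]


def mahalanobis_explicit(W, x1, x2):
    """sqrt((x1 - x2)^T M (x1 - x2)) with M = W W^T built explicitly."""
    d, k = len(W), len(W[0])
    diff = [a - b for a, b in zip(x1, x2)]
    M = [[sum(W[i][t] * W[j][t] for t in range(k)) for j in range(d)]
         for i in range(d)]
    quad = sum(diff[i] * M[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(quad)


def mahalanobis_fc(W, x1, x2):
    """||W^T (x1 - x2)||_2: a linear (FC-style) projection, then an L2 norm."""
    diff = [a - b for a, b in zip(x1, x2)]
    return math.sqrt(sum(p * p for p in project(W, diff)))


W = [[1.0, 0.5], [0.0, 2.0], [-1.0, 0.3]]  # toy 3x2 weight matrix
x1 = [1.0, 2.0, 3.0]
x2 = [0.5, -1.0, 2.0]
print(mahalanobis_explicit(W, x1, x2), mahalanobis_fc(W, x1, x2))
```

With `W` set to the identity matrix, both functions reduce to the ordinary Euclidean distance.
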
Weight constraint
- Euclidean distance has better generalization ability but less discriminability.
- The goal is a balance between Euclidean and Mahalanobis distance.
- $M$ should have large values on the diagonal (close to Euclidean) and small values elsewhere, which is enforced by the constraint:
  - $||WW^\mathbf{T}-I||^2_F\leq C$
- Further combine the constraint into the loss function as a regularization term:
  - Triplet loss: $L = d(x_1, x_2^p)+[m-d(x_1, x_2^n)]_+$, where $[\cdot]_+$ is the hinge $\max(0,\cdot)$ (margin $m$ set to 2 in the experiment)
  - Regularization: $\hat{L}=L+\frac\lambda2||WW^\mathbf{T}-I||^2_F$ (tune $\lambda$ to get the best trade-off)
  - The gradient w.r.t. $W$ is computed as $\frac{\partial \hat{L}}{\partial W}=\frac{\partial L}{\partial W}+\lambda(WW^\mathbf{T}-I)W$
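
The regularization term and its gradient can be checked numerically with finite differences. The sketch below uses a made-up 2x2 `W` and hypothetical helper names. One observation from the check: differentiating $\frac\lambda2||WW^\mathbf{T}-I||^2_F$ exactly gives $2\lambda(WW^\mathbf{T}-I)W$, so the expression written above matches it up to a constant factor of 2, which can be absorbed into $\lambda$.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def gram_minus_identity(W):
    """Return W W^T - I."""
    Wt = [list(col) for col in zip(*W)]
    G = matmul(W, Wt)
    return [[G[i][j] - (1.0 if i == j else 0.0) for j in range(len(G))]
            for i in range(len(G))]


def regularizer(W, lam):
    """(lambda / 2) * ||W W^T - I||_F^2."""
    A = gram_minus_identity(W)
    return 0.5 * lam * sum(a * a for row in A for a in row)


def grad_as_written(W, lam):
    """lambda * (W W^T - I) W, the expression given in the note."""
    return [[lam * v for v in row] for row in matmul(gram_minus_identity(W), W)]


def grad_numeric(W, lam, eps=1e-6):
    """Central finite-difference gradient of the regularizer."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            plus = [row[:] for row in W]
            minus = [row[:] for row in W]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (regularizer(plus, lam) - regularizer(minus, lam)) / (2 * eps)
    return grad


W = [[1.2, 0.1], [-0.3, 0.8]]  # toy weights, not from the paper
lam = 0.01                     # same order of magnitude as the note's lambda
written = grad_as_written(W, lam)
numeric = grad_numeric(W, lam)
# The numeric gradient equals twice the written expression, entry by entry.
print(all(abs(numeric[i][j] - 2 * written[i][j]) < 1e-7
          for i in range(2) for j in range(2)))
```
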
CNN architecture
- 3-branch CNN
  - The original 128x64 image is split into three 64x64 parts (with overlap).
  - Filters are untied (unshared) between the CNN branches, so each branch learns features specific to a different body part of the pedestrian image.
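
One way to picture the split of a 128x64 image into three overlapping 64x64 parts is shown below. The note does not give the exact offsets, so the evenly spaced rows (stride 32) are an assumption, and `part_row_ranges` and `split_into_parts` are hypothetical helper names.

```python
def part_row_ranges(height=128, part_height=64, num_parts=3):
    """Row ranges (start, stop) of evenly spaced, overlapping parts.

    Even spacing is an assumption; the note only says "with overlap".
    """
    stride = (height - part_height) // (num_parts - 1)
    return [(k * stride, k * stride + part_height) for k in range(num_parts)]


def split_into_parts(image):
    """Split an image (list of rows) into overlapping horizontal bands."""
    return [image[start:stop] for start, stop in part_row_ranges(len(image))]


print(part_row_ranges())  # -> [(0, 64), (32, 96), (64, 128)]

# Toy single-channel 128x64 image whose pixel value is its row index.
image = [[r] * 64 for r in range(128)]
parts = split_into_parts(image)
print([(len(p), len(p[0])) for p in parts])  # -> [(64, 64), (64, 64), (64, 64)]
```

Each part would then feed its own branch, with no filter sharing between branches.
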
Experiments
- Their baseline is very weak (worse than CUHK-03 baseline)
- Three parts are analyzed
  - Moderate positive and hard negative mining (improves by more than 10%)
  - Weight constraint, tuned via $\lambda$ ($\lambda$ around $10^{-2}$ gives a good trade-off)
  - Tied or untied filters between branches (untied is slightly better)
- Augmentation
  - Random translation
  - Random cropping (0-5 pixels) horizontally and vertically, then stretching to recover the original size
- Datasets
  - CUHK03 (Rank-1: 61.32% with hand-crafted bbox, 52.09% with detected bbox)
  - CUHK01 + Market-1501 in training (Rank-1: 86.59%)
  - VIPeR (Rank-1: 43.39%)
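
The crop-and-stretch augmentation listed under Experiments can be sketched as follows. The details are assumptions, since the note does not spell them out: here a random margin of 0-5 pixels is removed independently from each of the four sides, and the result is stretched back with nearest-neighbour sampling. Both helper names are hypothetical.

```python
import random


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image stored as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def random_crop_and_stretch(image, max_crop=5, rng=random):
    """Crop 0..max_crop pixels from each side, then stretch back to size."""
    h, w = len(image), len(image[0])
    # randint is inclusive on both ends, so margins range over 0..max_crop.
    top, bottom, left, right = (rng.randint(0, max_crop) for _ in range(4))
    cropped = [row[left:w - right] for row in image[top:h - bottom]]
    return resize_nearest(cropped, h, w)


# Toy single-channel 128x64 image with a unique value per pixel.
image = [[r * 64 + c for c in range(64)] for r in range(128)]
augmented = random_crop_and_stretch(image, rng=random.Random(0))
print(len(augmented), len(augmented[0]))  # -> 128 64
```

A real pipeline would more likely use bilinear interpolation from an image library; nearest-neighbour is used here only to keep the example dependency-free.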