【ReID】Strong Baseline

Bag of Tricks and A Strong Baseline for Deep Person Re-identification

Hao Luo

CVPR 2019 Workshops (Oral)

Question

How can training tricks be used to improve a ReID model so that it reaches high performance using only global features?

Achievement

Achieves 94.5% rank-1 accuracy and 85.9% mAP on Market1501.

Methodology

Standard Baseline

  1. ResNet50 initialized with ImageNet pre-trained weights.
  2. Randomly sample P=16 identities and K=4 images per person to constitute a training batch, so the batch size is B = P × K = 64.
  3. Resize each image to 256 × 128 pixels, pad the resized image with 10 pixels of zero values, then randomly crop it back to a 256 × 128 rectangle.
  4. Flip each image horizontally with probability 0.5.
  5. Normalize with mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225).
  6. The model outputs ReID features f and ID prediction logits p.
  7. The ReID features f are used to compute the triplet loss, and the ID prediction logits p are used to compute the cross-entropy loss. The margin m of the triplet loss is set to 0.3.
  8. Adam is adopted to optimize the model. The initial learning rate is 0.00035 and is decayed by a factor of 0.1 at the 40th and 70th epochs, respectively. There are 120 training epochs in total.
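
The items above map directly onto standard PyTorch components. Below is a minimal sketch (not the authors' exact code) of the preprocessing pipeline and of P × K identity sampling; `PKSampler` is an illustrative helper, not part of the paper's repository.

```python
import random
from collections import defaultdict

from torchvision import transforms

# Items 3-5 of the standard baseline: resize, pad + random crop, flip, normalize.
train_transform = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.Pad(10),                      # zero padding by default
    transforms.RandomCrop((256, 128)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

class PKSampler:
    """Item 2: each batch holds P identities with K images each (B = P * K)."""

    def __init__(self, labels, P=16, K=4):
        self.P, self.K = P, K
        self.index_by_pid = defaultdict(list)
        for idx, pid in enumerate(labels):
            self.index_by_pid[pid].append(idx)
        self.pids = list(self.index_by_pid)

    def __iter__(self):
        pids = self.pids[:]
        random.shuffle(pids)
        for i in range(0, len(pids) - self.P + 1, self.P):
            batch = []
            for pid in pids[i:i + self.P]:
                # sample with replacement when an identity has fewer than K images
                batch.extend(random.choices(self.index_by_pid[pid], k=self.K))
            yield batch

    def __len__(self):
        return len(self.pids) // self.P
```

A `DataLoader` can then take `batch_sampler=PKSampler(labels)` so every batch contains 16 × 4 = 64 images.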

Training Tricks

  1. Warmup Learning Rate

    $$\operatorname{lr}(t)=\begin{cases} 3.5 \times 10^{-5} \times \frac{t}{10} & \text{if } t \leq 10 \\ 3.5 \times 10^{-4} & \text{if } 10 < t \leq 40 \\ 3.5 \times 10^{-5} & \text{if } 40 < t \leq 70 \\ 3.5 \times 10^{-6} & \text{if } 70 < t \leq 120 \end{cases}$$

  2. Random Erasing Augmentation (REA)

    • $p = 0.5$
    • $0.02 < S_e < 0.4$
    • aspect ratio $r_1 = 0.3$, $r_2 = 3.33$
  3. Label Smoothing (LS)

    Label smoothing prevents the ID classifier from overfitting to the training identities (a sketch of these three training tricks follows this list).

$$L_{ID}=\sum_{i=1}^{N}-q_{i} \log \left(p_{i}\right), \quad q_{i}=\begin{cases} 1 & y = i \\ 0 & y \neq i \end{cases}$$

With label smoothing, $q_i$ becomes:

$$q_{i}=\begin{cases} 1-\frac{N-1}{N} \varepsilon & \text{if } i=y \\ \varepsilon / N & \text{otherwise} \end{cases}$$

with $\varepsilon = 0.1$.
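
A minimal sketch of how these three tricks can be wired up in PyTorch; this is an illustration, not the authors' code. The warmup factor assumes the epoch counter t starts at 1 as in the formula above, REA reuses torchvision's `RandomErasing`, and the label-smoothing loss follows the smoothed $q_i$ with $\varepsilon = 0.1$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

# 1. Warmup learning rate: multiply the base lr (3.5e-4) by this factor.
def warmup_factor(t):                 # t = epoch, counted from 1 as in lr(t)
    if t <= 10:
        return 0.1 * t / 10           # 3.5e-5 * t / 10
    if t <= 40:
        return 1.0                    # 3.5e-4
    if t <= 70:
        return 0.1                    # 3.5e-5
    return 0.01                       # 3.5e-6

# e.g. scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

# 2. Random Erasing Augmentation, applied after ToTensor(), with the listed parameters.
rea = transforms.RandomErasing(p=0.5, scale=(0.02, 0.4), ratio=(0.3, 3.33))

# 3. Label smoothing: q_i = 1 - (N-1)/N * eps for the true class, eps/N otherwise.
class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, num_classes, eps=0.1):
        super().__init__()
        self.num_classes, self.eps = num_classes, eps

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=1)
        with torch.no_grad():
            q = torch.full_like(log_probs, self.eps / self.num_classes)
            q.scatter_(1, target.unsqueeze(1),
                       1.0 - (self.num_classes - 1) / self.num_classes * self.eps)
        return (-q * log_probs).sum(dim=1).mean()
```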

  1. Last Stride

    A higher spatial resolution of the feature map brings a significant improvement.

    Setting the stride of the last spatial down-sampling in the ResNet50 backbone to 1 (instead of 2) enlarges the output feature map from 8 × 4 to 16 × 8 for a 256 × 128 input, at negligible extra computational cost.

  2. BNNeck

    BNNeck adds a batch-normalization layer between the global feature and the classifier FC layer. The triplet loss is computed on the feature $f_t$ before BN, while the ID (cross-entropy) loss is computed on the feature $f_i$ after BN, so the two losses no longer pull the same embedding in inconsistent directions. At inference time, $f_i$ is used for retrieval with the cosine distance (see the model sketch after this list).

  3. Center Loss

    It is difficult to ensure that $d_p < d_n$ over the whole training dataset.

    Center loss, which simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers, makes up for the drawbacks of the triplet loss. The center loss function is formulated as:

$$\mathcal{L}_{C}=\frac{1}{2} \sum_{j=1}^{B}\left\|\boldsymbol{f}_{t_{j}}-\boldsymbol{c}_{y_{j}}\right\|_{2}^{2}$$

where $y_j$ is the label of the $j$-th image in a mini-batch, $c_{y_j}$ denotes the $y_j$-th class center of the deep features, and $B$ is the batch size. The formulation effectively characterizes the intra-class variations, and minimizing the center loss increases intra-class compactness. The model includes three losses in total:

$$L=L_{ID}+L_{Triplet}+\beta L_{C}, \quad \beta=0.0005$$
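
Putting the last-stride and BNNeck tricks together, here is a minimal model sketch built on torchvision's ResNet50. It illustrates the structure rather than reproducing the authors' code (their repository uses its own ResNet with a `last_stride` argument); freezing the BN bias and dropping the classifier bias follow the reference implementation.

```python
import torch.nn as nn
from torchvision.models import resnet50

class BNNeckBaseline(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet50(pretrained=True)  # newer torchvision uses the weights= argument
        # Last stride = 1: undo the stride-2 down-sampling in layer4 so a
        # 256 x 128 input gives a 16 x 8 feature map instead of 8 x 4.
        backbone.layer4[0].conv2.stride = (1, 1)
        backbone.layer4[0].downsample[0].stride = (1, 1)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.gap = nn.AdaptiveAvgPool2d(1)
        # BNNeck: BN layer between the global feature and the classifier FC.
        self.bnneck = nn.BatchNorm1d(2048)
        self.bnneck.bias.requires_grad_(False)
        self.classifier = nn.Linear(2048, num_classes, bias=False)

    def forward(self, x):
        f_t = self.gap(self.backbone(x)).flatten(1)  # before BN: triplet + center loss
        f_i = self.bnneck(f_t)                       # after BN: ID (cross-entropy) loss
        return f_t, f_i, self.classifier(f_i)        # use f_i for retrieval at test time
```

During training, `f_t` feeds $L_{Triplet}$ and $L_C$ while the logits feed $L_{ID}$, matching $L = L_{ID} + L_{Triplet} + \beta L_C$ above.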

Experimental Results

Harvest

  • The feature used for the triplet loss is not normalized.
  • Cross-domain generalization is a problem shared by the whole deep-learning community. In industry, however, once the amount of data grows by an order of magnitude, domain bias becomes much less pronounced; what currently makes deployment difficult are occlusion, non-visible-light imagery, and people wearing similar clothes.
  • One might expect that softmax variants such as ArcFace and CosFace, which already embed the metric-learning idea, would be enough on their own; yet with this baseline the observed ordering is ArcFace + triplet > softmax + triplet > ArcFace, so ArcFace and the triplet loss apparently can be combined.
  • code
  • The follow-up paper is "A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification".

Reference

罗浩's Zhihu post: 一个更加强力的ReID Baseline (A Stronger ReID Baseline)

罗浩's team write-up: the third-place team's solution for the Person ReID (行人重识别) track of the National Artificial Intelligence Competition (全国人工智能大赛)

Code

config:

yacs

data

market1501

model

LAST_STRIDE = 1

optimizer

if "bias" in key:
    lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.BIAS_LR_FACTOR
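
The snippet above comes from a `make_optimizer`-style loop that builds per-parameter groups. A fuller sketch of the idea is below; `WEIGHT_DECAY` and `WEIGHT_DECAY_BIAS` are assumed config keys, and the optimizer choice follows the Adam setting from the paper.

```python
import torch

def make_optimizer(cfg, model):
    params = []
    for key, value in model.named_parameters():
        if not value.requires_grad:
            continue
        lr = cfg.SOLVER.BASE_LR
        weight_decay = cfg.SOLVER.WEIGHT_DECAY          # assumed config key
        if "bias" in key:
            # bias parameters get their own learning rate and weight decay
            lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.BIAS_LR_FACTOR
            weight_decay = cfg.SOLVER.WEIGHT_DECAY_BIAS  # assumed config key
        params.append({"params": [value], "lr": lr, "weight_decay": weight_decay})
    return torch.optim.Adam(params)
```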

loss

Triplet Loss

The original version of the triplet loss comes from Google's FaceNet paper.

TripletMarginLoss

PyTorch implementation: `torch.nn.TripletMarginLoss`

The PyTorch implementation references "Learning shallow convolutional feature descriptors with triplet losses".

$$L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text{margin},\ 0\right\}$$

$$d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p}$$
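
A small usage sketch of the built-in module with the paper's margin of 0.3 (the shapes below are illustrative):

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.3, p=2)   # Euclidean distance, margin m = 0.3
anchor   = torch.randn(64, 2048)                  # B = 64 anchor features
positive = torch.randn(64, 2048)                  # same-ID features
negative = torch.randn(64, 2048)                  # different-ID features
loss = triplet(anchor, positive, negative)
```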

Triplet loss with batch hard mining, TriHard loss

In Defense of the Triplet Loss for Person Re-Identification

Author: 罗浩.ZJU
Link: https://zhuanlan.zhihu.com/p/31921944
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

The hard-sample-mining triplet loss (referred to as the TriHard loss below) is an improved version of the triplet loss. The traditional triplet loss randomly samples three images from the training data; although simple, most of the sampled pairs are easy to distinguish, and if most training pairs are easy, the network cannot learn a better representation. Many papers have found that training on harder samples improves generalization, and there are many ways to mine hard pairs. Paper [10] proposed an online hard-sample mining method based on the training batch: the TriHard loss.

The core idea of the TriHard loss: for each training batch, randomly pick $P$ identities and $K$ different images per identity, so a batch contains $P \times K$ images. Then, for every image $a$ in the batch, we pick the hardest positive sample and the hardest negative sample to form a triplet with $a$.

First, define $A$ as the set of images with the same ID as $a$ and $B$ as the set of the remaining images with different IDs; the TriHard loss is then:

$$L_{th}=\frac{1}{P \times K} \sum_{a \in batch}\left(\max _{p \in A} d_{a, p}-\min _{n \in B} d_{a, n}+\alpha\right)_{+}$$

where $\alpha$ is a manually set margin. The TriHard loss computes the Euclidean distance in feature space between $a$ and every other image in the batch, then selects the positive sample $p$ farthest from $a$ (least similar) and the negative sample $n$ closest to $a$ (most similar) to compute the triplet loss. In practice, the TriHard loss usually outperforms the traditional triplet loss.

MarginRankingLoss

With $y = 1$, $x_1$ should be larger than $x_2$:

$x_1 = d\left(a_{i}, n_{i}\right)$, the distance to the hardest negative

$x_2 = d\left(a_{i}, p_{i}\right)$, the distance to the hardest positive

$$loss(x, y)=\max \left(0,\ -y \cdot (x_1 - x_2)+\text{margin}\right)$$

SoftMarginLoss

The margin is 0 and $y = 1$, with $x = x_1 - x_2$:

$$loss(x, y)=\sum_{i} \frac{\log \left(1+\exp \left(-y[i] \cdot x[i]\right)\right)}{\text{x.nelement()}}$$

  1. Compute the pairwise feature distance matrix from the global features.
  2. Hard example mining: for each anchor, find the smallest distance to a sample of a different class ($x_1$) and the largest distance to a sample of the same class ($x_2$); the label information is only used at this step.
  3. Feed $x_1$ and $x_2$ into MarginRankingLoss / SoftMarginLoss to obtain the loss (see the sketch below).

The value of the margin should depend on how the distance is computed.
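
A compact sketch of the three steps above (an illustration, not the repository's exact code): build the distance matrix, mine the hardest positive/negative per anchor, and apply MarginRankingLoss or SoftMarginLoss.

```python
import torch
import torch.nn as nn

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """features: (B, D) global features; labels: (B,) person IDs from a P x K batch."""
    # 1. pairwise Euclidean distance matrix from the global features
    dist = torch.cdist(features, features, p=2)                       # (B, B)

    # 2. hard example mining: labels are only used here
    same = labels.unsqueeze(0) == labels.unsqueeze(1)                 # same-ID mask
    x2 = dist.masked_fill(~same, float('-inf')).max(dim=1).values    # hardest positive d(a, p)
    x1 = dist.masked_fill(same, float('inf')).min(dim=1).values      # hardest negative d(a, n)

    # 3. ranking loss: x1 should exceed x2 by at least `margin`
    y = torch.ones_like(x1)
    if margin is not None:
        return nn.MarginRankingLoss(margin=margin)(x1, x2, y)
    return nn.SoftMarginLoss()(x1 - x2, y)                            # soft-margin variant
```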

Label Smoothing Loss

Rethinking the Inception Architecture for Computer Vision. CVPR 2016.

Center Loss

A Discriminative Feature Learning Approach for Deep Face Recognition ECCV 2016

Adds a regularizer-like term to the loss so that samples of the same class become compact while samples of different classes stay spread apart.

Within a batch, it makes the distance between each $feature$ of a class and the center $Center_{feature}$ of that class's features as small as possible.
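
A minimal sketch of a center-loss module following the formula above (simplified; the reference implementation maintains and updates the centers with a separate optimizer and learning rate):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """L_C = 1/2 * sum_j || f_{t_j} - c_{y_j} ||_2^2 over a mini-batch."""

    def __init__(self, num_classes, feat_dim=2048):
        super().__init__()
        # one learnable center per identity
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        centers = self.centers[labels]                 # (B, feat_dim): each sample's class center
        return 0.5 * (features - centers).pow(2).sum()
```

The total objective then adds it with the weight from the paper: `loss = id_loss + triplet_loss + 0.0005 * center_loss(f_t, labels)`.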
