[ReID] Strong Baseline

Article link: https://blog.csdn.net/arron_hou/article/details/105396791

Bag of Tricks and A Strong Baseline for Deep Person Re-identification

Hao Luo

CVPR 2019 Workshops (CVPRW), Oral

Question

How can a few training tricks improve a ReID model so that it reaches high performance using only global features?

Achievement

Achieves 94.5% rank-1 accuracy and 85.9% mAP on Market1501 using only global features.

Methodology

Standard Baseline

ResNet50 backbone initialized with ImageNet pre-trained weights.
Randomly sample P=16 identities and K=4 images per person to form a training batch, so the batch size is B=P×K=64.
Each image is resized to 256 × 128 pixels and zero-padded by 10 pixels, then randomly cropped back to a 256 × 128 rectangle.
Each image is flipped horizontally with probability 0.5.
Normalization: mean = (0.485, 0.456, 0.406), std = (0.229, 0.224, 0.225).
The model outputs ReID features f and ID prediction logits p.
The ReID features f are used to compute the triplet loss, and the ID prediction logits p are used to compute the cross-entropy loss. The margin m of the triplet loss is set to 0.3.
The model is optimized with Adam. The initial learning rate is 0.00035 and is multiplied by 0.1 at the 40th and 70th epochs. Training runs for 120 epochs in total.
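
The P×K batch construction in the baseline can be sketched in plain Python. This is only an illustration: the helper name `sample_pk_batch` and the `index_by_id` mapping are hypothetical, and sampling with replacement for identities that have fewer than K images is an assumption that these notes do not specify.

```python
import random


def sample_pk_batch(index_by_id, p=16, k=4, rng=random):
    """Pick p identities, then k image indices per identity (hypothetical helper)."""
    identities = rng.sample(sorted(index_by_id), p)
    batch = []
    for pid in identities:
        images = index_by_id[pid]
        if len(images) >= k:
            chosen = rng.sample(images, k)
        else:
            # Assumption: sample with replacement when an identity has < k images.
            chosen = [rng.choice(images) for _ in range(k)]
        batch.extend((pid, img) for img in chosen)
    return batch


# Toy dataset: 20 identities with 6 image indices each.
toy_index = {pid: list(range(pid * 6, pid * 6 + 6)) for pid in range(20)}
batch = sample_pk_batch(toy_index, p=16, k=4, rng=random.Random(0))
print(len(batch))  # B = P * K = 64
```

Every identity in the batch contributes exactly K samples, which is what later lets a triplet loss find positives and negatives inside a single batch.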

Training Tricks

Warmup Learning Rate

$\operatorname{lr}(t)=\left\{\begin{array}{ll} 3.5 \times 10^{-5} \times \frac{t}{10} & \text { if } t \leq 10 \\ 3.5 \times 10^{-4} & \text {if } 10<t \leq 40 \\ 3.5 \times 10^{-5} & \text {if } 40<t \leq 70 \\ 3.5 \times 10^{-6} & \text {if } 70<t \leq 120 \end{array}\right.$
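
As a minimal sketch, the piecewise schedule can be transcribed literally into a function of the epoch index. The function name `learning_rate` is hypothetical, and the epoch is assumed to count from 1 to 120. Taken as written, the warmup branch reaches 3.5e-5 at t = 10 and the rate then steps to 3.5e-4 at t = 11.

```python
def learning_rate(epoch):
    """Piecewise schedule from the formula above; epoch counts from 1 to 120."""
    if epoch <= 10:
        return 3.5e-5 * epoch / 10  # warmup branch
    if epoch <= 40:
        return 3.5e-4
    if epoch <= 70:
        return 3.5e-5
    return 3.5e-6


for epoch in (1, 10, 11, 41, 71):
    print(epoch, learning_rate(epoch))
```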

Random Erasing Augmentation (REA)

- erasing probability $p = 0.5$
- erased-area ratio $0.02 < S_e < 0.4$ (relative to the image area)
- aspect ratio range $r_1 = 0.3$, $r_2 = 3.33$
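
The following plain-Python sketch shows one way these hyperparameters could drive the choice of the erased rectangle. The function names, the `max_tries` limit, the constant fill value, and the choice to sample the position only after the box is known to fit are all assumptions made for illustration, not the exact procedure of the paper or of any library.

```python
import math
import random


def random_erasing_box(height, width, p=0.5, s_l=0.02, s_h=0.4,
                       r_1=0.3, r_2=3.33, max_tries=100, rng=random):
    """Return (top, left, box_h, box_w) of the region to erase, or None."""
    if rng.random() >= p:
        return None  # image is left unchanged with probability 1 - p
    area = height * width
    for _ in range(max_tries):
        erase_area = rng.uniform(s_l, s_h) * area
        aspect = rng.uniform(r_1, r_2)
        box_h = int(round(math.sqrt(erase_area * aspect)))
        box_w = int(round(math.sqrt(erase_area / aspect)))
        if 0 < box_h <= height and 0 < box_w <= width:
            top = rng.randint(0, height - box_h)
            left = rng.randint(0, width - box_w)
            return top, left, box_h, box_w
    return None  # assumption: give up after max_tries rejected samples


def erase(image, box, fill=0.0):
    """Overwrite the box in a 2-D list-of-lists image with `fill` (in place)."""
    top, left, box_h, box_w = box
    for row in range(top, top + box_h):
        for col in range(left, left + box_w):
            image[row][col] = fill
    return image


rng = random.Random(0)
image = [[1.0] * 128 for _ in range(256)]
box = random_erasing_box(256, 128, p=1.0, rng=rng)
if box is not None:
    erase(image, box)
```

Here `p=1.0` is used only so that the toy call always attempts an erase; the setting listed above is `p=0.5`.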

Label Smoothing (LS)

Label smoothing is used to keep the classifier from overfitting to the training identities.

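Without label smoothing, the ID loss is the cross entropy against a one-hot target:

$L_{ID}=\sum_{i=1}^{N}-q_{i} \log \left(p_{i}\right), \quad q_{i}=\left\{\begin{array}{ll} 0, & y \neq i \\ 1, & y=i \end{array}\right.$

With label smoothing, the target $q_i$ becomes:

$q_{i}=\left\{\begin{array}{ll} 1-\frac{N-1}{N} \varepsilon & \text { if } i=y \\ \varepsilon / N & \text { otherwise } \end{array}\right.$

$\varepsilon = 0.1$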
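
As a small worked example of the smoothed target, the sketch below builds $q$ for a toy problem with four classes and evaluates the cross entropy on made-up logits. The function names and the toy numbers are hypothetical.

```python
import math


def smoothed_targets(label, num_classes, epsilon=0.1):
    """Target distribution q_i from the label-smoothing formula above."""
    off_value = epsilon / num_classes
    q = [off_value] * num_classes
    q[label] = 1.0 - (num_classes - 1) / num_classes * epsilon
    return q


def cross_entropy(logits, targets):
    """Cross entropy -sum_i q_i * log(p_i), with p = softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(q * (z - log_norm) for q, z in zip(targets, logits))


logits = [0.5, -1.0, 3.0, 0.2]  # made-up logits for N = 4 classes
q = smoothed_targets(label=2, num_classes=4, epsilon=0.1)
smoothed_loss = cross_entropy(logits, q)
hard_loss = cross_entropy(logits, [0.0, 0.0, 1.0, 0.0])
print(q, smoothed_loss, hard_loss)
```

With N = 4 and a smoothing factor of 0.1, the true class gets a target of about 0.925 and every other class about 0.025, so the targets still sum to 1.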

Last Stride

A feature map with higher spatial resolution brings a significant improvement.

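Setting last stride = 1 means the last downsampling stage of the backbone no longer halves the resolution: for a 256 × 128 input, the output feature map grows from 8 × 4 to 16 × 8.

BNNeck

A batch normalization layer is inserted between the feature and the classifier. The feature before BN, $f_t$, is used for the triplet loss, and the normalized feature after BN, $f_i$, is used for the ID loss.

Center Loss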

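Triplet loss only constrains the difference between $d_p$ and $d_n$ within the sampled triplets, so it is difficult to ensure that $d_p < d_n$ across the whole training dataset.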

Center loss, which simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers, makes up for the drawbacks of the triplet loss. The center loss function is formulated as:

$\mathcal{L}_{C}=\frac{1}{2} \sum_{j=1}^{B}\left\|\boldsymbol{f}_{t_{j}}-\boldsymbol{c}_{y_{j}}\right\|_{2}^{2}$

where $y_j$ is the label of the $j$-th image in a mini-batch, $c_{y_j}$ denotes the $y_j$-th class center of the deep features, and $B$ is the batch size. The formulation effectively characterizes intra-class variation: minimizing the center loss increases intra-class compactness. The model combines three losses in total:

$L=L_{I D}+L_{T r i p l e t}+\beta L_{C}$

where $\beta$ is the balancing weight of the center loss, set to $\beta=0.0005$.
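
The center loss and the combined objective can be checked by hand on a toy batch. The sketch below is plain Python with hypothetical function names, and it treats the class centers as fixed numbers for the sake of the arithmetic, whereas in the method they are learned during training.

```python
def center_loss(features, labels, centers):
    """0.5 * sum_j ||f_j - c_{y_j}||^2 over a mini-batch (plain-Python sketch)."""
    total = 0.0
    for feature, label in zip(features, labels):
        center = centers[label]
        total += sum((f - c) ** 2 for f, c in zip(feature, center))
    return 0.5 * total


def total_loss(id_loss, triplet_loss, center, beta=0.0005):
    """L = L_ID + L_Triplet + beta * L_C."""
    return id_loss + triplet_loss + beta * center


features = [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0]]
labels = [0, 1, 0]
centers = {0: [1.0, 1.0], 1: [0.0, 1.0]}  # toy class centers, fixed here
l_c = center_loss(features, labels, centers)
print(l_c)  # 0.5 * (1 + 1 + 4) = 3.0
```

Because the weight on the center loss is so small, the term acts as a mild pull toward the class centers rather than dominating the ID and triplet losses.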

Experimental Results

Takeaways

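The features fed to the triplet loss are not normalized.
Cross-domain generalization is a problem shared by the whole deep learning research community. In industry, however, once the amount of data reaches a certain scale, domain bias becomes much less noticeable. What currently makes deployment hard is occlusion, non-visible light, people wearing the same clothes, and similar issues.
One view is that an improved softmax that already incorporates metric learning ideas, such as ArcFace or CosFace, is enough on its own. However, with this baseline I found ArcFace + triplet > softmax + triplet > ArcFace, so ArcFace and triplet loss seem to be complementary?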
code
The follow-up paper is "A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification".

Reference

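Hao Luo's Zhihu post: A More Powerful ReID Baseline

Hao Luo's team solution: write-up from the third-place team in the Person ReID track of the National Artificial Intelligence Competition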

Code

config:

yacs

data

market1501

model

LAST_STRIDE = 1

optimizer

Bias parameters use their own learning rate, scaled by BIAS_LR_FACTOR:

if "bias" in key:
    lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.BIAS_LR_FACTOR

loss

Triplet Loss

Triplet loss was originally introduced in Google's FaceNet.

TripletMarginLoss

The PyTorch implementation follows the formulation in "Learning shallow convolutional feature descriptors with triplet losses":

$L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text{margin}, 0\right\}$

$d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p}$
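
To make the two formulas concrete, here is a minimal plain-Python sketch for a single triplet. It is not the PyTorch implementation: the function names are hypothetical, and the default margin of 0.3 is taken from the baseline settings in these notes rather than from the library.

```python
def p_norm_distance(x, y, p=2.0):
    """d(x, y) = ||x - y||_p for two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def triplet_margin_loss(anchor, positive, negative, margin=0.3, p=2.0):
    """max(d(a, p) - d(a, n) + margin, 0) for a single triplet."""
    d_ap = p_norm_distance(anchor, positive, p)
    d_an = p_norm_distance(anchor, negative, p)
    return max(d_ap - d_an + margin, 0.0)


anchor = [0.0, 0.0]
# Easy triplet: the positive is close (d = 1), the negative is far (d = 5).
easy = triplet_margin_loss(anchor, [0.0, 1.0], [3.0, 4.0])
# Hard triplet: the positive is far (d = 5), the negative is close (d = 1).
hard = triplet_margin_loss(anchor, [3.0, 4.0], [0.0, 1.0])
print(easy, hard)
```

The easy triplet already satisfies the margin and contributes zero loss, while the hard triplet is penalized by the full violation plus the margin.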

Triplet loss with batch hard mining, TriHard loss

In Defense of the Triplet Loss for Person Re-Identification

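Author: 罗浩.ZJU (Hao Luo)
Link: https://zhuanlan.zhihu.com/p/31921944
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, cite the source.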

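The hard-sample-mining triplet loss (referred to as TriHard loss from here on) is an improved version of triplet loss. Conventional triplet loss randomly samples three images from the training data. This is simple, but most of the sampled pairs are easy to tell apart. If most training pairs are easy, the network cannot learn a better representation. Many papers have found that training on harder samples improves the generalization ability of the network, and there are many ways to mine hard pairs. Paper [10] proposes an online hard-sample mining method based on the training batch: TriHard loss.

The core idea of TriHard loss is as follows: for each training batch, randomly pick $P$ identities and, for each identity, randomly pick $K$ different images, so a batch contains $P \times K$ images. Then, for each image $a$ in the batch, we pick the hardest positive sample and the hardest negative sample to form a triplet with $a$.

First, define the set of images with the same ID as $a$ as $A$, and the set of remaining images with different IDs as $B$. The TriHard loss is then: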

$L_{th}=\frac{1}{P \times K} \sum \limits _{a \in \text{batch}}\left(\max \limits _{p \in A} d_{a, p}-\min \limits _{n \in B} d_{a, n}+\alpha\right)_{+}$
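
The batch-hard selection in this formula can be sketched in plain Python on a toy batch with P = 2 and K = 2. The function names, the use of Euclidean distance for $d$, and the margin value of 0.3 are assumptions for illustration. The set $A$ here includes the anchor itself, which adds a distance of 0 and does not change the maximum, and the batch is assumed to contain at least two identities.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def trihard_loss(features, labels, margin=0.3):
    """Batch-hard triplet loss averaged over all anchors in the batch."""
    total = 0.0
    for f_a, y_a in zip(features, labels):
        pos = [euclidean(f_a, f) for f, y in zip(features, labels) if y == y_a]
        neg = [euclidean(f_a, f) for f, y in zip(features, labels) if y != y_a]
        hardest_pos = max(pos)  # farthest sample with the same ID
        hardest_neg = min(neg)  # closest sample with a different ID
        total += max(hardest_pos - hardest_neg + margin, 0.0)
    return total / len(features)


features = [[0.0, 0.0], [0.0, 1.0], [0.0, 3.0], [4.0, 3.0]]
labels = [0, 0, 1, 1]
loss = trihard_loss(features, labels)
print(loss)
```

In this toy batch only the third image violates the margin (hardest positive at distance 4, hardest negative at distance 2), so the loss is (4 - 2 + 0.3) / 4.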