Motivation
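- Existing Re-ID work faces the following problems:
  - choice of loss function
  - misalignment between person images
  - finding highly discriminative local features
  - how to sample when optimizing a rank loss
- Most existing work tackles only one or two of these problems. Can a single unified framework address all of them?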
Contribution
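- Proposes the Mancs framework to address the problems above in a unified way
- Proposes the fully attentional block with deep supervision and curriculum sampling to improve the model's feature extraction and its training (both can be reused in other work)
- The proposed method achieves SOTA results on three public datasets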
1 Introduction
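- Definition, significance, and difficulties of Re-ID
- Research directions:
  - person feature representation
  - distance metric: suffers from an imbalance between positive and negative samples, so it usually places high demands on the sampling method
- Motivation and contributions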
2 Related Work
- Attention Network
  - MSCAN
  - HA-CNN
  - CAN
- Metric Learning
  - triplet loss ==> online hard example mining (OHEM)
  - contrastive loss
- Multi-task learning
  - triplet loss + softmax
  - this paper: triplet loss + focal loss
3 Method
3.1 Training Architecture
- As shown in the figure below, the network consists of three main parts:
  - backbone network (ResNet50) ==> a multi-scale feature extractor
  - attention module ==> attention mask
  - loss function: attention loss + triplet loss + focal loss
3.2 Fully Attentional Block
- Builds on the SE Block and modifies its structure:
  - Problem with the SE Block: global average pooling (GAP) discards spatial structure ==> this paper removes the pooling layer and replaces the fully connected layers with 1x1 convolutions to preserve spatial information
- Attention map:

$$M = \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(F_i))))$$

- Output feature map obtained from the attention map:

$$F_o = F_i * M + F_i$$
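The two formulas above can be traced by hand with a small pure-Python sketch. It is only an illustration of the data flow, not the authors' implementation: the nested-list layout `[C][H][W]`, the helper names, and the choice of a hidden width for the first 1x1 convolution are all assumptions made here.

```python
import math

def conv1x1(x, weight, bias):
    """1x1 convolution: mixes channels independently at every pixel.

    x: [C_in][H][W] nested lists, weight: [C_out][C_in], bias: [C_out].
    """
    h, w = len(x[0]), len(x[0][0])
    return [[[bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)]
            for o in range(len(weight))]

def elementwise(fn, *maps):
    """Apply fn position-wise over feature maps of identical shape."""
    return [[[fn(*vals) for vals in zip(*rows)] for rows in zip(*chans)]
            for chans in zip(*maps)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fully_attentional_block(f_in, w1, b1, w2, b2):
    """M = Sigmoid(Conv(ReLU(Conv(F_i)))), F_o = F_i * M + F_i.

    No pooling is used, so the mask keeps the spatial size of f_in.
    w2 must map back to the channel count of f_in (assumed here).
    """
    hidden = elementwise(lambda v: max(v, 0.0), conv1x1(f_in, w1, b1))
    mask = elementwise(sigmoid, conv1x1(hidden, w2, b2))
    f_out = elementwise(lambda f, m: f * m + f, f_in, mask)
    return f_out, mask
```

With all weights and biases set to zero the mask is 0.5 everywhere, so the output is simply 1.5 times the input, which is a quick sanity check for the residual form `F_i * M + F_i`.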
3.3 ReID Task #1: Triplet loss with curriculum sampling
- A ranking loss generalizes better than a classification loss when the amount of data is small
- Rank branch: shared backbone + a pooling layer + FC layer
- Sampling: OHEM always updates the parameters with the hardest samples, which can easily make the model collapse during training ==> curriculum sampling (from easy triplets to hard triplets)
  - For an anchor $I_i^a$, first randomly select a positive $I_i^p$
  - Sort the negatives by their distance to the anchor in ascending order (hard --> easy)
  - Select a negative according to a probability distribution (Gaussian $\mathcal{N}(\mu, \sigma)$)

$$\mu = \left[N_n - \frac{N_n}{t_0} t\right]_+$$

$$\sigma = a \times b^{\frac{t - t_0}{t_1 - t_0}}$$
- Selection probability of $I_i^n$: as $t$ increases, the probability of selecting hard samples increases, as shown in the figure below

$$Pr(I_i^{n^*} = I_i^n \mid I_i^a) \propto \mathcal{N}(\mu, \sigma)$$

- Final loss for the ranking branch:

$$L_{rank} = \frac{1}{P(K-1)K} \sum_{i=1}^{P(K-1)K} \left[m + D(f_{rank}(I_i^a), f_{rank}(I_i^p)) - D(f_{rank}(I_i^a), f_{rank}(I_i^n))\right]_+$$
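The sampling steps and the ranking loss in this section can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the paper's code: $t$ is taken to be the training epoch, the Gaussian is assumed to be defined over the rank index of the sorted negatives (0 = hardest), and all function names are hypothetical. The hinge uses the standard triplet form: margin plus anchor-positive distance minus anchor-negative distance.

```python
import math
import random

def schedule(t, n_neg, t0, t1, a, b):
    """Mean and std of the Gaussian over negative ranks at epoch t."""
    mu = max(n_neg - n_neg / t0 * t, 0.0)       # [N_n - (N_n / t_0) t]_+
    sigma = a * b ** ((t - t0) / (t1 - t0))     # a * b^((t - t0) / (t1 - t0))
    return mu, sigma

def negative_probs(n_neg, mu, sigma):
    """Normalised selection probability for ranks 0 (hardest) .. n_neg - 1."""
    log_w = [-0.5 * ((r - mu) / sigma) ** 2 for r in range(n_neg)]
    top = max(log_w)                            # shift for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def sample_negative(dists_to_anchor, t, t0, t1, a, b, rng=random):
    """Return the index of one negative chosen for the anchor at epoch t."""
    # Ascending distance: the closest (hardest) negative gets rank 0.
    order = sorted(range(len(dists_to_anchor)), key=lambda i: dists_to_anchor[i])
    mu, sigma = schedule(t, len(order), t0, t1, a, b)
    probs = negative_probs(len(order), mu, sigma)
    rank = rng.choices(range(len(order)), weights=probs, k=1)[0]
    return order[rank]

def rank_loss(triplet_dists, margin):
    """Mean hinge loss over (d_anchor_positive, d_anchor_negative) pairs."""
    losses = [max(margin + d_ap - d_an, 0.0) for d_ap, d_an in triplet_dists]
    return sum(losses) / len(losses)
```

Early in training `mu` sits near the easy end of the sorted list and `sigma` is large, so negatives are drawn almost uniformly. Once `t` passes `t0`, `mu` is clamped to 0 and `sigma` shrinks, so the draw concentrates on the hardest negatives.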
3.4 ReID Task #2: Person classification with focal loss
- Since classification + ranking works better, a classification branch is added. Because hard samples should receive more attention than easy ones, focal loss (an improved version of softmax loss) is used to give hard samples more weight
- Focal loss for the classification branch:

$$L_{cls} = -\frac{1}{PK} \sum_{i=1}^{PK} (1 - p_i)^\gamma \log(p_i)$$

$$p_i = \mathrm{Sigmoid}_{c_i}(FC(f_{cls}(I_i)))$$
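To see how the $(1 - p_i)^\gamma$ factor reweights samples, the loss can be evaluated directly on a few probabilities. The sketch below assumes the inputs are already the predicted probabilities $p_i$ of each sample's true identity; the function name is hypothetical.

```python
import math

def focal_loss(true_class_probs, gamma):
    """Mean focal loss, where each p is the probability assigned to the true identity."""
    terms = [(1.0 - p) ** gamma * math.log(p) for p in true_class_probs]
    return -sum(terms) / len(terms)
```

With `gamma = 0` this reduces to the ordinary cross-entropy on the true class. With `gamma = 2`, a well-classified sample (`p = 0.9`) has its loss scaled by roughly 0.01, while a hard sample (`p = 0.1`) keeps about 0.81 of its loss, so hard samples dominate the gradient.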
3.5 ReID Task #3: Deep supervision for better attention
- The attention maps from different scales (feature maps already multiplied by the attention mask) are average-pooled and concatenated into an attention feature vector $f_{att}$, which is then used for identity classification ==> accurate attention maps
- Loss function for the attention branch:

$$L_{att} = -\frac{1}{PKC} \sum_{i=1}^{PK} \sum_{c=1}^{C} \left[ y_i^c \log(q_i^c) + (1 - y_i^c) \log(1 - q_i^c) \right]$$

$$q_i^c = \mathrm{Sigmoid}_c(FC(f_{att}(I_i)))$$
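A rough sketch of the two steps in this branch, pooling plus concatenation and the per-class loss, is given below. It treats the loss as a standard binary cross-entropy (the negated log-likelihood) averaged over all samples and classes. The `[C][H][W]` nested-list layout and both function names are assumptions for illustration only.

```python
import math

def attention_feature(attended_maps):
    """Global-average-pool each [C][H][W] map and concatenate the results."""
    vec = []
    for fmap in attended_maps:
        for channel in fmap:
            values = [v for row in channel for v in row]
            vec.append(sum(values) / len(values))
    return vec

def attention_loss(probs, labels):
    """Mean binary cross-entropy; probs[i][c] = q_i^c, labels[i][c] = y_i^c."""
    total, count = 0.0, 0
    for q_row, y_row in zip(probs, labels):
        for q, y in zip(q_row, y_row):
            total += y * math.log(q) + (1 - y) * math.log(1.0 - q)
            count += 1
    return -total / count
```

Maps from different scales may have different spatial sizes and channel counts; pooling each channel to a single number first is what makes the concatenation well defined.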
3.6 Multi-task learning
- The three tasks (rank + cls + att) share the backbone. The final loss function:

$$\mathcal{L} = \lambda_{rank} L_{rank} + \lambda_{cls} L_{cls} + \lambda_{att} L_{att}$$
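As a small worked example, the combination is a plain weighted sum. The default weights below are the values listed under Training Configurations in Section 4.3; the function name and the sample loss values are made up for illustration.

```python
def total_loss(l_rank, l_cls, l_att, w_rank=1.0, w_cls=1.0, w_att=0.2):
    """Weighted sum of the three branch losses that share one backbone."""
    return w_rank * l_rank + w_cls * l_cls + w_att * l_att

# Hypothetical per-batch losses: 0.35 + 1.2 + 0.2 * 0.7
loss = total_loss(0.35, 1.2, 0.7)
```

Because the attention loss is down-weighted, it acts mainly as auxiliary supervision for the attention maps rather than as the dominant training signal.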
3.7 Inference
- Features from the rank branch generalize better, so they are used to represent person images at test time, as shown in the figure below
4 Experiments
4.1 Datasets
- Market1501, CUHK03, DukeMTMC-reID
4.2 Evaluation Protocol
- mAP, CMC
- Market1501: both single query and multi-query; CUHK03 and DukeMTMC-reID: single query
- CUHK03 split: 1367/100 and 767/700
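For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed from ranked retrieval results. It is a simplified illustration: each query is assumed to come with a gallery list already sorted by distance and reduced to match/no-match flags, and the dataset-specific filtering of same-camera gallery images is left out. All names are hypothetical.

```python
def average_precision(ranked_matches):
    """AP for one query; ranked_matches[k] is True if the k-th result is a correct match."""
    hits, precision_sum = 0, 0.0
    for k, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / k   # precision at this recall point
    return precision_sum / hits if hits else 0.0

def cmc_hit(ranked_matches, k):
    """True if at least one correct match appears in the top k results."""
    return any(ranked_matches[:k])

def evaluate(per_query_matches, ranks=(1, 5, 10)):
    """Return (mAP, {rank: CMC accuracy}) averaged over all queries."""
    n = len(per_query_matches)
    m_ap = sum(average_precision(m) for m in per_query_matches) / n
    cmc = {k: sum(cmc_hit(m, k) for m in per_query_matches) / n for k in ranks}
    return m_ap, cmc
```

CMC at rank k only asks whether any correct match appears in the top k, while mAP rewards ranking all correct matches early, which is why the two are usually reported together.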
4.3 Implementation Details
- PyTorch
- Pretrained ResNet-50 + a 2048-dimensional FC layer before the classification layer
Data Augmentation
- resize images to 256 x 128 ==> randomly crop with scale in [0.64, 1.0] and aspect ratio in [2, 3] ==> resize back to 256 x 128 ==> randomly flip horizontally with probability 0.5 ==> random erasing ==> subtract the mean value and divide by the standard deviation
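The random-crop step can be sketched as sampling a crop box. This is one plausible reading of the description, not the authors' code: the scale is assumed to be the fraction of the image area that is kept, the aspect ratio is assumed to be height / width (a 256 x 128 image has ratio 2), and the retry-then-fallback logic and the function name are choices made here.

```python
import math
import random

def sample_crop(height=256, width=128, scale=(0.64, 1.0), ratio=(2.0, 3.0),
                rng=random, max_tries=10):
    """Return (top, left, crop_h, crop_w) for one random crop box."""
    for _ in range(max_tries):
        area = rng.uniform(*scale) * height * width
        hw_ratio = rng.uniform(*ratio)            # assumed to be height / width
        crop_h = int(round(math.sqrt(area * hw_ratio)))
        crop_w = int(round(math.sqrt(area / hw_ratio)))
        if 0 < crop_h <= height and 0 < crop_w <= width:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    return 0, 0, height, width                    # fall back to the full image
```

The sampled box would then be cut out and resized back to 256 x 128 before the flip, random erasing, and normalization steps.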
Training Configurations
- PK sampling strategy: P = K = 16 for Market1501 and DukeMTMC-reID; P = 32, K = 8 for CUHK03
- 160 epochs, $t_0 = 30$, $t_1 = 60$, $a = 15$, $b = 0.001$
- $\lambda_{rank} = 1$, $\lambda_{cls} = 1$, $\lambda_{att} = 0.2$
- margin $m = 0.5$, $\gamma = 2$
- Adam optimizer, lr = $3 \times 10^{-4}$
- gradient clipping to prevent model collapse
- The ReLU after the last convolutional layer is replaced with PReLU ==> strengthens the expressiveness of the final features
4.4 Comparisons with the state-of-the-art methods
Evaluation On Market-1501
Evaluation On CUHK03
Evaluation On DukeMTMC-reID
4.5 Ablation Study
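- The effectiveness of the proposed Curriculum Sampling (CS), Fully Attentional Block, Focal Loss, and Random Erasing is verified, as shown in the table below
- The cls + rank baseline is already very strong, so each proposed component brings only a relatively small improvement
- I do not fully understand the example in the figure below; according to the paper, the figure shows that random erasing and cls bring large improvements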
5 Conclusions
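- The proposed Mancs learns stable features and achieves SOTA performance on three widely used public datasets
- Demonstrates the effectiveness of the proposed fully attentional block with deep supervision and curriculum sampling (both can be reused in related tasks)
- Future work: combine data sampling and augmentation to further improve the generalization of reID features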