Notes on "BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition"

c2a2o2

Original post: https://zhuanlan.zhihu.com/p/109648173

Paper:《BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition》
Authors: Boyan Zhou, Quan Cui, Xiu-Shen Wei, Zhao-Min Chen
Institutions: Megvii Technology; Waseda University; Nanjing University
Published at CVPR 2020 (Oral).
Keywords: Long-Tailed Visual Recognition

【Overview】

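The paper first points out that the re-balancing methods commonly used for the long-tail problem (re-weighting and re-sampling) improve the classifier part of the network but hurt the learning of the representation part (the feature-extractor backbone). As shown in Figure 1, after re-balancing the model correctly classifies some of the tail-class data, but the intra-class distribution of each class becomes more separable, i.e. less compact.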

Figure 1
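To make the two re-balancing families concrete, here is a minimal Python sketch assuming the common inverse-frequency variants: re-weighting scales each class's loss by a weight proportional to 1/N_i, and re-sampling draws samples so that every class is picked equally often. The note does not say which exact variants the paper's RW/RS baselines use, so these formulas and the function names `reweighting_weights` and `class_balanced_probs` are illustrative assumptions.

```python
from collections import Counter


def reweighting_weights(labels):
    """Re-weighting (assumed variant): class weight proportional to 1 / N_i,
    rescaled so the weights average to 1 over classes."""
    counts = Counter(labels)
    inv = {c: 1.0 / n for c, n in counts.items()}
    mean_inv = sum(inv.values()) / len(inv)
    return {c: v / mean_inv for c, v in inv.items()}


def class_balanced_probs(labels):
    """Re-sampling (assumed variant): per-sample probability 1 / (C * N_y),
    so each of the C classes receives total probability mass 1 / C."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[y]) for y in labels]


# Toy long-tailed label list (hypothetical): 90 head, 9 medium, 1 tail sample.
labels = [0] * 90 + [1] * 9 + [2] * 1
weights = reweighting_weights(labels)
probs = class_balanced_probs(labels)
# The single tail sample is now as likely to be drawn as all 90 head samples
# combined, and its loss weight is 90x that of a head sample.
```

Both strategies push the model toward the tail classes; the paper's point is that doing so while the backbone is being learned degrades the features.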

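The paper proposes the Bilateral-Branch Network (BBN) to take better care of both representation learning and classifier learning.
It is combined with a cumulative learning strategy, under which the model first focuses on learning the non-tail classes and then, as training proceeds, gradually shifts its attention to the tail classes.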

【How class re-balancing strategies work】

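The authors first verify this view of re-balancing strategies with a set of controlled comparison experiments. Specifically, they design a two-stage procedure that splits the learning of a deep model into representation learning and classifier learning.
In the first stage, representation learning (for the backbone layers), the model is trained with one of three strategies: plain cross-entropy (CE), re-weighting (RW) or re-sampling (RS).
In the second stage, classifier learning (for the FC layers), the parameters of the feature extractor (the backbone) are frozen, and the classifier is retrained from scratch with each of the same three strategies.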
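The second-stage step (freeze the backbone, re-initialize the classifier, retrain it with a chosen strategy) can be sketched in numpy as follows. This is a toy illustration, not the paper's code: the frozen backbone is replaced by precomputed 2-D features, the classifier is a linear softmax layer trained by full-batch gradient descent, and the function name, data and hyperparameters are all hypothetical. Passing `class_weights=None` corresponds to a plain CE classifier and inverse-frequency weights to an RW classifier; an RS classifier would instead change which samples are drawn.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def retrain_classifier(features, labels, num_classes, class_weights=None,
                       lr=0.1, epochs=500, seed=0):
    """Stage-2 sketch: `features` come from a frozen backbone and are treated as
    fixed inputs; a linear classifier is re-initialized from scratch and trained
    with (optionally class-weighted) softmax cross-entropy via full-batch GD."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))  # fresh classifier weights
    b = np.zeros(num_classes)
    if class_weights is None:
        class_weights = np.ones(num_classes)  # plain CE
    sample_w = class_weights[labels]  # per-sample loss weight
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        # Gradient of the mean weighted cross-entropy w.r.t. the logits.
        grad_logits = (probs - onehot) * sample_w[:, None] / n
        W -= lr * (features.T @ grad_logits)
        b -= lr * grad_logits.sum(axis=0)
    return W, b


# Toy stand-in for frozen features (hypothetical): an 80/20 imbalanced pair of blobs.
rng = np.random.default_rng(1)
labels = np.array([0] * 80 + [1] * 20)
features = rng.normal(size=(100, 2)) + np.where(labels[:, None] == 0, -2.0, 2.0)
counts = np.bincount(labels)
rw = counts.sum() / (len(counts) * counts)  # inverse-frequency class weights
W, b = retrain_classifier(features, labels, num_classes=2, class_weights=rw)
acc = (np.argmax(features @ W + b, axis=1) == labels).mean()
```

Because only `W` and `b` are updated, any change in accuracy between strategies in this stage is attributable to the classifier, which is what lets the experiment separate the two effects.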

Figure 2. Top-1 error rates of different manners for representation learning and classifier learning on two long-tailed datasets CIFAR-100-IR50 and CIFAR-10-IR50

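Figure 2 shows the results on two datasets, CIFAR-100-IR50 and CIFAR-10-IR50 (IR50: imbalance ratio 50). When the classifier-learning strategy is fixed, using RS or RW in the representation-learning stage increases the error rate; when the representation-learning strategy is fixed, using RS or RW for the classifier improves performance. This is consistent with the authors' argument that re-balancing helps classifier learning but hurts representation learning.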

【Bilateral-Branch Network(BBN)】

Figure 3. Framework of our Bilateral-Branch Network (BBN).

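As shown in Figure 3, BBN consists of two branches: the conventional learning branch, responsible for representation learning, and the re-balancing branch, responsible for classifier learning. The conventional learning branch uses an ordinary uniform sampler, in which every training sample is drawn with equal probability; the re-balancing branch uses a reversed sampler, whose sampling probability for each class is inversely proportional to the number of samples in that class. The sampling probability $P_i$ of class $i$ is given by Eq. (1), where $N_i$ is the number of samples in class $i$, $N_{\max}$ is the largest class size and $C$ is the number of classes; once a class is chosen, a sample is drawn uniformly from it.

$$P_i = \frac{w_i}{\sum_{j=1}^{C} w_j}, \qquad w_i = \frac{N_{\max}}{N_i} \tag{1}$$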
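The two samplers can be sketched in numpy as below, with the reversed sampler following P_i = w_i / Σ_j w_j and w_i = N_max / N_i. This is a minimal sketch assuming integer labels 0..C-1 with every class present; the function names are hypothetical, and the uniform sampler draws with replacement for simplicity, whereas an epoch-based data loader would shuffle and visit each sample once.

```python
import numpy as np


def reversed_sampler_probs(class_counts):
    """P_i = w_i / sum_j w_j with w_i = N_max / N_i.
    Assumes every class has at least one sample."""
    counts = np.asarray(class_counts, dtype=float)
    w = counts.max() / counts
    return w / w.sum()


def draw_reversed_batch(labels, batch_size, rng):
    """Reversed sampler: pick a class with probability P_i,
    then pick a sample uniformly at random within that class."""
    labels = np.asarray(labels)
    probs = reversed_sampler_probs(np.bincount(labels))
    classes = rng.choice(len(probs), size=batch_size, p=probs)
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in classes])


def draw_uniform_batch(labels, batch_size, rng):
    """Uniform sampler of the conventional branch: every sample equally likely
    (with replacement here, as a simplification)."""
    return rng.integers(0, len(labels), size=batch_size)


# Toy label list (hypothetical): class sizes 100, 10, 1 -> w = [1, 10, 100].
labels = np.array([0] * 100 + [1] * 10 + [2] * 1)
probs = reversed_sampler_probs(np.bincount(labels))  # [1/111, 10/111, 100/111]
rng = np.random.default_rng(0)
rev_idx = draw_reversed_batch(labels, 1000, rng)
uni_idx = draw_uniform_batch(labels, 1000, rng)
```

With these toy counts the uniform sampler returns mostly head-class indices while the reversed sampler returns mostly the single tail sample, which is the mirror-image distribution the re-balancing branch is meant to see.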
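The adaptor implements the cumulative learning strategy by controlling the weights of the two branches during training through a parameter $\alpha$, given in Eq. (2), where $T$ is the current training epoch and $T_{\max}$ is the total number of training epochs. Hence $\alpha$ decreases gradually during training, moving the emphasis from the conventional learning branch to the re-balancing branch. At inference, $\alpha$ is set to 0.5.

$$\alpha = 1 - \left(\frac{T}{T_{\max}}\right)^2 \tag{2}$$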
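The schedule α = 1 − (T / T_max)² is a one-liner; the sketch below evaluates it at a few epochs of a hypothetical 200-epoch run. Whether T is counted from 0 or from 1 is an assumption here, and the function name is illustrative.

```python
def cumulative_alpha(epoch, total_epochs):
    """alpha = 1 - (T / T_max)^2: weight of the conventional branch;
    the re-balancing branch gets 1 - alpha."""
    return 1.0 - (epoch / total_epochs) ** 2


INFERENCE_ALPHA = 0.5  # alpha is fixed to 0.5 at test time

# Hypothetical 200-epoch schedule.
alphas = [cumulative_alpha(t, 200) for t in (0, 50, 100, 150, 200)]
print(alphas)  # [1.0, 0.9375, 0.75, 0.4375, 0.0]
```

Because the decay is quadratic, α is still 0.75 halfway through training, so the conventional branch dominates for most of the early epochs and the re-balancing branch takes over only near the end.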
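The output logits are given by Eq. (3), where $f_c$ and $f_r$ are the features extracted by the conventional learning branch and the re-balancing branch, and $W_c$ and $W_r$ are the corresponding classifiers. The predicted probability is $\hat{p} = \mathrm{softmax}(z)$, and the final BBN loss is given by Eq. (4), where $E(\cdot, \cdot)$ is the cross-entropy loss and $y_c$, $y_r$ are the labels of the samples drawn by the uniform sampler and the reversed sampler, respectively. At test time, a test image is fed into both branches, and the final logits are obtained by element-wise addition of the two logits, each weighted by $\alpha = 0.5$.

$$z = \alpha W_c^{\top} f_c + (1 - \alpha) W_r^{\top} f_r \tag{3}$$

$$\mathcal{L} = \alpha\, E(\hat{p}, y_c) + (1 - \alpha)\, E(\hat{p}, y_r) \tag{4}$$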
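The weighted logits z = α W_c^T f_c + (1 − α) W_r^T f_r and the loss L = α E(p̂, y_c) + (1 − α) E(p̂, y_r) can be written down directly. This is a minimal numpy sketch assuming features are stored as rows (one per sample), the classifiers are bias-free linear layers, and row i of the batch pairs the i-th uniformly drawn sample with the i-th reversed-drawn sample; the function names, shapes and random data are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean cross-entropy E(p, y) for integer labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))


def bbn_logits(f_c, f_r, W_c, W_r, alpha):
    """z = alpha * W_c^T f_c + (1 - alpha) * W_r^T f_r, with features as rows."""
    return alpha * (f_c @ W_c) + (1.0 - alpha) * (f_r @ W_r)


def bbn_loss(f_c, f_r, y_c, y_r, W_c, W_r, alpha):
    """L = alpha * E(p_hat, y_c) + (1 - alpha) * E(p_hat, y_r), p_hat = softmax(z)."""
    p_hat = softmax(bbn_logits(f_c, f_r, W_c, W_r, alpha))
    return alpha * cross_entropy(p_hat, y_c) + (1.0 - alpha) * cross_entropy(p_hat, y_r)


rng = np.random.default_rng(0)
d, C, B = 8, 5, 4  # feature dim, classes, batch size (toy values)
W_c, W_r = rng.normal(size=(d, C)), rng.normal(size=(d, C))
f_c, f_r = rng.normal(size=(B, d)), rng.normal(size=(B, d))  # branch features
y_c, y_r = rng.integers(0, C, B), rng.integers(0, C, B)      # uniform / reversed labels
loss = bbn_loss(f_c, f_r, y_c, y_r, W_c, W_r, alpha=0.75)

# Inference: one test image goes through both branches, alpha fixed to 0.5.
f_test_c, f_test_r = rng.normal(size=(1, d)), rng.normal(size=(1, d))
pred = np.argmax(bbn_logits(f_test_c, f_test_r, W_c, W_r, alpha=0.5), axis=1)
```

Setting `alpha=1.0` reduces the loss to plain cross-entropy on the conventional branch, and `alpha=0.0` to cross-entropy on the re-balancing branch, which is why the schedule in Eq. (2) moves training smoothly from one to the other.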
In summary, the BBN algorithm is outlined in Figure 4.