Decoupling Representation and Classifier for Long-Tailed Recognition — a method for long-tailed image classification

This post covers a paper that studies image classification on long-tailed datasets and proposes that decoupling representation learning from classifier training can substantially improve recognition of tail classes. Using instance-balanced sampling together with classifier adjustment strategies such as Classifier Re-training (cRT), Nearest Class Mean (NCM), and τ-normalization, the experiments show that high performance can be reached without relying on complex sampling strategies or balanced losses.


Introduction

When learning with long-tailed data, a common challenge is that instance-rich (or head) classes dominate the training procedure. The learned classification model tends to perform better on these classes, while performance is significantly worse for instance-scarce (or tail) classes (under-fitting).

In the general scheme for long-tailed recognition, classifiers are either learned jointly with the representations end-to-end, or via a two-stage approach in which the classifier and the representation are jointly fine-tuned, with variants of class-balanced sampling applied in the second stage.

In our work, we argue for decoupling representation and classification. We demonstrate that in a long-tailed scenario, this separation allows straightforward approaches to achieve high recognition performance, without the need for designing sampling strategies, balance-aware losses or adding memory modules.
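One of the classifier adjustments named above, τ-normalization, illustrates how simple such a "straightforward approach" can be: each class's weight vector $w_j$ in the linear classifier is rescaled to $w_j / \|w_j\|^{\tau}$, which shrinks the typically larger head-class weights relative to tail-class ones. A minimal NumPy sketch (the 5×16 weight matrix is a toy stand-in, not from the paper):

```python
import numpy as np

# Toy linear classifier for 5 classes over 16-dim features:
# one weight row per class (hypothetical values).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16))

def tau_normalize(W, tau):
    """Rescale each class weight row w_j to w_j / ||w_j||^tau."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / norms ** tau

# tau = 1 makes every class weight unit-norm; tau = 0 leaves W unchanged,
# so tau interpolates between the original and fully norm-equalized classifier.
W_tau = tau_normalize(W, tau=1.0)
```

Because only the classifier weights change, this adjustment can be applied after training without touching the learned representation.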

Recent Directions

Recent studies attack the long-tailed recognition problem along three main directions:

  • Data distribution re-balancing. Re-sample the dataset to achieve a more balanced data distribution. These methods include over-sampling, down-sampling and class-balanced sampling.
  • Class-balanced Losses. Assign different losses to different training samples for each class.
  • Transfer learning from head to tail classes. Transfer features learned from head classes, which have abundant training instances, to under-represented tail classes. However, it is usually non-trivial to design the specific modules required for feature transfer.
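The class-balanced-loss direction can be illustrated with its simplest variant, inverse-frequency weighting, where the loss of a sample from class $j$ is scaled by a weight proportional to $1/n_j$. This is only a sketch of the idea; the methods in this line of work use more refined schemes:

```python
def inverse_frequency_weights(counts):
    """Per-class loss weights w_j proportional to 1/n_j,
    normalized so the weights sum to the number of classes C."""
    inv = [1.0 / n for n in counts]
    scale = len(counts) / sum(inv)
    return [w * scale for w in inv]

counts = [1000, 100, 10]   # toy head / mid / tail class sizes
weights = inverse_frequency_weights(counts)
# Tail classes receive the largest weight, head classes the smallest,
# so each class contributes more equally to the total loss.
```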

Sampling Strategies

For most sampling strategies presented below, the probability $p_j$ of sampling a data point from class $j$ is given by:

$$p_{j}=\frac{n_{j}^{q}}{\sum_{i=1}^{C} n_{i}^{q}}$$

where $q \in [0, 1]$, $n_j$ denotes the number of training samples for class $j$, and $C$ is the number of training classes. Different sampling strategies arise for different values of $q$; below we present the strategies corresponding to $q=1$, $q=0$, and $q=1/2$.
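The formula above can be sketched directly, using a toy long-tailed count vector to show how the exponent $q$ moves the distribution from instance-balanced ($q=1$) through square-root sampling ($q=1/2$) to class-balanced ($q=0$):

```python
def sampling_probs(counts, q):
    """Return p_j = n_j**q / sum_i n_i**q for each class."""
    powered = [n ** q for n in counts]
    total = sum(powered)
    return [p / total for p in powered]

counts = [1000, 100, 10]   # toy long-tailed class distribution

for q in (1.0, 0.5, 0.0):
    # q=1 follows the raw class frequencies; q=0 is uniform over classes;
    # q=0.5 sits in between, partially flattening the head.
    print(q, sampling_probs(counts, q))
```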
