[Paper notes 0625] Investigating Long Tail Relation Classification with Decoupling Analysis

Original post: https://blog.csdn.net/luochi9051/article/details/118215063

The Devil is the Classifier: Investigating Long Tail Relation Classification with Decoupling Analysis

Paper
Brief summary

Paper: https://arxiv.org/pdf/2009.07022.pdf
Published: 2020, posted on arXiv
Code: https://github.com/zjunlp/deepke

Paper

Problem addressed

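Relation classification (RC) is a natural language processing task: given a sentence and two entity mentions in it, predict the semantic relation between the entities. The paper focuses on RC over long-tailed data, where a few relations have many training instances and most relations have very few.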

Intuition

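A key finding: models pre-trained with instance-balanced sampling already learn good representations for all classes. Instance-balanced sampling is ordinary sampling, in which every training instance is equally likely to be drawn, so head relations dominate the batches.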

pre-trained models with instance-balanced sampling already capture the well-learned representations for all classes.
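The sampling terms can be made concrete with the parameterization from the Decoupling paper (Kang et al., 2020). Note that this comes from that paper, not from this one. Under it, class $j$ with $n_j$ training instances is drawn with probability $p_j = n_j^q / \sum_i n_i^q$. Setting $q = 1$ gives instance-balanced sampling, $q = 0$ gives class-balanced sampling, and $q = 1/2$ gives square-root sampling. Below is a minimal NumPy sketch. The function names and the class counts are hypothetical.

```python
import numpy as np

def sampling_probs(class_counts, q):
    """Per-class sampling probability p_j = n_j**q / sum_i n_i**q.

    q = 1   -> instance-balanced (each instance equally likely; head classes dominate)
    q = 0   -> class-balanced (each class equally likely)
    q = 0.5 -> square-root sampling
    """
    n = np.asarray(class_counts, dtype=float)
    w = n ** q
    return w / w.sum()

def sample_indices(labels, q, size, rng):
    """Draw a class according to p_j, then an instance uniformly within that class."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    p_class = sampling_probs(counts, q)
    chosen = rng.choice(classes, size=size, p=p_class)
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in chosen])

# Hypothetical long-tailed relation labels: 900 / 90 / 10 instances.
counts = [900, 90, 10]
labels = np.repeat([0, 1, 2], counts)
rng = np.random.default_rng(0)

for q, name in [(1.0, "instance-balanced"), (0.5, "square-root"), (0.0, "class-balanced")]:
    idx = sample_indices(labels, q, size=3000, rng=rng)
    freq = np.bincount(labels[idx], minlength=3) / len(idx)
    print(f"{name:18s} target={np.round(sampling_probs(counts, q), 3)} empirical={np.round(freq, 3)}")
```

With these counts, $q = 1$ gives target probabilities 0.9 / 0.09 / 0.01, so the tail relation is rarely seen. With $q = 0$, each relation gets 1/3.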

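So only the classifier needs to be adjusted, and the paper proposes a method for doing so: attentive relation routing (ARR), which "assigns soft weights by automatically aggregating the relations". Judging from the abstract alone, it appears to re-weight automatically based on the relationships among features.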

Method

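The Introduction raises an open question: the joint learning strategy is not thoroughly understood. Does the ability to classify long-tailed data come from better feature representations, or from more robust classifier decision boundaries?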

However, the mechanisms behind such jointly schema are not thoroughly understood, thus making it unclear how the long-tailed classification ability is achieved–is it from learning a better representation or by handling the discriminability better via robust classifier decision boundaries?

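My view: similar papers already show that the learned features are good enough. Examples are BBN (Bilateral-Branch Network) and Decoupling ("Decoupling Representation and Classifier for Long-Tailed Recognition"). Tail classes perform poorly because the classifier is biased, i.e., its decision boundaries are poorly placed.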
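A "biased classifier" can be seen concretely through an observation from the Decoupling paper. After joint training on long-tailed data, head classes tend to get classifier weight vectors with larger norms. That paper proposes τ-normalization, $\tilde{w}_i = w_i / \lVert w_i \rVert^{\tau}$, which rebalances the decision boundaries without touching the features. The sketch below illustrates the idea. The toy weights and feature are hypothetical, and this is not the method of the paper being read here, which uses ARR instead.

```python
import numpy as np

def tau_normalize(W, tau=1.0):
    """Rescale each class weight vector (row of W) to w_i / ||w_i||**tau.

    tau = 0 leaves W unchanged; tau = 1 makes every row unit-norm.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / norms ** tau

# Hypothetical linear classifier: the head relation has a much larger weight norm.
W = np.array([[3.0, 0.0],    # head relation
              [0.0, 1.0]])   # tail relation
x = np.array([0.5, 0.8])     # a feature more aligned with the tail direction

print((W @ x).argmax())                       # logits [1.5, 0.8] -> 0 (head wins)
print((tau_normalize(W, 1.0) @ x).argmax())   # logits [0.5, 0.8] -> 1 (tail wins)
```

The feature `x` is identical in both calls. Only the classifier weights change, which matches the note's point that the decision boundary, not the representation, is what needs fixing.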

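Section 3.2 reports two observations. 1) With the same feature extractor and different classifiers, classifiers trained with re-weighting/re-sampling outperform those trained with ordinary sampling on the long-tailed data. 2) With different feature extractors and the same classifier, re-weighting/re-sampling performs worse than ordinary sampling (lower F1 score), i.e., the features it learns are worse than those learned with ordinary sampling.
Together, these observations are meant to show that the classifier matters more.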
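Below is a toy version of the "same features, different classifier" comparison, written as a sketch under explicit assumptions:

- 2-D Gaussian points stand in for frozen encoder features. The data are synthetic, not the paper's.
- Only a linear softmax classifier is trained.
- "Re-weighting" here means scaling each example's cross-entropy by an inverse-frequency class weight.

All function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inverse_frequency_weights(y, num_classes):
    """Class weight proportional to 1 / n_j, normalized to average 1 over classes."""
    counts = np.bincount(y, minlength=num_classes).astype(float)
    w = 1.0 / counts
    return w * num_classes / w.sum()

def train_classifier(X, y, num_classes, class_weights=None, lr=0.5, steps=2000):
    """Gradient descent on a linear softmax classifier; the features X stay frozen.

    class_weights=None -> plain cross-entropy (every instance counts equally)
    otherwise          -> each example's loss is scaled by class_weights[y]
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    sw = np.ones(n) if class_weights is None else class_weights[y]
    Y = np.eye(num_classes)[y]
    for _ in range(steps):
        P = softmax(X @ W + b)
        G = (P - Y) * sw[:, None] / sw.sum()   # gradient of weighted mean CE w.r.t. logits
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

rng = np.random.default_rng(0)
# Synthetic "frozen features": 1000 head-relation and 20 tail-relation examples.
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (1000, 2)),
               rng.normal([1.5, 0.0], 1.0, (20, 2))])
y = np.array([0] * 1000 + [1] * 20)
X_tail_test = rng.normal([1.5, 0.0], 1.0, (500, 2))   # held-out tail examples

recall = {}
for name, cw in [("plain", None), ("re-weighted", inverse_frequency_weights(y, 2))]:
    W, b = train_classifier(X, y, 2, cw)
    recall[name] = ((X_tail_test @ W + b).argmax(axis=1) == 1).mean()
    print(name, "tail recall:", round(recall[name], 3))
```

On this data, the plain classifier should push its boundary deep into the tail region, giving low tail recall. The inverse-frequency weights move the boundary back even though `X` never changes.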

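The paper's main contribution is in Section 4.1, which introduces attentive relation routing (ARR) to learn the classifier weights. ARR builds on the dynamic routing algorithm from capsule networks. The difference is that it replaces the original squash function with layer normalization, and the paper does not explain why. The coupling coefficient $c_{ij}$ of capsule networks, denoted $\alpha_{mn}$ in this paper, serves as the attention weight.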
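The routing step can be illustrated with a minimal NumPy sketch of routing-by-agreement from capsule networks (Sabour et al., 2017), using a pluggable nonlinearity so the original squash can be swapped for layer normalization as described above. Several details are assumptions, not taken from the paper or from the authors' deepke code:

- what the input and output capsules correspond to in ARR (i.e., what $m$ and $n$ index);
- that the layer normalization has no learned gain or bias;
- that the coefficients are softmaxed over output capsules, as in Sabour et al.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, eps=1e-9):
    """Original capsule nonlinearity: keeps the direction, maps the norm into [0, 1)."""
    sq = (s ** 2).sum(axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def layer_norm(s, eps=1e-5):
    """Parameter-free layer normalization over the last dimension (assumed: no gain/bias)."""
    mu = s.mean(axis=-1, keepdims=True)
    var = s.var(axis=-1, keepdims=True)
    return (s - mu) / np.sqrt(var + eps)

def dynamic_routing(u_hat, num_iters=3, nonlinearity=squash):
    """Routing-by-agreement.

    u_hat: (num_in, num_out, dim) prediction vectors u_hat_{j|i}.
    Returns output vectors v (num_out, dim) and coupling coefficients c (num_in, num_out).
    """
    num_in, num_out, _ = u_hat.shape
    logits = np.zeros((num_in, num_out))                       # b_ij, start uniform
    for _ in range(num_iters):
        c = softmax(logits, axis=1)                            # c_ij: sums to 1 over j
        s = (c[:, :, None] * u_hat).sum(axis=0)                # s_j = sum_i c_ij * u_hat_{j|i}
        v = nonlinearity(s)                                    # v_j = g(s_j)
        logits = logits + (u_hat * v[None, :, :]).sum(axis=-1) # b_ij += u_hat_{j|i} . v_j
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(5, 3, 8))   # hypothetical: 5 inputs, 3 outputs, 8-dim vectors
v_sq, c_sq = dynamic_routing(u_hat, nonlinearity=squash)
v_ln, c_ln = dynamic_routing(u_hat, nonlinearity=layer_norm)
```

With squash, each output vector keeps the direction of $s_j$ and has norm below 1. With layer normalization, each output has zero mean and approximately unit variance across its dimensions, so its length no longer behaves like a probability. In both cases, the final `c` (each row sums to 1) is the quantity that plays the role of the attention weights.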