Introduction
- The authors propose hyperspherical prototype networks, which use prototypes to perform both classification and regression within a single unified framework.
Hyperspherical prototypes
Classification
- Positioning hyperspherical prototypes. Before training the model, the authors fix the positions of the hyperspherical prototypes in advance so that they are spread uniformly over the hypersphere. Let the optimal set of prototypes be $\mathbf P^*$; then $\mathbf P^*$ minimizes the largest cosine similarity between any pair of prototypes, i.e.
$$\mathbf P^* = \arg\min_{\mathbf P'} \max_{(k, l, k \neq l) \in C} \cos \theta_{\left(\mathbf{p}_k^{\prime}, \mathbf{p}_l^{\prime}\right)}$$
Taking $\max_{(k, l, k \neq l) \in C} \cos \theta_{\left(\mathbf{p}_k^{\prime}, \mathbf{p}_l^{\prime}\right)}$ as the loss and running gradient descent would already yield hyperspherical prototypes, but the authors consider this inefficient: every step computes all pairwise cosine similarities yet only updates the single most similar pair. They therefore propose the loss below, which for each prototype pushes away its most similar neighbour, so that $K$ pairs are updated per step (a code sketch follows this list):
$$\mathcal L_{\mathrm{HP}} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \in C} \left( \hat{\mathbf P} \hat{\mathbf P}^{\mathsf T} - 2\mathbf I \right)_{ij}$$
Here $K$ is the number of classes, $C$ the set of classes, and $\hat{\mathbf P} \hat{\mathbf P}^{\mathsf T}$ the matrix of pairwise prototype similarities; $2\mathbf I$ is subtracted to avoid self-selection. Using this loss, the prototypes are optimized with gradient descent (SGD with a learning rate of 0.01 and momentum of 0.9) and projected back onto the hypersphere after every step; iterating yields the desired hyperspherical prototypes.
- Prototypes with privileged information. To further incorporate class semantics, so that semantically related classes obtain closer prototypes than semantically unrelated ones, the authors use word embeddings of the class names $\mathbf W = \{\mathbf w_1, ..., \mathbf w_K\}$ and introduce the following ranking-based loss function (also sketched after this list),
$$\mathcal L_{\mathrm{PI}} = \sum_{(i, j, k) \in T} \left[ -\bar S_{ijk} \log S_{ijk} - \left(1 - \bar S_{ijk}\right) \log\left(1 - S_{ijk}\right) \right]$$
Here $T$ is the set of all class triplets, the ground truth is $\bar S_{ijk} = \llbracket \cos \theta_{\mathbf{w}_i, \mathbf{w}_j} \geq \cos \theta_{\mathbf{w}_i, \mathbf{w}_k} \rrbracket$, and the output is $S_{ijk} \equiv \frac{e^{o_{ijk}}}{1 + e^{o_{ijk}}}$ with $o_{ijk} = \cos\theta_{\mathbf p_i, \mathbf p_j} - \cos\theta_{\mathbf p_i, \mathbf p_k}$. The sum of the two losses above is the final pre-training loss for the hyperspherical prototypes.
- Classification. The training loss maximizes the cosine similarity between a sample's features $\mathbf z_i = f_\phi(\mathbf x_i)$ and its class prototype; the prototypes themselves are not updated during this stage (sketched below),
$$\mathcal L = \frac{1}{N} \sum_{i=1}^{N} \left(1 - \cos \theta_{\left(\mathbf z_i, \mathbf p_{y_i}\right)}\right)^2$$
At inference time, the model predicts the class whose prototype is most aligned with the sample's features,
$$\hat y = \arg\max_{c \in C} \cos \theta_{\left(\mathbf z, \mathbf p_c\right)}$$
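Below is a minimal sketch of the prototype pre-positioning step, assuming a PyTorch implementation; the function name `position_prototypes` and its arguments are illustrative rather than taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def position_prototypes(num_classes: int, dims: int, steps: int = 1000) -> torch.Tensor:
    """Spread `num_classes` prototypes (near-)uniformly over the unit hypersphere."""
    prototypes = F.normalize(torch.randn(num_classes, dims), dim=1)
    prototypes.requires_grad_(True)
    optimizer = torch.optim.SGD([prototypes], lr=0.01, momentum=0.9)

    for _ in range(steps):
        # Pairwise cosine similarities; subtracting 2I keeps a prototype
        # from selecting itself as its own nearest neighbour.
        similarities = prototypes @ prototypes.t() - 2.0 * torch.eye(num_classes)
        # For every prototype, penalize only its most similar neighbour,
        # so K pairs are updated per step.
        loss = similarities.max(dim=1).values.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Project the prototypes back onto the hypersphere after each update.
        with torch.no_grad():
            prototypes.div_(prototypes.norm(dim=1, keepdim=True))
    return prototypes.detach()
```

Re-projecting after every step keeps the optimization on the hypersphere, so the dot products in the loss remain valid cosine similarities.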
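The ranking-based privileged-information loss can be sketched in the same spirit. The naive loop over all ordered class triplets and the helper name `privileged_info_loss` are for illustration only, and the binary cross-entropy form is an assumption consistent with the $\bar S_{ijk}$ / $S_{ijk}$ definitions above.

```python
import itertools
import torch
import torch.nn.functional as F

def privileged_info_loss(prototypes: torch.Tensor, word_embeddings: torch.Tensor) -> torch.Tensor:
    """Ranking loss aligning prototype similarities with class-name word-embedding similarities."""
    protos = F.normalize(prototypes, dim=1)
    words = F.normalize(word_embeddings, dim=1)
    proto_sim = protos @ protos.t()   # cos(theta_{p_i, p_j})
    word_sim = words @ words.t()      # cos(theta_{w_i, w_j})

    losses = []
    # Naive enumeration of all ordered class triplets (i, j, k).
    for i, j, k in itertools.permutations(range(prototypes.size(0)), 3):
        target = (word_sim[i, j] >= word_sim[i, k]).float()        # ground truth \bar S_ijk
        score = torch.sigmoid(proto_sim[i, j] - proto_sim[i, k])   # output S_ijk from o_ijk
        losses.append(F.binary_cross_entropy(score, target))
    return torch.stack(losses).mean()
```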
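Finally, a sketch of the classification training loss and the nearest-prototype prediction rule, assuming the squared cosine-distance formulation above; the prototypes are unit-normalized and kept frozen, and `features` stands for the output of an arbitrary encoder.

```python
import torch
import torch.nn.functional as F

def classification_loss(features: torch.Tensor, labels: torch.Tensor,
                        prototypes: torch.Tensor) -> torch.Tensor:
    """Squared cosine-distance loss between sample features and their (fixed) class prototypes."""
    z = F.normalize(features, dim=1)
    cosine = z @ prototypes.t()                                   # (N, K) cosine similarities
    target_sim = cosine.gather(1, labels.unsqueeze(1)).squeeze(1) # similarity to the true class
    return ((1.0 - target_sim) ** 2).mean()

def predict(features: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each sample to the class whose prototype it is most aligned with."""
    z = F.normalize(features, dim=1)
    return (z @ prototypes.t()).argmax(dim=1)
```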
Regression
- For regression, let the upper and lower bounds of the target value be $v_u$ and $v_l$. The authors assign one prototype to each bound, $\mathbf p_u$ and $\mathbf p_l$, and constrain them to point in opposite directions, i.e. $\cos\theta_{\mathbf p_u, \mathbf p_l} = -1$. The training loss is
$$\mathcal L_r = \frac{1}{N} \sum_{i=1}^{N} \left(\cos \theta_{\left(\mathbf z_i, \mathbf p_u\right)} - r_i\right)^2$$
where $r_i \in [-1, 1]$ is the regression target normalized with the bounds $v_l$ and $v_u$.
The cosine similarity between a sample's features and $\mathbf p_u$ is then the normalized prediction (see the sketch after this list).
- Our approach to regression differs from standard regression, which backpropagates losses on one-dimensional outputs. In the context of our work, this corresponds to an optimization on the line from $\mathbf p_u$ to $\mathbf p_l$. Our approach generalizes regression to higher-dimensional output spaces. While we still interpolate between two points, the ability to project to higher-dimensional outputs provides additional degrees of freedom that help the regression optimization. As we will show in the experiments, this generalization results in better and more robust performance than mean squared error.
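A sketch of the regression loss and prediction, under the assumption that targets are mapped linearly from $[v_l, v_u]$ to $[-1, 1]$; the function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def regression_loss(features: torch.Tensor, targets: torch.Tensor,
                    p_u: torch.Tensor, v_l: float, v_u: float) -> torch.Tensor:
    """Squared error between the cosine similarity to p_u and the normalized target."""
    z = F.normalize(features, dim=1)
    cosine = z @ F.normalize(p_u, dim=0)            # (N,) similarity to the upper-bound prototype
    r = 2.0 * (targets - v_l) / (v_u - v_l) - 1.0   # map targets from [v_l, v_u] to [-1, 1]
    return ((cosine - r) ** 2).mean()

def predict_value(features: torch.Tensor, p_u: torch.Tensor, v_l: float, v_u: float) -> torch.Tensor:
    """Map the cosine similarity back to the original target range."""
    cosine = F.normalize(features, dim=1) @ F.normalize(p_u, dim=0)
    return (cosine + 1.0) / 2.0 * (v_u - v_l) + v_l
```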
Joint regression and classification
- Hyperspherical prototype networks can perform classification and regression on the same hypersphere: the prototypes for the regression bounds are placed along one axis of the Euclidean output space, while the remaining axes are used for classification (a construction sketch follows below).
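One possible way to build such a joint output space, assuming the classification prototypes were pre-positioned in the subspace orthogonal to the regression axis (names are illustrative):

```python
import torch

def joint_prototypes(class_protos: torch.Tensor):
    """Embed (K, D-1) class prototypes into D dimensions; the last axis carries regression."""
    num_classes, sub_dims = class_protos.shape
    # Classification prototypes live in the subspace orthogonal to the regression axis.
    class_full = torch.cat([class_protos, torch.zeros(num_classes, 1)], dim=1)
    # The regression bounds get opposite prototypes along the remaining axis.
    p_u = torch.zeros(sub_dims + 1)
    p_u[-1] = 1.0
    p_l = -p_u
    return class_full, p_u, p_l
```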
Experiments
Classification
- Evaluating hyperspherical prototypes
- Prototypes with privileged information.
- Comparison to other prototype networks.
- Comparison to softmax cross-entropy. We conclude that we are comparable to softmax cross-entropy for sufficient examples and preferred when examples per class are unevenly distributed or scarce.
Regression
Joint regression and classification
- Rotated MNIST. We classify the digits and regress on their rotation. We employ $\mathbb S^2$ as output, where the classes are separated along the $(x, y)$-plane and the rotations are projected along the $z$-axis.
- Predicting creation year and art style.