Hyperbolic Representation Learning for NLP

连理o

Tags: hyperbolic neural networks

Article link: https://blog.csdn.net/weixin_42437114/article/details/128951728

Hyperbolic geometry

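The authors propose the Hyperbolic Interaction Model (HyperIM), which jointly embeds words and the label hierarchy in hyperbolic space in order to learn label-aware document representations and solve the hierarchical multi-label classification (HMLC) problem.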

Hyperbolic Label Embedding.
Here $\Theta^L=\{\theta_i^l\}_{i=1}^C$ is the set of label embeddings, each $\theta_i^l\in\mathcal B^k$ lies in the Poincaré ball, $\mathcal N(l_p)$ is defined as $\{l_{q'}|(l_p,l_{q'})\notin\mathcal T\}\cup\{l_p\}$, and $\mathcal T$ is the edge set of the label hierarchy.
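The definitions above can be made concrete with a minimal NumPy sketch. It assumes the label-embedding objective has the softmax form used for Poincaré embeddings (Nickel and Kiela), with $\mathcal N(l_p)$ supplying the terms of the denominator; the function names `poincare_distance` and `label_embedding_loss` are illustrative and not taken from the paper.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points of the open unit ball (Poincare ball model)."""
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    # eps guards against division by zero for points extremely close to the boundary.
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))

def label_embedding_loss(theta, edges):
    """Assumed objective: -sum over edges (p, q) of
    log( exp(-d(p, q)) / sum_{q' in N(p)} exp(-d(p, q')) )."""
    num_labels = theta.shape[0]
    edge_set = set(edges)
    loss = 0.0
    for p, q in edges:
        # N(l_p): every label q' with (p, q') not an edge, plus l_p itself.
        neg = [j for j in range(num_labels) if (p, j) not in edge_set or j == p]
        d_pos = poincare_distance(theta[p], theta[q])
        d_neg = poincare_distance(theta[p], theta[neg])
        loss += d_pos + np.log(np.sum(np.exp(-d_neg)))
    return float(loss)
```

As a sanity check on the distance, the origin and the point $(0.5, 0)$ are $2\,\text{artanh}(0.5)=\ln 3$ apart. Under this assumed loss, pulling a child label toward its parent lowers the value, which is the behaviour a hierarchy-preserving embedding needs.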
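Hyperbolic Word Embedding. Given word co-occurrence information from the corpus, the authors use Poincaré GloVe to obtain the hyperbolic word embeddings $\Theta^E$.
Hyperbolic Word Encoder. Words are polysemous, and letting word embeddings interact directly with label embeddings makes it hard for the model to tell the senses of a polysemous word apart, since every sense maps to the same embedding. The authors therefore use a hyperbolic GRU to bring in context: its input is the pretrained hyperbolic word embeddings, and its output is the context-aware hyperbolic word embeddings $\Theta^w$, which are then used in the interaction with the label embeddings.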
Interaction in the Hyperbolic Space.
(1) Label-Aware Document Representations.
Here $\theta^w_t\in\Theta^w$ is the embedding of the $t$-th word in the document and $\theta_i^l\in\Theta^L$ is the embedding of the $i$-th label. The $i$-th label-aware document representation is denoted $s_i$.
The set $\mathcal S=\{s_i\}_{i=1}^C$ collects the label-aware document representations.
(2) Prediction.
Here $W^f\in\mathbb{R}^{(T/2)\times T}$ and $W^e\in\mathbb{R}^{1\times (T/2)}$.
(3) Partial Interaction.
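Steps (1) and (2) above can be sketched as follows (partial interaction is not covered). This is a minimal illustration under stated assumptions: the word-label interaction score is taken to be the Poincaré distance, so that $s_i\in\mathbb{R}^T$ stacks the distances from label $i$ to each of the $T$ words, and the prediction is a two-layer network with the weight shapes given in (2). The ReLU and sigmoid activations and all function names are assumptions made for the example.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball; broadcasts over leading axes."""
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))

def label_aware_representations(word_emb, label_emb):
    """Return S of shape (C, T) with S[i, t] = d(theta^w_t, theta^l_i) (assumed score)."""
    return poincare_distance(label_emb[:, None, :], word_emb[None, :, :])

def predict(S, W_f, W_e):
    """Score every label from its representation s_i: sigmoid(W_e relu(W_f s_i))."""
    hidden = np.maximum(W_f @ S.T, 0.0)   # (T/2, C), ReLU is an assumption
    logits = (W_e @ hidden).ravel()       # (C,)
    return 1.0 / (1.0 + np.exp(-logits))  # per-label probabilities

# Toy usage: T = 6 words, C = 4 labels, embedding dimension k = 3.
rng = np.random.default_rng(0)
T, C, k = 6, 4, 3
word_emb = 0.3 * rng.uniform(-1.0, 1.0, size=(T, k))   # norms stay below 1
label_emb = 0.3 * rng.uniform(-1.0, 1.0, size=(C, k))
S = label_aware_representations(word_emb, label_emb)
probs = predict(S, rng.normal(size=(T // 2, T)), rng.normal(size=(1, T // 2)))
```

Because the same $W^f$ and $W^e$ are applied to every $s_i$, the number of prediction parameters in this sketch depends on the sequence length $T$ and not on the number of labels $C$.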

Datasets
Evaluation Metrics. Precision@ $k$ (P@ $k$ for short) and nDCG@ $k$ for $k$ =1, 3, 5
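Here $r\in\{1,...,C\}^k$ holds the indices of the $k$ most probable labels in descending order of predicted score, so $r_{[1]}$ is the most probable label. P@ $k$ is the fraction of the top- $k$ predicted labels that are correct, and $\|y\|_0$ is the number of true labels. The final score is the mean of the per-sample metric over the test set. Notice that nDCG@1 is omitted in the results since it gives the same value as P@1.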
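A minimal sketch of the two metrics for a single sample, assuming the definitions commonly used in extreme multi-label classification: P@ $k$ $=\frac1k\sum_{j=1}^k y_{r_{[j]}}$, DCG@ $k$ $=\sum_{j=1}^k y_{r_{[j]}}/\log(j+1)$, and nDCG@ $k$ equal to DCG@ $k$ divided by $\sum_{j=1}^{\min(k,\|y\|_0)} 1/\log(j+1)$. The function names are illustrative.

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored labels that are true labels."""
    top_k = np.argsort(-scores)[:k]
    return float(np.sum(y_true[top_k]) / k)

def ndcg_at_k(y_true, scores, k):
    """DCG of the top-k ranking divided by the best achievable DCG."""
    top_k = np.argsort(-scores)[:k]
    # Discount 1 / log(j + 1) for rank j = 1, 2, ...
    discounts = 1.0 / np.log(np.arange(2, len(top_k) + 2))
    dcg = np.sum(y_true[top_k] * discounts)
    ideal_hits = min(len(top_k), int(np.sum(y_true)))  # min(k, ||y||_0)
    if ideal_hits == 0:
        return 0.0
    return float(dcg / np.sum(discounts[:ideal_hits]))
```

With $k=1$ the discount cancels between numerator and denominator, so nDCG@1 reduces to P@1, which is why it is left out of the results.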
Results
Ablation Test - Euclidean Interaction Model
Interaction Visualization

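Document Model $\mathcal F_w$ . The authors use TextCNN, which takes a document $D$ as input and outputs the document embedding $\mathcal F_w(D)\in\mathbb{R}^n$.
Label Embedding Model $\mathcal G_{\Theta}$ . The authors use an embedding layer that takes a label $l$ as input and outputs the label embedding ${\Theta}_l$, then apply the map $\Pi(x)=\frac{x}{1+\sqrt{1+\|x\|_2^2}}$ to project it onto the Poincaré manifold, giving $\Pi({\Theta}_l)$.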
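The projection $\Pi$ is easy to check numerically. The sketch below implements the formula exactly as written above; only the function name `project_to_poincare` is made up for the example.

```python
import numpy as np

def project_to_poincare(x):
    """Pi(x) = x / (1 + sqrt(1 + ||x||_2^2)), applied along the last axis."""
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True)
    return x / (1.0 + np.sqrt(1.0 + sq_norm))

# Unconstrained label embeddings (rows) mapped into the open unit ball.
raw = np.array([[3.0, 4.0], [0.0, 0.0], [-100.0, 250.0]])
projected = project_to_poincare(raw)
norms = np.linalg.norm(projected, axis=-1)
```

Since $\sqrt{1+\|x\|_2^2}>\|x\|_2$, the projected norm $\|x\|_2/(1+\sqrt{1+\|x\|_2^2})$ is always strictly below 1, so the embedding layer itself can be trained without any norm constraint.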
Alignment Model.

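First Term. A BCE loss that encourages document embeddings and label embeddings to align.
Here $L$ is the number of ground-truth labels.
Second Term. Pulls the embeddings of frequently co-occurring labels close to each other in hyperbolic space, so that the label hierarchy is learned implicitly.
Here $\mathcal N(l,l')$ is the set of all labels that co-occur with $l$ less frequently than $l^{'}$ does.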
Overall objective function.
Here $\lambda=0.1$.
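One way to read the second term and the overall objective is sketched below. It is an illustration under explicit assumptions, not the paper's exact formulation: the second term is taken to be a softmax ranking loss over Poincaré distances in which the candidates for a pair $(l,l')$ are $l'$ and the labels in $\mathcal N(l,l')$, and the overall objective is taken to be $\mathcal L_1+\lambda\mathcal L_2$. All function names and the co-occurrence matrix `cooc` are hypothetical.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball; broadcasts over leading axes."""
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))

def cooccurrence_loss(label_emb, cooc):
    """Assumed form of L2: for each co-occurring pair (l, l'), the negative
    log-softmax of -d(l, l') against the labels in N(l, l')."""
    num_labels = label_emb.shape[0]
    dist = poincare_distance(label_emb[:, None, :], label_emb[None, :, :])
    total, num_pairs = 0.0, 0
    for l in range(num_labels):
        for lp in range(num_labels):
            if lp == l or cooc[l, lp] == 0:
                continue
            # N(l, l'): labels that co-occur with l less often than l' does.
            neg = [j for j in range(num_labels)
                   if j != l and cooc[l, j] < cooc[l, lp]]
            logits = -dist[l, np.array([lp] + neg)]
            total += np.log(np.sum(np.exp(logits))) - logits[0]
            num_pairs += 1
    return total / max(num_pairs, 1)

def overall_objective(l1, l2, lam=0.1):
    """Assumed combination L = L1 + lambda * L2, with lambda = 0.1 as in the text."""
    return l1 + lam * l2
```

Under this reading, an embedding that places a frequently co-occurring pair close together and a rarely co-occurring label far away gets a lower $\mathcal L_2$ than one that swaps them.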

$\text{HIDDEN}_{\text{cas}}$ (HIDDEN cascade). First optimize the label embeddings with $\mathcal L_2$, then freeze them and optimize the document embeddings with $\mathcal L_1$.
$\text{HIDDEN}_{\text{flt}}$ (HIDDEN flat). Fix $\Theta_{\text{flat}}$ to the identity matrix, i.e. the label embeddings are one-hot vectors, then optimize the document embeddings with $\mathcal L_1$.
$\text{HIDDEN}_{\text{euc}}$ (HIDDEN euclidean). Use the Euclidean distance in $\mathcal L_2$.