[Paper Close Reading] Multi-Channel Graph Neural Network for Entity Alignment

Latest recommended article published on 2024-09-30 21:48:12

夏莉莉iy


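Category column: Paper Close Reading. Article tags: artificial intelligence, deep learning, computer vision, notes, algorithms, neural networks, graph theory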

Article link: https://blog.csdn.net/Sherlily/article/details/142655462


Paper URL: Multi-Channel Graph Neural Network for Entity Alignment (aclanthology.org)

Paper code: https://github.com/thunlp/MuGNN

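The English is typed entirely by hand! It is a summary and paraphrase of the original paper. Spelling and grammar mistakes may be hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution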

1. Takeaways

（1）A relatively easy paper to understand

2. Section-by-Section Close Reading

2.1. Abstract

①Limitations of entity alignment: structural heterogeneity and limited seed alignments

②They propose the Multi-channel Graph Neural Network model (MuGNN)

2.2. Introduction

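①A knowledge graph (KG) stores information as a directed graph, where the nodes are entities and the edges denote relations

②A KG in the native language usually stores more information about the same entity:

(The authors think that Jilin in KG1 will be aligned to Jilin City in KG2 because the two have similar dialects and are both connected to Changchun. That does not seem guaranteed to me. Does it depend on the specific model? The two still look quite different, and they are not that similar structurally either.)

③To solve the problem, it is necessary to complete missing relations and prune exclusive entities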

2.3. Preliminaries and Framework

2.3.1. Preliminaries

（1）KG

①Define a KG as a directed graph $G=\left ( E,R,T \right )$ , which contains an entity set $E$ , a relation set $R$ and a triplet set $T$
②Triplet $t=(e_{i},r_{ij},e_{j})\in T$

（2）Rule knowledge

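①A rule set is $\mathcal{K}=\{k\}$ . Each rule $k=(r_{c}|r_{s1},\cdots,r_{sp})$ states that $\forall x,y\in E:(x,r_{s},y)\Rightarrow (x,r_{c},y)$ , i.e. the conclusion relation $r_c$ is inferred whenever the premise $r_s$ holds, where $r_s$ may be a conjunction of the relations $r_{s1},\cdots,r_{sp}$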

（3）Rule Grounding

①By applying the rules above to concrete triplets, entities can obtain further relations that were not stored explicitly

（4）Entity alignment

①Entity alignments between the two KGs: $\mathcal{A}_{e}=\{(e,e^{\prime}) \in E\times E^{\prime}|e \leftrightarrow e^{\prime}\}$

②Seed relation alignments: $\mathcal{A}_{r}^{s}=\{(r,r^{\prime})\in R\times R'|r\leftrightarrow r'\}$

2.3.2. Framework

①Workflow of MuGNN:

（1）KG completion

①Adopts the rule mining system AMIE+

（2）Multi-channel Graph Neural Network

①Encodes each KG through different channels

2.4. KG Completion

2.4.1. Rule Inference and Transfer

2.4.2. Rule Grounding

①For example, if the rule $province(x,y) \wedge dialect(y,z) \Rightarrow dialect(x,z)$ is found in KG2, it can be transferred to KG1 and used to add the missing triplets there
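The grounding step can be illustrated with a minimal sketch. The toy triplets and the helper name `ground_rule` are assumptions made for illustration; they are not taken from the paper's code, and only two-premise chain rules like the one above are handled.

```python
# Toy KG1 as a set of (head, relation, tail) triplets (illustrative data only).
kg1 = {
    ("Jilin City", "province", "Jilin"),
    ("Jilin", "dialect", "Northeastern Mandarin"),
}

def ground_rule(triplets, premise1, premise2, conclusion):
    """Ground premise1(x, y) AND premise2(y, z) => conclusion(x, z).

    Returns the conclusion triplets that the rule implies but the KG lacks.
    """
    inferred = set()
    for (x, r1, y) in triplets:
        if r1 != premise1:
            continue
        for (y2, r2, z) in triplets:
            # The tail of the first premise must be the head of the second.
            if r2 == premise2 and y2 == y:
                candidate = (x, conclusion, z)
                if candidate not in triplets:
                    inferred.add(candidate)
    return inferred

# Rule transferred from KG2: province(x, y) AND dialect(y, z) => dialect(x, z)
new_triplets = ground_rule(kg1, "province", "dialect", "dialect")
completed_kg1 = kg1 | new_triplets
```

Here `new_triplets` holds the single inferred fact that Jilin City shares the dialect of its province, which is the kind of triplet KG completion adds before encoding.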

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

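①Relation weighting generates a weighted adjacency matrix from a KG, which serves as the structural input of the GNN encoder

②They construct two adjacency matrices, one per channel: a self-attention adjacency matrix and a cross-KG attention adjacency matrix

（1）KG Self-Attention (this channel is for completion)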

①Normalized connection weights:

$a_{ij}=softmax(c_{ij})=\frac{exp(c_{ij})}{\sum_{e_{k}\in N_{e_{i}}\cup \{e_{i}\}}exp(c_{ik})}$

where $N_{e_{i}}$ denotes the neighbors of $e_i$ and $\{e_{i}\}$ adds a self-loop, so $e_k$ ranges over $N_{e_{i}}\cup\{e_{i}\}$

② $c_{ij}$ denotes the attention coefficient between two entities:

$\begin{aligned} c_{ij}& =attn(\mathbf{W}\mathbf{e}_{i},\mathbf{W}\mathbf{e}_{j}) \\ &=LeakyReLU(\mathbf{p}[\mathbf{W}\mathbf{e}_{i}\|\mathbf{W}\mathbf{e}_{j}]) \end{aligned}$

where $\mathbf{W}$ and $\mathbf{p}$ are trainable parameters
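To make the two formulas concrete, here is a minimal pure-Python sketch that computes one row of the self-attention adjacency matrix. The toy embeddings, the values of `W` and `p`, and the LeakyReLU slope of 0.2 are assumptions for illustration; the notes do not state the slope.

```python
import math

def leaky_relu(x, slope=0.2):  # slope is an assumed value
    return x if x > 0 else slope * x

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def self_attention_row(i, neighbors, emb, W, p):
    """Return {j: a_ij} for j in N_{e_i} united with {e_i}."""
    candidates = sorted(set(neighbors[i]) | {i})  # neighbors plus self-loop
    h_i = matvec(W, emb[i])
    scores = {}
    for j in candidates:
        h_j = matvec(W, emb[j])
        concat = h_i + h_j  # list concatenation, i.e. [W e_i || W e_j]
        scores[j] = leaky_relu(sum(a * b for a, b in zip(p, concat)))
    # Softmax over the candidate set (max subtracted for numerical stability).
    m = max(scores.values())
    exps = {j: math.exp(c - m) for j, c in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Toy graph with three entities and 2-dimensional embeddings.
emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = [[1.0, 0.0], [0.0, 1.0]]
p = [0.5, -0.5, 0.25, 0.25]
row0 = self_attention_row(0, neighbors, emb, W, p)
```

Each row sums to 1, and entities that are not neighbors simply get no entry, which corresponds to a zero in the adjacency matrix.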

（2）Cross-KG Attention (this channel is for pruning and yields a second adjacency matrix)

①Pruning operation :

$a_{ij}=\max\limits_{r\in R,r'\in R'}\mathbf{1}((e_i,r,e_j)\in T)sim(r,r')$

where the indicator $\mathbf{1}(\cdot)$ is 1 if $(e_i,r,e_j)\in T$ holds and 0 otherwise, and $sim\left ( \cdot \right )$ denotes the inner product similarity measure $sim(r,r')=\mathbf{r}^{T}\mathbf{r}^{\prime}$
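A literal reading of the pruning formula can be sketched as follows. The relation names, the embedding values and the helper `cross_kg_weight` are hypothetical and only serve to show how an edge is down-weighted when its relation has no close counterpart in the other KG.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_kg_weight(e_i, e_j, triplets, rel_emb, other_rel_emb):
    """a_ij = max over r in R, r' in R' of 1((e_i, r, e_j) in T) * sim(r, r')."""
    values = [
        (1.0 if (e_i, r, e_j) in triplets else 0.0) * dot(r_vec, r_other_vec)
        for r, r_vec in rel_emb.items()
        for r_other_vec in other_rel_emb.values()
    ]
    return max(values)

# Toy data: relation embeddings of this KG (R) and of the other KG (R').
triplets = {("Jilin", "dialect", "Northeastern Mandarin")}
rel_emb = {"dialect": [1.0, 0.0], "province": [0.0, 1.0]}
other_rel_emb = {"dialect_zh": [0.9, 0.1], "province_zh": [0.1, 0.9]}

connected = cross_kg_weight("Jilin", "Northeastern Mandarin",
                            triplets, rel_emb, other_rel_emb)
unconnected = cross_kg_weight("Jilin", "Changchun",
                              triplets, rel_emb, other_rel_emb)
```

With these toy values the existing edge keeps a high weight because `dialect` has a similar relation in the other KG, while the entity pair without any triplet gets weight 0.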

2.5.2. Multi-Channel GNN Encoder

①Propagation of GNN:

$\mathrm{GNN}(A,H,W)=\sigma(\mathbf{AHW})$

and they chose $\sigma \left ( \cdot \right )$ as ReLU

②Multi GNN encoder:

$\mathrm{MultiGNN}(H^{l};A_{1},\cdots,A_{c})=\mathrm{Pooling}(H_{1}^{l+1},\cdots,H_{c}^{l+1})$

where $c$ denotes the number of channels

③Updating function:

$\mathbf{H}_i^{l+1}=\mathrm{GNN}(A_i,H^l,W_i)$

④Pooling strategy: mean pooling
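The encoder equations above can be sketched for a single layer in plain Python. The matrices are toy values, and using nested lists instead of a tensor library is only a choice made to keep the example self-contained.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(M):
    return [[max(0.0, x) for x in row] for row in M]

def gnn(A, H, W):
    """One propagation step: ReLU(A H W)."""
    return relu(matmul(matmul(A, H), W))

def multi_gnn(H, adjacencies, weights):
    """Run one GNN per channel, then mean-pool the channel outputs element-wise."""
    outputs = [gnn(A, H, W) for A, W in zip(adjacencies, weights)]
    c = len(outputs)
    n_rows, n_cols = len(outputs[0]), len(outputs[0][0])
    return [
        [sum(out[i][j] for out in outputs) / c for j in range(n_cols)]
        for i in range(n_rows)
    ]

# Two entities, two channels (c = 2) with different adjacency matrices.
H = [[1.0, 2.0], [3.0, 4.0]]
A1 = [[1.0, 0.0], [0.0, 1.0]]   # e.g. the self-attention channel
A2 = [[0.0, 1.0], [1.0, 0.0]]   # e.g. the cross-KG attention channel
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
H_next = multi_gnn(H, [A1, A2], [W1, W2])
```

Both channels read the same hidden state $H^l$ but propagate it over their own adjacency matrix, and mean pooling merges them back into one $H^{l+1}$.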

2.5.3. Align Model

①Embed the two KGs into the same vector space and measure distances to judge equivalence:

$\begin{aligned} \mathcal{L}_{a}=&\sum_{(e,e')\in\mathcal{A}_{e}^{s}}\sum_{(e_{-},e_{-}')\in\mathcal{A}_{e}^{s-}}[d(e,e')+\gamma_{1}-d(e_{-},e_{-}')]_{+} \\ &+\sum_{(r,r')\in\mathcal{A}_{r}^{s}}\sum_{(r_{-},r_{-}')\in\mathcal{A}_{r}^{s-}}[d(r,r')+\gamma_{2}-d(r_{-},r_{-}')]_{+} \end{aligned}$

where $[\cdot]_{+}=max\{0,\cdot\}$ , $d(\cdot)=\|\cdot\|_{2}$ is the L2 distance between the two embeddings, $\mathcal{A}_e^{s-}$ and $\mathcal{A}_r^{s-}$ are the negative pair sets of the seed sets $\mathcal{A}_e^{s}$ and $\mathcal{A}_r^{s}$ , and $\gamma _1> 0$ and $\gamma _2> 0$ are margin hyper-parameters separating positive and negative entity and relation alignments
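The entity term of the alignment loss can be sketched as below; the relation term has the same shape with $\gamma_2$. The function names and the toy vectors are assumptions, and $d$ is taken to be the L2 norm of the difference of the two embeddings.

```python
import math

def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_loss(positive_pairs, negative_pairs, margin):
    """Sum of [d(pos) + margin - d(neg)]_+ over all positive/negative pair combinations."""
    loss = 0.0
    for (u, v) in positive_pairs:
        for (u_neg, v_neg) in negative_pairs:
            loss += max(0.0, l2_distance(u, v) + margin - l2_distance(u_neg, v_neg))
    return loss

# One aligned seed pair that already coincides in the shared space.
positives = [([0.0, 0.0], [0.0, 0.0])]
far_negative = [([0.0, 0.0], [3.0, 4.0])]    # distance 5, well beyond the margin
near_negative = [([0.0, 0.0], [0.5, 0.0])]   # distance 0.5, inside the margin

loss_far = margin_loss(positives, far_negative, margin=1.0)
loss_near = margin_loss(positives, near_negative, margin=1.0)
```

A negative pair only contributes when it is closer than the positive pair's distance plus the margin, so `loss_far` is zero while `loss_near` is positive.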

②Triplet loss:

$\begin{aligned} \mathcal{L}_{r}=&\sum_{g^{+}\in\mathcal{G}(\mathcal{K})}\sum_{g^{-}\in\mathcal{G}^{-}(\mathcal{K})}[\gamma_{r}-I(g^{+})+I(g^{-})]_{+} \\ &+\sum_{t^{+}\in T}\sum_{t^{-}\in T^{-}}[\gamma_{r}-I(t^{+})+I(t^{-})]_{+} \end{aligned}$

③ $I\left ( \cdot \right )$ denotes the truth value function. For a triplet $t=(e_i,r_{ij},e_j)$ :

$I(t)=1-\frac{1}{3\sqrt{d}}\|\mathbf{e}_{i}+\mathbf{r}_{ij}-\mathbf{e}_{j}\|_{2}$

where $d$ is the embedding size. For a grounding, the truth value is then computed recursively from its triplets:

$\begin{aligned} I(t_{s})&=I(t_{s1}\wedge t_{s2})=I(t_{s1})\cdot I(t_{s2}) \\ I(t_{s}\Rightarrow t_{c})&=I(t_{s})\cdot I(t_{c})-I(t_{s})+1 \end{aligned}$
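The truth value formulas translate almost line by line into code. This is a minimal sketch with hypothetical function names and toy vectors, not the authors' implementation.

```python
import math

def triplet_truth(e_i, r_ij, e_j):
    """I(t) = 1 - ||e_i + r_ij - e_j||_2 / (3 * sqrt(d)), with d the embedding size."""
    d = len(e_i)
    norm = math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(e_i, r_ij, e_j)))
    return 1.0 - norm / (3.0 * math.sqrt(d))

def conjunction_truth(*values):
    """I(t_s1 AND t_s2 ...) is the product of the individual truth values."""
    result = 1.0
    for v in values:
        result *= v
    return result

def implication_truth(premise, conclusion):
    """I(t_s => t_c) = I(t_s) * I(t_c) - I(t_s) + 1."""
    return premise * conclusion - premise + 1.0

# A triplet whose embeddings satisfy e_i + r_ij = e_j exactly.
perfect = triplet_truth([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
premise = conjunction_truth(0.5, 0.5)
```

The implication behaves like its Boolean counterpart at the extremes: a true premise with a false conclusion scores 0, while a false premise scores 1 regardless of the conclusion.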

④The overall loss:

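$\mathcal{L}=\mathcal{L}_a+\mathcal{L}_r'+\mathcal{L}_r$

where $\mathcal{L}_a$ is the alignment loss, and $\mathcal{L}_r$ and $\mathcal{L}_r'$ are the triplet losses of the two KGs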

2.6. Experiment

2.6.1. Experiment Settings

（1）Datasets

①Datasets: DBP15K (contains $\text{DBP}_{\text{ZH-EN}}$ (Chinese to English), $\text{DBP}_{\text{JA-EN}}$ (Japanese to English), and $\text{DBP}_{\text{FR-EN}}$ (French to English)) and DWY100K (contains DWY-WD (DBpedia to Wikidata) and DWY-YG (DBpedia to YAGO3))

②Statistics of datasets:

③Statistics of KG in datasets:

（2）Baselines

①MTransE

②JAPE

③GCN-Align

④AlignEA

（3）Training Details

①Training ratio: 30% for training and 70% for testing

②Embedding size: 128 for all models

③Number of GNN layers: 2 for all GNN models

④Optimizer: Adagrad

⑤Margin hyperparameters: $\gamma _1=1.0,\gamma _2=1.0,\gamma _r=0.12$

⑥Grid search over learning rate in {0.1,0.01,0.001}, L2 regularization in {0.01,0.001,0.0001} and dropout rate in {0.1,0.2,0.5}. The optimal values are 0.001, 0.01 and 0.2 respectively
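As a small illustration of the search space, the grid above can be enumerated like this. The dictionary keys and the helper `iter_configs` are made-up names; the values are the ones listed in the training details.

```python
from itertools import product

# Search space as listed in the training details.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "l2": [0.01, 0.001, 0.0001],
    "dropout": [0.1, 0.2, 0.5],
}

def iter_configs(space):
    """Yield every combination of the grid as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))

# The combination reported as optimal in the notes.
reported_best = {"learning_rate": 0.001, "l2": 0.01, "dropout": 0.2}
```

The grid has 3 × 3 × 3 = 27 combinations, each of which would need a full training run to evaluate.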

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

①Module ablation:

2.6.4. Impact of Seed Alignments

①Ratio of seeds:

2.6.5. Qualitative Analysis

①Two examples of how the rule works:

2.7. Related Work

This section reviews related work

2.8. Conclusions

For future work, they plan to further study word ambiguity

3. Supplementary Knowledge

3.1. Adagrad Optimizer

（1）Further reading: "Deep Learning 最优化方法之AdaGrad" (Deep Learning optimization methods: AdaGrad) - Zhihu (zhihu.com)
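For quick reference, the standard Adagrad update can be sketched in a few lines: each parameter accumulates its squared gradients and its step is divided by the square root of that sum. The function name, the placement of `eps` outside the square root and the toy objective are assumptions; libraries differ in such details.

```python
import math

def adagrad_step(params, grads, accum, lr=0.001, eps=1e-10):
    """In-place Adagrad update with a per-parameter adaptive step size."""
    for i, g in enumerate(grads):
        accum[i] += g * g                                   # running sum of squared gradients
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)   # scaled gradient step
    return params, accum

# Toy usage: minimise f(x) = x^2 starting from x = 1 (gradient is 2x).
params, accum = [1.0], [0.0]
for _ in range(100):
    grads = [2.0 * params[0]]
    adagrad_step(params, grads, accum, lr=0.5)
```

Because the accumulated sum only grows, the effective learning rate of each parameter shrinks over time, which is why frequently updated parameters take smaller steps than rarely updated ones.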

4. Reference

Cao, Y. et al. (2019) 'Multi-Channel Graph Neural Network for Entity Alignment', Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, doi: 10.18653/v1/P19-1140