[论文精读]Multi-Channel Graph Neural Network for Entity Alignment

论文网址:Multi-Channel Graph Neural Network for Entity Alignment (aclanthology.org)

论文代码:https:// github.com/thunlp/MuGNN

英文是纯手打的!论文原文的summarizing and paraphrasing。可能会出现难以避免的拼写错误和语法错误,若有发现欢迎评论指正!文章偏向于笔记,谨慎食用

目录

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Preliminaries and Framework

2.3.1. Preliminaries

2.3.2. Framework

2.4. KG Completion

2.4.1. Rule Inference and Transfer

2.4.2. Rule Grounding

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

2.5.2. Multi-Channel GNN Encoder

2.5.3. Align Model

2.6. Experiment

2.6.1. Experiment Settings

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

2.6.4. Impact of Seed Alignments

2.6.5. Qualitative Analysis

2.7. Related Work

2.8. Conclusions

3. 知识补充

3.1. Adagrad Optimizer

4. Reference


1. 心得

(1)是比较容易理解的论文

2. 论文逐段精读

2.1. Abstract

        ①Limitations of entity alignment: structural heterogeneity and limited seed alignments

        ②They proposed Multi-channel Graph Neural Network model (MuGNN)

2.2. Introduction

        ①Knowledge graph (KG) stores information by directed graph, where the nodes are entity and the edges denote relationship

        ②Mother tongue information usually stores more information:

(作者觉得KG1的Jilin会对齐KG2的Jilin City,因为他们有相似的方言和连接的长春。这个感觉不是一定吧?取决于具体模型?感觉还是挺有差别的啊这俩东西,结构上也没有很相似

        ③To solve the problem, it is necessary to fill in missing entities and eliminate unnecessary ones

2.3. Preliminaries and Framework

2.3.1. Preliminaries

(1)KG

        ①Defining a directed graph G=\left ( E,R,T \right ), which contains entity set E, relation set R and triplets T
        ②Triplet t=(e_{i},r_{ij},e_{j})\in T

(2)Rule knowledge

        ①For rule k=(r_{c}|r_{s1},\cdots,r_{sp})\mathcal{K}=\{k\}, it means there are \forall x,y\in E:(x,r_{s},y)\Rightarrow (x,r_{c},y)

(3)Rule Grounding

        ①通过上面的递推,实体可以找到更进一步的关系

(4)Entity alignment

        ①Alignments in two entities: \mathcal{A}_{e}=\{(e,e^{\prime}) \in E\times E^{\prime}|e \leftrightarrow e^{\prime}\}

        ②Alignment relation: \mathcal{A}_{r}^{s}=\{(r,r^{\prime})\in R\times R'|r\leftrightarrow r'\}

2.3.2. Framework

        ①Workflow of MuGNN:

(1)KG completion

        ①Adopt rule mining system AMIE+

(2)Multi-channel Graph Neural Network

        ①Encoding KG in different channels

2.4. KG Completion

2.4.1. Rule Inference and Transfer

        

2.4.2. Rule Grounding

        ①比如从KG2中找到province(x,y) \wedge dialect(y,z) \Rightarrow dialect(x,z)关系,就可以补充到KG1中去

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

        ①They will generate a weighted relationship matrix

        ②They construct self attention adjacency matrix and cross-KG attention adjacency matrix for each channel

(1)KG Self-Attention(这个是为了补齐)

        ①Normalized connection weights:

a_{ij}=softmax(c_{ij})=\frac{exp(c_{ij})}{\sum_{e_{k}\in N_{e_{i}}\cup e_{i}}exp(c_{ik})}

where e_i contains self loop and e_{k} \in N_{e_{i}}\cup\{e_{i}\} denotes the neighbors of e_i

        ②c_{ij} denotes the attention coefficient between two entities:

\begin{aligned} \text{cij}& =attn(\mathbf{We_{i}},\mathbf{We_{j}}) \\ &=LeakyReLU(\mathbf{p[We_{i}\|We_{j}]}) \end{aligned}

where \mathbf{W} and \mathbf{p} are trainable parameters

(2)Cross-KG Attention(这个是为了修剪,是另一个邻接矩阵)

        ①Pruning operation :

a_{ij}=\max\limits_{r\in R,r'\in R'}\mathbf{1}((e_i,r,e_j)\in T)sim(r,r')

if (e_i,r,e_j)\in T) is true then it will be 1 otherwise 0, sim\left ( \cdot \right ) denotes inner product similarity measure sim(r,r')=\mathbf{r}^{T}\mathbf{r}^{\prime}

2.5.2. Multi-Channel GNN Encoder

       ①Propagation of GNN:

\mathrm{GNN}(A,H,W)=\sigma(\mathbf{AHW})

and they chose \sigma \left ( \cdot \right ) as ReLU

        ②Multi GNN encoder:

\mathrm{MultiGNN}(H^{l};A_{1},\cdots,A_{c})=\mathrm{Pooling}(H_{1}^{l+1},\cdots,H_{c}^{l+1})

where c denotes the number of channels

        ③Updating function:

\mathbf{H}_i^{l+1}=\mathrm{GNN}(A_i,H^l,W_i)

        ④Pooling strategy: mean pooling

2.5.3. Align Model

        ①Embedding two KG to the same vector space and measure the distance to judge the equivalence relation:

\mathcal{L}_{a}=\sum_{(e,e^{'})\in\mathcal{A}_{e}^{s}}\sum_{(e_{-},e_{-}^{'})\in\mathcal{A}_{e}^{s-}}[d(e,e^{'})+\gamma_{1}-d(e_{-},e_{-}^{'})]_{+}+\\\sum_{(r,r^{'})\in\mathcal{A}_{r}^{s}}\sum_{(r_{-},r_{-}^{'})\in\mathcal{A}_{r}^{s-}}[d(r,r^{'})+\gamma_{2}-d(r_{-},r_{-}^{'})]_{+}

where [\cdot]_{+}=max\{0,\cdot\}d(\cdot)=\|\cdot\|_{2}\mathcal{A}_e^{s-} and \mathcal{A}_r^{s-} are negative pairs in the original sets, \gamma _1> 0 and \gamma _2> 0 are margin hyper-parameters separating positive and negative entity and relation alignments

        ②Triplet loss:

\begin{gathered} L_{r} =\sum_{g^{+}\in\mathcal{G}(\mathcal{K})g^{-}\in\mathcal{G}^{-}(\mathcal{K})}[\gamma_{r}-I(g^{+})+I(g^{-})]_{+} \\ +\sum_{t^{+}\in Tt^{-}\in T^{-}}[\gamma_{r}-I(t^{+})+I(t^{-})]_{+} \end{gathered}

        ③I\left ( \cdot \right ) denotes the true value function for triplet t:

I(t)=1-\frac{1}{3\sqrt{d}}\|\mathbf{e}_{i}+\mathbf{r}_{ij}-\mathbf{e}_{j}\|_{2}

then it can be recursively transformed into:

I(t_{s})=I(t_{s1}\wedge t_{s2})=I(t_{s1})\cdot I(t_{s2})\\I(t_{s}\Rightarrow t_{c})=I(t_{s})\cdot I(t_{c})-I(t_{s})+1

where d is the embedding size

        ④The overall loss:

\mathcal{L}=\mathcal{L}_a+\mathcal{L}_r'+\mathcal{L}_r

2.6. Experiment

2.6.1. Experiment Settings

(1)Datasets

        ①Datasets: DBP15K (contains DBPZH-EN(Chinese to English), DBPJA-EN (Japanese to English), and DBPFREN (French to English)) and DWY100K (contains DWY-WD (DBpedia to Wikidata) and DWY-YG (DBpedia to YAGO3))

        ②Statistics of datasets:

        ③Statistics of KG in datasets:

(2)Baselines

        ①MTransE

        ②JAPE

        ③GCN-Align

        ④AlignEA

(3)Training Details

        ①Training ratio: 30% for training and 70% for testing

        ②All the embedding size: 128

        ③All the GNN layers: 2

        ④Optimizer: Adagrad

        ⑤Hyperparameter: \gamma _1=1.0,\gamma _2=1.0,\gamma _r=0.12

        ⑥Grid search to learning rate in {0.1,0.01,0.001}, L2 in {0.01,0.001,0.0001}, dropout rate in {0.1,0.2,0.5}. They finally got 0.001,0.01,0.2 optimal each

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

        ①Module ablation:

2.6.4. Impact of Seed Alignments

        ①Ratio of seeds:

2.6.5. Qualitative Analysis

        ①Two examples of how the rule works:

2.7. Related Work

        Introduces some related works

2.8. Conclusions

        They aim to further research word ambiguity

3. 知识补充

3.1. Adagrad Optimizer

(1)补充学习:Deep Learning 最优化方法之AdaGrad - 知乎 (zhihu.com)

4. Reference

Cao, Y. et al. (2019) 'Multi-Channel Graph Neural Network for Entity Alignment', Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, doi: 10.18653/v1/P19-1140

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值