GNN-CS224W:10 Knowledge Graph Embeddings

当客

Article link: https://blog.csdn.net/u012492756/article/details/118339345

Heterogeneous Graph

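A graph with more than one type of node or edge is a heterogeneous graph.

A heterogeneous graph can be written as a 4-tuple: $G = (V, E, R, T)$

Here $v_i \in V$ is a node, and its node type is given by $T(v_i)$.

Each edge is a triple $(v_i, r, v_j) \in E$ with relation type $r \in R$.

Example: [Figure omitted]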
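As a minimal sketch of the 4-tuple above, the toy graph below stores $T$ as a dictionary and $E$ as a list of triples. The node names, node types, and relation names are made up for illustration and do not come from the lecture.

```python
from collections import defaultdict

# Hypothetical toy data: node_type plays the role of T(v),
# and each edge is a (head, relation, tail) triple in E.
node_type = {
    "Alice": "person",
    "Bob": "person",
    "Stanford": "university",
}
edges = [
    ("Alice", "studies_at", "Stanford"),
    ("Bob", "works_at", "Stanford"),
    ("Alice", "friend_of", "Bob"),
]

V = set(node_type)              # node set
R = {r for _, r, _ in edges}    # set of relation types
T = node_type.get               # T(v) returns the type of node v

# Group outgoing neighbors by relation, the view a relational GNN works with.
neighbors = defaultdict(lambda: defaultdict(list))
for h, r, t in edges:
    neighbors[h][r].append(t)
```

Keeping the neighbors grouped by relation makes it easy to apply a different transformation per relation type later on.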

Relational GCN (RGCN)

RGCN extends GCN to handle heterogeneous graphs. The ideas can easily be extended to other relational GNNs (RGraphSAGE, RGAT, etc.)

graph is directed

Use different neural network weights for different relation types

In the lecture figure, the weight matrix of each relation is drawn in a different color.

[Figure omitted]

Formula

[Figure omitted]
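The formula image is missing from this copy. The sketch below assumes the standard RGCN update from the original RGCN paper, $h_v^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{u \in N_v^r} \frac{1}{c_{v,r}} W_r^{(l)} h_u^{(l)} + W_0^{(l)} h_v^{(l)}\right)$ with $c_{v,r} = |N_v^r|$, and uses ReLU for $\sigma$. It works on plain Python lists to stay dependency-free. The function names, the toy features, and the convention that messages flow from head to tail are assumptions for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """One RGCN layer on a directed graph.

    h:      dict node -> feature vector
    edges:  list of (u, r, v); a message flows from u to v under relation r
    W_rel:  dict relation -> weight matrix W_r
    W_self: self-loop weight matrix W_0
    """
    # Collect N_v^r: the in-neighbors of v under each relation r.
    nbrs = {}
    for u, r, v in edges:
        nbrs.setdefault(v, {}).setdefault(r, []).append(u)

    out = {}
    for v, h_v in h.items():
        total = matvec(W_self, h_v)              # self-loop term W_0 h_v
        for r, us in nbrs.get(v, {}).items():
            c = len(us)                          # normalizer c_{v,r} = |N_v^r|
            for u in us:
                msg = matvec(W_rel[r], h[u])     # relation-specific transform
                total = [a + m / c for a, m in zip(total, msg)]
        out[v] = [max(0.0, a) for a in total]    # ReLU
    return out


# Toy usage with hand-picked weights.
eye = [[1.0, 0.0], [0.0, 1.0]]
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "cites", "c"), ("b", "cites", "c"), ("b", "authored", "c")]
W_rel = {"cites": eye, "authored": [[2.0, 0.0], [0.0, 2.0]]}
out = rgcn_layer(h, edges, W_rel, eye)
```

Node `c` receives the mean of its two `cites` messages, one `authored` message, and its own self-loop term; nodes without incoming edges keep only the self-loop term.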

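Too many parameters

Every relation has its own weight matrix in every layer, of size $d^{(l+1)} \times d^{(l)}$.

Many real-world graphs have a very large number of relation types, which means equally many relation weight matrices and a very large parameter count.

The more parameters there are relative to the amount of data, the more likely the model is to overfit.

Two remedies follow: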

(1) Use block diagonal matrices

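("matrices" is the plural of "matrix".)

Key insight: make the weights sparse!

A block diagonal matrix has nonzero blocks along its diagonal and zeros everywhere else.

[Figure omitted]

The sparsity reduces the number of parameters to estimate, because entries fixed at 0 need no estimation.

Suppose $B$ small blocks are used. Each block then has size $(\frac{d^{(l+1)}}{B}, \frac{d^{(l)}}{B})$, and the parameter count drops from $d^{(l+1)}\times d^{(l)}$ to $B \times \frac{d^{(l+1)}}{B} \times \frac{d^{(l)}}{B}=\frac{d^{(l+1)}\times d^{(l)}}{B}$, i.e. $\frac{1}{B}$ of the original.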
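The parameter arithmetic above can be checked with a short sketch. The helper names and the layer sizes (64 by 64, $B = 4$) are assumptions chosen for illustration. The block diagonal matrix is stored as a list of its $B$ blocks, so the zero entries are never materialized.

```python
def dense_params(d_out, d_in):
    """Parameter count of a full d_out x d_in weight matrix."""
    return d_out * d_in


def block_diag_params(d_out, d_in, B):
    """Parameter count when the matrix is B diagonal blocks."""
    assert d_out % B == 0 and d_in % B == 0, "B must divide both dimensions"
    return B * (d_out // B) * (d_in // B)


def block_diag_matvec(blocks, x):
    """Apply a block diagonal matrix, stored as its blocks, to vector x."""
    out, start = [], 0
    for blk in blocks:
        width = len(blk[0])
        chunk = x[start:start + width]   # each block only sees its own chunk
        out.extend(sum(w * xi for w, xi in zip(row, chunk)) for row in blk)
        start += width
    return out


full = dense_params(64, 64)            # 4096
sparse = block_diag_params(64, 64, 4)  # 1024, i.e. 1/B of the dense count

blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
y = block_diag_matvec(blocks, [1, 1, 1, 1])
```

Because each block reads only its own slice of the input, output dimensions in one block never depend on input dimensions in another block.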

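Benefits

Fewer parameters speed up training, reduce overfitting, and make the model more robust.

Drawback

With such a matrix, only nearby dimensions (those inside the same block) can interact.

Open questions:
How should the block diagonal structure be chosen? What should B be? How should it change from layer to layer?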

(2) Basis/Dictionary learning

Key insight: Share weights across different relations

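Use $B$ basis matrices; the matrix of each relation is a linear combination of the basis matrices.
$W_r=\sum\limits_{b=1}^B a_{rb} \cdot V_b$

Here $V_b$ is a basis matrix and $a_{rb}$ is the weight of relation $r$ on basis matrix $b$.

The only trainable parameters are the basis matrices and the combination weights. Since the number of basis matrices can be far smaller than the number of relations, the parameter count drops substantially.

Open question: how should the number of basis matrices $B$ be chosen?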
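The basis decomposition $W_r=\sum_b a_{rb} V_b$ can be sketched as follows. The helper names, the two hand-picked basis matrices, and the sizes used in the parameter count (100 relations, $B = 4$, 64 by 64 layers) are assumptions for illustration.

```python
def combine_bases(bases, coeffs):
    """Build W_r = sum_b a_rb * V_b for one relation r."""
    rows, cols = len(bases[0]), len(bases[0][0])
    W = [[0.0] * cols for _ in range(rows)]
    for a, V in zip(coeffs, bases):
        for i in range(rows):
            for j in range(cols):
                W[i][j] += a * V[i][j]
    return W


def basis_param_count(num_relations, B, d_out, d_in):
    """B shared basis matrices plus B coefficients per relation."""
    return B * d_out * d_in + num_relations * B


# Two shared basis matrices V_1, V_2 and the coefficients a_r of one relation.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
W_r = combine_bases(bases, [2.0, 0.5])

shared = basis_param_count(100, 4, 64, 64)   # 16784
separate = 100 * 64 * 64                     # 409600 with one matrix per relation
```

Each additional relation costs only $B$ scalars, so the total grows slowly with the number of relations.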

Task

Node classification: simply feed the node representations into the output layer.

Link prediction

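Same as link prediction before, except for how the dataset is split.

dataset split

Split the edges of every relation (link type) into 4 parts:
(1) Training message edges
(2) Training supervision edges
(3) Validation edges
(4) Test edges
Then merge each of the 4 parts across relations to form the 4 complete sets.

The goal is to spread the edges of each relation evenly over the 4 sets, which prevents a rare relation from being concentrated in only some of them.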
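The per-relation split described above can be sketched as follows. The split fractions (50/30/10/10), the function name, and the toy triples are assumptions for illustration; the lecture notes do not specify them.

```python
import random
from collections import defaultdict


def split_by_relation(triples, fractions=(0.5, 0.3, 0.1, 0.1), seed=0):
    """Split each relation's edges into four parts, then merge part by part."""
    names = ("train_message", "train_supervision", "validation", "test")
    by_rel = defaultdict(list)
    for h, r, t in triples:
        by_rel[r].append((h, r, t))

    rng = random.Random(seed)
    splits = {name: [] for name in names}
    for rel_edges in by_rel.values():
        rng.shuffle(rel_edges)
        n = len(rel_edges)
        start, cum = 0, 0.0
        for i, (name, frac) in enumerate(zip(names, fractions)):
            cum += frac
            # The last part takes whatever remains so no edge is dropped.
            end = n if i == len(names) - 1 else round(cum * n)
            splits[name].extend(rel_edges[start:end])
            start = end
    return splits


# Toy usage: two relations with 10 edges each.
triples = [(f"h{i}", rel, f"t{i}") for rel in ("r1", "r2") for i in range(10)]
splits = split_by_relation(triples)
sizes = {name: len(part) for name, part in splits.items()}
```

With these fractions every relation contributes 5, 3, 1, and 1 edges to the four sets, so both relations appear in each set.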

Knowledge graph completion

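Real-world KGs have very many nodes and links, and also very many node types and link types.

Why KG completion?

KGs are also missing a great deal of information. For example, in Freebase 93.8% of persons have no place of birth and 78.5% have no nationality!

Knowledge graph completion aims to fill in this missing information automatically.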

task

given (head, relation), predict missing tails

For example, given (J.K. Rowling, genre), predict the actual genre. This amounts to asking "What is J.K. Rowling's genre?"

Given a true triple $(\text{head}, \text{relation}, \text{tail})$, the goal is that the embedding of $(h, r)$ should be close to the embedding of $t$.

Knowledge graph embedding

Relation Patterns

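Different relations can have different properties, such as symmetry, inverse relations, and transitivity.

The four kinds of relation patterns are listed below. The point of covering them is to judge whether each model can capture them.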

Symmetric (Antisymmetric) Relations

Symmetric: $r(h,t) \Rightarrow r(t,h)$

Examples: Family, Roommate

Antisymmetric: $r(h,t) \Rightarrow \neg r(t,h)$

That is, if h has relation r to t, then t cannot have relation r to h.

Example: Hypernym

Inverse Relations

$r_1(h,t) \Rightarrow r_2(t,h)$

If h has relation $r_1$ to t, then t must have another relation $r_2$ to h.

Example: (Advisor, Advisee)

Composition (Transitive) Relations

Transitive relations: $r(x,y) \wedge r(y,z) \Rightarrow r(x,z)$

Transitivity, for example with friends: if x is a friend of y and y is a friend of z, then x and z are friends too.

Composition relations: $r_1(x,y) \wedge r_2(y,z) \Rightarrow r_3(x,z)$

Example: My mother's husband is my father

1-to-N relations

Example: "StudentsOf", where one teacher corresponds to many students.
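To make the pattern definitions concrete, the sketch below checks them on a small set of triples. The helper names and the toy triples are hypothetical. The checks only test whether the observed triples are consistent with a pattern; they say nothing about what any embedding model can learn.

```python
def pairs(triples, rel):
    """All (head, tail) pairs connected by relation rel."""
    return {(h, t) for h, r, t in triples if r == rel}


def is_symmetric(triples, rel):
    p = pairs(triples, rel)
    return all((t, h) in p for h, t in p)       # r(h,t) => r(t,h)


def is_antisymmetric(triples, rel):
    p = pairs(triples, rel)
    return all((t, h) not in p for h, t in p)   # r(h,t) => not r(t,h)


def are_inverse(triples, r1, r2):
    p2 = pairs(triples, r2)
    return all((t, h) in p2 for h, t in pairs(triples, r1))  # r1(h,t) => r2(t,h)


def is_one_to_n(triples, rel):
    heads = [h for h, _ in pairs(triples, rel)]
    return len(heads) > len(set(heads))         # some head has several tails


triples = [
    ("Ann", "roommate", "Bea"), ("Bea", "roommate", "Ann"),
    ("animal", "hypernym", "dog"),
    ("Prof", "advisor", "Stu"), ("Stu", "advisee", "Prof"),
    ("Prof", "teaches", "Stu"), ("Prof", "teaches", "Tim"),
]
```

For instance, `is_symmetric(triples, "roommate")` holds on this toy data, while `is_symmetric(triples, "hypernym")` does not.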

Model

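The lecture only uses shallow node embeddings. GNNs could be used as well, but that was not covered.

The lecture's summary table lists which relation patterns each model can encode. Real KGs differ a lot from one another, so no model is best across the board; choose according to need.

[Figure omitted: table of the relation patterns each model can encode]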

TransE

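For a triple $(h, r, t)$ with $h, r, t \in \mathbb{R}^d$ (all $d$-dimensional vectors):
$h+r\approx t$ if the given fact is true
else $h+r \neq t$

For example: Obama + Nationality $\approx$ USA, while Obama + Nationality $\neq$ UK

Scoring function: $f_r(h,t)=-||h+r-t||$

The training procedure was not covered in detail; the lecture figure gives a rough idea.
[Figure omitted]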
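The scoring function $f_r(h,t)=-||h+r-t||$ and the "given (head, relation), predict the tail" task can be sketched together. The 2-dimensional embeddings below are chosen by hand for illustration and are not learned; the function names are likewise assumptions.

```python
import math


def transe_score(h, r, t):
    """f_r(h, t) = -||h + r - t|| with the L2 norm; higher is more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hypothetical hand-picked embeddings (not trained).
entity = {
    "Obama": [1.0, 0.0],
    "USA": [1.0, 1.0],
    "UK": [3.0, -1.0],
}
relation = {"nationality": [0.0, 1.0]}


def rank_tails(head, rel):
    """Rank every other entity as a candidate tail, best score first."""
    scores = {
        name: transe_score(entity[head], relation[rel], vec)
        for name, vec in entity.items()
        if name != head
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_tails("Obama", "nationality")
```

Here Obama + nationality lands exactly on USA, so USA gets the best possible score of 0 and is ranked ahead of UK.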

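Result

Embeddings of nodes and relations that encode the triples

Expressiveness for each relation pattern

Symmetric Relations
No. $h + r$ lands on $t$, but $t + r$ does not lead back to $h$.
Antisymmetric Relations
Yes, for the same reason.
Inverse Relations
Yes, by setting $r_2 = -r_1$.
Composition (Transitive) Relations
Yes. Set $r_3=r_1+r_2$: $x+r_1$ reaches y and $y+r_2$ reaches z, so $x+r_3$ reaches z.
1-to-N relations
No. Starting from one point and adding one relation vector reaches exactly one point, not several.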
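The two "No" answers above follow from the translation constraint. A short derivation, assuming the true triple is fit exactly ($h + r = t$):

$$
f_r(t,h) = -\|t + r - h\| = -\|(h + r) + r - h\| = -2\|r\|
$$

This reaches the best score 0 only when $r = 0$, which would force $h = t$. For a 1-to-N relation, fitting both $(h, r, t_1)$ and $(h, r, t_2)$ exactly gives

$$
t_1 = h + r = t_2
$$

so the two distinct tails would have to share one embedding.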

TransR

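TransE models translation of any relation in the same embedding space
TransR designs a new space for each relation and does the translation in that relation-specific space

Entities are modeled as vectors in the entity space $\mathbb{R}^d$. Each relation gets its own embedding space with relation vector $r \in \mathbb{R}^k$, and a matrix $M_r \in \mathbb{R}^{k\times d}$ maps entities from the entity space into the corresponding relation space.

$M_r$ is relation-specific: every relation has its own $M_r$.

[Figure omitted]

Open question: with many relations this also requires many relation matrices. Does the parameter count need to be reduced here as well?

Expressiveness for each relation pattern

Symmetric Relations
Yes. $M_r$ can map h and t to the same point in the r space, with $r = 0$.
Antisymmetric Relations
Yes. It is enough for the r space to resemble the entity space, in which case the TransE argument applies.
Inverse Relations
Yes
Composition (Transitive) Relations
No. In TransR each relation lives in its own space, whereas composition needs the relations to share one space.
1-to-N relations
Yes. It is enough to map the entities of the same kind (all tails of one head) to the same point in the r space.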
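The symmetric and 1-to-N constructions above can be sketched numerically. The score used here, $f_r(h,t) = -||M_r h + r - M_r t||$ (TransE applied after projection), is assumed from the standard TransR formulation because the formula image is missing from this copy. The matrices and vectors are hand-picked for illustration.

```python
import math


def project(M, x):
    """Map an entity vector x (dimension d) into relation space via M (k x d)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transr_score(h, r, t, M_r):
    """Assumed score: -||M_r h + r - M_r t||."""
    h_p, t_p = project(M_r, h), project(M_r, t)
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_p, r, t_p)))


# Symmetric relation: M_sym sends h and t to the same point, and r = 0.
M_sym = [[1.0, 1.0]]                 # k = 1, d = 2
h, t = [1.0, 0.0], [0.0, 1.0]
sym_forward = transr_score(h, [0.0], t, M_sym)
sym_backward = transr_score(t, [0.0], h, M_sym)

# 1-to-N relation: M_1n keeps only the first coordinate, so t1 and t2 coincide
# in relation space even though they differ in entity space.
M_1n = [[1.0, 0.0]]
head, t1, t2 = [0.0, 0.0], [1.0, 2.0], [1.0, -3.0]
score_t1 = transr_score(head, [1.0], t1, M_1n)
score_t2 = transr_score(head, [1.0], t2, M_1n)
```

In both cases the projection discards the directions in which the entities differ, which is what a single shared space in TransE cannot do.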

DistMult

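Entities and relations live in the same $K$-dimensional space

scoring function: $f_r(h,t)=\langle h,r,t\rangle=\sum\limits_{k=1}^K h_k \cdot r_k \cdot t_k$

That is, multiply the three vectors h, r, t dimension by dimension and sum the products.

When (h,r,t) is a true triple, $f_r(h,t)$ should be greater than 0; otherwise it should be less than 0.

This resembles the cosine similarity ( $\cos{\theta}=\frac{\vec{a} \cdot \vec{b}}{||\vec{a}|| \times ||\vec{b}||}$ ) between $h\cdot r$ and $t$, where $h\cdot r$ denotes the element-wise product: if $h\cdot r$ and $t$ point in similar directions the score should be greater than 0, otherwise less than 0.

In effect, $h\cdot r$ defines a hyperplane that separates the corresponding positive examples from the negative ones.

[Figure omitted]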
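A minimal sketch of the DistMult score and its hyperplane reading: the score equals the dot product between the element-wise product of h and r and the tail t. The vectors are hand-picked for illustration and the helper names are assumptions.

```python
def distmult_score(h, r, t):
    """f_r(h, t) = sum_k h_k * r_k * t_k."""
    return sum(hk * rk * tk for hk, rk, tk in zip(h, r, t))


def hadamard(a, b):
    """Element-wise product of two vectors."""
    return [x * y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


h = [1.0, 2.0]
r = [0.5, -1.0]
normal = hadamard(h, r)   # normal vector of the hyperplane: [0.5, -2.0]

t_pos = [2.0, -1.0]       # on the same side as the normal vector
t_neg = [-2.0, 1.0]       # on the opposite side

score_pos = distmult_score(h, r, t_pos)   # 3.0
score_neg = distmult_score(h, r, t_neg)   # -3.0
```

The sign of the score tells on which side of the hyperplane through the origin the tail lies.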

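Open questions:
What is the optimization objective during training? Is it to make positive scores > 0 and negative scores < 0?
Is only the sign considered, and not how large or small the score is?

The lecture did not make this clear; to be looked up when needed.

Expressiveness for each relation pattern

Symmetric Relations
Yes, because swapping h and t leaves the computation unchanged: $f_r(h,t)=\langle h,r,t\rangle=\sum\limits_{k=1}^K h_k \cdot r_k \cdot t_k=\langle t,r,h\rangle=f_r(t, h)$
Antisymmetric Relations
No, because swapping h and t leaves the score unchanged, so only symmetric relations can be expressed.
Inverse Relations
No. An inverse relation means that $(h,r_1,t)$ always implies $(t,r_2,h)$, i.e. $f_{r_1}(h,t)>0$ and $f_{r_2}(t, h)>0$. Satisfying this makes $r_1$ and $r_2$ very similar, whereas inverse relations should be two opposite relations.
Composition (Transitive) Relations
No. The (head, relation) pair of each of the two triples defines its own hyperplane, and two hyperplanes cannot be represented by a single hyperplane.
This is how the lecture put it, which is not very clear; to be examined further if needed.
1-to-N relations
Yes. $t_1$ and $t_2$ can both make an angle of less than 90 degrees with $h\cdot r$, so both fall on the positive side of the hyperplane defined by $h\cdot r$.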
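The symmetric and 1-to-N answers above can be checked numerically with a small sketch. The vectors are hand-picked for illustration, not learned.

```python
def distmult_score(h, r, t):
    """f_r(h, t) = sum_k h_k * r_k * t_k."""
    return sum(hk * rk * tk for hk, rk, tk in zip(h, r, t))


h, r = [1.0, 2.0], [0.5, -1.0]
t1, t2 = [2.0, -1.0], [4.0, 0.0]

# Symmetry: swapping head and tail never changes the score.
forward = distmult_score(h, r, t1)
backward = distmult_score(t1, r, h)

# 1-to-N: two different tails can both score above 0 for the same (h, r).
score_t1 = distmult_score(h, r, t1)
score_t2 = distmult_score(h, r, t2)
```

Since the forward and backward scores always coincide, a relation that holds in one direction is scored as holding in the other as well, which is why antisymmetric relations cannot be expressed.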

ComplEx

Defined over complex vector space. I have not understood it yet, so no notes for now; to be revisited when needed.
