Heterogeneous Graphs and Knowledge Graph Embeddings
1. Heterogeneous Graphs
A heterogeneous graph is defined as $G=(V,E,R,T)$ with
- Nodes with node types $v_i\in V$
- Edges with relation types $(v_i,r,v_j)\in E$
- Node type $T(v_i)$
- Relation type $r\in R$
Example
- Example nodes: SFO, EWR, LAX, UA689
- Example edges: (UA689, origin, LAX)
- Example node types: flight, airport, cause
- Example edge types (relation): destination, origin, cancelled by, delayed by
1). Relational GCN
a). Definition
For directed graphs with a single relation type, we only pass messages along the direction of the edges.
For graphs with multiple relation types, we use different neural network weights for each relation type.
$$h^{l+1}_v=\sigma\left(\sum_{r\in R}\sum_{u\in N_v^r}\frac{1}{c_{v,r}}W^l_r h^l_u+W^l_0 h^l_v\right)$$
Messages are normalized by the node degree under the relation: $c_{v,r}=|N_v^r|$
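The update rule above can be sketched in plain numpy (an illustrative toy implementation, not any library's API; function and variable names are my own, and $\sigma$ is assumed to be ReLU):

```python
import numpy as np

# Minimal R-GCN layer sketch.
# h:         (num_nodes, d_in) node embeddings at layer l
# W:         dict mapping relation -> (d_out, d_in) weight matrix W_r^l
# W0:        (d_out, d_in) self-loop weight matrix W_0^l
# neighbors: dict mapping (v, r) -> list of neighbor ids N_v^r
def rgcn_layer(h, W, W0, neighbors, num_nodes):
    d_out = W0.shape[0]
    out = np.zeros((num_nodes, d_out))
    for v in range(num_nodes):
        msg = W0 @ h[v]                      # self-loop term W_0^l h_v^l
        for r, W_r in W.items():
            nbrs = neighbors.get((v, r), [])
            c = len(nbrs)                    # c_{v,r} = |N_v^r|
            for u in nbrs:
                msg += (W_r @ h[u]) / c      # (1/c_{v,r}) W_r^l h_u^l
        out[v] = np.maximum(msg, 0)          # sigma = ReLU here
    return out
```

Real implementations batch this per relation as sparse matrix products; the loops above just mirror the summation structure of the formula.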
b). Scalability
Each relation has $L$ matrices: $W^1_r, W^2_r, \dots, W^L_r$, so the size of each $W^l_r$ is $d^{l+1}\times d^l$.
Problems: 1) the number of parameters grows rapidly with the number of relations, and 2) overfitting, especially on rare relations.
Two methods to regularize the weights $W^l_r$:
- Block diagonal matrices
  Key insight: make the weights sparse.
  If we use $B$ low-dimensional block matrices, the number of parameters reduces from $d^{l+1}\times d^l$ to $B\times\dfrac{d^{l+1}}{B}\times\dfrac{d^l}{B}$.
  Limitation: only nearby neurons/dimensions can interact through $W$.
- Basis/dictionary learning
  Key insight: share weights across different relations.
  Represent the matrix of each relation as a linear combination of basis transformations: $W_r=\sum_{b=1}^B a_{rb}\cdot V_b$, where the basis matrices $V_b$ are shared across all relations and $a_{rb}$ is the learnable importance weight of $V_b$.
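The parameter savings of the two schemes can be checked with a quick back-of-the-envelope calculation (the dimensions and relation count below are made-up examples, assuming $B$ divides both dimensions):

```python
# Parameter-count comparison for the two regularization schemes (a sketch;
# numbers are illustrative, assuming B divides both dimensions).
d_in, d_out = 64, 64       # d^l and d^{l+1}
num_relations = 100
B = 8                      # number of blocks / basis matrices

# One dense W_r per relation:
full = num_relations * d_out * d_in
# B blocks of size (d^{l+1}/B) x (d^l/B) per relation:
block_diag = num_relations * B * (d_out // B) * (d_in // B)
# B shared basis matrices V_b plus one scalar a_rb per (relation, basis) pair:
basis = B * d_out * d_in + num_relations * B

print(full, block_diag, basis)  # prints 409600 51200 33568
```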
c). Example: Entity/node classification and link prediction
To be updated.
2). Knowledge Graphs: KG Completion with Embeddings
Knowledge in graph form:
- Nodes are entities labeled with their types
- Edges between two nodes capture relationships between entities
- KG is an example of a heterogeneous graph
a). Example of KGs
- Bibliographic networks
- Bio KGs
- Google KG
- Amazon Product Graph
- Facebook Graph API
- IBM Watson
- Microsoft Satori
b). Application of KGs
- Serving information
- Question answering and conversation agents
c). KG Datasets
FreeBase, Wikidata, DBpedia, YAGO, NELL, etc.
Common characteristics:
- Massive: millions of nodes and edges
- Incomplete: many true edges are missing
d). Connectivity patterns in KG
Relations in a heterogeneous KG have different properties:
- Symmetric relations: $r(h,t)\Rightarrow r(t,h)\quad\forall h,t$
  Example: family, roommate
- Antisymmetric relations: $r(h,t)\Rightarrow\lnot r(t,h)\quad\forall h,t$
  Example: hypernym
- Inverse relations: $r_2(h,t)\Rightarrow r_1(t,h)$
  Example: (advisor, advisee)
- Composition (transitive) relations: $r_1(x,y)\wedge r_2(y,z)\Rightarrow r_3(x,z)\quad\forall x,y,z$
  Example: my mother's husband is my father
- 1-to-N relations: $r(h,t_1),r(h,t_2),\dots,r(h,t_n)$ are all true
  Example: $r$ is "StudentOf"
3). KG Completion
KG completion task: for a given (head, relation), we predict missing tails.
a). KG representation
Edges in a KG are represented as triples $(h,r,t)$: head $h$ has relation $r$ with tail $t$.
Key idea:
- Model entities and relations in the embedding/vector space $\mathbb{R}^d$ and associate entities and relations with shallow embeddings
- Given a true triple $(h,r,t)$, the goal is that the embedding of $(h,r)$ should be close to the embedding of $t$
b). TransE
For a triple $(h,r,t)$ with $h,r,t\in\mathbb{R}^d$: $h+r\approx t$ if the given fact is true, else $h+r\neq t$.
Scoring function: $f_r(h,t)=-\|h+r-t\|$
Limitation: cannot model symmetric relations and 1-to-N relations
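A minimal sketch of the TransE score (function names are mine; the L2 norm is assumed), along with the intuition behind the symmetric-relation limitation:

```python
import numpy as np

# TransE scoring sketch: f_r(h, t) = -||h + r - t||, higher is more plausible.
def transe_score(h, r, t):
    return -np.linalg.norm(h + r - t)

h = np.array([1.0, 2.0])
r = np.array([0.5, -1.0])
t = h + r                       # a perfectly modeled fact scores 0
print(transe_score(h, r, t))

# Why symmetric relations break: requiring both h + r ≈ t and t + r ≈ h
# forces r ≈ 0, which in turn forces h ≈ t, collapsing head and tail.
```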
c). TransR
Model entities as vectors in the entity space $\mathbb{R}^d$ and model each relation as a vector $r\in\mathbb{R}^k$ in relation space, with $M_r\in\mathbb{R}^{k\times d}$ as the projection matrix:
$h_\bot=M_r h,\quad t_\bot=M_r t$
Use $M_r$ to project from entity space $\mathbb{R}^d$ to relation space $\mathbb{R}^k$.
Scoring function: $f_r(h,t)=-\|h_\bot+r-t_\bot\|$
Limitation: cannot model composition relations (each relation has a different space)
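A sketch of the TransR score under the same assumptions (illustrative names, L2 norm); the only change from TransE is the per-relation projection:

```python
import numpy as np

# TransR scoring sketch: project entities into the relation space with M_r,
# then score as in TransE.
def transr_score(h, t, r, M_r):
    h_proj = M_r @ h            # h_bot = M_r h
    t_proj = M_r @ t            # t_bot = M_r t
    return -np.linalg.norm(h_proj + r - t_proj)

# Entities live in R^3, the relation in R^2; M_r here simply drops
# the third entity dimension (a made-up projection for illustration).
M_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
h = np.array([1.0, 2.0, 3.0])
r = np.array([1.0, 1.0])
t = np.array([2.0, 3.0, 7.0])   # matches h + r in the projected space
print(transr_score(h, t, r, M_r))
```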
d). DistMult
Entities and relations are vectors in $\mathbb{R}^k$.
Scoring function: $f_r(h,t)=\langle h,r,t\rangle=\sum_i h_i\cdot r_i\cdot t_i,\quad h,r,t\in\mathbb{R}^k$
It can be viewed as a cosine similarity between the element-wise product $h\circ r$ and $t$.
Limitation: cannot model antisymmetric relations, composition relations and inverse relations
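A sketch of the DistMult score (names are mine); the trilinear product is symmetric in $h$ and $t$ by construction, which is exactly why antisymmetric relations cannot be modeled:

```python
import numpy as np

# DistMult scoring sketch: f_r(h, t) = sum_i h_i * r_i * t_i.
def distmult_score(h, r, t):
    return np.sum(h * r * t)

h = np.array([1.0, -2.0, 3.0])
r = np.array([0.5, 1.0, -1.0])
t = np.array([2.0, 0.0, 1.0])

# The limitation in action: swapping head and tail never changes the score.
assert distmult_score(h, r, t) == distmult_score(t, r, h)
```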
e). ComplEx
Based on DistMult, ComplEx embeds entities and relations in the complex vector space, using vectors in $\mathbb{C}^k$.
Scoring function: $f_r(h,t)=\mathrm{Re}\left(\sum_i h_i\cdot r_i\cdot\bar{t}_i\right)$
Limitation: cannot model composition relations
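A sketch of the ComplEx score (names are mine); the conjugate on $t$ breaks the head/tail symmetry that limits DistMult, as the made-up example below shows:

```python
import numpy as np

# ComplEx scoring sketch: f_r(h, t) = Re(sum_i h_i * r_i * conj(t_i)).
def complex_score(h, r, t):
    return np.real(np.sum(h * r * np.conj(t)))

# Unlike DistMult, swapping h and t can flip the score, so antisymmetric
# relations are representable (a purely imaginary r behaves antisymmetrically).
h = np.array([1 + 1j])
r = np.array([0 + 1j])
t = np.array([1 - 1j])
print(complex_score(h, r, t), complex_score(t, r, h))  # prints -2.0 2.0
```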
f). KG embeddings in practice
- Different KGs may have drastically different relation patterns
- There is no single embedding method that works for all KGs
- Try TransE for a quick first run if the target KG does not have many symmetric relations
- Then move to more expressive models, e.g., ComplEx, RotatE