Heterogeneous Graphs and Knowledge Graph Embeddings
1. Heterogeneous Graphs
A heterogeneous graph is defined as $G=(V,E,R,T)$ with
- Nodes with node types $v_i\in V$
- Edges with relation types $(v_i,r,v_j)\in E$
- Node type $T(v_i)$
- Relation type $r\in R$
Example
- Example nodes: SFO, EWR, LAX, UA689
- Example edges: (UA689, origin, LAX)
- Example node types: flight, airport, cause
- Example edge types (relation): destination, origin, cancelled by, delayed by
1). Relational GCN
a). Definition
For directed graphs with a single relation type, we only pass messages along the direction of the edges.
For graphs with multiple relation types, we use different neural network weights for each relation type.
$$h^{l+1}_v=\sigma\left(\sum_{r\in R}\sum_{u\in N_v^r}\frac{1}{c_{v,r}}W^l_r h^l_u+W^l_0 h^l_v\right)$$
Messages are normalized by the node degree under the relation: $c_{v,r}=|N_v^r|$
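The update rule above can be sketched in plain numpy (an illustrative toy implementation, not any library's API; function and variable names are my own, and $\sigma$ is assumed to be ReLU):

```python
import numpy as np

# Minimal R-GCN layer sketch.
# h:         (num_nodes, d_in) node embeddings at layer l
# W:         dict mapping relation -> (d_out, d_in) weight matrix W_r^l
# W0:        (d_out, d_in) self-loop weight matrix W_0^l
# neighbors: dict mapping (v, r) -> list of neighbor ids N_v^r
def rgcn_layer(h, W, W0, neighbors, num_nodes):
    d_out = W0.shape[0]
    out = np.zeros((num_nodes, d_out))
    for v in range(num_nodes):
        msg = W0 @ h[v]                      # self-loop term W_0^l h_v^l
        for r, W_r in W.items():
            nbrs = neighbors.get((v, r), [])
            c = len(nbrs)                    # c_{v,r} = |N_v^r|
            for u in nbrs:
                msg += (W_r @ h[u]) / c      # (1/c_{v,r}) W_r^l h_u^l
        out[v] = np.maximum(msg, 0)          # sigma = ReLU here
    return out
```

Real implementations batch this per relation as sparse matrix products; the loops above just mirror the summation structure of the formula.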
b). Scalability
Each relation has $L$ matrices: $W^1_r, W^2_r, \dots, W^L_r$, so the size of each $W^l_r$ is $d^{l+1}\times d^l$.
Problems: 1) the number of parameters grows rapidly with the number of relations, and 2) overfitting, especially on rare relations.
Two methods to regularize the weights $W^l_r$:
- Block diagonal matrices
  Key insight: make the weights sparse.
  If we use $B$ low-dimensional block matrices, the number of parameters reduces from $d^{l+1}\times d^l$ to $B\times\dfrac{d^{l+1}}{B}\times\dfrac{d^l}{B}$.
  Limitation: only nearby neurons/dimensions can interact through $W$.
- Basis/dictionary learning
  Key insight: share weights across different relations.
  Represent the matrix of each relation as a linear combination of basis transformations: $W_r=\sum_{b=1}^B a_{rb}\cdot V_b$, where the basis matrices $V_b$ are shared across all relations and $a_{rb}$ is the learnable importance weight of $V_b$.
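The parameter savings of the two schemes can be checked with a quick back-of-the-envelope calculation (the dimensions and relation count below are made-up examples, assuming $B$ divides both dimensions):

```python
# Parameter-count comparison for the two regularization schemes (a sketch;
# numbers are illustrative, assuming B divides both dimensions).
d_in, d_out = 64, 64       # d^l and d^{l+1}
num_relations = 100
B = 8                      # number of blocks / basis matrices

# One dense W_r per relation:
full = num_relations * d_out * d_in
# B blocks of size (d^{l+1}/B) x (d^l/B) per relation:
block_diag = num_relations * B * (d_out // B) * (d_in // B)
# B shared basis matrices V_b plus one scalar a_rb per (relation, basis) pair:
basis = B * d_out * d_in + num_relations * B

print(full, block_diag, basis)  # prints 409600 51200 33568
```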
c). Example: Entity/node classification and link prediction
To be updated.
2). Knowledge Graphs: KG Completion with Embeddings
Knowledge in graph form:
- Nodes are entities labeled with their types
- Edges between two nodes capture relationships between entities
- KG is an example of a heterogeneous graph
a). Example of KGs
- Bibliographic networks
- Bio KGs
- Google KG
- Amazon Product Graph
- Facebook Graph API
- IBM Watson
- Microsoft Satori
b). Application of KGs
- Serving information
- Question answering and conversation agents
c). KG Datasets
FreeBase, Wikidata, DBpedia, YAGO, NELL, etc.
Common characteristics:
- Massive: millions of nodes and edges
- Incomplete: many true edges are missing
d). Connectivity patterns in KG
Relations in a heterogeneous KG have different properties:
- Symmetric relations: $r(h,t)\Rightarrow r(t,h)\quad\forall h,t$
  Example: family, roommate
- Antisymmetric relations: $r(h,t)\Rightarrow\lnot r(t,h)\quad\forall h,t$
  Example: hypernym
- Inverse relations: $r_2(h,t)\Rightarrow r_1(t,h)$
  Example: (advisor, advisee)
- Composition (transitive) relations: $r_1(x,y)\wedge r_2(y,z)\Rightarrow r_3(x,z)\quad\forall x,y,z$
  Example: my mother's husband is my father
- 1-to-N relations: $r(h,t_1),r(h,t_2),\dots,r(h,t_n)$ are all true
  Example: $r$ is "StudentOf"
3). KG Completion
KG completion task: for a given (head, relation), we predict missing tails.
a). KG representation
Edges in a KG are represented as triples $(h,r,t)$: head $h$ has relation $r$ with tail $t$.
Key idea:
- Model entities and relations in the embedding/vector space $\mathbb{R}^d$ and associate entities and relations with shallow embeddings
- Given a true triple $(h,r,t)$, the goal is that the embedding of $(h,r)$ should be close to the embedding of $t$
b). TransE
For a triple $(h,r,t)$ with $h,r,t\in\mathbb{R}^d$: $h+r\approx t$ if the given fact is true, else $h+r\neq t$.
Scoring function: $f_r(h,t)=-\|h+r-t\|$
Limitation: cannot model symmetric relations and 1-to-N relations
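A minimal sketch of the TransE score (function names are mine; the L2 norm is assumed), along with the intuition behind the symmetric-relation limitation:

```python
import numpy as np

# TransE scoring sketch: f_r(h, t) = -||h + r - t||, higher is more plausible.
def transe_score(h, r, t):
    return -np.linalg.norm(h + r - t)

h = np.array([1.0, 2.0])
r = np.array([0.5, -1.0])
t = h + r                       # a perfectly modeled fact scores 0
print(transe_score(h, r, t))

# Why symmetric relations break: requiring both h + r ≈ t and t + r ≈ h
# forces r ≈ 0, which in turn forces h ≈ t, collapsing head and tail.
```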
c). TransR
Model entities as vectors in the entity space $\mathbb{R}^d$ and model each relation as a vector $r\in\mathbb{R}^k$ in relation space, with $M_r\in\mathbb{R}^{k\times d}$ as the projection matrix:
$h_\bot=M_r h,\quad t_\bot=M_r t$
Use $M_r$ to project from entity space $\mathbb{R}^d$ to relation space $\mathbb{R}^k$.
Scoring function: $f_r(h,t)=-\|h_\bot+r-t_\bot\|$
Limitation: cannot model composition relations (each relation has a different space)
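A sketch of the TransR score under the same assumptions (illustrative names, L2 norm); the only change from TransE is the per-relation projection:

```python
import numpy as np

# TransR scoring sketch: project entities into the relation space with M_r,
# then score as in TransE.
def transr_score(h, t, r, M_r):
    h_proj = M_r @ h            # h_bot = M_r h
    t_proj = M_r @ t            # t_bot = M_r t
    return -np.linalg.norm(h_proj + r - t_proj)

# Entities live in R^3, the relation in R^2; M_r here simply drops
# the third entity dimension (a made-up projection for illustration).
M_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
h = np.array([1.0, 2.0, 3.0])
r = np.array([1.0, 1.0])
t = np.array([2.0, 3.0, 7.0])   # matches h + r in the projected space
print(transr_score(h, t, r, M_r))
```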
d). DistMult
Entities and relations are vectors in $\mathbb{R}^k$.
Scoring function: $f_r(h,t)=\langle h,r,t\rangle=\sum_i h_i\cdot r_i\cdot t_i,\quad h,r,t\in\mathbb{R}^k$
It can be viewed as a cosine similarity between the element-wise product $h\circ r$ and $t$.
Limitation: cannot model antisymmetric relations, composition relations and inverse relations
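A sketch of the DistMult score (names are mine); the trilinear product is symmetric in $h$ and $t$ by construction, which is exactly why antisymmetric relations cannot be modeled:

```python
import numpy as np

# DistMult scoring sketch: f_r(h, t) = sum_i h_i * r_i * t_i.
def distmult_score(h, r, t):
    return np.sum(h * r * t)

h = np.array([1.0, -2.0, 3.0])
r = np.array([0.5, 1.0, -1.0])
t = np.array([2.0, 0.0, 1.0])

# The limitation in action: swapping head and tail never changes the score.
assert distmult_score(h, r, t) == distmult_score(t, r, h)
```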
e). ComplEx
Based on DistMult, ComplEx embeds entities and relations in the complex vector space, using vectors in $\mathbb{C}^k$.
Scoring function: $f_r(h,t)=\mathrm{Re}\left(\sum_i h_i\cdot r_i\cdot\bar{t}_i\right)$
Limitation: cannot model composition relations
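A sketch of the ComplEx score (names are mine); the conjugate on $t$ breaks the head/tail symmetry that limits DistMult, as the made-up example below shows:

```python
import numpy as np

# ComplEx scoring sketch: f_r(h, t) = Re(sum_i h_i * r_i * conj(t_i)).
def complex_score(h, r, t):
    return np.real(np.sum(h * r * np.conj(t)))

# Unlike DistMult, swapping h and t can flip the score, so antisymmetric
# relations are representable (a purely imaginary r behaves antisymmetrically).
h = np.array([1 + 1j])
r = np.array([0 + 1j])
t = np.array([1 - 1j])
print(complex_score(h, r, t), complex_score(t, r, h))  # prints -2.0 2.0
```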
f). KG embeddings in practice
- Different KGs may have drastically different relation patterns
- There is no single embedding method that works for all KGs
- Try TransE for a quick first run if the target KG does not have many symmetric relations
- Then move to more expressive models, e.g., ComplEx, RotatE