CS224W: Machine Learning with Graphs
Stanford / Winter 2021
11-reasoning
Query Types
KG Query Types
One-hop Queries
- Definition: is $t$ an answer to the query $(h, (r))$?
Path Queries
- An n-hop query $q$ can be represented as
$$q = (v_a, (r_1, \ldots, r_n))$$
where $v_a$ is the anchor entity and $r_1, \ldots, r_n$ are the relations followed in order.
Conjunctive Queries
- Logical conjunction (AND) operation
Answering Predictive Queries on Knowledge Graphs
Traversing Knowledge Graphs
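- Answers are obtained by traversing the nodes and edges of the KG.
- However, the KG itself is often incomplete and misses many relations, so the answers found by traversal can be incomplete as well.
- Question: could we first complete the KG with the KG completion methods described earlier and then traverse the completed graph?
- No, this is not feasible:
  - In KG completion, most candidate relations receive a non-zero probability (model outputs are almost never exactly 0), so the completed KG becomes very dense.
  - The time complexity of traversing a dense KG is exponential in the path length $L$, roughly $O(d_{\max}^L)$ where $d_{\max}$ is the maximum node degree.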
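The traversal approach and its weakness can be sketched on a tiny in-memory KG. The triples and the helpers `build_index` and `answer_path_query` are hypothetical illustrations, not part of the lecture: a path query is answered by expanding a frontier of entities one relation at a time, and dropping a single edge silently removes an answer.

```python
from collections import defaultdict

# Hypothetical toy KG as (head, relation, tail) triples.
triples = [
    ("Fulvestrant", "causes", "BrainBleeding"),
    ("Fulvestrant", "causes", "ShortOfBreath"),
    ("BrainBleeding", "assoc", "ProteinA"),
    ("ShortOfBreath", "assoc", "ProteinB"),
]


def build_index(triples):
    """Map (head, relation) -> set of tails for neighbour lookup."""
    index = defaultdict(set)
    for h, r, t in triples:
        index[(h, r)].add(t)
    return index


def answer_path_query(index, anchor, relations):
    """Answer q = (v_a, (r_1, ..., r_n)) by following the relations in order."""
    frontier = {anchor}
    for r in relations:
        frontier = {t for v in frontier for t in index.get((v, r), set())}
    return frontier


full_kg = build_index(triples)
print(sorted(answer_path_query(full_kg, "Fulvestrant", ["causes", "assoc"])))
# ['ProteinA', 'ProteinB']

# Drop the last triple to simulate a missing edge: ProteinB is no longer reachable.
incomplete_kg = build_index(triples[:-1])
print(sorted(answer_path_query(incomplete_kg, "Fulvestrant", ["causes", "assoc"])))
# ['ProteinA']
```

On a densely completed KG the frontier can grow by up to a factor of $d_{\max}$ per hop, which is where the exponential cost comes from.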
Traversing KG in Vector Space
Paper: Traversing Knowledge Graphs in Vector Space
Traverse the KG in vector space, implicitly imputing the missing edges.
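- Key idea: embed queries
  - Use KG embedding methods such as TransE to reason over the KG. With TransE, a path query $q = (v_a, (r_1, \ldots, r_n))$ is embedded as $\mathbf{q} = \mathbf{v}_a + \mathbf{r}_1 + \cdots + \mathbf{r}_n$, and a candidate answer $t$ is scored by $f_q(t) = -\lVert \mathbf{q} - \mathbf{t} \rVert$.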
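- Insight
  - TransE can be trained on the KG completion task.
  - Since TransE can model composition relations, it can be used for path queries (adding relation vectors composes the relations).
  - TransR, DistMult, and ComplEx cannot model composition relations, so they cannot be used for path queries in this way.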
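A minimal numpy sketch of answering a path query with TransE. The embeddings are random stand-ins rather than trained vectors (an assumption for illustration), and entity `c` is constructed so that the path from `a` via `r1` then `r2` lands exactly on it; `embed_path_query` and `score` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Random stand-ins for trained TransE embeddings (hypothetical).
entity = {name: rng.normal(size=dim) for name in ["a", "b", "d"]}
relation = {name: rng.normal(size=dim) for name in ["r1", "r2"]}
# Make "c" the exact answer of the path a -r1-> ? -r2-> c.
entity["c"] = entity["a"] + relation["r1"] + relation["r2"]


def embed_path_query(anchor, relations):
    """TransE query embedding: q = v_a + r_1 + ... + r_n."""
    q = entity[anchor].copy()
    for r in relations:
        q = q + relation[r]
    return q


def score(q, t):
    """f_q(t) = -||q - t||; a higher score means t is a more likely answer."""
    return -np.linalg.norm(q - entity[t])


q = embed_path_query("a", ["r1", "r2"])
ranking = sorted(entity, key=lambda t: score(q, t), reverse=True)
print(ranking[0])  # c
```

No intermediate entity is ever materialized: the query is composed purely by vector addition, which is exactly the composition property that makes TransE suitable here.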
Query2box
Paper: Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings
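- Box embeddings
  - Embed queries as hyper-rectangles (boxes). A box is given by its center and its non-negative offset (half-width per dimension): $\mathbf{q} = (\mathrm{Cen}(q), \mathrm{Off}(q))$.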
- Entity embeddings
  - Entities are treated as zero-volume boxes.
- Relation embeddings
  - Each relation takes a box and produces a new box.
- Intersection operator $f$
  - A new operator whose inputs are boxes and whose output is a box.
  - Intuitively, it models the intersection of the input boxes.
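The box representation can be made concrete with a small data structure. This is a sketch under stated assumptions: `Box`, `entity_box`, and `exact_intersection` are hypothetical names, and `exact_intersection` computes the true geometric intersection only for intuition; Query2box replaces it with the learned operator described under Intersection Operator.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Box:
    """Axis-aligned box stored as (center, offset); offset >= 0 is the half-width per dimension."""
    center: np.ndarray
    offset: np.ndarray

    @property
    def min_corner(self):
        return self.center - self.offset

    @property
    def max_corner(self):
        return self.center + self.offset

    def contains(self, v):
        return bool(np.all((self.min_corner <= v) & (v <= self.max_corner)))


def entity_box(v):
    """An entity is a zero-volume box: center v, offset 0."""
    return Box(center=np.asarray(v, dtype=float), offset=np.zeros(len(v)))


def exact_intersection(boxes):
    """Geometric intersection of axis-aligned boxes, or None if it is empty."""
    lo = np.max([b.min_corner for b in boxes], axis=0)
    hi = np.min([b.max_corner for b in boxes], axis=0)
    if np.any(lo > hi):
        return None
    return Box(center=(lo + hi) / 2, offset=(hi - lo) / 2)


a = Box(np.array([0.0, 0.0]), np.array([2.0, 1.0]))
b = Box(np.array([1.0, 0.0]), np.array([2.0, 2.0]))
inter = exact_intersection([a, b])
print(inter.center.tolist(), inter.offset.tolist())  # [0.5, 0.0] [1.5, 1.0]
```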
Projection Operator
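- Takes the current box as input and uses the relation embedding to project and expand the box:
$$\mathrm{Cen}(q') = \mathrm{Cen}(q) + \mathrm{Cen}(r), \qquad \mathrm{Off}(q') = \mathrm{Off}(q) + \mathrm{Off}(r)$$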
Example
- Apply the projection operator once per relation, starting from the anchor entity (a zero-volume box).
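A minimal sketch of the projection operator applied along a path. The 2-d relation embeddings `r1` and `r2` and the helper names are hypothetical; each relation is assumed to be a (center, non-negative offset) pair, so the box can only grow with every hop.

```python
import numpy as np


def project(center, offset, rel_center, rel_offset):
    """Query2box projection: Cen(q') = Cen(q) + Cen(r), Off(q') = Off(q) + Off(r)."""
    return center + rel_center, offset + rel_offset


def embed_path_query(anchor, relations):
    """Start from the anchor entity as a zero-volume box and project once per relation."""
    center = np.asarray(anchor, dtype=float)
    offset = np.zeros_like(center)
    for rel_center, rel_offset in relations:
        center, offset = project(center, offset, rel_center, rel_offset)
    return center, offset


# Hypothetical 2-d relation embeddings given as (center, offset) pairs.
r1 = (np.array([0.5, -1.0]), np.array([0.25, 0.5]))
r2 = (np.array([1.0, 0.0]), np.array([0.5, 0.25]))

center, offset = embed_path_query([1.0, 1.0], [r1, r2])
print(center.tolist(), offset.tolist())  # [2.5, 0.0] [0.75, 0.75]
```

The growing offset reflects that each hop can reach more candidate entities, so the set of plausible answers widens.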
Intersection Operator
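- The center of the intersection box should lie in the region where the input boxes overlap and should be computed from the centers of all input boxes: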
$$\mathrm{Cen}(q_{\text{inter}}) = \sum_i \mathbf{w}_i \odot \mathrm{Cen}(q_i), \qquad \mathbf{w}_i = \frac{\exp\left(f_{\text{cen}}(\mathrm{Cen}(q_i))\right)}{\sum_j \exp\left(f_{\text{cen}}(\mathrm{Cen}(q_j))\right)}, \qquad \mathrm{Cen}(q_i), \mathbf{w}_i \in \mathbb{R}^d$$
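where $f_{\text{cen}}$ is a neural network and the softmax is taken per dimension over the input boxes, so the $\mathbf{w}_i$ act as attention weights.
- The offset of the intersection box should be no larger than that of any input box. We therefore take the element-wise minimum of the input offsets and multiply it by a factor squashed into $(0, 1)$ by a sigmoid, which guarantees the new offset never exceeds any input offset: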
$$\mathrm{Off}(q_{\text{inter}}) = \min\left(\mathrm{Off}(q_1), \ldots, \mathrm{Off}(q_n)\right) \odot \sigma\left(f_{\text{off}}\left(\mathrm{Off}(q_1), \ldots, \mathrm{Off}(q_n)\right)\right)$$
where $f_{\text{off}}$ is a neural network and $\sigma$ is the sigmoid function.
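A numpy sketch of the intersection operator. The networks are hypothetical stand-ins: `f_cen` is a single random linear layer, and `f_off` is a small permutation-invariant (mean-pooled) network; trained Query2box models learn these. The sketch checks the two properties the formulas are designed for: the center is a per-dimension convex combination of the input centers, and the offset never exceeds the smallest input offset.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Hypothetical random parameters standing in for the trained networks.
W_cen = rng.normal(size=(d, d))
W_off1 = rng.normal(size=(d, d))
W_off2 = rng.normal(size=(d, d))


def f_cen(centers):
    """Per-input, per-dimension attention logits, shape (n, d)."""
    return centers @ W_cen.T


def f_off(offsets):
    """Permutation-invariant summary of all input offsets, shape (d,)."""
    hidden = np.maximum(offsets @ W_off1.T, 0.0).mean(axis=0)
    return hidden @ W_off2.T


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def intersect(centers, offsets):
    """centers, offsets: arrays of shape (n, d), one row per input box."""
    logits = f_cen(centers)
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable softmax
    w = exp / exp.sum(axis=0, keepdims=True)  # softmax over the n inputs, per dimension
    center = (w * centers).sum(axis=0)
    offset = offsets.min(axis=0) * sigmoid(f_off(offsets))
    return center, offset


centers = rng.normal(size=(3, d))
offsets = np.abs(rng.normal(size=(3, d)))
center, offset = intersect(centers, offsets)
print(np.all(offset <= offsets.min(axis=0)))  # True
```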
Entity-to-Box Distance
- The distance between an entity $\mathbf{v}$ and a query box $\mathbf{q}$ is defined as
$$d_{\text{box}}(\mathbf{q}, \mathbf{v}) = d_{\text{out}}(\mathbf{q}, \mathbf{v}) + \alpha \cdot d_{\text{in}}(\mathbf{q}, \mathbf{v})$$
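where $0 < \alpha < 1$. Here $d_{\text{out}}$ is the distance from $\mathbf{v}$ to the box (zero when $\mathbf{v}$ is inside it) and $d_{\text{in}}$ is the distance from the box center to the point of the box closest to $\mathbf{v}$.
- If the point lies inside the box, its distance is down-weighted by $\alpha$: every entity inside the box should count as a likely answer, so moving around inside the box changes the distance only slightly, while leaving the box is penalized in full.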
$$f_q(v) = -d_{\text{box}}(\mathbf{q}, \mathbf{v})$$
$f_q(v)$ is the negative distance, so a higher score means node $v$ is more likely to be an answer to $q$.
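A sketch of the entity-to-box distance. The concrete L1 forms of $d_{\text{out}}$ and $d_{\text{in}}$ below are an assumption consistent with the verbal definitions above, and `ALPHA = 0.2` is only an illustrative value in $(0, 1)$.

```python
import numpy as np

ALPHA = 0.2  # illustrative value of alpha, 0 < alpha < 1 (assumption)


def d_out(center, offset, v):
    """L1 distance from v to the box; zero when v lies inside the box."""
    q_max, q_min = center + offset, center - offset
    return float(np.sum(np.maximum(v - q_max, 0.0) + np.maximum(q_min - v, 0.0)))


def d_in(center, offset, v):
    """L1 distance from the box center to the point of the box closest to v."""
    q_max, q_min = center + offset, center - offset
    closest = np.minimum(q_max, np.maximum(q_min, v))
    return float(np.sum(np.abs(center - closest)))


def d_box(center, offset, v, alpha=ALPHA):
    return d_out(center, offset, v) + alpha * d_in(center, offset, v)


def score(center, offset, v, alpha=ALPHA):
    """f_q(v) = -d_box(q, v)."""
    return -d_box(center, offset, v, alpha)


center, offset = np.array([0.0, 0.0]), np.array([1.0, 1.0])
inside, outside = np.array([0.5, 0.0]), np.array([3.0, 0.0])
print(score(center, offset, inside) > score(center, offset, outside))  # True
```

Here the inside point gets $d_{\text{box}} = 0.2 \cdot 0.5 = 0.1$, while the outside point pays the full $d_{\text{out}} = 2$ plus $0.2 \cdot 1$.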
AND-OR Queries
- Conjunctive queries + disjunction = Existential Positive First-order (EPFO) queries, also called AND-OR queries
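- Can AND-OR queries be embedded directly in a low-dimensional vector space?
  - No. Embedding the union of arbitrary queries would require a dimension that grows with the number of queries whose answer sets do not overlap, which for a real KG can be on the order of the number of entities.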
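- Instead, rewrite the AND-OR query into an equivalent disjunctive normal form (DNF), i.e., a disjunction of conjunctive queries, so that all union operations are moved to the very last step.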
- Given any AND-OR query $q$:
$$q = q_1 \vee q_2 \vee \cdots \vee q_m$$
where each $q_i$ is a conjunctive query.
- The distance between an entity embedding and a DNF $q = q_1 \vee q_2 \vee \cdots \vee q_m$ is defined as
$$d_{\text{box}}(\mathbf{q}, \mathbf{v}) = \min\left(d_{\text{box}}(\mathbf{q}_1, \mathbf{v}), \ldots, d_{\text{box}}(\mathbf{q}_m, \mathbf{v})\right)$$
  - The distance from $v$ to $q$ is the minimum of the distances from $v$ to each $q_i$.
  - If $v$ is an answer to some conjunctive query $q_i$, then $v$ is also an answer to $q$.
  - Hence, if $v$ is close to some conjunctive query $q_i$ in the embedding space, it should also be close to $q$.
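- Embedding process for an AND-OR query: unlike intersection, union has no explicit box embedding, so the query is first converted to DNF $q_1 \vee \cdots \vee q_m$; each $q_i$ is embedded as a box, $d_{\text{box}}(\mathbf{q}_i, \mathbf{v})$ is computed for every $i$, the minimum gives $d_{\text{box}}(\mathbf{q}, \mathbf{v})$, and the score is $f_q(v) = -d_{\text{box}}(\mathbf{q}, \mathbf{v})$.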
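The AND-OR pipeline can be sketched in two pieces: a DNF rewrite of a query tree, and a union score that takes the minimum box distance over the conjunctive sub-queries. The query encoding (atoms as strings, `("and", ...)` / `("or", ...)` tuples), the helper names, the L1 distance forms, and the example boxes are all hypothetical illustrations.

```python
from itertools import product

import numpy as np


def to_dnf(query):
    """Rewrite a query into DNF, returned as a list of conjunctive clauses.

    A query is an atom (str) or a tuple ("and", q1, q2, ...) / ("or", q1, q2, ...).
    """
    if isinstance(query, str):
        return [[query]]
    op, *children = query
    child_dnfs = [to_dnf(child) for child in children]
    if op == "or":
        # Union: collect every child's clauses.
        return [clause for dnf in child_dnfs for clause in dnf]
    if op == "and":
        # Distribute AND over OR: pick one clause per child and concatenate them.
        return [sum(combo, []) for combo in product(*child_dnfs)]
    raise ValueError(f"unknown operator: {op!r}")


def d_box(center, offset, v, alpha=0.2):
    """Entity-to-box distance d_out + alpha * d_in (L1 forms assumed)."""
    q_max, q_min = center + offset, center - offset
    out = np.sum(np.maximum(v - q_max, 0.0) + np.maximum(q_min - v, 0.0))
    closest = np.minimum(q_max, np.maximum(q_min, v))
    inside = np.sum(np.abs(center - closest))
    return float(out + alpha * inside)


def dnf_score(conjunct_boxes, v, alpha=0.2):
    """f_q(v) = -min_i d_box(q_i, v) for q = q_1 OR ... OR q_m."""
    return -min(d_box(c, o, v, alpha) for c, o in conjunct_boxes)


print(to_dnf(("and", ("or", "A", "B"), "C")))  # [['A', 'C'], ['B', 'C']]

# Hypothetical boxes for two conjunctive sub-queries q_1 and q_2.
boxes = [
    (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    (np.array([10.0, 10.0]), np.array([1.0, 1.0])),
]
print(dnf_score(boxes, np.array([10.0, 10.5])) > dnf_score(boxes, np.array([5.0, 5.0])))  # True
```

The point inside the second box scores well even though it is far from the first box, which is exactly the "answer to some $q_i$ is an answer to $q$" behaviour; a point between the two boxes is far from both and scores poorly.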
Training
- Note that here $f_q(v)$ is the negative distance.
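The training objective is not written out in these notes. A common choice, assumed here rather than taken from the notes, is a negative-sampling loss over a true answer $v$ and a sampled non-answer $v'$: $\ell = -\log \sigma(f_q(v)) - \log(1 - \sigma(f_q(v')))$. A minimal numpy sketch with hypothetical helper names:

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def query_loss(f_pos, f_neg):
    """-log sigmoid(f_q(v)) - log(1 - sigmoid(f_q(v'))), using 1 - sigmoid(x) = sigmoid(-x).

    f_pos: score of a true answer v; f_neg: score of a sampled non-answer v'.
    """
    return -log_sigmoid(f_pos) - log_sigmoid(-f_neg)


# Scores are negative distances: the answer is close to the query, the negative far away.
good = query_loss(f_pos=-0.1, f_neg=-8.0)
bad = query_loss(f_pos=-8.0, f_neg=-0.1)
print(good < bad)  # True
```

Because $f_q(v) \le 0$, $\sigma(f_q(v)) \le 0.5$, so the positive term never drops below $\log 2$ even for a perfect answer; this is one reason implementations commonly shift the score by a margin before the sigmoid.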