《Translating Embeddings for Modeling Multi-relational Data》 Reading Notes

Abstract

We propose TransE, a method which embeds entities and relationships of multi-relational data in a low-dimensional vector space. It significantly outperforms state-of-the-art methods in link prediction on two knowledge bases, and it can also be successfully trained on a large-scale data set.

Introduction

Our work focuses on modeling multi-relational data from KBs (Knowledge Bases), with the goal of providing an efficient tool to complete them by automatically adding new facts, without requiring extra knowledge.

1. Modeling Multi-relational data

In contrast to single-relational data, the difficulty of multi-relational data is that the notion of locality may involve relationships and entities of different types at the same time, so that modeling multi-relational data requires more generic approaches that can choose the appropriate patterns considering all heterogeneous relationships at the same time.

… suggested that even in complex and heterogeneous multi-relational domains, simple yet appropriate modeling assumptions can lead to better trade-offs between accuracy and scalability.

2. Relationships as translations in the embedding space

TransE is an energy-based model for learning low-dimensional embeddings of entities. In TransE, relationships are represented as translations in the embedding space: if (h, l, t) holds, then the embedding of the tail entity t should be close to the embedding of the head entity h plus some vector that depends on the relationship l. This approach relies on a reduced set of parameters, as it learns only one low-dimensional vector for each entity and each relationship.

The main motivation behind our translation-based parameterization is that hierarchical relationships are extremely common in KBs, and translations are the natural transformations for representing them. Indeed, consider the natural representation of trees (i.e., embeddings of the nodes in dimension 2): the siblings are close to each other and nodes at a given height are organized on the x-axis, while the parent-child relationship corresponds to a translation on the y-axis.

PS: a null translation vector corresponds to an equivalence relationship between entities.

Translation-based Model

Given a training set S of triplets (h, l, t), composed of two entities h, t ∈ E (the set of entities) and a relationship l ∈ L (the set of relationships), the model learns vector embeddings of the entities and the relationships.

Note that for a given entity, its embedding vector is the same when the entity appears as the head or as the tail of a triplet.

We want h + l ≈ t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise.

The TransE algorithm

Dissimilarity measure d

Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1- or the L2-norm (Manhattan or Euclidean distance):
d(h + l, t) = ||h + l − t||₂²
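The energy of a triplet can be sketched as follows (a minimal NumPy sketch; the function name `energy` and the `norm` flag are illustrative, not from the paper):

```python
import numpy as np

def energy(h, l, t, norm="L2"):
    """Energy d(h + l, t) of a triplet: the dissimilarity between
    the translated head h + l and the tail t.

    norm="L1" gives the Manhattan distance, norm="L2" the squared
    Euclidean distance from the formula above.
    """
    diff = h + l - t
    if norm == "L1":
        return np.abs(diff).sum()
    return (diff ** 2).sum()  # squared L2 norm

# Toy check: a triplet that holds exactly has zero energy.
h = np.array([0.6, 0.8])
l = np.array([0.2, -0.3])
t = h + l
print(energy(h, l, t))        # 0.0
print(energy(h, l, t, "L1"))  # 0.0
```

A triplet whose tail drifts away from h + l gets a strictly larger energy, which is what the ranking loss below exploits.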

Corrupted triplets S′(h,l,t)

Either the head or the tail is replaced by a random entity (but not both at the same time):

S′(h,l,t) = {(h′, l, t) | h′ ∈ E} ∪ {(h, l, t′) | t′ ∈ E}.
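Sampling one element of this set can be sketched as follows (the helper name `corrupt` and the toy entity names are illustrative):

```python
import random

def corrupt(triplet, entities):
    """Sample one corrupted triplet from S'(h,l,t): replace either the
    head or the tail (never both) with a random entity."""
    h, l, t = triplet
    if random.random() < 0.5:
        return (random.choice(entities), l, t)  # replace the head
    return (h, l, random.choice(entities))      # replace the tail

entities = ["e1", "e2", "e3"]
print(corrupt(("e1", "r1", "e2"), entities))
```

Note that, as defined, the replacement entity is drawn from all of E, so the sample can coincide with the original triplet; implementations often resample in that case.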

Loss function L

Given a margin hyperparameter γ > 0¹; [x]₊ denotes the positive part of x².

L = Σ_{(h,l,t)∈S} Σ_{(h′,l,t′)∈S′(h,l,t)} [γ + d(h + l, t) − d(h′ + l, t′)]₊

The loss function favors lower values of the energy for training triplets than for corrupted triplets, and is thus a natural implementation of the intended criterion.

The optimization is carried out by stochastic gradient descent³ (in minibatch mode), over the possible h, l, t, with the additional constraint that the L2-norm of the embeddings of the entities is 1⁴ (no regularization or norm constraints are given to the label embeddings l). This prevents the training process from trivially minimizing L by artificially increasing entity embedding norms.
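One SGD update on a (positive, corrupted) triplet pair can be sketched as below. This is a minimal NumPy sketch, not the paper's implementation: the array names E and R, the sizes, and the learning rate are all illustrative, and the gradient is the hand-derived gradient of the squared-L2 margin loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, k, gamma, lr = 5, 2, 4, 1.0, 0.01

E = rng.normal(size=(n_ent, k))                 # entity embeddings
R = rng.normal(size=(n_rel, k))                 # relation embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)   # enforce ||e||_2 = 1

def sgd_step(pos, neg):
    """One SGD update on a positive triplet and its corrupted version,
    minimizing [gamma + d(h+l,t) - d(h'+l,t')]_+ with squared-L2 d."""
    (h, l, t), (h2, _, t2) = pos, neg
    dp = E[h] + R[l] - E[t]       # residual of the positive triplet
    dn = E[h2] + R[l] - E[t2]     # residual of the corrupted triplet
    loss = gamma + (dp ** 2).sum() - (dn ** 2).sum()
    if loss > 0:  # only margin-violating pairs contribute a gradient
        E[h] -= lr * 2 * dp
        E[t] += lr * 2 * dp
        R[l] -= lr * 2 * (dp - dn)
        E[h2] += lr * 2 * dn
        E[t2] -= lr * 2 * dn
        # re-project the touched entity embeddings onto the unit sphere
        for i in {h, t, h2, t2}:
            E[i] /= np.linalg.norm(E[i])
    return max(loss, 0.0)

print(sgd_step((0, 0, 1), (2, 0, 1)))
```

Note that the unit-norm constraint is enforced by projection after each update, matching the "no norm constraint on relation embeddings" remark: only E is renormalized, never R.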


  1. Usually set to 1.
  2. If x is greater than zero, take x itself; if it is less than zero, take 0.
  3. SGD, stochastic gradient descent. Here the parameters θ are updated immediately after the gradient is computed on each minibatch.
  4. The entity embeddings (vectors) are constrained to have Euclidean norm 1, but the relation embeddings are not constrained.
