# Reading notes on "Translating Embeddings for Modeling Multi-relational Data"

## Abstract

We propose TransE, a method that embeds entities and relationships of multi-relational data in a low-dimensional vector space. It significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. It can also be successfully trained on a large-scale data set.

## Introduction

Our work focuses on modeling multi-relational data from knowledge bases (KBs), with the goal of providing an efficient tool to complete them by automatically adding new facts, without requiring extra knowledge.

#### 1. Modeling Multi-relational data

In contrast to single-relational data, the difficulty of multi-relational data is that the notion of locality may involve relationships and entities of different types at the same time, so that modeling multi-relational data requires more generic approaches that can choose the appropriate patterns considering all heterogeneous relationships at the same time.

… suggested that even in complex and heterogeneous multi-relational domains, simple yet appropriate modeling assumptions can lead to better trade-offs between accuracy and scalability.

#### 2. Relationships as translations in the embedding space

TransE is an energy-based model for learning low-dimensional embeddings of entities. In TransE, relationships are represented as translations in the embedding space: if $(h,l,t)$ holds, then the embedding of the tail entity $t$ should be close to the embedding of the head entity $h$ plus some vector that depends on the relationship $l$. This approach relies on a reduced set of parameters, as it learns only one low-dimensional vector for each entity and each relationship.

The main motivation behind our translation-based parameterization is that hierarchical relationships are extremely common in KBs and translations are the natural transformations for representing them. Indeed, considering the natural representation of trees (i.e. embeddings of the nodes in dimension 2), the siblings are close to each other and nodes at a given height are organized on the x-axis, while the parent-child relationship corresponds to a translation on the y-axis.

Note: a null translation vector corresponds to an equivalence relationship between entities.

## Translation-based Model

Given a training set $S$ of triplets $(h,l,t)$ composed of two entities $h,t \in E$ (the set of entities) and a relationship $l \in L$ (the set of relationships), the model learns vector embeddings of the entities and the relationships.

Note that for a given entity, its embedding vector is the same when the entity appears as the head or as the tail of a triplet.

We want $h+l \approx t$ when $(h,l,t)$ holds ($t$ should be a nearest neighbor of $h+l$), while $h+l$ should be far away from $t$ otherwise.

###### measure d

Following an energy-based framework, the energy of a triplet is equal to $d(h+l,t)$ for some dissimilarity measure $d$, which we take to be either the $L_1$ or the $L_2$-norm (Manhattan or Euclidean distance).
With the squared $L_2$ distance, this is $d(h+l,t) = \|h+l-t\|_2^2$.
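As a minimal sketch of this dissimilarity measure, the energy of a triplet can be computed in plain Python. The function name `energy`, the `norm` argument, and the toy 2-dimensional vectors are assumptions made for illustration and do not come from the paper.

```python
def energy(h, l, t, norm="l2"):
    """Dissimilarity d(h + l, t) between the translated head and the tail.

    h, l, t are equal-length lists of floats (hypothetical toy embeddings).
    norm="l1" gives the Manhattan distance; anything else gives the
    squared Euclidean distance.
    """
    diff = [hi + li - ti for hi, li, ti in zip(h, l, t)]
    if norm == "l1":
        return sum(abs(x) for x in diff)
    return sum(x * x for x in diff)


h = [1.0, 0.0]
l = [0.0, 1.0]
print(energy(h, l, [1.0, 1.0]))             # h + l lands exactly on t: 0.0
print(energy(h, l, [3.0, 1.0], norm="l1"))  # tail two units away: 2.0
print(energy(h, l, [3.0, 1.0]))             # squared L2 for the same tail: 4.0
```

A triplet that holds should get a low energy, and a triplet that does not hold should get a high one.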

###### corrupted triplets $S'_{(h,l,t)}$

A corrupted triplet is a training triplet with either the head or the tail replaced by a random entity (but not both at the same time):

$S'_{(h,l,t)} = \{(h',l,t) \mid h' \in E\} \cup \{(h,l,t') \mid t' \in E\}$.
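Sampling one element of this set could look like the sketch below. The function name `corrupt`, the 50/50 choice between head and tail, and the toy entity names are assumptions. Because the set definition lets the replacement range over all of $E$, the sampled entity may equal the original one, and this sketch does not filter that case out.

```python
import random


def corrupt(triplet, entities, rng=random):
    """Return a corrupted copy of (h, l, t).

    Either the head or the tail is replaced by a random entity, never both.
    The relationship l is always kept.
    """
    h, l, t = triplet
    if rng.random() < 0.5:
        return (rng.choice(entities), l, t)  # replace the head
    return (h, l, rng.choice(entities))      # replace the tail


# Hypothetical toy data, for illustration only.
entities = ["paris", "france", "berlin", "germany"]
rng = random.Random(0)  # seeded so that runs are repeatable
print(corrupt(("paris", "capital_of", "france"), entities, rng))
```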

###### loss function $L$

In the loss below, $\gamma > 0$ is a margin hyperparameter (note 1) and $[x]_+$ denotes the positive part of $x$ (note 2).

$L = \sum_{(h,l,t) \in S} \sum_{(h',l,t') \in S'_{(h,l,t)}} [\gamma + d(h+l,t) - d(h'+l,t')]_+$

The loss function favors lower values of the energy for training triplets than for corrupted triplets, and is thus a natural implementation of the intended criterion.
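The term inside the double sum is a hinge on the energy gap between one training triplet and one corrupted triplet. A small sketch of that single term is given here; the function name `margin_loss` is an assumption, and the default margin of 1.0 follows the note that $\gamma$ is usually set to 1.

```python
def margin_loss(pos_energy, neg_energy, gamma=1.0):
    """Positive part of gamma + d(h+l, t) - d(h'+l, t') for one pair."""
    return max(0.0, gamma + pos_energy - neg_energy)


# The corrupted triplet is already more than gamma worse: nothing to learn.
print(margin_loss(0.5, 2.0))  # 0.0
# The corrupted triplet is too close to the training one: positive loss.
print(margin_loss(0.5, 1.0))  # 0.5
```

Summing this term over all training triplets and their corrupted counterparts gives $L$.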

The optimization is carried out by stochastic gradient descent (note 3), in minibatch mode, over the possible $h$, $l$, $t$, with the additional constraint that the $L_2$-norm of the embeddings of the entities is 1 (note 4); no regularization or norm constraints are applied to the label embeddings $l$. This constraint prevents the training process from trivially minimizing $L$ by artificially increasing entity embedding norms.

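1. Usually set to 1.
2. If the value is greater than zero, it is kept unchanged; if it is less than zero, the result is 0.
3. SGD: stochastic gradient descent. Here the parameters (theta) are updated immediately after the gradient is computed on one batch.
4. The Euclidean ($L_2$) norm of each entity embedding vector is constrained to 1; the relationship embeddings are not constrained.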
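The optimization step described above can be sketched for a single pair of one training triplet and one corrupted triplet. This is a simplified illustration under stated assumptions, not the paper's exact procedure: it uses the squared $L_2$ energy, processes one pair instead of a minibatch, renormalizes only the entities involved before computing the loss, and uses a learning rate of 0.01. The names `normalize` and `sgd_step` and the toy embeddings are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit L2 norm (the zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def sgd_step(ent, rel, pos, neg, gamma=1.0, lr=0.01):
    """One SGD update on a (training, corrupted) pair, squared L2 energy.

    ent and rel map names to lists of floats and are updated in place.
    Returns the hinge loss of the pair before the update.
    """
    (h, l, t), (h2, _, t2) = pos, neg
    # Constrain the entity embeddings involved to unit L2 norm;
    # relationship embeddings are left unconstrained.
    for e in {h, t, h2, t2}:
        ent[e] = normalize(ent[e])
    dim = len(rel[l])
    pos_diff = [ent[h][i] + rel[l][i] - ent[t][i] for i in range(dim)]
    neg_diff = [ent[h2][i] + rel[l][i] - ent[t2][i] for i in range(dim)]
    loss = gamma + sum(x * x for x in pos_diff) - sum(x * x for x in neg_diff)
    if loss <= 0:
        return 0.0  # margin already satisfied, so the gradient is zero
    # Gradients come from the differences computed above, so an entity that
    # appears in both triplets simply accumulates both contributions.
    for i in range(dim):
        gp, gn = 2 * pos_diff[i], 2 * neg_diff[i]
        ent[h][i] -= lr * gp
        ent[t][i] += lr * gp
        rel[l][i] -= lr * (gp - gn)
        ent[h2][i] += lr * gn
        ent[t2][i] -= lr * gn
    return loss


# Hypothetical toy embeddings: the tail of the training triplet starts far away.
ent = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
rel = {"r": [0.0, 0.0]}
for step in range(5):
    print(step, sgd_step(ent, rel, ("a", "r", "c"), ("a", "r", "b")))
```

Repeating the step on the same pair should drive the printed loss down, since the relationship vector moves so that `a + r` gets closer to `c` than to `b`.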
