Reading notes on "Translating Embeddings for Modeling Multi-relational Data"
We propose TransE, which embeds the entities and relationships of multi-relational data in a low-dimensional vector space. It significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. It can also be successfully trained on a large-scale data set.
Our work focuses on modeling multi-relational data from knowledge bases (KBs), with the goal of providing an efficient tool to complete them by automatically adding new facts, without requiring extra knowledge.
1. Modeling Multi-relational data
In contrast to single-relational data, the difficulty of multi-relational data is that the notion of locality may involve relationships and entities of different types at the same time, so that modeling multi-relational data requires more generic approaches that can choose the appropriate patterns considering all heterogeneous relationships at the same time.
… suggested that even in complex and heterogeneous multi-relational domains, simple yet appropriate modeling assumptions can lead to better trade-offs between accuracy and scalability.
2. Relationships as translations in the embedding space
TransE is an energy-based model for learning low-dimensional embeddings of entities. In TransE, relationships are represented as translations in the embedding space: if $(h, \ell, t)$ holds, then the embedding of the tail entity $t$ should be close to the embedding of the head entity $h$ plus some vector that depends on the relationship $\ell$. This approach relies on a reduced set of parameters as it learns only one low-dimensional vector for each entity and each relationship.
The main motivation behind our translation-based parameterization is that hierarchical relationships are extremely common in KBs and translations are the natural transformations for representing them. Indeed, considering the natural representation of trees (i.e. embeddings of the nodes in dimension 2), the siblings are close to each other and nodes at a given height are organized on the x-axis, while the parent-child relationship corresponds to a translation on the y-axis.
ps: a null translation vector corresponds to an equivalence relationship between entities.
Given a training set $S$ of triplets $(h, \ell, t)$ composed of two entities $h, t \in E$ (the set of entities) and a relationship $\ell \in L$ (the set of relationships), the model learns vector embeddings of the entities and the relationships.
Note that for a given entity, its embedding vector is the same when the entity appears as the head or as the tail of a triplet.
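We want $\mathbf{h} + \boldsymbol{\ell} \approx \mathbf{t}$ when $(h, \ell, t)$ holds ($\mathbf{t}$ should be a nearest neighbor of $\mathbf{h} + \boldsymbol{\ell}$), while $\mathbf{h} + \boldsymbol{\ell}$ should be far away from $\mathbf{t}$ otherwise.
Following an energy-based framework, the energy of a triplet is equal to $d(\mathbf{h} + \boldsymbol{\ell}, \mathbf{t})$ for some dissimilarity measure $d$, which we take to be either the $L_1$- or the $L_2$-norm (Manhattan or Euclidean distance).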
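To make the energy concrete, here is a minimal sketch in plain Python. The function name `energy`, the `norm` parameter, and the toy 2-dimensional vectors are illustrative assumptions, not something taken from the paper.

```python
import math

def energy(h, l, t, norm=1):
    """Dissimilarity d(h + l, t) under the L1 (norm=1) or L2 (norm=2) norm."""
    # Translate the head embedding by the relationship vector, then compare with the tail.
    diff = [hi + li - ti for hi, li, ti in zip(h, l, t)]
    if norm == 1:
        return sum(abs(x) for x in diff)
    if norm == 2:
        return math.sqrt(sum(x * x for x in diff))
    raise ValueError("norm must be 1 or 2")

# Made-up 2-d embeddings: the relationship is a unit translation along the y-axis.
h = [0.0, 0.0]
l = [0.0, 1.0]
t_good = [0.0, 1.0]   # exactly h + l, so the energy is zero
t_bad = [3.0, 5.0]    # far from h + l, so the energy is large

print(energy(h, l, t_good))         # 0.0
print(energy(h, l, t_bad, norm=1))  # 7.0
print(energy(h, l, t_bad, norm=2))  # 5.0
```

A triplet that holds should get a low energy, and an implausible one a high energy.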
Corrupted triplets are training triplets with either the head or the tail replaced by a random entity (but not both at the same time).
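The corruption step can be sketched as follows. The helper name `corrupt`, the 50/50 choice between head and tail, and the uniform sampling over entities are assumptions made for illustration; this sketch also does not check whether the sampled entity happens to reproduce the original triplet.

```python
import random

def corrupt(triplet, entities, rng=random):
    """Replace either the head or the tail (never both) with a random entity."""
    h, l, t = triplet
    if rng.random() < 0.5:
        # Corrupt the head, keep the relationship and the tail.
        return (rng.choice(entities), l, t)
    # Corrupt the tail, keep the head and the relationship.
    return (h, l, rng.choice(entities))

# Hypothetical toy entities and triplet.
entities = ["paris", "france", "berlin", "germany"]
print(corrupt(("paris", "capital_of", "france"), entities))
```

The relationship is never changed, and at least one of the two entities is always left as it was.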
The loss function favors lower values of the energy for training triplets than for corrupted triplets, and is thus a natural implementation of the intended criterion.
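These notes do not reproduce the loss itself. In the TransE paper it is a margin-based ranking criterion, $\mathcal{L} = \sum [\gamma + d(\mathbf{h} + \boldsymbol{\ell}, \mathbf{t}) - d(\mathbf{h'} + \boldsymbol{\ell}, \mathbf{t'})]_+$, summed over training triplets and their corrupted counterparts, where $\gamma > 0$ is the margin and $[x]_+$ is the positive part of $x$. The per-pair term can be sketched as below; the function name `margin_loss` and the default margin of 1.0 are illustrative assumptions.

```python
def margin_loss(pos_energy, neg_energy, margin=1.0):
    """Hinge term [margin + d(pos) - d(neg)]_+ for one (training, corrupted) pair."""
    # Zero once the corrupted triplet's energy exceeds the training triplet's by the margin.
    return max(0.0, margin + pos_energy - neg_energy)

# Corrupted triplet is already far enough away: no loss.
print(margin_loss(0.0, 7.0))   # 0.0
# Corrupted triplet is only 0.5 higher in energy: the margin is violated by 0.5.
print(margin_loss(2.0, 2.5))   # 0.5
```

Only pairs that violate the margin contribute, so only those pairs produce gradient updates.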
The optimization is carried out by stochastic gradient descent (in minibatch mode), over the possible $\mathbf{h}$, $\boldsymbol{\ell}$ and $\mathbf{t}$, with the additional constraint that the $L_2$-norm of the entity embeddings is 1 (no regularization or norm constraints are given to the label embeddings $\boldsymbol{\ell}$). This constraint prevents the training process from trivially minimizing the loss $\mathcal{L}$ by artificially increasing the norms of the entity embeddings.
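One simple way to enforce the norm constraint is to rescale each entity embedding to unit $L_2$ length before it is used in an update. The sketch below shows only that rescaling step; the helper name `normalize` and the handling of the zero vector are assumptions, and the gradient step itself is omitted.

```python
import math

def normalize(v):
    """Rescale an entity embedding to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]

# Made-up entity embedding with L2 norm 5.
entity = [3.0, 4.0]
unit = normalize(entity)
print(unit)  # a vector of L2 norm 1 pointing in the same direction
```

Relationship embeddings are left unconstrained, so only entity vectors would go through this step.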