- Understanding the terms
- "translation" should be read as a geometric shift: the head vector, translated by the relation vector (i.e., vector addition), should end up close to the tail vector. The loss then measures this "distance" under a chosen norm (L1 or L2)
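This translation principle can be sketched as a toy scoring function (a numpy sketch with illustrative names and dimensions, not the paper's code):

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """Distance d(h + r, t) under the chosen norm (1 for L1, 2 for L2).

    A plausible triplet should score low: the translated head lands near the tail.
    """
    diff = h + r - t
    if norm == 1:
        return np.abs(diff).sum()
    return np.sqrt((diff ** 2).sum())

h = np.array([1.0, 2.0])
r = np.array([3.0, 1.0])
t = np.array([4.0, 3.0])
print(transe_score(h, r, t))  # 0.0: h + r lands exactly on t
```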
- Understanding the model
- Selling points: few parameters (simplicity) and easy to train
- easy to train, contains a reduced number of parameters and can scale up to very large databases
- greater expressiveness seems to be more synonymous to underfitting than to better performance
- Challenge: heterogeneity of relational data
- difficulty of relational data is that the notion of locality may involve relationships and entities of different types at the same time
- Algorithm:
- 1. Initialization: assign each entity and relation a 100-dimensional vector drawn uniformly at random
- 2. Sampling: each training step draws a minibatch; nbatch = 400 (the training set is split into 400 batches)
- Dataset statistics: 14951 entities, 1345 relations, 483142 triples
- batch size = 483142 / 400 ≈ 1207
- 3. Corruption: for every (h, r, t) triplet in the batch, randomly replace either the head or the tail (50% probability each, never both) with a random entity
- 4. Update: compute the hinge loss over the batch and take a gradient step (one round of SGD)
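Step 3 (corrupting either the head or the tail, never both) might look like the following sketch; the `corrupt` helper is made up, and the (h, t, r) triple layout matches the training file format:

```python
import random

def corrupt(triple, entity_ids):
    """Replace the head OR the tail (never both) with a random entity."""
    h, t, r = triple
    if random.random() < 0.5:
        h = random.choice(entity_ids)  # corrupt the head
    else:
        t = random.choice(entity_ids)  # corrupt the tail
    return (h, t, r)

entity_ids = list(range(10))
corrupted = corrupt((0, 1, 2), entity_ids)
# At most one of head/tail differs; the relation never changes
assert corrupted[2] == 2
```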
- Other models
- SE [3] embeds entities into R^k, and relationships into two matrices L1 ∈ R^(k×k) and L2 ∈ R^(k×k) such that d(L1·h, L2·t) is large for corrupted triplets (h, l, t) (and small otherwise).
- SE projects the head and tail through two relation-specific matrices and compares the two projected vectors; TransE adds a single relation vector to the head and compares vectors directly, which needs far fewer parameters per relation (k versus 2k^2)
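The contrast is easy to see in a toy numpy sketch (random toy dimensions; not either model's real implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # toy embedding dimension
h, t, r = rng.random(k), rng.random(k), rng.random(k)
M1, M2 = rng.random((k, k)), rng.random((k, k))  # SE's two relation matrices

# SE: project head and tail through two relation-specific matrices,
# then compare the two projected vectors
se_dist = np.abs(M1 @ h - M2 @ t).sum()

# TransE: add a single relation vector to the head, compare to the tail
transe_dist = np.abs(h + r - t).sum()

print(se_dist >= 0 and transe_dist >= 0)  # True: both are valid distances
```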
- Experiments
- Datasets
- WordNet / Freebase
- Freebase15k: sample file formats
entity2id (entity → integer id):
/m/06rf7	0
/m/0c94fn	1
relation2id (relation → integer id):
/sports/sports_team/roster./soccer/football_roster_position/player	8
/business/company_type/companies_of_this_type	9
train, one triple per line in (h, t, r) order:
/m/07s9rl0	/m/0170z3	/media_common/netflix_genre/titles
- Metrics:
- 1. MEAN RANK: the average rank of the correct (positive) entity
- 2. HITS@10 (%): the proportion of correct entities ranked in the top 10
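Both metrics are simple functions of the rank each correct entity gets among all candidate completions; a minimal sketch with made-up ranks:

```python
def mean_rank(ranks):
    """Average rank of the correct entity over all test triplets."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Percentage of test triplets whose correct entity ranks in the top k."""
    return 100.0 * sum(1 for rank in ranks if rank <= k) / len(ranks)

ranks = [1, 3, 50, 7, 200]     # hypothetical ranks of the correct entity
print(mean_rank(ranks))        # 52.2
print(hits_at_k(ranks))        # 60.0 (3 of the 5 ranks are within the top 10)
```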
- Results: TransE outperforms the compared models
- Details:
- filter: remove corrupted triplets that are in fact positive (i.e., the randomly built triplet already exists in the dataset)
- Why grad_pos and grad_neg are set to ±1: under the L1 distance the gradient of |x| is sign(x), so each coordinate of the gradient is replaced by its sign (±1)
    for triple, corrupted_triple in Tbatch:
        # Accumulate updates on the copied vectors
        h_correct_update = copy_entity[triple[0]]
        t_correct_update = copy_entity[triple[1]]
        relation_update = copy_relation[triple[2]]
        h_corrupt_update = copy_entity[corrupted_triple[0]]
        t_corrupt_update = copy_entity[corrupted_triple[1]]

        # Compute gradients from the original (pre-update) vectors
        h_correct = self.entity[triple[0]]
        t_correct = self.entity[triple[1]]
        relation = self.relation[triple[2]]
        h_corrupt = self.entity[corrupted_triple[0]]
        t_corrupt = self.entity[corrupted_triple[1]]

        if self.L1:
            dist_correct = distanceL1(h_correct, relation, t_correct)
            dist_corrupt = distanceL1(h_corrupt, relation, t_corrupt)
        else:
            dist_correct = distanceL2(h_correct, relation, t_correct)
            dist_corrupt = distanceL2(h_corrupt, relation, t_corrupt)

        err = self.hinge_loss(dist_correct, dist_corrupt)
        if err > 0:
            self.loss += err
            # Gradient of the squared L2 distance w.r.t. (h + r - t)
            grad_pos = 2 * (h_correct + relation - t_correct)
            grad_neg = 2 * (h_corrupt + relation - t_corrupt)
            if self.L1:
                # L1 distance: the gradient is the elementwise sign
                for i in range(len(grad_pos)):
                    grad_pos[i] = 1 if grad_pos[i] > 0 else -1
                for i in range(len(grad_neg)):
                    grad_neg[i] = 1 if grad_neg[i] > 0 else -1

            # Correct triplet: h has coefficient +1 in (h + r - t), so it moves
            # against the gradient; t has coefficient -1, so it moves with it
            h_correct_update -= self.learning_rate * grad_pos
            t_correct_update += self.learning_rate * grad_pos

            # The corrupted term enters the loss with a minus sign,
            # so its updates are reversed relative to the correct triplet
            if triple[0] == corrupted_triple[0]:
                # The tail was replaced, so the shared head is updated twice
                h_correct_update += self.learning_rate * grad_neg
                t_corrupt_update -= self.learning_rate * grad_neg
            elif triple[1] == corrupted_triple[1]:
                # The head was replaced, so the shared tail is updated twice
                h_corrupt_update += self.learning_rate * grad_neg
                t_correct_update -= self.learning_rate * grad_neg

            # The relation appears in both terms, so it is updated twice
            relation_update -= self.learning_rate * grad_pos
            relation_update += self.learning_rate * grad_neg
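The snippet calls distanceL1, distanceL2, and self.hinge_loss, which are not shown. Plausible standalone definitions consistent with the gradients used in the loop (note distanceL2 returns the *squared* L2 distance so that 2 * (h + r - t) is its exact gradient; the margin value here is an assumption):

```python
import numpy as np

def distanceL1(h, r, t):
    """L1 distance ||h + r - t||_1."""
    return np.abs(h + r - t).sum()

def distanceL2(h, r, t):
    """Squared L2 distance ||h + r - t||^2; squared so that the
    gradient 2 * (h + r - t) used in the update loop is exact."""
    return ((h + r - t) ** 2).sum()

def hinge_loss(dist_correct, dist_corrupt, margin=1.0):
    """Margin-based ranking loss; zero once the corrupted triplet
    is at least `margin` farther away than the correct one."""
    return max(0.0, dist_correct - dist_corrupt + margin)

print(hinge_loss(0.5, 3.0))  # 0.0: corrupted triplet already far enough
print(hinge_loss(2.0, 1.0))  # 2.0
```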