TransE: Algorithm Principles and Code Walkthrough (2021-06-22)

TransE: Algorithm Principles and a Worked Example

TransE

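Knowledge graph basics

Triples (h, r, t): head entity, relation, tail entity

Knowledge representation

Entities and relations are represented as vectors (embeddings).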

Algorithm description

Idea: the embeddings of a correct triple should satisfy h + r = t (approximately).
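
As a toy illustration of this idea, the sketch below uses hand-picked 2-dimensional vectors (made up for the example, not learned embeddings). The helper name `l1_distance` is hypothetical. A correct triple gives a distance near zero between h + r and t, while a corrupted tail gives a larger one.

```python
import numpy as np

# Hypothetical 2-d embeddings, chosen by hand for illustration only.
h = np.array([0.2, 0.1])         # head entity
r = np.array([0.3, 0.4])         # relation
t_true = np.array([0.5, 0.5])    # tail of the correct triple: h + r equals t_true
t_wrong = np.array([-0.4, 0.9])  # tail of a corrupted triple


def l1_distance(h, r, t):
    """L1 distance between h + r and t."""
    return float(np.abs(h + r - t).sum())


d_true = l1_distance(h, r, t_true)
d_wrong = l1_distance(h, r, t_wrong)
print(d_true, d_wrong)
```

Training aims to push real embeddings towards this situation: small distances for triples in the graph, larger ones for corrupted triples.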

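Define d as the distance between vectors, usually the L1 or L2 norm. The distance for a correct triple should be as small as possible, and the distance for an incorrect triple as large as possible. This gives the objective function:

$$L = \sum_{(h,r,t) \in S} \; \sum_{(h',r,t') \in S'_{(h,r,t)}} \left[ \gamma + d(h + r, t) - d(h' + r, t') \right]_+$$

Here S is the set of correct triples, S' is the set of corrupted triples obtained by replacing the head or the tail of a correct triple, γ > 0 is the margin, and [x]_+ = max(0, x).
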
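A minimal sketch of one term of this objective, for a single (correct, corrupted) pair. The function names `distance` and `margin_loss` and the example vectors are assumptions made for this illustration; they are not the helpers used in the listing further down.

```python
import numpy as np


def distance(h, r, t, l1=True):
    """Distance between h + r and t under the L1 or L2 norm."""
    return float(np.linalg.norm(h + r - t, ord=1 if l1 else 2))


def margin_loss(pos, neg, margin=1.0, l1=True):
    """Hinge term [margin + d(pos) - d(neg)]_+ for one pair.

    pos and neg are (h, r, t) tuples of NumPy vectors.
    """
    return max(0.0, margin + distance(*pos, l1=l1) - distance(*neg, l1=l1))


h = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])          # correct tail: distance 0

far_tail = np.array([0.0, 0.5])   # corrupted tail at L1 distance 1.5
near_tail = np.array([1.0, 0.5])  # corrupted tail at L1 distance 0.5

loss_far = margin_loss((h, r, t), (h, r, far_tail))
loss_near = margin_loss((h, r, t), (h, r, near_tail))
print(loss_far, loss_near)
```

A corrupted triple that is already farther away than the margin contributes nothing to the loss; only corrupted triples that sit too close to the correct one produce a positive term and therefore an update.
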

Gradient derivation
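
The derivation below is a reconstruction under stated assumptions, not the original figure: it takes one training pair whose hinge term is positive, assumes d is the squared L2 distance (which matches the factor 2 in the code's update step), and writes η for the learning rate.

```latex
% One (correct, corrupted) pair with a positive hinge term.
% Assumption: d is the squared L2 distance; \eta is the learning rate.
\begin{aligned}
\ell &= \gamma + \lVert h + r - t \rVert_2^2 - \lVert h' + r - t' \rVert_2^2 \\
\frac{\partial \ell}{\partial h} &= 2\,(h + r - t), \qquad
\frac{\partial \ell}{\partial t} = -2\,(h + r - t) \\
\frac{\partial \ell}{\partial h'} &= -2\,(h' + r - t'), \qquad
\frac{\partial \ell}{\partial t'} = 2\,(h' + r - t') \\
\frac{\partial \ell}{\partial r} &= 2\,(h + r - t) - 2\,(h' + r - t') \\
% Gradient descent: x \leftarrow x - \eta \, \partial \ell / \partial x
h &\leftarrow h + 2\eta\,(t - h - r), \qquad
t \leftarrow t - 2\eta\,(t - h - r) \\
h' &\leftarrow h' - 2\eta\,(t' - h' - r), \qquad
t' \leftarrow t' + 2\eta\,(t' - h' - r) \\
r &\leftarrow r + 2\eta\,(t - h - r) - 2\eta\,(t' - h' - r)
\end{aligned}
```

For the L1 distance, each factor 2(·) is replaced by the componentwise sign of the same expression. When the hinge term is not positive, the gradient is zero and nothing is updated.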

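Code analysis

  • Class definition:
Parameters:
margin: the constant in the objective function
learingRate: the learning rate (spelled this way in the code)
dim: the vector dimension
entityList: the entity list (read from a text file: entity + id)
relationList: the relation list (read from a text file: relation + id)
tripleList: the triple list (read from a text file: entity + entity + relation)
loss: the loss value

L1: the distance measure (L1 distance when true, L2 otherwise)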

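  • Vector initialization

Set the dimension and the value range used for initialization (the range given in the TransE paper).
Functions involved:

    init: generates a random value
    norm: normalizes a vector
  • Training the vectors
    getSample: randomly selects a subset of the triples, Sbatch
    getCorruptedTriplet(sbatch): randomly replaces an entity of the triple; either h or t is replaced, but never both.
    update: updates the vectors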
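
The initialization step listed above (a random draw followed by normalization) can be sketched in vectorized form as follows. This is an illustration only: `init_vector` and `normalize` are hypothetical names, not the article's `init` and `norm`, and the range [-6/sqrt(dim), 6/sqrt(dim)] is the one proposed in the TransE paper.

```python
import numpy as np


def init_vector(dim, rng):
    """Draw a dim-dimensional vector uniformly from [-6/sqrt(dim), 6/sqrt(dim))."""
    bound = 6.0 / np.sqrt(dim)
    return rng.uniform(-bound, bound, size=dim)


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    return vec / np.linalg.norm(vec)


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
raw = init_vector(10, rng)
v = normalize(raw)
print(np.linalg.norm(v))
```

Normalizing keeps every vector on the unit sphere, which stops training from lowering the loss simply by growing the vector norms.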

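Derivation of the vector update for the L2 distance: the step applied to the head vector in the code, 2 · learingRate · (t − h − r), is the negative gradient of ‖h + r − t‖² with respect to h, scaled by the learning rate; the tail vector moves by the opposite step.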
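
As a hedged numerical check of the L2 update used in the code (a step of 2 · learning rate · (t − h − r)), the sketch below assumes a squared L2 distance. It compares the analytic gradient with a finite-difference estimate and confirms that one step on h and t shrinks the distance. The function name `sq_l2` and the random test vectors are assumptions made for this example.

```python
import numpy as np


def sq_l2(h, r, t):
    """Squared L2 distance between h + r and t."""
    diff = h + r - t
    return float(diff @ diff)


rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 5))  # three random 5-d vectors

# Analytic gradient with respect to h versus a central finite difference.
analytic = 2 * (h + r - t)
eps = 1e-6
numeric = np.array([
    (sq_l2(h + eps * e, r, t) - sq_l2(h - eps * e, r, t)) / (2 * eps)
    for e in np.eye(5)
])

# One gradient-descent step on the head and the tail.
lr = 0.01
step = 2 * lr * (t - h - r)
before = sq_l2(h, r, t)
after = sq_l2(h + step, r, t - step)
print(before, after)
```

Moving h by `step` and t by `-step` scales h + r − t by (1 − 4 · lr), so the squared distance shrinks by the factor (1 − 4 · lr)².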

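Python functions
uniform(a, b)  # returns a random float between a and b; whether b itself can be returned depends on floating-point rounding
Vector norm: var = linalg.norm(list)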
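
A short usage sketch of these two calls; the example values are arbitrary.

```python
from random import uniform
from numpy import array, linalg

x = uniform(-1, 1)                       # random float between -1 and 1
length = linalg.norm(array([3.0, 4.0]))  # Euclidean norm: sqrt(3^2 + 4^2)
print(x, length)
```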

"""
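@version: 3.7
@author: jiayalu
@file: trainTransE.py
@time: 22/08/2019 10:56
@description: Trains vectors for the entities and relations of a knowledge graph with the TransE algorithm.
Data: triples,
entity ids and relation ids.
Output: two text files, entityVector.txt and relationVector.txt, with lines of the form: entity    [vector]

"""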
from random import uniform, sample
from numpy import *
from copy import deepcopy

class TransE:
    def __init__(self, entityList, relationList, tripleList, margin = 1, learingRate = 0.00001, dim = 10, L1 = True):
        self.margin = margin
        self.learingRate = learingRate
        self.dim = dim  # vector dimension
        self.entityList = entityList  # starts as a list of entities; after initialize() it is a dict mapping entity -> vector (numpy ndarray)
        self.relationList = relationList  # same as above, for relations
        self.tripleList = tripleList  # list of (head, tail, relation) triples; stays a list
        self.loss = 0
        self.L1 = L1

    def initialize(self):
        '''
        Initialize the vectors.
        '''
        entityVectorList = {}
        relationVectorList = {}
        for entity in self.entityList:
            # draw dim components, each from the initialization range
            entityVector = [init(self.dim) for _ in range(self.dim)]
            entityVectorList[entity] = norm(entityVector)  # normalize
        print("entityVector initialized, count: %d" % len(entityVectorList))
        for relation in self.relationList:
            # draw dim components, each from the initialization range
            relationVector = [init(self.dim) for _ in range(self.dim)]
            relationVectorList[relation] = norm(relationVector)  # normalize
        print("relationVectorList initialized, count: %d" % len(relationVectorList))
        self.entityList = entityVectorList
        self.relationList = relationVectorList

    def transE(self, cI = 20):
        print("Training started")
        for cycleIndex in range(cI):
            Sbatch = self.getSample(3)
            Tbatch = []  # list of (original triple, corrupted triple) pairs: {((h,r,t),(h',r,t'))}
            for sbatch in Sbatch:
                tripletWithCorruptedTriplet = (sbatch, self.getCorruptedTriplet(sbatch))
                # print(tripletWithCorruptedTriplet)
                if tripletWithCorruptedTriplet not in Tbatch:
                    Tbatch.append(tripletWithCorruptedTriplet)
            self.update(Tbatch)
            if cycleIndex % 100 == 0:
                print("Iteration %d" % cycleIndex)
                print(self.loss)
                # raw strings keep the Windows backslashes literal
                self.writeRelationVector(r"E:\pythoncode\knownlageGraph\transE-master\relationVector.txt")
                self.writeEntilyVector(r"E:\pythoncode\knownlageGraph\transE-master\entityVector.txt")
                self.loss = 0

    def getSample(self, size):
        return sample(self.tripleList, size)

    def getCorruptedTriplet(self, triplet):
        '''
        training triplets with either the head or tail replaced by a random entity (but not both at the same time)
        :param triplet:
        :return corruptedTriplet:
        '''
        i = uniform(-1, 1)
        # random.sample needs a sequence (dict views are rejected since Python 3.11)
        entities = list(self.entityList.keys())
        if i < 0:  # i < 0: corrupt the first item of the triple (the head)
            while True:
                entityTemp = sample(entities, 1)[0]
                if entityTemp != triplet[0]:
                    break
            corruptedTriplet = (entityTemp, triplet[1], triplet[2])
        else:  # i >= 0: corrupt the second item of the triple (the tail)
            while True:
                entityTemp = sample(entities, 1)[0]
                if entityTemp != triplet[1]:
                    break
            corruptedTriplet = (triplet[0], entityTemp, triplet[2])
        return corruptedTriplet

    def update(self, Tbatch):
        copyEntityList = deepcopy(self.entityList)
        copyRelationList = deepcopy(self.relationList)

        for tripletWithCorruptedTriplet in Tbatch:
            # tripletWithCorruptedTriplet is a tuple (original triple, corrupted triple);
            # each triple is stored as (head, tail, relation)
            head, tail, relation = tripletWithCorruptedTriplet[0]
            headCorrupted = tripletWithCorruptedTriplet[1][0]
            tailCorrupted = tripletWithCorruptedTriplet[1][1]

            # distances and gradients are computed from the vectors as they were before this batch
            headEntityVectorBeforeBatch = self.entityList[head]
            tailEntityVectorBeforeBatch = self.entityList[tail]
            relationVectorBeforeBatch = self.relationList[relation]
            headEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[headCorrupted]
            tailEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tailCorrupted]

            distance = distanceL1 if self.L1 else distanceL2
            distTriplet = distance(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch,
                                   relationVectorBeforeBatch)
            distCorruptedTriplet = distance(headEntityVectorWithCorruptedTripletBeforeBatch,
                                            tailEntityVectorWithCorruptedTripletBeforeBatch,
                                            relationVectorBeforeBatch)
            eg = self.margin + distTriplet - distCorruptedTriplet
            if eg > 0:  # [x]+ keeps only the positive part
                self.loss += eg
                diffPositive = tailEntityVectorBeforeBatch - headEntityVectorBeforeBatch - relationVectorBeforeBatch
                diffNegtative = (tailEntityVectorWithCorruptedTripletBeforeBatch
                                 - headEntityVectorWithCorruptedTripletBeforeBatch - relationVectorBeforeBatch)
                if self.L1:
                    # L1: the (sub)gradient is the componentwise sign, scaled by the learning rate
                    tempPositive = self.learingRate * where(diffPositive >= 0, 1, -1)
                    tempNegtative = self.learingRate * where(diffNegtative >= 0, 1, -1)
                else:
                    # gradient derived from the loss function
                    tempPositive = 2 * self.learingRate * diffPositive
                    tempNegtative = 2 * self.learingRate * diffNegtative

                # update the vectors one after another, so that an entity shared by the
                # original and the corrupted triple receives both updates
                copyEntityList[head] = copyEntityList[head] + tempPositive
                copyEntityList[tail] = copyEntityList[tail] - tempPositive
                copyRelationList[relation] = copyRelationList[relation] + tempPositive - tempNegtative
                copyEntityList[headCorrupted] = copyEntityList[headCorrupted] - tempNegtative
                copyEntityList[tailCorrupted] = copyEntityList[tailCorrupted] + tempNegtative

                # normalize only the vectors just updated, rather than all of them at once as in the original paper
                for entity in (head, tail, headCorrupted, tailCorrupted):
                    copyEntityList[entity] = norm(copyEntityList[entity])
                copyRelationList[relation] = norm(copyRelationList[relation])

        self.entityList = copyEntityList
        self.relationList = copyRelationList

    def writeEntilyVector(self, dir):
        print("Writing entity vectors")
        # one line per entity: name, four spaces, then the vector as a list
        with open(dir, 'w', encoding="utf-8") as entityVectorFile:
            for entity in self.entityList.keys():
                entityVectorFile.write(entity + "    ")
                entityVectorFile.write(str(self.entityList[entity].tolist()))
                entityVectorFile.write("\n")

def writeRelationVector(self, dir):
    # Write one relation per line: name, four spaces, then the vector as a list.
    print("Writing relation vectors")
    with open(dir, 'w', encoding="utf-8") as relationVectorFile:
        for relation in self.relationList.keys():
            relationVectorFile.write(relation + "    ")
            relationVectorFile.write(str(self.relationList[relation].tolist()))
            relationVectorFile.write("\n")

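# Module-level helper functions (defined outside the TransE class).
def init(dim):
    # Draw one random value between -6/sqrt(dim) and 6/sqrt(dim).
    return uniform(-6/(dim**0.5), 6/(dim**0.5))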

def norm(list):
    '''
    Normalize a vector.
    :param list: the vector
    :return: the vector divided by its L2 norm (the square root of the sum of squares)
    '''

    var = linalg.norm(list)
    i = 0
    while i < len(list):
        list[i] = list[i]/var
        i += 1
    return array(list)

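def distanceL1(h, t, r):
    # L1 distance: sum of absolute values of h + r - t.
    s = h + r - t
    sum = fabs(s).sum()
    return sum

def distanceL2(h, t, r):
    # Squared L2 distance: sum of squares of h + r - t (no square root is taken).
    s = h + r - t
    sum = (s*s).sum()
    return sum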

def openDetailsAndId(dir, sp=" "):
    # Read an entity or relation file and keep the name (first column) of each line.
    idNum = 0
    list = []
    with open(dir, "r", encoding="utf-8") as file:
        lines = file.readlines()
        for line in lines:
            DetailsAndId = line.strip().split(sp)
            list.append(DetailsAndId[0])
            idNum += 1
    return idNum, list

def openTrain(dir, sp=" "):
    # Read the training file; each line with at least three fields becomes one tuple.
    num = 0
    list = []
    with open(dir, "r", encoding="utf-8") as file:
        lines = file.readlines()
        for line in lines:
            triple = line.strip().split(sp)
            if(len(triple)<3):
                continue
            list.append(tuple(triple))
            num += 1
    return num, list

if __name__ == "__main__":
    # Raw strings keep the backslashes literal; otherwise "\r" in "\relation2id.txt" is read as a carriage return.
    dirEntity = r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\entity2id.txt"
    entityIdNum, entityList = openDetailsAndId(dirEntity)
    dirRelation = r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\relation2id.txt"
    relationIdNum, relationList = openDetailsAndId(dirRelation)
    dirTrain = r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\train.txt"
    tripleNum, tripleList = openTrain(dirTrain)
    # print(tripleNum, tripleList)
    print("Starting TransE")
    transE = TransE(entityList, relationList, tripleList, margin=1, dim=128)
    print("Initializing TransE")
    transE.initialize()
    transE.transE(1500)
    transE.writeRelationVector(r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\relationVector.txt")
    transE.writeEntilyVector(r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\entityVector.txt")
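
The helpers `init`, `norm`, `distanceL1` and `distanceL2` above work one element at a time. As a minimal sketch, assuming the vectors are NumPy arrays, the same operations can be written in vectorized form. The names `init_vector`, `normalize`, `distance_l1` and `distance_l2` are hypothetical and not part of the original code. Like the original `distanceL2`, `distance_l2` returns the squared L2 distance.

```python
import numpy as np


def init_vector(dim, rng=None):
    """Draw a dim-dimensional vector uniformly from [-6/sqrt(dim), 6/sqrt(dim))."""
    rng = np.random.default_rng() if rng is None else rng
    bound = 6 / np.sqrt(dim)
    return rng.uniform(-bound, bound, size=dim)


def normalize(vec):
    """Return vec scaled to unit L2 norm; a zero vector is returned unchanged."""
    vec = np.asarray(vec, dtype=float)
    length = np.linalg.norm(vec)
    return vec if length == 0 else vec / length


def distance_l1(h, t, r):
    """Sum of absolute values of h + r - t."""
    return np.abs(h + r - t).sum()


def distance_l2(h, t, r):
    """Sum of squares of h + r - t (squared L2 distance, no square root)."""
    diff = h + r - t
    return (diff * diff).sum()


# Example: a freshly initialized vector has unit length after normalization.
vector = normalize(init_vector(128, np.random.default_rng(0)))
print(np.linalg.norm(vector))
```

Unlike the original `norm`, `normalize` does not modify its argument in place and guards against division by zero; both are design choices of this sketch.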

Data

[images not available]

Result vectors

[image not available]
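
The result files written by `writeEntilyVector` and `writeRelationVector` hold one item per line: the name, four spaces, then the vector as a Python list literal. The following is a minimal sketch of reading such files back and scoring a triple by the distance of h + r - t, assuming every vector component is a finite number. `load_vectors` and `score_triple` are hypothetical names, and the two-dimensional vectors in the demonstration are made up for illustration, not real training output.

```python
import ast
import os
import tempfile

import numpy as np


def load_vectors(path):
    """Read lines of the form '<name>    [v1, v2, ...]' into a dict of arrays."""
    vectors = {}
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last four-space separator; the list literal itself
            # only contains single spaces after its commas.
            name, _, literal = line.rpartition("    ")
            vectors[name] = np.array(ast.literal_eval(literal), dtype=float)
    return vectors


def score_triple(entity_vecs, relation_vecs, head, relation, tail, l1=True):
    """Distance of h + r - t; smaller means the triple fits the model better."""
    diff = entity_vecs[head] + relation_vecs[relation] - entity_vecs[tail]
    return np.abs(diff).sum() if l1 else (diff * diff).sum()


# Toy demonstration with made-up vectors written in the same file format.
with tempfile.TemporaryDirectory() as tmp:
    entity_path = os.path.join(tmp, "entityVector.txt")
    relation_path = os.path.join(tmp, "relationVector.txt")
    with open(entity_path, "w", encoding="utf-8") as file:
        file.write("a    [1.0, 0.0]\nb    [1.0, 1.0]\n")
    with open(relation_path, "w", encoding="utf-8") as file:
        file.write("rel    [0.0, 1.0]\n")
    entity_vecs = load_vectors(entity_path)
    relation_vecs = load_vectors(relation_path)

# (a, rel, b) satisfies h + r = t exactly; the reversed triple does not.
print(score_triple(entity_vecs, relation_vecs, "a", "rel", "b"))
print(score_triple(entity_vecs, relation_vecs, "b", "rel", "a"))
```

`ast.literal_eval` is used instead of `eval` so that only literals are parsed from the file.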

This is my own code, posted mainly for storage; if it helps others, all the better.
	</dl>
