TuckER: Tensor Factorization for Knowledge Graph Completion

Latest recommended article published on 2022-11-23 13:04:50

三石大数据

Article link: https://blog.csdn.net/qq_42397330/article/details/116290128

Paper source: EMNLP 2019

Paper link: https://arxiv.org/abs/1901.09590

Code link: https://github.com/ibalazevic/TuckER

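Summary: The paper is built around a formula that looks advanced, and it probably scares off many readers. It certainly made my head spin at first, because I could not tell what it actually meant. If you really cannot work out the details, it is fine to skip them and just grasp the authors' general idea. What worked for me was to read the paper once, then read the code to understand how training actually runs, and then read the paper a second time. You may find the same approach useful when reading papers.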

1. Background

1.1 Tucker Decomposition

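Definition: Tucker decomposition factorizes a tensor into a set of matrices and a core tensor.
The formula is:
$\mathcal{X} \approx \mathcal{Z} \times_{1} \mathbf{A} \times_{2} \mathbf{B} \times_{3} \mathbf{C}$
where $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, $\mathcal{Z} \in \mathbb{R}^{P \times Q \times R}$, $\mathbf{A} \in \mathbb{R}^{I \times P}$, $\mathbf{B} \in \mathbb{R}^{J \times Q}$, $\mathbf{C} \in \mathbb{R}^{K \times R}$
Explanation of the formula:
- $\times_{n}$ denotes the tensor product along mode n (it can be read simply as shorthand for a longer summation).
- $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ can be viewed as the principal components in each mode.
- Each element of $\mathcal{Z}$ represents the level of interaction between the different components.
- P, Q, and R are smaller than I, J, and K respectively, so $\mathcal{Z}$ can also be seen as a compressed version of $\mathcal{X}$.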
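
To make the notation above concrete, here is a minimal NumPy sketch of the reconstruction $\mathcal{Z} \times_{1} \mathbf{A} \times_{2} \mathbf{B} \times_{3} \mathbf{C}$. The helper name mode_n_product, the random values, and the toy sizes are assumptions chosen for illustration; they do not come from the paper or its code.

```python
import numpy as np


def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (the mode-n product).

    `matrix` has shape (new_dim, old_dim) with old_dim == tensor.shape[mode].
    """
    # Contract the chosen tensor axis with the matrix's second axis.
    # tensordot appends the new axis at the end, so move it back to `mode`.
    result = np.tensordot(tensor, matrix, axes=([mode], [1]))
    return np.moveaxis(result, -1, mode)


rng = np.random.default_rng(0)
I, J, K = 6, 5, 4  # size of the full tensor X (toy values)
P, Q, R = 3, 2, 2  # size of the core tensor Z, smaller in every mode

Z = rng.normal(size=(P, Q, R))
A = rng.normal(size=(I, P))
B = rng.normal(size=(J, Q))
C = rng.normal(size=(K, R))

# X = Z x_1 A x_2 B x_3 C, applied one mode at a time.
X = mode_n_product(mode_n_product(mode_n_product(Z, A, 0), B, 1), C, 2)

# The same contraction written as a single einsum.
X_einsum = np.einsum("pqr,ip,jq,kr->ijk", Z, A, B, C)

print(X.shape)                   # (6, 5, 4)
print(np.allclose(X, X_einsum))  # True
```

Here Z holds 12 numbers while X holds 120, which is the sense in which the core tensor is a compressed version of the full tensor.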

2. Model Architecture

Scoring function:
$\phi\left(e_{s}, r, e_{o}\right)=\mathcal{W} \times_{1} \mathbf{e}_{s} \times_{2} \mathbf{w}_{r} \times_{3} \mathbf{e}_{o}$
where $\mathbf{e}_{s}$, $\mathbf{w}_{r}$, and $\mathbf{e}_{o}$ are the embeddings of the head entity, the relation, and the tail entity; $d_e$ and $d_r$ are the entity and relation embedding dimensions; and $\mathcal{W} \in \mathbb{R}^{d_e \times d_r \times d_e}$ is the core tensor.
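
As a rough illustration of the scoring function, the sketch below scores a single triple with NumPy. The function name tucker_score, the toy dimensions, and the random embeddings are assumptions for the example, not values from the paper.

```python
import numpy as np


def tucker_score(W, e_s, w_r, e_o):
    """phi(e_s, r, e_o) = W x_1 e_s x_2 w_r x_3 e_o, which is a single scalar."""
    return np.einsum("ijk,i,j,k->", W, e_s, w_r, e_o)


rng = np.random.default_rng(0)
d_e, d_r = 4, 3  # toy embedding dimensions

W = rng.uniform(-1, 1, size=(d_e, d_r, d_e))  # core tensor
e_s = rng.normal(size=d_e)                    # head (subject) entity embedding
w_r = rng.normal(size=d_r)                    # relation embedding
e_o = rng.normal(size=d_e)                    # tail (object) entity embedding

score = tucker_score(W, e_s, w_r, e_o)

# The same value as an explicit triple sum, to show what the contraction means.
brute = sum(
    W[i, j, k] * e_s[i] * w_r[j] * e_o[k]
    for i in range(d_e)
    for j in range(d_r)
    for k in range(d_e)
)

probability = 1.0 / (1.0 + np.exp(-score))  # sigmoid turns the score into a probability
print(np.isclose(score, brute))  # True
```

Every entry W[i, j, k] weights the interaction between one head feature, one relation feature, and one tail feature, which matches the reading of the core tensor given in the background section.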

3. Model Training

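The score produced by the function above is passed through a sigmoid to obtain a probability, and the following loss is then computed:
$L=-\frac{1}{n_{e}} \sum_{i=1}^{n_{e}}\left(\mathbf{y}^{(i)} \log \left(\mathbf{p}^{(i)}\right)+\left(1-\mathbf{y}^{(i)}\right) \log \left(1-\mathbf{p}^{(i)}\right)\right)$
where $n_e$ is the number of entities, $\mathbf{p}^{(i)}$ is the predicted probability for the i-th entity, and $\mathbf{y}^{(i)}$ is its binary label.
Before reading the code, I assumed the scoring function scores one complete triple at a time. The code does not actually work that way.
The forward pass takes a batch of head entities and relations as input and outputs a matrix of shape (batch, len(entity)), that is, the predicted probability that each entity is the tail. This matrix is then compared against a target matrix of the same shape (1 at the correct tail positions, 0 elsewhere) to compute the loss, so each entry can be treated as a binary classification problem.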
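
The target matrix and the loss described above can be sketched as follows. This is a simplified NumPy illustration, not the repository's training loop: the dictionary known_tails, the toy indices, and the random logits standing in for model scores are all assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_loss(p, y, eps=1e-12):
    """Binary cross-entropy averaged over every entry of the matrix."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


n_entities = 5
# Hypothetical toy data: (head index, relation index) -> indices of all correct tails.
known_tails = {(0, 0): [1, 3], (2, 1): [4]}
batch = [(0, 0), (2, 1)]

# Target matrix of shape (batch, n_entities): 1 at correct tails, 0 elsewhere.
targets = np.zeros((len(batch), n_entities))
for row, pair in enumerate(batch):
    targets[row, known_tails[pair]] = 1.0

# Stand-in for the model output: one score per (batch row, candidate tail).
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(batch), n_entities))
pred = sigmoid(logits)

loss = bce_loss(pred, targets)
print(targets)
print(loss)
```

Scoring all candidate tails for a (head, relation) pair in one pass is what makes a single row of the output matrix correspond to n_e binary decisions at once.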

4. Core Code and Explanation

import numpy as np
import torch


class TuckER(torch.nn.Module):
    def __init__(self, d, d1, d2, **kwargs):
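        '''
        :param d: dataset object (provides d.entities and d.relations)
        :param d1: entity embedding dimension, e.g. 200
        :param d2: relation embedding dimension, e.g. 200
        :param kwargs: dict holding the dropout rates
        '''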
        super(TuckER, self).__init__()
        self.E = torch.nn.Embedding(len(d.entities), d1)
        self.R = torch.nn.Embedding(len(d.relations), d2)
        self.W = torch.nn.Parameter(torch.tensor(np.random.uniform(-1, 1, (d2, d1, d1)),
                                                 dtype=torch.float, device="cuda", requires_grad=True))
        self.input_dropout = torch.nn.Dropout(kwargs["input_dropout"])
        self.hidden_dropout1 = torch.nn.Dropout(kwargs["hidden_dropout1"])
        self.hidden_dropout2 = torch.nn.Dropout(kwargs["hidden_dropout2"])
        self.loss = torch.nn.BCELoss()  # binary cross-entropy on the predicted probabilities
        self.bn0 = torch.nn.BatchNorm1d(d1)
        self.bn1 = torch.nn.BatchNorm1d(d1)
        torch.nn.init.xavier_normal_(self.E.weight.data)
        torch.nn.init.xavier_normal_(self.R.weight.data)

    def forward(self, e1_idx, r_idx):
        '''Roughly: pred = sigmoid(e1 * (r * W) * E^T)'''
        e1 = self.E(e1_idx)
        x = self.bn0(e1)
        x = self.input_dropout(x)  # [128,200]
        x = x.view(-1, 1, e1.size(1))  # [128,1,200]

        r = self.R(r_idx)
        W_mat = torch.mm(r, self.W.view(r.size(1), -1))  # [128,40000]
        W_mat = W_mat.view(-1, e1.size(1), e1.size(1))  # [128,200,200]
        W_mat = self.hidden_dropout1(W_mat)

        x = torch.bmm(x, W_mat)  # [128,1,200]
        x = x.view(-1, e1.size(1))  # [128,200]
        x = self.bn1(x)
        x = self.hidden_dropout2(x)
        x = torch.mm(x, self.E.weight.transpose(1, 0))  # multiply by E transposed: [128,14541]
        pred = torch.sigmoid(x)
        return pred
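
The reshape and matrix-multiply steps in forward can be hard to follow, so here is a small NumPy sketch that reproduces only the tensor contraction and checks it against a single einsum. Batch normalization and dropout are deliberately left out, and the toy sizes and random tables are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_entities, n_relations, d_e, d_r = 3, 7, 5, 4, 2  # toy sizes

E = rng.normal(size=(n_entities, d_e))        # entity embedding table
R = rng.normal(size=(n_relations, d_r))       # relation embedding table
W = rng.uniform(-1, 1, size=(d_r, d_e, d_e))  # core tensor, stored as (d2, d1, d1)

e1_idx = np.array([0, 2, 5])
r_idx = np.array([1, 1, 4])

e1 = E[e1_idx]  # (batch, d_e)
r = R[r_idx]    # (batch, d_r)

# Step 1: mix the core tensor with each relation -> one (d_e, d_e) matrix per row.
W_mat = (r @ W.reshape(d_r, -1)).reshape(batch, d_e, d_e)

# Step 2: batched product of each head embedding with its relation-specific matrix.
x = (e1[:, None, :] @ W_mat).reshape(batch, d_e)

# Step 3: compare against every entity embedding -> one score per candidate tail.
logits = x @ E.T                      # (batch, n_entities)
pred = 1.0 / (1.0 + np.exp(-logits))

# The three steps together are one contraction over relation, head, and tail axes.
reference = np.einsum("bi,bj,jik,nk->bn", e1, r, W, E)
print(pred.shape)                      # (3, 7)
print(np.allclose(logits, reference))  # True
```

Mixing W with the relation first means the core tensor is contracted once per batch row, after which scoring all entities is a single matrix product.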

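Model analysis:
- The model's state_dict has 13 entries, listed below. Only 7 of them are trainable parameters (W, E.weight, R.weight, and the weight and bias of bn0 and bn1). The running_mean, running_var, and num_batches_tracked entries are BatchNorm buffers: they are updated during training but are not learned by gradient descent.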
```
W
E.weight
R.weight
bn0.weight
bn0.bias
bn0.running_mean
bn0.running_var
bn0.num_batches_tracked
bn1.weight
bn1.bias
bn1.running_mean
bn1.running_var
bn1.num_batches_tracked
```
- Inverse-relation triples are added during training: for every triple $(h, r, t)$, the triple $(t, r^{-1}, h)$ is added as well.
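
The inverse-triple augmentation can be sketched in a few lines of plain Python. The function name add_inverse_triples, the "_reverse" suffix used to name the inverse relation, and the example triples are assumptions for illustration.

```python
def add_inverse_triples(triples, inverse_suffix="_reverse"):
    """Return the original triples plus one inverse triple (t, r^-1, h) for each."""
    inverses = [(t, r + inverse_suffix, h) for (h, r, t) in triples]
    return list(triples) + inverses


triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
]
augmented = add_inverse_triples(triples)

print(len(augmented))  # 4
print(augmented[2])    # ('France', 'capital_of_reverse', 'Paris')
```

With the inverse triples in place, predicting a missing head in $(?, r, t)$ becomes predicting the tail of $(t, r^{-1}, ?)$, so the same forward pass that scores every candidate tail covers both directions.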