GGS-NNs + bAbI Tasks

FakeOccupational

Last modified: 2022-02-26 13:08:41

First published: 2022-02-25 16:31:24

Article link: https://blog.csdn.net/ResumeProject/article/details/123117540

Notation

graph	$G = (V, E)$
node	$v \in V$
edge	$e = (v, v') \in E$, a directed edge from $v$ to $v'$
node representation	$h_v \in \mathbb{R}^D$
node label	$l_v$
node representations of a set	$h_S = \{ h_v \mid v \in S \}$, when $S$ is a set of nodes
edge labels of a set	$l_S = \{ l_e \mid e \in S \}$, when $S$ is a set of edges
predecessors of $v$	$\mathrm{IN}(v) = \{ v' \mid (v', v) \in E \}$
successors of $v$	$\mathrm{OUT}(v) = \{ v' \mid (v, v') \in E \}$
neighbors of $v$	$\mathrm{NBR}(v) = \mathrm{IN}(v) \cup \mathrm{OUT}(v)$
incident edges of $v$	$\mathrm{CO}(v) = \{ (v', v'') \in E \mid v = v' \vee v = v'' \}$
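The neighborhood sets in the table can be computed directly from an edge list. The following is a minimal sketch in plain Python; the toy graph and the function names (`in_nodes`, `out_nodes`, `nbr`, `co`) are illustrative assumptions and do not come from the GGS-NN code.

```python
# Toy directed graph as a set of (source, target) pairs; node names are made up.
E = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")}


def in_nodes(v, edges):
    """Predecessors of v: every v' with an edge (v', v)."""
    return {src for (src, tgt) in edges if tgt == v}


def out_nodes(v, edges):
    """Successors of v: every v' with an edge (v, v')."""
    return {tgt for (src, tgt) in edges if src == v}


def nbr(v, edges):
    """Neighbors of v: union of predecessors and successors."""
    return in_nodes(v, edges) | out_nodes(v, edges)


def co(v, edges):
    """Edges incident to v, whether incoming or outgoing."""
    return {(src, tgt) for (src, tgt) in edges if v in (src, tgt)}


print(in_nodes("a", E), out_nodes("a", E), nbr("a", E), co("a", E))
```

Sets are used so that the union in the neighbor definition maps onto Python's `|` operator.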

GGS-NNs

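This network is a sequence of GG-NNs chained together. At output step $k$ it uses two of them, one to predict the output $o^{(k)}$ and one to predict the next annotations $X^{(k+1)}$:
$\left\{\begin{array}{l} F_o^{(k)}: X^{(k)} \rightarrow o^{(k)} \\ F_X^{(k)}: X^{(k)} \rightarrow X^{(k+1)} \end{array}\right.$
Both contain a propagation model and an output model, and the two can share a single propagation model while using different output models.
In the propagation models, the matrix of node vectors $H^{(k,t)} = [h_1^{(k,t)}; \ldots; h_{|V|}^{(k,t)}]^T \in \mathbb{R}^{|V| \times D}$ holds the node states at the $t$-th propagation step of the $k$-th output step.
Initialization: as before, at output step $k$, $H^{(k,1)}$ is set by 0-extending $X^{(k)}$ per node.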
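The 0-extension used for initialization can be sketched as follows. This is a minimal NumPy illustration, not the original implementation; the helper name `zero_extend` and the toy sizes are assumptions made for the example.

```python
import numpy as np


def zero_extend(X, D):
    """Pad annotations X of shape (|V|, L) with zeros to a state matrix of shape (|V|, D)."""
    n_nodes, ann_dim = X.shape
    if D < ann_dim:
        raise ValueError("state size D must be at least the annotation size")
    H = np.zeros((n_nodes, D), dtype=X.dtype)
    H[:, :ann_dim] = X  # the annotation occupies the first L entries of each row
    return H


# Three nodes with a 1-dimensional annotation; only the first node is marked.
X1 = np.array([[1.0], [0.0], [0.0]])
H1 = zero_extend(X1, D=4)
print(H1)
```

Each row of `H1` starts with the node's annotation and is followed by `D - L` zeros, so the state size must be at least the annotation size.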

Node annotation output model

Predict $X^{(k+1)}$ from $H^{(k,T)}$.
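As a reference point, the per-node prediction can be written as below. This form is taken from the GGS-NN paper (arXiv:1511.05493) rather than from this post, so treat it as an assumption about what is intended here: $j$ is a neural network applied to the concatenation of $h_v^{(k,T)}$ and $x_v^{(k)}$, and $\sigma$ is the logistic sigmoid.

```latex
x_v^{(k+1)} = \sigma\left( j\left( h_v^{(k,T)},\, x_v^{(k)} \right) \right)
```

The prediction is made independently for each node $v$.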

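Two ways to train the network

$\left\{\begin{array}{l} \text{specify all intermediate annotations } X^{(k)} \\ \text{train end to end, given only } X^{(1)} \text{, the graph, and the target sequence} \end{array}\right.$
The former can improve performance when we have domain knowledge about specific intermediate information that should be represented in the internal state of the nodes, while the latter is more general.
Sequence outputs with observed annotations: consider the task of making a sequence of predictions for a graph, where each prediction concerns only a part of the graph. To ensure that we predict an output for each part of the graph exactly once, it suffices to have one bit per node indicating whether the node has been "explained" so far. In some settings, a small number of annotations is enough to capture the state of the output procedure. In that case we may want to feed this information into the model directly, through labels that indicate the target intermediate annotations. In some cases these annotations are sufficient, in that we can define a model where the GG-NNs are rendered conditionally independent given the annotations.
In this case, given the annotations $X^{(k)}$ at training time, the sequence prediction task decomposes into single-step prediction tasks, which can be trained as separate GG-NNs. At test time, the annotations predicted at one step are used as input to the next step. This is analogous to training directed graphical models when the data is fully observed.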
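The test-time chaining described above can be sketched as a simple loop. The two predictor arguments stand in for trained GG-NNs and are hypothetical toy functions here; only the control flow is the point of the example.

```python
def run_sequence(x1, predict_output, predict_annotation, n_steps):
    """Chain single-step predictors: each step consumes the annotation predicted by the previous one."""
    outputs = []
    x = x1
    for _ in range(n_steps):
        outputs.append(predict_output(x))  # plays the role of F_o: X^(k) -> o^(k)
        x = predict_annotation(x)          # plays the role of F_X: X^(k) -> X^(k+1)
    return outputs


# Toy stand-ins (hypothetical): the annotation marks exactly one "current" node.
def toy_output(x):
    return x.index(1)  # emit the index of the marked node


def toy_next_annotation(x):
    return x[-1:] + x[:-1]  # move the mark to the next node


outs = run_sequence([1, 0, 0], toy_output, toy_next_annotation, n_steps=3)
print(outs)
```

During training with observed annotations each step would instead receive the ground-truth annotation, which is what lets the steps be trained separately.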

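Sequence outputs with latent annotations: more generally, when the intermediate node annotations $X^{(k)}$ are not available during training, we treat them as hidden units in the network and train the whole model jointly by backpropagating through the whole sequence.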

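bAbI TASKS

The bAbI tasks are meant to test reasoning capabilities that AI systems should have. The bAbI suite contains 20 tasks that test basic forms of reasoning such as deduction, induction, counting, and path finding.
We define a basic transformation procedure that maps bAbI tasks to GG-NNs or GGS-NNs. We use the --symbolic option of the released bAbI code to get stories that involve only sequences of relations between entities, and then convert them into a graph. Each entity is mapped to a node and each relation to an edge, with the edge label given by the relation. The full story is mapped to a single graph.
In the data, questions are marked with eval and consist of a question type (e.g., has_fear) and some arguments (e.g., one or more nodes). The arguments are converted into initial node annotations: the $i$-th bit of the annotation vector of the $i$-th argument node is set to 1.

For example, for the line eval E > A true:
$\left\{\begin{array}{l} x^{(1)}_E = [1, 0]^T \quad \text{(annotation of } E \text{)} \\ x^{(1)}_A = [0, 1]^T \quad \text{(annotation of } A \text{)} \\ x^{(1)}_v = [0, 0]^T \quad \text{(annotation of every other node } v \text{)} \end{array}\right.$
Its question type is 1 (for '>') and its output is class 1 (for 'true'). Some tasks have multiple question types; for example, Task 4 has 4 question types: e, s, w, n. For such tasks we simply train a separate GG-NN for each task. We do not use the strong supervision labels in any experiment, and we do not give the GGS-NNs any intermediate annotations.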
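The rule for building initial annotations from question arguments can be sketched in a few lines. The function name `initial_annotations` and the node list are assumptions for illustration; only the rule itself (bit $i$ set for the $i$-th argument node) comes from the text.

```python
def initial_annotations(nodes, question_args):
    """Return {node: annotation}; the i-th argument node gets bit i of its vector set to 1."""
    dim = len(question_args)
    ann = {v: [0] * dim for v in nodes}
    for i, v in enumerate(question_args):
        ann[v][i] = 1
    return ann


# The question "E > A" has two arguments, so annotations are 2-dimensional.
nodes = ["A", "B", "C", "D", "E"]
ann = initial_annotations(nodes, ["E", "A"])
print(ann)
```

With a single argument, as in Task 15, the annotation collapses to one bit per node.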
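While simple, this transformation does not preserve all information about the story (e.g., it discards the temporal order of the inputs), and it does not easily handle ternary and higher-order relations (e.g., "Yesterday John went to the garden" is not easily mapped to a simple edge). We also emphasize that mapping general natural language to symbolic form is a non-trivial task,¹ so this approach cannot be applied directly to arbitrary natural language. Relaxing these restrictions is left for future work.

However, even with this simple transformation, a variety of bAbI tasks can be formulated, including Task 19 (path finding), which is arguably the hardest task. We provide baselines to show that the symbolic representation does not significantly help RNNs or LSTMs, and show that GGS-NNs solve this problem with a small number of training instances.
We also develop two new bAbI-like tasks that involve outputting sequences on graphs: shortest paths, and a simple form of Eulerian circuits (on random connected 2-regular graphs). The point of these experiments is to illustrate the range of problems GGS-NNs can handle.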

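Example 1. Below is one instance from the symbolic dataset for bAbI Task 15, "Basic Deduction".

[Figure: symbolic data instance; image not available]
The first 8 lines describe the facts, which the GG-NN uses to build a graph. Capital letters are nodes, and is and has_fear are interpreted as edge labels or edge types. The last 4 lines are 4 questions asked about this input data; has_fear in these lines is interpreted as the question type.
For this task, only one node in each question is special, e.g., the B in eval B has_fear. We assign a value of 1 to the annotation vector of this special node and 0 to all other nodes.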
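The conversion from such a story to a graph can be sketched as below. The line format assumed here ("SRC REL TGT" for facts and "eval SRC REL ANSWER" for questions), the function name `story_to_graph`, and the three-line toy story are all assumptions for illustration; the real dataset files may differ in detail.

```python
def story_to_graph(lines):
    """Parse symbolic story lines into 1-based [src, e_type, tgt] edges and a list of questions."""
    node_ids, edge_ids = {}, {}
    edges, questions = [], []

    def get_id(table, key):
        # Assign 1-based ids in order of first appearance.
        return table.setdefault(key, len(table) + 1)

    for line in lines:
        tokens = line.split()
        if tokens[0] == "eval":
            _, src, qtype, answer = tokens
            questions.append((get_id(node_ids, src), qtype, answer))
        else:
            src, rel, tgt = tokens
            edges.append([get_id(node_ids, src), get_id(edge_ids, rel), get_id(node_ids, tgt)])
    return node_ids, edge_ids, edges, questions


story = ["A is B", "B has_fear C", "eval A has_fear C"]  # made-up toy story
node_ids, edge_ids, edges, questions = story_to_graph(story)

# One-bit annotation: 1 for the node named in the question, 0 for every other node.
src_id = questions[0][0]
annotation = [[1] if i == src_id else [0] for i in range(1, len(node_ids) + 1)]
print(edges, questions, annotation)
```

The 1-based `[src, e_type, tgt]` triples match the edge layout that `create_adjacency_matrix` expects later in the post.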
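For RNNs and LSTMs, the data is converted into token sequences like the following:
[Figure: token-sequence form of the example; image not available]
Here n<id> denotes a node, e<id> an edge, and q<id> a question type. The extra tokens eol (end of line) and ans (answer) are added so that the RNN and LSTM have access to the complete information available in the dataset. The final number is the class label.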
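A sketch of this flattening is shown below. The token vocabulary (n, e, q, eol, ans) comes from the description above, but the exact ordering of tokens within the question part is an assumption, as are the function name `to_token_sequence` and the toy inputs.

```python
def to_token_sequence(edges, question_type, question_nodes, label):
    """Flatten a graph plus one question into tokens for an RNN/LSTM baseline."""
    tokens = []
    for src, e_type, tgt in edges:
        # One fact per "line": source node, edge type, target node, end-of-line marker.
        tokens += [f"n{src}", f"e{e_type}", f"n{tgt}", "eol"]
    tokens.append(f"q{question_type}")
    tokens += [f"n{v}" for v in question_nodes]
    tokens += ["ans", str(label)]  # the final number is the class label
    return tokens


seq = to_token_sequence([[1, 1, 2], [2, 2, 3]], question_type=1, question_nodes=[1], label=3)
print(" ".join(seq))
```

Unlike the graph form, this sequence keeps the order of the facts, which the graph transformation discards.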

Network structure in the code

Hyperparameters:
Number of GGNN propagation steps = 5
GGNN hidden state size = 4
opt.annotation_dim = 1 # for bAbI

import torch
import torch.nn as nn

class AttrProxy(object):
    """
    Translates index lookups into attribute lookups.
    To implement some trick which able to use list of nn.Module in a nn.Module
    see https://discuss.pytorch.org/t/list-of-nn-module-in-a-nn-module/219/2
    """
    def __init__(self, module, prefix):
        self.module = module
        self.prefix = prefix

    def __getitem__(self, i):
        return getattr(self.module, self.prefix + str(i))


class Propogator(nn.Module):
    """
    Gated Propogator for GGNN
    Uses GRU-style gating (a reset gate and an update gate)
    """
    def __init__(self, state_dim, n_node, n_edge_types):
        super(Propogator, self).__init__()

        self.n_node = n_node
        self.n_edge_types = n_edge_types

        self.reset_gate = nn.Sequential(
            nn.Linear(state_dim*3, state_dim),
            nn.Sigmoid()
        )
        self.update_gate = nn.Sequential(
            nn.Linear(state_dim*3, state_dim),
            nn.Sigmoid()
        )
        self.tansform = nn.Sequential(
            nn.Linear(state_dim*3, state_dim),
            nn.Tanh()
        )

    def forward(self, state_in, state_out, state_cur, A):
        A_in = A[:, :, :self.n_node*self.n_edge_types]
        A_out = A[:, :, self.n_node*self.n_edge_types:]

        a_in = torch.bmm(A_in, state_in)
        a_out = torch.bmm(A_out, state_out)
        a = torch.cat((a_in, a_out, state_cur), 2)

        r = self.reset_gate(a)
        z = self.update_gate(a)
        joined_input = torch.cat((a_in, a_out, r * state_cur), 2)
        h_hat = self.tansform(joined_input)

        output = (1 - z) * state_cur + z * h_hat

        return output


class GGNN(nn.Module):
    """
    Gated Graph Sequence Neural Networks (GGNN)
    Mode: SelectNode
    Implementation based on https://arxiv.org/abs/1511.05493
    """
    def __init__(self, opt):
        super(GGNN, self).__init__()

        # Note: assert (cond, msg) tests a non-empty tuple and never fails, so no parentheses here.
        assert opt.state_dim >= opt.annotation_dim, \
            'state_dim must be no less than annotation_dim'

        self.state_dim = opt.state_dim
        self.annotation_dim = opt.annotation_dim
        self.n_edge_types = opt.n_edge_types
        self.n_node = opt.n_node
        self.n_steps = opt.n_steps

        for i in range(self.n_edge_types):
            # incoming and outgoing edge embedding
            in_fc = nn.Linear(self.state_dim, self.state_dim)
            out_fc = nn.Linear(self.state_dim, self.state_dim)
            self.add_module("in_{}".format(i), in_fc)
            self.add_module("out_{}".format(i), out_fc)

        self.in_fcs = AttrProxy(self, "in_")
        self.out_fcs = AttrProxy(self, "out_")

        # Propogation Model
        self.propogator = Propogator(self.state_dim, self.n_node, self.n_edge_types)

        # Output Model
        self.out = nn.Sequential(
            nn.Linear(self.state_dim + self.annotation_dim, self.state_dim),
            nn.Tanh(),
            nn.Linear(self.state_dim, 1)
        )

        self._initialization()

    def _initialization(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):
                m.weight.data.normal_(0.0, 0.02)
                m.bias.data.fill_(0)

    def forward(self, prop_state, annotation, A):
        for i_step in range(self.n_steps):
            in_states = []
            out_states = []
            for i in range(self.n_edge_types):
                in_states.append(self.in_fcs[i](prop_state))
                out_states.append(self.out_fcs[i](prop_state))
            in_states = torch.stack(in_states).transpose(0, 1).contiguous()
            in_states = in_states.view(-1, self.n_node*self.n_edge_types, self.state_dim)
            out_states = torch.stack(out_states).transpose(0, 1).contiguous()
            out_states = out_states.view(-1, self.n_node*self.n_edge_types, self.state_dim)

            prop_state = self.propogator(in_states, out_states, prop_state, A)

        join_state = torch.cat((prop_state, annotation), 2)

        output = self.out(join_state)
        output = output.sum(2)

        return output

In the dataset:

The __getitem__ method of the bAbIDataset class:

    def __getitem__(self, index):
        am = create_adjacency_matrix(self.data[index][0], self.n_node, self.n_edge_types)
        annotation = self.data[index][1]
        target = self.data[index][2] - 1
        return am, annotation, target

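Data returned for each item: the adjacency matrix am, the node annotations annotation, and the zero-based target class target.

The create_adjacency_matrix function it calls: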

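import numpy as np

def create_adjacency_matrix(edges, n_nodes, n_edge_types):
    # Columns [0, n_nodes * n_edge_types) hold incoming edges, the rest hold outgoing edges.
    a = np.zeros([n_nodes, n_nodes * n_edge_types * 2])
    for edge in edges:
        # Node and edge-type indices in the data are 1-based.
        src_idx = edge[0]
        e_type = edge[1]
        tgt_idx = edge[2]
        # Incoming block: row = target node, column = (edge type, source node).
        a[tgt_idx-1][(e_type - 1) * n_nodes + src_idx - 1] =  1
        # Outgoing block: row = source node, column = (edge type, target node).
        a[src_idx-1][(e_type - 1 + n_edge_types) * n_nodes + tgt_idx - 1] =  1
    return a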

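Each edge is an array of the form [src, e_type, tgt], with 1-based indices.

The adjacency matrix is built from these edges. Its first n_nodes * n_edge_types columns encode incoming edges and the remaining columns encode outgoing edges.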
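To make the column layout concrete, here is a small worked example. It repeats the create_adjacency_matrix function from above in compact form so that it runs on its own; the edge list and graph size are made up for illustration.

```python
import numpy as np


def create_adjacency_matrix(edges, n_nodes, n_edge_types):
    a = np.zeros([n_nodes, n_nodes * n_edge_types * 2])
    for src_idx, e_type, tgt_idx in edges:
        # Incoming block: row = target, column = (edge type, source).
        a[tgt_idx - 1][(e_type - 1) * n_nodes + src_idx - 1] = 1
        # Outgoing block: row = source, column = (edge type, target).
        a[src_idx - 1][(e_type - 1 + n_edge_types) * n_nodes + tgt_idx - 1] = 1
    return a


# Toy graph: 3 nodes, 2 edge types, edges 1 -(type 1)-> 2 and 2 -(type 2)-> 3.
edges = [[1, 1, 2], [2, 2, 3]]
am = create_adjacency_matrix(edges, n_nodes=3, n_edge_types=2)

print(am.shape)           # 3 rows, 3 * 2 * 2 = 12 columns
print(np.argwhere(am))    # positions of the four non-zero entries
```

Every edge sets exactly two entries: one in the incoming half for its target node and one in the outgoing half for its source node.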

Training function

import torch

def train(epoch, dataloader, net, criterion, optimizer, opt):
    net.train()
    for i, (adj_matrix, annotation, target) in enumerate(dataloader, 0):
        net.zero_grad()
        # adj_matrix: torch.Size([10, 8, 32])
        # annotation: torch.Size([10, 8, 1])
        # target: 10 values, one per sample

        # Zeros appended after the annotation so the initial state has size state_dim.
        padding = torch.zeros(len(annotation), opt.n_node, opt.state_dim - opt.annotation_dim).double()
        init_input = torch.cat((annotation, padding), 2)

        # Variable wrappers are unnecessary since PyTorch 0.4; tensors are used directly.
        output = net(init_input, annotation, adj_matrix)

        loss = criterion(output, target)

        loss.backward()
        optimizer.step()

        if i % int(len(dataloader) / 10 + 1) == 0 and opt.verbal:
            # loss.item() replaces loss.data[0], which fails on 0-dim tensors in current PyTorch.
            print('[%d/%d][%d/%d] Loss: %.4f' % (epoch, opt.niter, i, len(dataloader), loss.item()))

# Call site:  net(init_input, annotation, adj_matrix)
# Parameters: forward(prop_state, annotation, A)
    def forward(self, prop_state, annotation, A):
        for i_step in range(self.n_steps):
            in_states = []
            out_states = []
            for i in range(self.n_edge_types):
                in_states.append(self.in_fcs[i](prop_state))  # one linear layer per edge type
                out_states.append(self.out_fcs[i](prop_state))
            in_states = torch.stack(in_states).transpose(0, 1).contiguous()
            in_states = in_states.view(-1, self.n_node*self.n_edge_types, self.state_dim)
            out_states = torch.stack(out_states).transpose(0, 1).contiguous()
            out_states = out_states.view(-1, self.n_node*self.n_edge_types, self.state_dim)

            # in_states   torch.Size([10, 16, 4])
            # out_states  torch.Size([10, 16, 4])
            # prop_state  torch.Size([10, 8, 4])
            # A           torch.Size([10, 8, 32])
            prop_state = self.propogator(in_states, out_states, prop_state, A)

        join_state = torch.cat((prop_state, annotation), 2)
        output = self.out(join_state)  # output model: Linear, Tanh, Linear
        output = output.sum(2)

        return output
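The stack, transpose, and view steps are easier to follow with the shapes traced explicitly. The sketch below redoes the incoming half in NumPy under stated assumptions: random matrices stand in for the learned nn.Linear layers (biases omitted), the adjacency tensor is all zeros, and the sizes are the ones from the shape comments (batch 10, 8 nodes, 2 edge types, state size 4).

```python
import numpy as np

B, N, E, D = 10, 8, 2, 4  # batch size, nodes, edge types, state size
rng = np.random.default_rng(0)
prop_state = rng.standard_normal((B, N, D))
A = np.zeros((B, N, 2 * N * E))  # placeholder adjacency, incoming and outgoing halves

# One (D, D) matrix per edge type stands in for the layers behind in_fcs.
W_in = rng.standard_normal((E, D, D))

per_type = [prop_state @ W_in[i] for i in range(E)]  # E arrays of shape (B, N, D)
in_states = np.stack(per_type)                       # (E, B, N, D)
in_states = in_states.transpose(1, 0, 2, 3)          # (B, E, N, D), like .transpose(0, 1)
in_states = in_states.reshape(B, E * N, D)           # (B, E*N, D), like .view(...)

A_in = A[:, :, : N * E]                              # (B, N, E*N)
a_in = A_in @ in_states                              # (B, N, D), like torch.bmm

print(in_states.shape, A_in.shape, a_in.shape)
```

After the reshape, row `e * N + n` of `in_states` holds node `n` transformed for edge type `e`, which matches the column index `(e_type - 1) * n_nodes + src_idx - 1` used when the adjacency matrix is built.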

The forward function of the Propogator:
    def forward(self, state_in, state_out, state_cur, A):
        A_in = A[:, :, :self.n_node*self.n_edge_types] #torch.Size([10, 8, 16])
        A_out = A[:, :, self.n_node*self.n_edge_types:] #torch.Size([10, 8, 16])

        a_in = torch.bmm(A_in, state_in)
        a_out = torch.bmm(A_out, state_out)
        a = torch.cat((a_in, a_out, state_cur), 2)

        r = self.reset_gate(a)
        z = self.update_gate(a)
        joined_input = torch.cat((a_in, a_out, r * state_cur), 2)
        h_hat = self.tansform(joined_input)

        output = (1 - z) * state_cur + z * h_hat

        return output
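Read as equations, this forward pass is a GRU-style update. The block below is a transcription of the code, not a formula quoted from the paper: $S_{\text{in}}$ and $S_{\text{out}}$ are state_in and state_out, $h$ is state_cur, and the names $W_r, W_z, W_h, b_r, b_z, b_h$ are labels introduced here for the parameters of the three nn.Linear layers.

```latex
\begin{aligned}
a_{\text{in}} &= A_{\text{in}}\, S_{\text{in}}, \qquad a_{\text{out}} = A_{\text{out}}\, S_{\text{out}} \\
r &= \sigma\left( W_r\, [a_{\text{in}};\, a_{\text{out}};\, h] + b_r \right) \\
z &= \sigma\left( W_z\, [a_{\text{in}};\, a_{\text{out}};\, h] + b_z \right) \\
\tilde{h} &= \tanh\left( W_h\, [a_{\text{in}};\, a_{\text{out}};\, r \odot h] + b_h \right) \\
h' &= (1 - z) \odot h + z \odot \tilde{h}
\end{aligned}
```

Here $[\cdot\,;\cdot\,;\cdot]$ is concatenation along the feature dimension, which is why each linear layer takes an input of size state_dim*3.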

References

bAbI tasks
GitHub implementation

¹ The bAbI data is quite templatic, so it is straightforward to hand-code a parser that works for it; the symbolic option removes the need for this.