Graph Attention Network (3): Changing the Adjacency with Masked Attention

祥瑞Coding

Categories: Machine Learning, PyTorch, Graph Neural Networks (GNN)

Article link: https://blog.csdn.net/weixin_36474809/article/details/90693821

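Background: we need to apply GAT on top of the prediction scores produced by a ResNet, and to use masked attention while doing so.

Graph Attention Network (1): Training, Running, and Code Overview

Graph Attention Network (2): Model Definition

Author's code: https://github.com/Xingxiangrui/masked_graph_attention_network/blob/master/masked_gat_after_resnet.py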

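1. Adding GAT on top of the prediction output

1.1 ResNet structure

The network follows the standard ResNet structure; the output size of the final fc layer is set to the number of outputs required.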

import torch
import torch.nn as nn
from torchvision import models


class Head(nn.Module):
    def __init__(self, nclasses):
        super(Head, self).__init__()
        self.gmp = nn.AdaptiveMaxPool2d(1)  # global max pooling -> [B, C, 1, 1]
        self.fc = nn.Linear(2048, nclasses)

    def forward(self, x):
        B, C, _, _ = x.size()
        x = self.gmp(x).view(B, C)  # [B, 2048]
        x = self.fc(x)  # [B, nclasses]
        return x


class Resnet(nn.Module):
    def __init__(self, nclasses, backbone):
        super(Resnet, self).__init__()
        if backbone == 'resnet101':
            model = models.resnet101(pretrained=False)
            print('load pretrained resnet101 model from local...')
            model.load_state_dict(torch.load('./resnet101-5d3b4d8f.pth'))
        else:
            raise ValueError('unsupported backbone: {}'.format(backbone))
        # keep everything up to layer4; drop the original avgpool and fc
        self.features = nn.Sequential(
            model.conv1,
            model.bn1,
            model.relu,
            model.maxpool,
            model.layer1,
            model.layer2,
            model.layer3,
            model.layer4, )
        self.heads = Head(nclasses)

    def forward(self, x, embedding=None):
        x = self.features(x)  # [B,2048,H,W]
        x = self.heads(x)
        return x

1.2 Adding GAT

Add the GAT layer (BGATLayer) to the head:

class Head(nn.Module):
    def __init__(self, nclasses):
        super(Head, self).__init__()
        self.gmp = nn.AdaptiveMaxPool2d(1)
        self.fc = nn.Linear(2048, nclasses)
        self.gat = BGATLayer(in_features=1, out_features=1, dropout=0, alpha=0.2)

The input to the GAT layer must have size [batch, n_classes, channels]; the view function is enough to get it into that shape.

    def forward(self, x):
        # input x: [batch, channels=2048, H=14, W=14]
        B, C, _, _ = x.size()
        # after pooling and view: [batch, channels]
        x = self.gmp(x).view(B, C)
        x = self.fc(x)  # [batch, n_classes]
        x=x.view(x.size(0),x.size(1),1)  # [batch, n_classes, 1]
        residual=x
        output=residual+self.gat(x)

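1.3 Output dimensions

The output has size [batch, n_classes=80, class_channels=1].

The third dimension has to be removed before returning.

Reference: dimension-related transforms in torch

torch.squeeze() / torch.unsqueeze()
tensor.squeeze(n) removes dimension n of the tensor if that dimension has size 1. For example, take b with size torch.Size([1, 3, 2]): b.squeeze(2).size() is unchanged, because dimension 2 has size 2 ≠ 1 and is not squeezed; b.squeeze(0).size() finds that dimension 0 has size 1, so the result is a 3x2 tensor.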
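The squeeze behaviour described above can be checked with a short sketch. The tensor b and the [4, 80] scores are made-up values for illustration, not taken from the author's code:

```python
import torch

b = torch.zeros(1, 3, 2)           # size [1, 3, 2]
print(b.squeeze(2).size())         # dim 2 has size 2, so nothing is removed
print(b.squeeze(0).size())         # dim 0 has size 1, so it is removed

# Same pattern as in the head: add a channel dim for the GAT layer, then drop it.
scores = torch.randn(4, 80)        # hypothetical [batch, n_classes] scores
x = scores.view(scores.size(0), scores.size(1), 1)   # [4, 80, 1]
restored = x.squeeze(2)            # back to [4, 80]
```

Passing the dimension explicitly, as in squeeze(2), is safer than a bare squeeze(): the bare form would also remove the batch dimension when the batch size happens to be 1.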

        x = self.gmp(x).view(B, C)
        x = self.fc(x)
        x=x.view(x.size(0),x.size(1),1)
        residual=x
        output=residual+self.gat(x)
        output=output.squeeze(2)
        return output

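2. Deriving masked attention

2.1 Feature extraction and the attention mechanism

To transform input features into output features, at least one learnable linear transformation is required, so a weight matrix shared by all nodes is trained: $\mathbf{W} \in \mathbb{R}^{F' \times F}$. This matrix relates the F input features of a node to its F' output features.

We then perform self-attention on the nodes with a shared attentional mechanism, applied to every node:

$$a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$$

The attention coefficients are:

$$e_{ij} = a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j)$$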

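This formula gives the importance of node j to node i, without taking any graph-structure information into account
As stated earlier, the vector h is the feature vector
The subscripts i and j denote the i-th and the j-th node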

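The authors inject the graph structure into this mechanism through masked attention, which means that coefficients are computed only for nodes j that are neighbors of node i.

That is, $j \in \mathcal{N}_i$, where $\mathcal{N}_i$ is the set of all neighbors of node i. To make the coefficients easy to compute and to compare across nodes, they are normalized with a softmax over all neighbors j of i:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$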

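In the experiments, the attention mechanism a is a single-layer feedforward neural network, parametrized by a weight vector $\vec{a} \in \mathbb{R}^{2F'}$ and followed by a LeakyReLU nonlinearity whose slope for negative inputs is 0.2. (A quick review of the ReLU family: ReLU outputs 0 for negative inputs and has slope 1 for positive inputs; LeakyReLU uses a fixed slope for negative inputs and slope 1 for positive inputs; PReLU makes the negative slope learnable; there are also CReLU, ELU, and SELU.) The attention mechanism is then:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_k]\right)\right)}$$

Here $\alpha_{ij}$ is the normalized attention coefficient we set out to obtain.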

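The model applies the attention mechanism a(Whi, Whj), parametrized by the weight vector a, with a LeakyReLU activation

The attention weight vector is $\vec{a} \in \mathbb{R}^{2F'}$
T denotes transposition
|| denotes concatenation
In words: the weight matrix is multiplied with each node's F input features to give F' features, the two transformed nodes are concatenated, the result is multiplied by the weight vector, and after the LeakyReLU activation an exponential gives the numerator of the softmax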
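The formula can be spelled out literally as a small sketch. Everything here is an illustrative assumption: the sizes, the random features, and the adjacency matrix adj are made up, and W is stored as [F, F'] and applied as h @ W (the convention used by the code in this post) rather than as W h. Non-neighbors are excluded by filling their scores with -inf, which is how the sum over the neighborhood is usually realized; it is not necessarily what the author's layer does.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, F_in, F_out = 4, 3, 2                   # hypothetical sizes
h = torch.randn(N, F_in)                   # node features h_i
W = torch.randn(F_in, F_out)               # shared weight matrix, applied as h @ W
a = torch.randn(2 * F_out)                 # attention weight vector
# hypothetical neighborhoods N_i, with self-loops so no row is empty
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]], dtype=torch.bool)

Wh = h @ W                                 # [N, F_out]
e = torch.empty(N, N)
for i in range(N):
    for j in range(N):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        e[i, j] = F.leaky_relu(a @ torch.cat([Wh[i], Wh[j]]), negative_slope=0.2)

e = e.masked_fill(~adj, float('-inf'))     # keep only j in N_i
alpha = F.softmax(e, dim=1)                # normalize over neighbors j
```

Each row of alpha sums to 1, and entries for non-neighbors are exactly 0. The double loop is only for readability; the layer code in section 2.3 builds all pairs at once.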

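2.2 Choosing the adjacency

Recall the normalized attention coefficient:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_k]\right)\right)}$$

If i and j are not connected, the result after the LeakyReLU can be set to 0, which makes the corresponding softmax numerator exp(0) = 1. Note that this does not remove the pair from the softmax: the entry still receives a weight proportional to 1. Restricting the sum strictly to $\mathcal{N}_i$ would require setting the score to a very large negative value instead, so that its softmax weight goes to 0.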
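To see what setting a score to 0 does to the softmax, here is a small pure-Python comparison. The three scores and the mask are made-up numbers; the second variant (filling with -inf) is shown only for contrast and is not the approach taken in this post.

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

scores = [1.2, -0.3, 0.8]        # hypothetical post-LeakyReLU scores e_i1..e_i3
mask = [1, 0, 1]                 # node 2 is not a neighbor of node i

# Variant used in this post: multiply the score by the 0/1 mask.
multiplied = softmax([s * m for s, m in zip(scores, mask)])

# Contrast: replace masked scores by -inf before the softmax.
filled = softmax([s if m else float('-inf') for s, m in zip(scores, mask)])

print(multiplied)   # the masked entry keeps a non-zero weight
print(filled)       # the masked entry gets weight 0
```

With the multiplicative mask, the masked score becomes 0, which here is even larger than its original value of -0.3, so the non-neighbor still contributes to the weighted sum. Whether that is acceptable depends on how "soft" the mask is meant to be.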

2.3 Where to insert the mask

Code:

    def forward(self, x):
        # [B,N,C]
        B, N, C = x.size()
        # h = torch.bmm(x, self.W.expand(B, self.in_features, self.out_features))  # [B,N,C]
        h = torch.matmul(x, self.W)  # [B,N,C]
        a_input = torch.cat([h.repeat(1, 1, N).view(B, N * N, C), h.repeat(1, N, 1)],
                            dim=2).view(B, N, N, 2 * self.out_features)  # [B,N,N,2C]
        # temp = self.a.expand(B, self.out_features * 2, 1)
        # temp2 = torch.matmul(a_input, self.a)
        attention = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(3))  # [B,N,N]

        attention = F.softmax(attention, dim=2)  # [B,N,N]
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.bmm(attention, h)  # [B,N,N]*[B,N,C]-> [B,N,C]
        out = F.elu(h_prime + self.beta * h)
        return out

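That is, the mask matrix is applied right after the LeakyReLU, on the attention tensor: wherever the COCO co-occurrence probability is below some threshold, the entry is set to 0.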
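Since the mask is indexed by pairs (i, j), it helps to confirm that the repeat/view construction in forward really places [h_i || h_j] at position (i, j). The sketch below checks this on random data with made-up sizes, and compares it with an equivalent broadcast-based construction that is my own alternative, not part of the author's code.

```python
import torch

B, N, C = 2, 3, 4                          # hypothetical sizes
h = torch.randn(B, N, C)

# Construction used in forward: all N*N pairs, then reshape to [B, N, N, 2C]
a_input = torch.cat([h.repeat(1, 1, N).view(B, N * N, C),
                     h.repeat(1, N, 1)], dim=2).view(B, N, N, 2 * C)

# Alternative with broadcasting: row index i varies along dim 1, j along dim 2
a_alt = torch.cat([h.unsqueeze(2).expand(B, N, N, C),
                   h.unsqueeze(1).expand(B, N, N, C)], dim=3)

pairs_ok = all(
    torch.equal(a_input[b, i, j, :C], h[b, i]) and
    torch.equal(a_input[b, i, j, C:], h[b, j])
    for b in range(B) for i in range(N) for j in range(N)
)
print(pairs_ok, torch.equal(a_input, a_alt))
```

So attention[b, i, j] scores the pair (i, j), and a mask of shape [N, N] applied to it lines up with the rows and columns of the correlation matrix, provided the class ordering is the same.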

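3. Implementing masked attention in code

3.1 Element-wise matrix multiplication (Hadamard product)

We need to multiply matrices entry by entry, i.e. aij*bij=cij: two matrices of the same shape, with corresponding entries multiplied together.

x.mul(y) multiplies corresponding entries. This element-wise product without summation is also called the Hadamard product; multiplying element-wise and then summing gives the inner product, which is the operation a convolution applies at each window position.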

    data = [[1,2], [3,4], [5, 6]]
    tensor = torch.FloatTensor(data)
     
    tensor
    Out[27]:
    tensor([[ 1.,  2.],
            [ 3.,  4.],
            [ 5.,  6.]])
     
    tensor.mul(tensor)
    Out[28]:
    tensor([[  1.,   4.],
            [  9.,  16.],
            [ 25.,  36.]])

In general: tensor.mul(another_tensor)
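To keep the three operations apart, here is a small sketch on made-up 2x2 matrices: the Hadamard product, the ordinary matrix product, and the element-wise product followed by a sum.

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[10., 20.], [30., 40.]])

hadamard = x.mul(y)            # element-wise, same as x * y
matrix_product = x.mm(y)       # rows times columns
inner = x.mul(y).sum()         # element-wise product, then sum

print(hadamard)        # tensor([[ 10.,  40.], [ 90., 160.]])
print(matrix_product)  # tensor([[ 70., 100.], [150., 220.]])
print(inner)           # tensor(300.)
```

Only the first of these is what the mask needs: each attention score is multiplied by its own mask entry and nothing is summed.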

3.2 The mask matrix

Entries greater than a threshold become 1; all other entries become 0.

This uses np.where.

        with open('coco_correlations.pkl', 'rb') as f:
            print("loading coco_correlations.pkl ")
            correlations = pickle.load(f)
        # with open('coco_names.pkl', 'rb') as f:
        #     print("loading coco_names.pkl")
        #     self.names = pickle.load(f)
        self.coco_correlation_A_B = correlations['pp']
        self.probability_filter_threshold=0.1
        self.mask=torch.FloatTensor(np.where(self.coco_correlation_A_B>self.probability_filter_threshold,1,0))

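That is, the conditional-probability matrix is thresholded: entries above the threshold are set to 1 and the rest are set to 0, and the result is used as the mask.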
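The thresholding step can be tried without the pickle file. The 3x3 matrix below is invented for illustration and stands in for correlations['pp']; the real matrix would be n_classes x n_classes.

```python
import numpy as np
import torch

# Hypothetical conditional-probability matrix, standing in for correlations['pp']
correlation = np.array([[1.00, 0.05, 0.30],
                        [0.02, 1.00, 0.10],
                        [0.45, 0.20, 1.00]])
threshold = 0.1

# 1 where the probability exceeds the threshold, 0 elsewhere
mask_np = np.where(correlation > threshold, 1, 0)
mask = torch.from_numpy(mask_np.astype(np.float32))

print(mask)
# tensor([[1., 0., 1.],
#         [0., 1., 0.],
#         [1., 1., 1.]])
```

Note that the comparison is strict, so an entry exactly equal to the threshold (0.10 here) is masked out. The matrix is also not symmetric in general, because P(B|A) and P(A|B) differ, so the mask is directional.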

3.3 masked attention

After the LeakyReLU, multiply the attention scores by the mask:

        attention = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(3))  # [Batch,N_classes,N_classes]

        # fixme add masked attention
        attention= attention.mul(self.mask)

        attention = F.softmax(attention, dim=2)  # [B,N,N]

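4. Running and debugging

4.1 Tensors on the GPU

So that the mask can be multiplied with the attention tensor on the GPU, move it to the GPU inside forward and do the multiplication there: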

self.mask=torch.FloatTensor(np.where(self.coco_correlation_A_B>self.probability_filter_threshold,1,0))

        # fixme add masked attention
        mask=self.mask.cuda()
        attention= attention.mul(mask)

Do not call cuda() on the mask in __init__; call it in forward.
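An alternative worth knowing, which is not what the code in this post does: PyTorch's register_buffer attaches a fixed tensor to the module, so it follows the module when model.cuda() or model.to(device) is called and no per-call cuda() is needed in forward. The module name MaskedScores and the tiny 2x2 mask are hypothetical, chosen only to keep the sketch short.

```python
import torch
import torch.nn as nn

class MaskedScores(nn.Module):
    """Hypothetical minimal module: holds a fixed mask and applies it in forward."""

    def __init__(self, mask):
        super().__init__()
        # a buffer is not a trainable parameter, but it moves with the module
        self.register_buffer('mask', mask)

    def forward(self, attention):
        # [B, N, N] * [N, N] broadcasts over the batch dimension
        return attention.mul(self.mask)

mask = torch.tensor([[1., 0.], [1., 1.]])
layer = MaskedScores(mask)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
layer = layer.to(device)                       # the mask moves together with the layer
attention = torch.ones(3, 2, 2, device=device)
out = layer(attention)
```

A registered buffer is also saved in state_dict by default, so the mask travels with checkpoints. If that is not wanted, keeping the plain attribute and moving it in forward, as the post does, remains a valid choice.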
