Graph Attention Network (GAT), Part 2: Model Definition

Article link: https://blog.csdn.net/weixin_36474809/article/details/89447533

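This article takes a close look at the Graph Attention Network (GAT) model and at how its attention mechanism on graph data is implemented. It covers the model structure definition, how the graph attention layer works, and the formulas behind the full GAT model.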

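Purpose: the previous post walked through the GAT (Graph Attention Network) paper in detail and gave an overview of the code. Here we go back to the original paper to see how the model structure is defined.

Previous post: Graph Attention Network (GAT), ICLR 2018: a detailed walkthrough of the paper

Code: https://github.com/Diego999/pyGAT

Paper: Veličković et al., Graph Attention Networks, ICLR 2018 (https://arxiv.org/abs/1710.10903). The repository above is a PyTorch implementation of the GAT model presented in that paper.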

1. Model structure definition

1.1 Where the model is defined

The model is instantiated in train.py:

# Model and optimizer
if args.sparse:
    model = SpGAT(nfeat=features.shape[1], 
                nhid=args.hidden, 
                nclass=int(labels.max()) + 1, 
                dropout=args.dropout, 
                nheads=args.nb_heads, 
                alpha=args.alpha)
else:
    model = GAT(nfeat=features.shape[1], 
                nhid=args.hidden, 
                nclass=int(labels.max()) + 1, 
                dropout=args.dropout, 
                nheads=args.nb_heads, 
                alpha=args.alpha)
optimizer = optim.Adam(model.parameters(), 
                       lr=args.lr, 
                       weight_decay=args.weight_decay)

1.2 Input parameters

    model = SpGAT(nfeat=features.shape[1], 
                nhid=args.hidden, 
                nclass=int(labels.max()) + 1, 
                dropout=args.dropout, 
                nheads=args.nb_heads, 
                alpha=args.alpha)

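nfeat: the number of input features per node (features.shape[1]). This is the input dimension of the first layer, so it corresponds to F in the paper rather than F'.
nhid: the number of hidden units per attention head, 8 by default.
nclass: the number of output classes. This is the F' of the output layer.
dropout: the dropout probability (1 - keep probability), 0.6 by default.
nheads: the number of attention heads, i.e. K in the paper.
alpha: the negative slope of LeakyReLU, 0.2 by default.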

1.3 GAT network structure

Model initialization

class GAT(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout, alpha, nheads):
        """Dense version of GAT."""
        super(GAT, self).__init__()
        self.dropout = dropout

        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True) for _ in range(nheads)]
        for i, attention in enumerate(self.attentions):
            self.add_module('attention_{}'.format(i), attention)

        self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)

Forward pass

    def forward(self, x, adj):
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
        x = F.dropout(x, self.dropout, training=self.training)
        x = F.elu(self.out_att(x, adj))
        return F.log_softmax(x, dim=1)
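
To make the data flow through the forward pass above easier to follow, here is a minimal shape-tracing sketch in plain Python (no PyTorch needed). The helper name gat_output_shapes and the example sizes are assumptions made for illustration only; they do not come from the repository.

```python
def gat_output_shapes(n_nodes, nfeat, nhid, nheads, nclass):
    """Trace tensor shapes through the dense GAT forward pass (hypothetical helper)."""
    shapes = {"input": (n_nodes, nfeat)}
    # Each hidden attention head maps (N, nfeat) -> (N, nhid)
    shapes["per_head"] = (n_nodes, nhid)
    # torch.cat(..., dim=1) joins the head outputs along the feature axis
    shapes["concat"] = (n_nodes, nhid * nheads)
    # out_att maps (N, nhid * nheads) -> (N, nclass); log_softmax keeps the shape
    shapes["output"] = (n_nodes, nclass)
    return shapes


# Illustrative sizes only
shapes = gat_output_shapes(n_nodes=5, nfeat=10, nhid=8, nheads=3, nclass=4)
for name, shape in shapes.items():
    print(name, shape)
```

This is why out_att is constructed with nhid * nheads input features: it consumes the concatenation of all hidden heads.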

2. Graph attention layer

2.1 Layer formulas in the paper

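The authors inject the graph structure into the attention mechanism through masked attention, which means that attention is computed only between node i and its neighboring nodes j.

Here $j \in \mathcal{N}_i$, where $\mathcal{N}_i$ is the set of all neighbors of node i. To make the attention coefficients easy to compute and comparable across nodes, they are normalized with a softmax over all neighbors j of node i:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$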

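In the experiments, the attention mechanism a is a single-layer feedforward neural network, parameterized by a weight vector $\vec{a}$ and followed by a LeakyReLU nonlinearity with negative slope 0.2. (A quick recap of the ReLU family: ReLU outputs 0 for negative inputs and has slope 1 for positive inputs; LeakyReLU uses a fixed slope for negative inputs and slope 1 for positive inputs; PReLU makes the negative slope learnable while keeping slope 1 for positive inputs; other variants include CReLU, ELU, and SELU.) Written out in full, the attention mechanism is:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_i \,\|\, W\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_i \,\|\, W\vec{h}_k]\right)\right)}$$

This $\alpha_{ij}$ is the attention coefficient we set out to obtain above.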

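The model applies the shared attention mechanism $a(W\vec{h}_i, W\vec{h}_j)$, parameterized by the weight vector $\vec{a}$ and followed by a LeakyReLU activation.

The learnable weights are the matrix $W \in \mathbb{R}^{F' \times F}$ and the vector $\vec{a} \in \mathbb{R}^{2F'}$
Transposition is denoted by $T$
Concatenation is denoted by $\|$
In words: the weight matrix W is applied to each node's input features to produce F' features per node; the transformed features of nodes i and j are concatenated and multiplied by the weight vector; the result goes through LeakyReLU and is then exponentiated, which gives the numerator of the softmax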
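
The computation described above can be sketched in plain Python for a tiny graph. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names, the toy feature vectors, the weight vector, and the neighbor lists (which include each node itself) are all made up for the example.

```python
import math


def leaky_relu(x, negative_slope=0.2):
    # Slope 1 for positive inputs, a fixed small slope for negative inputs
    return x if x > 0 else negative_slope * x


def attention_coefficients(wh, a, neighbors):
    """Compute alpha_ij for every node i and every neighbor j in N_i.

    wh:        list of already-transformed feature vectors W h_i (length F' each)
    a:         attention weight vector of length 2 * F'
    neighbors: dict mapping node i to the list of its neighbor indices
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {}
        for j in nbrs:  # masked attention: only neighbors are scored
            concat = wh[i] + wh[j]  # list concatenation plays the role of ||
            scores[j] = leaky_relu(sum(w * x for w, x in zip(a, concat)))
        # Softmax over the neighborhood (shifted by the max for stability)
        m = max(scores.values())
        exps = {j: math.exp(s - m) for j, s in scores.items()}
        total = sum(exps.values())
        alpha[i] = {j: v / total for j, v in exps.items()}
    return alpha


# Toy data, assumed for illustration: 3 nodes, F' = 2
wh = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.25]
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}

alpha = attention_coefficients(wh, a, neighbors)
for i, row in alpha.items():
    print(i, {j: round(v, 3) for j, v in row.items()})
```

Each row of coefficients sums to 1 over the node's neighborhood, and non-neighbors never receive a coefficient at all.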

2.2 Initialization

At initialization the layer defines the parameters W and a that the model needs, as described in the paper.

    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat

        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.zeros(size=(2*out_features, 1)))
        nn.init.xavier_uniform_(self.a.data, gain=1.414)

        self.leakyrelu = nn.LeakyReLU(self.alpha)
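
As a rough aid for reading the two xavier_uniform_ calls above: to my understanding, Xavier (Glorot) uniform initialization samples from U(-bound, bound) with bound = gain * sqrt(6 / (fan_in + fan_out)). The sketch below only evaluates that bound in plain Python; the helper name and the layer sizes are assumptions chosen for illustration.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Half-width of the uniform range used by Xavier/Glorot initialization
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# Illustrative sizes, not taken from the source
in_features, out_features = 1433, 8

# W has shape (in_features, out_features); a has shape (2 * out_features, 1)
w_bound = xavier_uniform_bound(in_features, out_features, gain=1.414)
a_bound = xavier_uniform_bound(2 * out_features, 1, gain=1.414)
print(w_bound, a_bound)
```

Because the bound depends only on the sum of the two dimensions, the much smaller vector a starts with a wider range of values than the matrix W.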

3. GAT model

3.1 Formulas in the paper

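Extending the output feature computed above with multi-head attention gives:

$$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\Big)$$

Concatenation is denoted by $\|$
The normalized attention coefficients of the k-th attention mechanism are $\alpha_{ij}^{k}$
There are K attention mechanisms in total; lowercase k indexes the k-th of them
The linear transformation of the input features for head k is $W^{k}$
The final output $\vec{h}'$ consists of $KF'$ features per node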
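
A minimal plain-Python sketch of the multi-head formula for a single node follows. It assumes ELU as the nonlinearity $\sigma$ (the choice made for hidden heads in the code discussed in this post); the function names and all numbers are invented for illustration.

```python
import math


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def aggregate(alpha_row, wh):
    """Weighted sum over neighbors: sum_j alpha_ij * (W h_j) for one node i."""
    dim = len(wh[0])
    return [sum(a_ij * wh[j][d] for j, a_ij in alpha_row.items())
            for d in range(dim)]


def multi_head_output(alpha_rows, whs):
    """Concatenate the activated outputs of K heads for one node.

    alpha_rows: one {j: alpha_ij} dict per head
    whs:        one list of transformed features W^k h_j per head
    """
    out = []
    for alpha_row, wh in zip(alpha_rows, whs):
        out.extend(elu(v) for v in aggregate(alpha_row, wh))
    return out


# Toy data, assumed: K = 2 heads, F' = 2, node 0 with neighbors {0, 1}
alpha_rows = [{0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}]
whs = [
    [[1.0, -1.0], [3.0, 1.0]],  # head 1: W^1 h_0, W^1 h_1
    [[4.0, 0.0], [0.0, 4.0]],   # head 2: W^2 h_0, W^2 h_1
]

h0 = multi_head_output(alpha_rows, whs)
print(h0, len(h0))
```

The output vector has K * F' entries, which matches the nhid * nheads input size expected by the layer that follows the concatenation.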

3.2 Forward pass of the graph attention layer

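    def forward(self, input, adj):
        # Linear transform: (N, in_features) x (in_features, out_features) -> (N, out_features)
        h = torch.mm(input, self.W)
        N = h.size()[0]

        # Build every pair [Wh_i || Wh_j], then reshape to (N, N, 2 * out_features)
        a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)
        # Raw scores e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), shape (N, N)
        e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))

        # Masked attention: non-edges get a very negative score, so softmax gives them ~0 weight
        zero_vec = -9e15*torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        # Normalize each row i over its neighbors j
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)
        # Weighted sum of neighbor features: (N, N) x (N, out_features) -> (N, out_features)
        h_prime = torch.matmul(attention, h)

        # With concat=True (hidden heads) ELU is applied here; otherwise the raw sum is returned
        if self.concat:
            return F.elu(h_prime)
        else:
            return h_prime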
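
The masking step in this forward pass (replacing scores of non-edges with -9e15 before the softmax) can be reproduced in plain Python. This is a sketch under assumptions: the function name, the score matrix e, and the adjacency matrix adj are toy values invented for the example.

```python
import math


def masked_softmax_rows(e, adj, fill=-9e15):
    """Row-wise softmax over e, where entries with adj <= 0 are masked out."""
    out = []
    for e_row, adj_row in zip(e, adj):
        # Same idea as torch.where(adj > 0, e, zero_vec)
        masked = [s if a > 0 else fill for s, a in zip(e_row, adj_row)]
        m = max(masked)
        # exp of a hugely negative number underflows to exactly 0.0
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


# Toy scores and adjacency (with self-loops), assumed for illustration
e = [[0.2, 1.0, -0.3],
     [0.5, 0.1, 0.7],
     [-1.0, 0.4, 0.0]]
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]

attention = masked_softmax_rows(e, adj)
for row in attention:
    print([round(v, 3) for v in row])
```

One consequence of this design is worth keeping in mind: a row with no edges at all would have every score equal to the fill value, so the softmax would spread weight uniformly over all nodes instead of producing zeros.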