[Source Code Reproduction] Graph Neural Networks: GAT

鲸可落

Last modified 2023-10-25 17:12:58

Category: Graph Neural Networks. Tags: neural networks, artificial intelligence, deep learning

First published 2023-10-25 17:11:26

Article link: https://blog.csdn.net/qq_44426403/article/details/133893681

I. Paper Overview

Authors: Petar Velickovic, Guillem Cucurull et al.
Paper: GRAPH ATTENTION NETWORKS
Source code: https://github.com/PetarV-/GAT

II. Core Ideas of the Paper

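The two models covered earlier in this column are both spectral-domain methods, whereas the GAT reproduced in this post is a spatial-domain method. Spatial-domain methods design different feature propagation mechanisms that exploit the network topology and aggregate node representations along it. GAT is inspired by the attention mechanism (self-attention): each neighbor of node $i$ matters differently to node $i$, or in other words, each neighbor's features contribute differently to the representation learned for node $i$. The attention mechanism learns a contribution weight for each neighbor and uses these weights to aggregate the representations.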
The input is a set of node features $\mathbf{h} =\{ \overrightarrow{h_1} ,\overrightarrow{h_2},...,\overrightarrow{h_N}\},\overrightarrow{h_i}\in \mathbb{R}^F$, where $N$ is the number of nodes. The output is a new set of node features $\mathbf{h}^{'} =\{ \overrightarrow{h_1^{'}} ,\overrightarrow{h_2^{'}},...,\overrightarrow{h_N^{'}}\},\overrightarrow{h_i^{'}}\in \mathbb{R}^{F'}$.
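To increase the expressive power of the model, the authors first apply a shared linear transformation to the input features, with weight matrix $W \in \mathbb{R}^{F'\times F}$. A shared attention mechanism $a:\mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$ is then used to compute the attention coefficients: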
$e_{ij}=a(W\overrightarrow{h_i},W\overrightarrow{h_j})$
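This indicates the importance of node $j$ to node $i$. To take the graph topology into account, only the first-order neighbors of node $i$, $j \in \mathcal{N_i}$, are considered. To make the coefficients comparable across different nodes, the softmax function is used to normalize them over all neighbors: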
$\alpha_{ij} =softmax_j(e_{ij}) = \frac{exp(e_{ij})}{\sum_{k\in \mathcal{N_i} }exp(e_{ik})}$
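To make the normalization concrete, here is a minimal pure-Python sketch of the softmax over one node's neighbors. The helper name `neighbor_softmax` and the raw scores are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
import math

def neighbor_softmax(scores):
    """Normalize raw attention scores e_ij over the neighbors of one node i.

    `scores` maps a neighbor index j to its raw score e_ij.
    """
    # Subtracting the maximum before exponentiating improves numerical
    # stability and does not change the softmax result.
    m = max(scores.values())
    exps = {j: math.exp(e - m) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Hypothetical raw scores e_ij for a node i with neighbors 1, 2 and 3.
e_i = {1: 2.0, 2: 1.0, 3: 0.0}
alpha_i = neighbor_softmax(e_i)
print(alpha_i)
```

The resulting weights are positive, sum to 1 over the neighborhood, and preserve the ordering of the raw scores.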

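The attention mechanism $a$ is a single-layer feedforward neural network, parameterized by a weight vector $\overrightarrow{a} \in \mathbb{R}^{2F'}$ and followed by the LeakyReLU activation, that is: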
$\alpha_{ij} =softmax_j(e_{ij}) = \frac{exp(LeakyReLU(\overrightarrow{a}^T[W\overrightarrow{h_i}||W\overrightarrow{h_j}]))}{\sum_{k\in \mathcal{N_i} }exp(LeakyReLU(\overrightarrow{a}^T[W\overrightarrow{h_i}||W\overrightarrow{h_k}]))}$
where $||$ denotes concatenation.
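The unnormalized score inside the softmax can be sketched for a single pair of nodes as follows. This is an illustrative pure-Python version that assumes the features have already been multiplied by $W$. The function names and numeric values are hypothetical, and the negative slope of 0.2 matches the default `alpha` used in the layer code later in this post.

```python
def leaky_relu(x, negative_slope=0.2):
    """LeakyReLU: identity for positive inputs, a small slope otherwise."""
    return x if x > 0 else negative_slope * x

def attention_logit(a, wh_i, wh_j, negative_slope=0.2):
    """e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for already-transformed features."""
    concat = list(wh_i) + list(wh_j)  # [Wh_i || Wh_j], length 2F'
    score = sum(a_k * x_k for a_k, x_k in zip(a, concat))
    return leaky_relu(score, negative_slope)

# Hypothetical values with F' = 2, so the vector a has 2F' = 4 entries.
a = [1.0, -1.0, 0.5, 0.5]
wh_i = [1.0, 2.0]
wh_j = [3.0, 1.0]
e_ij = attention_logit(a, wh_i, wh_j)
e_ji = attention_logit(a, wh_j, wh_i)
print(e_ij, e_ji)
```

Because the two halves of $\overrightarrow{a}$ weight the two arguments differently, the score is in general not symmetric: `e_ij` and `e_ji` differ in this example.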
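With the attention coefficients $\alpha_{ij}$ computed, the output feature vector is obtained as a weighted linear combination of the transformed neighbor features: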
$\overrightarrow{h_i^{'}} =\sigma(\sum_{j\in \mathcal{N_i}}\alpha_{ij} W\overrightarrow{h_j})$
where $\sigma$ is a nonlinear activation function.
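As a small worked example of this aggregation step, the sketch below computes $\overrightarrow{h_i^{'}}$ for one node in pure Python. The weight matrix, features, and attention coefficients are made-up values, and ReLU stands in for $\sigma$ as an assumption.

```python
def mat_vec(W, h):
    """Multiply an F' x F matrix (list of rows) by a length-F vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def relu(x):
    return max(0.0, x)

def aggregate(alpha_i, W, features, activation):
    """h_i' = activation(sum_j alpha_ij * W h_j) over the neighbors in alpha_i."""
    total = [0.0] * len(W)
    for j, alpha_ij in alpha_i.items():
        wh_j = mat_vec(W, features[j])
        total = [t + alpha_ij * v for t, v in zip(total, wh_j)]
    return [activation(t) for t in total]

# Hypothetical setup: F = 3 input features, F' = 2 output features.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, -1.0]]
features = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.0, 0.0, 1.0]}
alpha_i = {1: 0.5, 2: 0.25, 3: 0.25}  # attention weights, summing to 1
h_i_new = aggregate(alpha_i, W, features, relu)
print(h_i_new)
```

Here the weighted sum before activation is `[0.75, 0.0]`, so the ReLU leaves it unchanged.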
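The authors found that extending the mechanism to multi-head attention stabilizes the learning process of self-attention, similar to Vaswani et al. (2017). With $K$ denoting the number of attention heads: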
$\overrightarrow{h_i^{'}} =||_{k=1}^K \sigma(\sum_{j\in \mathcal{N_i}}\alpha_{ij}^k W^k\overrightarrow{h_j})$
where $||$ denotes concatenation.
The final layer averages the heads instead of concatenating them:
$\overrightarrow{h_i^{'}} = \sigma(\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in \mathcal{N_i}}\alpha_{ij}^k W^k\overrightarrow{h_j})$
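The difference between the two ways of combining heads can be illustrated with a short pure-Python sketch. It assumes the per-head vectors have already been computed: activated outputs in the concatenation case, and pre-activation sums in the averaging case, where $\sigma$ is applied after the mean. The function names and values are hypothetical.

```python
def concat_heads(head_outputs):
    """Hidden layers: concatenate K head outputs into one vector of length K * F'."""
    return [x for head in head_outputs for x in head]

def average_heads(head_outputs):
    """Final layer: average K head outputs element-wise, keeping length F'."""
    k = len(head_outputs)
    return [sum(vals) / k for vals in zip(*head_outputs)]

# Hypothetical outputs of K = 3 heads for one node, each with F' = 2.
heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
concatenated = concat_heads(heads)
averaged = average_heads(heads)
print(concatenated, averaged)
```

Concatenation multiplies the feature dimension by $K$, which is why a layer that follows it must accept $K F'$ input features, while averaging keeps the dimension at $F'$.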
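The aggregation process of multi-head attention is illustrated in the following figure:
[Figure: multi-head attention aggregation (image not available)]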

III. Source Code Reproduction

Only the GAT model code is listed here; for the complete code, see the GAT source code.

1. A single attention layer (single head)

import torch 
from torch import nn
from torch.nn import Module,Parameter
from torch.nn import functional as F

class GraphAttentionLayer(Module):
    def __init__(self,input_dim,output_dim,dropout,concat=True,alpha=0.2):
        super(GraphAttentionLayer,self).__init__()
        # input and output feature dimensions
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.concat = concat
        # alpha is the negative slope of LeakyReLU
        self.leakyRelu = nn.LeakyReLU(alpha)
        self.dropout = dropout
        # learnable parameters: W (input_dim x output_dim) and a (2*output_dim x 1)
        self.W = Parameter(torch.empty(size=(input_dim,output_dim)))
        self.a = Parameter(torch.empty(size=(2*output_dim,1)))
        #self.W = Parameter(torch.FloatTensor(input_dim,output_dim))
        #self.a = Parameter(torch.FloatTensor(2*output_dim,1))
        self.init_parameter()

    def init_parameter(self):
        # Xavier uniform initialization of both parameters
        nn.init.xavier_uniform_(self.W.data,gain=1.414)
        nn.init.xavier_uniform_(self.a.data,gain=1.414)
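    # compute the N x N matrix of attention logits e
    def concatenation(self,wh):
        # wh : N * output_dim
        # a : 2*output_dim * 1
        # wh1 : N*1
        # wh2 : N*1
        wh1 = torch.matmul(wh,self.a[:self.output_dim,:])
        wh2 = torch.matmul(wh,self.a[self.output_dim:,:])
        # a^T [wh_i || wh_j] = a1^T wh_i + a2^T wh_j, so the concatenation
        # is realized by broadcasting: (N*1) + (1*N) -> N*N
        e = wh1 + wh2.T
        return self.leakyRelu(e)
        # torch.mm(a, b): matrix product of a and b, for 2-D matrices only
        # torch.matmul: also handles other shapes (1-D @ 2-D, 2-D @ 1-D, batched)
        # torch.mul(a, b): element-wise product; the shapes of a and b must be broadcastable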
    def forward(self,h,adj):
        # h : N*input_dim
        # W : input_dim * output_dim
        # wh : N * output_dim
        #print(h.shape)
        #print(self.W.shape)
        wh = torch.mm(h,self.W)
        # attention logits for every pair of nodes
        # e : N*N
        e = self.concatenation(wh)
        # non-neighbors of node i get a very large negative logit,
        # so softmax assigns them a weight of (practically) 0
        zero_inf = -9e15*torch.ones_like(e)
        # keep the logits of node i's neighbors only
        Neighbor = torch.where(adj>0,e,zero_inf)
        # row-wise softmax
        # N*N
        attention = F.softmax(Neighbor,dim=1)
        attention = F.dropout(attention,self.dropout,training=self.training)
        # head : N * output_dim
        head = torch.matmul(attention,wh)
        if self.concat:
            # the paper applies ELU here (F.elu(head)); this reproduction uses ReLU
            return F.relu(head)
        else:
            return head
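
The `concatenation` method never builds the concatenated vectors explicitly. It relies on the identity $\overrightarrow{a}^T[W\overrightarrow{h_i}||W\overrightarrow{h_j}] = \overrightarrow{a_1}^T W\overrightarrow{h_i} + \overrightarrow{a_2}^T W\overrightarrow{h_j}$, where $\overrightarrow{a_1}$ and $\overrightarrow{a_2}$ are the two halves of $\overrightarrow{a}$. The following pure-Python sketch checks that both forms give the same logits before LeakyReLU. The helper names and numbers are hypothetical, and plain lists stand in for tensors.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def logits_by_concat(wh, a):
    """Direct form: e[i][j] = a^T [wh_i || wh_j]."""
    return [[dot(a, wh_i + wh_j) for wh_j in wh] for wh_i in wh]

def logits_by_broadcast(wh, a):
    """Split form: e[i][j] = a1^T wh_i + a2^T wh_j."""
    d = len(wh[0])
    a1, a2 = a[:d], a[d:]
    wh1 = [dot(a1, row) for row in wh]  # plays the role of the N x 1 column
    wh2 = [dot(a2, row) for row in wh]  # plays the role of the 1 x N row
    return [[s_i + s_j for s_j in wh2] for s_i in wh1]

# Hypothetical transformed features for N = 3 nodes with F' = 2.
wh = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
a = [0.5, -1.0, 2.0, 1.0]
direct = logits_by_concat(wh, a)
split = logits_by_broadcast(wh, a)
print(direct)
print(split)
```

The split form needs only two matrix-vector products and one broadcast addition, instead of materializing $N^2$ concatenated vectors of length $2F'$.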

2. The complete GAT model (a two-layer GAT only)

import torch
from torch.nn import Module,Parameter
from torch.nn import functional as F


from layers import GraphAttentionLayer


class GAT(Module):
    def __init__(self,input_dim,hid_dim,output_dim,num_h,dropout,alpha):
        super(GAT,self).__init__()
        # two-layer architecture
        self.dropout = dropout
        # first layer: num_h attention heads whose outputs are concatenated
        self.MultiHeadAttention = [GraphAttentionLayer(input_dim,hid_dim,dropout,concat=True,alpha=alpha) for _ in range(num_h)]
        for i ,attention in enumerate(self.MultiHeadAttention):
            # register each head so that its parameters are tracked by the module
            self.add_module(f"attention_{i}",attention)
        # second layer: a single output head with no activation inside the layer
        self.last_layer = GraphAttentionLayer(hid_dim*num_h,output_dim,dropout,concat=False,alpha=alpha)


    def forward(self,feature,adj):
        feature = F.dropout(feature,self.dropout,training=self.training)
        # concatenate the head outputs: N * (hid_dim*num_h)
        output = torch.cat([attention(feature,adj) for attention in self.MultiHeadAttention],dim=1)
        output = F.dropout(output,self.dropout,training=self.training)
        output = self.last_layer(output,adj)
        #output = F.relu(output)
        # log-probabilities over the classes for each node
        return F.log_softmax(output,dim=1)
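
The layers above select neighbors with `adj>0`, so `adj` is expected to be a dense N x N matrix. In the paper the neighborhood $\mathcal{N_i}$ includes node $i$ itself, which corresponds to a nonzero diagonal. Below is a minimal sketch of building such a matrix from an edge list in pure Python. The helper `dense_adjacency` is hypothetical and not part of the original code, and the graph is assumed to be undirected.

```python
def dense_adjacency(num_nodes, edges, add_self_loops=True):
    """Build a dense 0/1 adjacency matrix (list of rows) from an edge list."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0  # assumption: the graph is undirected
    if add_self_loops:
        # include each node in its own neighborhood
        for i in range(num_nodes):
            adj[i][i] = 1.0
    return adj

# Hypothetical graph with 3 nodes and a single edge between nodes 0 and 1.
adj = dense_adjacency(3, [(0, 1)])
for row in adj:
    print(row)
```

Such a nested list can be converted with `torch.tensor(adj)` before being passed to the model's `forward`.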