GAT Study Notes & PyTorch Implementation

This post is a continuation of my earlier post 图神经网络GNN/GCN自学笔记&基础代码实现 (CSDN blog).

Graph Attention Networks (GAT)

How GAT works

Like GCN, GAT trains by aggregating information from a node's neighbors, but with one key difference: GAT introduces an attention mechanism into the computation.

For each node i, a score is computed between i and each of its neighbors in turn, and a shared weight matrix W is introduced.

Before the score e_ij is computed, the shared matrix W applies a linear transformation to h_i and h_j, lifting the node features into a higher-dimensional space (a common form of feature augmentation).

The correlation between nodes i and j is thus obtained through the learnable weight matrix W and the attention mapping a.
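In the standard GAT formulation (Veličković et al., 2018), this unnormalized score between node i and a neighbor j is

e_{ij} = \mathrm{LeakyReLU}\left( \mathbf{a}^{\top} \left[ W h_i \,\Vert\, W h_j \right] \right)

where \Vert denotes concatenation, W is the shared linear map, and \mathbf{a} is the learnable attention vector (self.W and self.a in the code below).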

Next, to obtain attention coefficients that are comparable across all of a node's neighbors, the raw scores are simply normalized with a softmax layer.
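Written out, the normalization runs over node i's neighborhood \mathcal{N}_i:

\alpha_{ij} = \mathrm{softmax}_j\!\left(e_{ij}\right) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}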

Using these attention coefficients, a weighted sum of the neighbors' (transformed) features gives the new feature of each node i.
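For a single attention head the update of node i is therefore

h_i' = \sigma\!\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j \right)

with \sigma a nonlinearity (ELU in the implementation below).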

The σ on the right-hand side of this equation is the activation applied outside the parentheses, i.e. to the whole weighted sum;

Finally, multi-head attention is used to improve the results further (the sum over heads k = 1 to K inside the parentheses);
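With K heads, the hidden layers concatenate the per-head outputs, while the paper's final layer averages them before the activation:

h_i' = \big\Vert_{k=1}^{K} \sigma\!\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j \right)
\qquad\text{and, for the output layer,}\qquad
h_i' = \sigma\!\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j \right)

(The implementation below takes a slightly simpler route: it concatenates the hidden-layer heads and then applies a single output-layer attention head.)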

Differences from GCN & a deeper look:

When a graph convolutional network (GCN) computes a node's representation, it does so from that node's neighbors. GCN is a semi-supervised graph neural network model that exploits the adjacency structure of the graph: for each node it gathers the information of the adjacent nodes and then fuses the aggregated information with the node's own features, so that neighbor information updates every node's representation. Because the whole procedure resembles a convolution operation, it is called graph convolution. GCN draws on each node's neighbors at every layer, so stacking layers captures progressively richer structural information.

We can see that GCN and GAT work the same way at heart: both aggregate neighbor features onto the central node. The difference is that GCN weights neighbors with the normalized (Laplacian-style) adjacency, while GAT weights them with attention coefficients. In practice GAT tends to perform better, because the correlations between node features are woven into the model more directly.
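For reference, a GCN layer (Kipf & Welling) propagates features with weights that are fixed by the graph structure:

H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)} W^{(l)} \right), \qquad \tilde{A} = A + I

so the contribution of neighbor j to node i is always 1/\sqrt{\tilde{d}_i \tilde{d}_j}; in GAT that fixed coefficient is replaced by the learned \alpha_{ij}.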

Unlike GCN, a graph attention network (GAT) does not weight its neighbors with fixed, structure-determined coefficients. Instead, the attention mechanism assigns each neighboring node a learned weight that expresses how important that neighbor is to the central node. (In principle attention could be computed between every pair of nodes; in the standard formulation, and in the implementation below, it is masked so that each node attends only to its neighbors.)

Concretely, GAT updates each node by taking a weighted sum of its neighbors' transformed features, with the weights given by the attention coefficients. These coefficients are computed from a learned compatibility score between pairs of nodes and are then normalized with a softmax, so that the weights over each neighborhood sum to 1. Every edge therefore carries its own attention weight, which lets GAT learn how differently important individual neighbors are and exploit the relationships between nodes more precisely.

As a result, GAT is a more flexible and expressive graph neural network than GCN and is better at capturing the nuanced relationships between nodes.

Implementing GAT with PyTorch

Dataset: Cora (2,708 papers with 1,433 bag-of-words features each, grouped into 7 classes)

Building the GAT model

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat

        self.W = nn.Parameter(torch.empty(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.empty(size=(2*out_features, 1))) # experiment: replace the "1" with a hyperparameter t and compare
        nn.init.xavier_uniform_(self.a.data, gain=1.414)
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, h, adj):
        Wh = torch.mm(h, self.W) # h.shape:(N, in_features), Wh.shape:(N, out_features)
        e = self._prepare_attentional_mechanism_input(Wh) # pairwise scores, built via broadcasting (see helper below)
        zero_vec = -9e15*torch.ones_like(e)
        attention = torch.where(adj>0, e, zero_vec) # mask non-neighbors with a large negative value so their softmax weight is ~0
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, Wh)
        if self.concat:
            return F.elu(h_prime)
        else:
            return h_prime

    def _prepare_attentional_mechanism_input(self, Wh):
        # Wh.shape (N, out_feature)
        # self.a.shape (2 * out_feature, 1)
        # Wh1&2.shape (N, 1)
        # e.shape (N, N)
        Wh1 = torch.matmul(Wh, self.a[:self.out_features, :])
        Wh2 = torch.matmul(Wh, self.a[self.out_features:, :])
        # a[:self.out_features, :] is applied to the central node i
        # a[self.out_features:, :] is applied to the nodes j in i's neighborhood
        e = Wh1 * Wh2.T # broadcast; note: the reference implementation uses "+" here -- worth comparing the two
        # e is the N x N matrix of pairwise relevance scores between nodes
        return self.leakyrelu(e)

    def __repr__(self):
        return self.__class__.__name__+"("+str(self.in_features)+"->"+str(self.out_features)+")"

    
class GAT(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout, alpha, nheads):
        super(GAT, self).__init__()
        self.dropout = dropout
        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True) for _ in range(nheads)]
        for i, attention in enumerate(self.attentions):
            self.add_module("attention_{}".format(i), attention)
        self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)
    def forward(self, x, adj):
        x = F.dropout(x, self.dropout, training = self.training)
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1) # each head outputs nhid (=8) dims; concatenating 8 heads gives 64
        x = F.dropout(x, self.dropout, training=self.training)
        x = F.elu(self.out_att(x, adj)) # out_att maps the 64-dim input to nclass (=7) outputs, one per class
        return F.log_softmax(x, dim=1)
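
As a quick sanity check of the tensor shapes, here is a minimal sketch that pushes random data through the model defined above. The hyperparameters (8 hidden units, 8 heads, 7 classes, 1,433 input features) match the Cora settings used below, but the graph and features are random, so this is only a shape test, not a meaningful forward pass.

import torch

N, nfeat, nhid, nclass, nheads = 100, 1433, 8, 7, 8
x = torch.randn(N, nfeat)
adj = (torch.rand(N, N) < 0.05).float()           # random adjacency, purely for the shape test
adj = ((adj + adj.T) > 0).float() + torch.eye(N)  # symmetrize and add self-loops

model = GAT(nfeat=nfeat, nhid=nhid, nclass=nclass, dropout=0.6, alpha=0.2, nheads=nheads)
out = model(x, adj)
print(out.shape)  # expected: torch.Size([100, 7]) -- per-node log-probabilities over the 7 classes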

Data processing

import torch
import numpy as np
import scipy.sparse as sp

def encode_onehot(labels):
    classes = sorted(list(set(labels)))
    classes_dict = {c:np.identity(len(classes))[i, :] for i, c in enumerate(classes)}
    labels_onehot = np.array(list(map(classes_dict.get, labels)), dtype=np.int32)
    return labels_onehot

def load_data(path="./cora/", dataset="cora"):
    print("Loading {} dataset...".format(dataset))
    idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset), dtype=np.dtype(str))
    features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)
    labels = encode_onehot(idx_features_labels[:, -1])

    idx = np.array(idx_features_labels[:, 0], dtype=np.int32)
    idx_map = {j:i for i, j in enumerate(idx)}
    edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset), dtype=np.int32)
    edges = np.array(list(map(idx_map.get, edges_unordered.flatten())), dtype=np.int32).reshape(edges_unordered.shape)
    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)
    features = normalize_features(features)
    adj = normalize_adj(adj+sp.eye(adj.shape[0]))

    idx_train = range(140)
    idx_val = range(200, 500)
    idx_test = range(500, 1500)
    adj = torch.FloatTensor(np.array(adj.todense())) # unlike the GCN version, the adjacency is converted to a dense tensor here
    features = torch.FloatTensor(np.array(features.todense()))
    labels = torch.LongTensor(np.where(labels)[1])
    idx_train = torch.LongTensor(idx_train) # this conversion and the two lines below could arguably be omitted
    idx_val = torch.LongTensor(idx_val)
    idx_test = torch.LongTensor(idx_test)
    return adj, features, labels, idx_train, idx_val, idx_test

def normalize_features(mx):
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx

def normalize_adj(mx):
    rowsum = np.array(mx.sum(1))
    r_inv_sqrt = np.power(rowsum, -0.5).flatten()
    r_inv_sqrt[np.isinf(r_inv_sqrt)] = 0.
    r_mat_inv_sqrt = sp.diags(r_inv_sqrt)
    # symmetric normalization D^(-1/2) (A + I) D^(-1/2); use the sparse matrices' own .dot,
    # since np.dot does not handle scipy sparse matrices reliably
    return r_mat_inv_sqrt.dot(mx).dot(r_mat_inv_sqrt)

def sparse_mx_to_torch_sparse_tensor(sparse_mx):
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
    values = torch.from_numpy(sparse_mx.data)
    shape = sparse_mx.shape
    return torch.sparse.FloatTensor(indices, values, shape)
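
A minimal usage sketch; it assumes the raw Cora files cora.content and cora.cites sit under ./cora/, as load_data expects:

adj, features, labels, idx_train, idx_val, idx_test = load_data(path="./cora/", dataset="cora")
print(adj.shape)       # torch.Size([2708, 2708]): dense, symmetrically normalized, with self-loops
print(features.shape)  # torch.Size([2708, 1433]): row-normalized bag-of-words features
print(labels.shape)    # torch.Size([2708]): class indices in the range 0-6
print(len(idx_train), len(idx_val), len(idx_test))  # 140 300 1000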

Training

import os
import glob
import time
import random
import argparse
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from GAT_model import GAT
from load_data import load_data

def train(epoch, model, optimizer, features, adj, labels, idx_train, idx_val, fastmode):
    t = time.time()
    model.train()
    optimizer.zero_grad()
    output = model(features, adj)
    loss_train = F.nll_loss(output[idx_train], labels[idx_train])
    acc_train = accuracy(output[idx_train], labels[idx_train])
    loss_train.backward()
    optimizer.step()

    if not fastmode:
        model.eval()
        output = model(features, adj)
    loss_val = F.nll_loss(output[idx_val], labels[idx_val])
    acc_val = accuracy(output[idx_val], labels[idx_val])
    print('Epoch: {:04d}'.format(epoch+1),
          'loss_train: {:.4f}'.format(loss_train.data.item()),
          'acc_train: {:.4f}'.format(acc_train.data.item()),
          'loss_val: {:.4f}'.format(loss_val.data.item()),
          'acc_val: {:.4f}'.format(acc_val.data.item()),
          'time: {:.4f}s'.format(time.time() - t))
    return loss_val.data.item()

def test(model, features, adj, labels, idx_test):
    model.eval()
    output = model(features, adj)
    loss_test = F.nll_loss(output[idx_test], labels[idx_test])
    acc_test = accuracy(output[idx_test], labels[idx_test])
    print("Test set results:",
          "loss= {:.4f}".format(loss_test.item()),
          "accuracy= {:.4f}".format(acc_test.item()))

def accuracy(output, labels):
    preds = output.max(1)[1].type_as(labels).reshape(labels.shape)
    correct = preds.eq(labels).double()
    correct = correct.sum()
    return correct / len(labels)

def cora_train():
    parser = argparse.ArgumentParser()
    parser.add_argument("--no_cuda", action="store_true", default=False, help="Disables CUDA training.")
    parser.add_argument("--fastmode", action="store_true", default=False, help="Validate during training pass.")
    parser.add_argument("--seed", type=int, default=42, help="Random seed.")
    parser.add_argument("--epochs", type=int, default=100, help="Number of epochs to train.")
    parser.add_argument("--lr", type=float, default=0.005, help="Initial learning rate.")
    parser.add_argument("--weight_decay", type=float, default=5e-4, help="Weight decay (L2 loss on parameters).")
    parser.add_argument("--hidden", type=int, default=8, help="Number of hidden units.")
    parser.add_argument("--nb_heads", type=int, default=8, help="Number of head attentions.")
    parser.add_argument("--dropout", type=float, default=0.6, help="Dropout rate (1 - keep probability).")
    parser.add_argument("--alpha", type=float, default=0.2, help="Alpha for the leaky_relu.")
    parser.add_argument("--patience", type=int, default=100, help="Patience")
    args = parser.parse_known_args()[0]
    args.cuda = not args.no_cuda and torch.cuda.is_available()
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.cuda:
        torch.cuda.manual_seed(args.seed)

    # the block below is identical to the GCN training script
    adj, features, labels, idx_train, idx_val, idx_test = load_data()
    model = GAT(nfeat=features.shape[1], nhid=args.hidden, nclass=int(labels.max()) + 1, dropout=args.dropout,
                nheads=args.nb_heads, alpha=args.alpha)
    optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
    if args.cuda:
        model.cuda()
        features = features.cuda()
        adj = adj.cuda()
        labels = labels.cuda()
        idx_train = idx_train.cuda()
        idx_val = idx_val.cuda()
        idx_test = idx_test.cuda()


    t_total = time.time()
    loss_values = []
    bad_counter = 0
    best = args.epochs + 1
    best_epoch = 0
    for epoch in range(args.epochs):
        loss_values.append(train(epoch=epoch, model=model, optimizer=optimizer, features=features, adj=adj, labels=labels, idx_train=idx_train, idx_val=idx_val, fastmode=args.fastmode))
        torch.save(model.state_dict(), "GAT.{}.pkl".format(epoch))
        if loss_values[-1] < best:
            best = loss_values[-1]
            best_epoch = epoch
            bad_counter = 0
        else:
            bad_counter += 1
        if bad_counter == args.patience:
            break

        files = glob.glob("*.pkl")
        for file in files:
            epoch_nb = int(file.split(".")[1])
            if epoch_nb < best_epoch:
                os.remove(file)

    files = glob.glob('*.pkl')
    for file in files:
        epoch_nb = int(file.split('.')[1])
        if epoch_nb > best_epoch:
            os.remove(file)

    # restore the best checkpoint (lowest validation loss) before evaluating on the test set
    print("Loading epoch {} checkpoint".format(best_epoch))
    model.load_state_dict(torch.load("GAT.{}.pkl".format(best_epoch)))
    test(model=model, features=features, adj=adj, labels=labels, idx_test=idx_test)

if __name__ =="__main__":
    cora_train()
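
Because the script parses its arguments with parse_known_args, it also runs inside a notebook without choking on Jupyter's extra command-line arguments, which is how the results in the next section were produced. A minimal sketch (the hyperparameters are just the parser defaults):

# in a Jupyter cell, with GAT_model.py and load_data.py on the path
# and the training code above imported or pasted in:
cora_train()  # trains with the default arguments, keeps the best checkpoint, then reports test accuracy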

Sample run results (executed in Jupyter)

References

向往的GAT(图注意力网络的原理、实现及计算复杂度) -- an article on the principles, implementation, and computational complexity of GAT

For an introduction to the Cora dataset: Cora数据集介绍 - 知乎 (zhihu.com)

The code is adapted from ZhiyangLiang/GraphSAGE-GCN-GAT (github.com)
