PyTorch | Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) on the Cora dataset

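This post implements GCN and GAT models on the Cora dataset and shows how both perform through training and visualization. During training, the loss and accuracy are recorded and plotted. t-SNE then projects the model outputs into two dimensions, with one color per class, so the classification results can be inspected visually. Finally, the classification performance of the two models is compared.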


This post applies two-layer GCN and GAT models to the Cora dataset and visualizes the results.

1. Imported packages

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, GATConv
from sklearn.manifold import TSNE

2. Data processing

path = "E:\\download\\cora\\cora\\"
cites = path + "cora.cites"
content = path + "cora.content"

# Index dict: maps each original paper ID to a consecutive index starting from 0
index_dict = dict()
# Label dict: maps each string label to an integer class ID
label_to_index = dict()

features = []
labels = []
edge_index = []

with open(content,"r") as f:
    nodes = f.readlines()
    for node in nodes:
        node_info = node.split()
        index_dict[int(node_info[0])] = len(index_dict)
        features.append([int(i) for i in node_info[1:-1]])
        
        label_str = node_info[-1]
        if(label_str not in label_to_index.keys()):
            label_to_index[label_str] = len(label_to_index)
        labels.append(label_to_index[label_str])

with open(cites,"r") as f:
    edges = f.readlines()
    for edge in edges:
        start, end = edge.split()
        # Edges are treated as undirected during training, but citation edges are directed, so each edge is added in both directions
        edge_index.append([index_dict[int(start)],index_dict[int(end)]])
        edge_index.append([index_dict[int(end)],index_dict[int(start)]])

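# Convert to tensors
labels = torch.LongTensor(labels)
features = torch.FloatTensor(features)
# Row-normalize the features: divide each row by its L1 norm
features = torch.nn.functional.normalize(features, p=1, dim=1)
edge_index = torch.LongTensor(edge_index)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # this machine has a single GPU

torch.manual_seed(1)  # fix the random split so that runs are comparable
mask = torch.randperm(len(index_dict))  # shuffle the node indices
# Note: these "masks" are index tensors, not boolean masks
train_mask = mask[:140]      # 140 training nodes
val_mask = mask[140:640]     # 500 validation nodes (never used by the training loops)
test_mask = mask[1708:2708]  # the last 1000 shuffled nodes form the test set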


cora = Data(x = features, edge_index = edge_index.t().contiguous(), y = labels).to(device)
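The parsing steps above can be checked without the real files. Below is a minimal, framework-free sketch that mirrors the same logic on a tiny made-up sample in the Cora text layout: each line of `cora.content` is `<paper_id> <word features...> <label>`, and each line of `cora.cites` is `<cited_id> <citing_id>`. The function name `load_cora_like`, the toy IDs and the toy rows are hypothetical, and `io.StringIO` stands in for `open(...)`. It also applies the same L1 row normalization as `normalize(p=1, dim=1)`. Unlike the original, it stores edges in a set, which drops a pair that appears twice. That deduplication is my own design choice and is not something the original code does.

```python
import io

# Hypothetical toy sample in the Cora file layout
TOY_CONTENT = """101 0 1 0 1 Neural_Networks
202 1 0 0 0 Rule_Learning
303 0 0 1 1 Neural_Networks
"""
TOY_CITES = """101 202
303 101
303 101
"""

def load_cora_like(content_file, cites_file):
    """Parse Cora-style text into row-normalized features, labels and an undirected edge list."""
    index_dict, label_to_index = {}, {}
    features, labels = [], []
    for line in content_file:
        parts = line.split()
        if not parts:
            continue
        index_dict[int(parts[0])] = len(index_dict)           # paper ID -> 0-based row
        row = [float(v) for v in parts[1:-1]]
        total = sum(abs(v) for v in row)
        features.append([v / total if total else 0.0 for v in row])  # L1 row normalization
        label = parts[-1]
        label_to_index.setdefault(label, len(label_to_index))  # new label -> next class ID
        labels.append(label_to_index[label])

    edges = set()
    for line in cites_file:
        parts = line.split()
        if not parts:
            continue
        u, v = index_dict[int(parts[0])], index_dict[int(parts[1])]
        edges.add((u, v))  # keep both directions so the graph is undirected
        edges.add((v, u))
    return features, labels, sorted(edges), label_to_index

features, labels, edge_index, label_to_index = load_cora_like(
    io.StringIO(TOY_CONTENT), io.StringIO(TOY_CITES))
print(labels)      # [0, 1, 0]
print(edge_index)  # [(0, 1), (0, 2), (1, 0), (2, 0)]
```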

3. GCN module

class GCNNet(torch.nn.Module):
    def __init__(self, num_feature, num_label):
        super(GCNNet,self).__init__()
        self.GCN1 = GCNConv(num_feature, 16)
        self.GCN2 = GCNConv(16, num_label)  
        self.dropout = torch.nn.Dropout(p=0.5)
        
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        
        x = self.GCN1(x, edge_index)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.GCN2(x, edge_index)
        
        return F.log_softmax(x, dim=1)
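With its default settings, `GCNConv` adds self-loops and applies the symmetric normalization from Kipf and Welling, so each layer computes D^-1/2 (A + I) D^-1/2 X W plus a bias, where D is the degree matrix of A + I. The NumPy sketch below writes that propagation out with a dense matrix on a three-node toy graph. The function name `gcn_layer`, the toy graph and the all-ones weight matrix are illustrative assumptions. The bias is omitted, and PyG itself uses sparse message passing rather than a dense adjacency matrix.

```python
import numpy as np

def gcn_layer(x, edge_index, weight):
    """Dense GCN propagation: D^-1/2 (A + I) D^-1/2 X W (bias omitted)."""
    n = x.shape[0]
    a = np.zeros((n, n))
    for src, dst in edge_index:
        a[dst, src] = 1.0                 # a message flows from src to dst
    a_hat = a + np.eye(n)                 # add self-loops
    deg = a_hat.sum(axis=1)               # degrees, counting the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return a_norm @ x @ weight

# Toy path graph 0 - 1 - 2, stored in both directions like the Cora edge list
edge_index = [(0, 1), (1, 0), (1, 2), (2, 1)]
x = np.eye(3)             # one-hot node features
weight = np.ones((3, 2))  # hypothetical weight matrix
h = gcn_layer(x, edge_index, weight)
print(h.shape)            # (3, 2)
```

With one-hot features and an all-ones weight matrix, each output row is the row sum of the normalized adjacency. For example, node 0 gets 1/2 + 1/sqrt(6), which makes the degree weighting easy to check by hand.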


4. GAT module

class GATNet(torch.nn.Module):
    def __init__(self, num_feature, num_label):
        super(GATNet,self).__init__()
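        # 8 attention heads with 8 features each; concat=True joins them into 8*8 = 64 features
        self.GAT1 = GATConv(num_feature, 8, heads = 8, concat = True, dropout = 0.6)
        self.GAT2 = GATConv(8*8, num_label, dropout = 0.6)  # a single output head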
        
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        
        x = self.GAT1(x, edge_index)
        x = F.relu(x)
        x = self.GAT2(x, edge_index)
        
        return F.log_softmax(x, dim=1)
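`GATConv` replaces GCN's fixed normalization with learned attention weights. For one head, the score between a node i and a neighbor j is LeakyReLU(a_dst · W h_i + a_src · W h_j). These scores are normalized with a softmax over i's neighbors, and i itself is included because self-loops are added by default. With heads=8 and concat=True, the first layer concatenates eight 8-dimensional outputs, which is why the second layer takes 8*8 = 64 inputs.

The sketch below computes single-head coefficients in NumPy. The function `gat_attention` and the random toy inputs are illustrative assumptions. The LeakyReLU slope of 0.2 is assumed to match the GATConv default. The sketch also leaves out the attention dropout (0.6 in the code above) and the bias.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    """LeakyReLU with a negative slope of 0.2."""
    return np.where(z > 0, z, slope * z)

def gat_attention(x, edge_index, weight, att_src, att_dst):
    """Single-head GAT coefficients alpha[i, j] and the aggregated features alpha @ (x W)."""
    n = x.shape[0]
    h = x @ weight                            # W h for every node
    neighbors = {i: {i} for i in range(n)}    # start with self-loops
    for src, dst in edge_index:
        neighbors[dst].add(src)               # messages flow src -> dst
    alpha = np.zeros((n, n))
    for i in range(n):
        js = sorted(neighbors[i])
        scores = leaky_relu(h[i] @ att_dst + h[js] @ att_src)
        scores = np.exp(scores - scores.max())  # numerically stable softmax
        alpha[i, js] = scores / scores.sum()
    return alpha, alpha @ h

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                   # 3 nodes, 4 input features
edge_index = [(0, 1), (1, 0), (1, 2), (2, 1)]
weight = rng.normal(size=(4, 8))              # one head with 8 output features
alpha, out = gat_attention(x, edge_index, weight,
                           rng.normal(size=8), rng.normal(size=8))
print(alpha.sum(axis=1))                      # each row sums to 1
```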


5. Training, plotting and display

def gcn_apply():
    model = GCNNet(features.shape[1], len(label_to_index)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    acc_record = []
    loss_record = []

    for epoch in range(500):
        optimizer.zero_grad()
        out = model(cora)
        loss = F.nll_loss(out[train_mask], cora.y[train_mask])
        loss_record.append(loss.item())
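        print('epoch: %d loss: %.4f' % (epoch, loss.item()))
        loss.backward()
        optimizer.step()

        # if((epoch + 1)% 10 == 0):
        # Evaluate on the test nodes; torch.no_grad() skips building the autograd graph
        model.eval()
        with torch.no_grad():
            _, pred = model(cora).max(dim=1)
        correct = int(pred[test_mask].eq(cora.y[test_mask]).sum().item())
        acc = correct / len(test_mask)
        acc_record.append(acc)
        print('Accuracy: {:.4f}'.format(acc))
        model.train()

    # The last `out` of the loop was computed in training mode (with dropout),
    # so recompute the outputs in eval mode before projecting them with t-SNE
    model.eval()
    with torch.no_grad():
        out = model(cora)
    ts = TSNE(n_components=2)
    ts.fit_transform(out[test_mask].to('cpu').numpy())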

    x = ts.embedding_
    y = cora.y[test_mask].to('cpu').detach().numpy()

    xi = []
    for i in range(7):
        xi.append(x[np.where(y==i)])

    colors = ['mediumblue','green','red','yellow','cyan','mediumvioletred','mediumspringgreen']
    plt.figure(figsize=(8, 6))
    for i in range(7):
        plt.scatter(xi[i][:,0],xi[i][:,1],s=30,color=colors[i],marker='+',alpha=1)
    plt.show()

    plt.figure(figsize=(14, 9))
    x = range(500)

    plt.plot(x,acc_record,label='acc')
    plt.plot(x,loss_record,label='loss')
    plt.legend()
    plt.grid()
    plt.show()
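The loop above reports test accuracy at every epoch, and `val_mask` is built but never used. A more conventional protocol picks the epoch (or early-stopping point) by validation accuracy and reports test accuracy only at that epoch. The sketch below shows that selection on logged predictions. `masked_accuracy`, `select_by_validation` and the toy arrays are hypothetical. In the script, each epoch's `pred` and the index tensors `val_mask` and `test_mask` would first be converted with `.cpu().numpy()`. Since the masks are index tensors rather than boolean masks, they index NumPy arrays the same way.

```python
import numpy as np

def masked_accuracy(pred, y, idx):
    """Fraction of correct predictions on the nodes listed in idx."""
    return float((pred[idx] == y[idx]).mean())

def select_by_validation(pred_history, y, val_idx, test_idx):
    """Return (best_epoch, val_acc, test_acc), choosing the epoch by validation accuracy only."""
    best = (-1, -1.0, 0.0)
    for epoch, pred in enumerate(pred_history):
        val_acc = masked_accuracy(pred, y, val_idx)
        if val_acc > best[1]:
            best = (epoch, val_acc, masked_accuracy(pred, y, test_idx))
    return best

# Toy example: 6 nodes, predictions logged for three epochs
y = np.array([0, 1, 2, 0, 1, 2])
val_idx = np.array([0, 1, 2])
test_idx = np.array([3, 4, 5])
history = [
    np.array([0, 0, 0, 0, 0, 0]),
    np.array([0, 1, 2, 0, 1, 0]),
    np.array([0, 1, 0, 0, 1, 2]),
]
print(select_by_validation(history, y, val_idx, test_idx))
```

Selecting by validation keeps the test set untouched until the end. In the toy data, epoch 2 scores higher on the test nodes, but it is not chosen because its validation accuracy is lower.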


def gat_apply():
    # Seeding here only affects weight initialization and dropout;
    # the data split was already drawn at module level
    seed = 1
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    np.random.seed(seed)     # NumPy module
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    model = GATNet(features.shape[1], len(label_to_index)).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    acc_record = []
    loss_record = []


    for epoch in range(500):
        optimizer.zero_grad()
        out = model(cora)
        loss = F.nll_loss(out[train_mask], cora.y[train_mask])
        loss_record.append(loss.item())
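        print('epoch: %d loss: %.4f' % (epoch, loss.item()))
        loss.backward()
        optimizer.step()

        # if((epoch + 1)% 10 == 0):
        # Evaluate on the test nodes; torch.no_grad() skips building the autograd graph
        model.eval()
        with torch.no_grad():
            _, pred = model(cora).max(dim=1)
        correct = int(pred[test_mask].eq(cora.y[test_mask]).sum().item())
        acc = correct / len(test_mask)
        acc_record.append(acc)
        print('Accuracy: {:.4f}'.format(acc))
        model.train()

    # The last `out` of the loop was computed in training mode (with attention dropout),
    # so recompute the outputs in eval mode before projecting them with t-SNE
    model.eval()
    with torch.no_grad():
        out = model(cora)
    ts = TSNE(n_components=2)
    ts.fit_transform(out[test_mask].to('cpu').numpy())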

    x = ts.embedding_
    y = cora.y[test_mask].to('cpu').detach().numpy()

    xi = []
    for i in range(7):
        xi.append(x[np.where(y==i)])

    colors = ['mediumblue','green','red','yellow','cyan','mediumvioletred','mediumspringgreen']
    plt.figure(figsize=(8, 6))
    for i in range(7):
        plt.scatter(xi[i][:,0],xi[i][:,1],s=30,color=colors[i],marker='+',alpha=1)
    plt.show()

    plt.figure(figsize=(14, 9))
    x = range(500)

    plt.plot(x,acc_record,label='acc')
    plt.plot(x,loss_record,label='loss')
    plt.legend()
    plt.grid()
    plt.show()
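The plotting code splits the t-SNE points by class using `np.where(y == i)` over a hard-coded `range(7)`. A boolean mask per class does the same job and takes the number of classes as a parameter; in the script, that parameter would be `len(label_to_index)`. The helper below is a hypothetical sketch. Each returned array would then be passed to `plt.scatter(g[:, 0], g[:, 1], ...)` as in the original. A class with no points yields an empty (0, 2) array.

```python
import numpy as np

def group_by_label(points, labels, num_classes):
    """Return one (k, 2) array of t-SNE coordinates per class."""
    return [points[labels == c] for c in range(num_classes)]

points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
labels = np.array([0, 1, 0, 2])
groups = group_by_label(points, labels, num_classes=3)
print([g.shape[0] for g in groups])  # [2, 1, 1]
```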

6. Main function

if __name__ == "__main__":
    gcn_apply()
    gat_apply()

7. Complete code

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, GATConv
from sklearn.manifold import TSNE

path = "E:\\download\\cora\\cora\\"
cites = path + "cora.cites"
content = path + "cora.content"

# Index dict: maps each original paper ID to a consecutive index starting from 0
index_dict = dict()
# Label dict: maps each string label to an integer class ID
label_to_index = dict()

features = []
labels = []
edge_index = []

with open(content,"r") as f:
    nodes = f.readlines()
    for node in nodes:
        node_info = node.split()
        index_dict[int(node_info[0])] = len(index_dict)
        features.append([int(i) for i in node_info[1:-1]])
        
        label_str = node_info[-1]
        if(label_str not in label_to_index.keys()):
            label_to_index[label_str] = len(label_to_index)
        labels.append(label_to_index[label_str])

with open(cites,"r") as f:
    edges = f.readlines()
    for edge in edges:
        start, end = edge.split()
        # Edges are treated as undirected during training, but citation edges are directed, so each edge is added in both directions
        edge_index.append([index_dict[int(start)],index_dict[int(end)]])
        edge_index.append([index_dict[int(end)],index_dict[int(start)]])


class GCNNet(torch.nn.Module):
    def __init__(self, num_feature, num_label):
        super(GCNNet,self).__init__()
        self.GCN1 = GCNConv(num_feature, 16)
        self.GCN2 = GCNConv(16, num_label)  
        self.dropout = torch.nn.Dropout(p=0.5)
        
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        
        x = self.GCN1(x, edge_index)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.GCN2(x, edge_index)
        
        return F.log_softmax(x, dim=1)


class GATNet(torch.nn.Module):
    def __init__(self, num_feature, num_label):
        super(GATNet,self).__init__()
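        # 8 attention heads with 8 features each; concat=True joins them into 8*8 = 64 features
        self.GAT1 = GATConv(num_feature, 8, heads = 8, concat = True, dropout = 0.6)
        self.GAT2 = GATConv(8*8, num_label, dropout = 0.6)  # a single output head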
        
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        
        x = self.GAT1(x, edge_index)
        x = F.relu(x)
        x = self.GAT2(x, edge_index)
        
        return F.log_softmax(x, dim=1)


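# Add a self-loop to every node. GCNConv and GATConv add self-loops by default, so this step is skipped
# for i in range(2708):
#     edge_index.append([i,i])

# Convert to tensors
labels = torch.LongTensor(labels)
features = torch.FloatTensor(features)
# Row-normalize the features: divide each row by its L1 norm
features = torch.nn.functional.normalize(features, p=1, dim=1)
edge_index = torch.LongTensor(edge_index)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # this machine has a single GPU

torch.manual_seed(1)  # fix the random split so that runs are comparable
mask = torch.randperm(len(index_dict))  # shuffle the node indices
# Note: these "masks" are index tensors, not boolean masks
train_mask = mask[:140]      # 140 training nodes
val_mask = mask[140:640]     # 500 validation nodes (never used by the training loops)
test_mask = mask[1708:2708]  # the last 1000 shuffled nodes form the test set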


cora = Data(x = features, edge_index = edge_index.t().contiguous(), y = labels).to(device)

def gcn_apply():
    model = GCNNet(features.shape[1], len(label_to_index)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    acc_record = []
    loss_record = []

    for epoch in range(500):
        optimizer.zero_grad()
        out = model(cora)
        loss = F.nll_loss(out[train_mask], cora.y[train_mask])
        loss_record.append(loss.item())
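        print('epoch: %d loss: %.4f' % (epoch, loss.item()))
        loss.backward()
        optimizer.step()

        # if((epoch + 1)% 10 == 0):
        # Evaluate on the test nodes; torch.no_grad() skips building the autograd graph
        model.eval()
        with torch.no_grad():
            _, pred = model(cora).max(dim=1)
        correct = int(pred[test_mask].eq(cora.y[test_mask]).sum().item())
        acc = correct / len(test_mask)
        acc_record.append(acc)
        print('Accuracy: {:.4f}'.format(acc))
        model.train()

    # The last `out` of the loop was computed in training mode (with dropout),
    # so recompute the outputs in eval mode before projecting them with t-SNE
    model.eval()
    with torch.no_grad():
        out = model(cora)
    ts = TSNE(n_components=2)
    ts.fit_transform(out[test_mask].to('cpu').numpy())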

    x = ts.embedding_
    y = cora.y[test_mask].to('cpu').detach().numpy()

    xi = []
    for i in range(7):
        xi.append(x[np.where(y==i)])

    colors = ['mediumblue','green','red','yellow','cyan','mediumvioletred','mediumspringgreen']
    plt.figure(figsize=(8, 6))
    for i in range(7):
        plt.scatter(xi[i][:,0],xi[i][:,1],s=30,color=colors[i],marker='+',alpha=1)
    plt.show()

    plt.figure(figsize=(14, 9))
    x = range(500)

    plt.plot(x,acc_record,label='acc')
    plt.plot(x,loss_record,label='loss')
    plt.legend()
    plt.grid()
    plt.show()


def gat_apply():
    # Seeding here only affects weight initialization and dropout;
    # the data split was already drawn at module level
    seed = 1
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    np.random.seed(seed)     # NumPy module
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    model = GATNet(features.shape[1], len(label_to_index)).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    acc_record = []
    loss_record = []


    for epoch in range(500):
        optimizer.zero_grad()
        out = model(cora)
        loss = F.nll_loss(out[train_mask], cora.y[train_mask])
        loss_record.append(loss.item())
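        print('epoch: %d loss: %.4f' % (epoch, loss.item()))
        loss.backward()
        optimizer.step()

        # if((epoch + 1)% 10 == 0):
        # Evaluate on the test nodes; torch.no_grad() skips building the autograd graph
        model.eval()
        with torch.no_grad():
            _, pred = model(cora).max(dim=1)
        correct = int(pred[test_mask].eq(cora.y[test_mask]).sum().item())
        acc = correct / len(test_mask)
        acc_record.append(acc)
        print('Accuracy: {:.4f}'.format(acc))
        model.train()

    # The last `out` of the loop was computed in training mode (with attention dropout),
    # so recompute the outputs in eval mode before projecting them with t-SNE
    model.eval()
    with torch.no_grad():
        out = model(cora)
    ts = TSNE(n_components=2)
    ts.fit_transform(out[test_mask].to('cpu').numpy())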

    x = ts.embedding_
    y = cora.y[test_mask].to('cpu').detach().numpy()

    xi = []
    for i in range(7):
        xi.append(x[np.where(y==i)])

    colors = ['mediumblue','green','red','yellow','cyan','mediumvioletred','mediumspringgreen']
    plt.figure(figsize=(8, 6))
    for i in range(7):
        plt.scatter(xi[i][:,0],xi[i][:,1],s=30,color=colors[i],marker='+',alpha=1)
    plt.show()

    plt.figure(figsize=(14, 9))
    x = range(500)

    plt.plot(x,acc_record,label='acc')
    plt.plot(x,loss_record,label='loss')
    plt.legend()
    plt.grid()
    plt.show()

if __name__ == "__main__":
    gcn_apply()
    gat_apply()
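The two models differ considerably in size. Their parameter counts on Cora (1433 input features, 7 classes) can be worked out by hand under a few assumptions. For `GCNConv`, each layer is assumed to have a weight matrix plus a bias. For `GATConv`, each layer is assumed to have a bias-free linear map, separate source and target attention vectors per head, and a bias sized to the layer output, which I believe matches recent PyG versions. This is an arithmetic sketch, not output from the script. The counts can be checked against `sum(p.numel() for p in model.parameters())` for the installed version.

```python
def gcn_param_count(in_dim, hidden, out_dim):
    """GCNConv(in_dim, hidden) + GCNConv(hidden, out_dim): weight matrix plus bias per layer."""
    return (in_dim * hidden + hidden) + (hidden * out_dim + out_dim)

def gat_param_count(in_dim, hidden, heads, out_dim):
    """GATConv(in_dim, hidden, heads=heads, concat=True) + GATConv(hidden * heads, out_dim).

    Assumed per layer: a bias-free linear map, one source and one target
    attention vector per head, and a bias sized to the layer output.
    """
    layer1 = in_dim * heads * hidden + 2 * heads * hidden + heads * hidden
    layer2 = heads * hidden * out_dim + 2 * out_dim + out_dim
    return layer1 + layer2

# Cora: 1433 bag-of-words features, 7 classes
print(gcn_param_count(1433, 16, 7))    # 23063
print(gat_param_count(1433, 8, 8, 7))  # 92373
```

Almost all parameters sit in the first layer's input projection (1433 x 16 for GCN, 1433 x 64 for GAT), so the 8-head GAT has roughly four times as many parameters as the GCN.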

8. Figures

Figure (1): Screenshot of the program output
Figure (2): GCN classification visualization
Figure (3): GCN loss and accuracy
Figure (4): GAT classification visualization
Figure (5): GAT loss and accuracy
