Downloading, processing, and exploring the Cora, Citeseer, and Pubmed datasets with PyG [PyTorch Geometric]

Latest recommended article published on 2024-07-02 09:27:56

智慧的旋风

Column: PyG Learning and Applications. Tags: python, PyTorch, PyG, graph neural networks

Article link: https://blog.csdn.net/weixin_41650348/article/details/112754933

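I found out that PyG already ships ready-made modules for loading and preprocessing datasets, which makes all my earlier hand-written processing of Cora, Citeseer, and Pubmed feel like wasted effort. So from now on I am going to stand on the shoulders of giants 😂. PyG is great!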

Reference: https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html
Required third-party libraries

torch
torch_geometric

My code: https://github.com/ytchx1999/GNN-Dataset/blob/main/Citation.ipynb

from torch_geometric.datasets import Planetoid
import torch

1. Processing the Cora dataset

1.1 Downloading the dataset

# Download the dataset and save the preprocessed copy under ./cora/
dataset_cora = Planetoid(root='./cora/', name='Cora')

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
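
The log above suggests that every raw file follows the pattern `ind.<name>.<suffix>` under a single base URL. As a rough sketch of that pattern only (the helper name `planetoid_raw_urls` is hypothetical, and lowercasing the dataset name is inferred from `ind.cora.*` in the log, not taken from the PyG source), the list of URLs can be reproduced like this:

```python
# Base URL and file suffixes copied from the download log above.
BASE_URL = "https://github.com/kimiyoung/planetoid/raw/master/data"
SUFFIXES = ["x", "tx", "allx", "y", "ty", "ally", "graph", "test.index"]


def planetoid_raw_urls(name):
    """Return the raw-file URLs for one dataset, assuming the ind.<name>.<suffix> pattern."""
    prefix = f"ind.{name.lower()}"  # assumption: the file names use the lowercase dataset name
    return [f"{BASE_URL}/{prefix}.{suffix}" for suffix in SUFFIXES]


# Print the eight URLs for Cora, one per line.
for url in planetoid_raw_urls("Cora"):
    print(url)
```

This is only meant to make the log easier to read; `Planetoid` performs the actual download itself.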

# Print the dataset
print(dataset_cora)

Cora()
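
The same `Planetoid` class should also cover the other two datasets named in the title; PyG documents them under the names `CiteSeer` and `PubMed`. Below is a minimal sketch, assuming one root directory per dataset. The helper `load_citation_dataset` and the directory names are my own illustration, not part of the original notebook.

```python
# Root directories are an assumption, chosen to mirror './cora/' above.
DATASET_ROOTS = {
    "Cora": "./cora/",
    "CiteSeer": "./citeseer/",
    "PubMed": "./pubmed/",
}


def load_citation_dataset(name):
    """Download (on first use) and load one Planetoid citation dataset."""
    # Imported lazily, so defining this helper does not yet need torch_geometric.
    from torch_geometric.datasets import Planetoid

    return Planetoid(root=DATASET_ROOTS[name], name=name)
```

Calling `load_citation_dataset("CiteSeer")` would then trigger the same download and processing steps as for Cora, provided the machine has network access.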

1.2 Method 1: extracting data from the dataset with [0]

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cpu

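# Extract the data object and move it to the selected device
data_cora = dataset_cora[0].to(device)
# Print the dataset's attributes
print(dataset_cora.num_classes)  # number of label classes
print(dataset_cora.num_node_features)  # dimension of the node features
print(len(dataset_cora))  # number of graphs in the dataset
# Print the data object
print(data_cora)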

7
1433
1
Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
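
The shape `edge_index=[2, 10556]` reflects PyG's COO edge format: row 0 holds source nodes, row 1 holds target nodes, and each column is one directed edge. If every citation link is stored in both directions, as is usual for Cora in PyG, the 10556 columns correspond to 5278 undirected links. The toy sketch below uses plain Python lists rather than tensors, and the helper `to_edge_index` is hypothetical; it only illustrates the layout.

```python
def to_edge_index(undirected_edges):
    """Build a COO-style edge list of shape [2, 2 * num_undirected_edges]."""
    sources, targets = [], []
    for u, v in undirected_edges:
        # Store each undirected link twice: once as u -> v and once as v -> u.
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]


# A toy triangle graph with three undirected links.
toy_edges = [(0, 1), (1, 2), (0, 2)]
edge_index = to_edge_index(toy_edges)

print(edge_index)                          # [[0, 1, 1, 2, 0, 2], [1, 0, 2, 1, 2, 0]]
print(len(edge_index), len(edge_index[0])) # 2 6
```

Three undirected links become six columns here, the same doubling that would turn 5278 links into 10556 columns.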

1.3 Inspecting the attributes of data

# Extract the individual attributes
x = data_cora.x  # node feature matrix [N, input_dim]
edge_index
