0. import torch_geometric: inspecting Data objects (冬炫's blog, CSDN)
1. import torch_geometric: loading some common datasets
2. torch_geometric: all about mini-batches
Common Benchmark Datasets
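Most libraries ship loaders for commonly used datasets. Some image classification libraries, for example, can load MNIST for you, so you do not need to download it from a website yourself.
torch_geometric can likewise load a number of common public graph datasets, such as: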
all Planetoid datasets (Cora, Citeseer, Pubmed),
all graph classification datasets from http://graphkernels.cs.tu-dortmund.de and their cleaned versions, the QM7 and QM9 dataset,
3D mesh/point cloud datasets like FAUST, ModelNet10/40 and ShapeNet.
Loading the ENZYMES dataset (600 graphs in 6 classes)
Example ①
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
>>> ENZYMES(600)
len(dataset)
>>> 600
dataset.num_classes
>>> 6
dataset.num_node_features
>>> 3
We now have access to all 600 graphs in the dataset:
data = dataset[0]
>>> Data(edge_index=[2, 168], x=[37, 3], y=[1])
data.is_undirected()
>>> True
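This graph has 37 nodes, each with a 3-dimensional feature vector. edge_index has 168 columns, and because the graph is undirected every edge is stored once per direction, so these correspond to 84 undirected edges. y=[1] is the shape of the label tensor: the graph carries exactly one graph-level class label. It does not mean the label value is 1.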
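To make the "stored once per direction" point concrete, here is a minimal sketch in plain Python. It does not call torch_geometric; the toy graph and the helper is_undirected below are assumptions made for illustration and only mirror what data.is_undirected() reports.

```python
# Toy undirected graph with 3 nodes and 2 undirected edges: 0-1 and 1-2.
# In COO format each undirected edge appears twice, once per direction.
edge_index = [
    [0, 1, 1, 2],  # source nodes
    [1, 0, 2, 1],  # target nodes
]

def is_undirected(edge_index):
    """Return True if every edge (u, v) also appears as (v, u)."""
    edges = set(zip(edge_index[0], edge_index[1]))
    return all((v, u) in edges for (u, v) in edges)

num_entries = len(edge_index[0])         # number of columns in edge_index
# Halving is only valid for an undirected graph without self-loops.
num_undirected_edges = num_entries // 2

print(is_undirected(edge_index))          # True
print(num_entries, num_undirected_edges)  # 4 2
```

Applied to the ENZYMES graph above, the same reasoning turns the 168 columns of edge_index into 84 undirected edges.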
We can also use slicing to obtain several graphs at once.
Creating a 90/10 train/test split
Example ②
train_dataset = dataset[:540]
>>> ENZYMES(540)
test_dataset = dataset[540:]
>>> ENZYMES(60)
Shuffling the dataset manually
dataset = dataset.shuffle()
>>> ENZYMES(600)
An equivalent approach:
import torch
perm = torch.randperm(len(dataset))
dataset = dataset[perm]
>>> ENZYMES(600)
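The order of the two steps matters: if a dataset happens to be stored in sorted order (for example grouped by class), slicing before shuffling gives an unrepresentative split, so it is usually safer to shuffle first. The sketch below shows the idea on plain index lists. The helper shuffled_split and its parameters are hypothetical and not part of torch_geometric.

```python
import random

def shuffled_split(num_items, train_fraction=0.9, seed=0):
    """Return (train_indices, test_indices) taken from a random permutation."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)   # seeded so the split is reproducible
    num_train = round(num_items * train_fraction)
    return indices[:num_train], indices[num_train:]

train_idx, test_idx = shuffled_split(600)
print(len(train_idx), len(test_idx))       # 540 60
```

With the real dataset the same effect should come from calling dataset.shuffle() first and then taking dataset[:540] and dataset[540:].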
Downloading Cora, a dataset for semi-supervised node classification:
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='/tmp/Cora', name='Cora')
>>> Cora()
len(dataset)
>>> 1
dataset.num_classes
>>> 7
dataset.num_node_features
>>> 1433
As the output shows, this dataset contains a single graph, a citation graph.
data = dataset[0]
>>> Data(edge_index=[2, 10556], test_mask=[2708],
train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
data.is_undirected()
>>> True
data.train_mask.sum().item()
>>> 140
data.val_mask.sum().item()
>>> 500
data.test_mask.sum().item()
>>> 1000
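There are 2708 nodes in total, and each node has a label. Node features are 1433-dimensional. test_mask, train_mask and val_mask are boolean masks of length 2708 that mark which nodes belong to each split.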
The additional node-level attributes are train_mask, val_mask and test_mask, where
- train_mask denotes against which nodes to train (140 nodes),
- val_mask denotes which nodes to use for validation, e.g., to perform early stopping (500 nodes),
- test_mask denotes against which nodes to test (1000 nodes).
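How such masks are typically used can be sketched in plain Python: a metric is computed only over the nodes a mask selects. The labels, predictions and the helper masked_accuracy below are made-up illustrations, not values from Cora or functions from torch_geometric.

```python
# Hypothetical toy data for 6 nodes; all values are made up for illustration.
labels      = [0, 1, 1, 0, 2, 2]
predictions = [0, 1, 0, 0, 2, 1]
train_mask  = [True, True, False, False, False, False]
test_mask   = [False, False, False, False, True, True]

def masked_accuracy(predictions, labels, mask):
    """Accuracy computed only over the nodes where mask is True."""
    selected = [(p, y) for p, y, m in zip(predictions, labels, mask) if m]
    correct = sum(p == y for p, y in selected)
    return correct / len(selected)

print(sum(train_mask))                                   # 2
print(masked_accuracy(predictions, labels, train_mask))  # 1.0
print(masked_accuracy(predictions, labels, test_mask))   # 0.5
```

With tensors the selection is done by boolean indexing instead, for example data.y[data.train_mask] picks the labels of the 140 training nodes.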