Node Representation Learning on Very Large Graphs
Open-source learning materials: datawhale
1. Cluster-GCN in Practice
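Cluster-GCN was proposed to maximize embedding utilization, meaning how often each node representation computed in a mini-batch is reused, while still letting the model converge well. It first partitions the graph into clusters with a graph clustering algorithm. Each mini-batch is then built from the subgraph induced by a few randomly chosen clusters, so most edges stay inside the batch. The PyTorch Geometric implementation is given below.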
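The following minimal sketch in plain PyTorch makes the batching step concrete. It is not the PyG implementation. It assumes the node-to-cluster assignment `cluster_id` has already been computed. Cluster-GCN obtains this assignment with METIS, but here it is simply passed in. The function name `cluster_gcn_batches` is hypothetical. Each batch is the subgraph induced by a group of shuffled clusters. Edges between the chosen clusters are kept, and only edges that leave the batch are dropped.

```python
import torch


def cluster_gcn_batches(edge_index, cluster_id, clusters_per_batch, generator=None):
    """Hypothetical helper: yield (node_idx, sub_edge_index) Cluster-GCN-style batches.

    edge_index: LongTensor [2, E] of (source, target) node pairs.
    cluster_id: LongTensor [N], precomputed cluster of every node
                (assumption: produced beforehand, e.g. by METIS).
    clusters_per_batch: how many clusters are merged into one mini-batch.
    """
    num_nodes = cluster_id.numel()
    num_clusters = int(cluster_id.max()) + 1
    # Shuffle the clusters, then consume them in groups of `clusters_per_batch`.
    perm = torch.randperm(num_clusters, generator=generator)
    for start in range(0, num_clusters, clusters_per_batch):
        chosen = perm[start:start + clusters_per_batch]
        # Nodes belonging to any chosen cluster (torch.isin needs PyTorch >= 1.10).
        node_mask = torch.isin(cluster_id, chosen)
        node_idx = node_mask.nonzero(as_tuple=True)[0]
        # Keep only edges with both endpoints inside the batch (induced subgraph),
        # which includes the links between the chosen clusters.
        edge_mask = node_mask[edge_index[0]] & node_mask[edge_index[1]]
        sub_edges = edge_index[:, edge_mask]
        # Relabel global node ids to 0..len(node_idx)-1 for the subgraph.
        mapping = torch.full((num_nodes,), -1, dtype=torch.long)
        mapping[node_idx] = torch.arange(node_idx.numel())
        yield node_idx, mapping[sub_edges]
```

Consider a toy graph of two triangles joined by a single undirected edge. With one cluster per batch, each batch keeps its triangle's 6 directed edges, and only the 2 directed bridge edges are lost. Merging both clusters into one batch keeps all 14 edges. In the script below, `ClusterData` and `ClusterLoader` play these two roles (partitioning and merging clusters into batches) at scale.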
import torch
import torch.nn.functional as F
from torch.nn import ModuleList
from tqdm import tqdm
from torch_geometric.datasets import Reddit, Reddit2
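# PyG >= 2.0 exports these from torch_geometric.loader (older releases used torch_geometric.data)
from torch_geometric.loader import ClusterData, ClusterLoader, NeighborSampler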
from torch_geometric.nn import SAGEConv
dataset = Reddit('dataset/Reddit')
# dataset = Reddit2('dataset/Reddit2')
data = dataset[0]
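# Partition the graph nodes into clusters and build the data loaders
cluster_data = ClusterData(data, num_parts=1500, recursive=False,
                           save_dir=dataset.processed_dir)
# Changing num_parts splits the dataset into a different number of clusters
# Each training mini-batch merges batch_size=20 shuffled clusters
train_loader = ClusterLoader(cluster_data, batch_size=20, shuffle=True,
                             num_workers=12)
# sizes=[-1] keeps all neighbors of each node (no neighbor sampling)
subgraph_loader = NeighborSampler(data.edge_index, sizes=[-1], batch_size=1024,
                                  shuffle=False, num_workers=12)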
# Build the graph neural network
class Net(torch.nn.Module