References
- https://docs.dgl.ai/en/latest/guide/minibatch.html
- https://docs.dgl.ai/en/latest/guide/minibatch-node.html#guide-minibatch-node-classification-sampler
Overview
The graph neural network training methods we studied previously all operate on the entire graph. For large graphs, the number of nodes or edges can reach the millions or even hundreds of millions. For an L-layer GCN with hidden-state dimension H on a graph with N nodes, merely storing the intermediate hidden-state vectors takes O(NLH) space, which can easily exceed GPU memory.
By analogy with traditional mini-batch training, we can train on only a subset of nodes at a time. Then we only need to move those nodes and their L-hop neighbors onto the GPU, rather than the features of every node in the graph.
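As a back-of-the-envelope check of the O(NLH) claim (the sizes below are assumed for illustration, not taken from the text):

```python
# Rough memory estimate for storing all intermediate hidden states of a
# full-graph GCN. Hypothetical sizes: 1M nodes, 2 layers, hidden dim 256.
N, L, H = 1_000_000, 2, 256
bytes_per_float = 4                       # float32
total_bytes = N * L * H * bytes_per_float  # the O(NLH) storage term
print(f"{total_bytes / 1024**3:.1f} GiB")  # ~1.9 GiB for hidden states alone
```

Activations for backpropagation, parameters, and optimizer state come on top of this, which is why full-graph training runs out of GPU memory quickly.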
Neighborhood Sampling
As mentioned above, in mini-batch training we compute the layer-L outputs for only batch_size nodes at a time. To obtain those nodes' layer-L representations, we need their neighbors' representations at layer L-1, and so on, until we reach layer 0, i.e., the input features.
As illustrated in the figure below, suppose we want the layer-2 representation of node 8. We need the layer-1 representations of its neighbors, which in turn require the layer-0 representations of their neighbors. This yields a subgraph, and message passing on that subgraph alone suffices to compute node 8's layer-2 representation. Think of it as recursion: starting from the target node 8, we recurse layer by layer to its neighbors and its neighbors' neighbors, hit the base case at layer 0, and then propagate the results back up.
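The layer-by-layer recursion described above can be sketched in plain Python on a toy adjacency list (the graph below is made up, echoing the node-8 example):

```python
# Toy adjacency list: node -> list of neighbors (hypothetical graph).
graph = {8: [3, 5], 3: [1, 2], 5: [4], 1: [], 2: [], 4: [0], 0: []}

def required_inputs(seeds, num_layers):
    """Return every node whose layer-0 features are needed to compute
    the layer-`num_layers` outputs of `seeds` (one hop per GNN layer)."""
    frontier = set(seeds)
    for _ in range(num_layers):
        frontier |= {nbr for node in frontier for nbr in graph.get(node, [])}
    return frontier

print(sorted(required_inputs([8], 2)))  # -> [1, 2, 3, 4, 5, 8]
```

Note that node 0 is excluded: it is 3 hops away from node 8, so a 2-layer model never touches its features.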
Neighborhood sampling means that, in the process above, we need not include every neighbor at every layer in the subgraph; instead we can sample neighbors according to some strategy. DGL provides several samplers, which will be covered in a later post.
Stochastic Training: Node Classification on a Homogeneous Graph
Converting the earlier full-graph training models to stochastic training takes only three steps:
- define a neighbor sampler;
- adjust the model;
- modify the training loop.
Defining a Neighbor Sampler
Here we use the simplest built-in DGL sampler, MultiLayerFullNeighborSampler, which does not actually sample: training uses all neighbors. Built-in samplers are meant to be used together with NodeDataLoader. The argument 2 in MultiLayerFullNeighborSampler(2) means the GCN has two layers.
We use the Citeseer dataset, randomly choose 1000 nodes as the training set, and set the batch size to 256. num_workers is the number of worker processes used for data loading (multi-process loading is not available on Windows).
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]
train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)
dataloader yields three things per batch. The first is a tensor of input node IDs (layer 0); the second is a tensor of output node IDs (layer L, here layer 2); the third is blocks, the subgraph used for message passing at each layer. blocks[0], blocks[1], ... correspond in order to the layer-0-to-layer-1 subgraph, the layer-1-to-layer-2 subgraph, and so on, so blocks contains L entries in total.
input_nodes, output_nodes, blocks = next(iter(dataloader))
print(blocks)
Adjusting the Model
Below is the original model for full-graph training, before any modification.
class TwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, g, x):
        x = F.relu(self.conv1(g, x))
        x = F.relu(self.conv2(g, x))
        return x
To adapt it for stochastic training, simply replace the full graph g with the subgraphs blocks[0] and blocks[1].
class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x
Modifying the Training Loop (CPU)
If the blocks from step one are clear, the changes to the training loop are easy to follow. blocks[0] is the layer-0-to-layer-1 subgraph, so the input features are the source-node features of blocks[0]; blocks[-1] (here the same as blocks[1]) is the layer-1-to-layer-2 subgraph, so the ground-truth labels are those of its destination nodes.
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
Modifying the Training Loop (GPU)
Compared with the CPU version, the only additional step is moving the model and the blocks onto the GPU.
model = model.cuda()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
Stochastic Training: Node Classification on a Heterogeneous Graph
Defining a Neighbor Sampler
We again use the synthetic heterogeneous graph dataset built by hand in an earlier post of this series (“跟着官方文档学DGL框架第八天”).
We can still use a built-in DGL sampler together with NodeDataLoader. The difference from the homogeneous case is that the training node IDs must be given as a dictionary: keys are node types, values are node IDs. For simplicity, the training set here contains only 'user' nodes, but note that the sampled subgraphs still contain both node types.
n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10
follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)
hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})
hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()
g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)
Adjusting the Model
As in the homogeneous case, simply replace the original g with blocks.
class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feat, hidden_feat, norm='right')
            for rel in rel_names})
        self.conv2 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(hidden_feat, out_feat, norm='right')
            for rel in rel_names})

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x
Modifying the Training Loop (CPU)
Again this differs little from the homogeneous case. Just note that both the input features input_features and the ground-truth labels output_labels are now dictionaries: keys are node types, values are the corresponding tensors. Because the model output is also such a dictionary, the loss must be computed per node type. Since the training set here only uses 'user' nodes, we take the 'user' entry of the output to compute the loss.
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
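The loop above computes the loss only for the 'user' type. If the seed dictionary covered several node types, the per-type losses could simply be summed; a sketch with made-up tensors standing in for the model's output dictionary:

```python
import torch
import torch.nn.functional as F

# Hypothetical per-type logits (5 classes) and integer labels.
output_predictions = {'user': torch.randn(4, 5), 'item': torch.randn(3, 5)}
output_labels = {'user': torch.tensor([0, 1, 2, 3]),
                 'item': torch.tensor([4, 0, 2])}

# Sum the cross-entropy over every node type present in the batch.
loss = sum(F.cross_entropy(output_predictions[t], output_labels[t])
           for t in output_predictions)
```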
Modifying the Training Loop (GPU)
model = model.cuda()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    opt.step()
Complete Code
Stochastic Training: Node Classification on a Homogeneous Graph
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]
train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)
class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x
in_features = g.ndata['feat'].shape[1]
hidden_features = 100
out_features = dataset.num_classes
model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
# GPU version of the training loop:
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())
# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions, output_labels)
#     opt.zero_grad()
#     loss.backward()
#     opt.step()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
Stochastic Training: Node Classification on a Heterogeneous Graph
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10
follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)
hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})
hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()
g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)
class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feat, hidden_feat, norm='right')
            for rel in rel_names})
        self.conv2 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(hidden_feat, out_feat, norm='right')
            for rel in rel_names})

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x
in_features = n_hetero_features
hidden_features = 100
out_features = n_user_classes
model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, g.etypes)
# GPU version of the training loop:
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())
# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
#     opt.zero_grad()
#     loss.backward()
#     opt.step()
opt = torch.optim.Adam(model.parameters())
for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()