Learning DGL from the Official Docs, Day 12: Stochastic Training on Large Graphs for Node Classification

References

  1. https://docs.dgl.ai/en/latest/guide/minibatch.html
  2. https://docs.dgl.ai/en/latest/guide/minibatch-node.html#guide-minibatch-node-classification-sampler

Overview

The methods for training graph neural networks that we studied so far all operate on the entire graph. For large graphs, the number of nodes or edges can reach the millions or even billions. Consider an $L$-layer GCN with hidden state dimension $H$ on a graph with $N$ nodes: merely storing the intermediate hidden state vectors takes $O(NLH)$ memory, which easily exceeds GPU memory.

By analogy with traditional mini-batch training, we train on only a portion of the nodes at a time, so we only need to move those nodes and their $L$-hop neighbors onto the GPU, rather than the features of every node in the graph.
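To make the $O(NLH)$ bound concrete, here is a quick back-of-the-envelope calculation (the sizes below are made up purely for illustration):

# Hypothetical sizes: 100 million nodes, 3 layers, hidden dimension 128.
N, L, H = 100_000_000, 3, 128
bytes_needed = N * L * H * 4     # 4 bytes per float32 entry
print(bytes_needed / 1024 ** 3)  # roughly 143 GiB, far beyond a single GPU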

Neighborhood Sampling

As mentioned above, in mini-batch training we compute the layer-$L$ outputs of only batch_size nodes at a time. To obtain those nodes' layer-$L$ representations we need their neighbors' layer-$(L-1)$ representations, and so on, until we reach layer 0, i.e., the input features.

As shown in the figure below, suppose we want the layer-2 representation of node 8. We need the layer-1 representations of its neighbors, which in turn need the layer-0 representations of their neighbors. This gives us a subgraph, and message passing on that subgraph alone yields node 8's layer-2 representation. You can think of it as a recursion: starting from the target node 8, we recurse layer by layer to its neighbors and its neighbors' neighbors, hit the base case at layer 0, and then the representations propagate back up.
[Figure: the two-layer computation subgraph grown outward from target node 8]
Neighborhood sampling means that, during this process, we do not have to include every neighbor at every layer in the subgraph; instead we can sample neighbors according to some strategy. DGL provides several samplers, which will be covered in detail later.
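As a small preview (a minimal sketch; “MultiLayerNeighborSampler” is another built-in sampler in the same dgl.dataloading module as the one used below), a sampler that randomly keeps at most 10 neighbors per node for the first layer and at most 15 for the second could be created like this:

import dgl

# Keep at most 10 sampled neighbors per node for the first
# message-passing layer and at most 15 for the second.
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 15])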

Stochastic Training: Node Classification on a Homogeneous Graph

To convert the earlier full-graph training models into stochastic training, only three steps are needed:

  1. define a neighbor sampler;
  2. adapt the model;
  3. modify the training loop.

Defining a Neighbor Sampler

Here we use the simplest built-in sampler, “MultiLayerFullNeighborSampler”, which in effect does not sample at all: training uses all neighbors. Built-in samplers are used together with “NodeDataLoader”. The argument “2” in “MultiLayerFullNeighborSampler(2)” below means the GCN has two layers.

We use the “Citeseer” dataset, randomly pick 1000 nodes as the training set, and set the batch size to 256. “num_workers” is the number of worker processes used for data loading; multi-process loading is unavailable on Windows, so it is set to 0 here.

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]

train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)

The “dataloader” yields three things. The first is a tensor of input node IDs (layer 0); the second is a tensor of output node IDs (layer $L$, here layer 2); the third is blocks, the subgraph used by each layer of message passing. blocks[0], blocks[1], ... correspond in order to the layer-0-to-layer-1 subgraph, the layer-1-to-layer-2 subgraph, and so on, so there are $L$ blocks in total.

input_nodes, output_nodes, blocks = next(iter(dataloader))
print(blocks)
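
To see how the blocks funnel down toward the target nodes, we can inspect the node counts of each block (the exact numbers vary from batch to batch):

print(input_nodes.shape)    # all node IDs needed at layer 0
print(output_nodes.shape)   # the up-to-256 target node IDs of this batch
for block in blocks:
    # each block maps its source nodes (earlier layer) to its destination
    # nodes (later layer), so the counts shrink layer by layer
    print(block.num_src_nodes(), '->', block.num_dst_nodes())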

Adapting the Model

Below is the original full-graph training model, before any modification.

class TwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, g, x):
        x = F.relu(self.conv1(g, x))
        x = F.relu(self.conv2(g, x))
        return x
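
For contrast, the full-graph model is invoked with the whole graph and the complete feature matrix in a single call (a usage sketch, assuming in_features, hidden_features and out_features are defined as in the complete code at the end):

model = TwoLayerGCN(in_features, hidden_features, out_features)
logits = model(g, g.ndata['feat'])  # every node's features go in at once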

To adapt it for stochastic training, we only need to replace the full graph “g” with the subgraphs “blocks[0]” and “blocks[1]”.

class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x

Modifying the Training Loop (CPU)

If you understood the blocks from step one, the changes to the training loop are easy to follow. blocks[0] is the layer-0-to-layer-1 subgraph, so the input features are the source-node features of blocks[0]; blocks[-1] (here the same as blocks[1]) is the layer-1-to-layer-2 subgraph, so the ground-truth labels are those of its destination nodes.

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()

Modifying the Training Loop (GPU)

Compared with the CPU version, the only extra step is moving the model and the blocks onto the GPU.

model = model.cuda()
opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

Stochastic Training: Node Classification on a Heterogeneous Graph

Defining a Neighbor Sampler

We again use the synthetic heterogeneous graph dataset constructed in “跟着官方文档学DGL框架第八天” (Day 8 of this series).

We can still use a DGL built-in sampler together with “NodeDataLoader”. The difference from the homogeneous case is that the training node IDs must be given as a dictionary whose keys are node types and whose values are node IDs. For simplicity, the training set here contains only “user” nodes, but note that the sampled subgraphs still contain both node types (see the quick check after the code below).

n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10

follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)

hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})

hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()

g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)
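
As noted above, even though only “user” nodes are training targets, the sampled blocks contain both node types; on a heterogeneous graph, srcdata and dstdata return dictionaries keyed by node type (a quick check; the output varies from batch to batch):

input_nodes, output_nodes, blocks = next(iter(dataloader))
# input_nodes and output_nodes are dicts: node type -> ID tensor
print({ntype: ids.shape for ntype, ids in input_nodes.items()})
# feature access likewise returns a dict keyed by node type
print({ntype: feat.shape for ntype, feat in blocks[0].srcdata['feat'].items()})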

Adapting the Model

As with the homogeneous graph, simply replace the original “g” with “blocks”.

class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                for rel in rel_names
            })
        self.conv2 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                for rel in rel_names
            })

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x

Modifying the Training Loop (CPU)

There is still little difference from the homogeneous case, except that the input features “input_features”, the labels “output_labels”, and the model output are all dictionaries whose keys are node types. Because the model output is a dictionary, the loss has to be computed separately for each node type. For simplicity, the training set only uses “user” nodes, so we take the “user” entry of the output to compute the loss.

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
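
If the training set contained target nodes of several types, the per-type losses would simply be summed over the prediction dictionary. A hypothetical sketch (the helper name and the assumption that every supervised node type carries a “label” field are mine, not from the docs):

def hetero_loss(predictions, labels):
    # predictions and labels are dicts: node type -> tensor; accumulate
    # cross-entropy over every node type that has labels in this batch
    loss = 0
    for ntype, pred in predictions.items():
        if ntype in labels:
            loss = loss + F.cross_entropy(pred, labels[ntype])
    return loss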

Modifying the Training Loop (GPU)

model = model.cuda()
opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    blocks = [b.to(torch.device('cuda')) for b in blocks]
    input_features = blocks[0].srcdata['feat']     
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    opt.step()

Complete Code

Stochastic Training: Node Classification on a Homogeneous Graph

import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataset = dgl.data.CiteseerGraphDataset()
g = dataset[0]

train_nids = np.random.choice(np.arange(g.num_nodes()), (1000,), replace=False)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)

class StochasticTwoLayerGCN(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.conv1 = dglnn.GraphConv(in_features, hidden_features)
        self.conv2 = dglnn.GraphConv(hidden_features, out_features)

    def forward(self, blocks, x):
        x = F.relu(self.conv1(blocks[0], x))
        x = F.relu(self.conv2(blocks[1], x))
        return x

in_features = g.ndata['feat'].shape[1]
hidden_features = 100
out_features = dataset.num_labels
model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())

# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions, output_labels)
#     opt.zero_grad()
#     loss.backward()
#     opt.step()

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions, output_labels)
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()

Stochastic Training: Node Classification on a Heterogeneous Graph

import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10

follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)

hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})

hetero_graph.nodes['user'].data['feat'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feat'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()

g = hetero_graph
train_nid_dict = {'user': np.random.choice(np.arange(n_users), (500, ), replace=False)}
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nid_dict, sampler,
    batch_size=256,
    shuffle=True,
    drop_last=False,
    num_workers=0)


class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                for rel in rel_names
            })
        self.conv2 = dglnn.HeteroGraphConv({
                rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                for rel in rel_names
            })

    def forward(self, blocks, x):
        x = self.conv1(blocks[0], x)
        x = self.conv2(blocks[1], x)
        return x

in_features = n_hetero_features
hidden_features = 100
out_features = n_user_classes
model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, g.etypes)
# model = model.cuda()
# opt = torch.optim.Adam(model.parameters())

# for input_nodes, output_nodes, blocks in dataloader:
#     blocks = [b.to(torch.device('cuda')) for b in blocks]
#     input_features = blocks[0].srcdata['feat']     
#     output_labels = blocks[-1].dstdata['label']
#     output_predictions = model(blocks, input_features)
#     loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
#     opt.zero_grad()
#     loss.backward()
#     opt.step()

opt = torch.optim.Adam(model.parameters())

for input_nodes, output_nodes, blocks in dataloader:
    input_features = blocks[0].srcdata['feat']
    output_labels = blocks[-1].dstdata['label']
    output_predictions = model(blocks, input_features)
    loss = F.cross_entropy(output_predictions['user'], output_labels['user'])
    opt.zero_grad()
    loss.backward()
    print('loss: ', loss.item())
    opt.step()
