PyG异质图神经网络NotImplementedError问题

该博客介绍了在使用PyTorch Geometric库(PyG)处理异质图时遇到的一个常见问题,即某些节点没有入边导致模型无法正确运行。通过引入`ToUndirected`转换,将图转换为无向图,使得所有节点均有入边,从而解决了这一问题。博主展示了如何在OGB_MAG数据集上应用SAGEConv模型,并给出了转换前后的代码示例和输出结果。
摘要由CSDN通过智能技术生成

诸神缄默不语-个人CSDN博文目录

以PyG官方的数据集和示例代码来复现一下这个问题:

import torch

import torch_geometric.transforms as T
from torch_geometric.datasets import OGB_MAG
from torch_geometric.nn import SAGEConv, to_hetero

dataset = OGB_MAG(root='files/pyg_data', preprocess='metapath2vec')
data = dataset[0]

print(data)

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x


model = GNN(hidden_channels=64, out_channels=dataset.num_classes)
model = to_hetero(model, data.metadata(), aggr='sum')

with torch.no_grad():  # Initialize lazy modules.
    out = model(data.x_dict, data.edge_index_dict)

print(out)

输出信息:

HeteroData(
  paper={
    x=[736389, 128],
    year=[736389],
    y=[736389],
    train_mask=[736389],
    val_mask=[736389],
    test_mask=[736389]
  },
  author={ x=[1134649, 128] },
  institution={ x=[8740, 128] },
  field_of_study={ x=[59965, 128] },
  (author, affiliated_with, institution)={ edge_index=[2, 1043998] },
  (author, writes, paper)={ edge_index=[2, 7145660] },
  (paper, cites, paper)={ edge_index=[2, 5416271] },
  (paper, has_topic, field_of_study)={ edge_index=[2, 7505078] }
)
my_env/lib/python3.8/site-packages/torch_geometric/nn/to_hetero_transformer.py:145: UserWarning: There exist node types ({'author'}) whose representations do not get updated during message passing as they do not occur as destination type in any edge type. This may lead to unexpected behaviour.
  warnings.warn(
Traceback (most recent call last):
  File "try2.py", line 25, in <module>
    model = to_hetero(model, data.metadata(), aggr='sum')
  File "env_path/lib/python3.8/site-packages/torch_geometric/nn/to_hetero_transformer.py", line 118, in to_hetero
    return transformer.transform()
  File "env_path/lib/python3.8/site-packages/torch_geometric/nn/fx.py", line 157, in transform
    getattr(self, op)(node, node.target, node.name)
  File "env_path/lib/python3.8/site-packages/torch_geometric/nn/to_hetero_transformer.py", line 294, in call_method
    args, kwargs = self.map_args_kwargs(node, key)
  File "env_path/lib/python3.8/site-packages/torch_geometric/nn/to_hetero_transformer.py", line 397, in map_args_kwargs
    args = tuple(_recurse(v) for v in node.args)
  File "env_path/lib/python3.8/site-packages/torch_geometric/nn/to_hetero_transformer.py", line 397, in <genexpr>
    args = tuple(_recurse(v) for v in node.args)
  File "env_path/lib/python3.8/site-packages/torch_geometric/nn/to_hetero_transformer.py", line 387, in _recurse
    raise NotImplementedError
NotImplementedError

可以很容易地看出来,这是由于有一种节点没有入边产生的问题。
解决方案就是使所有节点都有入边。如:

import torch

import torch_geometric.transforms as T
from torch_geometric.datasets import OGB_MAG
from torch_geometric.nn import SAGEConv, to_hetero

dataset = OGB_MAG(root='files/pyg_data', preprocess='metapath2vec',transform=T.ToUndirected())
data = dataset[0]

print(data)

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x


model = GNN(hidden_channels=64, out_channels=dataset.num_classes)
model = to_hetero(model, data.metadata(), aggr='sum')

with torch.no_grad():  # Initialize lazy modules.
    out = model(data.x_dict, data.edge_index_dict)

print(out)

将异质图转换为无向图,这样就能得到正常的输出结果:

HeteroData(
  paper={
    x=[736389, 128],
    year=[736389],
    y=[736389],
    train_mask=[736389],
    val_mask=[736389],
    test_mask=[736389]
  },
  author={ x=[1134649, 128] },
  institution={ x=[8740, 128] },
  field_of_study={ x=[59965, 128] },
  (author, affiliated_with, institution)={ edge_index=[2, 1043998] },
  (author, writes, paper)={ edge_index=[2, 7145660] },
  (paper, cites, paper)={ edge_index=[2, 10792672] },
  (paper, has_topic, field_of_study)={ edge_index=[2, 7505078] },
  (institution, rev_affiliated_with, author)={ edge_index=[2, 1043998] },
  (paper, rev_writes, author)={ edge_index=[2, 7145660] },
  (field_of_study, rev_has_topic, paper)={ edge_index=[2, 7505078] }
)
{'paper': tensor([[-0.8212, -0.2630, -0.7286,  ...,  1.1904,  0.1617, -0.5388],
        [-1.2484, -0.3707, -1.0336,  ...,  0.9618, -0.0373, -0.1125],
        [-0.5375,  0.0357, -0.6772,  ...,  1.2185,  0.2292, -0.2130],
        ...,
        [-0.9934, -0.2688, -0.9547,  ...,  1.3144,  0.1519, -0.2015],
        [-1.4711, -0.6607, -0.7509,  ...,  2.3383,  0.6815, -1.0679],
        [-0.4352, -0.4255, -0.6907,  ...,  1.1532,  0.1152, -0.9703]]), 'author': tensor([[-0.2782,  0.1771,  0.4187,  ..., -0.5233, -0.2969,  0.2438],
        [-0.4543,  0.1019,  0.1637,  ..., -0.7748, -0.2809,  0.2598],
        [-0.1613, -0.0481, -0.2491,  ..., -0.6227, -0.4217,  0.1335],
        ...,
        [-0.4908,  0.2382,  0.2973,  ..., -0.7266, -0.2486,  0.6449],
        [-0.2819,  0.0125,  0.9843,  ..., -1.9652, -0.4280, -0.4842],
        [-0.4236, -0.1222,  1.0246,  ..., -2.0615, -0.3246, -0.1771]]), 'institution': tensor([[ 0.3911, -1.3527, -0.6624,  ...,  0.2732,  0.5270,  0.5756],
        [ 0.1512, -0.6687, -0.6516,  ...,  0.1482,  0.2535,  0.1935],
        [ 0.1933, -1.1643, -0.4936,  ...,  0.5382,  0.3407,  0.2199],
        ...,
        [ 0.1489, -0.3021, -0.3390,  ...,  0.2690,  0.1571, -0.0781],
        [ 0.1855, -0.4848, -0.3205,  ...,  0.4728,  0.0659,  0.1500],
        [ 0.1724, -0.0682, -0.0894,  ...,  0.1189,  0.1230, -0.2249]]), 'field_of_study': tensor([[ 0.1929, -0.5402, -0.5714,  ..., -0.4296,  0.4376, -0.0660],
        [-0.2281,  0.0773, -0.0486,  ..., -0.0544, -0.2894,  0.2706],
        [-0.2798, -0.1967, -0.3376,  ..., -0.3098, -0.1610,  0.1120],
        ...,
        [ 0.0775, -0.5927, -0.6084,  ..., -0.3190,  0.2483, -0.1418],
        [ 0.0286, -0.7393, -0.6629,  ..., -0.4745,  0.8461, -0.1554],
        [-0.0804, -0.5598, -0.8517,  ..., -0.2317,  0.3234, -0.0520]])}
# GPF ## 一、GPF(Graph Processing Flow):利用神经网络处理问题的一般化流程 1、节点预表示:利用NE框架,直接获得全每个节点的Embedding; 2、正负样本采样:(1)单节点样本;(2)节点对样本; 3、抽取封闭子:可做类化处理,建立一种通用数据结构; 4、子特征融合:预表示、节点特征、全局特征、边特征; 5、网络配置:可以是输入、输出的网络;也可以是输入,分类/聚类结果输出的网络; 6、训练和测试; ## 二、主要文件: 1、graph.py:读入数据; 2、embeddings.py:预表示学习; 3、sample.py:采样; 4、subgraphs.py/s2vGraph.py:抽取子; 5、batchgraph.py:子特征融合; 6、classifier.py:网络配置; 7、parameters.py/until.py:参数配置/帮助文件; ## 三、使用 1、在parameters.py中配置相关参数(可默认); 2、在example/文件夹中运行相应的案例文件--包括链接预测、节点状态预测; 以链接预测为例: ### 1、导入配置参数 ```from parameters import parser, cmd_embed, cmd_opt``` ### 2、参数转换 ``` args = parser.parse_args() args.cuda = not args.noCuda and torch.cuda.is_available() torch.manual_seed(args.seed) if args.cuda: torch.cuda.manual_seed(args.seed) if args.hop != 'auto': args.hop = int(args.hop) if args.maxNodesPerHop is not None: args.maxNodesPerHop = int(args.maxNodesPerHop) ``` ### 3、读取数据 ``` g = graph.Graph() g.read_edgelist(filename=args.dataName, weighted=args.weighted, directed=args.directed) g.read_node_status(filename=args.labelName) ``` ### 4、获取全节点的Embedding ``` embed_args = cmd_embed.parse_args() embeddings = embeddings.learn_embeddings(g, embed_args) node_information = embeddings #print node_information ``` ### 5、正负节点采样 ``` train, train_status, test, test_status = sample.sample_single(g, args.testRatio, max_train_num=args.maxTrainNum) ``` ### 6、抽取节点对的封闭子 ``` net = until.nxG_to_mat(g) #print net train_graphs, test_graphs, max_n_label = subgraphs.singleSubgraphs(net, train, train_status, test, test_status, args.hop, args.maxNodesPerHop, node_information) print('# train: %d, # test: %d' % (len(train_graphs), len(test_graphs))) ``` ### 7、加载网络模型,并在classifier中配置相关参数 ``` cmd_args = cmd_opt.parse_args() cmd_args.feat_dim = max_n_label + 1 cmd_args.attr_dim = node_information.shape[1] cmd_args.latent_dim = [int(x) for x in cmd_args.latent_dim.split('-')] if len(cmd_args.latent_dim) == 1: cmd_args.latent_dim = cmd_args.latent_dim[0] model = classifier.Classifier(cmd_args) optimizer = optim.Adam(model.parameters(), lr=args.learningRate) ``` ### 8、训练和测试 ``` train_idxes = list(range(len(train_graphs))) best_loss = None for epoch in range(args.num_epochs): random.shuffle(train_idxes) model.train() avg_loss = loop_dataset(train_graphs, model, train_idxes, cmd_args.batch_size, optimizer=optimizer) print('\033[92maverage training of epoch %d: loss %.5f acc %.5f auc %.5f\033[0m' % (epoch, avg_loss[0], avg_loss[1], avg_loss[2])) model.eval() test_loss = loop_dataset(test_graphs, model, list(range(len(test_graphs))), cmd_args.batch_size) print('\033[93maverage test of epoch %d: loss %.5f acc %.5f auc %.5f\033[0m' % (epoch, test_loss[0], test_loss[1], test_loss[2])) ``` ### 9、运行结果 ``` average test of epoch 0: loss 0.62392 acc 0.71462 auc 0.72314 loss: 0.51711 acc: 0.80000: 100%|███████████████████████████████████| 76/76 [00:07<00:00, 10.09batch/s] average training of epoch 1: loss 0.54414 acc 0.76895 auc 0.77751 loss: 0.37699 acc: 0.79167: 100%|█████████████████████████████████████| 9/9 [00:00<00:00, 34.07batch/s] average test of epoch 1: loss 0.51981 acc 0.78538 auc 0.79709 loss: 0.43700 acc: 0.84000: 100%|███████████████████████████████████| 76/76 [00:07<00:00, 9.64batch/s] average training of epoch 2: loss 0.49896 acc 0.79184 auc 0.82246 loss: 0.63594 acc: 0.66667: 100%|█████████████████████████████████████| 9/9 [00:00<00:00, 28.62batch/s] average test of epoch 2: loss 0.48979 acc 0.79481 auc 0.83416 loss: 0.57502 acc: 0.76000: 100%|███████████████████████████████████| 76/76 [00:07<00:00, 9.70batch/s] average training of epoch 3: loss 0.50005 acc 0.77447 auc 0.79622 loss: 0.38903 acc: 0.75000: 100%|█████████████████████████████████████| 9/9 [00:00<00:00, 34.03batch/s] average test of epoch 3: loss 0.41463 acc 0.81132 auc 0.86523 loss: 0.54336 acc: 0.76000: 100%|███████████████████████████████████| 76/76 [00:07<00:00, 9.57batch/s] average training of epoch 4: loss 0.44815 acc 0.81711 auc 0.84530 loss: 0.44784 acc: 0.70833: 100%|█████████████████████████████████████| 9/9 [00:00<00:00, 28.62batch/s] average test of epoch 4: loss 0.48319 acc 0.81368 auc 0.84454 loss: 0.36999 acc: 0.88000: 100%|███████████████████████████████████| 76/76 [00:07<00:00, 10.17batch/s] average training of epoch 5: loss 0.39647 acc 0.84184 auc 0.89236 loss: 0.15548 acc: 0.95833: 100%|█████████████████████████████████████| 9/9 [00:00<00:00, 28.62batch/s] average test of epoch 5: loss 0.30881 acc 0.89623 auc 0.95132 ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

诸神缄默不语

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值