Team Learning on Graph Neural Networks - Task 04

1. Using the `InMemoryDataset` base class

In PyG, we create a custom dataset class whose data can be held entirely in memory by inheriting from the `InMemoryDataset` class.

Parameters of the `InMemoryDataset` constructor:

root: string, **the path of the folder where the dataset is stored**.

transform: callable, a data transformation function that takes a `Data` object and returns a transformed `Data` object; it is applied every time a sample is accessed.

pre_transform: callable, a data transformation function that takes a `Data` object and returns a transformed `Data` object; it is applied once, before the processed data is saved to disk.

pre_filter: callable, **a function that checks whether a sample should be kept**; it takes a `Data` object and returns whether that `Data` object should be included in the final dataset.
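To make these three callables concrete, here is a small illustrative sketch; the row normalization and the node-count threshold are made-up examples, not something the tutorial prescribes:

from torch_geometric.data import Data

def my_transform(data: Data) -> Data:
    # `transform` / `pre_transform` signature: take a `Data` object, return a `Data` object.
    data.x = data.x / data.x.sum(dim=-1, keepdim=True).clamp(min=1)  # e.g. row-normalize features
    return data

def my_pre_filter(data: Data) -> bool:
    # `pre_filter` signature: take a `Data` object, return True if it should be kept.
    return data.num_nodes >= 2  # e.g. drop graphs with fewer than two nodes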

A simplified `InMemoryDataset` dataset class:

import torch
from torch_geometric.data import InMemoryDataset, download_url

class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None, pre_filter=None):
        super().__init__(root=root, transform=transform, pre_transform=pre_transform, pre_filter=pre_filter)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        # Names of the raw files expected in `self.raw_dir`; if any is missing, `download()` is called.
        return ['some_file_1', 'some_file_2', ...]

    @property
    def processed_file_names(self):
        # Names of the processed files expected in `self.processed_dir`; if present, `process()` is skipped.
        return ['data.pt']

    def download(self):
        # Download the raw files to `self.raw_dir` (`url` is a placeholder here).
        download_url(url, self.raw_dir)
        ...

    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        # `collate()` merges the list of `Data` objects into one big `Data` object plus slice indices.
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
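A usage sketch for the template class above, assuming `download()` and `process()` have been filled in; `my_transform` and `my_pre_filter` are the illustrative callables from the earlier sketch, and the `root` path is arbitrary:

dataset = MyOwnDataset(
    root='dataset/my_own',        # folder where raw/ and processed/ subfolders will be created
    transform=my_transform,       # applied every time a sample is accessed
    pre_filter=my_pre_filter)     # applied once inside process(), before saving

# The first instantiation runs download() and process() and caches the result under
# root/processed/; later instantiations simply load the cached file.
print(len(dataset), dataset[0])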

Constructing the `PlanetoidPubMed` dataset class:

import os.path as osp

import torch
from torch_geometric.data import (InMemoryDataset, download_url)
from torch_geometric.io import read_planetoid_data

class PlanetoidPubMed(InMemoryDataset):
    r""" 节点代表文章,边代表引文关系。
   		 训练、验证和测试的划分通过二进制掩码给出。
    参数:
        root (string): 存储数据集的文件夹的路径
        transform (callable, optional): 数据转换函数,每一次获取数据时被调用。
        pre_transform (callable, optional): 数据转换函数,数据保存到文件前被调用。
    """

    url = 'https://github.com/kimiyoung/planetoid/raw/master/data'

    def __init__(self, root, transform=None, pre_transform=None):

        super(PlanetoidPubMed, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_dir(self):
        return osp.join(self.root, 'raw')

    @property
    def processed_dir(self):
        return osp.join(self.root, 'processed')

    @property
    def raw_file_names(self):
        names = ['x', 'tx', 'allx', 'y', 'ty', 'ally', 'graph', 'test.index']
        return ['ind.pubmed.{}'.format(name) for name in names]

    @property
    def processed_file_names(self):
        return 'data.pt'

    def download(self):
        for name in self.raw_file_names:
            download_url('{}/{}'.format(self.url, name), self.raw_dir)

    def process(self):
        data = read_planetoid_data(self.raw_dir, 'pubmed')
        data = data if self.pre_transform is None else self.pre_transform(data)
        torch.save(self.collate([data]), self.processed_paths[0])

    def __repr__(self):
        return '{}()'.format(self.__class__.__name__)

dataset = PlanetoidPubMed('dataset/PlanetoidPubMed')
print(dataset.num_classes)
print(dataset[0].num_nodes)
print(dataset[0].num_edges)
print(dataset[0].num_features)

3
19717
88648
500
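Besides the counts above, the loaded `Data` object also carries the binary split masks mentioned in the docstring. A small sketch for inspecting them and for passing a `transform` at load time (`NormalizeFeatures` is just one possible choice, not something the original text uses):

import torch_geometric.transforms as T

dataset = PlanetoidPubMed('dataset/PlanetoidPubMed', transform=T.NormalizeFeatures())
data = dataset[0]

# Binary masks defining the fixed train/validation/test node splits.
print(data.train_mask.sum().item())
print(data.val_mask.sum().item())
print(data.test_mask.sum().item())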

2. Try using different network layers from PyG in place of `GCNConv`, together with different numbers of layers and different `out_channels`, to carry out the node classification task.

Setting 1:

import torch
from torch_geometric.nn import GCNConv


class Net(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Net, self).__init__()
        self.conv1 = GCNConv(in_channels, 128)
        self.conv2 = GCNConv(128, 128)
        self.conv3 = GCNConv(128, out_channels)  # defined but not used in encode()

    def encode(self, x, edge_index):
        # Two-layer GCN encoder: in_channels -> 128 -> 128.
        x = self.conv1(x, edge_index)
        x = x.relu()
        return self.conv2(x, edge_index)

    def decode(self, z, pos_edge_index, neg_edge_index):
        # Score each candidate edge by the dot product of its endpoint embeddings.
        edge_index = torch.cat([pos_edge_index, neg_edge_index], dim=-1)
        return (z[edge_index[0]] * z[edge_index[1]]).sum(dim=-1)

    def decode_all(self, z):
        # Predict an edge wherever the inner product of two node embeddings is positive.
        prob_adj = z @ z.t()
        return (prob_adj > 0).nonzero(as_tuple=False).t()

Epoch: 089, Loss: 0.4571, Val: 0.9033, Test: 0.8747
Epoch: 090, Loss: 0.4601, Val: 0.9049, Test: 0.8769
Epoch: 091, Loss: 0.4557, Val: 0.9070, Test: 0.8805
Epoch: 092, Loss: 0.4612, Val: 0.9091, Test: 0.8837
Epoch: 093, Loss: 0.4553, Val: 0.9117, Test: 0.8855
Epoch: 094, Loss: 0.4521, Val: 0.9139, Test: 0.8871
Epoch: 095, Loss: 0.4495, Val: 0.9156, Test: 0.8894
Epoch: 096, Loss: 0.4512, Val: 0.9165, Test: 0.8914
Epoch: 097, Loss: 0.4519, Val: 0.9165, Test: 0.8922
Epoch: 098, Loss: 0.4541, Val: 0.9185, Test: 0.8942
Epoch: 099, Loss: 0.4521, Val: 0.9206, Test: 0.8947
Epoch: 100, Loss: 0.4446, Val: 0.9217, Test: 0.8930

Setting 2:

class Net(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Net, self).__init__()
        self.conv1 = GCNConv(in_channels, 256)
        self.conv2 = GCNConv(256, 128)
        self.conv3 = GCNConv(128, out_channels)  # defined but not used in encode()

    def encode(self, x, edge_index):
        # Two-layer GCN encoder with a wider first layer: in_channels -> 256 -> 128.
        x = self.conv1(x, edge_index)
        x = x.relu()
        return self.conv2(x, edge_index)

    def decode(self, z, pos_edge_index, neg_edge_index):
        edge_index = torch.cat([pos_edge_index, neg_edge_index], dim=-1)
        return (z[edge_index[0]] * z[edge_index[1]]).sum(dim=-1)

    def decode_all(self, z):
        prob_adj = z @ z.t()
        return (prob_adj > 0).nonzero(as_tuple=False).t()

Epoch: 093, Loss: 0.4613, Val: 0.8882, Test: 0.8851
Epoch: 094, Loss: 0.4657, Val: 0.8893, Test: 0.8856
Epoch: 095, Loss: 0.4582, Val: 0.8905, Test: 0.8863
Epoch: 096, Loss: 0.4629, Val: 0.8906, Test: 0.8888
Epoch: 097, Loss: 0.4609, Val: 0.8907, Test: 0.8894
Epoch: 098, Loss: 0.4602, Val: 0.8911, Test: 0.8888
Epoch: 099, Loss: 0.4571, Val: 0.8929, Test: 0.8894
Epoch: 100, Loss: 0.4575, Val: 0.8937, Test: 0.8906

Setting 3:

class Net(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Net, self).__init__()
        self.conv1 = GCNConv(in_channels, 256)
        self.conv2 = GCNConv(256, out_channels)

    def encode(self, x, edge_index):
        # Two-layer GCN encoder: in_channels -> 256 -> out_channels.
        x = self.conv1(x, edge_index)
        x = x.relu()
        return self.conv2(x, edge_index)

    def decode(self, z, pos_edge_index, neg_edge_index):
        edge_index = torch.cat([pos_edge_index, neg_edge_index], dim=-1)
        return (z[edge_index[0]] * z[edge_index[1]]).sum(dim=-1)

    def decode_all(self, z):
        prob_adj = z @ z.t()
        return (prob_adj > 0).nonzero(as_tuple=False).t()

Epoch: 091, Loss: 0.4247, Val: 0.9253, Test: 0.9279
Epoch: 092, Loss: 0.4246, Val: 0.9246, Test: 0.9279
Epoch: 093, Loss: 0.4248, Val: 0.9232, Test: 0.9279
Epoch: 094, Loss: 0.4175, Val: 0.9236, Test: 0.9279
Epoch: 095, Loss: 0.4201, Val: 0.9245, Test: 0.9279
Epoch: 096, Loss: 0.4221, Val: 0.9259, Test: 0.9267
Epoch: 097, Loss: 0.4240, Val: 0.9242, Test: 0.9267
Epoch: 098, Loss: 0.4234, Val: 0.9220, Test: 0.9267
Epoch: 099, Loss: 0.4218, Val: 0.9235, Test: 0.9267
Epoch: 100, Loss: 0.4257, Val: 0.9246, Test: 0.9267
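The training and evaluation code that produced the Loss/Val/Test logs above is not included in this post. Below is a minimal sketch of the standard PyG link-prediction loop these `Net` classes are designed for; the edge splitting via `train_test_split_edges`, the choice of `out_channels=64`, the Adam learning rate of 0.01, and the use of ROC-AUC for the Val/Test columns are assumptions here, not facts taken from the logs.

import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score
from torch_geometric.utils import negative_sampling, train_test_split_edges

# Split the edges into train/val/test positive sets (plus val/test negative sets).
data = dataset[0]
data.train_mask = data.val_mask = data.test_mask = None  # node splits are not needed here
data = train_test_split_edges(data)

model = Net(dataset.num_features, 64)  # `Net` from one of the settings above; 64 is an assumed out_channels
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)


def get_link_labels(pos_edge_index, neg_edge_index):
    # Label 1 for every positive edge and 0 for every sampled negative edge.
    num_links = pos_edge_index.size(1) + neg_edge_index.size(1)
    link_labels = torch.zeros(num_links, dtype=torch.float)
    link_labels[:pos_edge_index.size(1)] = 1.
    return link_labels


def train():
    model.train()
    # Sample as many negative edges as there are training positives.
    neg_edge_index = negative_sampling(
        edge_index=data.train_pos_edge_index,
        num_nodes=data.num_nodes,
        num_neg_samples=data.train_pos_edge_index.size(1))
    optimizer.zero_grad()
    z = model.encode(data.x, data.train_pos_edge_index)
    link_logits = model.decode(z, data.train_pos_edge_index, neg_edge_index)
    link_labels = get_link_labels(data.train_pos_edge_index, neg_edge_index)
    loss = F.binary_cross_entropy_with_logits(link_logits, link_labels)
    loss.backward()
    optimizer.step()
    return loss


@torch.no_grad()
def test():
    model.eval()
    z = model.encode(data.x, data.train_pos_edge_index)
    results = []
    for prefix in ['val', 'test']:
        pos_edge_index = data[f'{prefix}_pos_edge_index']
        neg_edge_index = data[f'{prefix}_neg_edge_index']
        link_logits = model.decode(z, pos_edge_index, neg_edge_index)
        link_labels = get_link_labels(pos_edge_index, neg_edge_index)
        results.append(roc_auc_score(link_labels.cpu(), link_logits.sigmoid().cpu()))
    return results


for epoch in range(1, 101):
    loss = train()
    val_auc, test_auc = test()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Val: {val_auc:.4f}, Test: {test_auc:.4f}')

On these particular runs, the plain two-layer encoder of Setting 3 ends with the best test score (about 0.927), while Settings 1 and 2 finish around 0.89.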

Practice problem 2: In the edge prediction task, try using the `torch_geometric.nn.Sequential` container to construct the graph neural network.

from torch_geometric.nn import GCNConv, Sequential

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# `Sequential` takes an input-argument signature and a list of
# (module, 'in_args -> out_name') tuples; here it chains two GCNConv layers.
model = Sequential('x, edge_index', [
    (GCNConv(dataset.num_features, 64), 'x, edge_index -> x1'),
    (GCNConv(64, 2), 'x1, edge_index -> x2'),
]).to(device)
data = data.to(device)
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.01)
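The `Sequential` container above only plays the role of the encoder; for edge prediction its output still has to go through the dot-product scoring used by the `Net` classes earlier. A sketch of the wiring, assuming `data` has been split with `train_test_split_edges` as in the earlier training sketch; `decode` is an illustrative free function mirroring `Net.decode`, not part of the container:

from torch_geometric.utils import negative_sampling

def decode(z, pos_edge_index, neg_edge_index):
    # Same dot-product edge scoring as Net.decode, written as a free function.
    edge_index = torch.cat([pos_edge_index, neg_edge_index], dim=-1)
    return (z[edge_index[0]] * z[edge_index[1]]).sum(dim=-1)

neg_edge_index = negative_sampling(
    edge_index=data.train_pos_edge_index,
    num_nodes=data.num_nodes,
    num_neg_samples=data.train_pos_edge_index.size(1))

z = model(data.x, data.train_pos_edge_index)            # the Sequential model acts as the encoder
link_logits = decode(z, data.train_pos_edge_index, neg_edge_index)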

Thinking question 3: In training, we pass `data.train_pos_edge_index` as the argument when sampling negative edges for the training set, but the negatives sampled this way may contain some positive edges of the validation set and of the test set; that is, real positive edges may be labelled as negative, which creates a conflict. We still do it this way. Why?

I don't know yet. (One plausible reason: the citation graph is very sparse, so a randomly sampled node pair is extremely unlikely to be a validation/test positive edge, and excluding those edges would mean looking at the validation/test splits during training.)
