Getting Started with GNNs, Part 03

尼尔-冯-哈尔滨

Last modified on 2022-03-08 20:06:57


Column: My Blog; Tags: machine learning, python, deep learning

First published on 2022-03-01 11:36:17

Article link: https://blog.csdn.net/m0_37671786/article/details/122543260


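This series has not been updated in a long time, mainly because I used to work at an online education company and was laid off when the "Double Reduction" policy hit. That is long behind me now, and I have found a new job, so I am back to updating the blog.
In node prediction and edge prediction tasks on graphs, we first need to construct node representations, and these representations are the key to success in both tasks. In this article, we will learn how to learn node representations with graph neural networks.
Specifically, we compare three approaches, MLP, GCN and GAT, on a node classification task. Enough talk, let's get into it.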

1. Preparation

Getting and analyzing the dataset

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())

print()
print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

data = dataset[0]  # Get the first graph object.

print()
print(data)
print('======================')

# Gather some statistics about the graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
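print(f'Contains isolated nodes: {data.has_isolated_nodes()}')  # Called contains_isolated_nodes() in older PyG versions.
print(f'Contains self-loops: {data.has_self_loops()}')  # Called contains_self_loops() in older PyG versions.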
print(f'Is undirected: {data.is_undirected()}')

Once the environment is set up, we can run the code above directly. The relevant part of the output is:

======================
Number of nodes: 2708
Number of edges: 10556
Average node degree: 3.90
Number of training nodes: 140
Training node label rate: 0.05
Contains isolated nodes: False
Contains self-loops: False

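From this output we can read off the number of nodes and edges in the graph, as well as the average node degree, the number of training nodes that carry ground-truth labels, and so on.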
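To make these statistics concrete: PyG stores an undirected graph in `edge_index` with every edge listed twice, once per direction, so `num_edges` counts directed pairs and `num_edges / num_nodes` is exactly the average degree. The minimal pure-Python sketch below uses a hypothetical toy triangle graph, with plain lists standing in for the `edge_index` tensor, and then redoes the arithmetic for the Cora numbers above, assuming (as for Cora in PyG) an undirected graph without self-loops.

```python
# Hypothetical toy graph: an undirected triangle 0-1-2, stored the way PyG
# stores undirected graphs, i.e. each edge appears once per direction.
edge_index = [[0, 1, 1, 2, 2, 0],   # source nodes
              [1, 0, 2, 1, 0, 2]]   # target nodes
num_nodes = 3

num_directed = len(edge_index[0])    # what data.num_edges would report
num_undirected = num_directed // 2   # valid because there are no self-loops

# Degree of a node = number of times it appears as a source
degree = [0] * num_nodes
for src in edge_index[0]:
    degree[src] += 1

print(num_directed, num_undirected)  # 6 3
print(degree)                        # [2, 2, 2]
print(num_directed / num_nodes)      # 2.0, the average degree

# The same arithmetic on the Cora statistics printed above
cora_nodes, cora_edges, cora_train = 2708, 10556, 140
print(cora_edges // 2)                   # 5278 undirected edges
print(f'{cora_edges / cora_nodes:.2f}')  # 3.90, matches "Average node degree"
print(f'{cora_train / cora_nodes:.2f}')  # 0.05, matches "Training node label rate"
```

So "Number of edges: 10556" corresponds to 5278 undirected edges, and the 0.05 label rate means only about 5% of the nodes are used for training.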

A helper for visualizing node representations

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(h, color):
    z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())  # Project the representations h (not a global `out`) to 2-D.
    plt.figure(figsize=(10,10))
    plt.xticks([])
    plt.yticks([])

    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()

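This helper visualizes node representations: it uses t-SNE to reduce the high-dimensional representations `h` to two dimensions and then draws them as a scatter plot, coloring each point according to `color`.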

Node classification with an MLP

The MLP node classifier

import torch
from torch.nn import Linear
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(MLP, self).__init__()
        torch.manual_seed(12345)
        self.lin1 = Linear(dataset.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return x

model = MLP(hidden_channels=16)
print(model)

Training the MLP

model = MLP(hidden_channels=16)
criterion = torch.nn.CrossEntropyLoss()  # Define loss criterion.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)  # Define optimizer.

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
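The loss line in `train()` only uses `out[data.train_mask]`, so nodes outside the training mask never contribute to the gradient. Below is a minimal pure-Python sketch of that masked loss, assuming `CrossEntropyLoss`'s default mean reduction over the selected nodes. The function name `masked_cross_entropy` and the toy logits are hypothetical.

```python
import math

def masked_cross_entropy(logits, labels, mask):
    """Mean cross-entropy over the nodes where mask is True, mirroring
    criterion(out[data.train_mask], data.y[data.train_mask]) with
    torch.nn.CrossEntropyLoss's default reduction='mean'."""
    losses = []
    for row, label, keep in zip(logits, labels, mask):
        if not keep:
            continue  # node outside the training mask: ignored entirely
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])  # -log softmax(row)[label]
    return sum(losses) / len(losses)

# Hypothetical logits for 3 nodes and 3 classes
logits = [[2.0, 0.0, 0.0],   # node 0, training node
          [0.0, 0.0, 0.0],   # node 1, training node
          [9.0, -9.0, 0.0]]  # node 2, outside the mask
labels = [0, 2, 1]
train_mask = [True, True, False]

loss = masked_cross_entropy(logits, labels, train_mask)
print(round(loss, 4))  # 0.6691
```

Changing node 2's logits leaves the loss unchanged, which is exactly why only the 140 labeled nodes drive training.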

MLP prediction

def test():
    model.eval()
    out = model(data.x)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

Test Accuracy: 0.5900
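After training, the result is not very good. The main reason is probably that very few nodes are labeled (only 140), so the model overfits the training nodes.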
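One way to check the overfitting explanation is to measure accuracy on the training nodes as well as on the test nodes: near-perfect training accuracy combined with much lower test accuracy points to overfitting. Here is a minimal pure-Python sketch of the masked accuracy computed in `test()`, with hypothetical toy lists standing in for `out.argmax(dim=1)`, `data.y` and the masks.

```python
def masked_accuracy(pred, y, mask):
    """Fraction of correct predictions among the nodes where mask is True.

    pred, y and mask are equal-length lists standing in for
    out.argmax(dim=1), data.y and a boolean node mask.
    """
    hits = [p == t for p, t, keep in zip(pred, y, mask) if keep]
    return sum(hits) / len(hits)

# Hypothetical toy data for 6 nodes; the first 3 are training nodes.
pred = [0, 1, 2, 1, 0, 2]
y = [0, 1, 2, 0, 1, 2]
train_mask = [True, True, True, False, False, False]
test_mask = [not m for m in train_mask]

train_acc = masked_accuracy(pred, y, train_mask)
test_acc = masked_accuracy(pred, y, test_mask)
print(f'Train: {train_acc:.4f}, Test: {test_acc:.4f}')  # Train: 1.0000, Test: 0.3333
```

In the real script, the same check amounts to evaluating once with `data.train_mask` and once with `data.test_mask`.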

Applying GCN to node classification

Definition of GCN

The GCN layer comes from the paper "Semi-Supervised Classification with Graph Convolutional Networks". Its mathematical definition is
$\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta}$
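Here $\mathbf{A}$ is the adjacency matrix describing how the nodes are related. Both its number of rows and its number of columns equal the total number of nodes, and an entry of 1 in row i, column j means that node i and node j are adjacent. Each row of $\mathbf{X}$ corresponds to a node and each column to a feature, so $\mathbf{X}$ stores the node features; $\mathbf{\Theta}$ is the learnable weight matrix.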

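$\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops inserted, and $\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$ is its diagonal degree matrix. The adjacency matrix may also contain values other than $1$: when its entries are not restricted to $\{0, 1\}$, it stores edge weights. $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ is the symmetrically normalized adjacency matrix. It can simply be viewed as a normalization of $\mathbf{\hat{A}}$, with the two $\mathbf{\hat{D}}^{-1/2}$ factors on the left and right acting as the normalization matrices; entrywise, each $\hat{A}_{ij}$ is divided by $\sqrt{\hat{D}_{ii}\hat{D}_{jj}}$.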
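A small worked example of the symmetric normalization, in plain Python on a hypothetical three-node path graph 0 - 1 - 2 so the numbers can be checked by hand: with self-loops the degrees are 2, 3 and 2, and each entry $\hat{A}_{ij}$ becomes $\hat{A}_{ij}/\sqrt{\hat{D}_{ii}\hat{D}_{jj}}$. $\mathbf{\Theta}$ is set to the identity here (an assumption for illustration) so that only the propagation step is visible.

```python
import math

def gcn_normalize(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix (list of lists)."""
    n = len(adj)
    # A_hat = A + I: add a self-loop to every node
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # D_hat_ii = sum_j A_hat_ij (row sums; weighted degrees if A holds edge weights)
    deg = [sum(row) for row in a_hat]
    # Entry (i, j) becomes A_hat_ij / sqrt(D_hat_ii * D_hat_jj)
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Path graph 0 - 1 - 2 (undirected, unweighted)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
A_norm = gcn_normalize(A)
for row in A_norm:
    print([round(v, 3) for v in row])
# [0.5, 0.408, 0.0]
# [0.408, 0.333, 0.408]
# [0.0, 0.408, 0.5]

X = [[1.0], [2.0], [3.0]]  # one scalar feature per node
Theta = [[1.0]]            # identity weight, so only the propagation is visible
X_new = matmul(matmul(A_norm, X), Theta)
print([round(r[0], 3) for r in X_new])  # [1.316, 2.3, 2.316]
```

Each new feature is a degree-weighted average of the node itself and its neighbours; the middle node, with the highest degree, gets the smallest self-weight (1/3).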

Node classification with GCN

from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

model = GCN(hidden_channels=16)
print(model)
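`GCNConv` receives `edge_index` rather than a dense adjacency matrix. As a rough sketch of the same normalization in edge-list form, assuming `GCNConv`'s default behavior (self-loops added, symmetric normalization), the pure-Python function below weights each message from node j to node i by $1/\sqrt{\hat{d}_i\hat{d}_j}$ and sums the messages at the target node. It is not PyG's implementation: the weight matrix $\mathbf{\Theta}$ and the bias are omitted, and `gcn_propagate` is a hypothetical name.

```python
import math

def gcn_propagate(edge_index, x, num_nodes):
    """One GCN propagation step on an edge list (no weight matrix, no bias).

    edge_index: [sources, targets], each undirected edge listed in both directions.
    x: list of per-node feature lists.
    """
    src, dst = list(edge_index[0]), list(edge_index[1])
    # Add self-loops (the "+ I" in A_hat = A + I)
    src += range(num_nodes)
    dst += range(num_nodes)
    # Degree counted at the target node, self-loop included
    deg = [0] * num_nodes
    for i in dst:
        deg[i] += 1
    dim = len(x[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for j, i in zip(src, dst):  # message from node j to node i
        norm = 1.0 / math.sqrt(deg[i] * deg[j])
        for k in range(dim):
            out[i][k] += norm * x[j][k]
    return out

# Hypothetical path graph 0 - 1 - 2, both directions listed
edge_index = [[0, 1, 1, 2],
              [1, 0, 2, 1]]
x = [[1.0], [2.0], [3.0]]
out = gcn_propagate(edge_index, x, num_nodes=3)
print([round(v[0], 3) for v in out])  # [1.316, 2.3, 2.316]
```

Working on edges instead of a dense matrix is what lets GCN scale to sparse graphs: the cost grows with the number of edges, not with the square of the number of nodes.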

Training the GCN

model = GCN(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

Testing the GCN

def test():
    model.eval()
    out = model(data.x, data.edge_index)  # Unlike the MLP, the GCN also needs the graph structure.
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

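Simply by replacing the linear layers with GCN layers, we reach a test accuracy of 81.4%! That is far higher than the 59% achieved by the MLP classifier, which indicates that the nodes' adjacency information plays a key role in achieving better accuracy.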

Applying GAT to node classification

Definition of GAT

The graph attention network (GAT) comes from the paper "Graph Attention Networks". Its mathematical definition is
$\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} +\sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},$
where the attention coefficients $\alpha_{i,j}$ are computed as
$\alpha_{i,j} =\frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j]\right)\right)}{\sum_{k \in \mathcal{N}(i) \cup \{ i \}}\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k]\right)\right)}.$
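To see these two formulas in action, here is a minimal single-head sketch in pure Python for one node. It assumes the features have already been multiplied by $\mathbf{\Theta}$ (the list `h`), uses a hypothetical attention vector `a`, and uses a LeakyReLU slope of 0.2 (the value used in the GAT paper and `GATConv`'s default `negative_slope`). The softmax runs over $\mathcal{N}(i) \cup \{i\}$, matching the denominator above; `gat_node_update` is a hypothetical name.

```python
import math

def leaky_relu(v, negative_slope=0.2):
    return v if v > 0 else negative_slope * v

def gat_node_update(i, neighbors, h, a):
    """Single-head GAT update for node i.

    h: features already multiplied by Theta, one list per node.
    a: attention vector of length 2 * len(h[0]).
    Returns (alpha, x_new), where alpha maps j -> alpha_{i,j}.
    """
    candidates = [i] + list(neighbors)  # N(i) plus i itself
    scores = []
    for j in candidates:
        concat = h[i] + h[j]  # [Theta x_i || Theta x_j]
        scores.append(leaky_relu(sum(w * v for w, v in zip(a, concat))))
    # Softmax over the candidates (subtracting the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = {j: e / total for j, e in zip(candidates, exps)}
    # x'_i = sum_j alpha_{i,j} Theta x_j, including the self term j = i
    x_new = [sum(alpha[j] * h[j][k] for j in candidates) for k in range(len(h[0]))]
    return alpha, x_new

h = [[1.0], [2.0], [3.0]]  # hypothetical Theta x for nodes 0, 1, 2
a = [0.0, 1.0]             # hypothetical attention vector
alpha, x_new = gat_node_update(1, [0, 2], h, a)
print({j: round(v, 3) for j, v in alpha.items()})  # {1: 0.245, 0: 0.09, 2: 0.665}
print(round(x_new[0], 3))                          # 2.575
```

With `a = [0.0, 1.0]` the score depends only on the neighbour's feature, so node 2, which has the largest feature, receives the largest weight. Unlike GCN's fixed $1/\sqrt{\hat{d}_i\hat{d}_j}$, these weights are learned through `a`.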

Code

import torch
from torch.nn import Linear
import torch.nn.functional as F

from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GAT, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GATConv(dataset.num_features, hidden_channels)
        self.conv2 = GATConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

Training and testing the GAT network work exactly the same way as for the GCN, so they are not repeated here.
