Graph Neural Networks, Lesson 3: Node Representation Learning with Graph Neural Networks


Reference: https://github.com/datawhalechina/team-learning-nlp/tree/master/GNN

1. Introduction

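In node prediction and edge prediction tasks on graphs, we first need to generate node representations. We use a graph neural network to generate them, and by training the network with supervised learning we teach it to produce high-quality node representations. High-quality node representations can be used to measure how similar two nodes are, and they are also a prerequisite for classifying nodes accurately.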

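In this section we learn how to implement multi-layer graph neural networks and, using node classification as an example, walk through the general process of training a graph neural network. We use the Cora dataset for illustration. Cora is a citation network: each node is a paper, two nodes are connected by an edge if one of the papers cites the other, and each node's attribute is a 1433-dimensional bag-of-words feature vector. The task is to predict the category of each paper (7 classes in total). We also compare the node classification performance of an MLP with that of GCN and GAT (two well-known graph neural networks), to show the strength of graph neural networks and to explain why they outperform ordinary deep neural networks on this task.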

2. Preparation

2.1 Loading and inspecting the dataset

# Load and inspect the dataset
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

dataset = Planetoid(root='dataset', name='Cora', transform=NormalizeFeatures())

print()
print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

data = dataset[0]  # Get the first graph object.

print()
print(data)
print('======================')

# Gather some statistics about the graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
# has_isolated_nodes()/has_self_loops() are the PyG 2.x names; older
# releases called them contains_isolated_nodes()/contains_self_loops().
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')

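The output shows that the Cora graph has 2,708 nodes and 10,556 edges, with an average node degree of 3.9. Only 140 nodes are used for training, about 5% of all nodes. The graph is undirected and contains no self-loops and no isolated nodes.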
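The statistics above can be reproduced by hand on a toy graph. The sketch below assumes a PyG-style `edge_index` given as two parallel lists (sources, targets) in which every undirected edge is stored once per direction; that is why Cora's average node degree is simply `num_edges / num_nodes`. The helper `graph_stats` is hypothetical and only mirrors the kind of checks that `Data.is_undirected()` and the isolated-node/self-loop checks perform.

```python
def graph_stats(num_nodes, edge_index):
    """Simple statistics for a graph given as (sources, targets).

    Assumes (like PyG's Cora) that each undirected edge is stored twice,
    once per direction, so num_edges counts directed entries.
    """
    src, dst = edge_index
    num_edges = len(src)
    edges = set(zip(src, dst))
    # Nodes that appear as an endpoint of at least one edge.
    touched = set(src) | set(dst)
    return {
        "num_edges": num_edges,
        "avg_degree": num_edges / num_nodes,
        "has_isolated_nodes": len(touched) < num_nodes,
        "has_self_loops": any(s == d for s, d in edges),
        # Undirected: every (s, d) has a matching reverse edge (d, s).
        "is_undirected": all((d, s) in edges for s, d in edges),
    }


# Toy path graph 0-1-2 plus an isolated node 3; each edge stored both ways.
edge_index = ([0, 1, 1, 2], [1, 0, 2, 1])
stats = graph_stats(4, edge_index)
print(stats)
```

Here the two undirected edges appear as four directed entries, giving an average degree of 4 / 4 = 1.0, and node 3 is reported as isolated.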

2.2 Data transforms

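A transform modifies the data before it is fed into the neural network, which can be used for data normalization or data augmentation. Here NormalizeFeatures row-normalizes the node features so that the features of each node sum to 1.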

NormalizeFeatures documentation:
https://pytorch-geometric.readthedocs.io/en/latest/modules/transforms.html#torch_geometric.transforms.NormalizeFeatures

NormalizeFeatures source code:

class NormalizeFeatures(object):
    r"""Row-normalizes node features to sum-up to one."""

    def __call__(self, data):
        data.x = data.x / data.x.sum(1, keepdim=True).clamp(min=1)
        return data

    def __repr__(self):
        return '{}()'.format(self.__class__.__name__)
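To see what the one-line `__call__` does, here is a stdlib-only sketch of the same row normalization on a small list-of-lists feature matrix. The function name `normalize_features` is hypothetical; it mirrors `x / x.sum(1, keepdim=True).clamp(min=1)`, including the clamp that keeps an all-zero row from causing a division by zero.

```python
def normalize_features(x):
    """Row-normalize a feature matrix given as a list of rows.

    Mirrors x / x.sum(1, keepdim=True).clamp(min=1): each row is divided
    by its sum, but the divisor is clamped to at least 1, so an all-zero
    row stays all zeros instead of dividing by zero.
    """
    out = []
    for row in x:
        total = max(sum(row), 1.0)  # the clamp(min=1) step
        out.append([v / total for v in row])
    return out


# Two bag-of-words rows and one empty row.
x = [[1.0, 0.0, 1.0, 2.0],
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
x_norm = normalize_features(x)
print(x_norm)
```

A side effect of the clamp is that a row whose sum lies strictly between 0 and 1 is left unscaled; for Cora's binary bag-of-words features every non-empty row sums to at least 1, so this case does not arise there.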

2.3 Visualizing the distribution of node representations

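We define a visualization function visualize that uses t-SNE to map the high-dimensional node representations onto a two-dimensional plane and then plots the nodes there, which lets us see how the node representations are distributed.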

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(h, color):
    # Project the node representations h (not a global variable) to 2-D.
    z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())
    plt.figure(figsize=(10,10))
    plt.xticks([])
    plt.yticks([])

    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()

3. Node classification with an MLP

Building the MLP

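In principle, we should be able to infer a paper's category from its content alone, i.e. from its bag-of-words feature representation, without using any relational information between papers. Next, we build a simple MLP to test this.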

import torch
from torch.nn import Linear
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(MLP, self).__init__()
        torch.manual_seed(12345)
        self.lin1 = Linear(dataset.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return x

model = MLP(hidden_channels=16)
print(model)

Training the MLP

We train this MLP with a cross-entropy loss and the Adam optimizer.

model = MLP(hidden_channels=16)
criterion = torch.nn.CrossEntropyLoss()  # Define loss criterion.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)  # Define optimizer.

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
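The training loop runs the forward pass over all nodes but computes the loss only on the nodes selected by `data.train_mask`. The stdlib-only sketch below (the helper `masked_cross_entropy` is hypothetical, not a PyTorch API) reproduces what `CrossEntropyLoss` with its default mean reduction computes on the masked rows: log-sum-exp of the scores minus the score of the true class, averaged over the training nodes.

```python
import math


def masked_cross_entropy(logits, labels, mask):
    """Mean cross-entropy over the nodes where mask is True.

    Mirrors criterion(out[train_mask], y[train_mask]) with
    torch.nn.CrossEntropyLoss: -log softmax(row)[y], averaged.
    Unmasked nodes do not contribute to the loss at all.
    """
    losses = []
    for row, y, m in zip(logits, labels, mask):
        if not m:
            continue
        # log-sum-exp with the max subtracted for numerical stability
        mx = max(row)
        log_z = mx + math.log(sum(math.exp(v - mx) for v in row))
        losses.append(log_z - row[y])
    return sum(losses) / len(losses)


logits = [[2.0, 0.0, 0.0],   # node 0, labelled class 0
          [0.0, 0.0, 0.0],   # node 1, uniform scores
          [0.0, 5.0, 0.0]]   # node 2, not in the training mask
labels = [0, 2, 0]
train_mask = [True, True, False]
loss = masked_cross_entropy(logits, labels, train_mask)
print(f"{loss:.4f}")
```

Changing the scores of node 2 leaves the loss unchanged, which is why the network only receives gradient signal from the 140 labelled training nodes.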

Testing the MLP

We evaluate the MLP on the test set:

def test():
    model.eval()
    out = model(data.x)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
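The accuracy in `test()` is an argmax over class scores followed by a ratio restricted to `data.test_mask`. The stdlib-only sketch below mirrors those two steps; `masked_accuracy` is a hypothetical helper. Like `torch.argmax`, Python's `max` returns the first index when scores tie.

```python
def masked_accuracy(logits, labels, mask):
    """Fraction of masked nodes whose argmax prediction equals the label.

    Mirrors pred = out.argmax(dim=1) followed by
    int(correct.sum()) / int(mask.sum()) in test().
    """
    correct = total = 0
    for row, y, m in zip(logits, labels, mask):
        if not m:
            continue
        pred = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(pred == y)
        total += 1
    return correct / total


logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 1, 1, 0]
test_mask = [True, True, True, False]  # node 3 is not a test node
acc = masked_accuracy(logits, labels, test_mask)
print(f"{acc:.4f}")
```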

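This simple MLP performs poorly, reaching only about 59% test accuracy. The main reasons are probably that there are very few training nodes and that the model overfits them.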

4. Graph Convolutional Network (GCN)

4.1 What is a GCN

The GCN layer is defined as:
$$\mathbf{X}' = \hat{\mathbf{D}}^{-1/2}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-1/2}\mathbf{X}\mathbf{\Theta},$$
where $\hat{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops inserted (so that every node has an edge to itself), and $\hat{D}_{ii} = \sum_{j} \hat{A}_{ij}$ is its diagonal degree matrix (the diagonal entries are the node degrees, all other entries are 0). The adjacency matrix may contain values other than 1; when it is not {0,1}-valued, its entries are edge weights. $\hat{\mathbf{D}}^{-1/2}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-1/2}$ is the symmetrically normalized adjacency matrix, and node-wise the layer reads:
$$\mathbf{x}'_i = \mathbf{\Theta}\sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}}\mathbf{x}_j,$$
where $\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$, and $e_{j,i}$ is the weight of the edge from source node $j$ to target node $i$ (default 1.0).
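To make the node-wise formula concrete, the sketch below evaluates it on a three-node path graph 0-1-2 with one-dimensional features. It takes $\mathbf{\Theta}$ as the identity so that only the normalized aggregation is visible. The function name and the `(source, target, weight)` edge-list format are assumptions for illustration, not PyG's internal implementation.

```python
import math


def gcn_propagate(num_nodes, edges, x, self_loop_weight=1.0):
    """Compute D^-1/2 (A + w*I) D^-1/2 X for a small weighted graph.

    edges: list of (j, i, e_ji) for an edge from source j to target i;
    each undirected edge must be listed in both directions.
    Theta is taken as the identity, so only the aggregation is shown.
    """
    # d_i = self-loop weight + sum of incoming edge weights e_ji
    deg = [self_loop_weight] * num_nodes
    for _, i, w in edges:
        deg[i] += w
    out = [[0.0] * len(x[0]) for _ in range(num_nodes)]
    # Self-loop term: w / sqrt(d_i * d_i) * x_i = (w / d_i) * x_i
    for i in range(num_nodes):
        coef = self_loop_weight / deg[i]
        for k, v in enumerate(x[i]):
            out[i][k] += coef * v
    # Neighbour terms: e_ji / sqrt(d_j * d_i) * x_j
    for j, i, w in edges:
        coef = w / math.sqrt(deg[j] * deg[i])
        for k, v in enumerate(x[j]):
            out[i][k] += coef * v
    return out


edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]
x = [[1.0], [2.0], [3.0]]
out = gcn_propagate(3, edges, x)
print(out)
```

Node 1 has $\hat{d}_1 = 3$ and both of its neighbours have $\hat{d} = 2$, so its own feature is weighted by $1/3$ and each neighbour's by $1/\sqrt{6}$. The normalization therefore depends only on the graph structure, not on the features.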

4.2 The GCNConv module

GCNConv official documentation:
https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.GCNConv

GCNConv constructor signature:

GCNConv(in_channels: int, out_channels: int, 
		improved: bool = False, cached: bool = False, 
		add_self_loops: bool = True, normalize: bool = True, 
		bias: bool = True, **kwargs)

where:

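  • in_channels: dimensionality of the input features;
  • out_channels: dimensionality of the output features;
  • improved: if True, $\hat{\mathbf{A}} = \mathbf{A} + 2\mathbf{I}$, which strengthens the contribution of the center node's own features;
  • cached: whether to cache the result of $\hat{\mathbf{D}}^{-1/2}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-1/2}$ for later use. This should only be set to True in transductive learning scenarios (transductive learning can be loosely understood as using one and the same graph for training, validation, testing, and inference);
  • add_self_loops: whether to add self-loops to the adjacency matrix;
  • normalize: whether to add self-loops and compute the symmetric normalization coefficients on the fly;
  • bias: whether to include a bias term.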
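To see what `improved=True` changes, the sketch below computes the coefficients $e_{j,i}/\sqrt{\hat{d}_j \hat{d}_i}$ (self-loops included) with a self-loop weight of 1 (the default $\mathbf{A} + \mathbf{I}$) and of 2 ($\mathbf{A} + 2\mathbf{I}$) on the same path graph 0-1-2. It is a hand-written sketch of the formula, and the function name `gcn_coefficients` is hypothetical rather than part of PyG.

```python
import math


def gcn_coefficients(num_nodes, edges, fill_value=1.0):
    """Return {(j, i): e_ji / sqrt(d_j * d_i)} with self-loops included.

    fill_value is the self-loop weight: 1.0 for A + I, 2.0 for A + 2I
    (the improved=True variant). edges are (source, target, weight)
    triples, each undirected edge listed in both directions.
    """
    all_edges = list(edges) + [(i, i, fill_value) for i in range(num_nodes)]
    deg = [0.0] * num_nodes
    for _, i, w in all_edges:
        deg[i] += w
    return {(j, i): w / math.sqrt(deg[j] * deg[i]) for j, i, w in all_edges}


edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]
plain = gcn_coefficients(3, edges, fill_value=1.0)
improved = gcn_coefficients(3, edges, fill_value=2.0)
print(plain[(1, 1)], improved[(1, 1)])  # self weight of node 1
print(plain[(0, 1)], improved[(0, 1)])  # weight of edge 0 -> 1
```

With the doubled self-loop, node 1's own coefficient rises from 1/3 to 1/2 while the coefficient of each incoming edge shrinks, which is the "strengthen the center node" effect described above.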

4.3 Building the GCN

Replacing torch.nn.Linear in the MLP with torch_geometric.nn.GCNConv gives a GCN, as shown in the code below:

from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

model = GCN(hidden_channels=16)
print(model)
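Swapping Linear for GCNConv does not by itself add capacity: as far as we know, a GCNConv with `bias=True` learns an `in_channels × out_channels` weight matrix plus an `out_channels` bias, just like Linear. The quick count below, assuming Cora's dimensions from the text (1433 features, 16 hidden units, 7 classes), gives the number of trainable parameters of either two-layer model; the helper name is hypothetical.

```python
def dense_layer_params(in_dim, out_dim, bias=True):
    """Parameters of a layer with an in_dim x out_dim weight matrix and optional bias."""
    return in_dim * out_dim + (out_dim if bias else 0)


# Cora dimensions from the text: 1433 input features, 7 classes, 16 hidden units.
num_features, hidden_channels, num_classes = 1433, 16, 7
total = (dense_layer_params(num_features, hidden_channels)
         + dense_layer_params(hidden_channels, num_classes))
print(total)  # 23063
```

With PyTorch installed, `sum(p.numel() for p in model.parameters())` on either the MLP or the GCN should report the same number. This suggests that the GCN's accuracy gain comes from using `edge_index`, not from extra parameters.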


4.4 Visualizing the node representations of the untrained GCN

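The visualization function visualize() was defined above.
If the data and model were moved to the GPU to speed up training, move both back to the CPU here before calling visualize(), so that the model and the data are on the same device.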

from torch_geometric.nn import GCNConv
model = GCN(hidden_channels=16)
model.eval()

data1 = data.cpu()
model1 = model.cpu()
out = model1(data1.x, data1.edge_index)
visualize(out, color=data.y)

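The node representations produced by the untrained GCN are visualized below:
[Figure: t-SNE visualization of the untrained GCN's node representations]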

4.5 Training the GCN

Next, we train the GCN on the training nodes:

model = GCN(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

def train():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

4.6 Testing the GCN

def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

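With the GCN, test-set performance improves markedly, which indicates that a node's neighborhood information plays a key role in achieving higher accuracy.
Visualizing the node representations of the trained GCN:
[Figure: t-SNE visualization of the trained GCN's node representations]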

5. Graph Attention Network (GAT)

5.1 What is a GAT

Source: https://baijiahao.baidu.com/s?id=1671028964544884749&wfr=spider&for=pc

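Graph neural networks (GNNs) apply deep learning to graph-structured data, and the graph convolutional network (GCN) performs convolution on graphs. However, GCN has several limitations: it relies on the graph Laplacian and cannot be applied directly to directed graphs; its training depends on the whole graph structure, so it cannot be applied to dynamic graphs; and during convolution it cannot assign different weights to different neighbors. The graph attention network (GAT), proposed in 2018, was designed to address these problems.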

The GAT layer is defined as:
$$\mathbf{x}'_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_j,$$
where the attention coefficients $\alpha_{i,j}$ are computed as:
$$\alpha_{i,j} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{\Theta}\mathbf{x}_i \,\Vert\, \mathbf{\Theta}\mathbf{x}_j]\right)\right)}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{\Theta}\mathbf{x}_i \,\Vert\, \mathbf{\Theta}\mathbf{x}_k]\right)\right)}.$$
Here $\Vert$ denotes concatenation and $\mathbf{a}$ is a learnable attention vector.
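The two formulas can be traced by hand for a single head. The sketch below assumes the features have already been multiplied by $\mathbf{\Theta}$ (passed in as `h`) and uses a hand-picked attention vector `a`; in GATConv both are learned. The function names are hypothetical, and the LeakyReLU slope of 0.2 matches GATConv's default `negative_slope`.

```python
import math


def leaky_relu(v, negative_slope=0.2):
    return v if v > 0 else negative_slope * v


def gat_attention(h, neighbors, a, i, negative_slope=0.2):
    """alpha_{i,j} for j in N(i) ∪ {i}, as a dict j -> coefficient.

    h: already transformed features Theta x_j (Theta assumed applied).
    a: attention vector of length 2 * len(h[0]).
    """
    cand = [i] + [j for j in neighbors[i] if j != i]
    scores = {}
    for j in cand:
        concat = h[i] + h[j]  # [Theta x_i || Theta x_j]
        e = sum(ak * ck for ak, ck in zip(a, concat))
        scores[j] = leaky_relu(e, negative_slope)
    # Softmax over the candidate set (max subtracted for stability).
    mx = max(scores.values())
    exps = {j: math.exp(s - mx) for j, s in scores.items()}
    z = sum(exps.values())
    return {j: v / z for j, v in exps.items()}


def gat_update(h, neighbors, a, i):
    """x'_i = sum over j in N(i) ∪ {i} of alpha_{i,j} * Theta x_j."""
    alpha = gat_attention(h, neighbors, a, i)
    return [sum(alpha[j] * h[j][k] for j in alpha) for k in range(len(h[i]))]


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1, 2], 1: [0], 2: [0]}
a = [0.0, 0.0, 1.0, -1.0]  # hand-picked: score depends on h_j[0] - h_j[1]
alpha = gat_attention(h, neighbors, a, 0)
out0 = gat_update(h, neighbors, a, 0)
print(alpha, out0)
```

Unlike the GCN coefficients, which depend only on node degrees, these weights depend on the features themselves, which is how GAT assigns different importance to different neighbors.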

5.2 The GATConv module

GATConv official documentation: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.GATConv

GATConv constructor signature:

GATConv(in_channels: Union[int, Tuple[int, int]], 
		out_channels: int, heads: int = 1, concat: bool = True, 
		negative_slope: float = 0.2, dropout: float = 0.0, 
		add_self_loops: bool = True, bias: bool = True, **kwargs)

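where:

  • in_channels: dimensionality of the input features;
  • out_channels: dimensionality of the output features of each attention head;
  • heads: number of attention heads (multi-head attention) used in GATConv;
  • concat: if True, the node representations produced by the different heads are concatenated, so the output dimension becomes heads × out_channels; otherwise they are averaged;
  • negative_slope: negative slope of the LeakyReLU used in the attention scores (default 0.2);
  • dropout: dropout probability applied to the normalized attention coefficients (default 0).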
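The effect of `concat` on the output shape can be shown without PyG. The sketch below takes per-head outputs as plain lists and either concatenates or averages them per node. `combine_heads` is a hypothetical helper that follows the parameter description, not GATConv's internal code.

```python
def combine_heads(head_outputs, concat=True):
    """Combine per-head node representations as described for GATConv's concat flag.

    head_outputs: one entry per head; each entry is a list of per-node
    vectors of length out_channels.
    concat=True  -> each node gets a vector of length heads * out_channels
    concat=False -> each node gets the element-wise mean (length out_channels)
    """
    heads = len(head_outputs)
    num_nodes = len(head_outputs[0])
    combined = []
    for n in range(num_nodes):
        if concat:
            vec = []
            for head in head_outputs:
                vec.extend(head[n])
        else:
            dim = len(head_outputs[0][n])
            vec = [sum(head[n][k] for head in head_outputs) / heads
                   for k in range(dim)]
        combined.append(vec)
    return combined


# 3 heads, 2 nodes, out_channels = 2
head_outputs = [[[1.0, 2.0], [0.0, 0.0]],
                [[3.0, 4.0], [0.0, 3.0]],
                [[5.0, 6.0], [3.0, 0.0]]]
concat_out = combine_heads(head_outputs, concat=True)
mean_out = combine_heads(head_outputs, concat=False)
print(concat_out)
print(mean_out)
```

A practical consequence: if the first layer were `GATConv(dataset.num_features, hidden_channels, heads=8)` with the default `concat=True`, the second layer would need `in_channels=8 * hidden_channels`.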

5.3 Building the GAT

Replacing torch.nn.Linear in the MLP example with torch_geometric.nn.GATConv gives a GAT:

import torch
from torch.nn import Linear
import torch.nn.functional as F

from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GAT, self).__init__()
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"
        torch.manual_seed(12345)
        self.conv1 = GATConv(dataset.num_features, hidden_channels)
        self.conv2 = GATConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

5.4 Training the GAT

model = GAT(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

def train():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

5.5 Testing the GAT

def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

Visualizing the node representations of the trained GAT:

data1 = data.cpu()
model1 = model.cpu()
out = model1(data1.x, data1.edge_index)
visualize(out, color=data.y)

[Figure: t-SNE visualization of the trained GAT's node representations]

6. How MLP, GCN, and GAT differ in node representation learning

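In node representation learning, the MLP considers only each node's own attributes and ignores the connections between nodes, and it gives the worst results. GCN and GAT take into account both a node's own features and the features of its neighbors, so both outperform the MLP. In other words, exploiting information from neighboring nodes is what lets graph neural networks outperform ordinary deep neural networks on this task.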

7. Training a GAT on the CiteSeer dataset

# Load the dataset
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
import torch

dataset = Planetoid(root='dataset', name='CiteSeer', transform=NormalizeFeatures())
data = dataset[0]  # Get the first graph object.

model = GAT(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

# Train the GAT
def train():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

# Evaluate on the test set
def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

Test score:
[Figure: test accuracy of the GAT on CiteSeer]
