pytorch geometric Tutorial 2: GCN Source Code Explained + Hands-on

每天都想躺平的大喵

Last modified 2022-06-15 11:11:53 (first published 2021-11-18 17:06:58)

Article link: https://blog.csdn.net/weixin_39925939/article/details/121338594

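This post assumes you already have some understanding of how message passing and updating work in pytorch geometric. If not, I recommend first reading my earlier post on that topic (pytorch geometric Tutorial 1: Message Passing Source Code Explained (MESSAGE PASSING) + Example).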

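Theory recap

Matrix form

First, recall how GCN works. In matrix form:
$\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta}$
where $\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops added, $\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$ is its diagonal degree matrix, $\mathbf{X}$ is the feature matrix and $\mathbf{\Theta}$ is the weight matrix. In pytorch geometric, the adjacency matrix can carry edge weights, which are passed through the optional argument edge_weight.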
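To make the matrix form concrete, here is a small dense sketch in plain PyTorch (no PyG involved). The helper name `gcn_dense`, the toy path graph and the feature sizes are hypothetical and only serve as an illustration; the `fill` argument mimics the effect of GCNConv's `improved` option (1 gives $\mathbf{A}+\mathbf{I}$, 2 gives $\mathbf{A}+2\mathbf{I}$).

```python
import torch

def gcn_dense(A: torch.Tensor, X: torch.Tensor, Theta: torch.Tensor,
              fill: float = 1.0) -> torch.Tensor:
    """Hypothetical helper: D^-1/2 (A + fill*I) D^-1/2 X Theta, computed densely."""
    A_hat = A + fill * torch.eye(A.size(0))        # add self-loops
    deg = A_hat.sum(dim=1)                          # \hat{D}_ii = sum_j \hat{A}_ij
    d_inv_sqrt = deg.pow(-0.5)
    # Entry [i, j] becomes \hat{A}_ij / sqrt(\hat{D}_ii * \hat{D}_jj)
    norm = d_inv_sqrt.view(-1, 1) * A_hat * d_inv_sqrt.view(1, -1)
    return norm @ X @ Theta

# Toy undirected path graph 0 - 1 - 2 (hypothetical)
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
X = torch.eye(3)           # one-hot node features, shape (N, F_in)
Theta = torch.ones(3, 2)   # weight matrix, shape (F_in, F_out)
out = gcn_dense(A, X, Theta)
print(out.shape)           # torch.Size([3, 2])
```

A dense matrix is only practical for tiny graphs; PyG works on the sparse edge list instead, which is what the rest of this post walks through.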

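Node-wise form

$\mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}} \mathbf{x}_j$
This is the formula given in the docstring of the pytorch geometric source, with $\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$. Note that it contains no nonlinearity: the convolution layer does not apply an activation function, so we have to add one ourselves afterwards. Here $e_{j,i}$ is the weight of the edge from source node j to target node i (1 if no edge_weight is given). Seen node-wise, GCN takes a weighted sum of the features of a node's neighbors (including the node itself) and then applies a linear transformation that changes the feature dimension.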
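The node-wise formula can also be written as a plain loop over edges. The sketch below is plain PyTorch, not PyG code; `gcn_nodewise` and the toy graph are hypothetical. It follows PyG's `edge_index` convention of shape (2, N_edges), with row 0 holding the source node j and row 1 the target node i, and gives every self-loop weight 1, so that $\hat{d}_i$ is 1 plus the weighted in-degree of node i.

```python
import math
import torch

def gcn_nodewise(x, edge_index, edge_weight, Theta):
    """Hypothetical helper: x'_i = Theta^T sum_{j in N(i) ∪ {i}} e_ji / sqrt(d_j d_i) x_j."""
    n = x.size(0)
    loops = torch.arange(n)
    # Append one self-loop (i -> i) of weight 1 per node
    src = torch.cat([edge_index[0], loops]).tolist()
    dst = torch.cat([edge_index[1], loops]).tolist()
    w = torch.cat([edge_weight, torch.ones(n)]).tolist()
    # \hat{d}_i: weighted in-degree of each target node, self-loop included
    d = [0.0] * n
    for i, e in zip(dst, w):
        d[i] += e
    out = torch.zeros(n, x.size(1))
    for j, i, e in zip(src, dst, w):
        out[i] += e / math.sqrt(d[j] * d[i]) * x[j]
    return out @ Theta   # row-vector form of applying Theta^T to every node

# Undirected path 0 - 1 - 2, each edge listed in both directions (hypothetical weights)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
edge_weight = torch.tensor([1.0, 1.0, 0.5, 0.5])
print(gcn_nodewise(torch.eye(3), edge_index, edge_weight, torch.ones(3, 2)))
```

Building a dense $\mathbf{\hat{A}}$ with $\hat{A}_{ij} = e_{j,i}$ and plugging it into the matrix form gives the same result, which is a quick way to convince yourself that the two formulas agree.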

GCN code (GCNConv)

__init__


```python
import torch
from torch import Tensor
from torch.nn import Parameter
from torch_sparse import SparseTensor, matmul          # used in forward / message_and_aggregate
from torch_geometric.typing import Adj, OptTensor      # type aliases used in forward / message
from torch_geometric.nn.dense.linear import Linear
from torch_geometric.nn.inits import zeros
from torch_geometric.nn.conv import MessagePassing
from torch_geometric.nn.conv.gcn_conv import gcn_norm  # normalization helper used in forward

class GCNConv(MessagePassing):
    def __init__(self, in_channels: int, out_channels: int,
                 improved: bool = False, cached: bool = False,
                 add_self_loops: bool = True, normalize: bool = True,
                 bias: bool = True, **kwargs):

        kwargs.setdefault('aggr', 'add')  # 'add' can be replaced with 'mean', 'max'
        super(GCNConv, self).__init__(**kwargs)

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.improved = improved
        self.cached = cached
        self.add_self_loops = add_self_loops
        self.normalize = normalize

        # Caches for the normalized graph (Tensor form / SparseTensor form)
        self._cached_edge_index = None
        self._cached_adj_t = None

        # Linear map X Θ without its own bias; the layer-level bias is defined below
        self.lin = Linear(in_channels, out_channels, bias=False,
                          weight_initializer='glorot')

        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)

        self.reset_parameters()

    def reset_parameters(self):
        self.lin.reset_parameters()
        zeros(self.bias)
        self._cached_edge_index = None
        self._cached_adj_t = None
```

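Neighborhood aggregation

kwargs.setdefault('aggr', 'add') checks whether the keyword arguments already specify a neighborhood aggregation, i.e. whether they contain the key aggr. If not, it sets the default, add, which sums the neighbor features. Because GCN normalizes the adjacency matrix, this sum behaves like a weighted average even though the aggregation is add (strictly, it is a normalized weighted sum: the symmetric coefficients $1/\sqrt{\hat{d}_j \hat{d}_i}$ of a node need not add up to exactly 1). When defining a model, we can choose the aggregation by passing aggr='add', 'mean' or 'max', e.g. GCNConv(in_channels, out_channels, aggr='mean').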
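The following toy class illustrates both ideas: `kwargs.setdefault` only fills in 'add' when the caller did not pass aggr, and the three aggregations reduce per-edge messages onto target nodes differently. `ToyConv` is hypothetical and not part of PyG; PyG performs these reductions with scatter operations, while this sketch uses `index_add_` and a simple loop for max.

```python
import torch

class ToyConv:
    """Hypothetical stand-in showing how kwargs.setdefault selects the aggregation."""
    def __init__(self, **kwargs):
        kwargs.setdefault('aggr', 'add')   # only takes effect if no aggr was passed
        self.aggr = kwargs['aggr']

    def aggregate(self, messages, index, num_nodes):
        """Reduce per-edge messages (N_edges, F) onto their target nodes given by index."""
        F = messages.size(1)
        if self.aggr in ('add', 'mean'):
            out = torch.zeros(num_nodes, F).index_add_(0, index, messages)
            if self.aggr == 'mean':
                count = torch.zeros(num_nodes).index_add_(0, index, torch.ones(index.numel()))
                out = out / count.clamp(min=1).view(-1, 1)
            return out
        if self.aggr == 'max':
            out = torch.full((num_nodes, F), float('-inf'))
            for k, i in enumerate(index.tolist()):
                out[i] = torch.maximum(out[i], messages[k])
            out[out == float('-inf')] = 0.0   # nodes that received no message
            return out
        raise ValueError(f"unsupported aggr: {self.aggr}")

messages = torch.tensor([[1.0], [3.0], [5.0]])
index = torch.tensor([0, 0, 1])            # target node of each message
for conv in (ToyConv(), ToyConv(aggr='mean'), ToyConv(aggr='max')):
    print(conv.aggr, conv.aggregate(messages, index, num_nodes=2).view(-1).tolist())
# add [4.0, 5.0]
# mean [2.0, 5.0]
# max [3.0, 5.0]
```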

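Parameters

The parameters mean the following:

- in_channels: dimension of the input features or of the hidden-layer embeddings
- out_channels: dimension of the output embeddings
- improved: default False. If True, $\mathbf{\hat{A}} = \mathbf{A} + 2\mathbf{I}$, which increases the weight of each node's own features.
- cached: default False. If True, $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1/2}$ is computed on the first call, cached, and reused in all later calls. Only set it to True in the transductive setting, where the adjacency matrix never changes.
- add_self_loops: default True. If False, no self-loops are added to the adjacency matrix.
- normalize: default True. Adds self-loops (subject to add_self_loops) and symmetrically normalizes the adjacency matrix on the fly.
- bias: default True. If False, the layer has no bias term.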
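The effect of cached=True can be illustrated with a toy class that mirrors the `_cached_edge_index` logic of GCNConv: the normalization runs only on the first call, and every later call returns the stored result, even if a different graph is passed in. `CachedNorm` and its string "graphs" are hypothetical, not PyG code; the stale result in the second call is exactly why the option is only safe when the adjacency matrix is fixed.

```python
class CachedNorm:
    """Hypothetical illustration of GCNConv's `cached` flag (not the PyG implementation)."""
    def __init__(self, cached: bool = False):
        self.cached = cached
        self._cached_edge_index = None
        self.num_norm_calls = 0          # how often the "expensive" step actually ran

    def normalize(self, edge_index):
        # Stand-in for gcn_norm: just tags its input
        self.num_norm_calls += 1
        return ('normalized', edge_index)

    def forward(self, edge_index):
        cache = self._cached_edge_index
        if cache is None:
            result = self.normalize(edge_index)
            if self.cached:
                self._cached_edge_index = result
            return result
        return cache                     # reused even if edge_index has changed

layer = CachedNorm(cached=True)
print(layer.forward('graph_A'))   # ('normalized', 'graph_A')
print(layer.forward('graph_B'))   # ('normalized', 'graph_A')  <- stale result
print(layer.num_norm_calls)       # 1
```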

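The linear transformation self.lin is defined with torch_geometric.nn.dense.linear.Linear. It works like torch.nn.Linear but additionally lets you choose how weight and bias are initialized; here GCNConv requests glorot initialization for the weight (weight_initializer='glorot'). Note that the bias of Linear itself is disabled (bias=False), and GCNConv instead defines its own bias parameter, which is initialized with zeros. That is why reset_parameters has to reset both the parameters of self.lin and the bias of the GCNConv layer.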
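Here is a sketch of that split, using plain `torch.nn.Linear` as a stand-in for PyG's Linear. PyG's 'glorot' initializer draws weights uniformly from $[-a, a]$ with $a = \sqrt{6/(\text{fan\_in}+\text{fan\_out})}$, which corresponds to `torch.nn.init.xavier_uniform_` with the default gain. The class name `LinPlusBias` and the `propagate` callable are hypothetical. A plausible reason for the split: if the bias lived inside self.lin, it would be propagated together with $\mathbf{X}\mathbf{\Theta}$ and rescaled by the normalization coefficients, whereas here it is added exactly once, after aggregation.

```python
import math
import torch

class LinPlusBias(torch.nn.Module):
    """Hypothetical sketch: bias-free linear map plus a separately managed bias."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.lin = torch.nn.Linear(in_channels, out_channels, bias=False)
        self.bias = torch.nn.Parameter(torch.empty(out_channels))
        self.reset_parameters()

    def reset_parameters(self):
        torch.nn.init.xavier_uniform_(self.lin.weight)   # Glorot uniform
        torch.nn.init.zeros_(self.bias)                  # layer bias starts at zero

    def forward(self, x, propagate):
        out = propagate(self.lin(x))   # X Θ first, then neighborhood aggregation
        return out + self.bias         # bias added once, after aggregation

layer = LinPlusBias(16, 4)
bound = math.sqrt(6.0 / (16 + 4))
print(bool(layer.lin.weight.abs().max() <= bound))   # True
```

Calling `layer(x, propagate=lambda h: h)` uses an identity "propagation", which is enough to check shapes without a graph.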

forward

Below is the code of the forward function:

```python
    # Method of GCNConv (continued from the class definition above)
    def forward(self, x: Tensor, edge_index: Adj,
                edge_weight: OptTensor = None) -> Tensor:

        if self.normalize:
            if isinstance(edge_index, Tensor):
                # edge_index given as a (2, N_edges) Tensor
                cache = self._cached_edge_index
                if cache is None:
                    edge_index, edge_weight = gcn_norm(  # yapf: disable
                        edge_index, edge_weight, x.size(self.node_dim),
                        self.improved, self.add_self_loops)
                    if self.cached:
                        self._cached_edge_index = (edge_index, edge_weight)
                else:
                    edge_index, edge_weight = cache[0], cache[1]

            elif isinstance(edge_index, SparseTensor):
                # edge_index given as a SparseTensor (adj_t); weights live inside it
                cache = self._cached_adj_t
                if cache is None:
                    edge_index = gcn_norm(  # yapf: disable
                        edge_index, edge_weight, x.size(self.node_dim),
                        self.improved, self.add_self_loops)
                    if self.cached:
                        self._cached_adj_t = edge_index
                else:
                    edge_index = cache

        x = self.lin(x)  # X Θ

        # propagate_type: (x: Tensor, edge_weight: OptTensor)
        out = self.propagate(edge_index, x=x, edge_weight=edge_weight,
                             size=None)

        if self.bias is not None:
            out += self.bias

        return out
```

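Arguments

- x: features of all nodes, or their hidden-layer embeddings
- edge_index: the edge information, either a Tensor of shape (2, N_edges) or a SparseTensor of shape (N_nodes, N_nodes)
- edge_weight: optional; if given, the adjacency matrix is weighted.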

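The body of forward

The forward function does the following:

1. Normalize the adjacency matrix (if normalize is True).
   The Tensor and SparseTensor forms of edge_index are handled separately (a SparseTensor edge_index is conventionally written adj_t). If cached is True and a normalized result is already stored, the cached edge_index/edge_weight (or adj_t) is reused; otherwise gcn_norm is called, and its result is stored when cached is True.
2. Apply the linear transformation self.lin to the features, i.e. compute $\mathbf{X} \mathbf{\Theta}$ from the formula.
3. Call propagate on the result of step 2.
   As discussed in the previous post, when edge_index is a Tensor, propagate calls message and aggregate to pass and aggregate the messages. When edge_index is a SparseTensor and message_and_aggregate is defined, propagate calls message_and_aggregate instead.
4. Add the bias to the updated result.

I won't go into the details of gcn_norm here.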
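For a rough idea of what the normalization computes in the Tensor case, here is a simplified sketch in plain PyTorch. It is not PyG's gcn_norm: the real function uses add_remaining_self_loops (which, as far as I know, keeps the weight of a self-loop that already exists instead of adding a second one) and also handles SparseTensor input, while this sketch assumes the input has no self-loops and simply appends one per node. The name `simple_gcn_norm` is hypothetical.

```python
import torch

def simple_gcn_norm(edge_index, edge_weight, num_nodes,
                    improved=False, add_self_loops=True):
    """Hypothetical simplified gcn_norm: norm_e = d_j^-1/2 * w_e * d_i^-1/2 for edge j -> i."""
    if edge_weight is None:
        edge_weight = torch.ones(edge_index.size(1))
    if add_self_loops:
        fill = 2.0 if improved else 1.0               # A + 2I vs. A + I
        loops = torch.arange(num_nodes)
        edge_index = torch.cat([edge_index, torch.stack([loops, loops])], dim=1)
        edge_weight = torch.cat([edge_weight, torch.full((num_nodes,), fill)])
    row, col = edge_index[0], edge_index[1]           # row = source j, col = target i
    deg = torch.zeros(num_nodes).index_add_(0, col, edge_weight)  # weighted in-degree
    deg_inv_sqrt = deg.pow(-0.5)
    deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0.0  # isolated nodes without self-loops
    norm = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]
    return edge_index, norm

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])   # undirected path 0 - 1 - 2 (hypothetical)
ei, w = simple_gcn_norm(edge_index, None, num_nodes=3)
print(ei.shape, w.shape)   # torch.Size([2, 7]) torch.Size([7])
```

The returned per-edge coefficients are exactly the $\frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}}$ terms of the node-wise formula; they are what forward later passes to propagate as edge_weight.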

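Message passing

Let's take a closer look at the message function of GCN.

```python
    # Methods of GCNConv (continued)
    def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
        # x_j: (N_edges, N_features); scale each neighbor's message by its edge weight
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

    def message_and_aggregate(self, adj_t: SparseTensor, x: Tensor) -> Tensor:
        # Fused path for SparseTensor input: sparse-dense matmul reduced with self.aggr
        return matmul(adj_t, x, reduce=self.aggr)
```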

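Case 1: edge_index is a Tensor

If this part is unclear, please read this post first (pytorch geometric: Message Passing Explained (MESSAGE PASSING) + Example).
When edge_index is a Tensor, propagate calls message and aggregate to pass and aggregate the messages.
Let's work through the shapes involved:

- edge_index has shape (2, N_edges).
- The neighbor features x_j are obtained by indexing x with the first row of edge_index (the source node of every edge), so their shape is (N_edges, N_features).
- edge_weight.view(-1, 1) has shape (N_edges, 1).
- Therefore edge_weight.view(-1, 1) * x_j broadcasts and scales each neighbor's features by the weight of the corresponding edge.
- The output of message has shape (N_edges, N_features); aggregate then uses a scatter operation to sum these messages into the target node of each edge (the second row of edge_index).
- The output of propagate has shape (N_nodes, N_features).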
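The shape bookkeeping above can be checked with a few lines of plain PyTorch. This is a sketch of the default source-to-target flow, not PyG's internal code: x_j is gathered with `index_select` on edge_index[0], the message mirrors GCNConv.message, and `index_add_` stands in for the scatter-sum over edge_index[1]. The graph and sizes are hypothetical.

```python
import torch

N_nodes, N_features = 4, 3
x = torch.randn(N_nodes, N_features)
edge_index = torch.tensor([[0, 1, 2, 3, 0],     # source nodes j
                           [1, 2, 3, 0, 2]])    # target nodes i
edge_weight = torch.rand(edge_index.size(1))    # one weight per edge

# Gather the source features of every edge -> (N_edges, N_features)
x_j = x.index_select(0, edge_index[0])

# message(): scale each neighbor feature by its edge weight
msg = edge_weight.view(-1, 1) * x_j             # (N_edges, 1) * (N_edges, N_features)

# aggregate() with aggr='add': sum messages into their target nodes
out = torch.zeros(N_nodes, N_features).index_add_(0, edge_index[1], msg)

print(x_j.shape, msg.shape, out.shape)
# torch.Size([5, 3]) torch.Size([5, 3]) torch.Size([4, 3])
```

For example, node 2 receives the edges 1 → 2 and 0 → 2, so `out[2]` equals `edge_weight[1] * x[1] + edge_weight[4] * x[0]`.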

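Case 2: edge_index is a SparseTensor

When edge_index is a SparseTensor, propagate directly performs a matrix-style computation, matmul(adj_t, x, reduce=self.aggr). This matmul comes from torch_sparse; besides the usual matrix product it accepts an optional reduce argument, so mean and max aggregation can be implemented here as well as add.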
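Why the name adj_t? With edge_index[0] as source and edge_index[1] as target, the transposed adjacency matrix has a nonzero entry at [i, j] for every edge j → i, so one sparse-dense product adj_t @ x sums each node's incoming neighbor features, which is the same scatter-sum as in Case 1. The sketch below checks this using PyTorch's built-in `torch.sparse` as a stand-in for torch_sparse.SparseTensor; it only covers the sum reduction, not mean or max, and the graph is hypothetical.

```python
import torch

N = 4
x = torch.randn(N, 3)
edge_index = torch.tensor([[0, 1, 2, 3, 0],     # sources j
                           [1, 2, 3, 0, 2]])    # targets i
edge_weight = torch.rand(edge_index.size(1))

# adj_t[i, j] = weight of edge j -> i (rows are targets, columns are sources)
adj_t = torch.sparse_coo_tensor(torch.stack([edge_index[1], edge_index[0]]),
                                edge_weight, (N, N))
out_sparse = torch.sparse.mm(adj_t, x)           # analogous to matmul(adj_t, x, reduce='sum')

# Same result via gather + scatter-sum (the edge_index path)
msg = edge_weight.view(-1, 1) * x.index_select(0, edge_index[0])
out_scatter = torch.zeros(N, 3).index_add_(0, edge_index[1], msg)

print(torch.allclose(out_sparse, out_scatter, atol=1e-6))   # True
```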

Hands-on

Defining the model

Calling the convolution layers of pytorch geometric is quite simple. Below is a two-layer GCN.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn.conv import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, dropout=0.):
        super(GCN, self).__init__()

        self.convs = torch.nn.ModuleList()
        self.convs.append(GCNConv(in_channels, hidden_channels))
        self.convs.append(GCNConv(hidden_channels, out_channels))

        self.dropout = dropout

    def reset_parameters(self):
        for conv in self.convs:
            conv.reset_parameters()

    def forward(self, x, edge_index):
        x = self.convs[0](x, edge_index)
        x = F.relu(x)  # GCNConv applies no nonlinearity, so we add it here
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.convs[1](x, edge_index)

        return x.log_softmax(dim=-1)  # log-probabilities for F.nll_loss
```

Running the model

Next, let's try it on the Cora dataset.

```python
# Load the data
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T

transform = T.ToSparseTensor()
# Because of ToSparseTensor(), the edges are stored as adj_t; without this transform they would be stored as edge_index
dataset = Planetoid(name='Cora', root=r'./dataset/Cora', transform=transform)
data = dataset[0]
data.adj_t = data.adj_t.to_symmetric()

model = GCN(in_channels=dataset.num_features, hidden_channels=128, out_channels=dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train():
    model.train()

    optimizer.zero_grad()
    out = model(data.x, data.adj_t)[data.train_mask]  # as mentioned above, GCNConv supports both edge_index and adj_t
    loss = F.nll_loss(out, data.y[data.train_mask])
    loss.backward()
    optimizer.step()

    return loss.item()

@torch.no_grad()
def test():
    model.eval()

    out = model(data.x, data.adj_t)
    y_pred = out.argmax(dim=-1)

    correct = y_pred == data.y
    train_acc = correct[data.train_mask].sum().float() / data.train_mask.sum()
    valid_acc = correct[data.val_mask].sum().float() / data.val_mask.sum()
    test_acc = correct[data.test_mask].sum().float() / data.test_mask.sum()

    return train_acc.item(), valid_acc.item(), test_acc.item()

# Train for 20 epochs and look at the model's performance
for epoch in range(20):
    loss = train()
    train_acc, valid_acc, test_acc = test()
    print(f'Epoch: {epoch:02d}, '
          f'Loss: {loss:.4f}, '
          f'Train_acc: {100 * train_acc:.3f}%, '
          f'Valid_acc: {100 * valid_acc:.3f}% '
          f'Test_acc: {100 * test_acc:.3f}%')
```

Epoch: 00, Loss: 1.9558, Train_acc: 29.286%, Valid_acc: 21.200% Test_acc: 22.000%
Epoch: 01, Loss: 1.9081, Train_acc: 54.286%, Valid_acc: 32.600% Test_acc: 35.300%
Epoch: 02, Loss: 1.8619, Train_acc: 73.571%, Valid_acc: 44.000% Test_acc: 45.800%
Epoch: 03, Loss: 1.8163, Train_acc: 84.286%, Valid_acc: 51.400% Test_acc: 52.900%
Epoch: 04, Loss: 1.7703, Train_acc: 88.571%, Valid_acc: 55.800% Test_acc: 58.400%
Epoch: 05, Loss: 1.7235, Train_acc: 92.143%, Valid_acc: 60.800% Test_acc: 62.100%
Epoch: 06, Loss: 1.6756, Train_acc: 92.857%, Valid_acc: 63.400% Test_acc: 64.100%
Epoch: 07, Loss: 1.6265, Train_acc: 95.000%, Valid_acc: 65.400% Test_acc: 66.400%
Epoch: 08, Loss: 1.5761, Train_acc: 95.000%, Valid_acc: 66.200% Test_acc: 68.600%
Epoch: 09, Loss: 1.5245, Train_acc: 95.000%, Valid_acc: 67.800% Test_acc: 69.700%
Epoch: 10, Loss: 1.4717, Train_acc: 95.000%, Valid_acc: 69.000% Test_acc: 70.500%
Epoch: 11, Loss: 1.4179, Train_acc: 95.000%, Valid_acc: 70.000% Test_acc: 72.100%
Epoch: 12, Loss: 1.3634, Train_acc: 95.714%, Valid_acc: 71.400% Test_acc: 73.200%
Epoch: 13, Loss: 1.3086, Train_acc: 97.143%, Valid_acc: 72.000% Test_acc: 74.100%
Epoch: 14, Loss: 1.2536, Train_acc: 97.143%, Valid_acc: 72.800% Test_acc: 74.300%
Epoch: 15, Loss: 1.1987, Train_acc: 97.857%, Valid_acc: 73.400% Test_acc: 75.100%
Epoch: 16, Loss: 1.1442, Train_acc: 98.571%, Valid_acc: 73.600% Test_acc: 75.800%
Epoch: 17, Loss: 1.0905, Train_acc: 98.571%, Valid_acc: 74.400% Test_acc: 76.300%
Epoch: 18, Loss: 1.0377, Train_acc: 98.571%, Valid_acc: 75.200% Test_acc: 76.800%
Epoch: 19, Loss: 0.9861, Train_acc: 98.571%, Valid_acc: 75.800% Test_acc: 77.400%

And with that, our first GCN model is done! After 20 epochs, test accuracy reaches 77.4%.
Comments and discussion are welcome; please credit the source when reposting!

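Hands-on code

https://github.com/DGraphXinye/2022_finvcup_baseline
This is the baseline code I prepared for a competition organized by our company; it contains basic graph algorithms together with their mini-batch versions.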
