GNN Hands-On (3): H2GCN, an Efficient Graph Neural Network for Both Homophilous and Heterophilous Graphs

斯曦巍峨

Last modified 2022-08-09 10:50:15

First published 2022-08-09 10:48:14

Article link: https://blog.csdn.net/qq_42103091/article/details/126242752

1. Introduction

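$\text{H}_{2}\text{GCN}$ is a GNN model proposed in the NeurIPS 2020 paper "Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs". It is designed to work on both homophilous and heterophilous graphs. The authors have open-sourced an official implementation (GitHub), but it is written in TensorFlow, so this post reproduces the model with PyTorch + PyG.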

For a walkthrough of the paper, see: reading notes on "Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs".

2. Theoretical Background of H2GCN

2.1 $i$-hop neighbors

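The paper introduces the notion of $i$-hop neighbors, $\overline{N}_{i}(v)$: the set of nodes whose shortest-path distance to node $v$ is exactly $i$. For example, the figure below shows $\overline{N}_{1}(v)$ (yellow nodes) and $\overline{N}_{2}(v)$ (purple nodes).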

[Figure: neighborhoods]

So how do we compute the 2-hop neighbors of every node in the graph? The paper gives the following formulas:
$\begin{aligned} &\overline{\mathbf{A}}_{0} \leftarrow \mathbf{I}_{n} \quad / * \mathbf{I}_{n} \text { is the } n \times n \text { identity matrix } * / \\ &\overline{\mathbf{A}}_{1} \leftarrow \mathbb{I}\left[\mathbf{A}-\mathbf{I}_{n}>0\right] \quad / * \mathbb{I} \text { is an element-wise indicator function for matrices } * / \\ &\overline{\mathbf{A}}_{2} \leftarrow \mathbb{I}\left[\mathbf{A}^{2}-\mathbf{A}-\mathbf{I}_{n}>0\right] ; \end{aligned}$
For large graphs, computing $\mathbf{A}^2$ with a dense adjacency matrix is very expensive. This post therefore computes $\mathbf{A}^2$ with sparse matrix multiplication, which the SparseTensor class used with PyG (provided by torch_sparse) supports directly. Consider the following graph:

[Figure: org_graph]

The following example code computes the 2-hop matrix of this graph:

import torch
from torch_sparse import SparseTensor
from torch_geometric.utils import to_undirected
import scipy.sparse as sp


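def toCSR(spt):
    # Convert a SparseTensor to a SciPy CSR matrix with the diagonal zeroed out.
    rowptr, col, value = spt.csr()
    # Pass the shape explicitly so trailing isolated nodes are not dropped.
    mat = sp.csr_matrix((value, col, rowptr), shape=spt.sparse_sizes()).tolil()
    mat.setdiag(0)
    return mat.tocsr()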


edge_index = torch.LongTensor([[0, 0, 0, 1, 1, 1, 2, 3],
                               [1, 2, 3, 2, 3, 5, 6, 4]])
edge_index = to_undirected(edge_index)
adj = SparseTensor(row=edge_index[0], col=edge_index[1],
                   sparse_sizes=(7, 7)).fill_value(1.0)
adj2 = adj.matmul(adj).fill_value(1.0)
adj_2hop = (toCSR(adj2) - toCSR(adj)) > 0
adj_2hop = SparseTensor.from_scipy(adj_2hop).fill_value(1.0)
print(adj_2hop.to_dense())
"""
tensor([[0., 0., 0., 0., 1., 1., 1.],
        [0., 0., 0., 0., 1., 0., 1.],
        [0., 0., 0., 1., 0., 1., 0.],
        [0., 0., 1., 0., 0., 1., 0.],
        [1., 1., 0., 0., 0., 0., 0.],
        [1., 0., 1., 1., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0., 0.]])
"""

The dense adjacency matrix printed at the end confirms the computation: for node 0, for example, the 2-hop neighbors are nodes 4, 5, and 6.
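As an independent cross-check of the definition in Section 2.1 (nodes at shortest-path distance exactly 2), the same result can be obtained with a plain breadth-first search. This is only a minimal sketch using the standard library; the helper name `nodes_at_distance` is made up for illustration and is not part of the original code.

```python
from collections import deque

# Undirected edge list of the example graph (same edges as the snippet above).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (1, 5), (2, 6), (3, 4)]
num_nodes = 7

# Build adjacency lists.
neighbors = {v: set() for v in range(num_nodes)}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def nodes_at_distance(source, dist):
    """Return the sorted nodes whose shortest-path distance from `source` is exactly `dist`."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == dist:
            continue  # no need to expand beyond the requested distance
        for w in neighbors[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return sorted(v for v, d in depth.items() if d == dist)


two_hop = {v: nodes_at_distance(v, 2) for v in range(num_nodes)}
print(two_hop[0])  # [4, 5, 6]
```

Each entry of `two_hop` should match the nonzero columns of the corresponding row in the matrix printed above.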

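2.2 The framework in detail

The $\text{H}_{2}\text{GCN}$ framework has three stages:

Stage 1: feature embedding.
Stage 2: neighborhood aggregation, which aggregates the features of each node's 1-hop and 2-hop neighbors (combining them by concatenation works best); this stage is repeated for two rounds.
Stage 3: the features from stage 1 and from every round of stage 2 are concatenated into the final node representation, which is then used for the downstream classification task.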

[Figure: framework]

3. Reproduction and Experiments

3.1 Environment

The environment used for the experiments is:

Python: 3.7
PyTorch: 1.10.1
torch_geometric: 2.0.4
scipy: 1.7.3

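3.2 Model reproduction

Following Section 2.1, the code below computes $\mathbf{A}^{2}-\mathbf{A}-\mathbf{I}_{n}>0$ from the graph's adjacency matrix and symmetrically normalizes an adjacency matrix, i.e. $\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$. Note that all adjacency matrices handled here are stored as SparseTensor: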

from torch_sparse import SparseTensor, fill_diag, mul
from torch_sparse import sum as sparsesum
import scipy.sparse as sp


def toCSR(spt):
    if not spt.has_value():
        spt = spt.fill_value(1.)
    rowptr, col, value = spt.csr()
    # pass the shape explicitly so trailing isolated nodes are not dropped
    mat = sp.csr_matrix((value, col, rowptr), shape=spt.sparse_sizes()).tolil()
    # remove self-loops
    mat.setdiag(0)
    return mat.tocsr()


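def hopNeighborhood(adj):
    # binarize A^2 first so that subtracting A removes every 1-hop neighbor
    adj2 = adj.matmul(adj).fill_value(1.0)
    adj_2hop = (toCSR(adj2) - toCSR(adj)) > 0
    adj_2hop = SparseTensor.from_scipy(adj_2hop).fill_value(1.0)
    # return the strict 2-hop matrix, not the raw A^2
    return adj_2hop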


def norm_adj(adj_t, add_self_loops=True):
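    """
    Symmetrically normalize the adjacency matrix: D^{-1/2} A D^{-1/2}.
    If add_self_loops is True, the diagonal is set to 1 before normalizing.
    """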
    if add_self_loops:
        adj_t = fill_diag(adj_t, 1.)
    deg = sparsesum(adj_t, dim=1)
    deg_inv_sqrt = deg.pow_(-0.5)
    deg_inv_sqrt.masked_fill_(deg_inv_sqrt == float('inf'), 0.)
    adj_t = mul(adj_t, deg_inv_sqrt.view(-1, 1))
    adj_t = mul(adj_t, deg_inv_sqrt.view(1, -1))
    return adj_t


if __name__ == "__main__":
    pass
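To make the normalization $\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ concrete, here is a small dense reference written in pure Python. It is only a sketch for checking values by hand on tiny graphs; the function name `normalize_dense` and the 3-node path graph are illustrative assumptions, and it is meant to mirror what `norm_adj` does on a SparseTensor rather than replace it.

```python
import math


def normalize_dense(adj, add_self_loops=True):
    """Dense reference for D^{-1/2} A D^{-1/2} on a list-of-lists matrix."""
    n = len(adj)
    a = [[float(x) for x in row] for row in adj]
    if add_self_loops:
        for i in range(n):
            a[i][i] = 1.0  # same effect as fill_diag(adj_t, 1.)
    deg = [sum(row) for row in a]  # row sums, i.e. node degrees
    # Isolated nodes (degree 0) get a scaling factor of 0 instead of inf.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


# Path graph 0 - 1 - 2 (hypothetical toy input).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
normalized = normalize_dense(path)
for row in normalized:
    print([round(x, 4) for x in row])
```

With self-loops the degrees are 2, 3, and 2, so the entry between nodes 0 and 1 becomes $1/\sqrt{2 \cdot 3}$ and the result stays symmetric.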

Following Section 2.2, here is the full source of the $\text{H}_{2}\text{GCN}$ model:

import torch
import torch.nn as nn
import torch.nn.functional as F


class H2GCN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, drop_prob,
                 round):
        super().__init__()
        self.round = round
        self.embed = nn.Sequential(nn.Linear(in_channels, hidden_channels),
                                   nn.ReLU())

        self.dropout = nn.Dropout(drop_prob)
        self.classification = nn.Linear(hidden_channels * (2**(round + 1) - 1),
                                        out_channels)

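    def forward(self, x, adj, adj_2hop):
        hidden_reps = []
        # Stage 1: feature embedding
        x = self.embed(x)
        hidden_reps.append(x)
        # Stage 2: aggregate 1-hop and 2-hop neighborhoods and concatenate them
        for _ in range(self.round):
            r1 = adj.matmul(x)
            r2 = adj_2hop.matmul(x)
            x = torch.cat([r1, r2], dim=-1)
            hidden_reps.append(x)
        # Stage 3: concatenate all intermediate representations, then classify
        hf = self.dropout(torch.cat(hidden_reps, dim=-1))
        return F.log_softmax(self.classification(hf), dim=1)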


if __name__ == "__main__":
    pass
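The input width of the final classifier, `hidden_channels * (2**(round + 1) - 1)`, follows from the concatenations in `forward`: each aggregation round doubles the feature width, and all rounds are concatenated at the end. The short sketch below just does this bookkeeping in plain Python; the helper name `h2gcn_concat_width` and the value 64 are illustrative assumptions.

```python
def h2gcn_concat_width(hidden_channels, rounds):
    """Return (total width, per-stage widths) of the concatenated representation."""
    widths = [hidden_channels]         # stage 1: embedding output
    for _ in range(rounds):
        widths.append(2 * widths[-1])  # stage 2: concat of 1-hop and 2-hop aggregates
    return sum(widths), widths


total, per_stage = h2gcn_concat_width(hidden_channels=64, rounds=2)
print(per_stage, total)  # [64, 128, 256] 448
```

Summing the geometric series $h(1 + 2 + \dots + 2^{K})$ gives $h(2^{K+1} - 1)$, which is the expression used for the `nn.Linear` layer.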

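3.3 Experiments and results

The experiments use three of the paper's datasets with high homophily ratios (Citeseer, Cora, and PubMed) and one heterophilous dataset with a low homophily ratio (Texas) for training and evaluation. Their statistics are as follows: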

Dataset	Nodes	Edges	Node features	Classes	train/val/test split
Cora	2,708	10,556	1,433	7	140/500/1000
Citeseer	3,327	9,104	3,703	6	120/500/1000
PubMed	19,717	88,648	500	3	60/500/1000
Texas	183	309	1,703	5	87/59/37
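For readers unfamiliar with the term, the homophily ratio mentioned above is assumed here to be the edge homophily ratio: the fraction of edges whose two endpoints share the same class label. The sketch below computes it on a toy graph; the labels are made up for illustration and do not come from any of the datasets in the table.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph with hypothetical class labels for nodes 0..6.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (1, 5), (2, 6), (3, 4)]
labels = [0, 0, 0, 1, 1, 0, 1]

ratio = edge_homophily(edges, labels)
print(ratio)  # 0.625
```

A value close to 1 indicates a homophilous graph, while a value close to 0 indicates a heterophilous one.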

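In the paper, the authors generate 10 random splits of these benchmarks, selecting 48%/32%/20% of the nodes of each class as the training, validation, and test sets. To save time, this post simply uses each dataset's default split; the Texas dataset ships with 10 splits, of which only split 0 is used here. The results are therefore expected to differ from those reported in the paper.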
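The per-class 48%/32%/20% splitting described above was not used in this post, but it could be sketched roughly as follows. This is a hedged illustration only: the function name `per_class_split`, the rounding rule, and the synthetic labels are assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def per_class_split(labels, fractions=(0.48, 0.32, 0.20), seed=0):
    """Randomly split node indices into train/val/test, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, y in enumerate(labels):
        by_class[y].append(node)
    train, val, test = [], [], []
    for nodes in by_class.values():
        rng.shuffle(nodes)
        n_train = round(fractions[0] * len(nodes))
        n_val = round(fractions[1] * len(nodes))
        train += nodes[:n_train]
        val += nodes[n_train:n_train + n_val]
        test += nodes[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical labels: 4 classes with 25 nodes each.
labels = [i % 4 for i in range(100)]
# Ten random splits, one per seed.
splits = [per_class_split(labels, seed=s) for s in range(10)]
train, val, test = splits[0]
print(len(train), len(val), len(test))  # 48 32 20
```

Reported numbers would then be averaged over the ten splits rather than taken from a single one.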

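GCN is used as the baseline. Model parameters are updated on the training set, the best model is selected on the validation set, and that model is then evaluated on the test set. A small grid search was run over the following hyperparameters, and the best result was kept: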

epochs: 2000
lr: 0.01, 0.001, 0.0001
drop_prob: 0, 0.6

# Example run commands
python main.py --dataset cora --lr 0.001 --drop_prob 0.6 --model gcn
python main.py --dataset texas --lr 0.001 --drop_prob 0.6 --model h2gcn
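The grid listed above contains 3 learning rates and 2 dropout probabilities, i.e. 6 configurations per dataset/model pair. A small sketch for enumerating the corresponding commands is shown below; it only builds command strings with the flags seen in the examples above and does not run anything. The helper name `grid_commands` is made up for illustration.

```python
from itertools import product

learning_rates = [0.01, 0.001, 0.0001]
drop_probs = [0, 0.6]


def grid_commands(dataset, model):
    """Build one command line per (lr, drop_prob) combination."""
    return [
        f"python main.py --dataset {dataset} --lr {lr} --drop_prob {p} --model {model}"
        for lr, p in product(learning_rates, drop_probs)
    ]


commands = grid_commands("texas", "h2gcn")
for cmd in commands:
    print(cmd)
```

The printed lines can be pasted into a shell script, and the configuration with the best validation accuracy is then the one whose test score gets reported.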