Task02:消息传递图神经网络
一、MPNN
消息传递神经网络(Message Passing Neural Network,MPNN)首次提出于《Neural Message Passing for Quantum Chemistry》,在量子化学性质预测的任务中取得了不错的成绩,相关介绍可以见阿泽的学习笔记写的一篇文章。
MPNN框架:
MPNN主要包括两个阶段:消息传递阶段(Message Passing)、读出阶段(Readout),这里介绍消息传递阶段。
x
i
(
k
)
=
γ
(
k
)
(
x
i
(
k
−
1
)
,
□
j
∈
N
(
i
)
ϕ
(
k
)
(
x
i
(
k
−
1
)
,
x
j
(
k
−
1
)
,
e
j
,
i
)
)
,
\mathbf{x}_i^{(k)} = \gamma^{(k)} \left( \mathbf{x}_i^{(k-1)}, \square_{j \in \mathcal{N}(i)} \, \phi^{(k)}\left(\mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)},\mathbf{e}_{j,i}\right) \right),
xi(k)=γ(k)(xi(k−1),□j∈N(i)ϕ(k)(xi(k−1),xj(k−1),ej,i)),
-
消息的创建( ϕ \phi ϕ)
-
消息的聚合( □ \square □)
-
节点的更新( γ \gamma γ)
以下内容可详细阅读datawhale开源项目。
二、MessagePassing
基类分析
Pytorch Geometric(PyG)提供了MessagePassing
基类,它封装了“消息传递”的运行流程。通过继承MessagePassing
基类,可以方便地构造消息传递图神经网络。构造一个最简单的消息传递图神经网络类,我们只需定义**message()
方法(
ϕ
\phi
ϕ)、update()
方法(
γ
\gamma
γ),以及使用的消息聚合方案**(aggr="add"
、aggr="mean"
或aggr="max"
)。这一切是在以下方法的帮助下完成的:
MessagePassing(aggr="add", flow="source_to_target", node_dim=-2)
(对象初始化方法)MessagePassing.propagate(edge_index, size=None, **kwargs)
MessagePassing.message(...)
MessagePassing.aggregate(...)
MessagePassing.message_and_aggregate(...)
MessagePassing.update(aggr_out, ...)
在propagate()
方法中message()
(或message_and_aggregate()
)、update()
等方法被调用。propagate()
方法首先检查edge_index
是否为SparseTensor
类型以及是否子类实现了message_and_aggregate()
方法,如是就执行子类的message_and_aggregate
方法和update()
方法;否则依次执行子类的message(),aggregate(),update()
三个方法。
不同的图神经网络可以通过覆写message
方法、aggregate
方法的覆写、message_and_aggregate
方法、update
方法来实现。
三、MessagePassing
子类实例
我们以继承MessagePassing
基类的GCNConv
类为例,学习如何通过继承MessagePassing
基类来实现一个简单的图神经网络,可配合该文章食用。
GCNConv
的数学定义为
x
i
(
k
)
=
∑
j
∈
N
(
i
)
∪
{
i
}
1
deg
(
i
)
⋅
deg
(
j
)
⋅
(
Θ
⋅
x
j
(
k
−
1
)
)
,
\mathbf{x}_i^{(k)} = \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}} \cdot \left( \mathbf{\Theta} \cdot \mathbf{x}_j^{(k-1)} \right),
xi(k)=j∈N(i)∪{i}∑deg(i)⋅deg(j)1⋅(Θ⋅xj(k−1)),
其中,邻接节点的表征
x
j
(
k
−
1
)
\mathbf{x}_j^{(k-1)}
xj(k−1)首先通过与权重矩阵
Θ
\mathbf{\Theta}
Θ相乘进行变换,然后按端点的度
deg
(
i
)
,
deg
(
j
)
\deg(i), \deg(j)
deg(i),deg(j)进行归一化处理,最后进行求和。这个公式可以分为以下几个步骤:
- 向邻接矩阵添加自环边。
- 对节点表征做线性转换。
- 计算归一化系数。
- 归一化邻接节点的节点表征。
- 将相邻节点表征相加("求和 "聚合)。
步骤1-3通常是在消息传递发生之前计算的。步骤4-5可以使用MessagePassing
基类轻松处理。该层的全部实现如下所示。
import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree
class GCNConv(MessagePassing):
def __init__(self, in_channels, out_channels):
super(GCNConv, self).__init__(aggr='add', flow='source_to_target')
# "Add" aggregation (Step 5).
# flow='source_to_target' 表示消息从源节点传播到目标节点
self.lin = torch.nn.Linear(in_channels, out_channels)
def forward(self, x, edge_index):
# x has shape [N, in_channels]
# edge_index has shape [2, E]
# Step 1: Add self-loops to the adjacency matrix.
edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
# Step 2: Linearly transform node feature matrix.
x = self.lin(x)
# Step 3: Compute normalization.
row, col = edge_index
deg = degree(col, x.size(0), dtype=x.dtype)
deg_inv_sqrt = deg.pow(-0.5)
norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]
# Step 4-5: Start propagating messages.
return self.propagate(edge_index, x=x, norm=norm)
def message(self, x_j, norm):
# x_j has shape [E, out_channels]
# Step 4: Normalize node features.
return norm.view(-1, 1) * x_j
GCNConv
继承了MessagePassing
并以"求和"作为领域节点信息聚合方式。该层的所有逻辑都发生在其forward()
方法中。在这里,我们首先使用torch_geometric.utils.add_self_loops()
函数向我们的边索引添加自循环边(步骤1),以及通过调用torch.nn.Linear
实例对节点表征进行线性变换(步骤2)。propagate()
方法也在forward
方法中被调用,propagate()
方法被调用后节点间的信息传递开始执行。
归一化系数是由每个节点的节点度得出的,它被转换为每条边的节点度。结果被保存在形状为[num_edges,]
的变量norm
中(步骤3)。
在message()
方法中,我们需要通过norm
对邻接节点表征x_j
进行归一化处理。
通过以上内容的学习,我们便掌握了创建一个仅包含一次“消息传递过程”的图神经网络的方法。如下方代码所示,我们可以很方便地初始化和调用它:
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='dataset/Cora', name='Cora')
data = dataset[0]
net = GCNConv(data.num_features, 64)
h_nodes = net(data.x, data.edge_index)
print(h_nodes.shape)
通过串联多个这样的简单图神经网络,我们就可以构造复杂的图神经网络模型。
四、作业
- 请总结
MessagePassing
基类的运行流程。 - 请复现一个一层的图神经网络的构造,总结通过继承
MessagePassing
基类来构造自己的图神经网络类的规范。
答:
- 通过
MessagePassing()
或者MessagePassing.__init__()
进行初始化:aggr
:定义要使用的聚合方案(“add”、"mean "或 “max”);flow
:定义消息传递的流向("source_to_target "或 “target_to_source”);node_dim
:定义沿着哪个维度传播,默认值为-2
,也就是节点表征张量(Tensor)的哪一个维度是节点维度。- ……
- 实现
MessagePassing.propagate
,该方法的源码如下:
def propagate(self, edge_index: Adj, size: Size = None, **kwargs):
size = self.__check_input__(edge_index, size)
# Run "fused" message and aggregation (if applicable).
if (isinstance(edge_index, SparseTensor) and self.fuse
and not self.__explain__):
coll_dict = self.__collect__(self.__fused_user_args__, edge_index,
size, kwargs)
msg_aggr_kwargs = self.inspector.distribute(
'message_and_aggregate', coll_dict)
out = self.message_and_aggregate(edge_index, **msg_aggr_kwargs)
update_kwargs = self.inspector.distribute('update', coll_dict)
return self.update(out, **update_kwargs)
# Otherwise, run both functions in separation.
elif isinstance(edge_index, Tensor) or not self.fuse:
coll_dict = self.__collect__(self.__user_args__, edge_index, size,
kwargs)
msg_kwargs = self.inspector.distribute('message', coll_dict)
out = self.message(**msg_kwargs)
# For `GNNExplainer`, we require a separate message and aggregate
# procedure since this allows us to inject the `edge_mask` into the
# message passing computation scheme.
if self.__explain__:
edge_mask = self.__edge_mask__.sigmoid()
# Some ops add self-loops to `edge_index`. We need to do the
# same for `edge_mask` (but do not train those).
if out.size(self.node_dim) != edge_mask.size(0):
loop = edge_mask.new_ones(size[0])
edge_mask = torch.cat([edge_mask, loop], dim=0)
assert out.size(self.node_dim) == edge_mask.size(0)
out = out * edge_mask.view([-1] + [1] * (out.dim() - 1))
aggr_kwargs = self.inspector.distribute('aggregate', coll_dict)
out = self.aggregate(out, **aggr_kwargs)
update_kwargs = self.inspector.distribute('update', coll_dict)
return self.update(out, **update_kwargs)
- 即
propagate()
方法首先检查edge_index
是否为SparseTensor
类型以及是否子类实现了message_and_aggregate()
方法,如是就执行子类的message_and_aggregate
方法和update()
方法;否则依次执行子类的message(),aggregate(),update()
三个方法。
- 通过继承
MessagePassing
基类实现GATConv
:
class GATConv(MessagePassing):
def __init__(self, in_channels: Union[int, Tuple[int, int]],
out_channels: int, heads: int = 1, concat: bool = True,
negative_slope: float = 0.2, dropout: float = 0.0,
add_self_loops: bool = True, bias: bool = True, **kwargs):
kwargs.setdefault('aggr', 'add')
super(GATConv, self).__init__(node_dim=0, **kwargs)
self.in_channels = in_channels
self.out_channels = out_channels
self.heads = heads
self.concat = concat
self.negative_slope = negative_slope
self.dropout = dropout
self.add_self_loops = add_self_loops
if isinstance(in_channels, int):
self.lin_l = Linear(in_channels, heads * out_channels, bias=False)
self.lin_r = self.lin_l
else:
self.lin_l = Linear(in_channels[0], heads * out_channels, False)
self.lin_r = Linear(in_channels[1], heads * out_channels, False)
self.att_l = Parameter(torch.Tensor(1, heads, out_channels))
self.att_r = Parameter(torch.Tensor(1, heads, out_channels))
if bias and concat:
self.bias = Parameter(torch.Tensor(heads * out_channels))
elif bias and not concat:
self.bias = Parameter(torch.Tensor(out_channels))
else:
self.register_parameter('bias', None)
self._alpha = None
self.reset_parameters()
def forward(self, x: Union[Tensor, OptPairTensor], edge_index: Adj,
size: Size = None, return_attention_weights=None):
H, C = self.heads, self.out_channels
x_l: OptTensor = None
x_r: OptTensor = None
alpha_l: OptTensor = None
alpha_r: OptTensor = None
if isinstance(x, Tensor):
assert x.dim() == 2, 'Static graphs not supported in `GATConv`.'
x_l = x_r = self.lin_l(x).view(-1, H, C)
alpha_l = (x_l * self.att_l).sum(dim=-1)
alpha_r = (x_r * self.att_r).sum(dim=-1)
else:
x_l, x_r = x[0], x[1]
assert x[0].dim() == 2, 'Static graphs not supported in `GATConv`.'
x_l = self.lin_l(x_l).view(-1, H, C)
alpha_l = (x_l * self.att_l).sum(dim=-1)
if x_r is not None:
x_r = self.lin_r(x_r).view(-1, H, C)
alpha_r = (x_r * self.att_r).sum(dim=-1)
assert x_l is not None
assert alpha_l is not None
if self.add_self_loops:
if isinstance(edge_index, Tensor):
num_nodes = x_l.size(0)
if x_r is not None:
num_nodes = min(num_nodes, x_r.size(0))
if size is not None:
num_nodes = min(size[0], size[1])
edge_index, _ = remove_self_loops(edge_index)
edge_index, _ = add_self_loops(edge_index, num_nodes=num_nodes)
elif isinstance(edge_index, SparseTensor):
edge_index = set_diag(edge_index)
# propagate_type: (x: OptPairTensor, alpha: OptPairTensor)
out = self.propagate(edge_index, x=(x_l, x_r),
alpha=(alpha_l, alpha_r), size=size)
alpha = self._alpha
self._alpha = None
if self.concat:
out = out.view(-1, self.heads * self.out_channels)
else:
out = out.mean(dim=1)
if self.bias is not None:
out += self.bias
if isinstance(return_attention_weights, bool):
assert alpha is not None
if isinstance(edge_index, Tensor):
return out, (edge_index, alpha)
elif isinstance(edge_index, SparseTensor):
return out, edge_index.set_value(alpha, layout='coo')
else:
return out
def message(self, x_j: Tensor, alpha_j: Tensor, alpha_i: OptTensor,
index: Tensor, ptr: OptTensor,
size_i: Optional[int]) -> Tensor:
alpha = alpha_j if alpha_i is None else alpha_j + alpha_i
alpha = F.leaky_relu(alpha, self.negative_slope)
alpha = softmax(alpha, index, ptr, size_i)
self._alpha = alpha
alpha = F.dropout(alpha, p=self.dropout, training=self.training)
return x_j * alpha.unsqueeze(-1)
def __repr__(self):
return '{}({}, {}, heads={})'.format(self.__class__.__name__,
self.in_channels,
self.out_channels, self.heads)
规范总结:
__init__
中定义相关参数;forward
中进行特征预处理以及self.propagate
的调用;- 覆写
message
方法、aggregate
方法(或message_and_aggregate
方法)、update
方法来实现消息传递和更新的逻辑。