Graph Classification with Hierarchical Graph Differentiable Pooling (Part 2)

ZhengXinTang

Last modified 2024-05-31 17:35:46


First published 2024-05-30 11:54:45

Original article: https://blog.csdn.net/chumingqian/article/details/139316535


Author's code:

https://github.com/RexYing/diffpool

1. The graph pooling mechanism described in the paper

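SoftPoolingGcnEncoder is a neural network model for graph-structured data. It extends a conventional graph convolutional network (GCN) with a hierarchical pooling mechanism. The mechanism progressively reduces the number of nodes while preserving the overall structure of the graph, which helps the network handle large, complex graphs.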

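SoftPoolingGcnEncoder adds a soft pooling mechanism to a GCN, which lets it learn hierarchical representations of large, complex graphs. It interleaves graph convolution and pooling layers to build a robust, multi-scale representation of the input graph. This makes it suitable for a range of graph classification and prediction tasks. An optional link-prediction regularizer further helps the pooling preserve the structure of the graph.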

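Graph Convolutional Layers:
- The model starts with standard GCN layers that apply graph convolution to the input node features.
- These layers transform each node's features using its neighbors' features, capturing local structural information.
- The layers are defined with the GraphConv class. It multiplies the adjacency matrix by the node features, then applies a weight transformation and optional normalization.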
Pooling Mechanism:
- The encoder's core innovation is its hierarchical pooling, which is implemented with an assignment matrix.
- The assignment matrix is computed by an additional stack of GCN layers dedicated to this purpose, followed by a linear prediction layer and a row-wise softmax.
- The assignment matrix determines how the nodes at the current level are grouped into the nodes of the next, coarser level of the graph.
- This process is repeated for the number of pooling layers given by num_pooling.

  In the code, pooling is implemented in the `for i in range(self.num_pooling):` loop of forward.
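The sketch below runs one coarsening step on plain PyTorch tensors. The random `s_logits` stands in for the output of the assignment GCN plus `assign_pred_modules[i]`, and the random `z` stands in for `embedding_tensor`. The sizes are arbitrary. The sketch shows the shapes and one useful property: every row of $\textbf{S}$ sums to 1, so the pooled adjacency has the same total edge weight as the input adjacency.

```python
import torch

torch.manual_seed(0)
batch, n, c, d = 2, 6, 3, 4  # graphs in the batch, nodes, next-level clusters, feature dim

z = torch.randn(batch, n, d)                       # stand-in for embedding_tensor
adj = (torch.rand(batch, n, n) > 0.5).float()
adj = ((adj + adj.transpose(1, 2)) > 0).float()    # symmetric 0/1 adjacency
s_logits = torch.randn(batch, n, c)                # stand-in for the assignment GCN output

# Row-wise softmax: each node spreads a total weight of 1 over the c clusters
s = torch.softmax(s_logits, dim=-1)

# Coarsen features and adjacency: X' = S^T Z, A' = S^T A S
x_pooled = s.transpose(1, 2) @ z                   # [batch, c, d]
adj_pooled = s.transpose(1, 2) @ adj @ s           # [batch, c, c]

print(x_pooled.shape, adj_pooled.shape)  # torch.Size([2, 3, 4]) torch.Size([2, 3, 3])
print(adj.sum(dim=(1, 2)), adj_pooled.sum(dim=(1, 2)))  # equal totals, up to rounding
```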
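Node Feature Transformation Post-Pooling:
- After each pooling operation, more GCN layers transform the node features again. This refines the features of the newly formed (coarsened) nodes.
- The conv_first_after_pool, conv_block_after_pool and conv_last_after_pool lists hold the GCN layers applied after each pooling operation.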
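Prediction Layers:
self.pred_model
- At every level, the node embeddings are read out into a graph-level vector (max over nodes, plus sum when num_aggs == 2). When concat is True, the readouts from all levels are concatenated into a single representation; otherwise only the last readout is used.
- This representation is passed through a stack of fully connected layers (pred_model) to produce the prediction.
- The hidden sizes of these prediction layers are set with pred_hidden_dims.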
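The readout described above can be sketched as follows. The sketch assumes three levels (the input graph plus two pooling levels), made-up node counts, and `num_aggs == 2`. Each level contributes a max readout and a sum readout of width `pred_input_dim`. The concatenated vector therefore has `pred_input_dim * (num_pooling + 1) * num_aggs` entries, however many nodes each level has. The `readout` helper is illustrative and is not a function from the repository.

```python
import torch


def readout(embedding_tensor, num_aggs=1):
    """Per-level readout as in forward(): max over nodes, plus sum over nodes if num_aggs == 2."""
    out, _ = torch.max(embedding_tensor, dim=1)
    outs = [out]
    if num_aggs == 2:
        outs.append(torch.sum(embedding_tensor, dim=1))
    return outs


torch.manual_seed(0)
batch, pred_input_dim, num_aggs = 2, 5, 2
# Hypothetical embeddings for three levels: the input graph (10 nodes), then 4 and 2 clusters
levels = [torch.randn(batch, n, pred_input_dim) for n in (10, 4, 2)]

out_all = []
for emb in levels:
    out_all.extend(readout(emb, num_aggs))

graph_repr = torch.cat(out_all, dim=1)   # what pred_model receives when concat is True
print(graph_repr.shape)                  # torch.Size([2, 30]) = 5 * 3 levels * 2 aggregations
```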
Link Prediction Side Objective:
- An optional link-prediction task can be added to regularize training.
- This task reconstructs an adjacency matrix from the assignment matrix $\textbf{S}$ as $\textbf{S}\textbf{S}^\top$ and compares it with the input adjacency matrix. This encourages connected nodes to be assigned to the same clusters, so the learned pooling preserves the graph structure.

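When working with audio, data augmentation often includes a time shift. A shift can change the order of the original nodes relative to one another, so it needs to be considered whether the link-prediction objective stays meaningful under such augmentation.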
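One way to make this concern concrete is a small numerical check. It rests on two stated assumptions. First, the time shift acts only as a reordering (a permutation $P$) of the nodes. Second, the adjacency matrix is rebuilt consistently with the new order ($\textbf{A} \to P\textbf{A}P^\top$). Under those assumptions, a permutation-equivariant assignment GCN would produce $P\textbf{S}$, and the link-prediction loss is unchanged. If the shift instead changes which nodes are connected (for example, because the edges are rebuilt from the shifted signal), $\textbf{A}$ itself changes and this check says nothing about that case. The `link_loss` helper mirrors the loss term with `adj_hop=1` but is a simplified stand-in.

```python
import torch


def link_loss(s, adj, eps=1e-7):
    """Mean BCE between min(S S^T, 1) and adj: a simplified stand-in for loss() with adj_hop=1."""
    pred_adj = torch.clamp(s @ s.transpose(1, 2), max=1.0)
    ll = -adj * torch.log(pred_adj + eps) - (1 - adj) * torch.log(1 - pred_adj + eps)
    return ll.mean()


torch.manual_seed(0)
n, c = 6, 2
adj = torch.zeros(1, n, n)
for i in range(n - 1):                        # frames linked in a chain: i <-> i + 1
    adj[0, i, i + 1] = adj[0, i + 1, i] = 1.0
s = torch.softmax(torch.randn(1, n, c), dim=-1)

perm = torch.roll(torch.arange(n), shifts=2)  # hypothetical shift modelled as a cyclic reordering
P = torch.eye(n)[perm]                        # permutation matrix

adj_perm = P @ adj @ P.T                      # adjacency rebuilt in the new node order
s_perm = P @ s                                # what a permutation-equivariant GCN would output

print(link_loss(s, adj).item(), link_loss(s_perm, adj_perm).item())  # agree up to rounding
```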

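Loss Calculation:
- The main loss is the cross-entropy for the node or graph classification task.
- If link prediction is enabled, an extra term is added. It is the binary cross-entropy between the predicted adjacency and the input adjacency matrix. The predicted adjacency is built from $\textbf{S}\textbf{S}^\top$, optionally summed over several hops and clipped at 1. The term is averaged over the entries that belong to real (non-padded) nodes.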
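Here is a sketch of the combined objective, written as a free function rather than the class's `loss()` method. It makes two assumptions. `num_nodes` is a tensor of per-graph node counts; the repository uses a NumPy array and `construct_mask`, which are replaced here by an inline mask. `s` is the assignment matrix for the input graph. The multi-hop term sums $(\textbf{S}\textbf{S}^\top)^k$ for $k = 1, \dots,$ `adj_hop` and clips the result at 1.

```python
import torch
import torch.nn.functional as F


def total_loss(logits, labels, s, adj, num_nodes, adj_hop=1, eps=1e-7):
    """Classification cross-entropy plus masked link-prediction BCE (illustrative stand-in)."""
    ce = F.cross_entropy(logits, labels)

    # pred_adj = min(sum_{k=1..adj_hop} (S S^T)^k, 1)
    pred_adj0 = s @ s.transpose(1, 2)
    power, pred_adj = pred_adj0, pred_adj0
    for _ in range(adj_hop - 1):
        power = power @ pred_adj0
        pred_adj = pred_adj + power
    pred_adj = torch.clamp(pred_adj, max=1.0)

    # Element-wise binary cross-entropy against the input adjacency
    link = -adj * torch.log(pred_adj + eps) - (1 - adj) * torch.log(1 - pred_adj + eps)

    # Keep only pairs (i, j) where both nodes are real, then average over those pairs
    max_n = adj.size(1)
    node_mask = (torch.arange(max_n)[None, :] < num_nodes[:, None]).float()  # [batch, max_n]
    pair_mask = node_mask[:, :, None] * node_mask[:, None, :]              # [batch, max_n, max_n]
    link = (link * pair_mask).sum() / (num_nodes.float() ** 2).sum()
    return ce + link, ce, link


torch.manual_seed(0)
batch, max_n, c, num_classes = 2, 5, 2, 3
num_nodes = torch.tensor([5, 3])                       # graph 2 has 2 padded nodes
s = torch.softmax(torch.randn(batch, max_n, c), dim=-1)
adj = (torch.rand(batch, max_n, max_n) > 0.5).float()
logits, labels = torch.randn(batch, num_classes), torch.tensor([0, 2])

loss, ce, link = total_loss(logits, labels, s, adj, num_nodes, adj_hop=2)
print(ce.item(), link.item())
```

Because padded pairs are masked, changing the adjacency entries of a padded node leaves the loss unchanged.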

1.2 Code

Below is an outline of the SoftPoolingGcnEncoder implementation:

```python
import numpy as np
import torch
import torch.nn as nn
from torch.nn import init

# GcnEncoderGraph and GraphConv are defined alongside this class in the author's repository.


class SoftPoolingGcnEncoder(GcnEncoderGraph):
    def __init__(self, max_num_nodes, input_dim, hidden_dim, embedding_dim, label_dim, num_layers,
                 assign_hidden_dim, assign_ratio=0.25, assign_num_layers=-1, num_pooling=1,
                 pred_hidden_dims=[50], concat=True, bn=True, dropout=0.0, linkpred=True,
                 assign_input_dim=-1, args=None):
        super(SoftPoolingGcnEncoder, self).__init__(input_dim, hidden_dim, embedding_dim, label_dim,
                                                    num_layers, pred_hidden_dims=pred_hidden_dims,
                                                    concat=concat, args=args)
        add_self = not concat
        self.num_pooling = num_pooling
        self.linkpred = linkpred
        self.assign_ent = True

        # GCN layers applied after each pooling step
        self.conv_first_after_pool = nn.ModuleList()
        self.conv_block_after_pool = nn.ModuleList()
        self.conv_last_after_pool = nn.ModuleList()
        for i in range(num_pooling):
            conv_first2, conv_block2, conv_last2 = self.build_conv_layers(
                self.pred_input_dim, hidden_dim, embedding_dim, num_layers,
                add_self, normalize=True, dropout=dropout)
            self.conv_first_after_pool.append(conv_first2)
            self.conv_block_after_pool.append(conv_block2)
            self.conv_last_after_pool.append(conv_last2)

        # Layers that generate the assignment matrix at each level
        assign_dims = [int(max_num_nodes * assign_ratio)]  # number of clusters per level
        if assign_num_layers == -1:
            assign_num_layers = num_layers
        if assign_input_dim == -1:
            assign_input_dim = input_dim

        self.assign_conv_first_modules = nn.ModuleList()
        self.assign_conv_block_modules = nn.ModuleList()
        self.assign_conv_last_modules = nn.ModuleList()
        self.assign_pred_modules = nn.ModuleList()

        for i in range(num_pooling):
            assign_conv_first, assign_conv_block, assign_conv_last = self.build_conv_layers(
                assign_input_dim, assign_hidden_dim, assign_dims[i], assign_num_layers,
                add_self, normalize=True)
            # With concat=True, gcn_forward concatenates the outputs of all assignment layers
            assign_pred_input_dim = (assign_hidden_dim * (assign_num_layers - 1) + assign_dims[i]
                                     if concat else assign_dims[i])
            assign_pred = self.build_pred_layers(assign_pred_input_dim, [], assign_dims[i])
            self.assign_conv_first_modules.append(assign_conv_first)
            self.assign_conv_block_modules.append(assign_conv_block)
            self.assign_conv_last_modules.append(assign_conv_last)
            self.assign_pred_modules.append(assign_pred)
            # Later levels take the pooled embeddings as input
            assign_input_dim = self.pred_input_dim
            assign_dims.append(int(assign_dims[-1] * assign_ratio))

        # Final prediction model over the readouts of all levels
        self.pred_model = self.build_pred_layers(self.pred_input_dim * (num_pooling + 1), pred_hidden_dims,
                                                 label_dim, num_aggs=self.num_aggs)

        # Initialize weights
        for m in self.modules():
            if isinstance(m, GraphConv):
                m.weight.data = init.xavier_uniform_(m.weight.data, gain=nn.init.calculate_gain('relu'))
                if m.bias is not None:
                    m.bias.data = init.constant_(m.bias.data, 0.0)

    def loss(self, pred, label, adj=None, batch_num_nodes=None, adj_hop=1):
        eps = 1e-7
        loss = super(SoftPoolingGcnEncoder, self).loss(pred, label)
        if self.linkpred:
            # self.assign_tensor holds the LAST level's assignment, so S S^T only matches
            # the shape of the input adj when num_pooling == 1
            max_num_nodes = adj.size()[1]
            pred_adj0 = self.assign_tensor @ torch.transpose(self.assign_tensor, 1, 2)
            # pred_adj = min(sum_{k=1..adj_hop} (S S^T)^k, 1)
            tmp = pred_adj0
            pred_adj = pred_adj0
            for _ in range(adj_hop - 1):
                tmp = tmp @ pred_adj0
                pred_adj = pred_adj + tmp
            pred_adj = torch.min(pred_adj, torch.ones_like(pred_adj))
            # Element-wise binary cross-entropy against the input adjacency
            self.link_loss = -adj * torch.log(pred_adj + eps) - (1 - adj) * torch.log(1 - pred_adj + eps)
            if batch_num_nodes is not None:
                # Only count pairs of real (non-padded) nodes
                num_entries = np.sum(batch_num_nodes ** 2)
                embedding_mask = self.construct_mask(max_num_nodes, batch_num_nodes)
                adj_mask = embedding_mask @ torch.transpose(embedding_mask, 1, 2)
                self.link_loss[~adj_mask.bool()] = 0.0
            else:
                num_entries = max_num_nodes ** 2 * adj.size()[0]
            self.link_loss = torch.sum(self.link_loss) / float(num_entries)
            return loss + self.link_loss
        return loss
```

1.3 Class attributes and the forward pass

This section introduces the main attributes of the class and walks through the forward function:

1.3.1 Key Attributes

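num_pooling: the number of pooling layers to apply, which sets the depth of the hierarchy.
linkpred: a boolean flag indicating whether the link-prediction side objective is used.
conv_first_after_pool, conv_block_after_pool, conv_last_after_pool: module lists of the convolutional layers applied after each pooling operation.
assign_conv_first_modules, assign_conv_block_modules, assign_conv_last_modules: module lists of the convolutional layers used to generate the pooling assignment matrices.
assign_pred_modules: module list of the prediction layers that produce the assignment matrices (one per level).
pred_model: the final prediction model, which combines the pooled features to produce the output.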
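To see how many clusters each level gets, the `assign_dims` logic from `__init__` can be reproduced on its own. The helper name `assign_dims_for` is illustrative. Each size is truncated with `int()`, so a small `assign_ratio` combined with many pooling levels can drive a level down to 0 clusters. This is worth checking before choosing `num_pooling`.

```python
def assign_dims_for(max_num_nodes, assign_ratio, num_pooling):
    """Number of clusters produced by each pooling level, mirroring assign_dims in __init__."""
    dims = [int(max_num_nodes * assign_ratio)]
    for _ in range(num_pooling - 1):
        dims.append(int(dims[-1] * assign_ratio))
    return dims


print(assign_dims_for(100, 0.25, 2))   # [25, 6]
print(assign_dims_for(1000, 0.25, 3))  # [250, 62, 15]
print(assign_dims_for(100, 0.25, 4))   # [25, 6, 1, 0]  -> the last level would be empty
```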

1.3.2 How the `forward` function works

```python
    def forward(self, x, adj, batch_num_nodes, **kwargs):
        # Input for the assignment GCN: use the assignment features assign_x from kwargs
        # if they are provided; otherwise use the input features x.
        if 'assign_x' in kwargs:   # (bt, max_nodes, fea_dim)
            x_a = kwargs['assign_x']
        else:
            x_a = x

        # Mask construction: if batch_num_nodes is provided, build a mask so that
        # graphs of different sizes can share one padded batch.
        max_num_nodes = adj.size()[1]
        if batch_num_nodes is not None:
            embedding_mask = self.construct_mask(max_num_nodes, batch_num_nodes)
        else:
            embedding_mask = None

        out_all = []

        # The input features x go through the initial GCN layers to produce the initial
        # node embeddings (embedding_tensor): [batch_size x num_nodes x embedding_dim].
        # They are aggregated (max pooling over nodes) and stored in out_all.
        embedding_tensor = self.gcn_forward(x, adj,
                self.conv_first, self.conv_block, self.conv_last, embedding_mask)

        out, _ = torch.max(embedding_tensor, dim=1)
        out_all.append(out)
        if self.num_aggs == 2:
            out = torch.sum(embedding_tensor, dim=1)
            out_all.append(out)

        for i in range(self.num_pooling):
            # Only the input graph (i == 0) is padded; after pooling, every graph in the
            # batch has the same number of clusters
            if batch_num_nodes is not None and i == 0:
                embedding_mask = self.construct_mask(max_num_nodes, batch_num_nodes)
            else:
                embedding_mask = None

            self.assign_tensor = self.gcn_forward(x_a, adj,
                    self.assign_conv_first_modules[i], self.assign_conv_block_modules[i],
                    self.assign_conv_last_modules[i], embedding_mask)
            # [batch_size x num_nodes x next_lvl_num_nodes]
            self.assign_tensor = nn.Softmax(dim=-1)(self.assign_pred_modules[i](self.assign_tensor))
            if embedding_mask is not None:
                self.assign_tensor = self.assign_tensor * embedding_mask

            # Update pooled features and adjacency: X' = S^T Z, A' = S^T A S
            x = torch.matmul(torch.transpose(self.assign_tensor, 1, 2), embedding_tensor)
            adj = torch.transpose(self.assign_tensor, 1, 2) @ adj @ self.assign_tensor
            x_a = x

            embedding_tensor = self.gcn_forward(x, adj,
                    self.conv_first_after_pool[i], self.conv_block_after_pool[i],
                    self.conv_last_after_pool[i])

            out, _ = torch.max(embedding_tensor, dim=1)
            out_all.append(out)
            if self.num_aggs == 2:
                out = torch.sum(embedding_tensor, dim=1)
                out_all.append(out)

        if self.concat:
            output = torch.cat(out_all, dim=1)
        else:
            output = out
        ypred = self.pred_model(output)
        return ypred
```

The forward function defines the forward pass of SoftPoolingGcnEncoder, that is, how the input data flows through the model.

Step by step:

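Initial Setup:
- Input Assignment: if an assignment feature matrix (assign_x) is passed in kwargs, it is used as the input to the assignment GCN. Otherwise, the input features x are used.
- Mask Construction: if batch_num_nodes is provided, a mask is built to handle graphs of different sizes within a batch.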

Initial Embedding:

Note that this stage only encodes the input node features X. No assignment matrix is computed yet; it is computed, and recomputed at each level, inside the hierarchical pooling loop.

- The input features x are passed through the initial GCN layers to generate the initial node embeddings (embedding_tensor).
- The initial embeddings are aggregated (e.g., with max pooling over nodes) and stored in out_all.

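Pooling Iterations:
- For each pooling layer:
  - Assignment Tensor Calculation: self.assign_tensor is computed by the GCN layers dedicated to generating the assignment matrix. They are applied to x_a (the assignment input features) and the current adjacency matrix. At the first level, x_a is the input features (or assign_x); at later levels, it is the pooled node features from the previous level.
  - Softmax Normalization: a softmax over the last dimension normalizes the assignment tensor, so each node's weights over the next-level clusters sum to 1.
  - Masked Assignment: if a mask is available, it is applied to the assignment tensor. Note that the mask is applied only at the first pooling level. After pooling, every graph in the batch has the same number of clusters, so there is no padding left to mask.
  - Feature and Adjacency Update: the node features and adjacency matrix are updated with the assignment tensor. The features become $\textbf{S}^\top \textbf{Z}$ (with $\textbf{Z}$ the current embedding_tensor) and the adjacency becomes $\textbf{S}^\top \textbf{A} \textbf{S}$. The results are the coarsened (pooled) node features and adjacency matrix.
  - Post-Pooling Embedding: the updated features and adjacency matrix are passed through the post-pooling GCN layers to produce new embeddings.
  - Aggregation and Storage: the new embeddings are aggregated and appended to out_all.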
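The following is a small check of the masked assignment step. It uses a hypothetical re-implementation of `construct_mask`, assumed to return a `[batch, max_nodes, 1]` mask so that it broadcasts against the assignment tensor as in `forward`. The padded rows of graph 2 are overwritten with large junk values. Because the padded rows of the assignment matrix are zeroed, the pooled features and adjacency do not change. Without the mask, the softmax gives padded nodes non-zero weights and the junk leaks in.

```python
import torch


def construct_mask(max_nodes, batch_num_nodes):
    """Hypothetical stand-in for construct_mask: [batch, max_nodes, 1], 1 = real node, 0 = padding."""
    mask = torch.zeros(len(batch_num_nodes), max_nodes, 1)
    for b, num in enumerate(batch_num_nodes):
        mask[b, :num] = 1.0
    return mask


def pool_step(z, adj, s_logits, mask):
    """One masked pooling step: softmax over clusters, zero the padded rows, then coarsen."""
    s = torch.softmax(s_logits, dim=-1) * mask
    return s.transpose(1, 2) @ z, s.transpose(1, 2) @ adj @ s


torch.manual_seed(0)
batch_num_nodes = [4, 2]                      # graph 2 has 2 real nodes and 2 padded ones
max_nodes, c, d = 4, 2, 3
mask = construct_mask(max_nodes, batch_num_nodes)
adj = mask @ mask.transpose(1, 2)             # 1 between every pair of real nodes, 0 for padding
s_logits = torch.randn(2, max_nodes, c)

z = torch.randn(2, max_nodes, d)
z_junk = z.clone()
z_junk[1, 2:] = 100.0                         # overwrite graph 2's padded rows with junk

x1, a1 = pool_step(z, adj, s_logits, mask)
x2, a2 = pool_step(z_junk, adj, s_logits, mask)
print(torch.allclose(x1, x2), torch.allclose(a1, a2))  # True True

# Without the mask, padded nodes get non-zero assignment weights and the junk leaks in
x3, _ = pool_step(z_junk, adj, s_logits, torch.ones_like(mask))
print(torch.allclose(x1, x3))  # False
```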

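Final Prediction:
- If concat is True, the readouts from all levels are concatenated. Otherwise, only the last readout is used as the final representation.
- This representation is passed through the prediction model (pred_model) to produce the final prediction (ypred).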

1.4 Using self.gcn_forward()

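Note that forward calls self.gcn_forward() from three places. The second and third calls are inside the pooling loop, so they run once per level: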

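The first call encodes the node features, i.e., it updates each node's features from its neighbors.
It operates on the original node feature matrix and adjacency matrix. It uses embedding_mask so that only the real (non-padded) nodes are kept.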

```python
embedding_tensor = self.gcn_forward(x, adj, self.conv_first, self.conv_block, self.conv_last, embedding_mask)

out, _ = torch.max(embedding_tensor, dim=1)
out_all.append(out)
```

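Note that the output of every level is appended to out_all. At the end, these outputs are concatenated and fed to the prediction layers (pred_model).
As a result, the final prediction takes the output of every level into account.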

The second call computes the assignment tensor from x_a (the assignment input features) and the adjacency matrix:

```python
self.assign_tensor = self.gcn_forward(x_a, adj,
        self.assign_conv_first_modules[i], self.assign_conv_block_modules[i], self.assign_conv_last_modules[i],
        embedding_mask)
```

The third call computes the new embeddings from the updated (pooled) node features and adjacency matrix:

```python
embedding_tensor = self.gcn_forward(x, adj,
        self.conv_first_after_pool[i], self.conv_block_after_pool[i],
        self.conv_last_after_pool[i])
```
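The body of `gcn_forward` is not shown in this post. The sketch below assumes it works as follows: it runs the first, middle and last graph convolutions with ReLU between them. It then concatenates every layer's output along the feature dimension and applies the mask; batch normalization is omitted. `TinyGraphConv` is a hypothetical minimal layer used only to keep the example self-contained. Under that assumption, the output width is `hidden * (num_layers - 1) + out_dim`. This matches the input width of the assignment prediction layer in `__init__`: the hidden size times the number of layers minus one, plus the number of clusters.

```python
import torch
import torch.nn as nn


class TinyGraphConv(nn.Module):
    """Hypothetical minimal graph convolution: (A X) W, no bias or normalization."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        return self.lin(adj @ x)


def gcn_forward(x, adj, conv_first, conv_block, conv_last, embedding_mask=None):
    """Assumed behaviour of gcn_forward (batch norm omitted): concatenate every layer's output."""
    x = torch.relu(conv_first(x, adj))
    x_all = [x]
    for conv in conv_block:
        x = torch.relu(conv(x, adj))
        x_all.append(x)
    x = conv_last(x, adj)
    x_all.append(x)
    out = torch.cat(x_all, dim=2)      # [batch, nodes, hidden * (num_layers - 1) + out_dim]
    if embedding_mask is not None:
        out = out * embedding_mask     # zero the rows of padded nodes
    return out


num_layers, in_dim, hidden, out_dim = 3, 4, 8, 5
conv_first = TinyGraphConv(in_dim, hidden)
conv_block = nn.ModuleList([TinyGraphConv(hidden, hidden) for _ in range(num_layers - 2)])
conv_last = TinyGraphConv(hidden, out_dim)

x, adj = torch.randn(2, 6, in_dim), torch.ones(2, 6, 6)
emb = gcn_forward(x, adj, conv_first, conv_block, conv_last)
print(emb.shape)  # torch.Size([2, 6, 21]), since 8 * (3 - 1) + 5 = 21
```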

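SoftPoolingGcnEncoder is a hierarchical GCN model that uses soft pooling to handle large, complex graph structures. Its forward function coordinates the data flow through initial embedding, iterative pooling and final prediction, enabling effective learning and representation of graph data. The design is flexible and modular, allowing a range of configurations and extensions.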

2. The role of GraphConv

The multiplication of the adjacency matrix $\textbf{A}$ with the feature matrix $\textbf{X}$ in the GraphConv layer is a crucial operation in Graph Convolutional Networks (GCNs). This operation performs a localized, weighted aggregation of node features from each node’s neighbors. Here’s a detailed explanation of why this is done and what it accomplishes:


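In short, multiplying the node feature matrix by the adjacency matrix in GraphConv performs the neighbor aggregation at the heart of a GCN.

It lets each node update its features from its neighbors' features. This propagates information through the graph and captures the graph's local structure.
Combined with the weight transformation and optional normalization, this lets the network learn meaningful representations of nodes and their relationships.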

Purpose of Adjacency Matrix Multiplication

Neighbor Aggregation:
- In a graph, a node's features should be influenced by the features of its neighbors. The adjacency matrix $\textbf{A}$ encodes the connections between nodes: $\textbf{A}_{ij}$ is non-zero if there is an edge between node $i$ and node $j$.
- When we multiply $\textbf{A}$ with $\textbf{X}$, each node's feature vector becomes a weighted sum of the feature vectors of its neighbors.
Information Propagation:
- This operation lets information propagate through the graph, so that each node gathers information from its local neighborhood.
- This is essential for capturing the local structure and feature distribution within the graph.
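Here is a tiny worked example of neighbor aggregation on the 3-node path graph 0 - 1 - 2. Each node has one scalar feature, chosen so the sums are easy to read. With `add_self`, each node's own feature is added as well.

```python
import torch

# Path graph 0 - 1 - 2, no self-loops
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
# One scalar feature per node
X = torch.tensor([[1.], [10.], [100.]])

Y = A @ X            # each row = sum of the neighbours' features
print(Y.squeeze(1).tolist())       # [10.0, 101.0, 10.0]

Y_self = A @ X + X   # add_self=True: the node's own feature is included too
print(Y_self.squeeze(1).tolist())  # [11.0, 111.0, 110.0]
```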

Mathematical Interpretation

Let's break down the operations of the GraphConv layer:

Matrix Multiplication:
- The first operation is $\textbf{Y} = \textbf{A} \cdot \textbf{X}$, where $\textbf{Y}$ is the intermediate result, $\textbf{A}$ is the adjacency matrix, and $\textbf{X}$ is the input feature matrix.
- For node $i$, the feature vector $\textbf{Y}_i$ is computed as $\textbf{Y}_i = \sum_{j \in \mathcal{N}(i)} \textbf{A}_{ij} \textbf{X}_j$, where $\mathcal{N}(i)$ denotes the neighbors of node $i$, including node $i$ itself if self-loops are added.
Self-Loop Addition:
- If add_self is True, $\textbf{X}$ is added to $\textbf{Y}$, so the node's own features are also included in the aggregation: $\textbf{Y} = \textbf{A} \cdot \textbf{X} + \textbf{X}$
Weight Transformation:
- The intermediate result $\textbf{Y}$ is then transformed by a weight matrix $\textbf{W}$: $\textbf{Z} = \textbf{Y} \cdot \textbf{W}$
- This learned linear transformation of the aggregated features maps them to the layer's output dimension. It is essential for learning an appropriate feature representation.
Bias Addition:
- If a bias term is used, it is added to $\textbf{Z}$: $\textbf{Z} = \textbf{Z} + \textbf{b}$
Normalization:
- If normalize_embedding is True, each node's feature vector (each row of $\textbf{Z}$) is L2-normalized: $\textbf{Z}_i = \frac{\textbf{Z}_i}{\|\textbf{Z}_i\|_2}$
- This gives every node embedding unit length, which can be useful in some applications.
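Putting the steps together for a single graph (batch dimension dropped, dropout ignored) with add_self set to True, the layer computes the expressions below. The first identity shows that add_self amounts to aggregating with $\textbf{A} + \textbf{I}$, i.e., adding a self-loop to every node. The layer itself does not rescale $\textbf{A}$ by node degree.

$$
\textbf{Y} = \textbf{A} \cdot \textbf{X} + \textbf{X} = (\textbf{A} + \textbf{I}) \cdot \textbf{X}, \qquad
\textbf{Z} = \textbf{Y} \cdot \textbf{W} + \textbf{b}, \qquad
\textbf{Z}_i \leftarrow \frac{\textbf{Z}_i}{\|\textbf{Z}_i\|_2}
$$

The last step applies only when normalize_embedding is True.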

Example Code Walkthrough

Below is a simplified walkthrough of the GraphConv class:

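```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init


class GraphConv(nn.Module):
    def __init__(self, input_dim, output_dim, add_self=False, normalize_embedding=False,
                 dropout=0.0, bias=True):
        super(GraphConv, self).__init__()
        self.add_self = add_self
        self.dropout = dropout
        if dropout > 0.001:
            self.dropout_layer = nn.Dropout(p=dropout)
        self.normalize_embedding = normalize_embedding
        self.input_dim = input_dim
        self.output_dim = output_dim

        # Create the parameters directly and move the whole model later with model.to(device).
        # Calling .to(device) on an nn.Parameter returns a plain tensor when the device
        # changes, which would no longer be registered (or trained) as a parameter.
        # Initialization matches the Xavier/zero init the encoder applies to every GraphConv.
        self.weight = nn.Parameter(torch.empty(input_dim, output_dim))
        init.xavier_uniform_(self.weight, gain=init.calculate_gain('relu'))

        if bias:
            self.bias = nn.Parameter(torch.zeros(output_dim))
        else:
            self.bias = None

    def forward(self, x, adj):
        # x: [batch, num_nodes, input_dim], adj: [batch, num_nodes, num_nodes]
        if self.dropout > 0.001:
            x = self.dropout_layer(x)

        # Neighbor aggregation: matrix multiplication with the adjacency matrix
        y = torch.matmul(adj, x)

        # Optionally add the node's own features (self-loop)
        if self.add_self:
            y = y + x

        # Linear transformation
        y = torch.matmul(y, self.weight)

        # Add bias if present
        if self.bias is not None:
            y = y + self.bias

        # Optionally L2-normalize each node's embedding
        if self.normalize_embedding:
            y = F.normalize(y, p=2, dim=2)

        return y
```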
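The following CPU-only example illustrates a common pitfall when creating a layer's weights. Calling `.to(...)` on an `nn.Parameter` returns an ordinary tensor whenever a conversion actually happens. An ordinary tensor assigned to a module attribute is not registered as a parameter, so an optimizer built from `model.parameters()` never updates it. Here a dtype change stands in for the device move, so the example runs without a GPU; `.to('cuda')` on a GPU machine behaves the same way. `.to('cpu')` on a CPU tensor is a no-op that returns the Parameter itself, which is why the pattern can appear to work on CPU-only machines. The class names are illustrative.

```python
import torch
import torch.nn as nn


class ParamMovedInInit(nn.Module):
    def __init__(self):
        super().__init__()
        # .to(...) returns a plain tensor when it converts the Parameter,
        # so this attribute is NOT registered as a parameter of the module
        self.weight = nn.Parameter(torch.zeros(3, 2)).to(torch.float64)


class ParamRegistered(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(3, 2))  # move it later with model.to(...)


print(len(list(ParamMovedInInit().parameters())))        # 0
print(len(list(ParamRegistered().parameters())))         # 1
print(ParamRegistered().to(torch.float64).weight.dtype)  # torch.float64
```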