Paper Reading and Analysis: Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification

KPer_Yang

Last modified: 2023-04-05 12:24:30


First published: 2023-02-04 15:33:04

Article link: https://blog.csdn.net/KPer_Yang/article/details/128882363



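All of the posts listed below are my personal exploration of EEG. The project code is an early, incomplete version; please message me privately if you need the complete project code and materials.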

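Datasets:
1. EEG project exploration and implementation (Part 1): selecting and introducing the SEED research dataset
Related paper readings and analyses:
1. Reading and analysis of the baseline paper by the authors of the EEG-SEED dataset
2. GNN EEG paper reading and analysis: "EEG-Based Emotion Recognition Using Regularized Graph Neural Networks"
3. EEG-GNN paper reading and analysis: "EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks"
4. Paper reading and analysis: Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification
5. Paper reading and analysis: "DeepGCNs: Can GCNs Go as Deep as CNNs?"
6. Paper reading and analysis: "How Attentive are Graph Attention Networks?"
7. Paper reading and analysis: Simplifying Graph Convolutional Networks
8. Paper reading and analysis: LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
9. Overview and summary of graph neural networks
Related experiments and code implementations:
1. EEG data processing for graph neural networks
2. Training and testing a GCN on the public EEG SEED dataset
3. Training and testing a GAT on the public EEG SEED dataset
4. Training and testing SGC on the SEED dataset
5. Training and testing a Transformer on the public EEG SEED dataset
6. Training and testing RGNN on the public EEG SEED dataset
Supplementary learning materials:
1. Three simple Graph examples from the official site, illustrating three levels of application
2. Learning graph neural networks through the PPI dataset example project
3. Data handling in the geometric library, explained in detail
4. NetworkX dicts of dicts and solving the Seven Bridges of Königsberg problem
5. geometric source code reading and analysis: the MessagePassing class explained, with usage
6. Learning graph neural networks through the cora dataset example project
7. Graph aggregation
8. Learning graph neural networks through the QM9 dataset example project
9. Open-source libraries for processing graphs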

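The paper proposes a Unified Message Passing model (UniMP).

It rests on two simple but effective ideas:

(a) Combining node feature propagation with label propagation.

UniMP uses both node features and labels in the training and inference stages. An embedding step converts the partial node labels from one-hot vectors into dense vectors that can be treated like node features. A multi-layer Graph Transformer network takes the node features and labels as input and propagates information between nodes, so each node can aggregate both feature and label information from its neighbors.

(b) Masked label prediction.

Because node labels are taken as input, using the same labels as supervised training targets causes label leakage, and the model then performs poorly at inference. To address this, the paper proposes a masked label prediction strategy that randomly masks the labels of some training instances and then predicts them. This simple yet effective training method borrows from masked word prediction in BERT [Devlin et al., 2018], and simulates the process of transferring label information from labeled to unlabeled examples in the graph.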
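The two ideas above can be sketched together in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code: it assumes the embedded label is added element-wise to the node features, and the names `build_masked_inputs`, `label_emb`, and `mask_rate` are hypothetical. In each training step, a random subset of training labels is hidden from the input and kept aside as prediction targets, while the remaining labels are injected into the input.

```python
import random


def build_masked_inputs(features, labels, train_idx, label_emb, mask_rate, rng):
    """Return (inputs, masked_idx) for one training step.

    features  : per-node feature vectors (lists of floats)
    labels    : integer class ids, only trusted for nodes in train_idx
    label_emb : one embedding vector per class, same width as the features
    mask_rate : fraction of training labels to hide in this step
    """
    shuffled = list(train_idx)
    rng.shuffle(shuffled)
    n_masked = int(round(mask_rate * len(shuffled)))
    masked_idx = sorted(shuffled[:n_masked])   # hidden labels, used as targets
    visible_idx = set(shuffled[n_masked:])     # labels fed to the model as input

    inputs = []
    for node, x in enumerate(features):
        if node in visible_idx:
            # Visible label: add its class embedding to the node features
            # (assumption: element-wise addition).
            emb = label_emb[labels[node]]
            inputs.append([a + b for a, b in zip(x, emb)])
        else:
            # Masked or unlabeled node: features only, no label information.
            inputs.append(list(x))
    return inputs, masked_idx


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
labels = [0, 1, 0, 1]                    # node 3 is not in train_idx (unlabeled)
train_idx = [0, 1, 2]
label_emb = [[10.0, 0.0], [0.0, 10.0]]   # hypothetical embedding table

inputs, masked_idx = build_masked_inputs(
    features, labels, train_idx, label_emb, mask_rate=0.34, rng=random.Random(0))
```

The loss would then be computed only on the nodes in `masked_idx`, so the model never sees the label it is asked to predict.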

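Experimental results:

UniMP is evaluated on three semi-supervised classification datasets from the Open Graph Benchmark (OGB), where it achieves state-of-the-art results on all tasks: 82.56% ACC on ogbn-products, 86.42% ROC-AUC on ogbn-proteins, and 73.11% ACC on ogbn-arxiv. The paper also runs ablation studies on UniMP to evaluate the effectiveness of the unified method, and gives a thorough analysis of how label propagation improves model performance.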


Graph Neural Networks：

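Feature propagation at layer $l$:

$H^{(l+1)} = \sigma\left(D^{-1} A H^{(l)} W^{(l)}\right)$

where $A$ is the adjacency matrix, $D$ is the diagonal degree matrix (so that $D^{-1}A$ is the normalized adjacency matrix), $H^{(l)}$ is the feature representation at layer $l$, $\sigma$ is the activation function, and $W^{(l)}$ is the learnable weight of layer $l$.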
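To make the layer-wise propagation concrete, here is a minimal pure-Python sketch of one step. It assumes row normalization $D^{-1}A$ of the adjacency matrix and ReLU as the activation; the helper names `matmul` and `gcn_layer` and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, h, w):
    """One propagation step: ReLU(D^{-1} A H W)."""
    norm = []
    for row in adj:
        deg = sum(row)                       # node degree (row sum of A)
        norm.append([v / deg if deg else 0.0 for v in row])
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Path graph 0-1-2 with self-loops.
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # input features, one row per node
w = [[1.0, -1.0], [0.0, 1.0]]                # hypothetical layer weights

h_next = gcn_layer(adj, h, w)
```

Each output row is a degree-weighted average of the neighbors' features, passed through the linear map and the activation.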

Label propagation algorithms

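Traditional algorithms such as the label propagation algorithm (LPA) use only the labels and the relations between nodes to make predictions. LPA assumes that connected nodes have similar labels and propagates labels iteratively over the graph. It starts from an initial label matrix $\hat{Y}^{(0)}$, whose rows are one-hot label indicator vectors $\hat{y}_i^{(0)}$ for labeled nodes and zero vectors for unlabeled nodes. A simple LPA iteration is:

$\hat{Y}^{(l+1)} = D^{-1} A \hat{Y}^{(l)}$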
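The iteration can be tried on a toy graph. The sketch below assumes the row-normalized update $\hat{Y}^{(l+1)} = D^{-1} A \hat{Y}^{(l)}$ and, as an extra assumption that is a common LPA variant rather than something stated in this post, resets the labeled nodes to their known labels after every step. The function name `label_propagation` and the example graph are hypothetical.

```python
def label_propagation(adj, y0, labeled, num_iters):
    """Iterate Y <- D^{-1} A Y, clamping labeled nodes to their known labels."""
    n = len(adj)
    num_classes = len(y0[0])
    y = [row[:] for row in y0]
    for _ in range(num_iters):
        new_y = []
        for i in range(n):
            deg = sum(adj[i])
            new_y.append([
                sum(adj[i][j] * y[j][c] for j in range(n)) / deg if deg else 0.0
                for c in range(num_classes)
            ])
        for i in labeled:
            new_y[i] = y0[i][:]              # clamp: keep the known labels fixed
        y = new_y
    return y


# Path graph 0-1-2-3: node 0 has class 0, node 3 has class 1, the rest are unlabeled.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
y0 = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

y = label_propagation(adj, y0, labeled=[0, 3], num_iters=20)
pred = [row.index(max(row)) for row in y]
```

Node 1 ends up closer to class 0 and node 2 closer to class 1, matching the assumption that connected nodes share labels.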

Combining GNN and LPA

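Combining GNN and LPA for semi-supervised classification has already been explored in the community. APPNP [Klicpera et al., 2018] and TPN [Liu et al., 2019] propose using a GCN to predict soft labels and then propagating them with personalized PageRank. However, these works still treat the partial node labels only as a supervised training signal. GCN-LPA is the work most closely related to UniMP, because it also takes partial node labels as input. It combines GNN and LPA in a more indirect way, though, using LPA only during training to regularize the edge weights of a GAT model. UniMP, by contrast, combines GNN and LPA directly inside the network and propagates both node features and labels in training and prediction. Moreover, the regularization strategy of GCN-LPA applies only to GNNs with trainable edge weights, such as GAT [Veličković et al., 2017] and GAAN [Zhang et al., 2018], whereas UniMP's training strategy can easily be extended to various GNNs, such as GCN and GAT, to further improve their performance.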

Algorithm: open-source implementation in torch_geometric.nn (`TransformerConv`)

torch_geometric.nn — pytorch_geometric documentation (pytorch-geometric.readthedocs.io)
$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \mathbf{W}_2 \mathbf{x}_{j},$
where the attention coefficients $\alpha_{i,j}$ are computed via multi-head dot product attention:
$\alpha_{i,j} = \textrm{softmax} \left( \frac{(\mathbf{W}_3\mathbf{x}_i)^{\top} (\mathbf{W}_4\mathbf{x}_j)} {\sqrt{d}} \right)$
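The update rule and attention coefficients above can be traced by hand for a single node and a single head. The sketch below is not the library's implementation: it skips the learned projections and works directly on hypothetical, already-projected vectors (`root` for $\mathbf{W}_1\mathbf{x}_i$, `query` for $\mathbf{W}_3\mathbf{x}_i$, `keys` for $\mathbf{W}_4\mathbf{x}_j$, `values` for $\mathbf{W}_2\mathbf{x}_j$).

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and its neighbors' keys."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(root, query, keys, values):
    """x'_i = root + sum_j alpha_ij * value_j, for a single head."""
    alpha = attention_weights(query, keys)
    message = [sum(a * v[c] for a, v in zip(alpha, values))
               for c in range(len(values[0]))]
    return [r + m for r, m in zip(root, message)], alpha


# Hypothetical already-projected vectors for node i and its two neighbors.
root = [0.0, 0.0]                    # stands in for W1 x_i
query = [1.0, 0.0]                   # stands in for W3 x_i
keys = [[1.0, 0.0], [0.0, 1.0]]      # stand in for W4 x_j
values = [[2.0, 0.0], [0.0, 2.0]]    # stand in for W2 x_j

out, alpha = aggregate(root, query, keys, values)
```

The neighbor whose key aligns with the query receives the larger weight, so its value dominates the aggregated message.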

in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.
out_channels (int) – Size of each output sample.
heads (int, optional) – Number of multi-head-attentions. (default: 1)
concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
beta (bool, optional) – If set, will combine aggregation and skip information via
$\mathbf{x}^{\prime}_i = \beta_i \mathbf{W}_1 \mathbf{x}_i + (1 - \beta_i) \underbrace{\left(\sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \mathbf{W}_2 \mathbf{x}_j \right)}_{=\mathbf{m}_i}$
with
$\beta_i = \textrm{sigmoid}(\mathbf{w}_5^{\top} [ \mathbf{W}_1 \mathbf{x}_i, \mathbf{m}_i, \mathbf{W}_1 \mathbf{x}_i - \mathbf{m}_i ])$
(default: False)
dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
edge_dim (int, optional) – Edge feature dimensionality (in case there are any). Edge features are added to the keys after linear transformation, that is, prior to computing the attention dot product. They are also added to final values after the same linear transformation. The model is:
$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \left( \mathbf{W}_2 \mathbf{x}_{j} + \mathbf{W}_6 \mathbf{e}_{ij} \right),$
where
$\alpha_{i,j} = \textrm{softmax} \left( \frac{(\mathbf{W}_3\mathbf{x}_i)^{\top} (\mathbf{W}_4\mathbf{x}_j + \mathbf{W}_6 \mathbf{e}_{ij})} {\sqrt{d}} \right)$
(default: None)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
root_weight (bool, optional) – If set to False, the layer will not add the transformed root node features to the output and the option beta is set to False. (default: True)
**kwargs (optional) – Additional arguments of conv.MessagePassing.
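The gated skip connection enabled by the `beta` option is easy to check numerically. The sketch below only follows the formula given for `beta` and is not the library code: `root` stands in for $\mathbf{W}_1\mathbf{x}_i$, `message` for $\mathbf{m}_i$, and the gate vectors passed as `w5` are hypothetical values chosen to show the two extremes.

```python
import math


def gated_skip(root, message, w5):
    """Blend root and message with beta = sigmoid(w5 . [root, message, root - message])."""
    diff = [r - m for r, m in zip(root, message)]
    gate_input = root + message + diff       # list concatenation, length 3d
    logit = sum(w * g for w, g in zip(w5, gate_input))
    beta = 1.0 / (1.0 + math.exp(-logit))
    out = [beta * r + (1.0 - beta) * m for r, m in zip(root, message)]
    return out, beta


root = [1.0, 3.0]       # stands in for W1 x_i
message = [3.0, 1.0]    # stands in for the aggregated message m_i

# With an all-zero gate vector the sigmoid gives 0.5, an even blend of both inputs.
even_out, even_beta = gated_skip(root, message, [0.0] * 6)

# A hypothetical gate vector that responds strongly to the root features.
root_out, root_beta = gated_skip(root, message, [5.0, 5.0, 0.0, 0.0, 0.0, 0.0])
```

When $\beta_i$ approaches 1 the node keeps mostly its own transformed features, and when it approaches 0 the output is dominated by the neighborhood message.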

    # Method signatures only; the bodies are omitted here.
    def __init__(
        self,
        in_channels: Union[int, Tuple[int, int]],
        out_channels: int,
        heads: int = 1,
        concat: bool = True,
        beta: bool = False,
        dropout: float = 0.,
        edge_dim: Optional[int] = None,
        bias: bool = True,
        root_weight: bool = True,
        **kwargs,
    ):
        ...

    def forward(self, x: Union[Tensor, PairTensor], edge_index: Adj,
                edge_attr: OptTensor = None, return_attention_weights=None):
        ...