[Paper Close Reading] Semisupervised Graph Neural Networks for Graph Classification

Paper link: Semisupervised Graph Neural Networks for Graph Classification | IEEE Journals & Magazine | IEEE Xplore

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.

Table of Contents

1. TL;DR

1.1. Takeaways

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work and Motivation

2.3.1. Graph Neural Networks for Graph Classification

2.3.2. Semisupervised Learning in Graph Classification

2.3.3. Few-Shot Learning in Graph Classification

2.4. Methodology

2.4.1. Problem Formulation and Notations

2.4.2. Framework

2.4.3. Implementation on Typical Graph Classification

2.4.4. Implementation on Few-Shot Graph Classification

2.5. Experimental Study

2.5.1. Datasets

2.5.2. Configurations of Graph Neural Networks

2.5.3. Experiments on Typical Graph Classification

2.5.4. Experiments on Few-Shot Graph Classification

2.5.5. Ablation Study

2.5.6. Parameter Sensitivity Analysis

2.6. Conclusion

3. Reference


1. TL;DR

1.1. Takeaways

(1) Decent paper; it is not particularly hard to read, but the absence of released code makes the design fairly difficult to reproduce.

2. Section-by-Section Close Reading

2.1. Abstract

        ①GNNs have achieved SOTA results in the purely supervised setting

        ②Semi-supervised learning is usually applied to node classification

        ③They train two GNNs that serve as complementary views

2.2. Introduction

        ①Graph kernel methods are two-stage methods and are time-consuming

        ②The performance of existing methods relies heavily on labeled data

        ③They proposed a semisupervised GNN framework for graph classification based on co-training and self-training

arduous  adj. hard; laborious

2.3. Related Work and Motivation

2.3.1. Graph Neural Networks for Graph Classification

        ①The authors list PATCHY-SAN, MPNN, DGCNN, GAM, DIFFPOOL and GIN

        ②These methods do not make use of unlabeled data in their experiments

2.3.2. Semisupervised Learning in Graph Classification

        ①Semi-supervised methods: self-training, co-training, and label propagation

        ②Assumptions of semi-supervised methods: the smoothness assumption, the cluster assumption, and the manifold assumption

        ③⭐Unlike nodes, graphs have no connections between them (edges connect nodes, not graphs), so graph-level samples are independent of each other

2.3.3. Few-Shot Learning in Graph Classification

        ①They take the prototypical network as an example, which computes each class prototype as the average embedding of that class's samples:

Pr_n=\frac1{|\mathcal{S}_n|}\sum_{(x_i,y_i)\in\mathcal{S}_n}f(x_i)

        ②The probability that graph x_i belongs to class n:

P(y=n|x_i)=\frac{\exp(-d\left(f(x_i),Pr_n\right))}{\sum_{j=1}^N\exp(-d(f(x_i),Pr_j))}
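A minimal sketch (my own illustration, not code from the paper) of how these two formulas compute class prototypes and class probabilities; f(x_i) is assumed to be a precomputed graph embedding and d(.,.) is taken here to be the squared Euclidean distance:

```python
import torch

def prototypes(support_emb, support_labels, num_classes):
    # support_emb: (S, d) graph embeddings f(x_i); support_labels: (S,) values in [0, N)
    # Pr_n is the mean embedding of the support samples of class n
    return torch.stack([support_emb[support_labels == n].mean(dim=0)
                        for n in range(num_classes)])    # (N, d)

def class_probs(query_emb, protos):
    # Softmax over negative distances to each prototype
    dists = torch.cdist(query_emb, protos) ** 2          # (Q, N)
    return torch.softmax(-dists, dim=1)                  # P(y = n | x_i)
```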

2.4. Methodology

2.4.1. Problem Formulation and Notations

(1)Supervised Graph Classification:

        ①Training set: \mathcal{D}_{\mathrm{training}}=\{(x_1,y_1),\ldots,(x_l,y_l)\}

        ②The goal is to map the test data \mathcal{D}_{\mathrm{test}}=\{x_{l+1},\ldots,x_{l+u}\} to labels

(2)Semisupervised Graph Classification:

        ①Training set: \mathcal{D}_{\mathrm{training}}=\{(x_1,y_1),\ldots,(x_l,y_l),x_{l+1},\ldots,x_{l+u}\}

        ②Common notations:

2.4.2. Framework

        ①They adopt a pretraining strategy during the first num_{pre} epochs

        ②For the same graph, the disagreement between the two classifiers' predictions is measured by the Jensen–Shannon divergence:

\begin{aligned}\ell_{JS}(x;\Theta_1,\Theta_2)&=\sum_{x_{i}\in\mathcal{D}_{U}}\Big(H\Big(\frac{1}{2}\big(Z_1(x_{i})+Z_2(x_{i})\big)\Big)\\&\quad-\frac{1}{2}\big(H(Z_1(x_{i}))+H(Z_2(x_{i}))\big)\Big)\end{aligned}

where Z_1 and Z_2 denote the softmax scores output by the two classifiers
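A hedged sketch of this consistency term, assuming Z_1 and Z_2 are the softmax outputs of the two GNN classifiers on the unlabeled graphs (my own illustration, not the authors' code):

```python
import torch

def entropy(p, eps=1e-12):
    # Shannon entropy of each row of a (B, N) probability matrix
    return -(p * (p + eps).log()).sum(dim=1)

def js_consistency(z1, z2):
    # z1, z2: (B, N) softmax scores of the two views on D_U
    m = 0.5 * (z1 + z2)
    return (entropy(m) - 0.5 * (entropy(z1) + entropy(z2))).sum()
```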

        ③The supervised loss is applied to labeled data:

\mathcal{L}_s(x,y;\Theta) = \sum_{x_i\in\mathcal{D}_L}\ell_{CE}(\operatorname{argmax}(Z(x_i)),y_i)

        ④Total training loss for the two GNNs:

\mathcal{L}_{\mathrm{pre}}=\mathcal{L}_{s}(x,y;\Theta_1)+\mathcal{L}_{s}(x,y;\Theta_2)+\lambda_{JS}\,\ell_{JS}(x;\Theta_1,\Theta_2)

        ⑤The two GNNs, acting as different views, assign pseudo labels to each other

        ⑥They assign a weight to each unlabeled sample:

\omega_i=1-\frac{H(Z(x_i))}{\log(N)}

where H(\cdot) denotes the entropy function and \log(N) is the maximum possible entropy of a distribution over the N classes
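A small sketch of this confidence weight (illustrative only); low-entropy, i.e., confident, predictions receive ω_i close to 1:

```python
import math
import torch

def confidence_weight(z, eps=1e-12):
    # z: (B, N) softmax scores for unlabeled samples
    h = -(z * (z + eps).log()).sum(dim=1)        # H(Z(x_i))
    return 1.0 - h / math.log(z.size(1))         # omega_i in [0, 1]
```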

        ⑦To weaken the impact of class imbalance, they add another weight \gamma_{j}, j=1,\ldots,N, defined by:

\gamma_{j}=(|L_{j}|+|U_{j}|)^{-1}

where |L_j| denotes the number of labeled samples of class j in \mathcal{D}_L and |U_j| denotes the number of pseudo-labeled samples of class j in \mathcal{D}_U
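A sketch of the class-balance weight, under the assumption that every class has at least one (pseudo-)labeled sample; the clamp below is my own guard, not from the paper:

```python
import torch

def class_balance_weight(labeled_counts, pseudo_counts):
    # labeled_counts, pseudo_counts: (N,) tensors holding |L_j| and |U_j|
    return 1.0 / (labeled_counts + pseudo_counts).clamp(min=1).float()   # gamma_j
```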

        ⑧The loss on unlabeled data (minimizing the loss over pseudo-labeled samples):

\mathcal{L}_{\mathrm{pseudo}}(x,\widehat{y};\Theta) = \sum_{x_{i}\in\mathcal{D}_{U}}\omega_{i}\gamma_{\widehat{y}_{i}}\ell_{CE}(\mathrm{argmax}(Z(x_{i})),\widehat{y}_{i})

where \widehat{y}_{i} denotes the pseudo label assigned to the unlabeled sample by the other view
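Putting ⑥–⑧ together, a hedged sketch of the weighted pseudo-label loss; here I use a standard cross-entropy on the classifier scores rather than the argmax form written above:

```python
import torch
import torch.nn.functional as F

def pseudo_loss(logits, pseudo_labels, omega, gamma):
    # logits: (B, N) scores for unlabeled graphs; pseudo_labels: (B,) long tensor from the other view
    # omega: (B,) confidence weights; gamma: (N,) class-balance weights
    ce = F.cross_entropy(logits, pseudo_labels, reduction="none")   # (B,)
    return (omega * gamma[pseudo_labels] * ce).sum()
```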

        ⑨The overall loss function for co-training:

\begin{aligned}\mathcal{L}_{co}&=\lambda_{co}\,\mu\,\mathcal{L}_{s}(x,y;\Theta_1)+\mathcal{L}_{\mathrm{pseudo}}(x,\widehat{y}_2;\Theta_1)\\&\quad+\lambda_{co}\,\mu\,\mathcal{L}_{s}(x,y;\Theta_2)+\mathcal{L}_{\mathrm{pseudo}}(x,\widehat{y}_1;\Theta_2)\end{aligned}

where \lambda_{co} denotes the tradeoff factor between true-labeled and pseudo-labeled samples, and \mu denotes an additional weight for the true-labeled examples, \mu=|\mathcal{D}_{mb^{\prime}}|/|\mathcal{D}_{mb^{\prime}}\cap\mathcal{D}_{L}|
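A sketch of how the co-training loss for one mini-batch could be assembled from the pieces above; μ is computed from the mini-batch as described, and the function and argument names are illustrative only:

```python
def co_training_loss(sup_loss_1, pseudo_loss_2to1, sup_loss_2, pseudo_loss_1to2,
                     lambda_co, batch_size, labeled_in_batch):
    # mu = |D_mb'| / |D_mb' ∩ D_L| upweights the few true-labeled samples in the batch
    mu = batch_size / max(labeled_in_batch, 1)
    return (lambda_co * mu * sup_loss_1 + pseudo_loss_2to1
            + lambda_co * mu * sup_loss_2 + pseudo_loss_1to2)
```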

        ⑩They reset the pseudo-labeled samples every \beta epochs to mitigate the harm of accumulated errors

        ⑪The supervised loss on pseudo-labeled samples in self-training:

\mathcal{L}_{\mathrm{pseudo\_self}}(x,\widehat{y};\Theta)=\sum_{x_{i}\in\mathcal{D}_{L^{\prime}}}\ell_{CE}(\operatorname{argmax}(Z(x_{i})),\widehat{y}_{i})

where \widehat{y}_{i} denotes the pseudo label assigned to the unlabeled sample by the classifier's own view

        ⑫The overall self-training loss function:

\mathcal{L}_{\mathrm{self}}=\mathcal{L}_{\mathrm{pseudo\_self}}(x,\widehat{y}_1;\Theta_1)+\mathcal{L}_{\mathrm{pseudo\_self}}(x,\widehat{y}_2;\Theta_2)

        ⑬The overall loss in the model:

\mathcal{L}=\mathcal{L}_{co}+\mathcal{L}_{\mathrm{self}}

        ⑭The workflow of this model:

        ⑮Algorithm of this model:

ameliorate  vt. to improve; to make better

2.4.3. Implementation on Typical Graph Classification

        ①This framework can be combined with any GNN backbone

2.4.4. Implementation on Few-Shot Graph Classification

        ①They combine their framework with the prototypical network:

\widehat{P}(y=n|x_i)=\frac{\exp\bigl(-d\bigl(f(x_i),\widehat{Pr}_n\bigr)\bigr)}{\sum_{j=1}^N\exp\bigl(-d\bigl(f(x_i),\widehat{Pr}_j\bigr)\bigr)}

        ②The framework applied to few-shot classification:

        ③Pseudo label generation:

2.5. Experimental Study

2.5.1. Datasets

        ①Seven classic graph classification datasets: NCI1, NCI109, D&D, COLLAB, REDDIT-MULTI-12K, MiniGCDataset, and DBLP_v1

        ②Statistics of classic graph classification datasets:

        ③Two few-shot datasets: mini-REM12K and mini-MGCD

        ④Statistics of few shot graph classification datasets:

2.5.2. Configurations of Graph Neural Networks

        ①They chose DIFFPOOL and GIN as the two GNNs, where DIFFPOOL extracts the topological structure and GIN captures high-order neighborhood relationships

        ②Hyper-parameter optimization: grid search

2.5.3. Experiments on Typical Graph Classification

        ①Labeling rate: 0.5% and 1% on MiniGCDataset, 5% and 10% for others

        ②Evaluation: average performance over 10 runs

(1)Parameter Configurations

        ①Training epochs: 300 for the original GNNs and 200 for their semisupervised GNNs

        ②\lambda _{co}=0.001

        ③num_{pre}=30

        ④If epoch < num_{wmup}, then \lambda_{JS}=\lambda_{JS\_\max}\exp\left(-5\left(1-\mathrm{epoch}/num_{wmup}\right)^{2}\right); otherwise \lambda_{JS}=\lambda_{JS\_\max}. Here \lambda_{JS\_\max}=10 and num_{wmup}=30 (see the sketch after this list)

        ⑤top_k=5

        ⑥\beta =5

        ⑦Learning rate = 0.001, decayed by a factor of 0.5 every 80 epochs
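The warm-up schedule from item ④ above, as a small sketch; the values \lambda_{JS\_\max}=10 and num_{wmup}=30 are taken from the list:

```python
import math

def lambda_js(epoch, lambda_js_max=10.0, num_wmup=30):
    # Ramp lambda_JS up exponentially during warm-up, then keep it constant
    if epoch < num_wmup:
        return lambda_js_max * math.exp(-5.0 * (1.0 - epoch / num_wmup) ** 2)
    return lambda_js_max
```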

(2)Baseline Methods

        ①DIFFPOOL+ and GIN+: generated by SVM

        ②Strong non-GNN methods: graph2vec, Skip-Gram, RGM

(3)Results and Analysis

        ①Experimental results:

2.5.4. Experiments on Few-Shot Graph Classification

(1)Parameter Configurations

        ①\lambda_{fs}=20

        ②top_k=3

        ③\lambda_{JS} is set the same as in typical graph classification

        ④Training epoch: 40

        ⑤learning rate: 0.001

(2)Baseline Methods

        ①Similar to those for typical graph classification

(3)Results and Analysis

        ①Performance table:

2.5.5. Ablation Study

        ①Module ablation study:

2.5.6. Parameter Sensitivity Analysis

        ①\alpha_{DIFFPOOL} is the number of clusters after soft coarsening in DIFFPOOL, with \alpha_{DIFFPOOL}\in[0.05, 3] varied in increments of 0.05

        ②\alpha_{GIN} denotes the number of GNN layers and it varies from 3 to 7

        ③Observations when varying the hyperparameters:

        ④\lambda_{co}\in\{0.0001,0.0002,0.0005,0.001,0.002,0.005,0.01\} on NCI1:

        ⑤\lambda _{fs}\in\{1,2,5,10,20,50,100,200,500,1000\} on mini-REM12K and mini-MGCD:

2.6. Conclusion

        They want to further explore the problem of noisy labels

3. Reference

Xie, Y. et al. (2023) 'Semisupervised Graph Neural Networks for Graph Classification', IEEE Transactions on Cybernetics, 53(10): 6222-6235. doi:  10.1109/TCYB.2022.3164696

Semi-supervised classification with graph convolutional networks (GCNs) is a method for predicting labels for nodes in a graph. GCNs are a type of neural network that operates on graph-structured data, where each node in the graph represents an entity (such as a person, a product, or a webpage) and edges represent relationships between entities. The semi-supervised classification problem arises when we have a graph where only a small subset of nodes have labels, and we want to predict the labels of the remaining nodes. GCNs can be used to solve this problem by learning to propagate information through the graph, using the labeled nodes as anchors. The key idea behind GCNs is to use a graph convolution operation to aggregate information from a node's neighbors, and then use this aggregated information to update the node's representation. This operation is then repeated over multiple layers, allowing the network to capture increasingly complex relationships between nodes. To train a GCN for semi-supervised classification, we use a combination of labeled and unlabeled nodes as input, and optimize a loss function that encourages the network to correctly predict the labels of the labeled nodes while also encouraging the network to produce smooth predictions across the graph. Overall, semi-supervised classification with GCNs is a powerful and flexible method for predicting labels on graph-structured data, and has been successfully applied to a wide range of applications including social network analysis, drug discovery, and recommendation systems.
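As a minimal illustration of the propagation rule described above, here is a generic sketch of one graph-convolution layer using the common symmetric normalization (not tied to this particular paper):

```python
import torch

def gcn_layer(adj, h, weight):
    # adj: (n, n) float adjacency with self-loops; h: (n, d_in) node features; weight: (d_in, d_out)
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5)
    adj_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]   # D^{-1/2} (A + I) D^{-1/2}
    return torch.relu(adj_norm @ h @ weight)                     # aggregate neighbors, then transform
```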