Paper: [2007.02265] AM-GCN: Adaptive Multi-channel Graph Convolutional Networks (arxiv.org)
Code: GitHub - zhumeiqiBUPT/AM-GCN: AM-GCN: Adaptive Multi-channel Graph Convolutional Networks
The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This article reads more like personal notes, so take it with a grain of salt.
1. TL;DR
1.1. Takeaways
(1) Cases 1 and 2 feel a bit contrived: they ask the GCN to discover patterns like "the topology is useless but the node features are useful" or "the topology is useful but the node features are not" and assign weights accordingly. Still, it is an interesting angle.
(2) It is rare to see a paper whose mathematical exposition is this clear; there was nothing I couldn't follow.
(3) The dataset statistics table is much appreciated.
(4) ⭐ Using the attention distributions to analyze the relative importance of topology and features, and then discussing performance through that lens. Brilliant.
(5) Good work deserves credit; worth a careful read.
1.2. Paper Summary Diagram
2. Section-by-Section Close Reading
2.1. Abstract
①Even the most SOTA GCNs fail to adequately fuse node features and topological structure
②They propose an adaptive multi-channel graph convolutional network for semi-supervised classification (AM-GCN) to address this problem
2.2. Introduction
①GCNs do not possess the ability to appropriately fuse node features and topological structure (perhaps this is why GAT is often preferred?)
2.3. Fusion Capability of GCNs: an Experimental Investigation
2.3.1. Case 1: Random Topology and Correlated Node Features
①Generating a network randomly (a generation sketch follows at the end of this case):
Number of nodes | 900 |
Edge probability between any two nodes | 0.03 |
Node feature vectors (drawn from Gaussian distributions) | 50 dimensions |
Labels (node classification; classes share the same covariance matrix) | 3 |
⭐ Here the label depends mainly on the node features rather than on the topological structure, and it can be seen that GCN is disturbed by the topological information. (But I would ask: if the nodes already carry the useful information, why force connections at all? These fictitious edges are meaningless to begin with; they deliberately interfere with the model. Wiring edges at random is also unrealistic: people in the real world do not form acquaintances at random.)
②GNN used: GCN
③Training set: 20 random nodes per label
④Test set: another 200 random nodes per label
⑤Performance of GCN/MLP: 75.2%/100%
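To make the setup concrete, here is a minimal NumPy sketch of how such a network could be generated. This is my reconstruction, not the authors' script; in particular the scale of the class-specific Gaussian means is an assumption.

```python
import numpy as np

# A rough reconstruction of the Case-1 data (not the authors' script).
n, d, n_classes, p_edge = 900, 50, 3, 0.03

rng = np.random.default_rng(0)
labels = rng.integers(0, n_classes, size=n)

# Random, label-independent topology (Erdos-Renyi style).
upper = np.triu(rng.random((n, n)) < p_edge, k=1)
adj = upper | upper.T                      # symmetric adjacency, no self-loops

# Label-correlated features: class-specific Gaussian means, shared covariance.
means = rng.normal(0.0, 10.0, size=(n_classes, d))   # mean scale is an assumption
features = means[labels] + rng.normal(0.0, 1.0, size=(n, d))
```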
2.3.2. Case 2: Correlated Topology and Random Node Features
①Generating a network randomly:
Number of nodes | 900 |
Edge probability between any two nodes in the same community | 0.03 |
Edge probability between any two nodes in different communities | 0.0015 |
Node feature vectors (random) | 50 dimensions |
Labels (node classification/community, generated by the Stochastic Blockmodel (SBM)) | 3 |
Here the label depends mainly on the topological structure rather than on the node features
②Performance of GCN/DeepWalk: 87%/100%, because DeepWalk ignores the node features (⭐ this part is rather odd: the authors compare GCN, which fuses node features and topology, against a features-only classifier and a topology-only classifier, and conclude that GCN cannot adaptively lean all the way toward one side. There is some merit to this, but it also seems to go against GCN's original intent of fusing multiple sources of information)
2.4. AM-GCN: the Proposed Model
①Problem settings: semi-supervised node classification on a graph $G=(\mathbf{A},\mathbf{X})$, where
$\mathbf{A}\in\mathbb{R}^{n\times n}$ denotes the symmetric adjacency matrix,
$n$ denotes the number of nodes,
$\mathbf{X}\in\mathbb{R}^{n\times d}$ denotes the node feature matrix,
$d$ denotes the node feature dimension
②The schematic of AM-GCN:
2.4.1. Specific Convolution Module
(1)Feature graph
①Based on the node feature matrix $\mathbf{X}$, the authors construct a k-nearest neighbor (kNN) graph $G_f=(\mathbf{A}_f,\mathbf{X})$
②Calculating the similarity matrix $\mathbf{S}\in\mathbb{R}^{n\times n}$ in either of the following two ways ($\mathbf{x}_i$ and $\mathbf{x}_j$ are feature vectors):
③Cosine Similarity (their choice):
$$\mathbf{S}_{ij}=\frac{\mathbf{x}_i\cdot\mathbf{x}_j}{\left|\mathbf{x}_i\right|\left|\mathbf{x}_j\right|}$$
④Heat Kernel:
$$\mathbf{S}_{ij}=e^{-\frac{\left\|\mathbf{x}_i-\mathbf{x}_j\right\|^2}{t}}$$
where $t$ denotes the time parameter in the heat conduction equation; they manually set it to 2
⑤The output $\mathbf{Z}_f^{(l)}$ of the $l$-th layer with input $\mathbf{Z}_f^{(l-1)}$ is:
$$\mathbf{Z}_f^{(l)}=\mathrm{ReLU}\left(\tilde{\mathbf{D}}_f^{-\frac{1}{2}}\tilde{\mathbf{A}}_f\tilde{\mathbf{D}}_f^{-\frac{1}{2}}\mathbf{Z}_f^{(l-1)}\mathbf{W}_f^{(l)}\right)$$
and, "as we all know", $\tilde{\mathbf{A}}_f=\mathbf{A}_f+\mathbf{I}_f$ with $\tilde{\mathbf{D}}_f$ its diagonal degree matrix, and $\mathbf{Z}_f^{(0)}=\mathbf{X}$. Moreover, the final output is $\mathbf{Z}_F$, the embedding learned from the feature graph (see the sketch below)
(2)Topology graph
①Similar to the feature graph, but the input graph is the original topology, i.e. $\mathbf{A}_t=\mathbf{A}$ and $\mathbf{X}_t=\mathbf{X}$ (no kNN screening); the last-layer output is denoted $\mathbf{Z}_T$
2.4.2. Common Convolution Module
①The information in the topology space and the feature space is not completely independent; accordingly, it is necessary to combine them in a Common-GCN
②They simply share the layer weights $\mathbf{W}_c^{(l)}$ across the two graphs...
$$\mathbf{Z}_{ct}^{(l)}=\mathrm{ReLU}\left(\tilde{\mathbf{D}}_t^{-\frac{1}{2}}\tilde{\mathbf{A}}_t\tilde{\mathbf{D}}_t^{-\frac{1}{2}}\mathbf{Z}_{ct}^{(l-1)}\mathbf{W}_c^{(l)}\right),\qquad \mathbf{Z}_{cf}^{(l)}=\mathrm{ReLU}\left(\tilde{\mathbf{D}}_f^{-\frac{1}{2}}\tilde{\mathbf{A}}_f\tilde{\mathbf{D}}_f^{-\frac{1}{2}}\mathbf{Z}_{cf}^{(l-1)}\mathbf{W}_c^{(l)}\right)$$
③Then the two outputs $\mathbf{Z}_{CT}$ and $\mathbf{Z}_{CF}$ are combined into the common embedding $\mathbf{Z}_C$ (see the sketch below):
$$\mathbf{Z}_C=\frac{\mathbf{Z}_{CT}+\mathbf{Z}_{CF}}{2}$$
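Continuing the sketch above, the weight sharing of the Common-GCN can be written as follows, reusing the hypothetical gcn_layer and showing a single layer for brevity.

```python
def common_gcn(A_t, A_f, X, W_c):
    """Apply the SAME weight matrix W_c to both graphs, then average."""
    Z_ct = gcn_layer(A_t, X, W_c)     # propagate over the topology graph
    Z_cf = gcn_layer(A_f, X, W_c)     # propagate over the feature graph
    Z_c = (Z_ct + Z_cf) / 2.0         # common embedding Z_C
    return Z_ct, Z_cf, Z_c
```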
2.4.3. Attention Mechanism
①The attention values:
$$(\alpha_t,\alpha_c,\alpha_f)=\mathrm{att}(\mathbf{Z}_T,\mathbf{Z}_C,\mathbf{Z}_F)$$
where each $\alpha\in\mathbb{R}^{n\times 1}$
②For the $i$-th row of $\mathbf{Z}_T$, namely the embedding $\mathbf{z}_T^i$ of the $i$-th node, transform it nonlinearly and obtain its attention value $\omega_T^i$ through a shared attention vector $\mathbf{q}$:
$$\omega_T^i=\mathbf{q}^{\mathrm{T}}\cdot\tanh\left(\mathbf{W}\cdot\left(\mathbf{z}_T^i\right)^{\mathrm{T}}+\mathbf{b}\right)$$
the same for $\omega_C^i$ and $\omega_F^i$
③Normalization:
$$\alpha_T^i=\mathrm{softmax}\left(\omega_T^i\right)=\frac{\exp\left(\omega_T^i\right)}{\exp\left(\omega_T^i\right)+\exp\left(\omega_C^i\right)+\exp\left(\omega_F^i\right)}$$
the same for $\alpha_C^i$ and $\alpha_F^i$
④Denoting $\boldsymbol{\alpha}_T=\mathrm{diag}\left(\alpha_T\right)$ and likewise $\boldsymbol{\alpha}_C$, $\boldsymbol{\alpha}_F$, the final embedding is (see the sketch below):
$$\mathbf{Z}=\boldsymbol{\alpha}_T\cdot\mathbf{Z}_T+\boldsymbol{\alpha}_C\cdot\mathbf{Z}_C+\boldsymbol{\alpha}_F\cdot\mathbf{Z}_F$$
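A shape-explicit NumPy sketch of the fusion above. In the real model $\mathbf{W}$ ($h'\times h$), $\mathbf{b}$ ($h'$) and $\mathbf{q}$ ($h'$) are learned parameters; here they are just arrays.

```python
import numpy as np

def attention_fuse(Z_T, Z_C, Z_F, W, b, q):
    """omega = q^T tanh(W z^T + b) per node and channel, softmax over the
    three channels, then a weighted sum of the three embeddings."""
    def logits(Z):
        return np.tanh(Z @ W.T + b) @ q                  # shape (n,)
    omega = np.stack([logits(Z_T), logits(Z_C), logits(Z_F)], axis=1)
    omega -= omega.max(axis=1, keepdims=True)            # numerical stability
    alpha = np.exp(omega) / np.exp(omega).sum(axis=1, keepdims=True)
    Z = alpha[:, 0:1] * Z_T + alpha[:, 1:2] * Z_C + alpha[:, 2:3] * Z_F
    return Z, alpha                                      # alpha: (n, 3)
```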
2.4.4. Objective Function
(1)Consistency Constraint
①Normalize $\mathbf{Z}_{CT}$ and $\mathbf{Z}_{CF}$ with L2 row normalization to $\mathbf{Z}_{CT_{nor}}$ and $\mathbf{Z}_{CF_{nor}}$
②The similarity matrices can be:
$$\mathbf{S}_T=\mathbf{Z}_{CT_{nor}}\cdot\mathbf{Z}_{CT_{nor}}^{\mathrm{T}},\qquad \mathbf{S}_F=\mathbf{Z}_{CF_{nor}}\cdot\mathbf{Z}_{CF_{nor}}^{\mathrm{T}}$$
③For the two similarity matrices, the loss/constraint can be (see the sketch below):
$$\mathcal{L}_c=\left\|\mathbf{S}_T-\mathbf{S}_F\right\|_F^2$$
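A direct NumPy transcription of this constraint (a sketch; dense matrices assumed).

```python
import numpy as np

def consistency_loss(Z_ct, Z_cf):
    """L_c = || S_T - S_F ||_F^2 with row-L2-normalized embeddings."""
    Zt = Z_ct / np.linalg.norm(Z_ct, axis=1, keepdims=True)
    Zf = Z_cf / np.linalg.norm(Z_cf, axis=1, keepdims=True)
    S_t, S_f = Zt @ Zt.T, Zf @ Zf.T      # n x n similarity matrices
    return np.sum((S_t - S_f) ** 2)      # squared Frobenius norm
```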
(2)Disparity Constraint
①They apply the Hilbert-Schmidt Independence Criterion (HSIC) to compute the independence loss:
$$\mathrm{HSIC}\left(\mathbf{Z}_T,\mathbf{Z}_{CT}\right)=(n-1)^{-2}\,\mathrm{tr}\left(\mathbf{R}\mathbf{K}_T\mathbf{R}\mathbf{K}_{CT}\right)$$
where $\mathbf{K}_T$ and $\mathbf{K}_{CT}$ are Gram matrices with $k_{T,ij}=k\left(\mathbf{z}_T^i,\mathbf{z}_T^j\right)$ and $k_{CT,ij}=k\left(\mathbf{z}_{CT}^i,\mathbf{z}_{CT}^j\right)$;
$\mathbf{R}=\mathbf{I}-\frac{1}{n}\mathbf{e}\mathbf{e}^{\mathrm{T}}$, where $\mathbf{I}$ denotes the identity matrix and $\mathbf{e}$ denotes an all-one column vector
②The same for:
$$\mathrm{HSIC}\left(\mathbf{Z}_F,\mathbf{Z}_{CF}\right)=(n-1)^{-2}\,\mathrm{tr}\left(\mathbf{R}\mathbf{K}_F\mathbf{R}\mathbf{K}_{CF}\right)$$
③Combining the two losses together (see the sketch below):
$$\mathcal{L}_d=\mathrm{HSIC}\left(\mathbf{Z}_T,\mathbf{Z}_{CT}\right)+\mathrm{HSIC}\left(\mathbf{Z}_F,\mathbf{Z}_{CF}\right)$$
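A sketch of the two formulas above, assuming a linear (inner-product) kernel for the Gram matrices.

```python
import numpy as np

def hsic(Z1, Z2):
    """Empirical HSIC(Z1, Z2) = (n-1)^{-2} tr(R K1 R K2)."""
    n = Z1.shape[0]
    K1, K2 = Z1 @ Z1.T, Z2 @ Z2.T             # Gram matrices (linear kernel)
    R = np.eye(n) - np.ones((n, n)) / n       # centering matrix R = I - ee^T/n
    return np.trace(R @ K1 @ R @ K2) / (n - 1) ** 2

def disparity_loss(Z_T, Z_ct, Z_F, Z_cf):
    """L_d = HSIC(Z_T, Z_CT) + HSIC(Z_F, Z_CF)."""
    return hsic(Z_T, Z_ct) + hsic(Z_F, Z_cf)
```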
(3)Optimization Objective
①The predictions are:
$$\hat{\mathbf{Y}}=\mathrm{softmax}\left(\mathbf{W}\cdot\mathbf{Z}+\mathbf{b}\right)$$
②Cross-entropy loss between the ground truth $\mathbf{Y}$ and the prediction $\hat{\mathbf{Y}}$ over the labeled set $L$:
$$\mathcal{L}_t=-\sum_{l\in L}\sum_{i=1}^{C}\mathbf{Y}_{li}\ln\hat{\mathbf{Y}}_{li}$$
③The total loss (see the sketch below):
$$\mathcal{L}=\mathcal{L}_t+\gamma\mathcal{L}_c+\beta\mathcal{L}_d$$
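Putting the three terms together (a sketch; Y is the one-hot ground truth, Y_hat the softmax output, and labeled_idx the indices of labeled nodes, all names mine).

```python
import numpy as np

def total_loss(Y, Y_hat, L_c, L_d, gamma, beta, labeled_idx):
    """L = L_t + gamma * L_c + beta * L_d."""
    eps = 1e-12                                            # avoid log(0)
    L_t = -np.sum(Y[labeled_idx] * np.log(Y_hat[labeled_idx] + eps))
    return L_t + gamma * L_c + beta * L_d
```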
2.5. Experiments
2.5.1. Experimental Setup
(1)Datasets
①The statistics of their datasets:
(2)Baselines
①Network embedding algorithms: DeepWalk and LINE
②Graph neural network: Chebyshev, GCN, kNN-GCN, GAT, DEMO-Net, and MixHop
(3)Parameters Setting
①Label rate for semi-supervised learning: 20, 40, 60 per class
②Training three 2-layer GCNs with the same hidden layer dimension ($nhid_1$) and the same output dimension ($nhid_2$)
③Learning rate: 0.0001-0.0005
④Optimizer: Adam
⑤Dropout rate: 0.5
⑥Weight decay: 5e-3 or 5e-4
⑦k of kNN: 2,3,...,10
⑧Searching the coefficients $\gamma$ and $\beta$
⑨Runs: 5, with results averaged
⑩Metrics: ACC and F1
2.5.2. Node Classification
①Performance comparison:
where L/C denotes the number of labeled nodes per class
2.5.3. Analysis of Variants
①The ablation study:
AM-GCN-w/o | AM-GCN without the constraints $\mathcal{L}_c$ and $\mathcal{L}_d$ |
AM-GCN-c | AM-GCN with the consistency constraint $\mathcal{L}_c$ only |
AM-GCN-d | AM-GCN with the disparity constraint $\mathcal{L}_d$ only |
②The results table:
2.5.4. Visualization
①They visualize the last layer of AM-GCN/GCN/GAT/DeepWalk on BlogCatalog by t-SNE:
2.5.5. Analysis of Attention Mechanism
(1)Analysis of attention distributions
①Attention distributions under the 20-labels-per-class setting:
(2)Analysis of attention trends
①Attention trends, with the x-axis denoting the epoch and the y-axis denoting the average attention value, under the 20-labels-per-class setting
2.5.6. Parameter Study
(1)Analysis of consistency coefficient γ
①Varying test of γ:
(2)Analysis of disparity constraint coefficient β
①Varying test of β:
(3)Analysis of k-nearest neighbor graph k
①Varying test of k:
2.6. Related Work
①? For a moment I thought I had misread; why is Related Work placed all the way back here?
②Listing some related works one by one
2.7. Conclusion
Fine
2.8. Supplement
2.8.1. Experiments Settings
=。= It just describes their machine configuration
2.8.2. Baselines and Datasets
①Code:
DeepWalk, LINE | https://github.com/thunlp/OpenNE |
Chebyshev | https://github.com/tkipf/gcn |
GCN (PyTorch) | https://github.com/tkipf/pygcn |
GAT (PyTorch) | https://github.com/Diego999/pyGAT/ |
DEMO-Net | https://github.com/jwu4sml/DEMO-Net |
MixHop | https://github.com/samihaija/mixhop |
②Dataset:
Citeseer | https://github.com/tkipf/pygcn |
UAI2010 | http://linqs.umiacs.umd.edu/projects//projects/lbc/index.html |
ACM | https://github.com/Jhy1993/HAN |
BlogCatalog | https://github.com/mengzaiqiao/CAN |
Flickr | https://github.com/mengzaiqiao/CAN |
CoraFull | https://github.com/abojchevski/graph2gauss/ |
2.8.3. Implementation Details
Says nothing much
2.8.4. Additional Results
(1)Analysis of attention trends
①Attention trends:
(2)Parameters Study
①Varying test of γ:
②Varying test of β:
③Varying test of k:
3. Background Knowledge
3.1. Stochastic Blockmodel
Reference 1: 图生成模型之随机块模型(stochastic block model)学习笔记 - 知乎 (zhihu.com)
Reference 2: 数据包络分析(超效率-SBM模型)附python代码_超效率sbm模型 - CSDN博客 (note: this second link covers the super-efficiency SBM in data envelopment analysis, a different "SBM" from the stochastic block model)
3.2. Hilbert-Schmidt Independence Criterion (HSIC)
(1) Overview: the Hilbert-Schmidt Independence Criterion (HSIC) is a criterion for measuring the independence of two variables. Like mutual information, it assesses independence by measuring differences between the variables' distributions. HSIC is built on covariance and can be viewed as an extension of covariance that describes relationships between variables more broadly.
(2) Reference: HSIC简介:一个有意思的判断相关性的思路 - CSDN博客
3.3. Gram matrix
(1) Overview: a Gram matrix is a matrix formed by the inner products of a set of vectors in an inner-product space. Specifically, for vectors [v1, v2, …, vn], the Gram matrix is a Hermitian matrix whose entry Gij is the inner product of vi and vj, i.e. Gij = ⟨vi, vj⟩. In image processing, for a given matrix A, the Gram matrix of its column vectors is AᵀA, while that of its row vectors is AAᵀ.
(2) Reference: 格拉姆矩阵(Gram matrix)详细解读 - 知乎 (zhihu.com)
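A tiny NumPy check of the definition above:

```python
import numpy as np

A = np.array([[1., 0.],
              [2., 1.]])
G_cols = A.T @ A   # Gram matrix of the columns: [[5., 2.], [2., 1.]]
G_rows = A @ A.T   # Gram matrix of the rows:    [[1., 2.], [2., 5.]]
```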
3.4. t-SNE
Reference: t-SNE 原理及Python实例 - 知乎 (zhihu.com)
4. Reference List
Wang, X. et al. (2020) 'AM-GCN: Adaptive Multi-channel Graph Convolutional Networks', KDD 2020. doi: https://doi.org/10.48550/arXiv.2007.02265