[Paper Close Reading] Multi-view graph convolutional networks with attention mechanism

Paper link: Multi-view graph convolutional networks with attention mechanism - ScienceDirect

Paper code: Multi-View GCNs with Attention Mechanism (MAGCN)

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with care!

Table of Contents

1. TL;DR

1.1. Takeaways

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Preliminaries

2.5. Multi-view graph convolutional network with attention mechanism

2.6. Theoretical analysis

2.7. Experiments

2.8. Conclusion

3. Supplementary knowledge

3.1. Discrete convolution

3.2. Information theory

4. Reference List


1. TL;DR

1.1. Takeaways

(1) This is a node classification task, unlike the graph classification typically used for brain applications; to save time I feel it can be skipped...

(2) 2022 really was a year of rapid development for GNNs...

(3) It would be hard to apply to brain networks, though... How could a brain network have multiple views? Surely not a structural connectivity matrix on top of the functional connectivity matrix... Although that is said to be possible, medicine seems to hold that things like ASD and AD have little to do with brain structure... they presumably do not change brain structure

(4) Multi-view is a very interesting idea, though I do not know whether it counts as novel in the biochemistry field; I have not come across it in brain networks yet. It just feels like this kind of approach needs a lot of data. Is there really that much raw information?

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

        ①Most GCN-based models rely on a fixed adjacency matrix, namely a single-view topology of the underlying graph (the "single-view topology of the underlying graph"... emm... I can't explain this very well)

        ②⭐However, this is limiting and error-prone when there are issues in the collected topology

        ③They propose a Multi-View Graph Convolutional Network with Attention Mechanism (MAGCN), which combines a topological multi-view graph with an attention-based feature aggregation approach

        ④MAGCN handles the node classification problem... (that's all; this one left me speechless for a moment. Oh well, I've read this far, might as well keep going)

error-prone  adj. liable to error; easily going wrong; tending toward mistakes

2.2. Introduction

        ①For node classification, the graph structure may differ between the training domain and the target domain

        ②They aim to build on multi-view graphs, i.e., multiple adjacency matrices, to construct an approximate graph structure

        ③Briefly introduce their model

2.3. Related work

        ①Spatial-based methods, such as diffusion convolutional neural networks (DCNN), GraphSAGE, MoNet, MPNN, and graph isomorphism networks (GIN)

        ②Spectral-based approaches, such as GCN and ChebNet

        ③Existing models that consider topology and attention mechanisms do not incorporate multi-view graphs

disentangle  vt. to free; to extricate; to sort out (confused arguments or ideas); to straighten out; to unravel

2.4. Preliminaries

        ①An undirected graph G=\left \{ V,E,A \right \}, where V denotes the node set with N nodes, E denotes the set of edges, and A represents the adjacency matrix

        ②Then a multi-view graph can be G=\{V,(E_1,A_1),(E_2,A_2),\ldots,(E_n,A_n)\}, where n denotes the number of views. Furthermore, the representation can be simplified as G=\{V,A_1,A_2,\ldots,A_n\}

        ③The graph Fourier transform (GFT) is \hat{x}=U^\mathrm{T}x, where x\in\mathbb{R}^{N}, U denotes the eigenvector matrix of L=I-D^{-\frac{1}{2}}AD^{-\frac{1}{2}}, the normalized graph Laplacian matrix, and D denotes the degree matrix

        ④The graph convolution operator \star_G in Fourier domain with:

x\star_{G}y=U((U^\mathrm{T}x)\odot(U^\mathrm{T}y))
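To make the operator concrete, here is a minimal numpy sketch of this spectral convolution on a toy 3-node path graph (the graph, the signals, and all names are illustrative, not from the paper):

```python
import numpy as np

# A toy 3-node path graph; all shapes and values here are illustrative.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian L = I - D^{-1/2} A D^{-1/2}

_, U = np.linalg.eigh(L)                      # U: eigenvector matrix, the GFT basis

x = np.array([1.0, 2.0, 3.0])                 # graph signal
y = np.array([0.5, 0.5, 0.0])                 # filter signal

# x *_G y = U((U^T x) ⊙ (U^T y)): elementwise product in the spectral domain
conv = U @ ((U.T @ x) * (U.T @ y))
print(conv)
```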

        ⑤The graph convolution with convolution operator \star:

g_\theta\star x=g_\theta(L)x=g_\theta(U\Lambda U^\mathrm{T})x=Ug_\theta(\Lambda)U^\mathrm{T}x

and it can be approximated by:

 g_\theta\star x=\sum_{k=0}^{K-1}\theta_kT_k(\tilde{L})x 

to reduce the computational complexity, where \tilde{L}=2L/\lambda _{max}-I represents the scaled Laplacian matrix, \lambda _{max} represents the largest eigenvalue, T_{k}\in\mathbb{R}^{N\times N} denotes the Chebyshev polynomial of order k, and \theta\in\mathbb{R}^{K} denotes the Chebyshev coefficients
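A small numpy sketch of this Chebyshev approximation, assuming the standard recurrence T_k(z)=2zT_{k-1}(z)-T_{k-2}(z); function and variable names are mine:

```python
import numpy as np

def cheb_filter(L, x, theta):
    """g_theta * x ≈ sum_{k=0}^{K-1} theta_k T_k(L_tilde) x via the Chebyshev
    recurrence; a sketch, not the paper's code."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])  # scaled Laplacian
    Tx_prev, Tx = x, L_tilde @ x                      # T_0 x = x, T_1 x = L~ x
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out = out + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_next = 2.0 * (L_tilde @ Tx) - Tx_prev      # T_k = 2 L~ T_{k-1} - T_{k-2}
        out = out + theta[k] * Tx_next
        Tx_prev, Tx = Tx, Tx_next
    return out
```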

        ⑥The filter F:

\tilde{X}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW=\hat{A}XW

where \hat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}};

\tilde{A}=A+I adds self-loops;

\tilde{D}_{ii}=\sum_{j}\tilde{A}_{ij};

W\in\mathbb{R}^{M\times F} is the trainable weight matrix of the filter F
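Putting the pieces together, a minimal sketch of one such renormalized filter step (toy shapes, assumed names):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One propagation step X~ = D~^{-1/2} A~ D~^{-1/2} X W; a sketch of this
    renormalized filter with assumed names and toy shapes."""
    A_tilde = A + np.eye(A.shape[0])                 # A~ = A + I: add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # A^ = D~^{-1/2} A~ D~^{-1/2}
    return A_hat @ X @ W

# Toy usage: N = 4 nodes, M = 3 input features, F = 2 output features
rng = np.random.default_rng(0)
A = np.triu(rng.integers(0, 2, (4, 4)), k=1); A = (A + A.T).astype(float)
X, W = rng.standard_normal((4, 3)), rng.standard_normal((3, 2))
print(gcn_layer(A, X, W).shape)                      # (4, 2)
```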

2.5. Multi-view graph convolutional network with attention mechanism

        ①Define the graph G=\{V,X,A\}, where V denotes N nodes, each with a feature vector x\in\mathbb{R}^{M}. Combining all the features gives the feature matrix X\in\mathbb{R}^{N\times M}

        ②The traditional GCN can be written as Y=f(\hat{A}XW), Y\in\mathbb{R}^{N\times F}, where f(\cdot) denotes the chosen activation function

        ③They extend the single-view graph G to G^{*}=\{V,X,A_{1},A_{2},\cdots,A_{n}\}, where each added view must satisfy the information-theoretic criterion I(S;P_n|P_{n-1},\ldots,P_1)\geq\varepsilon_{\mathrm{info}}

        ④The overall framework, which contains two multi-GCN blocks and one multiview attention block: 

Why does the multi-view graph in the figure have 5 nodes? There are n topologies, plus the feature matrix.

The author means the feature dimension starts at M and becomes F after the unfold, i.e., \mathcal{X}=\{\hat{X}_{1},\hat{X}_{2},\cdots,\hat{X}_{n}\}\in\mathbb{R}^{n\times5\times F}

After GAP it becomes \bar{X}\in\mathbb{R}^{5\times F}

After the merge with softmax it becomes X^{*}\in\mathbb{R}^{5\times C}

conundrum  n. a difficult problem; a riddle; a complex, hard-to-solve puzzle

(1)Multi-GCN (unfold) block

        ①Input: G^*=\{V,X,A_1,A_2,\cdots,A_n\}

        ②Output: \mathcal{X}=\{\hat{X}_{1},\hat{X}_{2},\cdots,\hat{X}_{n}\}\in\mathbb{R}^{n\times N\times F}, obtained by \hat{X}_{i}=f_\mathrm{GCN}(X,A_{i})=\mathrm{ReLU}(\hat{A}_{i}XW_{i}), \hat{X}_{i}\in\mathbb{R}^{N\times F}, where n denotes the number of views
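A minimal numpy sketch of this unfold block, stacking one GCN per view into the n\times N\times F tensor \mathcal{X} (all names assumed):

```python
import numpy as np

def gcn_view(A, X, W):
    # ReLU(A^_i X W_i) with self-loops and symmetric normalization, as in f_GCN
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ W, 0.0)

def multi_gcn_unfold(X, A_list, W_list):
    """Run one GCN per view and stack the outputs into an (n, N, F) tensor."""
    return np.stack([gcn_view(A_i, X, W_i) for A_i, W_i in zip(A_list, W_list)])
```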

(2)Attention block

        ①The attention block consists of an identity stage and an attention distribution learning stage. The identity stage maps \mathcal{X} through F_{scale}(\cdot,\cdot) via \bar{X}=F_{scale}(\mathcal{X},C)=\sum_{i=1}^{n}c_{i}\hat{X}_{i}. The attention distribution learning stage includes global average pooling (GAP) and an MLP

        ②The schematic of GAP:

        ③The traditional GAP is: \mathrm{f}_{i}=f_{\mathrm{GAP}}(\mathrm{F}_{i})=\frac{1}{h\times w}\sum_{j=1}^{h}\sum_{k=1}^{w}\mathrm{F}_{i,jk}, where i indexes the feature map and \mathbf{F}_{i}\in\mathbb{R}^{h\times w}
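For intuition, traditional GAP on a toy feature map is just the mean over all spatial positions:

```python
import numpy as np

# Traditional GAP: collapse an h x w feature map to one scalar.
F_i = np.arange(12, dtype=float).reshape(3, 4)   # toy 3 x 4 feature map
f_i = F_i.mean()                                 # (1/(h*w)) * sum_{j,k} F_{i,jk}
print(f_i)                                       # 5.5
```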

        ④In order to change the weight of each F, the authors propose a graph GAP: 

\hat{x}_i=\frac{1}{N}\sum_{j=1}^{N}\frac{1}{|\mathcal{N}_{i,j}|}\sum_{k=1}^{|\mathcal{N}_{i,j}|}(I+A_i)_{jk}\hat{X}_{i,j,k}

where \mathcal{N}_{i,j} represents the neighbors of the j-th node on the i-th view;

\sum_{k=1}^{|\mathcal{N}_{i,j}|}\left(I+A_{i}\right)_{jk}\hat{X}_{i,j,k} denotes the graph aggregation and reflects the improvement of the model

        ⑤Then, learn the weights C=\{c_{1},c_{2},\cdots,c_{n}\}\in\mathbb{R}^{n} through MLP

        ⑥With C, map \mathcal{X} to \bar{X}=F_{scale}(\mathcal{X},C)=\sum_{i=1}^{n}c_{i}\hat{X}_{i}
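Putting ④–⑥ together, a sketch of the whole attention block under one plausible reading of the graph GAP formula; pooling each view down to a single scalar before the MLP, and the `mlp` callable itself, are my assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def graph_gap(X_hat, A_list):
    """Graph GAP, under my reading: per view, aggregate each node over (I + A_i),
    normalize by neighborhood size, then average over nodes and features so each
    view yields one scalar descriptor."""
    scores = []
    for Xh_i, A_i in zip(X_hat, A_list):
        agg = (np.eye(A_i.shape[0]) + A_i) @ Xh_i    # graph aggregation per node
        size = 1.0 + A_i.sum(axis=1)                 # |N_{i,j}| counting the self-loop
        scores.append((agg / size[:, None]).mean())
    return np.array(scores)                          # shape (n,)

def attention_block(X_hat, A_list, mlp):
    s = graph_gap(X_hat, A_list)                     # one descriptor per view
    c = softmax(mlp(s))                              # attention weights C = {c_1, ..., c_n}
    X_bar = np.tensordot(c, X_hat, axes=1)           # X_bar = sum_i c_i X^_i, shape (N, F)
    return X_bar, c
```

A placeholder such as `mlp = lambda s: s` is enough to sanity-check the shapes; per the experiments below, the paper's MLP has layer sizes 6, 3, and n.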

(3)Multi-GCN (merge) block

        ①Classify \bar{X} with X^*=\sum\limits_{i=1}^nf_\text{GCN}(\bar{X},A_i)=\text{softmax}(\sum\limits_{i=1}^n\hat{A}_i\bar{X}W_i), X^*\in\mathbb{R}^{N\times C}, where C is the number of classes

        ②According to the semi-supervised method, they apply cross-entropy error as the loss:

L=-\sum_{k\in V_L}\sum_{j=1}^CY_{kj}\ln X_{kj}^*

where V_L is the set of labeled nodes, Y\in\mathbb{R}^{|V_{L}|\times C} is the label indicator matrix
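A sketch of the merge step plus this masked cross-entropy (pre-normalized \hat{A}_i matrices are assumed as inputs; names are illustrative):

```python
import numpy as np

def row_softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def merge_and_loss(X_bar, A_hat_list, W_list, Y, labeled_idx):
    """X* = softmax(sum_i A^_i X_bar W_i), then cross-entropy over labeled nodes only.
    A_hat_list holds pre-normalized adjacencies; all names are illustrative."""
    logits = sum(A_hat @ X_bar @ W for A_hat, W in zip(A_hat_list, W_list))
    X_star = row_softmax(logits)                     # (N, C) class probabilities
    # L = - sum_{k in V_L} sum_j Y_kj ln X*_kj
    loss = -(Y * np.log(X_star[labeled_idx] + 1e-12)).sum()
    return X_star, loss
```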

2.6. Theoretical analysis

        The authors rigorously prove mathematically why their proposal is good; this exceeds my mathematical ability, so I will skip it for now

2.7. Experiments

        ①They apply attack simulations with different levels of topology perturbations to prove the robustness of MAGCN

        ②The datasets:

        ③The output dimension F of multi-GCN (unfold): 16

        ④Layers of MLP in attention: 3

        ⑤Numbers of neurons in the first, second, and last layers: 6, 3, and a number equal to the number of views, respectively

        ⑥Optimizer: Adam

        ⑦Learning rate: 0.01

        ⑧Weight decay: 0.0005

        ⑨Weight initialization: Glorot uniform initializer

        ⑩Dropout rate: 0.5

        ⑪⭐Number of views: 3 for all 3 datasets, namely topology, feature similarity between nodes, and text similarity (an edge is added when the value exceeds a certain threshold)

        ⑫Comparisons with 10 runs:

        ⑬Choices in ablation study:

GCN+View 1: GCN with view 1 (the given adjacency matrix)
GCN+View 2: GCN with view 2 (the similarity-based graph)
GCN+View 3: GCN with view 3 (the b-matching graph)
MLP+GCN+View 1,2,3: GCN with three views via a standard MLP
MAGCN+View 1,2,3: Our MAGCN with three views

and the comparison:

        ⑭Visualize the result by t-SNE (the left is GCN and the right one is MAGCN):

        ⑮Robustness analysis with random topology attack (RTA): randomly delete some edges with rate from 0.1 to 1:

        ⑯Robustness analysis with low label rates (LLR): label rate sets are {0.025, 0.02, 0.015, 0.01, 0.005}:

        ⑰The other MAGCN variant uses a weighted cosine similarity: s_{ij}=\cos(\boldsymbol{w}\odot\boldsymbol{v}_i,\boldsymbol{w}\odot\boldsymbol{v}_j) (a sketch of building such a view follows the comparison below). There are ablation choices as well:

GCN+View 1: GCN with view 1, i.e., the given adjacency matrix
GCN+View 2: GCN with view 2, i.e., the similarity-based graph

GCN+View 2^⁎: GCN with view 2^⁎, i.e., the weighted trainable similarity-based graph based on cosine similarity
MAGCN+View 1,2: MAGCN with the view 1 and view 2

MAGCN+View 1,2^⁎: MAGCN with the view 1 and view 2^⁎

and the comparison:
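To make view 2^⁎ concrete, a hedged sketch of building such a weighted-cosine view from the s_{ij} formula in ⑰ (the threshold and its value are placeholders of mine; in the paper \boldsymbol{w} is trainable):

```python
import numpy as np

def weighted_cosine_graph(V, w, threshold=0.5):
    """Build a view from s_ij = cos(w ⊙ v_i, w ⊙ v_j), adding an edge when the
    similarity exceeds a threshold. The threshold value is a placeholder, and in
    the paper w would be learned rather than fixed."""
    Vw = V * w                                        # elementwise feature reweighting
    norms = np.linalg.norm(Vw, axis=1, keepdims=True)
    S = (Vw @ Vw.T) / (norms @ norms.T + 1e-12)       # pairwise cosine similarities
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)                          # self-loops are added later via A~ = A + I
    return A
```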

2.8. Conclusion

        The MAGCN is able to capture node features from different hops of neighbors

3. Supplementary knowledge

3.1. Discrete convolution

Reference: 连续卷积和离散卷积定义及积分计算-CSDN博客

3.2. Information theory

Reference: 信息论入门教程 - 阮一峰的网络日志 (ruanyifeng.com)

4. Reference List

Yao K. et al. (2022) 'Multi-view graph convolutional networks with attention mechanism', Artificial Intelligence, 307.
