[Paper Close Reading] Multi-view graph convolutional networks with attention mechanism

Paper link: Multi-view graph convolutional networks with attention mechanism - ScienceDirect

Paper code: Multi-View GCNs with Attention Mechanism (MAGCN)

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with care!

Table of Contents

1. TL;DR

1.1. Takeaways

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Preliminaries

2.5. Multi-view graph convolutional network with attention mechanism

2.6. Theoretical analysis

2.7. Experiments

2.8. Conclusion

3. Supplementary knowledge

3.1. Discrete convolution

3.2. Information theory

4. Reference List


1. TL;DR

1.1. Takeaways

(1) This is a node classification task, unlike the graph classification typically used for brain applications; to save time I feel it can be skipped...

(2) 2022 really was a year of rapid development for GNNs...

(3) It would be hard to apply to brain networks, though... How could a brain network have multiple views? Surely not a structural connectivity matrix on top of the functional connectivity matrix... Although that is said to be possible, medicine seems to hold that things like ASD and AD have little to do with brain structure... they presumably do not change brain structure

(4) Multi-view is a very interesting idea, though I do not know whether it counts as novel in the biochemistry field; I have not come across it in brain networks yet. It just feels like this kind of approach needs a lot of data. Is there really that much raw information?

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

        ①Most GCN-based models rely on a fixed adjacency matrix, namely a single-view topology of the underlying graph (the "single-view topology of the underlying graph"... emm... I can't explain this very well)

        ②⭐However, this is limiting and error-prone when there are issues in the collected topology

        ③They propose a Multi-View Graph Convolutional Network with Attention Mechanism (MAGCN), which combines a topological multi-view graph with an attention-based feature aggregation approach

        ④MAGCN handles the node classification problem... (that's all; this one left me speechless for a moment. Oh well, I've read this far, might as well keep going)

error-prone  adj. liable to error; easily going wrong; tending toward mistakes

2.2. Introduction

        ①For node classification, the graph structure may differ between the training domain and the target domain

        ②They aim to build on multi-view graphs, i.e., multiple adjacency matrices, to construct an approximate graph structure

        ③Briefly introduce their model

2.3. Related work

        ①Spatial-based methods, such as diffusion convolutional neural networks (DCNN), GraphSAGE, MoNet, MPNN, and graph isomorphism networks (GIN)

        ②Spectral-based approaches, such as GCN and ChebNet

        ③Existing models that consider topology and attention mechanisms do not incorporate multi-view graphs

disentangle  vt. to free; to extricate; to sort out (confused arguments or ideas); to straighten out; to unravel

2.4. Preliminaries

        ①An undirected graph G=\left \{ V,E,A \right \}, where V denotes the node set with N nodes, E denotes the set of edges, and A represents the adjacency matrix

        ②Then a multi-view graph can be G=\{V,(E_1,A_1),(E_2,A_2),\ldots,(E_n,A_n)\}, where n denotes the number of views. Furthermore, the representation can be simplified as G=\{V,A_1,A_2,\ldots,A_n\}

        ③The graph Fourier transform (GFT) is \hat{x}=U^\mathrm{T}x, where x\in\mathbb{R}^{N}, U denotes the eigenvector matrix of L=I-D^{-\frac{1}{2}}AD^{-\frac{1}{2}}, the normalized graph Laplacian matrix, and D denotes the degree matrix

        ④The graph convolution operator \star_G in Fourier domain with:

x\star_{G}y=U((U^\mathrm{T}x)\odot(U^\mathrm{T}y))
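To make the operator concrete, here is a minimal numpy sketch of this spectral convolution on a toy 3-node path graph (the graph, the signals, and all names are illustrative, not from the paper):

```python
import numpy as np

# A toy 3-node path graph; all shapes and values here are illustrative.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian L = I - D^{-1/2} A D^{-1/2}

_, U = np.linalg.eigh(L)                      # U: eigenvector matrix, the GFT basis

x = np.array([1.0, 2.0, 3.0])                 # graph signal
y = np.array([0.5, 0.5, 0.0])                 # filter signal

# x *_G y = U((U^T x) ⊙ (U^T y)): elementwise product in the spectral domain
conv = U @ ((U.T @ x) * (U.T @ y))
print(conv)
```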

        ⑤The graph convolution with convolution operator \star:

g_\theta\star x=g_\theta(L)x=g_\theta(U\Lambda U^\mathrm{T})x=Ug_\theta(\Lambda)U^\mathrm{T}x

and it can be approximated by:

 g_\theta\star x=\sum_{k=0}^{K-1}\theta_kT_k(\tilde{L})x 

to reduce the computational complexity, where \tilde{L}=2L/\lambda _{max}-I represents the scaled Laplacian matrix, \lambda _{max} represents the largest eigenvalue, T_{k}\in\mathbb{R}^{N\times N} denotes the Chebyshev polynomial of order k, and \theta\in\mathbb{R}^{K} denotes the Chebyshev coefficients
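A small numpy sketch of this Chebyshev approximation, assuming the standard recurrence T_k(z)=2zT_{k-1}(z)-T_{k-2}(z); function and variable names are mine:

```python
import numpy as np

def cheb_filter(L, x, theta):
    """g_theta * x ≈ sum_{k=0}^{K-1} theta_k T_k(L_tilde) x via the Chebyshev
    recurrence; a sketch, not the paper's code."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])  # scaled Laplacian
    Tx_prev, Tx = x, L_tilde @ x                      # T_0 x = x, T_1 x = L~ x
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out = out + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_next = 2.0 * (L_tilde @ Tx) - Tx_prev      # T_k = 2 L~ T_{k-1} - T_{k-2}
        out = out + theta[k] * Tx_next
        Tx_prev, Tx = Tx, Tx_next
    return out
```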

        ⑥The filter F:

\tilde{X}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW=\hat{A}XW

where \hat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}};

\tilde{A}=A+I adds self-loops;

\tilde{D}_{ii}=\sum_{j}\tilde{A}_{ij};

W\in\mathbb{R}^{M\times F} is the trainable weight matrix of the filter F
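Putting the pieces together, a minimal sketch of one such renormalized filter step (toy shapes, assumed names):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One propagation step X~ = D~^{-1/2} A~ D~^{-1/2} X W; a sketch of this
    renormalized filter with assumed names and toy shapes."""
    A_tilde = A + np.eye(A.shape[0])                 # A~ = A + I: add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # A^ = D~^{-1/2} A~ D~^{-1/2}
    return A_hat @ X @ W

# Toy usage: N = 4 nodes, M = 3 input features, F = 2 output features
rng = np.random.default_rng(0)
A = np.triu(rng.integers(0, 2, (4, 4)), k=1); A = (A + A.T).astype(float)
X, W = rng.standard_normal((4, 3)), rng.standard_normal((3, 2))
print(gcn_layer(A, X, W).shape)                      # (4, 2)
```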

2.5. Multi-view graph convolutional network with attention mechanism

        ①Define the graph G=\{V,X,A\}, where V denotes N nodes, each with a feature vector x\in\mathbb{R}^{M}. Combining all the features gives the feature matrix X\in\mathbb{R}^{N\times M}

        ②The traditional GCN can be written as Y=f(\hat{A}XW), Y\in\mathbb{R}^{N\times F}, where f(\cdot) denotes the chosen activation function

        ③They extend the single-view graph G to G^{*}=\{V,X,A_{1},A_{2},\cdots,A_{n}\}, where each added view must satisfy the information-theoretic criterion I(S;P_n|P_{n-1},\ldots,P_1)\geq\varepsilon_{\mathrm{info}}

        ④The overall framework, which contains two multi-GCN blocks and one multiview attention block: 

Why does the multi-view graph in the figure have 5 nodes? There are n topologies, plus the feature matrix.

The author means the feature dimension starts at M and becomes F after the unfold, i.e., \mathcal{X}=\{\hat{X}_{1},\hat{X}_{2},\cdots,\hat{X}_{n}\}\in\mathbb{R}^{n\times5\times F}

After GAP it becomes \bar{X}\in\mathbb{R}^{5\times F}

After the merge with softmax it becomes X^{*}\in\mathbb{R}^{5\times C}

conundrum  n. a difficult problem; a riddle; a complex, hard-to-solve puzzle

(1)Multi-GCN (unfold) block

        ①Input: G^*=\{V,X,A_1,A_2,\cdots,A_n\}

        ②Output: \mathcal{X}=\{\hat{X}_{1},\hat{X}_{2},\cdots,\hat{X}_{n}\}\in\mathbb{R}^{n\times N\times F}, obtained by \hat{X}_{i}=f_\mathrm{GCN}(X,A_{i})=\mathrm{ReLU}(\hat{A}_{i}XW_{i}), \hat{X}_{i}\in\mathbb{R}^{N\times F}, where n denotes the number of views
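A minimal numpy sketch of this unfold block, stacking one GCN per view into the n\times N\times F tensor \mathcal{X} (all names assumed):

```python
import numpy as np

def gcn_view(A, X, W):
    # ReLU(A^_i X W_i) with self-loops and symmetric normalization, as in f_GCN
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ W, 0.0)

def multi_gcn_unfold(X, A_list, W_list):
    """Run one GCN per view and stack the outputs into an (n, N, F) tensor."""
    return np.stack([gcn_view(A_i, X, W_i) for A_i, W_i in zip(A_list, W_list)])
```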

(2)Attention block

        ①The attention block consists of an identity stage and an attention distribution learning stage. The identity stage maps \mathcal{X} through F_{scale}(\cdot,\cdot) via \bar{X}=F_{scale}(\mathcal{X},C)=\sum_{i=1}^{n}c_{i}\hat{X}_{i}. The attention distribution learning stage includes global average pooling (GAP) and an MLP

        ②The schematic of GAP:

        ③The traditional GAP is: \mathrm{f}_{i}=f_{\mathrm{GAP}}(\mathrm{F}_{i})=\frac{1}{h\times w}\sum_{j=1}^{h}\sum_{k=1}^{w}\mathrm{F}_{i,jk}, where i indexes the feature map and \mathbf{F}_{i}\in\mathbb{R}^{h\times w}
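For intuition, traditional GAP on a toy feature map is just the mean over all spatial positions:

```python
import numpy as np

# Traditional GAP: collapse an h x w feature map to one scalar.
F_i = np.arange(12, dtype=float).reshape(3, 4)   # toy 3 x 4 feature map
f_i = F_i.mean()                                 # (1/(h*w)) * sum_{j,k} F_{i,jk}
print(f_i)                                       # 5.5
```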

        ④In order to change the weight of each F, the authors propose a graph GAP: 

\hat{x}_i=\frac{1}{N}\sum_{j=1}^{N}\frac{1}{|\mathcal{N}_{i,j}|}\sum_{k=1}^{|\mathcal{N}_{i,j}|}(I+A_i)_{jk}\hat{X}_{i,j,k}

where \mathcal{N}_{i,j} represents the neighbors of the j-th node on the i-th view;

\sum_{k=1}^{|\mathcal{N}_{i,j}|}\left(I+A_{i}\right)_{jk}\hat{X}_{i,j,k} denotes the graph aggregation and reflects the improvement of the model

        ⑤Then, learn the weights C=\{c_{1},c_{2},\cdots,c_{n}\}\in\mathbb{R}^{n} through MLP

        ⑥With C, map \mathcal{X} to \bar{X}=F_{scale}(\mathcal{X},C)=\sum_{i=1}^{n}c_{i}\hat{X}_{i}
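Putting ④–⑥ together, a sketch of the whole attention block under one plausible reading of the graph GAP formula; pooling each view down to a single scalar before the MLP, and the `mlp` callable itself, are my assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def graph_gap(X_hat, A_list):
    """Graph GAP, under my reading: per view, aggregate each node over (I + A_i),
    normalize by neighborhood size, then average over nodes and features so each
    view yields one scalar descriptor."""
    scores = []
    for Xh_i, A_i in zip(X_hat, A_list):
        agg = (np.eye(A_i.shape[0]) + A_i) @ Xh_i    # graph aggregation per node
        size = 1.0 + A_i.sum(axis=1)                 # |N_{i,j}| counting the self-loop
        scores.append((agg / size[:, None]).mean())
    return np.array(scores)                          # shape (n,)

def attention_block(X_hat, A_list, mlp):
    s = graph_gap(X_hat, A_list)                     # one descriptor per view
    c = softmax(mlp(s))                              # attention weights C = {c_1, ..., c_n}
    X_bar = np.tensordot(c, X_hat, axes=1)           # X_bar = sum_i c_i X^_i, shape (N, F)
    return X_bar, c
```

A placeholder such as `mlp = lambda s: s` is enough to sanity-check the shapes; per the experiments below, the paper's MLP has layer sizes 6, 3, and n.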

(3)Multi-GCN (merge) block

        ①Classify \bar{X} with X^*=\sum\limits_{i=1}^nf_\text{GCN}(\bar{X},A_i)=\text{softmax}(\sum\limits_{i=1}^n\hat{A}_i\bar{X}W_i), X^*\in\mathbb{R}^{N\times C}, where C is the number of classes

        ②According to the semi-supervised method, they apply cross-entropy error as the loss:

L=-\sum_{k\in V_L}\sum_{j=1}^CY_{kj}\ln X_{kj}^*

where V_L is the set of labeled nodes, Y\in\mathbb{R}^{|V_{L}|\times C} is the label indicator matrix
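A sketch of the merge step plus this masked cross-entropy (pre-normalized \hat{A}_i matrices are assumed as inputs; names are illustrative):

```python
import numpy as np

def row_softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def merge_and_loss(X_bar, A_hat_list, W_list, Y, labeled_idx):
    """X* = softmax(sum_i A^_i X_bar W_i), then cross-entropy over labeled nodes only.
    A_hat_list holds pre-normalized adjacencies; all names are illustrative."""
    logits = sum(A_hat @ X_bar @ W for A_hat, W in zip(A_hat_list, W_list))
    X_star = row_softmax(logits)                     # (N, C) class probabilities
    # L = - sum_{k in V_L} sum_j Y_kj ln X*_kj
    loss = -(Y * np.log(X_star[labeled_idx] + 1e-12)).sum()
    return X_star, loss
```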

2.6. Theoretical analysis

        The authors rigorously prove mathematically why their proposal is good; this exceeds my mathematical ability, so I will skip it for now

2.7. Experiments

        ①They apply attack simulations with different levels of topology perturbations to prove the robustness of MAGCN

        ②The datasets:

        ③The output dimension F of multi-GCN (unfold): 16

        ④Layers of MLP in attention: 3

        ⑤Numbers of neurons in the first, second, and last layers: 6, 3, and a number equal to the number of views, respectively

        ⑥Optimizer: Adam

        ⑦Learning rate: 0.01

        ⑧Weight decay: 0.0005

        ⑨Weight initialization: Glorot uniform initializer

        ⑩Dropout rate: 0.5

        ⑪⭐Number of views: 3 for all 3 datasets, namely topology, feature similarity between nodes, and text similarity (an edge is added when the value exceeds a certain threshold)

        ⑫Comparisons with 10 runs:

        ⑬Choices in ablation study:

GCN+View 1: GCN with view 1 (the given adjacency matrix)
GCN+View 2: GCN with view 2 (the similarity-based graph)
GCN+View 3: GCN with view 3 (the b-matching graph)
MLP+GCN+View 1,2,3: GCN with three views via a standard MLP
MAGCN+View 1,2,3: Our MAGCN with three views

and the comparison:

        ⑭Visualize the result by t-SNE (the left is GCN and the right one is MAGCN):

        ⑮Robustness analysis with random topology attack (RTA): randomly delete some edges with rate from 0.1 to 1:

        ⑯Robustness analysis with low label rates (LLR): label rate sets are {0.025, 0.02, 0.015, 0.01, 0.005}:

        ⑰The other MAGCN variant uses a weighted cosine similarity: s_{ij}=\cos(\boldsymbol{w}\odot\boldsymbol{v}_i,\boldsymbol{w}\odot\boldsymbol{v}_j) (a sketch of building such a view follows the comparison below). There are ablation choices as well:

GCN+View 1: GCN with view 1, i.e., the given adjacency matrix
GCN+View 2: GCN with view 2, i.e., the similarity-based graph

GCN+View 2^⁎: GCN with view 2^⁎, i.e., the weighted trainable similarity-based graph based on cosine similarity
MAGCN+View 1,2: MAGCN with the view 1 and view 2

MAGCN+View 1,2^⁎: MAGCN with the view 1 and view 2^⁎

and the comparison:
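To make view 2^⁎ concrete, a hedged sketch of building such a weighted-cosine view from the s_{ij} formula in ⑰ (the threshold and its value are placeholders of mine; in the paper \boldsymbol{w} is trainable):

```python
import numpy as np

def weighted_cosine_graph(V, w, threshold=0.5):
    """Build a view from s_ij = cos(w ⊙ v_i, w ⊙ v_j), adding an edge when the
    similarity exceeds a threshold. The threshold value is a placeholder, and in
    the paper w would be learned rather than fixed."""
    Vw = V * w                                        # elementwise feature reweighting
    norms = np.linalg.norm(Vw, axis=1, keepdims=True)
    S = (Vw @ Vw.T) / (norms @ norms.T + 1e-12)       # pairwise cosine similarities
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)                          # self-loops are added later via A~ = A + I
    return A
```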

2.8. Conclusion

        The MAGCN is able to capture node features from different hops of neighbors

3. Supplementary knowledge

3.1. Discrete convolution

Reference: 连续卷积和离散卷积定义及积分计算-CSDN博客

3.2. Information theory

Reference: 信息论入门教程 - 阮一峰的网络日志 (ruanyifeng.com)

4. Reference List

Yao K. et al. (2022) 'Multi-view graph convolutional networks with attention mechanism', Artificial Intelligence, 307.
