[Paper Close Reading] Multi-View Attribute Graph Convolution Networks for Clustering

Paper link: Multi-View Attribute Graph Convolution Networks for Clustering | IJCAI

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This post leans toward personal notes, so read with caution.

目录

1. TL;DR

1.1. Takeaways

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction 

2.3. Related Work

2.4. Proposed Methodology

2.4.1. Notation

2.4.2. The Framework of MAGCN

2.4.3. Task for Clustering

2.5. Experiments

2.5.1. Experimental Setting

2.5.2. Experimental Results

2.6. Conclusions

3. Knowledge Supplement

3.1. Discrete Data

3.2. Multi-modality and multi-view

4. Reference List


1. TL;DR

1.1. Takeaways

(1)Oops, wrong channel: this is machine learning. Well then, let's at least look at the novelty.

(2)It seems most multi-modal work refers to the three modalities sMRI, fMRI, and DTI. But EEG can apparently also be turned into a brain graph, so why does no one seem to have used EEG as a fourth modality so far? Also, I notice these modalities do not share a single brain atlas (although AAL can apparently be used for both fMRI and sMRI), so how are the node counts supposed to be aligned?

(3)Actually, by using different brain atlases, this paper is in a way addressing question (2); let me read on and see.

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

        ①Existing GNNs ignore the node features (actually, some of them do use node features) and graph reconstruction (judging from what comes later, the authors seem to regard one pass of encoding plus one pass of decoding as graph reconstruction)

        ②They propose a novel Multi-View Attribute Graph Convolution Network (MAGCN) with two-pathway encoders for clustering. The first pathway consists of multi-view attribute graph attention networks, which reduce noise and redundancy and learn the embedding features of multi-view graph data. The second pathway consists of consistent embedding encoders, which capture the geometric relationships and the consistency of probability distributions among different views

2.2. Introduction 

        ①The capability of graph embedding is to transform graph data into a low-dimensional (fewer features), compact (less scattered), and continuous (not discrete) feature space

        ②GNN is suitable for handling single-view data rather than multi-view

        ③The limits of existing multi-view models: a) they cannot assign different weights to different neighbors, b) they might ignore node features or graph reconstruction, c) they do not consider the similarities between different views (oh, so that can be taken into account too? I want to see how they do it)

        ④Existing GNNs mostly focus on multiple graphs (multiple graphs of what, though?) instead of multiple attributes (why bring up a social network example? They say people can have several attributes such as job and hobbies; then what would the attributes of a brain graph be?)

 paragon  n. a model of excellence; a perfect example; (also) a flawless diamond of over 100 carats

2.3. Related Work

        The authors enumerate some neighbor-aggregation, attention-based, and multi-view models

2.4. Proposed Methodology

2.4.1. Notation

        ①Define a graph \mathbf{G}=(\mathbf{V},\mathbf{E}) (\mathbf{G}\in\mathbb{R}^{n\times n}), where \mathbf{V}=\{v_{1},v_{2},...,v_{n}\} denotes the node set, \mathbf{E} denotes the edge set, and n denotes the number of nodes

        ②The attribute features of the nodes: \mathbf{X}_{m}=\{x_{m}^{1},...,x_{m}^{i},...,x_{m}^{n}\} (\mathbf{X}_{m}\in \mathbb{R}^{n\times d_{m}}), m=1,2,...,M, where M denotes the number of views and d_m the attribute dimension of view m

2.4.2. The Framework of MAGCN

        ①The overall framework:

They first encode \mathbf{X}_{m} into the graph embeddings \mathbf{H}_{m}=\{h_{m}^{1},...,h_{m}^{i},...,h_{m}^{n}\} (\mathbf{H}_{m}\in \mathbb{R}^{n\times d}) with the multi-view attribute graph convolution encoders (green), and then transform \mathbf{H}_{m} into the consistent clustering embedding \mathbf{Z} with the consistent embedding encoders (purple)

(1)Multi-view Attribute Graph Convolution Encoder

        ①The graph embedding function can be simply expressed as f_m(\mathbf{G},\mathbf{X}_m;\theta)\to\mathbf{H}_m, where \theta denotes the auto-encoder parameters

        ②Part of the Multi-view Attribute Graph Convolution Encoder (MAGCE) for view m:

        ③The l-th output of MAGCE:

\mathbf{H}_{m}^{(l)}=\sigma\left(\mathbf{D}^{-\frac12}\mathbf{G}^{\prime}\mathbf{D}^{-\frac12}\mathbf{H}_{m}^{(l-1)}\mathbf{W}^{(l)}\right)

where \mathbf{G^{\prime}}=\mathbf{G}+\mathbf{I}_{N} denotes the "relevance coefficient matrix with added self-connections" (is this the functional connectivity matrix plus I, the adjacency matrix plus I, or something else?);

\mathbf{D}_{ii}=\sum_{j}\mathbf{G}^{\prime}_{ij};

\sigma denotes the activation function

        ④The layer index l starts from 0 and ends at L
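For concreteness, here is a minimal PyTorch sketch of this propagation rule (my own toy illustration with dense random tensors, not the authors' code):

```python
import torch

def gcn_layer(G, H, W, sigma=torch.relu):
    """One MAGCE-style update: sigma(D^{-1/2} G' D^{-1/2} H W)."""
    G_prime = G + torch.eye(G.size(0))      # G' = G + I_N (self-connections)
    d = G_prime.sum(dim=1)                  # degrees; >= 1 thanks to I_N
    D_inv_sqrt = torch.diag(d.pow(-0.5))    # D^{-1/2}
    return sigma(D_inv_sqrt @ G_prime @ D_inv_sqrt @ H @ W)

# Toy usage: 5 nodes, 8-dim attributes, 4-dim output
G = (torch.rand(5, 5) > 0.5).float()
H, W = torch.randn(5, 8), torch.randn(8, 4)
print(gcn_layer(G, H, W).shape)             # torch.Size([5, 4])
```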

        ⑤The learnable relevance matrix \mathbf{S} in the l-th layer:

\mathbf{S}=\varphi\left(\mathbf{G}\odot t_s^{(l)}\mathbf{H}_m^{(l)}\mathbf{W}^{(l)}+\mathbf{G}\odot t_r^{(l)}\mathbf{H}_m^{(l)}\mathbf{W}^{(l)}\right)

where t_s^{(l)} and t_r^{(l)}\in\mathbb{R}^{1\times d_l} denote the trainable parameters, and \varphi denotes the activation function

        ⑥Normalizing \mathbf{S} to get the final relevance coefficient \mathbf{G}:

\mathbf{G}_{ij}=\frac{\exp\left(\mathbf{S}_{ij}\right)}{\sum_{k\in\mathbf{N}_{i}}\exp\left(\mathbf{S}_{ik}\right)}

where \mathbf{N}_{i} denotes the neighbors of node i
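Since the notes do not spell out how t_s and t_r broadcast, here is one plausible PyTorch reading using GATE-style per-node source/target scores, with the softmax restricted to each node's neighbors; treat the broadcasting as my assumption:

```python
import torch

def relevance(G, H, W, t_s, t_r, phi=torch.sigmoid):
    HW = H @ W                                   # (n, d_l)
    s = (HW * t_s).sum(dim=1, keepdim=True)      # assumed per-node source score
    r = (HW * t_r).sum(dim=1, keepdim=True)      # assumed per-node target score
    S = phi(G * s + G * r.T)                     # scores only along edges of G
    S = S.masked_fill(G == 0, float('-inf'))     # restrict softmax to neighbors N_i
    return torch.softmax(S, dim=1)               # rows sum to 1 -> refined G

n = 5
G = (torch.rand(n, n) > 0.4).float()
G.fill_diagonal_(1.0)                            # keep self-connections
H, W = torch.randn(n, 8), torch.randn(8, 4)
t_s, t_r = torch.randn(4), torch.randn(4)
print(relevance(G, H, W, t_s, t_r).sum(dim=1))   # each row sums to ~1
```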

        ⑦The output of the \left ( l-1 \right )-th layer of the multi-view attribute graph convolution decoders:

\hat{\mathbf{H}}_{m}^{(l-1)}=\sigma\left(\mathbf{\hat{D}}^{-\frac{1}{2}}\mathbf{\hat{G}}^{\prime}\mathbf{\hat{D}}^{-\frac{1}{2}}\mathbf{\hat{H}}_{m}^{(l)}\mathbf{\hat{W}}^{(l)}\right)

        ⑧The reconstructed graph structure is \hat{\mathbf{G}}_{m}^{ij}=\phi\left(-{h_{m}^{i}}^{\top}h_{m}^{j}\right), where \phi\left ( \cdot \right ) denotes the inner product operator

        ⑨The reconstruction loss:

\mathcal{L}_{re}=\min_{\theta}\sum_{i=1}^{M}\left\|\mathbf{X}_{i}-\hat{\mathbf{X}}_{i}\right\|_{F}^{2}+\lambda_{1}\sum_{i=1}^{M}\left\|\mathbf{G}-\hat{\mathbf{G}}_{i}\right\|_{F}^{2}
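A minimal sketch of this reconstruction loss; reading the graph decoder in ⑧ as \hat{\mathbf{G}}_m=\mathrm{sigmoid}(\mathbf{H}_m\mathbf{H}_m^{\top}) is my assumption, since the sign convention there is ambiguous:

```python
import torch

def recon_loss(X_list, X_hat_list, H_list, G, lam1=1.0):
    """L_re = sum_m ||X_m - X_hat_m||_F^2 + lam1 * ||G - G_hat_m||_F^2."""
    loss = torch.tensor(0.0)
    for Xm, Xm_hat, Hm in zip(X_list, X_hat_list, H_list):
        Gm_hat = torch.sigmoid(Hm @ Hm.T)        # assumed inner-product decoder
        loss = loss + torch.norm(Xm - Xm_hat, p='fro') ** 2 \
                    + lam1 * torch.norm(G - Gm_hat, p='fro') ** 2
    return loss

# Toy usage with M = 2 views of 5 nodes
n, M = 5, 2
G = (torch.rand(n, n) > 0.5).float()
X = [torch.randn(n, 8) for _ in range(M)]
X_hat = [torch.randn(n, 8) for _ in range(M)]
H = [torch.randn(n, 4) for _ in range(M)]
print(recon_loss(X, X_hat, H, G))
```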

(2)Consistent Embedding Encoders

        ①Reduce the dimensionality of the graph embeddings via the mapping function:

g_{m}(\mathbf{H}_{m};\eta)\to\mathbf{Z}_{m}

where \eta denotes the encoder parameters

        ②The similarity between two views can be measured with the Manhattan distance, Euclidean distance, cosine similarity, etc.

        ③The loss function of geometric relationship consistency:

\mathcal{L}_{geo}=\min_{\eta}\sum_{i\neq j}^{M}\left\|\mathbf{Z}_{i}-\mathbf{Z}_{j}\right\|_{F}^{2}

        ④Defining the adaptive fusion \mathbf{Z}\mathrm{=}\sum_{m=1}^{M}\beta_{m}\mathbf{Z}_{m}
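A minimal sketch of \mathcal{L}_{geo} and the adaptive fusion; parameterizing \beta as softmax-normalized learnable weights is my assumption (the notes do not say how \beta is obtained):

```python
import torch

def geo_loss(Z_list):
    """L_geo = sum over view pairs i != j of ||Z_i - Z_j||_F^2."""
    M = len(Z_list)
    return sum(torch.norm(Z_list[i] - Z_list[j], p='fro') ** 2
               for i in range(M) for j in range(M) if i != j)

def fuse(Z_list, beta_logits):
    """Z = sum_m beta_m Z_m; softmax keeps the assumed weights positive."""
    beta = torch.softmax(beta_logits, dim=0)
    return sum(b * Z for b, Z in zip(beta, Z_list))

Z_list = [torch.randn(5, 4) for _ in range(2)]
print(geo_loss(Z_list), fuse(Z_list, torch.zeros(2)).shape)
```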

        ⑤The original probability distribution \mathbf{Q} of \mathbf{Z}, modeled with Student's t-distribution:

q_{ij}=\frac{(1+\|z_i-\mu_j\|^2/\alpha)^{-\frac{\alpha+1}{2}}}{\sum_{j'}\left(1+\|z_i-\mu_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}

where \{\mu_{j}\}_{j=1}^{k} denotes the k initial cluster centroids, \alpha denotes the degrees of freedom, and q_{ij} denotes the probability of assigning node i to cluster j

        ⑥The target probability distribution \mathbf{P} of \mathbf{Z}:

p_{ij}=\frac{q_{ij}^{2}/f_{j}}{\sum_{j'}q_{ij'}^{2}/f_{j'}}

where f_{j}=\sum_{i}q_{ij} denotes soft cluster frequencies
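A minimal DEC-style sketch of both distributions; \mathbf{Z} and the centroids \mu below are random stand-ins:

```python
import torch

def soft_assign(Z, mu, alpha=1.0):
    """q_ij ∝ (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha+1)/2)."""
    dist2 = torch.cdist(Z, mu) ** 2
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_dist(Q):
    """p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), f_j = sum_i q_ij."""
    f = Q.sum(dim=0)                     # soft cluster frequencies
    P = Q ** 2 / f
    return P / P.sum(dim=1, keepdim=True)

Z = torch.randn(10, 4)                   # fused embeddings (stand-in)
mu = torch.randn(3, 4)                   # k = 3 centroids, e.g. from k-means
Q = soft_assign(Z, mu)
P = target_dist(Q)
print(P.sum(dim=1))                      # each row sums to 1
```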

        ⑦Loss (I honestly could not follow the long-winded derivation above, but this loss actually just looks like an L2):

\mathcal{L}_{pro}=\min_{\eta}\sum_{m=1}^{M}\rho_{m}\left\|\mathbf{Q}_{m}-\mathbf{P}\right\|_{F}^{2}

2.4.3. Task for Clustering

        ①Total loss function:

{\mathcal L}=\min_{g,c,\mathbf{P}}{\mathcal L}_{re}+\lambda_{2}{\mathcal L}_{geo}+\lambda_{3}{\mathcal L}_{pro}

        ②Clustering label of node i: y_i=\arg\max_k\left(p_{ik}\right)
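As a quick illustration of \mathcal{L}_{pro} and the final label read-out (all tensors below are random stand-ins, and equal weights \rho_m are my assumption):

```python
import torch

# Stand-ins: per-view soft assignments Q_m and the shared target P
Q1 = torch.softmax(torch.randn(10, 3), dim=1)
Q2 = torch.softmax(torch.randn(10, 3), dim=1)
P = torch.softmax(torch.randn(10, 3), dim=1)
rho = [0.5, 0.5]                               # assumed view weights rho_m

L_pro = sum(r * torch.norm(Q - P, p='fro') ** 2 for r, Q in zip(rho, (Q1, Q2)))
labels = P.argmax(dim=1)                       # y_i = argmax_k p_ik
print(L_pro, labels)                           # loss value, one cluster per node
```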

2.5. Experiments

2.5.1. Experimental Setting

(1)Metrics and Databases

        ①Datasets: Cora, Citeseer, Pubmed

        ②Evaluation metrics: clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI)

        ③Creation of view 2: applying the Fast Fourier Transform (FFT), Gabor transform, Euler transform, or Cartesian product to view 1

(2)Implementation Details

        ①The node representation dimensions of the two layers are [512, 512] on Cora, [2000, 512] on Citeseer, and [128, 64] on Pubmed.

        ②They adopt a fully connected layer in the integrate-encoder on all datasets

        ③Activation function: ReLU

        ④\lambda _1=1,\lambda _2=10^{-2},\lambda _3=10^2

(3)Comparison Algorithms

        ①Node attributes only: k-means

        ②graph structure: Graph Encoder, DeepWalk, denoising autoencoder for graph embedding (DNGR) and modularized nonnegative matrix factorization (M-NMF)

        ③Graph structure & node attributes: graph autoencoders (GAE) and variational graph autoencoders (VGAE), marginalized graph autoencoder (MGAE), adversarially regularized graph autoencoder (ARGAE) and adversarially regularized variational graph autoencoder (ARVGAE), deep attentional embedded graph clustering (DAEGC), and graph attention auto-encoders (GATE)

        ④Deep multi-view clustering: deep canonical correlation analysis (DCCA) and deep canonically correlated autoencoders (DCCAE)

2.5.2. Experimental Results

(1)Evaluation Metrics with Comparison Algorithms

        ①Comparison table:

(2)Analysis of Probability Distribution Consistency

        ①Over the iterations, \mathbf{Q}_1, \mathbf{Q}_2, and \mathbf{P} steadily learn a more accurate prediction capability:

where the x-axis denotes the clusters and the y-axis denotes the cluster probability

(3)Impact of Parameters

        ①The controlled-variable method is used to analyze the three regularization parameters:

(4)Analyzing Different View 2

        ①Comparison of different methods for constructing view 2:

2.6. Conclusions

        As a model containing dual encoders, MAGCN reconstructs the high-dimensional features and integrates low-dimensional consistent information

3. Knowledge Supplement

3.1. Discrete Data

(1)Definition:

A discontinuous feature space mainly means that in the feature space there are gaps or jumps between certain feature values or feature combinations, so that no continuous distribution or variation is formed. Such feature spaces are very common in practice, especially when handling discrete data, categorical data, or data with clear boundaries. Some examples of discontinuous feature spaces:

  1. Categorical data: for example, when describing a person's gender, labels such as "male" or "female" are used, and these labels are not continuous. Similarly, blood type (A, B, AB, O) or ethnicity is discontinuous.

  2. Integer features: when feature values can only take integers, the feature space is also discontinuous, for example when describing the count of objects or the rating level of some indicator.

  3. Binary features: binary features can only take 0 or 1, so such a feature space is obviously discontinuous. This is common in many computer vision and machine learning applications, for instance whether a certain feature is activated or present.

  4. Timestamp data: although time itself is continuous, once data is recorded at specific intervals (hours, days, months, etc.), the feature space becomes discontinuous.

  5. Geographic data: in geographic information systems, geographic coordinates (longitude and latitude) can in theory be continuous, but due to limitations of data acquisition or processing needs, data may only be recorded at specific locations, forming a discontinuous feature space.

  6. Gene sequence data: in bioinformatics, gene sequences consist of a series of discrete base pairs (A, T, C, G), and the changes between these base pairs are discontinuous.

  7. Text data: when processing text data, words or phrases serve as features, and the transitions between them are usually discontinuous. Although methods such as word embeddings can map text into a continuous space, the original word or phrase space remains discontinuous.

(2)Example:

In brain graphs, if you use something like gender or age as features, they are discontinuous. Do IQ and time series count as continuous?

3.2. Multi-modality and multi-view

(1)Multi-modality

In the brain-graph field, different imaging modalities (EEG, fMRI, sMRI, DTI, CT) and the like seem to be what "multi-modality" refers to

(2)Multi-view

The multi-view in this paper refers to different brain atlases, such as AAL-90 and AAL-120

(3)?

Then here comes the question: what would you call channels that stack Pearson FC + other kinds of FC + the adjacency matrix?

4. Reference List

Cheng, J. et al. (2020) 'Multi-View Attribute Graph Convolution Networks for Clustering', Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), pp. 2973-2979.
