[Paper Close Reading] Multi-View Attribute Graph Convolution Networks for Clustering

Paper link: Multi-View Attribute Graph Convolution Networks for Clustering | IJCAI

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This post leans toward personal notes, so read with caution.

目录

1. TL;DR

1.1. Takeaways

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction 

2.3. Related Work

2.4. Proposed Methodology

2.4.1. Notation

2.4.2. The Framework of MAGCN

2.4.3. Task for Clustering

2.5. Experiments

2.5.1. Experimental Setting

2.5.2. Experimental Results

2.6. Conclusions

3. Knowledge Supplement

3.1. Discrete Data

3.2. Multi-modality and multi-view

4. Reference List


1. TL;DR

1.1. Takeaways

(1)Oops, wrong channel: this is machine learning. Well then, let's at least look at the novelty.

(2)It seems most multi-modal work refers to the three modalities sMRI, fMRI, and DTI. But EEG can apparently also be turned into a brain graph, so why does no one seem to have used EEG as a fourth modality so far? Also, I notice these modalities do not share a single brain atlas (although AAL can apparently be used for both fMRI and sMRI), so how are the node counts supposed to be aligned?

(3)Actually, by using different brain atlases, this paper is in a way addressing question (2); let me read on and see.

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

        ①Existing GNNs ignore the node features (actually, some of them do use node features) and graph reconstruction (judging from what comes later, the authors seem to regard one pass of encoding plus one pass of decoding as graph reconstruction)

        ②They propose a novel Multi-View Attribute Graph Convolution Network (MAGCN) with two-pathway encoders for clustering. The first pathway consists of multi-view attribute graph attention networks, which reduce noise and redundancy and learn the embedding features of multi-view graph data. The second pathway consists of consistent embedding encoders, which capture the geometric relationships and the consistency of probability distributions among different views

2.2. Introduction 

        ①The capability of graph embedding is to transform graph data into a low-dimensional (fewer features), compact (less scattered), and continuous (not discrete) feature space

        ②GNN is suitable for handling single-view data rather than multi-view

        ③The limits of existing multi-view models: a) they cannot assign different weights to different neighbors, b) they might ignore node features or graph reconstruction, c) they do not consider the similarities between different views (oh, so that can be taken into account too? I want to see how they do it)

        ④Existing GNNs mostly focus on multiple graphs (multiple graphs of what, though?) instead of multiple attributes (why bring up a social network example? They say people can have several attributes such as job and hobbies; then what would the attributes of a brain graph be?)

 paragon  n. a model of excellence; a perfect example; (also) a flawless diamond of over 100 carats

2.3. Related Work

        The authors enumerate some neighbor-aggregation, attention-based, and multi-view models

2.4. Proposed Methodology

2.4.1. Notation

        ①Define a graph \mathbf{G}=(\mathbf{V},\mathbf{E}) (\mathbf{G}\in\mathbb{R}^{n\times n}), where \mathbf{V}=\{v_{1},v_{2},...,v_{n}\} denotes the node set, \mathbf{E} denotes the edge set, and n denotes the number of nodes

        ②The attribute features of the nodes: \mathbf{X}_{m}=\{x_{m}^{1},...,x_{m}^{i},...,x_{m}^{n}\} (\mathbf{X}_{m}\in \mathbb{R}^{n\times d_{m}}), m=1,2,...,M, where M denotes the number of views and d_m the attribute dimension of view m

2.4.2. The Framework of MAGCN

        ①The overall framework:

They first encode \mathbf{X}_{m} into the graph embeddings \mathbf{H}_{m}=\{h_{m}^{1},...,h_{m}^{i},...,h_{m}^{n}\} (\mathbf{H}_{m}\in \mathbb{R}^{n\times d}) with the multi-view attribute graph convolution encoders (green), and then transform \mathbf{H}_{m} into the consistent clustering embedding \mathbf{Z} with the consistent embedding encoders (purple)

(1)Multi-view Attribute Graph Convolution Encoder

        ①The graph embedding function can be simply expressed as f_m(\mathbf{G},\mathbf{X}_m;\theta)\to\mathbf{H}_m, where \theta denotes the auto-encoder parameters

        ②Part of the Multi-view Attribute Graph Convolution Encoder (MAGCE) for view m:

        ③The l-th output of MAGCE:

\mathbf{H}_{m}^{(l)}=\sigma\left(\mathbf{D}^{-\frac12}\mathbf{G}^{\prime}\mathbf{D}^{-\frac12}\mathbf{H}_{m}^{(l-1)}\mathbf{W}^{(l)}\right)

where \mathbf{G^{\prime}}=\mathbf{G}+\mathbf{I}_{N} denotes the "relevance coefficient matrix with added self-connections" (is this the functional connectivity matrix plus I, the adjacency matrix plus I, or something else?);

\mathbf{D}_{ii}=\sum_{j}\mathbf{G}^{\prime}_{ij};

\sigma denotes the activation function

        ④The layer index l starts from 0 and ends at L
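For concreteness, here is a minimal PyTorch sketch of this propagation rule (my own toy illustration with dense random tensors, not the authors' code):

```python
import torch

def gcn_layer(G, H, W, sigma=torch.relu):
    """One MAGCE-style update: sigma(D^{-1/2} G' D^{-1/2} H W)."""
    G_prime = G + torch.eye(G.size(0))      # G' = G + I_N (self-connections)
    d = G_prime.sum(dim=1)                  # degrees; >= 1 thanks to I_N
    D_inv_sqrt = torch.diag(d.pow(-0.5))    # D^{-1/2}
    return sigma(D_inv_sqrt @ G_prime @ D_inv_sqrt @ H @ W)

# Toy usage: 5 nodes, 8-dim attributes, 4-dim output
G = (torch.rand(5, 5) > 0.5).float()
H, W = torch.randn(5, 8), torch.randn(8, 4)
print(gcn_layer(G, H, W).shape)             # torch.Size([5, 4])
```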

        ⑤The learnable relevance matrix \mathbf{S} in the l-th layer:

\mathbf{S}=\varphi\left(\mathbf{G}\odot t_s^{(l)}\mathbf{H}_m^{(l)}\mathbf{W}^{(l)}+\mathbf{G}\odot t_r^{(l)}\mathbf{H}_m^{(l)}\mathbf{W}^{(l)}\right)

where t_s^{(l)} and t_r^{(l)}\in\mathbb{R}^{1\times d_l} denote the trainable parameters, and \varphi denotes the activation function

        ⑥Normalizing \mathbf{S} to get the final relevance coefficient \mathbf{G}:

\mathbf{G}_{ij}=\frac{\exp\left(\mathbf{S}_{ij}\right)}{\sum_{k\in\mathbf{N}_{i}}\exp\left(\mathbf{S}_{ik}\right)}

where \mathbf{N}_{i} denotes the neighbors of node i
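Since the notes do not spell out how t_s and t_r broadcast, here is one plausible PyTorch reading using GATE-style per-node source/target scores, with the softmax restricted to each node's neighbors; treat the broadcasting as my assumption:

```python
import torch

def relevance(G, H, W, t_s, t_r, phi=torch.sigmoid):
    HW = H @ W                                   # (n, d_l)
    s = (HW * t_s).sum(dim=1, keepdim=True)      # assumed per-node source score
    r = (HW * t_r).sum(dim=1, keepdim=True)      # assumed per-node target score
    S = phi(G * s + G * r.T)                     # scores only along edges of G
    S = S.masked_fill(G == 0, float('-inf'))     # restrict softmax to neighbors N_i
    return torch.softmax(S, dim=1)               # rows sum to 1 -> refined G

n = 5
G = (torch.rand(n, n) > 0.4).float()
G.fill_diagonal_(1.0)                            # keep self-connections
H, W = torch.randn(n, 8), torch.randn(8, 4)
t_s, t_r = torch.randn(4), torch.randn(4)
print(relevance(G, H, W, t_s, t_r).sum(dim=1))   # each row sums to ~1
```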

        ⑦The output of the \left ( l-1 \right )-th layer of the multi-view attribute graph convolution decoders:

\hat{\mathbf{H}}_{m}^{(l-1)}=\sigma\left(\mathbf{\hat{D}}^{-\frac{1}{2}}\mathbf{\hat{G}}^{\prime}\mathbf{\hat{D}}^{-\frac{1}{2}}\mathbf{\hat{H}}_{m}^{(l)}\mathbf{\hat{W}}^{(l)}\right)

        ⑧The reconstructed graph structure is \hat{\mathbf{G}}_{m}^{ij}=\phi\left(-{h_{m}^{i}}^{\top}h_{m}^{j}\right), where \phi\left ( \cdot \right ) denotes the inner product operator

        ⑨The reconstruction loss:

\mathcal{L}_{re}=\min_{\theta}\sum_{i=1}^{M}\left\|\mathbf{X}_{i}-\hat{\mathbf{X}}_{i}\right\|_{F}^{2}+\lambda_{1}\sum_{i=1}^{M}\left\|\mathbf{G}-\hat{\mathbf{G}}_{i}\right\|_{F}^{2}
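A minimal sketch of this reconstruction loss; reading the graph decoder in ⑧ as \hat{\mathbf{G}}_m=\mathrm{sigmoid}(\mathbf{H}_m\mathbf{H}_m^{\top}) is my assumption, since the sign convention there is ambiguous:

```python
import torch

def recon_loss(X_list, X_hat_list, H_list, G, lam1=1.0):
    """L_re = sum_m ||X_m - X_hat_m||_F^2 + lam1 * ||G - G_hat_m||_F^2."""
    loss = torch.tensor(0.0)
    for Xm, Xm_hat, Hm in zip(X_list, X_hat_list, H_list):
        Gm_hat = torch.sigmoid(Hm @ Hm.T)        # assumed inner-product decoder
        loss = loss + torch.norm(Xm - Xm_hat, p='fro') ** 2 \
                    + lam1 * torch.norm(G - Gm_hat, p='fro') ** 2
    return loss

# Toy usage with M = 2 views of 5 nodes
n, M = 5, 2
G = (torch.rand(n, n) > 0.5).float()
X = [torch.randn(n, 8) for _ in range(M)]
X_hat = [torch.randn(n, 8) for _ in range(M)]
H = [torch.randn(n, 4) for _ in range(M)]
print(recon_loss(X, X_hat, H, G))
```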

(2)Consistent Embedding Encoders

        ①Reduce the dimensionality of the graph embeddings via the mapping function:

g_{m}(\mathbf{H}_{m};\eta)\to\mathbf{Z}_{m}

where \eta denotes the encoder parameters

        ②The similarity between two views can be measured with the Manhattan distance, Euclidean distance, cosine similarity, etc.

        ③The loss function of geometric relationship consistency:

\mathcal{L}_{geo}=\min_{\eta}\sum_{i\neq j}^{M}\left\|\mathbf{Z}_{i}-\mathbf{Z}_{j}\right\|_{F}^{2}

        ④Defining the adaptive fusion \mathbf{Z}\mathrm{=}\sum_{m=1}^{M}\beta_{m}\mathbf{Z}_{m}
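A minimal sketch of \mathcal{L}_{geo} and the adaptive fusion; parameterizing \beta as softmax-normalized learnable weights is my assumption (the notes do not say how \beta is obtained):

```python
import torch

def geo_loss(Z_list):
    """L_geo = sum over view pairs i != j of ||Z_i - Z_j||_F^2."""
    M = len(Z_list)
    return sum(torch.norm(Z_list[i] - Z_list[j], p='fro') ** 2
               for i in range(M) for j in range(M) if i != j)

def fuse(Z_list, beta_logits):
    """Z = sum_m beta_m Z_m; softmax keeps the assumed weights positive."""
    beta = torch.softmax(beta_logits, dim=0)
    return sum(b * Z for b, Z in zip(beta, Z_list))

Z_list = [torch.randn(5, 4) for _ in range(2)]
print(geo_loss(Z_list), fuse(Z_list, torch.zeros(2)).shape)
```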

        ⑤The original probability distribution \mathbf{Q} of \mathbf{Z}, modeled with Student's t-distribution:

q_{ij}=\frac{(1+\|z_i-\mu_j\|^2/\alpha)^{-\frac{\alpha+1}{2}}}{\sum_{j'}\left(1+\|z_i-\mu_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}

where \{\mu_{j}\}_{j=1}^{k} denotes the k initial cluster centroids, \alpha denotes the degrees of freedom, and q_{ij} denotes the probability of assigning node i to cluster j

        ⑥The target probability distribution \mathbf{P} of \mathbf{Z}:

p_{ij}=\frac{q_{ij}^{2}/f_{j}}{\sum_{j'}q_{ij'}^{2}/f_{j'}}

where f_{j}=\sum_{i}q_{ij} denotes soft cluster frequencies
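A minimal DEC-style sketch of both distributions; \mathbf{Z} and the centroids \mu below are random stand-ins:

```python
import torch

def soft_assign(Z, mu, alpha=1.0):
    """q_ij ∝ (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha+1)/2)."""
    dist2 = torch.cdist(Z, mu) ** 2
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_dist(Q):
    """p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), f_j = sum_i q_ij."""
    f = Q.sum(dim=0)                     # soft cluster frequencies
    P = Q ** 2 / f
    return P / P.sum(dim=1, keepdim=True)

Z = torch.randn(10, 4)                   # fused embeddings (stand-in)
mu = torch.randn(3, 4)                   # k = 3 centroids, e.g. from k-means
Q = soft_assign(Z, mu)
P = target_dist(Q)
print(P.sum(dim=1))                      # each row sums to 1
```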

        ⑦Loss (I honestly could not follow the long-winded derivation above, but this loss actually just looks like an L2):

\mathcal{L}_{pro}=\min_{\eta}\sum_{m=1}^{M}\rho_{m}\left\|\mathbf{Q}_{m}-\mathbf{P}\right\|_{F}^{2}

2.4.3. Task for Clustering

        ①Total loss function:

{\mathcal L}=\min_{g,c,\mathbf{P}}{\mathcal L}_{re}+\lambda_{2}{\mathcal L}_{geo}+\lambda_{3}{\mathcal L}_{pro}

        ②Clustering label of node i: y_i=\arg\max_k\left(p_{ik}\right)
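As a quick illustration of \mathcal{L}_{pro} and the final label read-out (all tensors below are random stand-ins, and equal weights \rho_m are my assumption):

```python
import torch

# Stand-ins: per-view soft assignments Q_m and the shared target P
Q1 = torch.softmax(torch.randn(10, 3), dim=1)
Q2 = torch.softmax(torch.randn(10, 3), dim=1)
P = torch.softmax(torch.randn(10, 3), dim=1)
rho = [0.5, 0.5]                               # assumed view weights rho_m

L_pro = sum(r * torch.norm(Q - P, p='fro') ** 2 for r, Q in zip(rho, (Q1, Q2)))
labels = P.argmax(dim=1)                       # y_i = argmax_k p_ik
print(L_pro, labels)                           # loss value, one cluster per node
```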

2.5. Experiments

2.5.1. Experimental Setting

(1)Metrics and Databases

        ①Datasets: Cora, Citeseer, Pubmed

        ②Evaluation metrics: clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI)

        ③Creation of view 2: applying the Fast Fourier Transform (FFT), Gabor transform, Euler transform, or Cartesian product to view 1

(2)Implementation Details

        ①The node representation dimensions of the two layers are [512, 512] on Cora, [2000, 512] on Citeseer, and [128, 64] on Pubmed.

        ②They adopt a fully connected layer in the integrate-encoder on all datasets

        ③Activation function: ReLU

        ④\lambda _1=1,\lambda _2=10^{-2},\lambda _3=10^2

(3)Comparison Algorithms

        ①Node attributes only: k-means

        ②graph structure: Graph Encoder, DeepWalk, denoising autoencoder for graph embedding (DNGR) and modularized nonnegative matrix factorization (M-NMF)

        ③Graph structure & node attributes: graph autoencoders (GAE) and variational graph autoencoders (VGAE), marginalized graph autoencoder (MGAE), adversarially regularized graph autoencoder (ARGAE) and adversarially regularized variational graph autoencoder (ARVGAE), deep attentional embedded graph clustering (DAEGC), and graph attention auto-encoders (GATE)

        ④Deep multi-view clustering: deep canonical correlation analysis (DCCA) and deep canonically correlated autoencoders (DCCAE)

2.5.2. Experimental Results

(1)Evaluation Metrics with Comparison Algorithms

        ①Comparison table:

(2)Analysis of Probability Distribution Consistency

        ①Over the iterations, \mathbf{Q}_1, \mathbf{Q}_2, and \mathbf{P} steadily learn a more accurate prediction capability:

where the x-axis denotes the clusters and the y-axis denotes the cluster probability

(3)Impact of Parameters

        ①The controlled-variable method is used to analyze the three regularization parameters:

(4)Analyzing Different View 2

        ①Comparison of different methods for constructing view 2:

2.6. Conclusions

        As a model containing dual encoders, MAGCN reconstructs the high-dimensional features and integrates low-dimensional consistent information

3. Knowledge Supplement

3.1. Discrete Data

(1)Definition:

A discontinuous feature space mainly means that in the feature space there are gaps or jumps between certain feature values or feature combinations, so that no continuous distribution or variation is formed. Such feature spaces are very common in practice, especially when handling discrete data, categorical data, or data with clear boundaries. Some examples of discontinuous feature spaces:

  1. Categorical data: for example, when describing a person's gender, labels such as "male" or "female" are used, and these labels are not continuous. Similarly, blood type (A, B, AB, O) or ethnicity is discontinuous.

  2. Integer features: when feature values can only take integers, the feature space is also discontinuous, for example when describing the count of objects or the rating level of some indicator.

  3. Binary features: binary features can only take 0 or 1, so such a feature space is obviously discontinuous. This is common in many computer vision and machine learning applications, for instance whether a certain feature is activated or present.

  4. Timestamp data: although time itself is continuous, once data is recorded at specific intervals (hours, days, months, etc.), the feature space becomes discontinuous.

  5. Geographic data: in geographic information systems, geographic coordinates (longitude and latitude) can in theory be continuous, but due to limitations of data acquisition or processing needs, data may only be recorded at specific locations, forming a discontinuous feature space.

  6. Gene sequence data: in bioinformatics, gene sequences consist of a series of discrete base pairs (A, T, C, G), and the changes between these base pairs are discontinuous.

  7. Text data: when processing text data, words or phrases serve as features, and the transitions between them are usually discontinuous. Although methods such as word embeddings can map text into a continuous space, the original word or phrase space remains discontinuous.

(2)Example:

In brain graphs, if you use something like gender or age as features, they are discontinuous. Do IQ and time series count as continuous?

3.2. Multi-modality and multi-view

(1)Multi-modality

In the brain-graph field, different imaging modalities (EEG, fMRI, sMRI, DTI, CT) and the like seem to be what "multi-modality" refers to

(2)Multi-view

The multi-view in this paper refers to different brain atlases, such as AAL-90 and AAL-120

(3)?

Then here comes the question: what would you call channels that stack Pearson FC + other kinds of FC + the adjacency matrix?

4. Reference List

Cheng, J. et al. (2020) 'Multi-View Attribute Graph Convolution Networks for Clustering', Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), pp. 2973-2979.
