[Paper Close Reading] Multi-View Multi-Graph Embedding for Brain Network Clustering Analysis

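Original paper: 3504035.3504050 (acm.org)

The English here is typed entirely by hand! It summarizes and paraphrases the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.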

Contents

1. TL;DR

1.1. Takeaways

1.2. Paper summary figure

2. Paragraph-by-paragraph close reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Preliminaries

2.5. Methodology

2.5.1. Problem definition

2.5.2. M2E approach

2.5.3. Optimization framework

2.6. Experiments and evaluation

2.6.1. Data collection and preprocessing

2.6.2. Baselines and metrics

2.6.3. Clustering results

2.6.4. Parameter sensitivity analysis

2.6.5. Factor analysis

2.7. Conclusion

3. Supplementary knowledge

3.1. Partially symmetric tensor

4. Reference


1. TL;DR

1.1. Takeaways

(1) This does not seem to be deep learning~

1.2. Paper summary figure

2. Paragraph-by-paragraph close reading

2.1. Abstract

        ①They propose Multi-view Multi-graph Embedding (M2E) to combine information from different views

2.2. Introduction

        ①The conceptual view of M2E:

2.3. Related work

        ①Introducing graph embedding methods

        ②Comparing M2E with multi-view clustering and multi-view embedding methods

2.4. Preliminaries

        ①Notations:

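        ②Definition 1: introduces the partially symmetric tensor (though I feel the authors do not explain it very clearly; they say "if an M-th order tensor is partially symmetric on modes 1 to M, then it is a rank-one partially symmetric tensor". Better to look at my supplementary knowledge section)

        ③Definition 2: matricize tensor \mathcal{X}\in\mathbb{R}^{I_{1}\times\cdots\times I_{M}} along mode m to \mathbf{X}_{(m)}\in \mathbb{R}^{I_m\times J} with J=\prod_{q\neq m}I_q, where element (i_1,\cdots,i_M) is mapped to entry (i_m,j) and

\begin{aligned}&j=1+\sum_{p=1,p\neq m}^{M}(i_{p}-1)J_{p},\quad\text{with}\\&J_{p}=\begin{cases}1,&\text{if } p=1 \text{ or } (p=2 \text{ and } m=1)\\\prod_{q=1,q\neq m}^{p-1}I_q,&\text{otherwise.}\end{cases}\end{aligned}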

        ④Definition 3: factorize \mathcal{X}\in\mathbb{R}^{I_{1}\times\cdots\times I_{M}} to:

\mathcal{X}\approx\sum_{r=1}^R\mathbf{x}_r^{(1)}\circ\cdots\circ\mathbf{x}_r^{(M)}\equiv[[\mathbf{X}^{(1)},...,\mathbf{X}^{(M)}]]

which needs to minimize the estimation error:

\mathcal{L}=\min_{\mathbf{X}^{(1)},\cdots,\mathbf{X}^{(M)}}\lVert\mathcal{X}-[[\mathbf{X}^{(1)},\cdots,\mathbf{X}^{(M)}]]\rVert_F^2

Because this problem is not jointly convex in all factor matrices, it is solved by updating one factor at a time while the others are held fixed:

\mathbf{X}^{(k)}\leftarrow\arg\min_{\mathbf{X}^{(k)}}\|\mathbf{X}_{(k)}-\mathbf{X}^{(k)}(\odot_{i\neq k}^M\mathbf{X}^{(i)})^\mathrm{T}\|_F^2

where \odot denotes the Khatri-Rao product and \odot_{i\neq k}^{M}\mathbf{X}^{(i)}=\mathbf{X}^{(M)}\odot\cdots\odot\mathbf{X}^{(k+1)}\odot\mathbf{X}^{(k-1)}\odot\cdots\odot\mathbf{X}^{(1)}
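To make Definitions 2 and 3 concrete, here is a minimal NumPy sketch of mode-m unfolding, the Khatri-Rao product, and a plain alternating least-squares loop for CP factorization. The function names (`unfold`, `khatri_rao`, `kr_except`, `cp_reconstruct`, `cp_als`) are my own and not from the paper, indices are 0-based, and each factor update uses an ordinary least-squares solve, which is an assumption about how the per-factor subproblem is solved.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` matricization (0-based); earlier remaining modes vary fastest."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.reshape(moved, (tensor.shape[mode], -1), order="F")

def khatri_rao(A, B):
    """Column-wise Kronecker product; the row index of B varies fastest."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

def kr_except(factors, skip):
    """X^(M) (.) ... (.) X^(1) with the factor at position `skip` left out."""
    mats = [f for i, f in enumerate(factors) if i != skip][::-1]
    out = mats[0]
    for mat in mats[1:]:
        out = khatri_rao(out, mat)
    return out

def cp_reconstruct(factors):
    """Rebuild the full tensor [[X^(1), ..., X^(M)]] from its factor matrices."""
    shape = tuple(f.shape[0] for f in factors)
    return np.reshape(factors[0] @ kr_except(factors, 0).T, shape, order="F")

def cp_als(X, rank, n_iter=50, seed=0):
    """Alternating least squares: update one factor while the others are fixed."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    losses = []
    for _ in range(n_iter):
        for k in range(X.ndim):
            K = kr_except(factors, k)
            # Solve min || X_(k) - X^(k) K^T ||_F^2 as a least-squares problem.
            factors[k] = np.linalg.lstsq(K, unfold(X, k).T, rcond=None)[0].T
        losses.append(np.linalg.norm(X - cp_reconstruct(factors)) ** 2)
    return factors, losses

# Toy example: a random tensor that is exactly rank 2.
rng = np.random.default_rng(0)
true_factors = [rng.standard_normal((dim, 2)) for dim in (4, 5, 6)]
X = cp_reconstruct(true_factors)
factors, losses = cp_als(X, rank=2)
```

The `order="F"` reshape is what matches the index map of Definition 2: among the remaining modes, the lowest-numbered one varies fastest, which is also why the Khatri-Rao product is taken from \mathbf{X}^{(M)} down to \mathbf{X}^{(1)}.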

2.5. Methodology

2.5.1. Problem definition

        ①For N subjects with V views, each brain network is a connectivity matrix \mathbf{W}\in\mathbb{R}^{M\times M} over M nodes

        ②For each view, the whole graph set is \mathcal{D}^{(v)}=\{\mathbf{W}_{1}^{(v)},\mathbf{W}_{2}^{(v)},\cdots,\mathbf{W}_{N}^{(v)}\}

        ③All the views: \mathcal{D} = \{\mathcal{D}^{(1)},\mathcal{D}^{(2)},\cdots,\mathcal{D}^{(V)}\}

        ④Goal: learn a common embedding \mathbf{F}^*\in\mathbb{R}^{N\times R}, with one row per participant

2.5.2. M2E approach

        ①Concatenated third-order tensor: 

\mathcal{X}^{(v)}=[\mathbf{W}_1^{(v)},\mathbf{W}_2^{(v)},\cdots,\mathbf{W}_N^{(v)}]\in \mathbb{R}^{M\times M\times N},v \in [1 : V]

        ②Embedding function:

\min_{\mathbf{H}^{(v)},\mathbf{F}^{(v)}}\sum_{v=1}^V||\mathcal{X}^{(v)}-[[\mathbf{H}^{(v)},\mathbf{H}^{(v)},\mathbf{F}^{(v)}]]||_F^2

where \mathbf{H}^{(v)}\in\mathbb{R}^{M\times R} and \mathbf{F}^{(v)}\in\mathbb{R}^{N\times R} are obtained by CP factorization.

        ③Common embedding learning:

\min_{\mathbf{F}^*}\sum_{v=1}^V\lambda_v||\mathbf{F}^{(v)}-\mathbf{F}^*||_F^2

        ④Combining them to optimize M2E:

\begin{aligned}\mathcal{O}&=\min_{\mathbf{H}^{(v)},\mathbf{F}^{*},\mathbf{F}^{(v)}}\sum_{v=1}^{V}||\mathcal{X}^{(v)}-[[\mathbf{H}^{(v)},\mathbf{H}^{(v)},\mathbf{F}^{(v)}]]||_{F}^{2}\\&+\sum_{v=1}^{V}\lambda_{v}||\mathbf{F}^{(v)}-\mathbf{F}^{*}||_{F}^{2}\end{aligned}

where the first term models the dependence among the multiple graphs within each view, and the second term models the dependence across views
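As a small illustration of the combined objective above, the sketch below stacks connectivity matrices into a third-order tensor and evaluates \mathcal{O} for given factors. The helper names (`stack_graphs`, `cp3`, `m2e_objective`) and the toy sizes are hypothetical; this only evaluates the objective and does not optimize it.

```python
import numpy as np

def stack_graphs(graphs):
    """Stack N connectivity matrices (each M x M) into an M x M x N tensor."""
    return np.stack(graphs, axis=-1)

def cp3(H, P, F):
    """Third-order CP reconstruction [[H, P, F]]."""
    return np.einsum("ir,jr,kr->ijk", H, P, F)

def m2e_objective(X_views, H_views, F_views, F_star, lams):
    """Sum over views of the reconstruction error plus the weighted consensus term."""
    total = 0.0
    for X, H, F, lam in zip(X_views, H_views, F_views, lams):
        total += np.sum((X - cp3(H, H, F)) ** 2)   # ||X - [[H, H, F]]||_F^2
        total += lam * np.sum((F - F_star) ** 2)   # lambda_v ||F - F*||_F^2
    return total

# Toy check with one view whose tensor is exactly representable.
rng = np.random.default_rng(0)
M, N, R = 4, 3, 2
H = rng.standard_normal((M, R))
F = rng.standard_normal((N, R))
X = cp3(H, H, F)
exact = m2e_objective([X], [H], [F], F, [1.0])
shifted = m2e_objective([X], [H], [F], F + 1.0, [0.5])
```

With a perfect reconstruction and \mathbf{F}^{*}=\mathbf{F}^{(v)} both terms vanish; shifting \mathbf{F}^{*} by one in every entry leaves only the consensus penalty, here 0.5 times the N·R entries of \mathbf{F}^{(v)}.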

2.5.3. Optimization framework

        ①Parameters to estimate: \mathbf{H}^{(v)}\in\mathbb{R}^{M\times R}, \mathbf{F }^{(v)}\in\mathbb{R}^{N\times R}, and \mathbf{F}^{*}\in\mathbb{R}^{N\times R}. Because the objective is not jointly convex in these variables, there is no closed-form solution. They therefore adopt an iterative scheme based on the Alternating Direction Method of Multipliers (ADMM).

        ②They use a variable substitution technique: with \mathbf{F }^{(v)} and \mathbf{F}^{*} fixed, one copy of \mathbf{H}^{(v)} is replaced by an auxiliary variable \mathbf{P}^{(v)}, and \mathbf{H}^{(v)} is computed from:

\begin{aligned}&\min_{\mathbf{H}^{(v)},\mathbf{P}^{(v)}}||\mathcal{X}^{(v)}-[[\mathbf{H}^{(v)},\mathbf{P}^{(v)},\mathbf{F}^{(v)}]]||_{F}^{2}\\&\text{s.t. }\mathbf{H}^{(v)}= \mathbf{P}^{(v)}\end{aligned}

The augmented Lagrangian function:

\begin{aligned}\mathcal{L}(\mathbf{H}^{(v)},\mathbf{P}^{(v)})=&\|\mathcal{X}^{(v)}-[[\mathbf{H}^{(v)},\mathbf{P}^{(v)},\mathbf{F}^{(v)}]]\|_{F}^{2}\\&+tr(\mathbf{U}^{(v)T}(\mathbf{H}^{(v)}-\mathbf{P}^{(v)}))+\frac{\mu}{2}\|\mathbf{H}^{(v)}-\mathbf{P}^{(v)}\|_{F}^{2}\end{aligned}

where \mathbf{U}^{(v)}\in\mathbb{R}^{M\times R} denotes the Lagrange multipliers and \mu denotes the penalty parameter. The subproblem for \mathbf{H}^{(v)} is:

\min_{\mathbf{H}^{(v)}}||\mathbf{X}_{(1)}^{(v)}-\mathbf{H}^{(v)}\mathbf{D}^{(v)\text{T}}||_F^2+\frac{\mu}{2}||\mathbf{H}^{(v)}-\mathbf{P}^{(v)}+\frac{1}{\mu}\mathbf{U}^{(v)}||_F^2

Here \mathcal{X}^{(v)} is unfolded along mode 1 into \mathbf{X}_{(1)}^{(v)}\in\mathbb{R}^{M\times(MN)}, and \mathbf{D}^{(v)}=\mathbf{F}^{(v)}\odot\mathbf{P}^{(v)}\in\mathbb{R}^{(NM)\times R}. Dropping the terms that do not depend on \mathbf{H}^{(v)}, the problem becomes:

\min_{\mathbf{H}^{(v)}}tr(\mathbf{H}^{(v)}\mathbf{A}^{(v)}\mathbf{H}^{(v)^{\mathrm{T}}})-tr(\mathbf{B}^{(v)^{\mathrm{T}}}\mathbf{H}^{(v)})

where \mathbf{A}^{(v)}=\mathbf{D}^{(v)^{\mathrm{T}}}\mathbf{D}^{(v)}+\frac{\mu}{2}\mathbf{I} and \mathbf{B}^{(v)}=2\mathbf{X}_{(1)}^{(v)}\mathbf{D}^{(v)}+\mu\mathbf{P}^{(v)}-\mathbf{U}^{(v)}. It is solved by a gradient update of \mathbf{H}^{(v)}:

\mathbf{H}_{t+1}^{(v)}\leftarrow\mathbf{H}_t^{(v)}-\frac1{L^{(v)}}(2\mathbf{H}_t^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)})

where L^{(v)} denotes the Lipschitz constant of the gradient, equal to the maximum eigenvalue of 2\mathbf{A}^{(v)}. They use a property of the Khatri-Rao product to compute \mathbf{D}^{(v)^\mathrm{T}}\mathbf{D}^{(v)}:

\begin{aligned} \mathbf{D}^{(v)^{\mathrm{T}}}\mathbf{D}^{(v)}& =(\mathbf{F}^{(v)}\odot\mathbf{P}^{(v)})^{\mathrm{T}}(\mathbf{F}^{(v)}\odot\mathbf{P}^{(v)}) \\ &=(\mathbf{F}^{(v)^{\mathrm{T}}}\mathbf{F}^{(v)})*(\mathbf{P}^{(v)^{\mathrm{T}}}\mathbf{P}^{(v)}) \end{aligned}

where \ast denotes the Hadamard product. The updating function of \mathbf{P}^{(v)}:

\mathbf{P}_{t+1}^{(v)}\leftarrow\mathbf{P}_t^{(v)}-\frac1{L^{(v)}}(2\mathbf{P}_t^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)})

where \mathbf{A}^{(v)}=\mathbf{E}^{(v)^{\mathrm{T}}}\mathbf{E}^{(v)}+\frac\mu2\mathbf{I}, \mathbf{B}^{(v)}=2\mathbf{X}_{(2)}^{(v)}\mathbf{E}^{(v)}+\mu\mathbf{H}^{(v)}+\mathbf{U}^{(v)}, and \mathbf{E}^{(v)}=\mathbf{F}^{(v)}\odot\mathbf{H}^{(v)}\in\mathbb{R}^{(NM)\times R}. Lastly, update \mathbf{U}^{(v)}:

\mathbf{U}_{t+1}^{(v)}\leftarrow\mathbf{U}_t^{(v)}+\mu(\mathbf{H}_{t+1}^{(v)}-\mathbf{P}_{t+1}^{(v)})

        ③Then they fix \mathbf{F}^{*} and \mathbf{H}^{(v)} and compute \mathbf{F }^{(v)} by minimizing:

\min_{\mathbf{F}^{(v)}} ||\mathbf{X}_{(3)}^{(v)}-\mathbf{F}^{(v)}\mathbf{J}^{(v)^{\mathrm{T}}}||_{F}^{2}+\lambda_{(v)}||\mathbf{F}^{(v)}-\mathbf{F}^{*}||_{F}^{2}

where \mathbf{J}^{(v)}=\mathbf{P}^{(v)}\odot\mathbf{H}^{(v)}\in\mathbb{R}^{(MM)\times R}. The updating function of \mathbf{F }^{(v)}:

\mathbf{F}_{t+1}^{(v)}\leftarrow\mathbf{F}_t^{(v)}-\frac{1}{L^{(v)}}(2\mathbf{F}_t^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)})

where \mathbf{A}^{(v)} = \mathbf{J}^{(v)^\mathrm{T}}\mathbf{J}^{(v)} + \lambda_{(v)}\mathbf{I} and \mathbf{B}^{(v)} = 2\mathbf{X}_{(3)}^{(v)}\mathbf{J}^{(v)} +2\lambda_{(v)}\mathbf{F}^*

        ④Finally, they fix \mathbf{H}^{(v)} and \mathbf{F }^{(v)} to minimize {\mathcal{O}} over \mathbf{F}^{*}:

\mathbf{F}^*=\frac{\sum_{v=1}^V\lambda_{(v)}\mathbf{F}^{(v)}}{\sum_{v=1}^V\lambda_{(v)}}

        ⑤Overall time complexity: 

O(MaxIter(R^{3}+R^{2}(2M+N+1)+(M^{2}N+M+NV)R)V)
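The updates in this subsection can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: the function names (`grad_step`, `m2e_sweep`, `consensus`), the toy sizes, the choice of one gradient step per subproblem, the update order, and the use of random symmetric matrices as stand-ins for brain networks are all my assumptions.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product; the row index of B varies fastest."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

def unfold(T, mode):
    """Mode-`mode` matricization (0-based), matching Definition 2."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def grad_step(Z, A, B):
    """Z <- Z - (1/L)(2 Z A - B), with L the largest eigenvalue of 2A."""
    L = 2.0 * np.linalg.eigvalsh(A)[-1]
    return Z - (2.0 * Z @ A - B) / L

def m2e_sweep(X, H, P, F, U, F_star, lam, mu):
    """One pass over H, P, U, F for a single view (one gradient step each)."""
    I = np.eye(H.shape[1])
    # H-step with D = F (.) P; D^T D is computed via the Hadamard identity.
    A = (F.T @ F) * (P.T @ P) + 0.5 * mu * I
    B = 2.0 * unfold(X, 0) @ khatri_rao(F, P) + mu * P - U
    H = grad_step(H, A, B)
    # P-step with E = F (.) H.
    A = (F.T @ F) * (H.T @ H) + 0.5 * mu * I
    B = 2.0 * unfold(X, 1) @ khatri_rao(F, H) + mu * H + U
    P = grad_step(P, A, B)
    # Dual update for the constraint H = P.
    U = U + mu * (H - P)
    # F-step with J = P (.) H.
    A = (P.T @ P) * (H.T @ H) + lam * I
    B = 2.0 * unfold(X, 2) @ khatri_rao(P, H) + 2.0 * lam * F_star
    F = grad_step(F, A, B)
    return H, P, F, U

def consensus(F_list, lams):
    """Closed-form F*: lambda-weighted average of the view-specific embeddings."""
    lams = np.asarray(lams, dtype=float)
    return sum(l * F for l, F in zip(lams, F_list)) / lams.sum()

# Toy run: V = 2 views, M = 5 nodes, N = 6 subjects, R = 2 (random stand-in data).
rng = np.random.default_rng(0)
M, N, R, V = 5, 6, 2, 2
views = []
for _ in range(V):
    W = rng.standard_normal((M, M, N))
    views.append(W + W.transpose(1, 0, 2))  # symmetric frontal slices
H = [rng.standard_normal((M, R)) for _ in range(V)]
P = [h.copy() for h in H]
F = [rng.standard_normal((N, R)) for _ in range(V)]
U = [np.zeros((M, R)) for _ in range(V)]
lams = [1.0, 1.0]
F_star = consensus(F, lams)
for _ in range(20):
    for v in range(V):
        H[v], P[v], F[v], U[v] = m2e_sweep(
            views[v], H[v], P[v], F[v], U[v], F_star, lams[v], mu=1.0)
    F_star = consensus(F, lams)
```

Computing \mathbf{D}^{\mathrm{T}}\mathbf{D} through the Hadamard identity only needs two R×R Gram matrices, so the (NM)×R Khatri-Rao product is formed just once per step, for the \mathbf{X}_{(1)}\mathbf{D} term. The rows of `F_star` would then be passed to a clustering method such as k-means.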

2.6. Experiments and evaluation

2.6.1. Data collection and preprocessing

(1)Human Immunodeficiency Virus Infection (HIV)

        ①Sample: randomly select 35 patients and 35 controls from dataset due to the data imbalance

        ②Atlas: AAL 90

(2)Bipolar Disorder (BP)

        ①Sample: 52 BP and 45 controls

        ②Atlas: self-generated 82 regions

euthymia  n. a normal, stable mood state

2.6.2. Baselines and metrics

        ①Introducing compared models

        ②Grid search for hyper-parameters: \lambda _1,\lambda _2\in\{10^{-4},10^{-2},...,10^{4}\}, and R from \{1,2,...,20\}

2.6.3. Clustering results

        ①Performance comparison table:

2.6.4. Parameter sensitivity analysis

        ①Ablation on \lambda:

        ②Ablation on R:

2.6.5. Factor analysis

        ①The activity intensity of the brain region and the embedded feature \mathbf{F }^{(v)}:

2.7. Conclusion

        They design a novel multi-view multi-graph embedding framework based on partially-symmetric tensor factorization

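3. Supplementary knowledge

3.1. Partially symmetric tensor

(1) Definition: a partially symmetric tensor is a tensor whose entries stay unchanged under certain specific permutations of its indices. This is related to tensor symmetry, but unlike a fully symmetric tensor (whose entries are equal under every permutation of the indices), a partially symmetric tensor only requires equality under some of the index permutations.

(2) Example: for a third-order tensor, it can be called partially symmetric if it satisfies one or more of the following conditions:

        ①x_{ijk}=x_{jik} (the first and second indices are swapped)

        ②x_{ijk}=x_{kji} (the first and third indices are swapped)

        ③x_{ijk}=x_{jik}=x_{kji} (both of the first two conditions hold at once)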
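A quick numerical check of these conditions, as a minimal sketch: `is_partially_symmetric` is a hypothetical helper of mine, and the tensor below is a random stand-in for the stacked tensor \mathcal{X}^{(v)}, built under the assumption that every connectivity matrix is symmetric.

```python
import numpy as np

def is_partially_symmetric(X, mode_a, mode_b, tol=1e-12):
    """True if X is unchanged when the two given modes (0-based) are swapped."""
    if X.shape[mode_a] != X.shape[mode_b]:
        return False  # modes of different size cannot be exchanged
    return bool(np.allclose(X, np.swapaxes(X, mode_a, mode_b), atol=tol))

# An M x M x N tensor whose frontal slices are symmetric matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4, 3))
X = A + A.transpose(1, 0, 2)

sym_first_second = is_partially_symmetric(X, 0, 1)  # condition ①
sym_first_third = is_partially_symmetric(X, 0, 2)   # condition ②
```

Here condition ① holds by construction, while condition ② fails because the first and third modes do not even have the same size. This is the situation in the factorization [[\mathbf{H}^{(v)},\mathbf{H}^{(v)},\mathbf{F}^{(v)}]], where the same factor is used for the first two modes.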

4. Reference

Liu, Y. et al. (2018) 'Multi-View Multi-Graph Embedding for Brain Network Clustering Analysis', AAAI. doi: https://doi.org/10.48550/arXiv.1806.07703
