相关系数及其在fMRI数据中的应用
背景知识
皮尔逊相关系数(Pearson correlation coefficient)
ρ i j = ρ ( X i , X j ) = c o v ( X i , X j ) D ( X i ) D ( X j ) \begin{aligned} \rho_{ij} &= \rho \left( \mathbf{X}_i, \mathbf{X}_j \right) \\ &= \frac{cov \left( \mathbf{X}_i, \mathbf{X}_j \right)} {\sqrt{D(\mathbf{X}_i)} \sqrt{D(\mathbf{X}_j)}} \\ \end{aligned} ρij=ρ(Xi,Xj)=D(Xi)D(Xj)cov(Xi,Xj)
肯德尔和谐系数(Kendall’s W)
Kendall’s W解决的原始问题是在不同人对多个事物进行评分时, 如何对这些人评分结果的一致性进行评价.
假设存在
n
n
n件物体, 并由
m
m
m个不同的人进行评分或排序, 且第
j
j
j个人对第
i
i
i件物体的评分为
r
i
j
r_{ij}
rij(在该问题中, 每个评分实际表示的是排序, 为1到n之间的整数且不重复), 则这些评分可表示为
R
=
{
r
i
j
}
n
×
m
\mathbf{R}=\{ r_{ij} \}_{n \times m}
R={rij}n×m
物体
i
i
i的总分为
R
i
=
∑
j
=
1
m
r
i
j
R_i=\sum_{j=1}^m r_{ij}
Ri=j=1∑mrij
所有物体的平均得分为
R
ˉ
=
1
n
∑
i
=
1
n
R
i
=
1
n
∑
i
=
1
n
∑
j
=
1
m
r
i
j
=
1
n
∑
j
=
1
m
n
(
n
+
1
)
2
=
m
(
n
+
1
)
2
\begin{aligned} \bar{R} &=\frac{1}{n} \sum_{i=1}^n R_i \\ &= \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m r_{ij} \\ &= \frac{1}{n} \sum_{j=1}^m \frac{n(n+1)}{2} \\ &= \frac{m(n+1)}{2} \end{aligned}
Rˉ=n1i=1∑nRi=n1i=1∑nj=1∑mrij=n1j=1∑m2n(n+1)=2m(n+1)
所有物体得分的方差之和为
S
=
∑
i
=
1
n
(
R
i
−
R
ˉ
)
2
S=\sum_{i=1}^n \left( R_i -\bar{R} \right)^2
S=∑i=1n(Ri−Rˉ)2, 则Kendall’s W定义为
W
=
12
S
m
2
(
n
3
−
n
)
W=\frac{12S}{m^2 (n^3-n)}
W=m2(n3−n)12S
其中
m
2
(
n
3
−
n
)
12
\frac{m^2 (n^3-n)}{12}
12m2(n3−n)表示
S
S
S可能出现的最大值, 当且仅当所有人对同一个物体的评分一致时(即评分一致)得到, 而最小值则在
R
i
=
R
ˉ
R_i = \bar{R}
Ri=Rˉ时得到.
max
S
=
∑
i
=
1
n
(
m
i
−
R
ˉ
)
2
=
∑
i
=
1
n
[
m
2
i
2
−
m
2
i
(
n
+
1
)
+
m
2
(
n
+
1
)
2
4
]
=
m
2
n
(
n
+
1
)
⋅
2
n
+
1
6
−
m
2
n
(
n
+
1
)
⋅
n
+
1
4
=
m
2
n
(
n
+
1
)
⋅
n
−
1
12
=
m
2
(
n
3
−
n
)
12
\begin{aligned} \max{S} &= \sum_{i=1}^n \left( mi-\bar{R} \right)^2 \\ &= \sum_{i=1}^n \left[ m^2 i^2 - m^2 i (n+1) + \frac{m^2 (n+1)^2}{4} \right] \\ &= m^2 n (n+1) \cdot \frac{2n+1}{6} - m^2 n (n+1) \cdot \frac{n+1}{4} \\ &= m^2 n (n+1) \cdot \frac{n-1}{12} \\ &= \frac{m^2(n^3-n)}{12} \end{aligned}
maxS=i=1∑n(mi−Rˉ)2=i=1∑n[m2i2−m2i(n+1)+4m2(n+1)2]=m2n(n+1)⋅62n+1−m2n(n+1)⋅4n+1=m2n(n+1)⋅12n−1=12m2(n3−n)
使得
W
W
W的范围缩小至
[
0
,
1
]
\left[0, 1\right]
[0,1].
fMRI数据
功能磁共振成像数据的基本元素为体素的时间序列
X
i
=
[
x
i
1
,
x
i
2
,
⋯
 
,
x
i
T
]
T
\mathbf{X}_i=\left[x_{i1}, x_{i2}, \cdots, x_{iT} \right]^T
Xi=[xi1,xi2,⋯,xiT]T, 则一个被试的全脑数据可表示为
X
=
[
X
1
,
X
2
,
⋯
 
,
X
R
]
T
=
[
x
11
x
12
⋯
x
1
T
x
21
x
22
⋯
x
2
T
⋮
⋮
⋱
⋮
x
R
1
x
R
2
⋯
x
R
T
]
R
×
T
\begin{aligned} \mathbf{X} &=\left[\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_{R} \right]^T \\ &= \left[ \begin{matrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{R1} & x_{R2} & \cdots & x_{RT} \end{matrix} \right]_{R \times T} \end{aligned}
X=[X1,X2,⋯,XR]T=⎣⎢⎢⎢⎡x11x21⋮xR1x12x22⋮xR2⋯⋯⋱⋯x1Tx2T⋮xRT⎦⎥⎥⎥⎤R×T
其中
R
=
W
×
H
×
D
R=W \times H \times D
R=W×H×D表示体素的数量,
W
,
H
,
D
W, H, D
W,H,D分别表示脑影像数据在三个维度上的大小,
T
T
T表示时间序列的长度,
X
r
∈
R
T
\mathbf{X}_r \in \mathbb{R}^T
Xr∈RT表示第
R
R
R个体素的时间序列.
根据AAL(Anatominal Atlas Label)模版, 全脑可分为116个脑区, 即将所有体素划分至116个集合中
R
O
I
i
=
{
j
1
,
j
2
,
⋯
 
,
j
R
i
}
i
=
1
,
2
,
⋯
 
,
116
ROI_i=\{ j_1, j_2, \cdots, j_{R_i} \} \quad i=1,2,\cdots,116
ROIi={j1,j2,⋯,jRi}i=1,2,⋯,116
其中,
R
i
R_i
Ri表示第i个脑区包含的体素数量.
功能连接
功能连接的计算步骤如下
- 计算各脑区的平均时间序列
X ˉ i = 1 R i ∑ j ∈ R O I i X j \mathbf{\bar{X}}_i = \frac{1}{R_i} \sum_{j \in ROI_i} \mathbf{X}_j Xˉi=Ri1j∈ROIi∑Xj - 计算两个脑区之间的Pearson相关系数
ρ i j = ρ ( X ˉ i , X ˉ j ) = c o v ( X ˉ i , X ˉ j ) D ( X ˉ i ) D ( X ˉ j ) \begin{aligned} \rho_{ij} &= \rho \left( \bar{\mathbf{X}}_i, \bar{\mathbf{X}}_j \right) \\ &= \frac{cov \left( \bar{\mathbf{X}}_i, \bar{\mathbf{X}}_j \right)} {\sqrt{D(\bar{\mathbf{X}}_i)} \sqrt{D(\bar{\mathbf{X}}_j)}} \\ \end{aligned} ρij=ρ(Xˉi,Xˉj)=D(Xˉi)D(Xˉj)cov(Xˉi,Xˉj)
由于
ρ
(
X
ˉ
i
,
X
ˉ
j
)
=
ρ
(
α
^
i
X
i
,
α
^
j
X
j
)
\rho \left( \bar{\mathbf{X}}_i, \bar{\mathbf{X}}_j \right)= \rho \left(\boldsymbol{\hat{\alpha}}_i \mathbf{X}_i, \boldsymbol{\hat{\alpha}}_j \mathbf{X}_j \right)
ρ(Xˉi,Xˉj)=ρ(α^iXi,α^jXj), 其中
α
^
i
\boldsymbol{\hat{\alpha}}_i
α^i为长度为
R
i
R_i
Ri, 所有元素为
1
R
i
\frac{1}{R_i}
Ri1的向量. 因此有
ρ
i
j
=
ρ
(
X
ˉ
i
,
X
ˉ
j
)
=
ρ
(
α
i
X
i
,
α
j
X
j
)
=
α
^
i
T
Σ
X
i
X
j
α
^
j
α
^
i
T
Σ
X
i
X
i
α
^
i
α
^
j
T
Σ
X
j
X
j
α
^
j
=
α
i
T
Σ
X
i
X
j
α
j
α
i
T
Σ
X
i
X
i
α
i
α
j
T
Σ
X
j
X
j
α
j
\begin{aligned} \rho_{ij} &= \rho \left( \bar{\mathbf{X}}_i, \bar{\mathbf{X}}_j \right) \\ &= \rho \left( \boldsymbol{\alpha}_i \mathbf{X}_i, \boldsymbol{\alpha}_j \mathbf{X}_j \right) \\ &= \frac{\boldsymbol{\hat{\alpha}}_i^T \Sigma_{\mathbf{X}_i \mathbf{X}_j} \boldsymbol{\hat{\alpha}}_j} {\sqrt{\boldsymbol{\hat{\alpha}}_i^T \Sigma_{\mathbf{X}_i \mathbf{X}_i} \boldsymbol{\hat{\alpha}}_i} \sqrt{\boldsymbol{\hat{\alpha}}_j^T \Sigma_{\mathbf{X}_j \mathbf{X}_j} \boldsymbol{\hat{\alpha}}_j}} \\ &= \frac{\boldsymbol{\alpha}_i^T \Sigma_{\mathbf{X}_i \mathbf{X}_j} \boldsymbol{\alpha}_j} {\sqrt{\boldsymbol{\alpha}_i^T \Sigma_{\mathbf{X}_i \mathbf{X}_i} \boldsymbol{\alpha}_i} \sqrt{\boldsymbol{\alpha}_j^T \Sigma_{\mathbf{X}_j \mathbf{X}_j} \boldsymbol{\alpha}_j}} \\ \end{aligned}
ρij=ρ(Xˉi,Xˉj)=ρ(αiXi,αjXj)=α^iTΣXiXiα^iα^jTΣXjXjα^jα^iTΣXiXjα^j=αiTΣXiXiαiαjTΣXjXjαjαiTΣXiXjαj
其中
Σ
X
i
X
j
\Sigma_{\mathbf{X}_i \mathbf{X}_j}
ΣXiXj表示随机向量
X
i
\mathbf{X}_i
Xi与
X
j
\mathbf{X}_j
Xj的协方差矩阵,
α
i
\boldsymbol{\alpha}_i
αi与
α
j
\boldsymbol{\alpha}_j
αj分别表示所有元素为1, 长度为
R
i
,
R
j
R_i, R_j
Ri,Rj的向量.
局部一致性(Regional Homogeneity)(此章节有误)
局部一致性的计算过程如下
- 选取中心体素 X i = [ x i 1 , x i 2 , ⋯   , x i T ] T \mathbf{X}_i=[x_{i1}, x_{i2}, \cdots, x_{iT}]^T Xi=[xi1,xi2,⋯,xiT]T
- 以体素
X
i
\mathbf{X}_i
Xi为中心选取周围的体素(在DPARSF standard processing pipeline中, 为相邻的
K
=
27
K=27
K=27个体素)
X ~ i = [ X i 1 , X i 2 , ⋯   , X i K ] T = [ x i 1 1 x i 1 2 ⋯ x i 1 T x i 2 1 x i 2 2 ⋯ x i 1 T ⋮ ⋮ ⋱ ⋮ x i K 1 x i K 2 ⋯ x i K T ] \begin{aligned} \widetilde{\mathbf{X}}_i &=\left[ \mathbf{X}_{i_1}, \mathbf{X}_{i_2}, \cdots, \mathbf{X}_{i_{K}} \right]^T \\ &= \left[ \begin{matrix} x_{i_1 1} & x_{i_1 2} & \cdots & x_{i_1 T} \\ x_{i_2 1} & x_{i_2 2} & \cdots & x_{i_1 T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i_{K} 1} & x_{i_{K} 2} & \cdots & x_{i_{K} T} \end{matrix} \right] \end{aligned} X i=[Xi1,Xi2,⋯,XiK]T=⎣⎢⎢⎢⎡xi11xi21⋮xiK1xi12xi22⋮xiK2⋯⋯⋱⋯xi1Txi1T⋮xiKT⎦⎥⎥⎥⎤ - 根据Kendall’s W的计算, 得到该体素的局部一致性
(1) 计算所有时间点的和
X ~ i t S = ∑ j = 1 K x i j t t = 1 , 2 , ⋯   , T \widetilde{\mathbf{X}}_{it}^S=\sum_{j=1}^{K} x_{i_j t} \quad t=1,2, \cdots, T X itS=j=1∑Kxijtt=1,2,⋯,T
也可以表示成为矩阵与向量的形式
X ~ i S = ∑ j = 1 K X i j = α X ~ i \begin{aligned} \widetilde{\mathbf{X}}_i^S &= \sum_{j=1}^{K} \mathbf{X}_{i_j} \\ &= \boldsymbol{\alpha} \widetilde{\mathbf{X}}_i \end{aligned} X iS=j=1∑KXij=αX i
其中 α = [ 1 , 1 , ⋯   , 1 ] \boldsymbol{\alpha}=[1, 1, \cdots, 1] α=[1,1,⋯,1]是维数为 1 × 27 1 \times 27 1×27的向量.
(2) 计算所有体素的平均值
X ˉ i = 1 T ∑ j = 1 K ∑ t = 1 T x i j t = 1 T ∑ t = 1 T X ~ i t S \begin{aligned} \bar{\mathbf{X}}_i &= \frac{1}{T} \sum_{j=1}^{K} \sum_{t=1}^T x_{i_j t} \\ &= \frac{1}{T} \sum_{t=1}^T \widetilde{\mathbf{X}}_{it}^S \end{aligned} Xˉi=T1j=1∑Kt=1∑Txijt=T1t=1∑TX itS
(3) 计算Kendall’s W
W i = 12 S i K 2 ( T 3 − T ) W_i = \frac{12S_i}{K^2 (T^3 - T)} Wi=K2(T3−T)12Si
其中 S S S表示所有时间序列和的方差
S i = D ( X ~ i S ) = D ( α X ~ i ) = ∑ j = 1 K D ( X i j ) = α T Σ X ~ i X ~ i α \begin{aligned} S_i &= D(\widetilde{\mathbf{X}}_i^S ) = D(\boldsymbol{\alpha} \widetilde{\mathbf{X}}_i) \\ &= \sum_{j=1}^K D(\mathbf{X}_{i_j}) \\ &= \boldsymbol{\alpha}^T \Sigma_{\widetilde{\mathbf{X}}_i \widetilde{\mathbf{X}}_i} \boldsymbol{\alpha} \end{aligned} Si=D(X iS)=D(αX i)=j=1∑KD(Xij)=αTΣX iX iα
因此
W i = 12 K 2 ( T 3 − T ) S i = 12 K 2 ( T 3 − T ) ∑ j = 1 K D ( X i j ) = 12 K 2 ( T 3 − T ) α T Σ X ~ i X ~ i α \begin{aligned} W_i &= \frac{12}{K^2(T^3-T)} S_i \\ &= \frac{12}{K^2(T^3-T)} \sum_{j=1}^K D(\mathbf{X}_{i_j}) \\ &= \frac{12}{K^2(T^3-T)} \boldsymbol{\alpha}^T \Sigma_{\widetilde{\mathbf{X}}_i \widetilde{\mathbf{X}}_i} \boldsymbol{\alpha} \end{aligned} Wi=K2(T3−T)12Si=K2(T3−T)12j=1∑KD(Xij)=K2(T3−T)12αTΣX iX iα