Independent Component Analysis
1. Problem Setup
Blind source separation (BSS) is the best-known application of the ICA algorithm.
- Blind source separation: there are $L$ signal sources and $D$ sensors. The $D$ sensors receive mixtures of the $L$ source signals, sampled $m$ times, yielding the data $x=\{x^{(i)}, i = 1, 2, \cdots, m\}$. The goal is to recover from $x$ the $L$ independent source signals $s=\{s^{(i)}, i = 1, 2, \cdots, m\}$, where $x^{(i)}$ is a $D\times 1$ signal and $s^{(i)}$ is an $L\times 1$ signal. The mixing process can be described by the equation:
$$X_{D\times m} = A_{D\times L}S_{L \times m}$$
The matrix $A$ is called the mixing matrix.
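As a concrete illustration (my own sketch, not part of the original post), the mixing model $X = AS$ can be simulated with NumPy; the toy sources and the random mixing matrix below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

L, m = 2, 500                   # 2 sources, 500 samples
t = np.linspace(0, 1, m)

# S is L x m: each row is one independent source signal
S = np.vstack([np.sin(2 * np.pi * 5 * t),             # sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t))])   # square wave

D = 2                           # number of sensors (here D = L)
A = rng.normal(size=(D, L))     # random mixing matrix

X = A @ S                       # X is D x m: the sensor recordings
print(X.shape)                  # (2, 500)
```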
2. ICA – Maximum Likelihood Estimation
Here we consider the case where the numbers of sources and sensors are equal, i.e. $D = L$. The matrix $A$ is then square, and its inverse $W = A^{-1}$ is called the unmixing matrix. The goal of blind source separation is to find $W$ such that $S = WX$. For convenience, the row vectors of $W$ are written $w_i^{T}, i=1, \cdots, L$.
2.1 ICA Ambiguities
- One property of ICA is that the amplitudes of the source signals cannot be recovered; the sources are identifiable only up to scaling and permutation.
- Gaussian-distributed signals cannot be blindly separated, because the Gaussian distribution is rotationally symmetric, so any rotation of the sources is indistinguishable.
The reason the Gaussian distribution is disallowed as a source prior in ICA is that it does not permit unique recovery of the sources, as illustrated in Figure 12.20(c). This is because the PCA likelihood is invariant to any orthogonal transformation of the sources zt and mixing matrix W. PCA can recover the best linear subspace in which the signals lie, but cannot uniquely recover the signals themselves. – MLAPP
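The scaling ambiguity is easy to demonstrate numerically (a small sketch of mine, not from the post): rescaling the sources while absorbing the inverse scaling into the mixing matrix leaves the observations unchanged, so the original amplitudes cannot be recovered from $x$ alone.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))      # arbitrary mixing matrix
S = rng.normal(size=(2, 100))    # arbitrary sources
X = A @ S

# Rescale the sources by C and absorb C^{-1} into the mixing matrix:
# the observed data is identical, so amplitudes are unidentifiable.
C = np.diag([3.0, -0.5])
X2 = (A @ np.linalg.inv(C)) @ (C @ S)
print(np.allclose(X, X2))        # True
```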
2.2 ICA Algorithm
Assume the $i$-th source has distribution $p_i(s)$. Since the sources are mutually independent, the joint distribution of the source signal $s_j$ at time $j$ is
$$p(s_j) = \prod_{i=1}^{L}p_i(s_{i, j})$$
Then, by the change of variables $s_j = W x_j$ (which contributes the Jacobian factor $|W|$), the joint distribution of the received signal $x_j$ at time $j$ is
$$p(x_j) = \prod_{i=1}^L p_i(w_i^Tx_j)\,|W|$$
The likelihood function is then:
$$L(W)=\prod_{j=1}^{m}p(x_j)=\prod_{j=1}^{m}\left(\prod_{i=1}^L p_i(w_i^Tx_j)|W|\right)$$
As usual, the log-likelihood is taken as the optimization objective:
$$\begin{aligned} J(W) & =\log L(W) \\ & = \sum_{j = 1}^{m}\left(\sum_{i=1}^L \log p_i(w_i^Tx_j) + \log|W|\right) \end{aligned}$$
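As a sketch (not part of the original post), the objective $J(W)$ can be evaluated directly with NumPy, assuming the sigmoid-based source density $p_i(s) = F(s)(1-F(s))$ that the post adopts below; the test data here is random and purely illustrative.

```python
import numpy as np

def log_likelihood(W, X):
    """J(W) = sum_j ( sum_i log p(w_i^T x_j) + log|det W| )
    with the logistic source density p(s) = F(s)(1 - F(s))."""
    S_hat = W @ X                          # L x m matrix of w_i^T x_j
    F = 1.0 / (1.0 + np.exp(-S_hat))       # logistic CDF
    log_p = np.log(F) + np.log(1.0 - F)    # log F(s)(1 - F(s))
    m = X.shape[1]
    # |W| denotes the determinant; take abs so the log is defined
    return log_p.sum() + m * np.log(abs(np.linalg.det(W)))

rng = np.random.default_rng(2)
X = rng.normal(size=(2, 50))
print(log_likelihood(np.eye(2), X))
```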
Since the source distributions are not known in advance, an assumption must be made. A reasonable choice is to take the cumulative distribution function (CDF) of each source to be the sigmoid $F(s) = \frac{1}{1 + \exp(-s)}$, so the probability density function (PDF) is:
$$f(s) = \frac{d}{ds}F(s)=F(s)(1-F(s))$$
For convenience in the derivation below, note that
$$f'(s) = f(s)(1 - 2 F(s))$$
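This identity is easy to check numerically with a central finite difference (a small sketch of mine, not from the post):

```python
import numpy as np

F = lambda s: 1.0 / (1.0 + np.exp(-s))          # sigmoid CDF
f = lambda s: F(s) * (1.0 - F(s))               # pdf f = F'
f_prime = lambda s: f(s) * (1.0 - 2.0 * F(s))   # claimed derivative

s = np.linspace(-4, 4, 9)
eps = 1e-6
numeric = (f(s + eps) - f(s - eps)) / (2 * eps)  # central difference
print(np.max(np.abs(numeric - f_prime(s))))      # close to zero
```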
Optimization uses stochastic gradient ascent together with the identity $\nabla_{W}|W|=|W|(W^{-1})^T$. For a single sample $x_j$, the gradient with respect to the entry $w_{ik}$ of $W$ is:
$$\begin{aligned} \frac{\partial{J(W)}}{\partial w_{ik}} &= \frac{\partial \left(\sum_{i=1}^L \log f(w_i^Tx_j) + \log|W|\right)}{\partial{w_{ik}}}\\ & = \frac{1}{f(w_i^Tx_j)}f(w_i^Tx_j)(1-2F(w_i^Tx_j))x_{kj} + \frac{1}{|W|}|W|(W^{-1})^T_{ik}\\ & =(1-2F(w_i^Tx_j))x_{kj} + (W^{-1})^T_{ik} \end{aligned}$$
where $x_{kj}$ is the $k$-th component of $x_j$.
In matrix form:
$$\frac{\partial{J(W)}}{\partial W} = \begin{bmatrix} 1 - 2F(w_1^Tx_j) \\ 1 - 2F(w_2^Tx_j) \\ \vdots \\ 1 - 2F(w_L^Tx_j) \end{bmatrix} x_j^T + (W^{-1})^T$$
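As a sanity check (my own sketch, not from the post), the matrix-form gradient for a single sample can be compared against finite differences of the single-sample objective:

```python
import numpy as np

F = lambda s: 1.0 / (1.0 + np.exp(-s))

def J(W, x):
    # single-sample log-likelihood: sum_i log f(w_i^T x) + log|det W|
    s = W @ x
    return np.sum(np.log(F(s) * (1 - F(s)))) + np.log(abs(np.linalg.det(W)))

def grad_J(W, x):
    # matrix-form gradient derived above
    s = W @ x
    return np.outer(1 - 2 * F(s), x) + np.linalg.inv(W).T

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# central finite differences, element by element
num = np.zeros_like(W)
eps = 1e-6
for i in range(3):
    for k in range(3):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, k] += eps
        Wm[i, k] -= eps
        num[i, k] = (J(Wp, x) - J(Wm, x)) / (2 * eps)

print(np.allclose(num, grad_J(W, x), atol=1e-5))  # True
```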
The gradient ascent update (the gradient is added, since the log-likelihood is maximized) is:
$$W = W + \alpha \frac{\partial{J(W)}}{\partial W}$$
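Putting the pieces together, here is a minimal stochastic-gradient ICA sketch; the toy sources, learning rate, and epoch count are arbitrary choices of mine, not from the post.

```python
import numpy as np

F = lambda s: 1.0 / (1.0 + np.exp(-s))   # assumed sigmoid source CDF

def ica(X, alpha=0.005, n_epochs=30, seed=0):
    """Learn the unmixing matrix W by stochastic gradient ascent on J(W)."""
    rng = np.random.default_rng(seed)
    D, m = X.shape
    W = np.eye(D)
    for _ in range(n_epochs):
        for j in rng.permutation(m):              # one sample at a time
            x = X[:, j]
            grad = np.outer(1 - 2 * F(W @ x), x) + np.linalg.inv(W).T
            W += alpha * grad                     # ascent step
    return W

# Demo: mix two toy sources, then unmix.
t = np.linspace(0, 1, 400)
S = np.vstack([np.sin(2 * np.pi * 7 * t),
               np.sign(np.sin(2 * np.pi * 3 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S
W = ica(X)
S_hat = W @ X   # should approximate S, up to permutation and scaling
```

In practice the natural-gradient variant of this update is often preferred because it avoids the per-step matrix inverse, but the plain gradient above matches the derivation in this post.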
References
- Christopher M. Bishop. Pattern Recognition and Machine Learning
- Kevin P. Murphy. Machine Learning A Probabilistic Perspective
- Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes11.pdf