Spectral Clustering
The Idea of Spectral Clustering
Spectral clustering is rooted in graph theory. Each sample in the dataset to be clustered is treated as a vertex of a graph; the vertices are connected by weighted edges, where the weight measures the similarity between the two samples. Vertices belonging to the same cluster are highly similar, which in graph terms means the edges connecting them carry large weights, while edges between vertices of different clusters carry small weights. The goal of spectral clustering is therefore to find a way of cutting the graph such that, after the cut, the edge weights within each subgraph are large and the weights between subgraphs are small.
---------------------------------------------------------------------------------------------------------------------------------------
Input: sample set $X=\{x_1,x_2,...,x_N\}$ and the number of clusters K
Output: a clustering $C^*$ of the sample set
Optimization problem:
$$\underset{\{A_k\}_{k=1}^K}{min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{min}\ \sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}$$

$$\Rightarrow\{A_k\}_{k=1}^K=arg\ \underset{\{A_k\}_{k=1}^K}{min}\ Ncut(V)=arg\ \underset{\{A_k\}_{k=1}^K}{min}\ \sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}$$

$$\Rightarrow\hat Y=arg\ \underset{Y}{min}\ \sum_{k=1}^K\frac{y_k^TLy_k}{y_k^TDy_k}$$

Here $y_k$ is the k-th column of $Y$. $Y\in R^{N\times K}$ is the cluster indicator matrix, in which $y_{il}=1$ indicates that $x_i$ is assigned to the l-th cluster:

$$\begin{cases} y_i \in\{0,1\}^K \\ \sum_{j=1}^Ky_{ij}=1 \end{cases}\qquad y_i= \begin{bmatrix} y_{i1}\\ y_{i2}\\ ... \\ y_{iK} \end{bmatrix}$$

Equivalently, in trace form:

$$\hat{Y}=arg\ \underset{Y}{min}\ Tr(Y^TLY(Y^TDY)^{-1})$$
Here $L=D-W$ is the Laplacian matrix.
---------------------------------------------------------------------------------------------------------------------------------------
Derivation
Notation (graph-based, i.e., a weighted undirected graph):
Sample data: $X=(x_1,...,x_N)^\top$
Undirected graph: $G=\{V,E\}$
Vertex set: $V=\{1,2,...,N\}\Leftrightarrow X$
Edge set: $E$, represented by the similarity matrix (affinity matrix)
Weight matrix: W

$$W= \begin{bmatrix} w_{11} & w_{12} & ... & w_{1N} \\ w_{21} & w_{22} & ... & w_{2N}\\ ... & ... & ... & ...\\ w_{N1} & w_{N2} & ... & w_{NN} \end{bmatrix} =[w_{ij}],\quad 1\le i,j\le N$$

where

$$w_{ij}= \begin{cases} K(x_i,x_j)=exp\{-\frac{||x_i-x_j||_2^2}{2\theta ^2} \} & \text{if } (i,j)\in E \\ 0 & \text{if } (i,j)\notin E \end{cases}$$
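As a quick illustration, the Gaussian-kernel weight matrix above can be computed as follows. This is a minimal NumPy sketch for the fully connected case (every pair $(i,j)$ is treated as an edge); the function name `gaussian_affinity` and the sample points are ours, not from the text, and $\theta$ is a user-chosen bandwidth.

```python
import numpy as np

def gaussian_affinity(X, theta=1.0):
    """Weight matrix w_ij = exp(-||x_i - x_j||^2 / (2 theta^2)),
    assuming a fully connected graph (every pair is an edge)."""
    # Pairwise squared Euclidean distances via broadcasting: (N, N).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * theta ** 2))

# Two nearby points and one far-away point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W = gaussian_affinity(X, theta=1.0)
```

Nearby samples get weights close to 1, distant samples get weights close to 0, matching the similarity interpretation of $w_{ij}$.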
Degree of vertex i: $d_i=\sum_{j=1}^Nw_{ij}$
Degree matrix:

$$D=diag(W\cdot\mathbf{1}_N)= \begin{bmatrix} d_1 & 0 & ... & 0 \\ 0 & d_2 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & d_N \end{bmatrix}= \begin{bmatrix} \sum_{j=1}^Nw_{1j} & 0 & ... & 0 \\ 0 & \sum_{j=1}^Nw_{2j} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{j=1}^Nw_{Nj} \end{bmatrix}$$
Laplacian matrix: $L=D-W$
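These two definitions translate directly into code. A small sketch (the function name is ours); note that each row of $L$ sums to zero, a basic property of the graph Laplacian:

```python
import numpy as np

def degree_and_laplacian(W):
    """D = diag(W @ 1_N), L = D - W."""
    d = W.sum(axis=1)   # d_i = sum_j w_ij
    D = np.diag(d)
    L = D - W
    return D, L

W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.3],
              [0.2, 0.3, 0.0]])
D, L = degree_and_laplacian(W)
```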
Related definitions:
For two sets A and B that are subsets of the vertex set V with empty intersection, define:
$$A \subset V,\ B \subset V,\ A\cap B=\emptyset\ \Rightarrow\ W(A,B)=\sum_{i\in A,j\in B} w_{ij}$$
That is, for two clusters A and B, $W(A,B)$ sums the weights of all edges running from vertices in one cluster to vertices in the other.
More formally, the cut is defined as follows. Suppose there are K clusters, i.e., the vertex set is partitioned into K subsets:
$$V=\cup_{k=1}^K A_k=A_1\cup A_2\cup A_3\cup ...\cup A_K$$
$$Cut(V)=Cut(A_1,...,A_K)=\sum_{k=1}^K W(A_k,\overline A_k)=\sum_{k=1}^K W(A_k,V)-\sum_{k=1}^K W(A_k,A_k)$$
Our objective is $\underset{\{A_k\}_{k=1}^K}{min}\ Cut(V)$ (after cutting, the weights within each subgraph are large and the weights between subgraphs are small). However, using Cut directly as the objective is problematic: as shown in the figure below, cutting a single low-weight edge at the fringe of the graph, for example between C and H, minimizes $Cut(A_1,A_2,...,A_K)$ but is not the optimal partition.
So we normalize Cut.
Definition of Ncut: normalize by dividing by the degree of each subset, written $degree(A_k)$.
$$cut(V)=\sum_{k=1}^KW(A_k,\overline A_k)$$

$$\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\Delta},\qquad \Delta=degree(A_k)=\sum_{i\in A_k}d_i,\quad d_i=\sum_{j=1}^Nw_{ij}$$

$$\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i} =\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}d_i} =\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}\sum_{j=1}^Nw_{ij}}$$
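The Ncut of a given partition can be evaluated directly from this sum form. A sketch (the function name `ncut` and the toy weight matrix are ours); on the toy graph below, the partition that cuts only the weak edges scores lower (better) than one that cuts the strong edge:

```python
import numpy as np

def ncut(W, clusters):
    """Ncut = sum_k W(A_k, complement of A_k) / sum_{i in A_k} d_i."""
    d = W.sum(axis=1)
    n = W.shape[0]
    total = 0.0
    for A in clusters:
        A = list(A)
        comp = [i for i in range(n) if i not in A]
        total += W[np.ix_(A, comp)].sum() / d[A].sum()
    return total

# Vertices 0 and 1 are strongly connected; vertex 2 is loosely attached.
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
good = ncut(W, [[0, 1], [2]])   # cuts only the weak edges
bad = ncut(W, [[0], [1, 2]])    # cuts the strong edge w_01
```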
Optimization objective:

$$\underset{\{A_k\}_{k=1}^K}{min}\ Ncut(V)$$
Model
$$\underset{\{A_k\}_{k=1}^K}{min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{min}\ \sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}$$

$$\Rightarrow\{A_k\}_{k=1}^K=arg\ \underset{\{A_k\}_{k=1}^K}{min}\ Ncut(V)=arg\ \underset{\{A_k\}_{k=1}^K}{min}\ \sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}$$
Introduce the indicator vector:
$$\begin{cases} y_i \in\{0,1\}^K \\ \sum_{j=1}^Ky_{ij}=1 \end{cases}\qquad y_i= \begin{bmatrix} y_{i1}\\ y_{i2}\\ ... \\ y_{iK} \end{bmatrix}$$

$y_{ij}=1 \Leftrightarrow$ sample i belongs to cluster j
$$Y=[y_1,...,y_N]^\top\in R^{N\times K}$$

The goal is to transform the problem into: $\hat Y=arg\ \underset{Y}{min}\ Ncut(V)$
Writing Ncut in matrix form:
$$Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i} =Tr \begin{bmatrix} \frac{W(A_1,\overline A_1)}{\sum_{i\in A_1}d_i} & 0 & ... & 0 \\ 0 & \frac{W(A_2,\overline A_2)}{\sum_{i\in A_2}d_i} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \frac{W(A_K,\overline A_K)}{\sum_{i\in A_K}d_i} \end{bmatrix}$$

$$=Tr\left( \begin{bmatrix} W(A_1,\overline A_1) & 0 & ... & 0 \\ 0 & W(A_2,\overline A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,\overline A_K) \end{bmatrix} \begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}^{-1}\right)$$
Denote:

$$O_{K\times K}= \begin{bmatrix} W(A_1,\overline A_1) & 0 & ... & 0 \\ 0 & W(A_2,\overline A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,\overline A_K) \end{bmatrix}\qquad P_{K\times K}= \begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}$$
The problem now becomes: $\underset{\{A_k\}_{k=1}^K}{min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{min}\ Tr(OP^{-1})$
Given W and Y, we want to express O and P in terms of Y and W.
First, solve for P:
$$Y^\top Y=[y_1,...,y_N] \begin{bmatrix} y_{1}^T\\ y_{2}^T\\ ... \\ y_{N}^T \end{bmatrix} =\sum_{i=1}^Ny_iy_i^T= \begin{bmatrix} N_1 & 0 & ... & 0 \\ 0 & N_2 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & N_K \end{bmatrix}_{K\times K} = \begin{bmatrix} \sum_{i\in A_1}1 & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}1 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}1 \end{bmatrix}_{K\times K}$$
Here $N_k$ is the number of samples belonging to cluster k, so $\sum_{k=1}^KN_k=N$ and $N_k=|A_k|=\sum_{i\in A_k}1$.
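This identity is easy to verify numerically. A tiny sketch with a hand-built one-hot indicator matrix (4 samples, 2 clusters): $Y^TY$ should be the diagonal matrix of cluster sizes $diag(N_1, N_2)$.

```python
import numpy as np

# One-hot indicator matrix: samples 0, 1, 2 -> cluster 0; sample 3 -> cluster 1.
Y = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 1]])

YtY = Y.T @ Y   # expected: diag(N_1, N_2) = diag(3, 1)
```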
$$\sum_{i=1}^Ny_id_iy_i^T=y_1d_1y_1^T+y_2d_2y_2^T+...+y_Nd_Ny_N^T=Y^TDY$$

$$P_{K\times K}= \begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}=Y^TDY$$

where

$$D= \begin{bmatrix} d_1 & 0 & ... & 0 \\ 0 & d_2 & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & d_N \end{bmatrix}=diag(W\cdot\mathbf{1}_N)= \begin{bmatrix} \sum_{j=1}^Nw_{1j} & 0 & ... & 0 \\ 0 & \sum_{j=1}^Nw_{2j} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{j=1}^Nw_{Nj} \end{bmatrix}$$
So the solution for P is:

$$P=Y^TDY,\qquad D=diag(W\cdot\mathbf{1}_N)$$
Next, solve for O:
$$O_{K\times K}= \begin{bmatrix} W(A_1,\overline A_1) & 0 & ... & 0 \\ 0 & W(A_2,\overline A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K,\overline A_K) \end{bmatrix}$$

$$W(A_k,\overline{A_k})=\underbrace{W(A_k,V)}_{\sum_{i\in A_k}d_i}-\underbrace{W(A_k,A_k)}_{\sum_{i\in A_k}\sum_{j\in A_k}w_{ij}}$$

$$\Rightarrow O= \begin{bmatrix} \sum_{i\in A_1}d_i & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}d_i & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & \sum_{i\in A_K}d_i \end{bmatrix}- \begin{bmatrix} W(A_1,A_1) & 0 & ... & 0 \\ 0 & W(A_2, A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K, A_K) \end{bmatrix}$$
We already know the first matrix equals $Y^TDY$. Now consider the second part:

$$\begin{bmatrix} W(A_1,A_1) & 0 & ... & 0 \\ 0 & W(A_2, A_2) & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ... & W(A_K, A_K) \end{bmatrix} =\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ...& \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}$$
We conjecture that the second part can be replaced by $Y^TWY$. Let us verify:
$Y^TWY$ has dimension $K\times K$:

$$Y^TWY=[y_1,...,y_N] \begin{bmatrix} w_{11} & w_{12} & ... & w_{1N} \\ w_{21} & w_{22} & ... & w_{2N}\\ ... & ... & ... & ...\\ w_{N1} & w_{N2} & ... & w_{NN} \end{bmatrix} \begin{bmatrix} y_{1}^T\\ y_{2}^T\\ ... \\ y_{N}^T \end{bmatrix} =\left[\sum_{i=1}^Ny_iw_{i1},...,\sum_{i=1}^Ny_iw_{iN}\right] \begin{bmatrix} y_{1}^T\\ y_{2}^T\\ ... \\ y_{N}^T \end{bmatrix}$$

$$=\sum_{i=1}^N\sum_{j=1}^Ny_iw_{ij}y_j^T=\sum_{i=1}^N\sum_{j=1}^Ny_iy_j^Tw_{ij} =\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\ \sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\ ... & ... & ... & ...\\ \sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}$$

The last step holds because $y_iy_j^T$ has a single 1 at position $(k,l)$ when $i\in A_k$ and $j\in A_l$.
Compare this with the second part of O:

$$\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & ... & 0 \\ 0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & 0\\ ... & ... & ... & ...\\ 0 & 0 & ...& \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}$$

$$\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\ \sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\ ... & ... & ... & ...\\ \sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}$$
These two matrices have the same diagonal entries, and since we minimize a trace, only the diagonal entries matter. So replacing the second part of O with $Y^TWY$ does not affect the result.
Denote $O'=Y^TDY - Y^TWY$. Since $P^{-1}$ is diagonal, $O'P^{-1}$ only rescales the diagonal of $O'$, and therefore $Tr(OP^{-1})=Tr(O'P^{-1})$.
We have thus solved for O and shown that replacing O with O' serves the same purpose:

$$O'=Y^TDY-Y^TWY$$
Our final optimization problem becomes:
$$\hat Y=arg\ \underset{Y}{min}\ Tr(Y^T(D-W)Y(Y^TDY)^{-1})=arg\ \underset{Y}{min}\ Tr(Y^TLY(Y^TDY)^{-1})$$

Here $L=D-W$ is the Laplacian matrix.
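The equivalence between the sum form of Ncut and this trace form can be checked numerically. A sketch on a 3-vertex toy graph (the matrices below are ours): with a one-hot $Y$, $Tr(Y^TLY(Y^TDY)^{-1})$ should reproduce $\sum_k W(A_k,\overline A_k)/\sum_{i\in A_k}d_i$.

```python
import numpy as np

W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
d = W.sum(axis=1)
D = np.diag(d)
L = D - W

# One-hot indicator for the partition {0, 1} | {2}.
Y = np.array([[1, 0], [1, 0], [0, 1]])

trace_form = np.trace(Y.T @ L @ Y @ np.linalg.inv(Y.T @ D @ Y))

# Ncut by definition: sum_k W(A_k, complement) / vol(A_k).
sum_form = (W[np.ix_([0, 1], [2])].sum() / d[[0, 1]].sum()
            + W[np.ix_([2], [0, 1])].sum() / d[[2]].sum())
```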
To minimize $Tr(Y^T LY(Y^T DY)^{-1})$:
Recall $Y\in R^{N\times K}$: each row is one-hot, indicating which cluster sample i belongs to.
$Y$ looks like:

$$\begin{bmatrix} 0 & ...& 0 & 1 & 0 &... & 0 \\ 0 & ...&1 & 0 & 0 &... & 0\\ ... & ...& ... & ... & ... & ... & ...\\ 0 & ...& 0 & 0 & 0 &... & 1\\ 1 & ...&0 & 0 & 0 &... & 0 \end{bmatrix}$$
Denote:

$$P=Y^TDY=diag\left(\sum_{i\in A_1}d_i,\sum_{i\in A_2}d_i,...,\sum_{i\in A_K}d_i\right)=diag(p_1,p_2,...,p_K)$$

The objective then becomes:

$$Tr(Y^TLYP^{-1})=Tr(Y^TLYP^{-\frac12}P^{-\frac12})=Tr(P^{-\frac12}Y^TLYP^{-\frac12})$$
Denote:

$$H=YP^{-\frac12},\qquad H^T=P^{-\frac12}Y^T$$

Note that $H^TH=P^{-\frac12}Y^TYP^{-\frac12}=P^{-\frac12}\,diag(N_1,...,N_K)\,P^{-\frac12}$, which is not the identity in general, since $Y^TY=diag(N_1,...,N_K)$.
The objective becomes:

$$Tr(H^TLH)$$
Theorem 1:
For a positive semidefinite matrix L with eigenvalues $0\le\lambda_1\le\lambda_2\le...\le\lambda_n$ and orthonormal eigenbasis $\{\overline v_1,\overline v_2,...,\overline v_n\}$ (the eigenvectors after orthonormalization): when $\mathbf{x}\in R^{N}$ and $\mathbf{x}^T\mathbf{x}=1$, the minimum of $\mathbf{x}^TL\mathbf{x}$ is attained at $\mathbf{x}=\overline v_1$.
Proof:
Since the eigenbasis is orthonormal, $\mathbf{x}$ can be written in that basis:

$$\mathbf{x}=c_1\overline v_1+c_2\overline v_2+...+c_n\overline v_n$$

$$L\mathbf{x}=c_1\lambda_1\overline v_1+c_2\lambda_2\overline v_2+...+c_n\lambda_n\overline v_n$$

$$\Rightarrow\mathbf{x}^TL\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+...+c_n^2\lambda_n$$

Since $\mathbf{x}^T\mathbf{x}=1$, we have $c_1^2+c_2^2+...+c_n^2=1$, so

$$\mathbf{x}^TL\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+...+c_n^2\lambda_n\ge\lambda_1$$

with equality when $c_1^2=1$ and $c_i=0$ for $i\ne1$, i.e., $\mathbf{x}=\overline v_1$ or $\mathbf{x}=-\overline v_1$.
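Theorem 1 can be sanity-checked numerically on a small Laplacian. A sketch using `numpy.linalg.eigh`, which returns eigenvalues in ascending order with orthonormal eigenvectors: the quadratic form at the first eigenvector equals $\lambda_1$, and no other unit vector does better.

```python
import numpy as np

W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
L = np.diag(W.sum(axis=1)) - W

eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
v1 = eigvecs[:, 0]                     # eigenvector of the smallest eigenvalue

at_v1 = v1 @ L @ v1                    # should equal lambda_1

# A random unit vector cannot beat lambda_1.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
x /= np.linalg.norm(x)
```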
Theorem 2:
For a positive semidefinite matrix L with eigenvalues $0\le\lambda_1\le\lambda_2\le...\le\lambda_n$ and orthonormal eigenbasis $\{\overline v_1,\overline v_2,...,\overline v_n\}$: when $F\in R^{N\times K}$ and $F^TF=I$, the minimum of $Tr(F^TLF)$ is attained at $F=[\overline v_1,\overline v_2,...,\overline v_K]$.
Proof:
Denote $F=[f_1,f_2,...,f_K]$. Then

$$Tr(F^TLF)=\sum_{i=1}^Kf_i^TLf_i$$

By Theorem 1, $Tr(F^TLF)$ is minimized when $f_1=\overline v_1,\ f_2=\overline v_2,\ ...,\ f_K=\overline v_K$.
Because $F^TF=I$, the columns of F are orthonormal, so they cannot all be $\overline v_1$.
The original problem $Tr(H^TLH)$ does not satisfy $H^TH=I$, so Theorem 2 cannot be applied directly; we therefore transform H.
$$H^TDH=P^{-\frac12}Y^TDYP^{-\frac12}=P^{-\frac12}PP^{-\frac12}=I$$

Let $F=D^{\frac12}H$. Then

$$F^TF=(D^{\frac12}H)^TD^{\frac12}H=H^TD^{\frac12}D^{\frac12}H=H^TDH=I$$

so $H=D^{-\frac12}F$ and

$$Tr(H^TLH)=Tr(F^TD^{-\frac12}LD^{-\frac12}F),\qquad F^TF=I$$
This gives the final optimization objective:
$$\underset{F}{min}\ Tr(F^TD^{-\frac12}LD^{-\frac12}F)\qquad s.t.\ F^TF=I$$
By Theorem 2, F is given by the K eigenvectors of $D^{-\frac12}LD^{-\frac12}$ with the smallest eigenvalues. Finally, run k-means on the solved F to obtain Y.
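The whole procedure can be sketched end to end. A minimal normalized spectral clustering under stated assumptions: a Gaussian affinity over a fully connected graph, a tiny hand-rolled k-means (farthest-point init for determinism) instead of a library call, and our own function name `spectral_clustering`. Following the Ncut relaxation above, it takes the K eigenvectors of $D^{-1/2}LD^{-1/2}$ with smallest eigenvalues as F and clusters the rows of F.

```python
import numpy as np

def spectral_clustering(X, K, theta=1.0, iters=20):
    """Normalized spectral clustering (Ncut relaxation), minimal sketch."""
    # Gaussian affinity over a fully connected graph.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * theta ** 2))
    d = W.sum(axis=1)
    L = np.diag(d) - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Relaxed objective: smallest-K eigenvectors of D^{-1/2} L D^{-1/2}.
    _, vecs = np.linalg.eigh(D_inv_sqrt @ L @ D_inv_sqrt)
    F = vecs[:, :K]
    # Tiny k-means on the rows of F, farthest-point init for determinism.
    centers = [F[0]]
    for _ in range(1, K):
        dist = np.min([np.linalg.norm(F - c, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((F[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = F[labels == k].mean(axis=0)
    return labels

# Two well-separated blobs of 5 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.05, (5, 2)), rng.normal(5.0, 0.05, (5, 2))])
labels = spectral_clustering(X, K=2)
```

On data like this, where the between-blob weights are nearly zero, the smallest eigenvectors are close to (scaled) cluster indicators, so the rows of F for each blob collapse to two distinct points and k-means separates them immediately.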