Spectral Clustering: A Detailed Derivation of the Principles

Spectral Clustering

The Idea Behind Spectral Clustering

Spectral clustering has its roots in graph theory. Each sample in the dataset is treated as a vertex of a graph; the vertices are joined by weighted edges, where the weight measures the similarity between the two samples. Vertices in the same cluster are highly similar, which in graph terms means the edges within a cluster carry large weights, while edges between vertices of different clusters carry small weights. The goal of spectral clustering is therefore to find a way of cutting the graph so that, after the cut, the edge weights within each subgraph are large and the weights between subgraphs are small.

---------------------------------------------------------------------------------------------------------------------------------------

Input: sample set $X=\{x_1,x_2,\dots,x_N\}$ and the number of clusters $K$

Output: a clustering $C^*$ of the sample set

The optimization problem:
$$
\underset{\{A_k\}_{k=1}^K}{\min}\; Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\;\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \{A_k\}_{k=1}^K=\arg\underset{\{A_k\}_{k=1}^K}{\min}\; Ncut(V)=\arg\underset{\{A_k\}_{k=1}^K}{\min}\;\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \hat Y=\arg\underset{Y}{\min}\;\sum_{k=1}^K\frac{y_k^T L y_k}{y_k^T D y_k}
$$

Here $y_k$ is the $k$-th column of $Y\in\mathbb R^{N\times K}$, the cluster indicator matrix, in which $y_{il}=1$ indicates that $x_i$ is assigned to the $l$-th cluster:

$$
\begin{cases} y_i \in\{0,1\}^K \\ \sum_{j=1}^K y_{ij}=1 \end{cases}
\qquad
y_i=\begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots \\ y_{iK} \end{bmatrix}
$$

$$
\hat{Y}=\arg\underset{Y}{\min}\; Tr\big(Y^T L Y (Y^T D Y)^{-1}\big)
$$

where $L=D-W$ is the graph Laplacian.

---------------------------------------------------------------------------------------------------------------------------------------

Derivation

Notation (weighted undirected graph):

Sample data: $X=(x_1,\dots,x_N)^\top$
Undirected graph: $G=\{V,E\}$
Vertex set: $V=\{1,2,\dots,N\}\Leftrightarrow X$
Edge set: $E$, encoded by the similarity (affinity) matrix

Weight matrix $W$:

$$
W=\begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1N}\\
w_{21} & w_{22} & \dots & w_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
w_{N1} & w_{N2} & \dots & w_{NN}
\end{bmatrix}=[w_{ij}],\quad 1\le i,j\le N
$$

where

$$
w_{ij}=\begin{cases}
K(x_i,x_j)=\exp\Big\{-\dfrac{\|x_i-x_j\|_2^2}{2\theta^2}\Big\} & \text{if } (i,j)\in E\\
0 & \text{if } (i,j)\notin E
\end{cases}
$$
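As a small illustration, the Gaussian-kernel weights above can be computed directly with NumPy. This is a sketch, not the author's code: it assumes a fully connected graph (every pair $(i,j)$ is in $E$), and the helper name `rbf_affinity` is mine; `theta` plays the role of the bandwidth $\theta$.

```python
import numpy as np

def rbf_affinity(X, theta=1.0):
    """Affinity matrix with w_ij = exp(-||x_i - x_j||_2^2 / (2 * theta^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                        # guard tiny negatives from round-off
    W = np.exp(-d2 / (2.0 * theta ** 2))
    np.fill_diagonal(W, 0.0)                        # no self-loops
    return W

# two nearby points and one far-away point
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = rbf_affinity(X, theta=1.0)
```

Nearby samples get weights close to 1, distant ones close to 0, matching the similarity interpretation above.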

Degree of vertex $i$: $d_i=\sum_{j=1}^N w_{ij}$

Degree matrix:

$$
D=diag(W\cdot\mathbf 1_N)=
\begin{bmatrix}
d_1 & 0 & \dots & 0\\
0 & d_2 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & d_N
\end{bmatrix}=
\begin{bmatrix}
\sum_{j=1}^N w_{1j} & 0 & \dots & 0\\
0 & \sum_{j=1}^N w_{2j} & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{j=1}^N w_{Nj}
\end{bmatrix}
$$
Laplacian matrix: $L=D-W$
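These three definitions are easy to sanity-check numerically. A minimal sketch with a hand-made 3-vertex weight matrix:

```python
import numpy as np

W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
d = W.sum(axis=1)   # degrees d_i = sum_j w_ij, i.e. W @ 1_N
D = np.diag(d)      # degree matrix D = diag(W . 1_N)
L = D - W           # Laplacian matrix L = D - W

# L is symmetric, every row sums to zero (L @ 1 = 0),
# and L is positive semi-definite -- properties used later in the derivation.
```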

Definitions:

For subsets $A$ and $B$ of the vertex set $V$ with empty intersection, define:

$$
A \subset V,\; B \subset V,\; A\cap B=\emptyset
\;\Rightarrow\;
W(A,B)=\sum_{i\in A,\,j\in B} w_{ij}
$$

In words: for two clusters $A$ and $B$, $W(A,B)$ sums the weights of all edges running from a vertex of one cluster to a vertex of the other.

A more rigorous definition of a graph cut: suppose there are $K$ clusters, i.e. the vertex set is partitioned into $K$ subsets:

$$
V=\cup_{k=1}^K A_k=A_1\cup A_2\cup A_3\cup\dots\cup A_K
$$

$$
Cut(V)=Cut(A_1,\dots,A_K)=\sum_{k=1}^K W(A_k,\overline A_k)=\sum_{k=1}^K W(A_k,V)-\sum_{k=1}^K W(A_k,A_k)
$$
Our goal is $\underset{\{A_k\}_{k=1}^K}{\min} Cut(V)$ (large weights within each subgraph, small weights between subgraphs). Using $Cut$ directly as the objective is problematic, however: as the figure below illustrates, cutting a single minimum-weight edge at the fringe of the graph, say between $C$ and $H$, minimizes $Cut(A_1,A_2,\dots,A_K)$ but is not the best partition.

(Figure: an example graph in which the minimum cut separates only the single light edge between C and H, isolating a fringe vertex.)

The fix is to normalize the cut.

Definition of Ncut:

Normalize each term by the degree of its subset, written $degree(A_k)$:

$$
cut(V)=\sum_{k=1}^K W(A_k,\overline A_k)
$$

$$
\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\Delta},
\qquad
\Delta=degree(A_k)=\sum_{i\in A_k}d_i,
\quad
d_i=\sum_{j=1}^N w_{ij}
$$

$$
\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}\sum_{j=1}^N w_{ij}}
$$
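The Ncut objective can be evaluated straight from this definition. The sketch below (my own toy example, not from the original post) compares a balanced partition of a toy graph with a degenerate one:

```python
import numpy as np

def ncut(W, labels, K):
    """Ncut = sum_k W(A_k, A_k-bar) / sum_{i in A_k} d_i, computed from the definition."""
    d = W.sum(axis=1)
    total = 0.0
    for k in range(K):
        in_k = labels == k
        cut_k = W[in_k][:, ~in_k].sum()  # W(A_k, complement of A_k)
        vol_k = d[in_k].sum()            # sum of degrees over A_k
        total += cut_k / vol_k
    return total

# two dense pairs {0,1} and {2,3}, weakly linked across
W = np.array([[0.0, 5.0, 0.1, 0.0],
              [5.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 5.0],
              [0.0, 0.1, 5.0, 0.0]])
good = ncut(W, np.array([0, 0, 1, 1]), 2)  # respects the blocks
bad  = ncut(W, np.array([0, 1, 0, 1]), 2)  # cuts through both heavy edges
```

The block-respecting partition gets a much smaller Ncut, which is exactly the behavior the normalization was introduced to enforce.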

Optimization objective:

$$
\underset{\{A_k\}_{k=1}^K}{\min}\; Ncut(V)
$$

Model

$$
\underset{\{A_k\}_{k=1}^K}{\min}\; Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\;\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \{A_k\}_{k=1}^K=\arg\underset{\{A_k\}_{k=1}^K}{\min}\; Ncut(V)=\arg\underset{\{A_k\}_{k=1}^K}{\min}\;\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$
Introduce the indicator vectors:

$$
\begin{cases} y_i \in\{0,1\}^K \\ \sum_{j=1}^K y_{ij}=1 \end{cases}
\qquad
y_i=\begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots \\ y_{iK} \end{bmatrix}
\qquad
y_{ij}=1\iff \text{sample } i \text{ belongs to cluster } j
$$

$$
Y=[y_1,\dots,y_N]^\top\in\mathbb R^{N\times K}
$$

The problem is recast as $\hat Y=\arg\underset{Y}{\min}\; Ncut(V)$.


Write Ncut in matrix form:

$$
Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
=Tr\begin{bmatrix}
\frac{W(A_1,\overline A_1)}{\sum_{i\in A_1}d_i} & & \\
& \ddots & \\
& & \frac{W(A_K,\overline A_K)}{\sum_{i\in A_K}d_i}
\end{bmatrix}
$$

$$
=Tr\left(
\begin{bmatrix}
W(A_1,\overline A_1) & & \\
& \ddots & \\
& & W(A_K,\overline A_K)
\end{bmatrix}
\begin{bmatrix}
\sum_{i\in A_1}d_i & & \\
& \ddots & \\
& & \sum_{i\in A_K}d_i
\end{bmatrix}^{-1}
\right)
$$

Denote

$$
O_{K\times K}=
\begin{bmatrix}
W(A_1,\overline A_1) & & \\
& \ddots & \\
& & W(A_K,\overline A_K)
\end{bmatrix},
\qquad
P_{K\times K}=
\begin{bmatrix}
\sum_{i\in A_1}d_i & & \\
& \ddots & \\
& & \sum_{i\in A_K}d_i
\end{bmatrix}
$$

The problem now becomes: $\underset{\{A_k\}_{k=1}^K}{\min}\; Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\; Tr(OP^{-1})$

Given $W$ and $Y$, we want to express $O$ and $P$ in terms of $Y$ and $W$.

First, solve for $P$:
$$
Y^\top Y=[y_1,\dots,y_N]
\begin{bmatrix} y_1^T\\ y_2^T\\ \vdots\\ y_N^T \end{bmatrix}
=\sum_{i=1}^N y_i y_i^T
=\begin{bmatrix}
N_1 & & \\
& \ddots & \\
& & N_K
\end{bmatrix}_{K\times K}
=\begin{bmatrix}
\sum_{i\in A_1}1 & & \\
& \ddots & \\
& & \sum_{i\in A_K}1
\end{bmatrix}_{K\times K}
$$

Here $N_k$ is the number of samples (out of $N$) belonging to cluster $k$: $\sum_{k=1}^K N_k=N$ and $N_k=|A_k|=\sum_{i\in A_k}1$.

$$
\sum_{i=1}^N y_i d_i y_i^T=y_1 d_1 y_1^T+y_2 d_2 y_2^T+\dots+y_N d_N y_N^T=Y^T D Y
$$

$$
P_{K\times K}=
\begin{bmatrix}
\sum_{i\in A_1}d_i & & \\
& \ddots & \\
& & \sum_{i\in A_K}d_i
\end{bmatrix}
=Y^T D Y
$$

where

$$
D=\begin{bmatrix}
d_1 & & \\
& \ddots & \\
& & d_N
\end{bmatrix}
=diag(W\cdot\mathbf 1_N)
=\begin{bmatrix}
\sum_{j=1}^N w_{1j} & & \\
& \ddots & \\
& & \sum_{j=1}^N w_{Nj}
\end{bmatrix}
$$
So the solved $P$ is

$$
P=Y^TDY,\qquad D=diag(W\cdot\mathbf 1_N)
$$
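The identity $P=Y^TDY$ is easy to check numerically. A sketch (the random symmetric $W$ and the label vector are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 2
labels = np.array([0, 0, 1, 1, 1, 0])

# one-hot indicator matrix Y (N x K): row i is y_i^T
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0

# a random symmetric weight matrix with zero diagonal
W = rng.random((N, N))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)

d = W.sum(axis=1)
D = np.diag(d)

P = Y.T @ D @ Y  # claimed: diagonal matrix of per-cluster degree sums
```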
Next, solve for $O$:

$$
O_{K\times K}=
\begin{bmatrix}
W(A_1,\overline A_1) & & \\
& \ddots & \\
& & W(A_K,\overline A_K)
\end{bmatrix}
$$

$$
W(A_k,\overline{A_k})=\underbrace{W(A_k,V)}_{\sum_{i\in A_k}d_i}-\underbrace{W(A_k,A_k)}_{\sum_{i\in A_k}\sum_{j\in A_k}w_{ij}}
$$

$$
\Rightarrow O=
\begin{bmatrix}
\sum_{i\in A_1}d_i & & \\
& \ddots & \\
& & \sum_{i\in A_K}d_i
\end{bmatrix}
-
\begin{bmatrix}
W(A_1,A_1) & & \\
& \ddots & \\
& & W(A_K,A_K)
\end{bmatrix}
$$
We already showed that the first matrix equals $Y^TDY$; now consider the second part:

$$
\begin{bmatrix}
W(A_1,A_1) & & \\
& \ddots & \\
& & W(A_K,A_K)
\end{bmatrix}
=
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & & \\
& \ddots & \\
& & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$

We conjecture that this second part equals $Y^TWY$; let us verify:
$Y^TWY$ has dimensions $K\times K$:

$$
Y^TWY=[y_1,\dots,y_N]
\begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1N}\\
w_{21} & w_{22} & \dots & w_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
w_{N1} & w_{N2} & \dots & w_{NN}
\end{bmatrix}
\begin{bmatrix} y_1^T\\ y_2^T\\ \vdots\\ y_N^T \end{bmatrix}
=\Big[\sum_{i=1}^N y_i w_{i1},\dots,\sum_{i=1}^N y_i w_{iN}\Big]
\begin{bmatrix} y_1^T\\ y_2^T\\ \vdots\\ y_N^T \end{bmatrix}
=\sum_{i=1}^N\sum_{j=1}^N y_i w_{ij} y_j^T
$$

Since $y_i y_j^T$ puts $w_{ij}$ in position $(k,l)$ exactly when $i\in A_k$ and $j\in A_l$,

$$
Y^TWY=
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij}\\
\sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\
\vdots & \vdots & \ddots & \vdots\\
\sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$
Compare this with the second part of $O$:
$$
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & & \\
& \ddots & \\
& & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
\quad\text{vs.}\quad
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \dots & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij}\\
\vdots & \ddots & \vdots\\
\sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$

The two matrices share the same diagonal, and since we minimize a trace, only the diagonal entries matter. Replacing the second part of $O$ with $Y^TWY$ therefore does not change the result.

Let $O'=Y^TDY-Y^TWY$. Since $P^{-1}$ is diagonal, right-multiplying by it only rescales diagonal entries, so $Tr(OP^{-1})=Tr(O'P^{-1})$.

We have thus expressed $O$, and shown that

$$
O'=Y^TDY-Y^TWY
$$

can replace $O$ for our purposes.
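A quick numerical check (my own toy data) that the diagonal of $O'=Y^TDY-Y^TWY$ really carries the cut values $W(A_k,\overline A_k)$, which is what makes the trace substitution harmless:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 7, 3
labels = np.array([0, 1, 2, 0, 1, 2, 0])

# one-hot indicator matrix Y
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0

# random symmetric weights, zero diagonal
W = rng.random((N, N))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))

O_prime = Y.T @ D @ Y - Y.T @ W @ Y  # K x K; only its diagonal matters for the trace
```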

Our optimization problem finally becomes:

$$
\hat Y=\arg\underset{Y}{\min}\; Tr\big(Y^T(D-W)Y(Y^TDY)^{-1}\big)
=\arg\underset{Y}{\min}\; Tr\big(Y^TLY(Y^TDY)^{-1}\big)
$$

where $L=D-W$ is the graph Laplacian.

We now minimize

$$
Tr\big(Y^T LY(Y^T DY)^{-1}\big)
$$

where $Y\in\mathbb R^{N\times K}$ and each row is one-hot, indicating which cluster sample $i$ belongs to. $Y$ looks like:

$$
Y=\begin{bmatrix}
0 & \dots & 0 & 1 & 0 & \dots & 0\\
0 & \dots & 1 & 0 & 0 & \dots & 0\\
\vdots & & & \vdots & & & \vdots\\
0 & \dots & 0 & 0 & 0 & \dots & 1\\
1 & \dots & 0 & 0 & 0 & \dots & 0
\end{bmatrix}
$$
Denote:

$$
P=Y^TDY=diag\Big(\sum_{i\in A_1}d_i,\sum_{i\in A_2}d_i,\dots,\sum_{i\in A_K}d_i\Big)=diag(p_1,p_2,\dots,p_K)
$$

The objective becomes

$$
Tr(Y^TLYP^{-1})=Tr\big(Y^TLYP^{-\frac12}P^{-\frac12}\big)=Tr\big(P^{-\frac12}Y^TLYP^{-\frac12}\big)
$$
Denote:

$$
H=YP^{-\frac12},\qquad H^T=P^{-\frac12}Y^T
$$

Note that $Y^TY=diag(N_1,\dots,N_K)$ (shown earlier), so

$$
H^TH=P^{-\frac12}Y^TYP^{-\frac12}=diag\Big(\frac{N_1}{p_1},\dots,\frac{N_K}{p_K}\Big)
$$

which is in general not the identity.

The objective is now

$$
Tr(H^TLH)
$$
Theorem 1:

For a positive semi-definite matrix $L$, with eigenvalues $0\le\lambda_1\le\lambda_2\le\dots\le\lambda_n$

and orthonormal eigenbasis $\{\overline v_1,\overline v_2,\dots,\overline v_n\}$ (the eigenvectors after orthonormalization):

when $\mathbf x\in\mathbb R^{N}$ and $\mathbf x^T\mathbf x=1$, the minimum of $\mathbf x^TL\mathbf x$ is attained at $\mathbf x=\overline v_1$.

Proof:

Because the eigenbasis is orthonormal, $\mathbf x$ can be expanded in it:

$$
\mathbf x=c_1\overline v_1+c_2\overline v_2+\dots+c_n\overline v_n
$$

$$
L\mathbf x=c_1\lambda_1\overline v_1+c_2\lambda_2\overline v_2+\dots+c_n\lambda_n\overline v_n
\;\Rightarrow\;
\mathbf x^TL\mathbf x=c_1^2\lambda_1+c_2^2\lambda_2+\dots+c_n^2\lambda_n
$$

Since $\mathbf x^T\mathbf x=1$, we have $c_1^2+c_2^2+\dots+c_n^2=1$, so

$$
\mathbf x^TL\mathbf x=c_1^2\lambda_1+c_2^2\lambda_2+\dots+c_n^2\lambda_n\ge\lambda_1
$$

with equality exactly when $c_1^2=1$ and $c_i=0$ for $i\ne1$, i.e. $\mathbf x=\overline v_1$ or $\mathbf x=-\overline v_1$.
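Theorem 1 can be checked numerically: for any symmetric positive semi-definite matrix, the Rayleigh quotient of a unit vector is bounded below by the smallest eigenvalue. A sketch using `numpy.linalg.eigh`, which returns eigenvalues in ascending order with orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((5, 5))
L = A @ A.T  # a generic symmetric positive semi-definite matrix

eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal eigenvectors
v1 = eigvecs[:, 0]                    # eigenvector of the smallest eigenvalue

# the minimum of x^T L x over unit vectors is lambda_1, attained at v1
min_val = v1 @ L @ v1
```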
Theorem 2:

For a positive semi-definite matrix $L$, with eigenvalues $0\le\lambda_1\le\lambda_2\le\dots\le\lambda_n$

and orthonormal eigenbasis $\{\overline v_1,\overline v_2,\dots,\overline v_n\}$:

when $F\in\mathbb R^{N\times K}$ and $F^TF=I$, the minimum of $Tr(F^TLF)$ is attained at $F=[\overline v_1,\overline v_2,\dots,\overline v_K]$.

Proof (sketch):

$$
\text{Denote } F=[f_1,f_2,\dots,f_K],
\qquad
Tr(F^TLF)=\sum_{i=1}^K f_i^TLf_i
$$

By Theorem 1, each term $f_i^TLf_i$ would individually be minimized at $\overline v_1$; but $F^TF=I$ means the columns of $F$ are orthonormal, so they cannot all be $\overline v_1$. The constrained minimum is attained at $f_1=\overline v_1, f_2=\overline v_2,\dots,f_K=\overline v_K$.

The objective $Tr(H^TLH)$ does not come with the property $H^TH=I$, so Theorem 2 cannot be applied directly; we therefore transform $H$.

$$
H^TDH=P^{-\frac12}Y^TDYP^{-\frac12}=P^{-\frac12}PP^{-\frac12}=I
$$

Let $F=D^{\frac12}H$. Then

$$
F^TF=(D^{\frac12}H)^TD^{\frac12}H=H^TD^{\frac12}D^{\frac12}H=H^TDH=I,
\qquad
H=D^{-\frac12}F
$$

$$
\Rightarrow Tr(H^TLH)=Tr\big(F^TD^{-\frac12}LD^{-\frac12}F\big),\qquad F^TF=I
$$
This gives the final optimization target:

$$
\underset{F}{\min}\; Tr\big(F^TD^{-\frac12}LD^{-\frac12}F\big),
\qquad s.t.\; F^TF=I
$$
By Theorem 2, the solved $F$ consists of the eigenvectors of the normalized Laplacian $D^{-\frac12}LD^{-\frac12}$ belonging to its $K$ smallest eigenvalues; running k-means once on the rows of this $F$ finally yields $Y$.
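Putting the whole derivation together, here is a minimal end-to-end sketch (my own NumPy implementation, not the author's code): build $D^{-\frac12}LD^{-\frac12}$, take its $K$ bottom eigenvectors as $F$, then run a simple k-means on the rows of $F$. The tiny farthest-point-initialized k-means is an illustrative stand-in for a real k-means routine.

```python
import numpy as np

def spectral_clustering(W, K, n_iter=50):
    """Ncut relaxation: bottom-K eigenvectors of D^{-1/2} L D^{-1/2}, then k-means on rows."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.diag(d) - W
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues ascending
    F = eigvecs[:, :K]                  # N x K relaxed indicator matrix

    # tiny k-means with deterministic farthest-point initialization
    centers = np.empty((K, K))
    centers[0] = F[0]
    for k in range(1, K):
        dists = ((F[:, None, :] - centers[None, :k]) ** 2).sum(-1).min(axis=1)
        centers[k] = F[np.argmax(dists)]
    for _ in range(n_iter):
        labels = ((F[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = F[labels == k].mean(axis=0)
    return labels
```

On two well-separated blobs connected by a Gaussian affinity this recovers the blocks; in practice one would typically reach for `sklearn.cluster.SpectralClustering` instead of hand-rolling the pipeline.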
