a LLE weights: solution
$x_i \in \mathbb{R}^{d \times 1}$, $w_i \in \mathbb{R}^{1 \times n}$, $X = [x_1 \dots x_n]$, $w_i = [w_{i1} \dots w_{in}]$.
$$
\begin{aligned}
\min_{w_i} \quad & \sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{n} w_{ij} x_j \right\|_2^2 \\
{\rm s.t.} \quad & \sum_{j=1}^{n} w_{ij} = 1
\end{aligned}
$$
Because $\sum_{j=1}^{n} w_{ij} = 1$, we can write $x_i = \sum_{j=1}^{n} w_{ij} x_i$, so the objective becomes a quadratic form in $w_i$:

$$
\begin{aligned}
\sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{n} w_{ij} x_j \right\|_2^2
&= \sum_{i=1}^{n} \left\| \sum_{j=1}^{n} w_{ij} x_i - \sum_{j=1}^{n} w_{ij} x_j \right\|_2^2 \\
&= \sum_{i=1}^{n} \left\| \sum_{j=1}^{n} w_{ij} (x_i - x_j) \right\|_2^2 \\
&= \sum_{i=1}^{n} \left\| (x_i1^T-X)w_i^T \right\|_2^2 \\
&= \sum_{i=1}^{n} w_i(x_i1^T-X)^T(x_i1^T-X)w_i^T
\end{aligned}
$$
$$
\sum_{j=1}^{n} w_{ij} = 1 \Leftrightarrow w_i 1 = 1
$$
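The rewriting of the objective as a quadratic form can be checked numerically. The following is only a small sketch with NumPy on random data; the sizes, the seed, and the choice $i = 1$ are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5
X = rng.normal(size=(d, n))        # columns are x_1 ... x_n (random, for illustration)
x_i = X[:, 0]                      # take i = 1

w = rng.random(n)                  # arbitrary positive numbers ...
w /= w.sum()                       # ... rescaled so that w 1 = 1

D = x_i[:, None] - X               # x_i 1^T - X, shape (d, n)
lhs = np.sum((x_i - X @ w) ** 2)   # || x_i - sum_j w_j x_j ||_2^2
rhs = w @ (D.T @ D) @ w            # w (x_i 1^T - X)^T (x_i 1^T - X) w^T
print(np.isclose(lhs, rhs))
```

The identity relies on the weights summing to one; without the rescaling line the two sides generally differ.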
The Lagrangian, with one multiplier $\mu_i$ per constraint, is

$$
L = \sum_{i=1}^{n} \left( w_i(x_i1^T-X)^T(x_i1^T-X)w_i^T + \mu_i (w_i1-1) \right)
$$
$$
\begin{aligned}
0 = \frac{\partial}{\partial w_i} L &= \frac{\partial}{\partial w_i} \sum_{i=1}^{n} \left( w_i(x_i1^T-X)^T(x_i1^T-X)w_i^T + \mu_i (w_i1-1) \right) \\
&= \frac{\partial}{\partial w_i} \left( w_i(x_i1^T-X)^T(x_i1^T-X)w_i^T + \mu_i (w_i1-1) \right) \\
&= 2w_i(x_i1^T-X)^T(x_i1^T-X) + \mu_i 1^T
\end{aligned}
$$
Solving for $w_i$ (the factor $x_i1^T-X$ is $d \times n$ and in general not square, so the inverse is taken of the whole Gram matrix $(x_i1^T-X)^T(x_i1^T-X)$, assumed invertible for now):

$$
w_i = -\frac{1}{2} \mu_i 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}
$$

Substituting into the constraint $w_i 1 = 1$:

$$
1 = w_i1 = -\frac{1}{2} \mu_i 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1
$$

$$
-\frac{1}{2} \mu_i = \frac{1}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1}
$$

Therefore

$$
\begin{aligned}
w_i &= -\frac{1}{2} \mu_i 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} \\
&= \frac{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1}
\end{aligned}
$$
Componentwise (the summation index in the denominator is written $j'$ to avoid clashing with the free index $j$):

$$
\begin{aligned}
w_{ij} &= \left( \frac{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1} \right)_j \\
&= \frac{\left( 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} \right)_j}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1} \\
&= \frac{\left( 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} \right)_j}{\sum\limits_{j'=1}^{n} \left( 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} \right)_{j'}} \\
&= \frac{\sum\limits_{k=1}^{n} \left( [(x_i1^T-X)^T(x_i1^T-X)]^{-1} \right)_{kj}}{\sum\limits_{j'=1}^{n} \sum\limits_{k=1}^{n} \left( [(x_i1^T-X)^T(x_i1^T-X)]^{-1} \right)_{kj'}}
\end{aligned}
$$
In fact $(x_i1^T-X)^T(x_i1^T-X)$ is not invertible, because the $i$-th column of $(x_i1^T-X)$ is $0$. So in practice we use

$$
w_i = \frac{1^T [(x_i1^T-X^{(i)})^T(x_i1^T-X^{(i)})]^{-1}}{1^T [(x_i1^T-X^{(i)})^T(x_i1^T-X^{(i)})]^{-1} 1}
$$

$$
w_{ij} = \frac{\sum\limits_{k=1}^{n^{(i)}} \left( [(x_i1^T-X^{(i)})^T(x_i1^T-X^{(i)})]^{-1} \right)_{kj}}{\sum\limits_{j'=1}^{n^{(i)}} \sum\limits_{k=1}^{n^{(i)}} \left( [(x_i1^T-X^{(i)})^T(x_i1^T-X^{(i)})]^{-1} \right)_{kj'}}
$$

where $X^{(i)}$ contains only the neighbors of $x_i$ (excluding $x_i$ itself), $n^{(i)}$ neighbors in total.
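The closed-form weights can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `lle_weights` and the ridge term `reg` are assumptions that do not appear in the text. The ridge is added because the local Gram matrix is still singular whenever a point has more neighbors than dimensions.

```python
import numpy as np

def lle_weights(x_i, neighbors, reg=1e-3):
    """Weights reconstructing x_i from its neighbors.

    x_i: shape (d,). neighbors: shape (d, k), one neighbor per column (X^{(i)}).
    reg: hypothetical ridge factor, not part of the derivation above.
    """
    k = neighbors.shape[1]
    D = x_i[:, None] - neighbors                 # x_i 1^T - X^{(i)}, shape (d, k)
    G = D.T @ D                                  # local Gram matrix, shape (k, k)
    G = G + reg * np.trace(G) / k * np.eye(k)    # assumed regularization
    w = np.linalg.solve(G, np.ones(k))           # G^{-1} 1 (G is symmetric)
    return w / w.sum()                           # divide by 1^T G^{-1} 1

rng = np.random.default_rng(0)
w = lle_weights(rng.normal(size=3), rng.normal(size=(3, 5)))
print(w.sum())
```

Solving the linear system $Gw = 1$ and normalizing gives the same result as forming the inverse explicitly, and is usually the numerically safer choice.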
b LLE weights: invariance to rotation, translation, and scaling
$$
w_i(X) = \frac{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1}
$$
Denote the transformation applied to $x_i$ by $f(x_i)$ and the transformation applied to $X$ by $F(X)$. Sufficient conditions for $w_i(F(X)) = w_i(X)$ are:
$$
\begin{aligned}
& \frac{1^T [(f(x_i)1^T-F(X))^T(f(x_i)1^T-F(X))]^{-1}}{1^T [(f(x_i)1^T-F(X))^T(f(x_i)1^T-F(X))]^{-1} 1} = \frac{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1} \\
\Leftarrow\ & (f(x_i)1^T-F(X))^T(f(x_i)1^T-F(X)) = (x_i1^T-X)^T(x_i1^T-X) \\
\Leftarrow\ & f(x_i)1^T-F(X) = x_i1^T-X
\end{aligned}
$$
Rotation: $f(x_i)=Qx_i$, $F(X)=QX$ with $Q^TQ=I$. We verify the first sufficient condition:
$$
\begin{aligned}
& (f(x_i)1^T-F(X))^T(f(x_i)1^T-F(X)) \\
&= (Qx_i1^T-QX)^T(Qx_i1^T-QX) \\
&= [Q(x_i1^T-X)]^T[Q(x_i1^T-X)] \\
&= (x_i1^T-X)^TQ^TQ(x_i1^T-X) \\
&= (x_i1^T-X)^TI(x_i1^T-X) \\
&= (x_i1^T-X)^T(x_i1^T-X)
\end{aligned}
$$
Translation: $f(x_i)=x_i+v$, $F(X)=X+v1^T$. We verify the second sufficient condition:
$$
\begin{aligned}
& f(x_i)1^T-F(X) \\
&= (x_i+v)1^T-(X+v1^T) \\
&= x_i1^T+v1^T-X-v1^T \\
&= x_i1^T-X
\end{aligned}
$$
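Scaling: $f(x_i)=ax_i$, $F(X)=aX$ with $a \neq 0$. Here the sufficient conditions do not hold (the Gram matrix is multiplied by $a^2$), so we verify the equality of the weights directly: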
$$
\begin{aligned}
& \frac{1^T [(f(x_i)1^T-F(X))^T(f(x_i)1^T-F(X))]^{-1}}{1^T [(f(x_i)1^T-F(X))^T(f(x_i)1^T-F(X))]^{-1} 1} \\
&= \frac{1^T [(ax_i1^T-aX)^T(ax_i1^T-aX)]^{-1}}{1^T [(ax_i1^T-aX)^T(ax_i1^T-aX)]^{-1} 1} \\
&= \frac{1^T [a^2(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{1^T [a^2(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1} \\
&= \frac{a^{-2} 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{a^{-2} 1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1} \\
&= \frac{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1}}{1^T [(x_i1^T-X)^T(x_i1^T-X)]^{-1} 1}
\end{aligned}
$$
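The three invariances can also be checked numerically. The sketch below is illustrative only: the helper `weights`, the random data, and the values of `Q`, `v`, and `a` are assumptions. It uses fewer neighbors than dimensions so that the Gram matrix is invertible without any regularization.

```python
import numpy as np

def weights(x_i, X):
    """Closed-form weights 1^T G^{-1} / (1^T G^{-1} 1) with G = (x_i 1^T - X)^T (x_i 1^T - X)."""
    D = x_i[:, None] - X
    G = D.T @ D
    w = np.linalg.solve(G, np.ones(G.shape[0]))
    return w / w.sum()

rng = np.random.default_rng(0)
d, k = 5, 3                                    # k <= d keeps G invertible for generic data
X = rng.normal(size=(d, k))                    # neighbors as columns
x = rng.normal(size=d)
w = weights(x, X)

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # a random orthogonal matrix, Q^T Q = I
v = rng.normal(size=d)                         # translation vector
a = 2.5                                        # scaling factor, a != 0

w_rot = weights(Q @ x, Q @ X)                  # f(x) = Qx,    F(X) = QX
w_shift = weights(x + v, X + v[:, None])       # f(x) = x + v, F(X) = X + v 1^T
w_scale = weights(a * x, a * X)                # f(x) = ax,    F(X) = aX
print(np.allclose(w, w_rot), np.allclose(w, w_shift), np.allclose(w, w_scale))
```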
c LLE low-dimensional representation: meaning
The low-dimensional coordinates minimize the same reconstruction cost, with the weights held fixed:

$$
\begin{aligned}
\min_{Y} \quad & \sum_{i=1}^{n} \left\| y_i - \sum_{j=1}^{n} w_{ij} y_j \right\|_2^2 \\
{\rm s.t.} \quad & \sum\limits_{i=1}^n y_i=0 & (Y1=0) \\
& \sum\limits_{i=1}^n y_iy_i^T=I & (YY^T=I)
\end{aligned}
$$
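Why does this preserve the local geometry?

The reconstruction cost in the original space and the one in the low-dimensional space have the same form and use the same weights:

$$
\sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{n} w_{ij} x_j \right\|_2^2 \quad \longleftrightarrow \quad \sum_{i=1}^{n} \left\| y_i - \sum_{j=1}^{n} w_{ij} y_j \right\|_2^2
$$

$x_i$ and $y_i$ share the local weights $w_i$, which preserves the linear relationships among the samples within a neighborhood.

The coordinates of a sample $x_i$ can be reconstructed as a linear combination of its neighbors $x_j$; after dimensionality reduction, the coordinates of $y_i$ can be reconstructed from its neighbors $y_j$ by the same linear combination. Hence the linear relationships among samples within a neighborhood of the original space are preserved in the low-dimensional space.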
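From a statistical viewpoint
- $Y1=0$ removes the ambiguity in the mean of each dimension.
- $YY^T=I$ removes the ambiguity in the variance of each dimension and in the linear correlation between dimensions.
From a geometric viewpoint
- $Y1=0$ removes the translation ambiguity.
- $YY^T=I$ removes the scaling ambiguity.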
Proof:
- Does $Y1=0$ remove the translation ambiguity?
  Let $Y'=Y+v1^T$ and require $Y'1=0$. Since $1^T1=n$,
  $$
  Y'1=Y1+v1^T1=0+nv=nv\stackrel{\rm set}{=}0 \Rightarrow v=0
  $$
  $v$ has no remaining degree of freedom, so the translation ambiguity is removed.
- Does $YY^T=I$ remove the rotation ambiguity?
  Let $Y'=QY$ with $Q^TQ=I$ and require $Y'1=0$ and $Y'Y'^T=I$.
  $$
  Y'1=QY1=Q0=0
  $$
  $$
  Y'Y'^T=QYY^TQ^T=QQ^T=I
  $$
  Both constraints hold identically for every orthogonal $Q$, so $Q$ keeps its degrees of freedom and the rotation ambiguity is not removed.
- Does $YY^T=I$ remove the scaling ambiguity?
  Let $Y'=aY$ with $a \neq 0$ and require $Y'Y'^T=I$.
  $$
  Y'Y'^T=(aY)(aY)^T=a^2YY^T=a^2I\stackrel{\rm set}{=}I \Rightarrow a=\pm 1
  $$
  $a$ has no continuous degree of freedom left (the sign flip $a=-1$ is a reflection, already covered by $Q$), so the scaling ambiguity is removed.
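Leaving the rotation ambiguity in place can hurt the new representation.
- The components of $y_i$ could be independent, but because the rotation ambiguity is not removed, the components actually obtained are in general not independent.
- $Y$ may admit a sparse solution, but because the rotation ambiguity is not removed, the solution actually obtained is dense.
In fact, with $Q^TQ=I$, $Q$ covers not only rotations but also reflections and permutations.
From a signal-processing viewpoint, removing the rotation ambiguity amounts to blind source separation (which usually assumes that the signals are independent).
From a machine-learning viewpoint, we can add a suitable regularization term, for example $\|Y\|_1$, which pushes as many components as possible to align with the coordinate axes, removes part of the rotation ambiguity, and thereby yields a sparse solution.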
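The effect of the two constraints can be illustrated numerically. The sketch below uses made-up data: `W` is a random row-normalized matrix standing in for real LLE weights, and `Y` is any matrix satisfying both constraints, not an actual LLE solution. It shows that a rotated `Y` is still feasible with the same cost, while a rescaled `Y` violates $YY^T=I$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_low = 8, 2

# Some Y (d_low x n) with Y 1 = 0 and Y Y^T = I.
A = rng.normal(size=(n, d_low))
A -= A.mean(axis=0)                  # columns become orthogonal to the all-ones vector
Y = np.linalg.qr(A)[0].T             # orthonormal rows that are still centered

# Stand-in weights: positive entries, each row summing to 1.
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)

def cost(Y):
    """sum_i || y_i - sum_j w_ij y_j ||^2, written as || Y - Y W^T ||_F^2."""
    return np.linalg.norm(Y - Y @ W.T, "fro") ** 2

Q = np.linalg.qr(rng.normal(size=(d_low, d_low)))[0]   # random orthogonal matrix
Y_rot = Q @ Y
Y_scaled = 2.0 * Y

print(np.allclose(Y_rot @ np.ones(n), 0.0))            # centering constraint still holds
print(np.allclose(Y_rot @ Y_rot.T, np.eye(d_low)))     # Y Y^T = I still holds
print(np.isclose(cost(Y_rot), cost(Y)))                # cost is unchanged
print(np.allclose(Y_scaled @ Y_scaled.T, np.eye(d_low)))  # scaling breaks Y Y^T = I
```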
d LLE low-dimensional representation: optimization
Lemma (properties of the trace): ${\rm tr}(AB) = {\rm tr}(BA)$, and hence ${\rm tr}(ABC) = {\rm tr}(CAB) = {\rm tr}(BCA)$.
$y_i \in \mathbb{R}^{d' \times 1}$, $w_i \in \mathbb{R}^{1 \times n}$, $Y = [y_1 \dots y_n]$, $w_i = [w_{i1} \dots w_{in}]$, and $W \in \mathbb{R}^{n \times n}$ is the matrix whose $i$-th row is $w_i$.

$$
(e_i)_k = \begin{cases} 1 & {\rm if} ~ k=i \\ 0 & {\rm otherwise} \end{cases}
$$

$I = [e_1 \dots e_n]$.
Note that $(I-W^T)^T$, $Y^TY$, and $(I-W^T)$ are all $n \times n$ square matrices, so the lemma applies.
$$
\begin{aligned}
\sum_{i=1}^{n} \left\| y_i - \sum_{j=1}^{n} w_{ij} y_j \right\|_2^2 &= \sum_{i=1}^{n} \left\| Ye_i - Yw_i^T \right\|_2^2 \\
&= \left\| YI - YW^T \right\|_F^2 \\
&= \left\| Y(I-W^T) \right\|_F^2 \\
&= {\rm tr}\{[Y(I-W^T)]^T[Y(I-W^T)]\} \\
&= {\rm tr}[(I-W^T)^TY^TY(I-W^T)] \\
&= {\rm tr}[(I-W^T)(I-W^T)^TY^TY] \\
&= {\rm tr}[(I-W)^T(I-W)Y^TY] \\
&= {\rm tr}[MY^TY] \\
&= \sum_{k=1}^{n}\sum_{i=1}^{n} M_{ki} \left(Y^TY\right)_{ik} \\
&= \sum_{k=1}^{n}\sum_{i=1}^{n} M_{ki} y_i^Ty_k
\end{aligned}
$$
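The chain of equalities can be sanity-checked on random matrices. This is a small sketch with arbitrary assumed sizes; `Y` and `W` are random and do not come from an actual LLE run.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_low = 6, 2
Y = rng.normal(size=(d_low, n))        # columns are y_1 ... y_n
W = rng.random((n, n))                 # row i plays the role of w_i
W /= W.sum(axis=1, keepdims=True)

# Left-hand side: sum_i || y_i - sum_j w_ij y_j ||^2, computed point by point.
direct = sum(np.sum((Y[:, i] - Y @ W[i]) ** 2) for i in range(n))

# Right-hand side: tr(M Y^T Y) with M = (I - W)^T (I - W).
I = np.eye(n)
M = (I - W).T @ (I - W)
via_trace = np.trace(M @ Y.T @ Y)

print(np.isclose(direct, via_trace))
```

The identity is purely algebraic, so it holds for any `Y` and `W` of compatible shapes, whether or not the rows of `W` sum to one.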
e LLE low-dimensional representation: solving the optimization
$$
\begin{aligned}
M &= (I-W)^T(I-W) \\
&= (I-W^T)(I-W) \\
&= I-W-W^T+W^TW
\end{aligned}
$$
e.1 Positive semidefiniteness of $M$
To prove that $M$ is positive semidefinite, it suffices to show that $v^TMv \geqslant 0$ for every $n$-dimensional vector $v \neq 0$.
- $v^TMv = v^T(I-W)^T(I-W)v = [(I-W)v]^T[(I-W)v] = \|(I-W)v\|_2^2 \geqslant 0$
- In summary, $v^TMv \geqslant 0$ for every $n$-dimensional vector $v \neq 0$, that is, $M$ is positive semidefinite.
e.2 The eigenvector $1$ of $M$
Note that $w_i1=1$ for every $i$, so $W1=1$.

$$
\begin{aligned}
M1 &= (I-W-W^T+W^TW)1 \\
&= I1-W1-W^T1+W^TW1 \\
&= 1-1-W^T1+W^T1 \\
&= 0
\end{aligned}
$$

Note that $M1 = 0 \cdot 1$, so $1$ is an eigenvector of $M$ for the eigenvalue $0$.
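Both properties of $M$ can be checked numerically. The sketch below again uses a random row-normalized `W` with zero diagonal as a stand-in for real LLE weights, which is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)               # a point is not its own neighbor
W /= W.sum(axis=1, keepdims=True)      # enforce W 1 = 1

I = np.eye(n)
M = (I - W).T @ (I - W)

ones = np.ones(n)
eigvals = np.linalg.eigvalsh(M)        # eigenvalues of the symmetric M, ascending

print(np.allclose(M @ ones, 0.0))      # 1 is an eigenvector with eigenvalue 0
print(eigvals.min() > -1e-10)          # no negative eigenvalues, up to round-off
```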
f In practice, we compute the eigendecomposition of $M$, sort the eigenvalues in ascending order, and discard the eigenvector $\xi_1$ of the smallest eigenvalue; $Y^T=[y_1 \dots y_n]^T = [\xi_2 \dots \xi_{d'+1}]$ is then the final low-dimensional representation.
f.1 Discarding $\xi_1$ is in fact discarding $1$
- Since $M$ is positive semidefinite, its eigenvalues and the corresponding eigenvectors are
  $$
  \begin{array}{ccccccccc}
  0 & \leqslant & \sigma_1 & \leqslant & \sigma_2 & \leqslant & \dots & \leqslant & \sigma_n \\
  & & \xi_1 & & \xi_2 & & \dots & & \xi_n
  \end{array}
  $$
- Moreover, $M$ has the eigenvalue $0$ with eigenvector $1$, so (up to normalization of $\xi_1$) its eigenvalues and eigenvectors are
  $$
  \begin{array}{ccccccc}
  0 = \sigma_1 & \leqslant & \sigma_2 & \leqslant & \dots & \leqslant & \sigma_n \\
  1 & & \xi_2 & & \dots & & \xi_n
  \end{array}
  $$
f.2 Why $1$ is discarded
- Intuitively, this component equals $1$ for every $y_i$, so it carries no information and is discarded.
- From the optimization viewpoint, discarding $1$ ensures the constraint $\sum\limits_{i=1}^n y_i=0$.
  - If $1$ is not discarded, this component of every $y_i$ is $1$, so this component of $\left(\sum\limits_{i=1}^n y_i\right)$ is $n$, which is not $0$.
  - If $1$ is discarded: $M$ is a normal matrix (real symmetric and Hermitian matrices are special cases of normal matrices), so its eigenvectors can be chosen mutually orthogonal.
    First note that $Y^T=[y_1 \dots y_n]^T = [\xi_2 \dots \xi_{d'+1}]$.
    Then note that $\xi_1^T\xi_i=0$ for $i \neq 1$,
    that is, $\xi_1^T[\xi_2 \dots \xi_{d'+1}]=0$,
    that is, $1^T[\xi_2 \dots \xi_{d'+1}]=0$ (because $\xi_1$ is a multiple of $1$),
    that is, $1^TY^T=0$,
    that is, $Y1=0$,
    that is, $\sum\limits_{i=1}^n y_i=0$.
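The whole procedure can be put together in a short NumPy sketch. The function name `lle_embed`, the ridge factor `reg`, the brute-force neighbor search, and the toy helix data are all assumptions chosen for illustration; this is not a reference implementation. It checks that dropping $\xi_1$ yields a $Y$ satisfying both constraints.

```python
import numpy as np

def lle_embed(X, k, d_low, reg=1e-3):
    """X: (d, n) with points as columns. Returns Y of shape (d_low, n)."""
    d, n = X.shape
    W = np.zeros((n, n))
    for i in range(n):
        dist = np.linalg.norm(X - X[:, [i]], axis=0)
        idx = np.argsort(dist)[1:k + 1]             # k nearest neighbors, skipping x_i itself
        D = X[:, [i]] - X[:, idx]                   # x_i 1^T - X^{(i)}
        G = D.T @ D
        G = G + reg * np.trace(G) / k * np.eye(k)   # assumed ridge regularization
        w = np.linalg.solve(G, np.ones(k))
        W[i, idx] = w / w.sum()                     # row i of W sums to 1
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)            # ascending eigenvalues, orthonormal eigenvectors
    return eigvecs[:, 1:d_low + 1].T                # drop xi_1, keep xi_2 ... xi_{d'+1}

# Toy data: 20 points along a helix in 3-D (assumed example).
t = np.linspace(0.0, 12.5, 20)
X = np.vstack([np.cos(t), np.sin(t), 0.3 * t])
Y = lle_embed(X, k=2, d_low=2)

print(np.abs(Y @ np.ones(20)).max())                # close to 0, i.e. Y 1 = 0
print(np.allclose(Y @ Y.T, np.eye(2)))              # Y Y^T = I
```

The sketch assumes that the eigenvalue $0$ is simple, so that `eigvecs[:, 0]` is proportional to $1$. If the neighborhood graph splits into separate parts, the null space of $M$ is larger and this assumption fails.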