C2eg1-Principal Components Analysis

Suppose

  • $m$ points $\{\mathbf{x}^{(1)},\cdots,\mathbf{x}^{(m)}\}\in\mathbb{R}^n$
  • each point $\mathbf{x}^{(i)}\in\mathbb{R}^n$ corresponds to a code vector $\mathbf{c}^{(i)}\in\mathbb{R}^l$
  • encoding function: $f(\mathbf{x})=\mathbf{c}$
  • decoding function: $\mathbf{x}\approx g(f(\mathbf{x}))$

Definition

  • PCA is defined by our choice of the decoding function
  • decoding matrix: $\mathbf{D}\in\mathbb{R}^{n\times l}$, where $g(\mathbf{c})=\mathbf{D}\mathbf{c}$.
  • constraints that simplify the encoding problem: the columns of $\mathbf{D}$ must be orthogonal to each other and have unit norm.
  • PCA uses the $L^2$ norm to measure how well the code reconstructs $\mathbf{x}$; we minimize the squared norm instead, which has the same minimizer because squaring is monotonically increasing for nonnegative arguments (see the sketch after this list):
    $$\mathbf{c}^*=\arg\min_{\mathbf{c}}\|\mathbf{x}-g(\mathbf{c})\|_2^2$$
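
To make the setup concrete, here is a minimal numpy sketch. The dimensions `n`, `l`, the random data, and the QR trick for producing an orthonormal $\mathbf{D}$ are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 5, 2                                   # ambient and code dimensions (illustrative)

# Any matrix with orthonormal columns can serve as a decoder; QR gives one.
D, _ = np.linalg.qr(rng.normal(size=(n, l)))
print(np.allclose(D.T @ D, np.eye(l)))        # True: D^T D = I_l

def g(c):
    """Decode: map a code c in R^l back to R^n as Dc."""
    return D @ c

def objective(c, x):
    """Squared L2 reconstruction error ||x - g(c)||_2^2."""
    return float(np.sum((x - g(c)) ** 2))

x = rng.normal(size=n)
print(objective(np.zeros(l), x))              # error of the trivial code c = 0
```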

Solving

$$\begin{aligned} \|\mathbf{x}-g(\mathbf{c})\|_2^2 &= (\mathbf{x}-g(\mathbf{c}))^T(\mathbf{x}-g(\mathbf{c}))\\ &= \mathbf{x}^T\mathbf{x}-\mathbf{x}^Tg(\mathbf{c})-g(\mathbf{c})^T\mathbf{x}+g(\mathbf{c})^Tg(\mathbf{c})\\ &= \mathbf{x}^T\mathbf{x}-2\mathbf{x}^Tg(\mathbf{c})+g(\mathbf{c})^Tg(\mathbf{c}) \end{aligned}$$
Drop the first term, which does not depend on $\mathbf{c}$, and simplify using the orthogonality and unit-norm constraints on $\mathbf{D}$ (i.e. $\mathbf{D}^T\mathbf{D}=\mathbf{I}_l$):
$$\begin{aligned} \mathbf{c}^* &= \arg\min_{\mathbf{c}}\,\mathbf{x}^T\mathbf{x}-2\mathbf{x}^Tg(\mathbf{c})+g(\mathbf{c})^Tg(\mathbf{c})\\ &= \arg\min_{\mathbf{c}}\,-2\mathbf{x}^Tg(\mathbf{c})+g(\mathbf{c})^Tg(\mathbf{c})\\ &= \arg\min_{\mathbf{c}}\,-2\mathbf{x}^T\mathbf{D}\mathbf{c}+\mathbf{c}^T\mathbf{D}^T\mathbf{D}\mathbf{c}\\ &= \arg\min_{\mathbf{c}}\,-2\mathbf{x}^T\mathbf{D}\mathbf{c}+\mathbf{c}^T\mathbf{I}_l\mathbf{c}\\ &= \arg\min_{\mathbf{c}}\,-2\mathbf{x}^T\mathbf{D}\mathbf{c}+\mathbf{c}^T\mathbf{c} \end{aligned}$$
Solve the optimization problem by setting the gradient with respect to $\mathbf{c}$ to zero:
$$\begin{aligned} \nabla_{\mathbf{c}}\big(-2\mathbf{x}^T\mathbf{D}\mathbf{c}+\mathbf{c}^T\mathbf{c}\big) &= 0\\ -2\mathbf{D}^T\mathbf{x}+2\mathbf{c} &= 0\\ \mathbf{c} &= \mathbf{D}^T\mathbf{x} \end{aligned}$$
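
A quick numerical check of this closed form, using the fact that the same objective is a linear least-squares problem (`np.linalg.lstsq` and the random `D`, `x` are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, l = 5, 2
D, _ = np.linalg.qr(rng.normal(size=(n, l)))   # orthonormal columns
x = rng.normal(size=n)

# Minimizing ||x - Dc||_2^2 over c is a least-squares problem in c.
c_lstsq, *_ = np.linalg.lstsq(D, x, rcond=None)
print(np.allclose(c_lstsq, D.T @ x))           # True: matches c = D^T x
```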
This gives the encoding function
$$f(\mathbf{x})=\mathbf{D}^T\mathbf{x}$$
and, composing with the decoder, the reconstruction function
$$r(\mathbf{x})=g(f(\mathbf{x}))=\mathbf{D}\mathbf{D}^T\mathbf{x}$$
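
In code, encoding and reconstruction are one matrix multiply each. A useful consequence of $\mathbf{D}^T\mathbf{D}=\mathbf{I}_l$ is that $\mathbf{D}\mathbf{D}^T$ is an orthogonal projector, so applying $r$ twice changes nothing. A minimal sketch (shapes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, l = 5, 2
D, _ = np.linalg.qr(rng.normal(size=(n, l)))

def f(x):
    """Encode: c = D^T x, shape (l,)."""
    return D.T @ x

def r(x):
    """Reconstruct: r(x) = D D^T x, shape (n,)."""
    return D @ f(x)

x = rng.normal(size=n)
print(np.allclose(r(r(x)), r(x)))   # True: D D^T is a projection
```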
Next, choose the matrix $\mathbf{D}$. Since the same $\mathbf{D}$ must encode every point, we minimize the $L^2$ distance between all points and their reconstructions:
$$\mathbf{D}^*=\arg\min_{\mathbf{D}}\sqrt{\sum_{i,j}\big(x_j^{(i)}-r(\mathbf{x}^{(i)})_j\big)^2}\quad\text{subject to}\quad\mathbf{D}^T\mathbf{D}=\mathbf{I}_l$$
Start with the case $l=1$, where $\mathbf{D}$ reduces to a single unit vector $\mathbf{d}$:
$$\mathbf{d}^*=\arg\min_{\mathbf{d}}\sum_i\|\mathbf{x}^{(i)}-\mathbf{d}\mathbf{d}^T\mathbf{x}^{(i)}\|_2^2\quad\text{subject to}\quad\|\mathbf{d}\|_2=1$$
$\mathbf{d}^T\mathbf{x}^{(i)}$ is a scalar, so it can be transposed and moved to the other side of $\mathbf{d}$: $\mathbf{d}\,\mathbf{d}^T\mathbf{x}^{(i)}=\mathbf{d}^T\mathbf{x}^{(i)}\,\mathbf{d}=\mathbf{x}^{(i)T}\mathbf{d}\,\mathbf{d}$. Stacking the points as the rows of $\mathbf{X}\in\mathbb{R}^{m\times n}$, with $\mathbf{X}_{i,:}=\mathbf{x}^{(i)T}$, then turns the sum into a Frobenius norm (checked numerically in the sketch after the derivation):
$$\begin{aligned} \mathbf{d}^* &= \arg\min_{\mathbf{d}}\sum_i\|\mathbf{x}^{(i)}-\mathbf{d}\mathbf{d}^T\mathbf{x}^{(i)}\|_2^2 &&\\ &= \arg\min_{\mathbf{d}}\sum_i\|\mathbf{x}^{(i)}-\mathbf{d}^T\mathbf{x}^{(i)}\,\mathbf{d}\|_2^2 &&\\ &= \arg\min_{\mathbf{d}}\sum_i\|\mathbf{x}^{(i)}-\mathbf{x}^{(i)T}\mathbf{d}\,\mathbf{d}\|_2^2 &\quad\text{subject to}\quad \|\mathbf{d}\|_2=1\\ &= \arg\min_{\mathbf{d}}\|\mathbf{X}-\mathbf{X}\mathbf{d}\mathbf{d}^T\|_F^2 &\quad\text{subject to}\quad \mathbf{d}^T\mathbf{d}=1 \end{aligned}$$
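
A sanity check that the per-point sum of squared errors equals the Frobenius-norm form above (the shapes, seed, and direction `d` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 20, 5
X = rng.normal(size=(m, n))        # row i is x^{(i)T}
d = rng.normal(size=n)
d /= np.linalg.norm(d)             # unit-norm direction

per_point = sum(np.sum((x - (d @ x) * d) ** 2) for x in X)
frobenius = np.linalg.norm(X - X @ np.outer(d, d)) ** 2
print(np.allclose(per_point, frobenius))   # True
```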
Expand the Frobenius norm using $\|\mathbf{A}\|_F^2=\mathrm{Tr}(\mathbf{A}^T\mathbf{A})$, disregarding the constraint for the moment:
$$\begin{aligned} \arg\min_{\mathbf{d}}\|\mathbf{X}-\mathbf{X}\mathbf{d}\mathbf{d}^T\|_F^2 &= \arg\min_{\mathbf{d}}\,\mathrm{Tr}\big((\mathbf{X}-\mathbf{X}\mathbf{d}\mathbf{d}^T)^T(\mathbf{X}-\mathbf{X}\mathbf{d}\mathbf{d}^T)\big)\\ &= \arg\min_{\mathbf{d}}\,\mathrm{Tr}\big(\mathbf{X}^T\mathbf{X}-\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T-\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X}+\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T\big)\\ &= \arg\min_{\mathbf{d}}\,\mathrm{Tr}(\mathbf{X}^T\mathbf{X})-\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)-\mathrm{Tr}(\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X})+\mathrm{Tr}(\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)\\ &= \arg\min_{\mathbf{d}}\,-\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)-\mathrm{Tr}(\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X})+\mathrm{Tr}(\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)\\ &= \arg\min_{\mathbf{d}}\,-2\,\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)+\mathrm{Tr}(\mathbf{d}\mathbf{d}^T\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)\\ &= \arg\min_{\mathbf{d}}\,-2\,\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)+\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T\mathbf{d}\mathbf{d}^T) \end{aligned}$$
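
The rewrite leans on two trace facts: $\|\mathbf{A}\|_F^2=\mathrm{Tr}(\mathbf{A}^T\mathbf{A})$ and the cyclic property $\mathrm{Tr}(\mathbf{AB})=\mathrm{Tr}(\mathbf{BA})$. Both are easy to confirm numerically (a throwaway sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 4))

# Frobenius norm as a trace.
print(np.allclose(np.linalg.norm(A) ** 2, np.trace(A.T @ A)))   # True
# Cyclic property of the trace.
print(np.allclose(np.trace(A @ B), np.trace(B @ A)))            # True
```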
Now reintroduce the constraint $\mathbf{d}^T\mathbf{d}=1$ and simplify, cycling the trace in the last step:
$$\begin{aligned} &\arg\min_{\mathbf{d}}\,-2\,\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)+\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T\mathbf{d}\mathbf{d}^T) &&\text{subject to}\quad \mathbf{d}^T\mathbf{d}=1\\ =\,&\arg\min_{\mathbf{d}}\,-2\,\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T)+\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T) &&\text{subject to}\quad \mathbf{d}^T\mathbf{d}=1\\ =\,&\arg\min_{\mathbf{d}}\,-\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T) &&\text{subject to}\quad \mathbf{d}^T\mathbf{d}=1\\ =\,&\arg\max_{\mathbf{d}}\,\mathrm{Tr}(\mathbf{X}^T\mathbf{X}\mathbf{d}\mathbf{d}^T) &&\text{subject to}\quad \mathbf{d}^T\mathbf{d}=1\\ =\,&\arg\max_{\mathbf{d}}\,\mathrm{Tr}(\mathbf{d}^T\mathbf{X}^T\mathbf{X}\mathbf{d}) &&\text{subject to}\quad \mathbf{d}^T\mathbf{d}=1 \end{aligned}$$
Writing $\mathbf{A}=\mathbf{X}^T\mathbf{X}$, the objective is $\mathbf{d}^T\mathbf{A}\mathbf{d}$ with $\mathbf{A}$ symmetric, so the optimal $\mathbf{d}$ is the eigenvector of $\mathbf{A}$ corresponding to the largest eigenvalue. More generally, for $l>1$, the matrix $\mathbf{D}$ is given by the $l$ eigenvectors corresponding to the $l$ largest eigenvalues.
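
Putting it all together: a minimal end-to-end sketch that finds $\mathbf{d}$ as the top eigenvector of $\mathbf{A}=\mathbf{X}^T\mathbf{X}$ and cross-checks it against the first right singular vector from the SVD (data and names are illustrative; following the derivation above, the data is not mean-centered):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 200, 5
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))   # correlated data

# Top eigenvector of A = X^T X (np.linalg.eigh returns eigenvalues ascending).
A = X.T @ X
eigvals, eigvecs = np.linalg.eigh(A)
d = eigvecs[:, -1]

# Cross-check: d matches the first right singular vector of X, up to sign.
_, _, Vt = np.linalg.svd(X)
print(np.allclose(np.abs(d), np.abs(Vt[0])))             # True

# Reconstruction error with the optimal d vs. a random unit direction.
err = lambda v: np.linalg.norm(X - X @ np.outer(v, v)) ** 2
v_rand = rng.normal(size=n)
v_rand /= np.linalg.norm(v_rand)
print(err(d) <= err(v_rand))                             # True
```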
