Suppose
- $m$ points $\{\mathbf{x}^{(1)},\cdots,\mathbf{x}^{(m)}\}$, each in $\mathbb{R}^n$
- each point $\mathbf{x}^{(i)}\in \mathbb{R}^n$ corresponds to a code vector $\mathbf{c}^{(i)}\in\mathbb{R}^l$, with $l<n$ so the code is lower-dimensional
- encode function: $f(\mathbf{x})=\mathbf{c}$
- decode function: $\mathbf{x}\approx g(f(\mathbf{x}))$
Definition
- PCA is defined by our choice of the decoding function
- decoding: $g(\mathbf{c})=\mathbf{Dc}$, where $\mathbf{D}\in\mathbb{R}^{n\times l}$
- constraints that simplify the encoding problem: the columns of $\mathbf{D}$ must be orthogonal to each other and have unit norm
- In PCA, the $L^2$ norm measures the distance between $\mathbf{x}$ and its reconstruction; we minimize its square instead, which gives the same optimal code point $\mathbf{c}^*$ because squaring is monotonically increasing on nonnegative values:

$$\mathbf{c}^*=\arg\min\limits_{\mathbf{c}}\|\mathbf{x}-g(\mathbf{c})\|_2^2$$
Solving
$$\begin{aligned} \|\mathbf{x}-g(\mathbf{c})\|_2^2 &= (\mathbf{x}-g(\mathbf{c}))^T(\mathbf{x}-g(\mathbf{c}))\\ &=\mathbf{x}^T\mathbf{x}-\mathbf{x}^Tg(\mathbf{c})-g(\mathbf{c})^T\mathbf{x}+g(\mathbf{c})^Tg(\mathbf{c})\\ &=\mathbf{x}^T\mathbf{x}-2\mathbf{x}^Tg(\mathbf{c})+g(\mathbf{c})^Tg(\mathbf{c}) \end{aligned}$$
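The expansion above can be sanity-checked numerically; this is a quick NumPy sketch with arbitrary random vectors standing in for $\mathbf{x}$ and $g(\mathbf{c})$ (the names are illustrative, not from the text):

```python
import numpy as np

# Verify ||x - g(c)||_2^2 = x^T x - 2 x^T g(c) + g(c)^T g(c)
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
gc = rng.standard_normal(5)  # plays the role of g(c)

lhs = np.sum((x - gc) ** 2)          # ||x - g(c)||_2^2
rhs = x @ x - 2 * x @ gc + gc @ gc   # expanded form
assert np.isclose(lhs, rhs)
```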
Simplify the problem using the orthogonality and unit norm constraints on $\mathbf{D}$ (so that $\mathbf{D}^T\mathbf{D}=\mathbf{I}_l$):

$$\begin{aligned} \mathbf{c}^*&=\arg\min\limits_{\mathbf{c}}\mathbf{x}^T\mathbf{x}-2\mathbf{x}^Tg(\mathbf{c})+g(\mathbf{c})^Tg(\mathbf{c})\\ &=\arg\min\limits_{\mathbf{c}}-2\mathbf{x}^Tg(\mathbf{c})+g(\mathbf{c})^Tg(\mathbf{c})\\ &=\arg\min\limits_{\mathbf{c}}-2\mathbf{x}^T\mathbf{Dc}+\mathbf{c}^T\mathbf{D}^T\mathbf{Dc}\\ &=\arg\min\limits_{\mathbf{c}}-2\mathbf{x}^T\mathbf{Dc}+\mathbf{c}^T\mathbf{I}_l\mathbf{c}\\ &=\arg\min\limits_{\mathbf{c}}-2\mathbf{x}^T\mathbf{Dc}+\mathbf{c}^T\mathbf{c} \end{aligned}$$

(The term $\mathbf{x}^T\mathbf{x}$ is dropped in the second line because it does not depend on $\mathbf{c}$.)
Solve the optimization problem
$$\begin{aligned} \nabla_{\mathbf{c}}(-2\mathbf{x}^T\mathbf{Dc}+\mathbf{c}^T\mathbf{c}) &= 0\\ -2\mathbf{D}^T\mathbf{x}+2\mathbf{c} &= 0\\ \mathbf{c}&=\mathbf{D}^T\mathbf{x} \end{aligned}$$
Get the encode function

$$f(\mathbf{x})=\mathbf{D}^T\mathbf{x}$$
Get the decode (reconstruction) function

$$r(\mathbf{x})=g(f(\mathbf{x}))=\mathbf{D}\mathbf{D}^T\mathbf{x}$$
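The encode/decode pair above can be sketched in NumPy. This assumes some $\mathbf{D}$ with orthonormal columns is given (here built via QR of a random matrix, purely for illustration) and checks that $\mathbf{c}=\mathbf{D}^T\mathbf{x}$ agrees with the unconstrained least-squares code:

```python
import numpy as np

# Encode/decode with orthonormal D (D^T D = I_l):
# code c = D^T x, reconstruction r(x) = D D^T x.
rng = np.random.default_rng(1)
n, l = 6, 2
D, _ = np.linalg.qr(rng.standard_normal((n, l)))  # orthonormal columns

x = rng.standard_normal(n)
c = D.T @ x          # encode: f(x) = D^T x
r = D @ c            # decode: r(x) = D D^T x

# c = D^T x solves argmin_c ||x - Dc||_2^2.
c_ls, *_ = np.linalg.lstsq(D, x, rcond=None)
assert np.allclose(c, c_ls)

# A point already in the column space of D reconstructs exactly.
x_in_span = D @ rng.standard_normal(l)
assert np.allclose(D @ (D.T @ x_in_span), x_in_span)
```

Points outside the column space of $\mathbf{D}$ are projected onto it, which is why the reconstruction is only an approximation in general.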
Choose the encoding matrix $\mathbf{D}$

$$\mathbf{D}^*=\arg\min\limits_{\mathbf{D}}\sqrt{\sum\limits_{i,j}\big(x_j^{(i)}-r(\mathbf{x}^{(i)})_j\big)^2}\quad\text{subject to}\quad\mathbf{D}^T\mathbf{D}=\mathbf{I}_l$$
Set $l=1$; then $\mathbf{D}$ is a single vector $\mathbf{d}$:

$$\mathbf{d}^*=\arg\min\limits_{\mathbf{d}}\sum\limits_i\|\mathbf{x}^{(i)}-\mathbf{dd}^T\mathbf{x}^{(i)}\|_2^2\quad\text{subject to}\quad\|\mathbf{d}\|_2=1.$$
$\mathbf{d}^T\mathbf{x}^{(i)}$ is a scalar, so it equals its own transpose:

$$\mathbf{d}^T\mathbf{x}^{(i)}=\mathbf{x}^{(i)T}\mathbf{d}$$
$$\begin{aligned} \mathbf{d}^*&=\arg\min\limits_{\mathbf{d}}\sum\limits_i\|\mathbf{x}^{(i)}-\mathbf{dd}^T\mathbf{x}^{(i)}\|_2^2\\ &=\arg\min\limits_{\mathbf{d}}\sum\limits_i\|\mathbf{x}^{(i)}-\mathbf{d}^T\mathbf{x}^{(i)}\mathbf{d}\|_2^2\\ &=\arg\min\limits_{\mathbf{d}}\sum\limits_i\|\mathbf{x}^{(i)}-\mathbf{x}^{(i)T}\mathbf{d}\,\mathbf{d}\|_2^2\quad\text{subject to}\quad\|\mathbf{d}\|_2=1\\ &=\arg\min\limits_{\mathbf{d}}\|\mathbf{X}-\mathbf{Xdd}^T\|_F^2\quad\text{subject to}\quad\mathbf{d}^T\mathbf{d}=1 \end{aligned}$$

where $\mathbf{X}\in\mathbb{R}^{m\times n}$ stacks the points so that row $i$ is $\mathbf{x}^{(i)T}$.
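The equivalence between the per-sample sum of squared errors and the Frobenius norm form can be checked numerically; a small NumPy sketch with random data (sizes are arbitrary):

```python
import numpy as np

# Check sum_i ||x^(i) - d d^T x^(i)||_2^2 = ||X - X d d^T||_F^2,
# where row i of X is x^(i)T.
rng = np.random.default_rng(2)
m, n = 8, 4
X = rng.standard_normal((m, n))
d = rng.standard_normal(n)
d /= np.linalg.norm(d)  # enforce the constraint d^T d = 1

per_sample = sum(np.sum((X[i] - d * (d @ X[i])) ** 2) for i in range(m))
frobenius = np.linalg.norm(X - X @ np.outer(d, d), 'fro') ** 2
assert np.isclose(per_sample, frobenius)
```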
Solving the optimization problem for $\mathbf{d}$:

$$\begin{aligned} \arg\min\limits_{\mathbf{d}}\|\mathbf{X}-\mathbf{Xdd}^T\|_F^2&=\arg\min\limits_{\mathbf{d}}\text{Tr}\big((\mathbf{X}-\mathbf{Xdd}^T)^T(\mathbf{X}-\mathbf{Xdd}^T)\big)\\ &=\arg\min\limits_{\mathbf{d}}\text{Tr}(\mathbf{X}^T\mathbf{X}-\mathbf{X}^T\mathbf{Xdd}^T-\mathbf{dd}^T\mathbf{X}^T\mathbf{X}+\mathbf{dd}^T\mathbf{X}^T\mathbf{Xdd}^T)\\ &=\arg\min\limits_{\mathbf{d}}\text{Tr}(\mathbf{X}^T\mathbf{X})-\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)-\text{Tr}(\mathbf{dd}^T\mathbf{X}^T\mathbf{X})+\text{Tr}(\mathbf{dd}^T\mathbf{X}^T\mathbf{Xdd}^T)\\ &=\arg\min\limits_{\mathbf{d}}-\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)-\text{Tr}(\mathbf{dd}^T\mathbf{X}^T\mathbf{X})+\text{Tr}(\mathbf{dd}^T\mathbf{X}^T\mathbf{Xdd}^T)\\ &=\arg\min\limits_{\mathbf{d}}-2\,\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)+\text{Tr}(\mathbf{dd}^T\mathbf{X}^T\mathbf{Xdd}^T)\\ &=\arg\min\limits_{\mathbf{d}}-2\,\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)+\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T\mathbf{dd}^T) \end{aligned}$$

using the cyclic property of the trace in the last step.
Compute with the constraint condition $\mathbf{d}^T\mathbf{d}=1$:

$$\begin{aligned} &\arg\min\limits_{\mathbf{d}}-2\,\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)+\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T\mathbf{dd}^T)\quad\text{subject to}\quad\mathbf{d}^T\mathbf{d}=1\\ =&\arg\min\limits_{\mathbf{d}}-2\,\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)+\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)\quad\text{subject to}\quad\mathbf{d}^T\mathbf{d}=1\\ =&\arg\min\limits_{\mathbf{d}}-\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)\quad\text{subject to}\quad\mathbf{d}^T\mathbf{d}=1\\ =&\arg\max\limits_{\mathbf{d}}\text{Tr}(\mathbf{X}^T\mathbf{Xdd}^T)\quad\text{subject to}\quad\mathbf{d}^T\mathbf{d}=1\\ =&\arg\max\limits_{\mathbf{d}}\text{Tr}(\mathbf{d}^T\mathbf{X}^T\mathbf{Xd})\quad\text{subject to}\quad\mathbf{d}^T\mathbf{d}=1 \end{aligned}$$
Set $\mathbf{A}=\mathbf{X}^T\mathbf{X}$; then the optimal $\mathbf{d}$ is given by the eigenvector of $\mathbf{A}$ corresponding to the largest eigenvalue.
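The conclusion can be verified numerically: the top eigenvector of $\mathbf{A}=\mathbf{X}^T\mathbf{X}$ is the same direction as the leading right singular vector of $\mathbf{X}$, and it attains the largest value of the objective $\mathbf{d}^T\mathbf{A}\mathbf{d}$. A NumPy sketch with random data:

```python
import numpy as np

# Optimal d maximizes d^T X^T X d subject to d^T d = 1,
# i.e. the top eigenvector of A = X^T X.
rng = np.random.default_rng(3)
m, n = 20, 5
X = rng.standard_normal((m, n))

A = X.T @ X
eigvals, eigvecs = np.linalg.eigh(A)   # eigh returns ascending eigenvalues
d = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue

# Same direction (up to sign) as the leading right singular vector of X.
_, _, Vt = np.linalg.svd(X)
assert np.isclose(abs(d @ Vt[0]), 1.0)

# d attains a larger objective d^T A d than the other eigenvectors.
assert all(d @ A @ d >= eigvecs[:, j] @ A @ eigvecs[:, j] for j in range(n - 1))
```

Equivalently, $\mathbf{d}$ is the first principal component of the (uncentered) data matrix $\mathbf{X}$.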