Preface
These are my study notes for a Peking University professor's course. The concepts are explained in a simple, accessible way, which makes it much easier to grasp the basics and, from there, the related techniques.
Basic Concepts
The kernel $\exp(-tz^{1/2})$ can be written as a mixture of exponentials, for some distribution $F(u)$:

$$\exp(-tz^{\frac{1}{2}}) = \int \exp(-tuz)\, dF(u)$$
Substituting

$$z = \|x\|^2$$
gives

$$\exp(-t\|x\|) = \int \exp(-tu\|x\|^2)\, dF(u),$$

a mixture of Gaussian kernels, so $\exp(-t\|x\|)$ is positive definite (P.D.).
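A small numeric sanity check of the P.D. claim, assuming the intended kernel is $k(x, y) = \exp(-t\|x - y\|)$ (the random points and variable names below are mine): the Gram matrix on any set of points should have no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # 20 random points in R^3
t = 0.7

# Gram matrix of k(x, y) = exp(-t * ||x - y||)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = np.exp(-t * D)

# If k is P.D., the smallest eigenvalue should be (numerically) non-negative
print(np.linalg.eigvalsh(K).min())
```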
The product of P.D. kernels is P.D.
The Euclidean distance can be carried into the feature space of a map $\phi$: writing $k(x, y) = \langle \phi(x), \phi(y) \rangle$, the distance there is
$$\|\phi(x) - \phi(y)\|_2^2 = k(x, x) - 2k(x, y) + k(y, y).$$
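A minimal sketch of this kernel-trick distance (the Gaussian kernel and the function names are my own choices for illustration):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def feature_distance_sq(x, y, kernel=rbf_kernel):
    """||phi(x) - phi(y)||^2 computed from kernel evaluations only."""
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.5])
print(feature_distance_sq(x, y))
```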
Part 2: Unsupervised Learning
This part covers dimensionality reduction.
PCA (Principal Component Analysis)
Population PCA
Def. If $\overline{x} \in \mathbb{R}^p$ is a random vector with mean $\mu$ and covariance matrix $\Sigma$,
then the PCA transform is

$$\overline{x} \mapsto \overline{y} = U^T(\overline{x} - \mu),$$
where $U$ is orthogonal, obtained from the spectral decomposition of $\Sigma$.

Spectral Decomposition

$$\Sigma = U \Lambda U^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$$
Thm. If $\overline{x} \sim N(\mu, \Sigma)$, then:
(1) $\overline{y} \sim N(0, \Lambda)$;
(2) $E(y_i) = 0$ for each $i$;
(3) $\mathrm{Cov}(y_i, y_j) = 0$ for $i \neq j$;
(4) $\overline{y}$ is an orthogonal transform of $\overline{x}$, and its components are uncorrelated (though not identically scaled);
(5) $\mathrm{Var}(y_i) = \lambda_i$, the $i$-th eigenvalue of $\Sigma$.
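A minimal sketch of these properties (the covariance matrix below is made up for illustration): diagonalize $\Sigma$, apply $y = U^T(x - \mu)$ to samples, and check that the sample covariance of $y$ is approximately $\Lambda$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 0.5],
                  [0.0, 0.5, 1.0]])

# Spectral decomposition Sigma = U Lambda U^T (eigh returns ascending eigenvalues)
lam, U = np.linalg.eigh(Sigma)

# Sample x ~ N(mu, Sigma) and apply y = U^T (x - mu) row-wise
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = (X - mu) @ U

print(np.cov(Y, rowvar=False).round(2))  # approximately diag(lambda_1..lambda_p)
print(lam.round(2))                      # the eigenvalues of Sigma
```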
Sample Principal Component
Let $X = [\overline{x}_1 \dots \overline{x}_n]^T$ be an $n \times p$ sample data matrix,
with sample mean

$$\overline{x} = \frac{1}{n} \sum_{i=1}^{n} \overline{x}_i$$
and sample covariance matrix

$$S = \frac{1}{n} X^T H X,$$
where $H$ is the centering matrix

$$H = I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^T$$

($\mathbf{1}_n$ is the all-ones vector, so $HX$ subtracts the column means from $X$).
To reduce the data to $k$ dimensions, keep the first $k$ principal components, i.e. the eigenvectors of $S$ with the largest eigenvalues; this is the projection that keeps the most information (variance).
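A minimal sample-PCA sketch following the definitions above (the data is random and the function name `sample_pca` is mine):

```python
import numpy as np

def sample_pca(X, k):
    """Project an (n, p) data matrix onto its first k principal components."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix H = I - (1/n) 1 1^T
    S = X.T @ H @ X / n                  # sample covariance S = (1/n) X^T H X
    lam, U = np.linalg.eigh(S)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:k]    # indices of the k largest eigenvalues
    Uk = U[:, order]
    return (X - X.mean(axis=0)) @ Uk     # scores: centered data on top-k eigenvectors

X = np.random.default_rng(0).normal(size=(50, 5))
Y = sample_pca(X, k=2)
print(Y.shape)  # (50, 2)
```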
SVD (Singular Value Decomposition)
For the decomposition $A = U D V^T$:

$U$ = eigenvectors of $AA^T$,

$D$ = diagonal matrix of the singular values, i.e. $D = \sqrt{\Lambda}$ where $\Lambda$ contains the eigenvalues of $AA^T$,

$V$ = eigenvectors of $A^T A$.
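A quick numeric check of these relations on a random matrix (numpy only; note `eigh`/`eigvalsh` order eigenvalues ascending while `svd` orders singular values descending):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(4, 3))

# numpy returns A = U @ diag(d) @ Vt with singular values d in descending order
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# The singular values are the square roots of the eigenvalues of A A^T
lam = np.linalg.eigvalsh(A @ A.T)[::-1][:3]            # top 3, descending
print(np.allclose(d, np.sqrt(np.clip(lam, 0, None))))  # True

# And the factors reconstruct A
print(np.allclose(A, U @ np.diag(d) @ Vt))             # True
```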
PCO (Principal Coordinate Analysis)
Recall

$$S = X^T H X$$

(dropping the $\frac{1}{n}$ factor, which does not change the eigenvectors).
$H$ is idempotent: $HH = H$.
Define

$$B = H X X^T H,$$

the $n \times n$ centered inner-product (Gram) matrix, the counterpart of the covariance matrix $S$.
Fact: for any matrices $M$ and $N$, $MN$ and $NM$ have the same non-zero eigenvalues. Taking $M = HX$ and using $HH = H$, we get $S = (HX)^T(HX)$ and $B = (HX)(HX)^T$, so $S$ and $B$ share their non-zero eigenvalues; PCO works with the eigenvectors of $B$ to recover coordinates.
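A minimal PCO sketch consistent with this (random data; `pco` is my name for it): eigendecompose $B = HXX^TH$, scale the top eigenvectors to get coordinates, and check that the non-zero eigenvalues of $B$ match those of $S$.

```python
import numpy as np

def pco(X, k):
    """Classical scaling / PCO from the centered Gram matrix B = H X X^T H."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = H @ X @ X.T @ H
    lam, V = np.linalg.eigh(B)
    order = np.argsort(lam)[::-1][:k]
    return V[:, order] * np.sqrt(lam[order])  # principal coordinates

X = np.random.default_rng(0).normal(size=(30, 4))
n = X.shape[0]
H = np.eye(n) - np.ones((n, n)) / n

# Non-zero eigenvalues of S = X^T H X and B = H X X^T H coincide
eig_S = np.sort(np.linalg.eigvalsh(X.T @ H @ X))[::-1]
eig_B = np.sort(np.linalg.eigvalsh(H @ X @ X.T @ H))[::-1][:4]
print(np.allclose(eig_S, eig_B))  # True
print(pco(X, k=2).shape)          # (30, 2)
```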