Preface
These are my deep learning notes from the July 25 lecture, divided into three chapters:
- How to do Classification;
- Logistic Regression;
- Intro to Deep Learning.
一、How to do Classification
1、Two Classes
The probability that a blue ball came from box $B_1$:
$$P(B_1|Blue) = \frac{P(Blue|B_1) P(B_1)}{P(Blue|B_1) P(B_1) + P(Blue|B_2) P(B_2)}$$
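The Bayes formula above can be checked numerically. The box contents and priors below are made-up illustration values (say, Box 1 holds 4 blue and 1 green ball, Box 2 holds 2 blue and 3 green, and each box is picked with probability 0.5):

```python
p_B1, p_B2 = 0.5, 0.5    # prior probability of picking each box (assumed)
p_blue_B1 = 4 / 5        # P(Blue | B_1), from the assumed box contents
p_blue_B2 = 2 / 5        # P(Blue | B_2)

# P(B_1|Blue) = P(Blue|B_1)P(B_1) / (P(Blue|B_1)P(B_1) + P(Blue|B_2)P(B_2))
posterior = (p_blue_B1 * p_B1) / (p_blue_B1 * p_B1 + p_blue_B2 * p_B2)
print(posterior)  # 0.4 / 0.6 ≈ 0.667
```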
Given an $x$, which class does it belong to:
$$P(C_1|x) = \frac{P(x|C_1) P(C_1)}{P(x|C_1) P(C_1) + P(x|C_2) P(C_2)}$$
2、Gaussian Distribution
$$f_{\mu, \Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right\}$$
- input: vector x;
- output: probability of sampling x.
where $\mu$ is the mean and $\Sigma$ is the covariance matrix.
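As a sketch, the density $f_{\mu,\Sigma}(x)$ can be evaluated directly with NumPy; the values of `mu`, `Sigma`, and `x` below are arbitrary examples:

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate Gaussian density f_{mu,Sigma}(x) for a D-dim input."""
    D = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
print(gaussian_density(np.array([0.0, 0.0]), mu, Sigma))  # peak = 1/(2*pi) ≈ 0.159
```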
3、Maximum Likelihood
$$L(\mu, \Sigma) = f_{\mu, \Sigma}(x^1) f_{\mu, \Sigma}(x^2) \cdots f_{\mu, \Sigma}(x^{79})$$
$$\mu^*, \Sigma^* = \arg\max_{\mu, \Sigma}\ L(\mu, \Sigma)$$
$$\mu^* = \frac{1}{79}\sum_{n=1}^{79}x^n \quad \Sigma^* = \frac{1}{79}\sum_{n=1}^{79}(x^n - \mu^*)(x^n - \mu^*)^T$$
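The closed-form ML estimates above are easy to compute. The 79 samples here are random placeholders standing in for the class-1 training points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(79, 2))          # 79 samples x^1 ... x^79, D = 2 (placeholder data)

mu_star = X.mean(axis=0)              # mu* = (1/79) * sum_n x^n
diff = X - mu_star
Sigma_star = diff.T @ diff / len(X)   # Sigma* = (1/79) * sum_n (x^n - mu*)(x^n - mu*)^T
print(mu_star, Sigma_star)
```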
$$P(C_1|x) = \frac{P(x|C_1) P(C_1)}{P(x|C_1) P(C_1) + P(x|C_2) P(C_2)}$$
$$P(x|C_1) = f_{\mu^1, \Sigma^1}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^1|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu^1)^T (\Sigma^1)^{-1}(x-\mu^1)\right\} \quad P(C_1) = 0.56$$
$$P(x|C_2) = f_{\mu^2, \Sigma^2}(x) \quad P(C_2) = 0.44$$
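Putting the pieces together, the posterior $P(C_1|x)$ of this generative classifier can be sketched as below. The priors 0.56 / 0.44 come from the notes; the means and covariances are illustrative guesses:

```python
import numpy as np

def gaussian(x, mu, Sigma):
    D = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu1, Sigma1 = np.array([1.0, 1.0]), np.eye(2)      # fitted class-1 Gaussian (assumed)
mu2, Sigma2 = np.array([-1.0, -1.0]), np.eye(2)    # fitted class-2 Gaussian (assumed)
p_C1, p_C2 = 0.56, 0.44                            # priors from the notes

def posterior_C1(x):
    num = gaussian(x, mu1, Sigma1) * p_C1
    return num / (num + gaussian(x, mu2, Sigma2) * p_C2)

print(posterior_C1(np.array([1.0, 1.0])))   # close to 1 near mu1
```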
4、Modifying the Model
Make the two classes share the same covariance matrix $\Sigma$:
$$L(\mu^1, \mu^2, \Sigma) = f_{\mu^1, \Sigma}(x^1)f_{\mu^1, \Sigma}(x^2) \cdots f_{\mu^1, \Sigma}(x^{79}) \times f_{\mu^2, \Sigma}(x^{80}) f_{\mu^2, \Sigma}(x^{81}) \cdots f_{\mu^2, \Sigma}(x^{140})$$
$$P(C_1|x) = \frac{P(x|C_1) P(C_1)}{P(x|C_1) P(C_1) + P(x|C_2) P(C_2)} = \frac{1}{1+\frac{P(x|C_2) P(C_2)}{P(x|C_1) P(C_1)}}$$
Let
$$z=\ln\frac{P(x|C_1) P(C_1)}{P(x|C_2) P(C_2)}$$
then
$$P(C_1|x) = \frac{1}{1+\exp(-z)} = \sigma(z) \text{ (the sigmoid function)} = \sigma(w\cdot x + b)$$
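With a shared $\Sigma$, $z$ is linear in $x$: $w = \Sigma^{-1}(\mu^1 - \mu^2)$ and $b = -\frac{1}{2}(\mu^1)^T\Sigma^{-1}\mu^1 + \frac{1}{2}(\mu^2)^T\Sigma^{-1}\mu^2 + \ln\frac{P(C_1)}{P(C_2)}$. The sketch below checks numerically that $\sigma(w\cdot x + b)$ matches the direct Bayes posterior; all parameter values are illustrative:

```python
import numpy as np

mu1, mu2 = np.array([1.0, 0.5]), np.array([-0.5, -1.0])   # assumed class means
Sigma = np.array([[1.0, 0.2], [0.2, 0.8]])                # shared covariance (assumed)
p1, p2 = 0.56, 0.44                                       # priors from the notes

Sinv = np.linalg.inv(Sigma)
w = Sinv @ (mu1 - mu2)
b = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(p1 / p2)

def gaussian(x, mu):
    D = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ Sinv @ diff) / ((2 * np.pi) ** (D / 2) * np.linalg.det(Sigma) ** 0.5)

x = np.array([0.3, -0.2])
direct = gaussian(x, mu1) * p1 / (gaussian(x, mu1) * p1 + gaussian(x, mu2) * p2)
via_sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + b)))
print(direct, via_sigmoid)  # the two values agree
```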
二、Logistic Regression
1、Step 1: Function Set
2、Goodness of a Function
(1)、Cross Entropy
$$H(p, q) = -\sum_{x} p(x)\ln(q(x)) = \sum_n -[\hat{y}^n \ln f_{w, b}(x^n) + (1-\hat{y}^n) \ln(1 - f_{w, b}(x^n))]$$
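As a small numeric example of the binary cross-entropy above, with made-up targets and model outputs (`f_wb` stands in for $f_{w,b}(x^n) \in (0,1)$):

```python
import numpy as np

y_hat = np.array([1.0, 0.0, 1.0, 0.0])   # targets \hat{y}^n (made up)
f_wb  = np.array([0.9, 0.2, 0.7, 0.1])   # model predictions (made up)

loss = -np.sum(y_hat * np.log(f_wb) + (1 - y_hat) * np.log(1 - f_wb))
print(loss)  # ≈ 0.79
```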
3、Find the best function
Minimize $-\ln L(w, b)$ by gradient descent; each term's gradient follows from the chain rule:
$$\frac{\partial \ln f_{w, b}(x)}{\partial w_i} = \frac{\partial \ln f_{w, b}(x)}{\partial z} \frac{\partial z}{\partial w_i}$$
which gives
$$\frac{\partial(-\ln L(w, b))}{\partial w_i} = \sum_n -(\hat{y}^n - f_{w, b}(x^n))\,x_i^n$$
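One gradient-descent update using that gradient can be sketched as follows; the data, learning rate, and initial parameters are toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0]])   # toy inputs x^n
y_hat = np.array([1.0, 1.0, 0.0])                      # toy targets \hat{y}^n
w, b, lr = np.zeros(2), 0.0, 0.1

f = sigmoid(X @ w + b)          # f_{w,b}(x^n) for every sample
grad_w = -(y_hat - f) @ X       # sum_n -(yhat^n - f^n) x^n
grad_b = -np.sum(y_hat - f)
w, b = w - lr * grad_w, b - lr * grad_b
print(w, b)  # [0.2 0.2] 0.05
```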
(1)、Cross Entropy vs. Square Error
4、Multi-class Classification
- Softmax:
$$y_i = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$$
where $N$ is the number of classes.
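A numerically stable implementation of the softmax formula above subtracts $\max(z)$ before exponentiating, which leaves the result unchanged:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max(z) for numerical stability
    return e / e.sum()

y = softmax(np.array([3.0, 1.0, -3.0]))
print(y)   # outputs are positive and sum to 1
```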
三、Intro to Deep Learning
1、Step 1: Neural Network
2、Step 2: Goodness of function
- Total Loss: $L = \sum_{n=1}^{N} C(y^n, \hat{y}^n)$, the cross entropy between the network output $y^n$ and the target $\hat{y}^n$, summed over all training examples.
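The total loss can be sketched for a tiny made-up batch, where `y_pred` stands in for the network's softmax outputs:

```python
import numpy as np

y_hat = np.array([[1, 0, 0], [0, 1, 0]])                # one-hot targets (made up)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # placeholder network outputs

per_example = -np.sum(y_hat * np.log(y_pred), axis=1)   # C(y^n, \hat{y}^n) per sample
L = per_example.sum()                                   # total loss over the batch
print(L)
```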