Deep Learning (3): The Handwritten Digit Recognition Problem
1. Problem Type
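Discrete Prediction
- $y = w * x + b$ (continuous prediction, for comparison)
- [up, left, down, right]
- [dog, cat, whale, bird, …]
Handwritten digit recognition is a discrete prediction problem: the output is one of a fixed set of classes.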
2. Dataset
- MNIST
- 7000 images per category
- train/test splitting: 60k vs 10k
3. Image
- [28, 28, 1]
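The image is made up of 28 rows × 28 columns, 784 pixels in total. Each pixel value lies in [0, 255] and gives its grayscale intensity, where 0 is pure white (background) and 255 is pure black (foreground). The trailing 1 is the number of channels: each pixel has a single dimension, namely its grayscale value.
- $\to [784]$
The 28×28 data is flattened into one dimension: the pixels of the second row are appended after the first row, the remaining 26 rows are appended the same way, and each image becomes a one-dimensional vector of 784 elements.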
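A minimal sketch of this row-by-row flattening in plain Python, assuming the image is stored as a nested list of 28 rows with 28 grayscale values each (the trailing channel dimension of size 1 is dropped). The helper name `flatten_image` and the test image are hypothetical. Pixel (r, c) ends up at index r * 28 + c.

```python
from typing import List

ROWS, COLS = 28, 28  # MNIST image height and width


def flatten_image(image: List[List[int]]) -> List[int]:
    """Concatenate the rows of a 28x28 image into one list of 784 values."""
    assert len(image) == ROWS and all(len(row) == COLS for row in image)
    flat: List[int] = []
    for row in image:  # row 0 first, then row 1 appended after it, and so on
        flat.extend(row)
    return flat


# Hypothetical image whose pixel values encode their own position (mod 256).
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]
flat = flatten_image(image)
print(len(flat))                          # 784
print(flat[1 * COLS + 0] == image[1][0])  # True: row 1 starts right after row 0
```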
4. Input and Output
(1) Input
$x: [b, 784]$
The input has shape [b, 784]: b is the number of images in the batch, and 784 is the number of pixels per image.
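To make the [b, 784] shape concrete, here is a small sketch that stacks b already-flattened images into one batch. The helper names `make_batch` and `shape` and the batch size of 32 are hypothetical choices for illustration; the pixel values are random placeholders.

```python
import random
from typing import List


def make_batch(images: List[List[int]]) -> List[List[int]]:
    """Stack b flattened images (784 values each) into a [b, 784] batch."""
    for img in images:
        assert len(img) == 784, "each image must already be flattened"
    return [list(img) for img in images]


def shape(batch: List[List[int]]) -> List[int]:
    """Return [rows, cols] of a rectangular nested list."""
    return [len(batch), len(batch[0]) if batch else 0]


random.seed(0)
b = 32  # hypothetical batch size
images = [[random.randint(0, 255) for _ in range(784)] for _ in range(b)]
x = make_batch(images)
print(shape(x))  # [32, 784]
```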
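(2) Encoding
- dog=0, cat=1, fish=2, …
Drawback: the encoding is ambiguous. For example, a predicted value of 1.5 falls between cat (1) and fish (2), so the prediction cannot be decided reliably.
- dog = [1, 0, 0, …], where the first element is the probability that the sample is "dog", the second element the probability that it is "cat", and so on; the probabilities sum to 1.
cat = [0, 1, 0, …]
fish = [0, 0, 1, …]
This encoding is called one-hot encoding.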
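A minimal one-hot encoder in plain Python, assuming classes are numbered from 0. The function name `one_hot` is illustrative; the class list is the toy example from the text.

```python
from typing import List


def one_hot(label: int, depth: int = 10) -> List[float]:
    """Return a vector of length `depth` with 1.0 at position `label`, else 0.0."""
    if not 0 <= label < depth:
        raise ValueError(f"label {label} out of range [0, {depth})")
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec


classes = ["dog", "cat", "fish"]  # toy class list from the text
for idx, name in enumerate(classes):
    print(name, one_hot(idx, depth=len(classes)))
# dog [1.0, 0.0, 0.0]
# cat [0.0, 1.0, 0.0]
# fish [0.0, 0.0, 1.0]
```

In TensorFlow the same conversion is available as `tf.one_hot(indices, depth)`.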
5. Regression vs. Classification
(1) Model
- $y = w * x + b$
- $y \in \mathbb{R}^d$
(2) Output
- $out = X @ W + b$
- $out: [0.1, 0.8, 0.02, 0.08]$
(3) Prediction
- $pred = \mathrm{argmax}(out)$
- $pred: 1$
- $label: 2$
Counting indices from 0, the largest value 0.8 sits at index 1, so the prediction is 1; it does not match the label 2, so this sample is misclassified.
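The argmax step can be sketched in a few lines of plain Python, using the output vector and label from the example above. The helper name `argmax` is illustrative; Python's built-in `max` returns the first maximal index on ties.

```python
from typing import List


def argmax(values: List[float]) -> int:
    """Index of the largest element (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


out = [0.1, 0.8, 0.02, 0.08]  # output vector from the text
label = 2

pred = argmax(out)
print(pred)           # 1
print(pred == label)  # False: this sample is misclassified
```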
6. Computation Graph
- $out = X @ W + b$
- $X: [b, 784]$
- $W: [784, 10]$
- $b: [10]$
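A plain-Python sketch of this single-layer computation, showing that [b, 784] @ [784, 10] + [10] yields [b, 10]. Because the text uses b both for the batch size and for the bias, the sketch names them `batch` and `bias`; the random inputs and the Gaussian weight initialization are assumptions for illustration only.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """[n, k] @ [k, m] -> [n, m]."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add_bias(m: Matrix, bias: List[float]) -> Matrix:
    """Broadcast an [m] bias over every row of an [n, m] matrix."""
    assert all(len(row) == len(bias) for row in m)
    return [[x + bb for x, bb in zip(row, bias)] for row in m]


def shape(m: Matrix) -> List[int]:
    return [len(m), len(m[0])]


random.seed(0)
batch = 4  # the "b" in X: [b, 784]
X = [[random.random() for _ in range(784)] for _ in range(batch)]
W = [[random.gauss(0.0, 0.01) for _ in range(10)] for _ in range(784)]
bias = [0.0] * 10  # the "b" in out = X @ W + b

out = add_bias(matmul(X, W), bias)
print(shape(out))  # [4, 10]
```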
7. Two Problems
(1) It’s Linear!
- $out = X @ W + b$
- $\to$
- $out = f(X @ W + b)$, where $f$ is a nonlinear activation function
- $out = relu(X @ W + b)$
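A short sketch of ReLU as an element-wise function, plus a one-line check that it is not linear: a linear f would satisfy f(a + c) = f(a) + f(c), and ReLU fails that for a = 1, c = -1. The values are arbitrary illustrations.

```python
from typing import List


def relu(xs: List[float]) -> List[float]:
    """Element-wise ReLU: negative entries become 0, the rest pass through."""
    return [max(0.0, x) for x in xs]


print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0.0, 0.0, 0.0, 1.5, 3.0]

# A linear f would satisfy f(a + c) == f(a) + f(c); ReLU breaks this:
a, c = 1.0, -1.0
lhs = relu([a + c])[0]             # relu(0) = 0.0
rhs = relu([a])[0] + relu([c])[0]  # 1.0 + 0.0 = 1.0
print(lhs, rhs)  # 0.0 1.0
```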
(2) It’s too simple!
- $out = relu(X @ W + b)$
- $\to$
- $h_1 = relu(X @ W_1 + b_1)$
- $h_2 = relu(h_1 @ W_2 + b_2)$
- $out = relu(h_2 @ W_3 + b_3)$
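A plain-Python sketch of the three stacked layers with sizes 784 → 512 → 256 → 10. Following the text, ReLU is applied after every layer, including the output layer. The initialization (Gaussian weights with std 0.05, zero biases) and the random input image are assumptions for illustration, not part of the original.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """[n, k] @ [k, m] -> [n, m] using plain Python lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dense_relu(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """relu(x @ w + bias); the [m] bias is broadcast over every row of x @ w."""
    return [[max(0.0, v + bb) for v, bb in zip(row, bias)] for row in matmul(x, w)]


def init_layer(n_in: int, n_out: int) -> Tuple[Matrix, List[float]]:
    """Hypothetical initialization: small Gaussian weights, zero bias."""
    w = [[random.gauss(0.0, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


random.seed(42)
W1, b1 = init_layer(784, 512)
W2, b2 = init_layer(512, 256)
W3, b3 = init_layer(256, 10)

X = [[random.random() for _ in range(784)]]  # one fake image, shape [1, 784]
h1 = dense_relu(X, W1, b1)    # [1, 512]
h2 = dense_relu(h1, W2, b2)   # [1, 256]
out = dense_relu(h2, W3, b3)  # [1, 10]
print(len(h1[0]), len(h2[0]), len(out[0]))  # 512 256 10
```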
8. In Detail
(1) $X: [v_1, v_2, \dots, v_{784}]$
- $X: [1, 784]$
(2) $h_1 = relu(X @ W_1 + b_1)$
- $W_1: [784, 512]$
- $b_1: [512]$ (broadcast across the batch dimension)
$\to [1, 784] @ [784, 512] + [512] = [1, 512] + [512] = [1, 512]$
(3) $h_2 = relu(h_1 @ W_2 + b_2)$
- $W_2: [512, 256]$
- $b_2: [256]$
$\to [1, 512] @ [512, 256] + [256] = [1, 256] + [256] = [1, 256]$
(4) $out = relu(h_2 @ W_3 + b_3)$
- $W_3: [256, 10]$
- $b_3: [10]$
$\to [1, 256] @ [256, 10] + [10] = [1, 10] + [10] = [1, 10]$
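The shape bookkeeping in items (1)-(4) can be checked mechanically. This sketch only tracks shapes, not values; ReLU is element-wise, so it does not change a shape and is omitted. The helper names are hypothetical.

```python
from typing import List, Sequence


def matmul_shape(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Shape of a @ b for 2-D shapes: [n, k] @ [k, m] -> [n, m]."""
    if a[1] != b[0]:
        raise ValueError(f"cannot multiply {list(a)} by {list(b)}")
    return [a[0], b[1]]


def add_bias_shape(m: Sequence[int], bias: Sequence[int]) -> List[int]:
    """[n, m] + [m] -> [n, m]: the bias is broadcast across the n rows."""
    if len(bias) != 1 or bias[0] != m[1]:
        raise ValueError(f"bias {list(bias)} does not match {list(m)}")
    return list(m)


layers = [([784, 512], [512]), ([512, 256], [256]), ([256, 10], [10])]
shape = [1, 784]  # X
for w_shape, b_shape in layers:
    shape = add_bias_shape(matmul_shape(shape, w_shape), b_shape)
    print(shape)
# [1, 512]
# [1, 256]
# [1, 10]
```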
The computation shows that this network progressively reduces the dimensionality: each image goes from $[1, 784]$ to $[1, 512]$, then to $[1, 256]$, and finally to $[1, 10]$.
$\to [0, 0, 0.01, 0.1, 0.8, 0, \dots]$
In this output the largest probability, 0.8, is at index 4 (counting from 0, as the labels 0~9 do), so the prediction for this image is "4".
9. How Do We Train the Model? $\to$ Loss
- $out: [1, 10]$
- Y/label: 0~9, one-hot encoded to the same [1, 10] shape as out
- e.g.: 1 $\to$ [0,1,0,0,0,0,0,0,0,0]
- e.g.: 3 $\to$ [0,0,0,1,0,0,0,0,0,0]
- Euclidean distance: measure how far $out$ is from $Label$
- MSE (mean squared error): $\frac{1}{N}\sum (y - out)^2$, where N is the number of elements (10 per image)
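A minimal MSE sketch, computing the mean of the squared differences between a one-hot label and an output vector (the sum divided by N). The output vector is hypothetical. A correct label gives a smaller loss than a wrong one.

```python
from typing import List


def one_hot(label: int, depth: int = 10) -> List[float]:
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec


def mse(y: List[float], out: List[float]) -> float:
    """Mean squared error: (1/N) * sum((y_i - out_i)^2)."""
    assert len(y) == len(out)
    return sum((yi - oi) ** 2 for yi, oi in zip(y, out)) / len(y)


# Hypothetical network output for one image (one row of [1, 10]).
out = [0.0, 0.7, 0.1, 0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.0]
y = one_hot(1)

print(round(mse(y, out), 4))        # 0.012
print(mse(one_hot(3), out) > mse(y, out))  # True: a wrong label costs more
```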
10. Summary
- $out = relu\{relu\{relu[X @ W_1 + b_1] @ W_2 + b_2\} @ W_3 + b_3\}$
- $pred = \mathrm{argmax}(out)$
- $loss = MSE(out, label)$
- minimize $loss$
- $\to [W_1', b_1', W_2', b_2', W_3', b_3']$ (the updated parameters)
11. Deep Learning?
- We have not seen it yet.
- But we have already mastered it.
- We will show you that it's (almost) deep learning!
12. Classification Procedure
- Step 1. Compute $h_1, h_2, out$
- Step 2. Compute $Loss$
- Step 3. Compute the gradients and update the parameters to $[W_1', b_1', W_2', b_2', W_3', b_3']$
- Step 4. Loop
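A heavily simplified sketch of Steps 1-4, assuming a single linear layer without ReLU so that the MSE gradient can be written by hand, and a tiny hypothetical dataset (4-dimensional inputs, 3 classes) standing in for 784 pixels and 10 digits. The real three-layer network needs backpropagation through every layer, which TensorFlow can compute automatically; the learning rate and epoch count here are arbitrary choices.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward(x: Vector, W: Matrix, b: Vector) -> Vector:
    """out = x @ W + b for a single sample (x: [n_in], W: [n_in, n_out])."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def mse(y: Vector, out: Vector) -> float:
    return sum((yi - oi) ** 2 for yi, oi in zip(y, out)) / len(y)


def train_step(data: List[Tuple[Vector, Vector]], W: Matrix, b: Vector, lr: float) -> float:
    """One gradient-descent update on the mean MSE; returns the loss before the update."""
    n_in, n_out = len(W), len(b)
    gW = [[0.0] * n_out for _ in range(n_in)]
    gb = [0.0] * n_out
    total = 0.0
    for x, y in data:
        out = forward(x, W, b)                 # Step 1: forward pass
        total += mse(y, out)                   # Step 2: loss
        for j in range(n_out):                 # Step 3a: accumulate gradients
            d = 2.0 * (out[j] - y[j]) / n_out  # dLoss/dout_j
            gb[j] += d
            for i in range(n_in):
                gW[i][j] += d * x[i]
    m = len(data)
    for j in range(n_out):                     # Step 3b: update parameters
        b[j] -= lr * gb[j] / m
        for i in range(n_in):
            W[i][j] -= lr * gW[i][j] / m
    return total / m


def one_hot(label: int, depth: int) -> Vector:
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec


random.seed(0)
# Hypothetical toy data: 4-dim inputs, 3 classes.
data = [
    ([1.0, 0.0, 0.0, 0.0], one_hot(0, 3)),
    ([0.0, 1.0, 0.0, 0.0], one_hot(1, 3)),
    ([0.0, 0.0, 1.0, 0.0], one_hot(2, 3)),
    ([0.0, 0.0, 0.0, 1.0], one_hot(1, 3)),
]
W = [[random.gauss(0.0, 0.1) for _ in range(3)] for _ in range(4)]
b = [0.0] * 3

losses = [train_step(data, W, b, lr=0.5) for _ in range(200)]  # Step 4: loop
print(losses[-1] < losses[0])  # True: the loss decreases
preds = [max(range(3), key=forward(x, W, b).__getitem__) for x, _ in data]
print(preds)  # [0, 1, 2, 1]
```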
13. We Need TensorFlow
- The amount of data is huge;
- TensorFlow computes and processes it faster.
14. Next
- Step 1. Have fun with MNIST classification
- Step 2. Learn TensorFlow
- Step 3. Implement Step 1 by ourselves!
References:
[1] Long Liangqu, 深度学习与TensorFlow2入门实战 (Deep Learning and TensorFlow 2: A Hands-on Introduction)