Deep Learning (3): The Handwritten Digit Recognition Problem

炎武丶航

Article link: https://blog.csdn.net/weixin_43360025/article/details/119477881

1. Problem Classification
2. Dataset
3. Image
4. Input and Output
5. Regression VS Classification
6. Computation Graph
7. Two Problems
8. Particularly
9. How Do We Train the Model? $\to$ Loss
10. Summary
11. Deep Learning?
12. Classification Procedure
13. We need TensorFlow
14. Next

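1. Problem Classification

Discrete Prediction

Linear regression, $y = w * x + b$, predicts a continuous value. A discrete prediction instead picks one label from a finite set, for example:
[up, left, down, right]
[dog, cat, whale, bird, …]
Handwritten digit recognition is a discrete prediction problem.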

2. Dataset

MNIST
- about 7,000 images per category (10 digit categories, 0 to 9)
- train/test splitting: 60k vs 10k

3. Image

[28, 28, 1]
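Each image has 28 rows × 28 columns, 784 pixels in total. Every pixel stores a grayscale value in [0, 255], where 0 is pure black and 255 is pure white (in MNIST the background is 0 and the strokes are close to 255). The trailing 1 is the number of channels: each pixel carries a single value, its grayscale intensity.
$\to$ [784]
Flatten the 28×28 grid into one dimension: append the second row of pixels to the end of the first row, then do the same for the remaining 26 rows. One image then becomes a one-dimensional vector of 784 elements.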
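The flattening step can be sketched as follows. This is a minimal illustration that assumes NumPy and uses a synthetic array instead of a real MNIST image; the name `image` and its contents are hypothetical.

```python
import numpy as np

# Synthetic stand-in for one grayscale image of shape [28, 28, 1].
# The values are simply 0..783 so that each position is easy to trace.
image = np.arange(28 * 28).reshape(28, 28, 1)

# Row-major flattening: row 0 first, then row 1 appended after it, and so on.
flat = image.reshape(784)

print(flat.shape)                   # (784,)
print(flat[28] == image[1, 0, 0])   # True: element 28 is the first pixel of row 1
```

Because the reshape is row-major, the pixel at row `r`, column `c` ends up at position `r * 28 + c` of the flattened vector.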

4. Input and Output

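(1) Input
$x : [b, 784]$
The input has shape [b, 784]: b is the number of images in the batch, and 784 is the number of pixels in each image.
(2) Encoding

dog=0, cat=1, fish=2, …
Drawback: integer codes are ambiguous. They impose an order on the classes that does not exist, so a predicted value such as 1.5 cannot be assigned to a class with confidence.
dog = [1, 0, 0, …], where the first entry is the probability that the sample is "dog", the second is the probability that it is "cat", and so on; the entries sum to 1.
cat = [0, 1, 0, …]
fish = [0, 0, 1, …]

This scheme is called one-hot encoding.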
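A possible way to build one-hot vectors, assuming NumPy; the helper name `one_hot` is chosen here for illustration and is not taken from the article.

```python
import numpy as np

def one_hot(labels, depth):
    """Return a [len(labels), depth] array with a single 1.0 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], depth))
    # Put a 1.0 in column labels[i] of row i.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# dog=0, cat=1, fish=2
codes = one_hot([0, 1, 2], depth=3)
print(codes)
```

Each row sums to 1, so a row can be read as a probability distribution that puts all of its mass on the true class.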

5. Regression VS Classification

(1) Model

$y = w * x + b$
$y \in R^d$

(2) Output

$out = X @ W + b$
$out : [0.1, 0.8, 0.02, 0.08]$

(3) Prediction

$pred = argmax(out)$
- $pred : 1$
- $label : 2$
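The prediction step can be reproduced with the numbers given above. This sketch assumes NumPy; the variable `is_correct` is added here only to show that this particular sample is misclassified.

```python
import numpy as np

out = np.array([0.1, 0.8, 0.02, 0.08])  # scores from the text above
label = 2                                # true class index from the text above

pred = int(np.argmax(out))               # index of the largest score
is_correct = pred == label
print(pred, is_correct)                  # 1 False
```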

6. Computation Graph

$out = X @ W + b$
$X : [b, 784]$
$W : [784, 10]$
$b : [10]$
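The shapes in this graph can be checked with a short sketch. It assumes NumPy, random values, and a hypothetical batch size of 4; the bias is named `bias` here to avoid a clash with the batch dimension b.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4                              # hypothetical batch size (the b in [b, 784])

X = rng.random((batch, 784))           # [b, 784] flattened images
W = rng.normal(size=(784, 10)) * 0.01  # [784, 10] weights
bias = np.zeros(10)                    # [10] bias, broadcast across the batch

out = X @ W + bias                     # [b, 784] @ [784, 10] + [10] -> [b, 10]
print(out.shape)                       # (4, 10)
```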

7. Two Problems

(1) It’s Linear!

$out = X @ W + b$
$\to$
$out = f(X @ W + b)$

$out = relu(X @ W + b)$
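A small sketch of the activation, assuming NumPy. The second half illustrates why the purely linear model is limiting: without an activation, two stacked linear layers collapse into a single linear layer. The array sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: keep positive values, replace negatives with 0."""
    return np.maximum(z, 0.0)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]

# Without an activation, two stacked linear layers are still one linear layer.
rng = np.random.default_rng(0)
X = rng.random((2, 5))
W1, W2 = rng.normal(size=(5, 4)), rng.normal(size=(4, 3))
print(np.allclose((X @ W1) @ W2, X @ (W1 @ W2)))   # True
```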

(2) It’s too simple!

$out = relu(X @ W + b)$
$\to$
$h_1=relu(X@W_1+b_1)$
$h_2=relu(h_1@W_2+b_2)$
$out=relu(h_2@W_3+b_3)$

8. Particularly

(1) $X:[v_1,v_2,…,v_{784}]$

$X:[1,784]$

(2) $h_1=relu(X@W_1+b_1)$

$W_1:[784,512]$
$\to [1,784]@[784,512]+[512]=[1,512]+[512]=[1,512]$
$b_1:[512]$

(3) $h_2=relu(h_1@W_2+b_2)$

$W_2:[512,256]$
$\to [1,512]@[512,256]+[256]=[1,256]+[256]=[1,256]$
$b_2:[256]$

(4) $out=relu(h_2@W_3+b_3)$

$W_3:[256,10]$
$\to [1,256]@[256,10]+[10]=[1,10]+[10]=[1,10]$
$b_3:[10]$
The computation above shows that this network progressively reduces dimensionality: the image goes from $[1, 784]$ to $[1, 512]$, then to $[1, 256]$, and finally to $[1, 10]$.

$\to [0,0,0.01,0.1,0.8,0,…]$
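In this output the largest value, 0.8, sits at index 4 (counting from 0), so the image is most likely the digit "4" and the prediction is "4".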
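The three-layer forward pass of this section can be sketched as follows. It assumes NumPy and randomly initialised, untrained parameters, so the predicted digit is arbitrary; only the shapes match the derivation above.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Randomly initialised parameters (illustrative only; untrained).
W1, b1 = rng.normal(size=(784, 512)) * 0.05, np.zeros(512)
W2, b2 = rng.normal(size=(512, 256)) * 0.05, np.zeros(256)
W3, b3 = rng.normal(size=(256, 10)) * 0.05, np.zeros(10)

X = rng.random((1, 784))          # one flattened image, [1, 784]

h1 = relu(X @ W1 + b1)            # [1, 512]
h2 = relu(h1 @ W2 + b2)           # [1, 256]
out = relu(h2 @ W3 + b3)          # [1, 10]
pred = int(np.argmax(out, axis=1)[0])

print(h1.shape, h2.shape, out.shape)   # (1, 512) (1, 256) (1, 10)
print(pred)                            # a digit in 0..9; arbitrary for untrained weights
```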

9. How Do We Train the Model? $\to$ Loss

out:[1,10]
$\to$
Y/label: 0~9
- eg.: 1 $\to$ [0,1,0,0,0,0,0,0,0,0]
- eg.: 3 $\to$ [0,0,0,1,0,0,0,0,0,0]

$\to$

Euclidean distance between $out$ and the one-hot $Label$:
- MSE, i.e. $\frac{1}{N}\sum(y-out)^2$, where $N$ is the number of elements
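A minimal sketch of the loss, assuming NumPy. The network output below is made up for illustration, and the helper name `mse` is not from the article.

```python
import numpy as np

def mse(out, label_one_hot):
    """Mean of the squared element-wise differences."""
    return float(np.mean((label_one_hot - out) ** 2))

# Made-up network output of shape [1, 10].
out = np.array([[0.1, 0.8, 0.02, 0.08, 0, 0, 0, 0, 0, 0]])
y = np.zeros((1, 10))
y[0, 1] = 1.0                      # label 1 -> [0,1,0,0,0,0,0,0,0,0]

loss = mse(out, y)
print(loss)
```

The loss is 0 only when the output equals the one-hot label exactly, and it grows as the output moves away from the label.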

10. Summary

$out=relu\{relu\{relu[X@W_1+b_1]@W_2+b_2\}@W_3+b_3\}$
$pred = argmax(out)$
$loss = MSE(out, label)$
$minimize\ loss$
- $[W_1',b_1',W_2',b_2',W_3',b_3']$

11. Deep Learning?

We have not seen it.
But we have already mastered it.
We will show you it's (almost) Deep Learning!

12. Classification Procedure

Step1. Compute $h_1,h_2,out$
Step2. Compute $Loss$
Step3. Compute gradient and update $[W_1',b_1',W_2',b_2',W_3',b_3']$
Step4. Loop
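The four steps can be sketched in plain NumPy with hand-written gradients. This is an illustration under several assumptions that are not in the article: random data stands in for MNIST, the weights use a He-style initialisation, and the learning rate, step count and the function name `train` are arbitrary choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train(X, Y, steps=50, lr=5e-4, seed=0):
    """Gradient descent on the 784-512-256-10 ReLU network with MSE loss."""
    rng = np.random.default_rng(seed)
    sizes = [784, 512, 256, 10]
    # He-style initialisation (an assumption; the article does not specify one).
    W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
         for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    losses = []
    for _ in range(steps):                       # Step 4: loop
        # Step 1: compute h1, h2, out
        z1 = X @ W[0] + b[0]
        h1 = relu(z1)
        z2 = h1 @ W[1] + b[1]
        h2 = relu(z2)
        z3 = h2 @ W[2] + b[2]
        out = relu(z3)
        # Step 2: compute the loss
        losses.append(float(np.mean((Y - out) ** 2)))
        # Step 3: compute gradients by the chain rule, then update
        d_z3 = 2.0 * (out - Y) / out.size * (z3 > 0)
        d_z2 = (d_z3 @ W[2].T) * (z2 > 0)
        d_z1 = (d_z2 @ W[1].T) * (z1 > 0)
        grads_W = [X.T @ d_z1, h1.T @ d_z2, h2.T @ d_z3]
        grads_b = [d_z1.sum(axis=0), d_z2.sum(axis=0), d_z3.sum(axis=0)]
        for i in range(3):
            W[i] -= lr * grads_W[i]
            b[i] -= lr * grads_b[i]
    return losses

# Random stand-in data: 32 fake "images" and random labels 0..9.
data_rng = np.random.default_rng(1)
X = data_rng.random((32, 784))
Y = np.eye(10)[data_rng.integers(0, 10, size=32)]   # one-hot labels, [32, 10]

losses = train(X, Y)
print(losses[0], losses[-1])
```

Writing the gradients by hand is exactly the part a framework automates: with automatic differentiation, Step 3 no longer needs a manual derivation for each layer.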

13. We need TensorFlow

The amount of data is large;
TensorFlow computes and processes it faster.

14. Next

Step1. Have fun with MNIST classification
Step2. Then we learn TensorFlow
Step3. Then we implement Step1 ourselves!

References:
[1] Long Liangqu (龙良曲): 《深度学习与TensorFlow2入门实战》
