Handwritten Digit Recognition
General idea
- $X$: $[1, 784]$, a $[28, 28]$ grayscale image, flattened into $[1, 784]$
- $W$: $[784, 10]$
- $b$: $[10]$

$$out = XW + b$$

Shape-wise: $[1, 784] \cdot [784, 10] + [10] \rightarrow [1, 10]$
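As a quick check of these shapes, here is a minimal sketch (assuming TensorFlow is installed; the random values are purely illustrative):

```python
import tensorflow as tf

x = tf.random.normal([1, 784])   # one flattened 28x28 image
w = tf.random.normal([784, 10])  # weight matrix
b = tf.zeros([10])               # bias vector

out = x @ w + b                  # matrix multiply plus broadcast add
print(out.shape)                 # (1, 10)
```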
Can a linear classifier solve this problem?
A flattened grayscale image is high-dimensional data, and a plain linear model struggles with it.
So let's add a ReLU activation function to the linear formula:
$$out = f(XW + b)$$

that is,

$$out = \mathrm{ReLU}(XW + b)$$
The ReLU activation function: $\mathrm{ReLU}(x) = \max(0, x)$.
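In other words, ReLU simply zeroes out negative values, as this tiny demo with `tf.nn.relu` shows:

```python
import tensorflow as tf

z = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])
print(tf.nn.relu(z).numpy())  # [0. 0. 0. 1. 3.]
```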
Can this solve the handwritten digit recognition problem? Still no!
Then let's add a few more layers. How? Read on:
First layer (input layer):

$$h_1 = \mathrm{relu}(XW_1 + b_1)$$

Second layer (hidden layer):

$$h_2 = \mathrm{relu}(h_1 W_2 + b_2)$$

Third layer (output layer):

$$out = \mathrm{relu}(h_2 W_3 + b_3)$$
This stack of layers can solve the handwritten digit recognition problem above.
Now, following the flow above, let's roughly walk through the process (a runnable sketch of this forward pass follows the list):
- Sample data: $X: [v_1, v_2, v_3, ..., v_{784}]$, $X \leftarrow [1, 784]$
- First layer (input layer): $h_1 = \mathrm{relu}(XW_1 + b_1)$, $[1, 512] \leftarrow [1, 784] \cdot [784, 512] + [512]$
- Second layer (hidden layer): $h_2 = \mathrm{relu}(h_1 W_2 + b_2)$, $[1, 256] \leftarrow [1, 512] \cdot [512, 256] + [256]$
- Third layer (output layer): $out = \mathrm{relu}(h_2 W_3 + b_3)$, $[1, 10] \leftarrow [1, 256] \cdot [256, 10] + [10]$
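Written out by hand, this forward pass might look like the following sketch (the weight shapes match the list above; the `tf.Variable` initializations are illustrative assumptions, not part of the original):

```python
import tensorflow as tf

x = tf.random.normal([1, 784])  # one flattened image

w1 = tf.Variable(tf.random.truncated_normal([784, 512], stddev=0.1))
b1 = tf.Variable(tf.zeros([512]))
w2 = tf.Variable(tf.random.truncated_normal([512, 256], stddev=0.1))
b2 = tf.Variable(tf.zeros([256]))
w3 = tf.Variable(tf.random.truncated_normal([256, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))

h1 = tf.nn.relu(x @ w1 + b1)    # [1, 512]
h2 = tf.nn.relu(h1 @ w2 + b2)   # [1, 256]
out = tf.nn.relu(h2 @ w3 + b3)  # [1, 10]
print(h1.shape, h2.shape, out.shape)
```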
So how do we train this model? That brings up the loss function, $loss$.
We use the squared Euclidean distance between $out$ and the label: the sum of squared differences between the true value $y$ and $out$ is taken as the loss:

$$loss = \sum (y - out)^2$$
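In code, with the one-hot label $y$ and the output $out$ both of shape $[1, 10]$, this is just the following (a minimal sketch; the random output stands in for a real network):

```python
import tensorflow as tf

y = tf.one_hot(tf.constant([3]), depth=10)  # true label 3 as a one-hot vector, [1, 10]
out = tf.random.normal([1, 10])             # network output (random here for illustration)

loss = tf.reduce_sum(tf.square(y - out))    # sum of squared differences
print(loss.numpy())
```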
Now let's summarize the whole process.
Step 1: compute $h_1, h_2, out, pred$:
$$out = \mathrm{relu}\{\mathrm{relu}\{\mathrm{relu}[XW_1 + b_1]W_2 + b_2\}W_3 + b_3\}$$
$$pred = \arg\max(out)$$

Step 2: compute the loss:
$$loss = \mathrm{MSE}(out, label)$$

Step 3: compute gradients and update the parameters.
Use the $loss$ computed in Step 2 to optimize $W_1, b_1, W_2, b_2, W_3, b_3$: minimizing $loss$ yields updated parameters $W'_1, b'_1, W'_2, b'_2, W'_3, b'_3$, which bring $out$ closer to the true $y$.

The complete TensorFlow implementation:
```python
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # set before importing TensorFlow to silence C++ log noise

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, datasets

(x, y), (x_val, y_val) = datasets.mnist.load_data()
print(x.shape, y.shape, x_val.shape, y_val.shape)
# (60000, 28, 28) (60000,) (10000, 28, 28) (10000,)
print(type(x), type(y), type(x_val), type(y_val))
# <class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'>

x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.  # numpy -> tensor, scaled to [0, 1]
y = tf.convert_to_tensor(y, dtype=tf.int32)
print(type(x), type(y))
# <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'>
print(x.shape, y.shape)
# (60000, 28, 28) (60000,)

y = tf.one_hot(y, depth=10)  # integer labels -> one-hot vectors
print(x.shape, y.shape)
# (60000, 28, 28) (60000, 10)

train_dataset = tf.data.Dataset.from_tensor_slices((x, y))
train_dataset = train_dataset.batch(200)  # batch size = 200

# 784 --> 512 --> 256 --> 10
model = keras.Sequential([
    layers.Dense(512, activation='relu'),  # 784 --> 512
    layers.Dense(256, activation='relu'),  # 512 --> 256
    layers.Dense(10)                       # 256 --> 10
])
optimizer = optimizers.SGD(learning_rate=0.001)


def train_epoch(epoch):  # one epoch passes over the full dataset; one step processes one batch
    # Step 4: loop over batches
    for step, (x, y) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28 * 28))
            # Step 1: compute output, [b, 784] => [b, 10]
            out = model(x)
            # Step 2: compute loss
            loss = tf.reduce_sum(tf.square(out - y)) / x.shape[0]
        # Step 3: optimize and update w1, w2, w3, b1, b2, b3
        grads = tape.gradient(loss, model.trainable_variables)  # automatic differentiation
        # w' = w - lr * grad
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # apply the update
        if step % 100 == 0:
            print(epoch, step, 'loss:', loss.numpy())


def train():
    for epoch in range(30):
        train_epoch(epoch)


if __name__ == '__main__':
    train()
```
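The script above only prints the training loss. To actually see $pred = \arg\max(out)$ in action, one could evaluate on the validation split along these lines (a sketch that reuses `model`, `x_val`, and `y_val` from the script; this function is not part of the original code):

```python
def evaluate():
    # flatten and scale the validation images the same way as the training data
    xv = tf.convert_to_tensor(x_val, dtype=tf.float32) / 255.
    xv = tf.reshape(xv, (-1, 28 * 28))
    out = model(xv)                    # [10000, 10]
    pred = tf.argmax(out, axis=1)      # pred = argmax(out)
    correct = tf.equal(pred, tf.cast(y_val, tf.int64))
    acc = tf.reduce_mean(tf.cast(correct, tf.float32))
    print('val acc:', acc.numpy())
```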