Automatic differentiation
- In machine learning, gradient descent requires computing derivatives of functions. TensorFlow provides a powerful automatic differentiation mechanism for this, exposed through tf.GradientTape();
- tf.Variable() defines a variable. Like ordinary tensors, variables have a shape, a dtype, and a value. A variable must be initialized; the initial value is passed via the initial_value argument of tf.Variable();
- An important difference between variables and ordinary tensors is that variables are watched by TensorFlow's automatic differentiation mechanism by default, so they are typically used for the trainable parameters of machine learning models.
import tensorflow as tf

x = tf.Variable(initial_value=3.)  # define a variable with initial value 3
b = tf.Variable(initial_value=1.)
c = tf.constant([1.])
with tf.GradientTape() as tape:  # inside the tf.GradientTape() context, all computation steps are recorded for differentiation
    y = tf.square(x) + b + c     # y = x^2 + b + c
y_grad = tape.gradient(y, [x, b, c])  # derivatives of y with respect to x, b, and c
print([y, y_grad])
[<tf.Tensor: id=4581884, shape=(1,), dtype=float32, numpy=array([11.], dtype=float32)>, [<tf.Tensor: id=4581898, shape=(), dtype=float32, numpy=6.0>, <tf.Tensor: id=4581893, shape=(), dtype=float32, numpy=1.0>, None]]
tf.GradientTape() is a recorder for automatic differentiation: variables and computation steps inside its context are recorded automatically. In the example above, the variables x and b and the computation y = tf.square(x) + b + c are recorded, so tape.gradient(y, [x, b, c]) returns the derivatives of y with respect to each of them. Note that the derivative with respect to c is None: c is a tf.constant, which the tape does not watch by default (call tape.watch(c) inside the context if that gradient is needed).
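The tape's result can be sanity-checked without TensorFlow: for y = x^2 + b + c, dy/dx = 2x and dy/db = 1, so at x = 3 the derivatives are 6 and 1, matching the output above. A minimal finite-difference check in plain Python (a sketch, not part of the original code):

```python
def f(x, b, c):
    # same computation as the TensorFlow example: y = x^2 + b + c
    return x ** 2 + b + c

def numeric_grad(fn, args, i, eps=1e-5):
    # central finite difference with respect to argument i
    lo = list(args); hi = list(args)
    lo[i] -= eps; hi[i] += eps
    return (fn(*hi) - fn(*lo)) / (2 * eps)

x, b, c = 3.0, 1.0, 1.0
dy_dx = numeric_grad(f, (x, b, c), 0)  # analytic value: 2*x = 6
dy_db = numeric_grad(f, (x, b, c), 1)  # analytic value: 1
print(dy_dx, dy_db)
```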
1. Data acquisition and preprocessing: tf.keras.datasets
(train_data, train_label), (test_data, test_label) = tf.keras.datasets.fashion_mnist.load_data()
train_label.shape
(60000,)
train_label[0]
9
def convert(img, label):
    # cast pixels to float32 and scale into [0, 1]; cast labels to int32
    img = tf.cast(img, tf.float32) / 255.0
    label = tf.cast(label, tf.int32)
    return img, label

train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_label))
train_dataset = train_dataset.map(convert, num_parallel_calls=tf.data.experimental.AUTOTUNE)
train_dataset = train_dataset.shuffle(buffer_size=1024)
train_dataset = train_dataset.batch(256)
train_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)

test_dataset = tf.data.Dataset.from_tensor_slices((test_data, test_label))
test_dataset = test_dataset.map(convert)
test_dataset = test_dataset.batch(256)  # no shuffle needed for evaluation
test_dataset = test_dataset.prefetch(tf.data.experimental.AUTOTUNE)
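The normalization step inside convert() maps uint8 pixel values into [0, 1] as float32. This can be illustrated in NumPy alone (the tiny array below is hypothetical stand-in data; the real pipeline gets its images from load_data()):

```python
import numpy as np

# Stand-in for one row of a Fashion-MNIST image: uint8 pixels in [0, 255]
img = np.array([[0, 128, 255]], dtype=np.uint8)

# Same transform as convert(): cast to float32, then scale into [0, 1]
scaled = img.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```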
2. Building the model: tf.keras.Model and tf.keras.layers
class MLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()  # Flatten collapses all dimensions except the first (batch_size)
        self.dense1 = tf.keras.layers.Dense(units=100, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(units=10)

    def call(self, inputs):       # [batch_size, 28, 28]
        x = self.flatten(inputs)  # [batch_size, 784]
        x = self.dense1(x)        # [batch_size, 100]
        x = self.dense2(x)        # [batch_size, 10]
        output = tf.nn.softmax(x)
        return output
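The shape annotations in call() can be traced with a plain NumPy forward pass. The sketch below mirrors the MLP's structure with randomly initialized stand-in weights (illustrative only; Keras manages the real weights internally):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the two Dense layers' weights (hypothetical initialization)
W1, b1 = rng.standard_normal((784, 100)) * 0.01, np.zeros(100)
W2, b2 = rng.standard_normal((100, 10)) * 0.01, np.zeros(10)

batch = rng.random((256, 28, 28))   # [batch_size, 28, 28]
x = batch.reshape(256, -1)          # Flatten     -> [256, 784]
x = relu(x @ W1 + b1)               # Dense(100)  -> [256, 100]
logits = x @ W2 + b2                # Dense(10)   -> [256, 10]
probs = softmax(logits)             # each row is a probability distribution

print(probs.shape, float(probs[0].sum()))
```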
3. Training the model: tf.keras.losses and tf.keras.optimizers
Define the model hyperparameters:
# learning rate
learning_rate = 0.001
model = MLP()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
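As an aside, a single Adam update can be written out by hand. The sketch below follows the standard published Adam update rule (moment estimates with bias correction); the TensorFlow implementation may differ in internal details such as the epsilon placement, so treat this as illustrative:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update on parameter w, following the standard update rule."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, grad=np.array([0.5]), m=m, v=v, t=1)
print(w)  # the first step moves w by roughly the learning rate
```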
Instantiate a tf.keras.metrics.SparseCategoricalAccuracy evaluator:

sparse_categorical_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

- Iterate over the test dataset; on each batch, pass two arguments to the evaluator via the update_state() method, y_pred and y_true:

sparse_categorical_accuracy.update_state(y_true=y, y_pred=y_pred)

- Use the result() method to output the final metric value:

sparse_categorical_accuracy.result()
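What the metric computes can be reproduced by hand: the predicted class is the argmax of y_pred, and accuracy is the fraction of samples where it equals the integer label. A one-shot NumPy version (unlike the Keras metric, it does not accumulate state across update_state() calls):

```python
import numpy as np

def sparse_categorical_accuracy(y_true, y_pred):
    # fraction of samples whose argmax class matches the integer label
    return float(np.mean(np.argmax(y_pred, axis=-1) == y_true))

y_true = np.array([9, 2, 1])
y_pred = np.array([[0.0] * 9 + [1.0],            # predicts class 9 (correct)
                   [0.1, 0.8, 0.1] + [0.0] * 7,  # predicts class 1 (wrong, label is 2)
                   [0.0, 1.0] + [0.0] * 8])      # predicts class 1 (correct)
print(sparse_categorical_accuracy(y_true, y_pred))  # 2 of 3 correct
```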
Iterate the following steps:

- Draw a batch of training data from train_dataset;
- Feed the batch to the model to compute predictions;
- Compare the predictions with the true labels to compute the loss; here the cross-entropy function from tf.keras.losses is used;
- Compute the derivatives of the loss with respect to the model variables, using automatic differentiation with tf.GradientTape();
- Pass the computed gradients to the optimizer.
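The cross-entropy step above has a simple closed form: for integer labels, the per-sample loss is the negative log of the probability the model assigns to the true class. A NumPy sketch of this computation (assuming y_pred already holds probabilities, as the softmax output does):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # negative log probability assigned to the true class of each sample
    p = np.clip(y_pred[np.arange(len(y_true)), y_true], eps, 1.0)
    return -np.log(p)

y_true = np.array([0, 1])
y_pred = np.array([[0.9, 0.1],   # confident and correct -> small loss
                   [0.5, 0.5]])  # uncertain -> loss of ln 2
losses = sparse_categorical_crossentropy(y_true, y_pred)
print(losses.mean())  # mean over the batch, as tf.reduce_mean does in the loop
```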
def val_data():
    for img, label in test_dataset:
        y_pred = model(img)
        sparse_categorical_accuracy.update_state(y_true=label, y_pred=y_pred)
        val_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=label, y_pred=y_pred)
        val_loss = tf.reduce_mean(val_loss)
    # note: the returned val_loss is that of the last batch only, which is why
    # it fluctuates in the log below while the accumulated accuracy is smooth
    return sparse_categorical_accuracy.result(), val_loss

step = 0
epoch_num = 20
for epoch in range(epoch_num):
    for img, label in train_dataset:
        with tf.GradientTape() as tape:
            y_pred = model(img)
            loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=label, y_pred=y_pred)
            loss = tf.reduce_mean(loss)
        # note: the shared metric is never reset, so the reported accuracy
        # accumulates over all batches seen so far (training and validation)
        sparse_categorical_accuracy.update_state(y_true=label, y_pred=y_pred)
        train_acc = sparse_categorical_accuracy.result()
        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
        if step % 200 == 0:
            val_acc, val_loss = val_data()
            print("step %d: loss:%f,train_accuracy:%f,val_loss:%f,val_accuracy:%f" % (step, loss.numpy(), train_acc, val_loss, val_acc))
        step += 1
step 0: loss:2.407882,train_accuracy:0.125000,val_loss:2.074099,val_accuracy:0.250780
step 200: loss:0.453060,train_accuracy:0.695961,val_loss:0.562909,val_accuracy:0.713320
step 400: loss:0.408312,train_accuracy:0.767617,val_loss:0.159222,val_accuracy:0.773065
step 600: loss:0.394144,train_accuracy:0.796274,val_loss:0.445736,val_accuracy:0.798978
step 800: loss:0.392936,train_accuracy:0.812353,val_loss:0.153734,val_accuracy:0.814024
step 1000: loss:0.325823,train_accuracy:0.823798,val_loss:0.672390,val_accuracy:0.824885
step 1200: loss:0.357862,train_accuracy:0.831875,val_loss:0.131720,val_accuracy:0.832449
step 1400: loss:0.342837,train_accuracy:0.838212,val_loss:0.408839,val_accuracy:0.838842
step 1600: loss:0.281384,train_accuracy:0.843795,val_loss:0.134211,val_accuracy:0.844272
step 1800: loss:0.249147,train_accuracy:0.848121,val_loss:0.686004,val_accuracy:0.848393
step 2000: loss:0.293836,train_accuracy:0.851793,val_loss:0.373876,val_accuracy:0.852171
step 2200: loss:0.309232,train_accuracy:0.855372,val_loss:0.411929,val_accuracy:0.855624
step 2400: loss:0.267641,train_accuracy:0.858417,val_loss:0.955346,val_accuracy:0.858634
step 2600: loss:0.253091,train_accuracy:0.861050,val_loss:0.502215,val_accuracy:0.861142
step 2800: loss:0.309148,train_accuracy:0.863410,val_loss:0.260722,val_accuracy:0.863584
step 3000: loss:0.286947,train_accuracy:0.865758,val_loss:0.073445,val_accuracy:0.865869
step 3200: loss:0.246876,train_accuracy:0.867789,val_loss:0.245684,val_accuracy:0.867833
step 3400: loss:0.255481,train_accuracy:0.869581,val_loss:0.210822,val_accuracy:0.869663
step 3600: loss:0.247359,train_accuracy:0.871342,val_loss:0.328454,val_accuracy:0.871410
step 3800: loss:0.227694,train_accuracy:0.873003,val_loss:0.140022,val_accuracy:0.873055
step 4000: loss:0.237679,train_accuracy:0.874515,val_loss:0.266574,val_accuracy:0.874473
step 4200: loss:0.260514,train_accuracy:0.875931,val_loss:0.505690,val_accuracy:0.875966
step 4400: loss:0.207988,train_accuracy:0.877374,val_loss:0.382154,val_accuracy:0.877417
step 4600: loss:0.255797,train_accuracy:0.878670,val_loss:0.079232,val_accuracy:0.878677
Model evaluation:
def test_data():
    for img, label in test_dataset:
        y_pred = model(img)
        sparse_categorical_accuracy.update_state(y_true=label, y_pred=y_pred)
    return sparse_categorical_accuracy.result()
test_acc = test_data()
print(test_acc)
tf.Tensor(0.87931794, shape=(), dtype=float32)