Preface
In the previous chapter, "PyTorch Neural Networks", I said we would look into optimizers. Before we get to optimizers, though, we need to talk about batch training. The starknn module that appears in the code below comes from the "NumPy Neural Networks" chapter.
import numpy as np
import time
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import starknn  # import our own module from the NumPy Neural Networks chapter
# model configuration: layer sizes and activations
nn_cfg = [{"in_features": 2, "out_features": 25, "activation": "relu"},#(2,25)
{"in_features": 25, "out_features": 50, "activation": "relu"},#(25,50)
{"in_features": 50, "out_features": 50, "activation": "relu"},#(50,50)
{"in_features": 50, "out_features": 25, "activation": "relu"},#(50,25)
{"in_features": 25, "out_features": 2, "activation": "sigmoid"}]#(25,2)
# prepare the data
X, y = make_moons(n_samples=1000, noise=0.3)  # features and labels
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)  # split the dataset, train:test = 9:1
y_train = starknn.idx2onehot(y_train)  # convert the labels to one-hot encoding
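For reference, starknn.idx2onehot turns the integer class labels into one-hot rows so that they match the network's two-unit output. A minimal sketch of such a conversion (the real implementation lives in the NumPy Neural Networks chapter and may differ in details):
def idx2onehot_sketch(y, num_classes=None):
    # infer the number of classes from the labels if not given
    if num_classes is None:
        num_classes = int(y.max()) + 1
    # one row per sample, with a single 1 in the column of its class
    onehot = np.zeros((y.shape[0], num_classes))
    onehot[np.arange(y.shape[0]), y] = 1
    return onehot
# e.g. idx2onehot_sketch(np.array([0, 1, 1])) -> [[1. 0.], [0. 1.], [0. 1.]]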
1. Full-Data Training
start = time.time()
params, acc_history, cost_history = starknn.train(x_train, y_train, nn_cfg, 10000, 0.01)
end = time.time()
print('The full-data training time is {:.2f} seconds.'.format(end-start))
#测试
y_hat, _ = starknn.forward_full_layer(x_test, params, nn_cfg)
test_accuracy = starknn.calc_accuracy(y_hat, y_test, train=False)
print('The accuracy of this test dataset is {}%.'.format(test_accuracy * 100))
The full-data training time is 48.71 seconds.
The accuracy of this test dataset is 93.0%.
The approach above feeds all of the training data to the network at once on every iteration, which is actually not ideal for how the model learns. As the saying goes, eat until you are only seventy percent full and say only three tenths of what you mean, so let's try mini-batch training instead.
batch_size = 32  # batch size
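Before writing the training loop, it is worth spelling out how a batch is carved out of the training set: batch number offset_idx covers the rows from offset_idx * batch_size up to (offset_idx + 1) * batch_size. A tiny sanity check, with shapes chosen purely for illustration:
demo = np.arange(10).reshape(10, 1)  # a pretend training set with 10 samples
demo_batch_size = 3
for offset_idx in range(10 // demo_batch_size):  # 3 full batches; the last sample is dropped
    batch = demo[offset_idx * demo_batch_size: (offset_idx + 1) * demo_batch_size, :]
    print(offset_idx, batch.ravel())  # 0 [0 1 2], then 1 [3 4 5], then 2 [6 7 8]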
2. Mini-Batch Training (1)
# mini-batch training
def batch_train(X, Y, nn_cfg, epochs, learning_rate, batch_size, train=True):
    params = starknn.init_layers(nn_cfg, 2)
    num_batch = X.shape[0] // batch_size  # number of full batches (integer division)
    acc_history = []
    cost_history = []
    for i in range(epochs):
        offset_idx = i % num_batch  # cycle through the batches, one batch per iteration
        X_batch = X[offset_idx * batch_size: (offset_idx + 1) * batch_size, :]
        Y_batch = Y[offset_idx * batch_size: (offset_idx + 1) * batch_size, :]
        # forward propagation
        Y_hat, memory = starknn.forward_full_layer(X_batch, params, nn_cfg)
        # compute the accuracy
        accuracy = starknn.calc_accuracy(Y_hat, Y_batch, train=train)
        # compute the cost
        cost = starknn.calc_cost(Y_hat, Y_batch)
        acc_history.append(accuracy)
        cost_history.append(cost)
        # backward propagation
        grads = starknn.full_backward_propagation(Y_hat, Y_batch, memory, params, nn_cfg)
        # update the parameters
        params = starknn.update(params, grads, nn_cfg, learning_rate)
    return params, acc_history, cost_history
start = time.time()
params, acc_history, cost_history = batch_train(x_train, y_train, nn_cfg, 10000, 0.01, batch_size)
end = time.time()
print('The mini-batch training time is {:.2f} seconds.'.format(end-start))
#测试
y_hat, _ = starknn.forward_full_layer(x_test, params, nn_cfg)
test_accuracy = starknn.calc_accuracy(y_hat, y_test, train=False)
print('The accuracy of this test dataset is {}%.'.format(test_accuracy * 100))
The mini-batch training time is 32.16 seconds.
The accuracy of this test dataset is 91.0%.
3. Mini-Batch Training (2)
def another_batch(X, Y, nn_cfg, epochs, learning_rate, batch_size, train=True):
    params = starknn.init_layers(nn_cfg, 2)
    num_batch = X.shape[0] // batch_size  # number of full batches (integer division)
    acc_history = []
    cost_history = []
    for epoch in range(epochs):
        for offset_idx in range(num_batch):  # one full pass over the data per epoch
            X_batch = X[offset_idx * batch_size: (offset_idx + 1) * batch_size, :]
            Y_batch = Y[offset_idx * batch_size: (offset_idx + 1) * batch_size, :]
            # forward propagation
            Y_hat, memory = starknn.forward_full_layer(X_batch, params, nn_cfg)
            # compute the accuracy
            accuracy = starknn.calc_accuracy(Y_hat, Y_batch, train=train)
            # compute the cost
            cost = starknn.calc_cost(Y_hat, Y_batch)
            acc_history.append(accuracy)
            cost_history.append(cost)
            # backward propagation
            grads = starknn.full_backward_propagation(Y_hat, Y_batch, memory, params, nn_cfg)
            # update the parameters
            params = starknn.update(params, grads, nn_cfg, learning_rate)
    return params, acc_history, cost_history
start = time.time()
params, acc_history, cost_history = another_batch(x_train, y_train, nn_cfg, 10000//(x_train.shape[0]//batch_size), 0.01, batch_size)
end = time.time()
print('The epoch-loop mini-batch training time is {:.2f} seconds.'.format(end-start))
#测试
y_hat, _ = starknn.forward_full_layer(x_test, params, nn_cfg)
test_accuracy = starknn.calc_accuracy(y_hat, y_test, train=False)
print('The accuracy of this test dataset is {}%.'.format(test_accuracy * 100))
The epoch-loop mini-batch training time is 34.70 seconds.
The accuracy of this test dataset is 93.0%.
The batch training in (1) and (2) is the same; it is just written in a different way. Note that the epochs passed in (2) is no longer 10000 but 10000 integer-divided by num_batch, so that the total number of parameter updates stays roughly the same. In PyTorch I usually use style (2), along the lines of the sketch below.
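For comparison, here is roughly what style (2), an outer loop over epochs with an inner loop over batches, looks like in PyTorch, where DataLoader handles the batching. This is a generic sketch, not code from the previous chapter; model, criterion, optimizer and epochs stand in for whatever you actually define:
import torch
from torch.utils.data import TensorDataset, DataLoader

# wrap the NumPy arrays so that DataLoader can batch them
dataset = TensorDataset(torch.from_numpy(x_train).float(),
                        torch.from_numpy(y_train).float())
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # reshuffles every epoch

for epoch in range(epochs):          # outer loop: epochs (defined elsewhere)
    for X_batch, Y_batch in loader:  # inner loop: one update per batch
        optimizer.zero_grad()              # clear old gradients
        Y_hat = model(X_batch)             # forward pass
        loss = criterion(Y_hat, Y_batch)   # compute the loss
        loss.backward()                    # backward pass
        optimizer.step()                   # update the parameters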
Summary
All three runs end up with similar accuracy. The mini-batch methods take less time because each iteration multiplies a (batch_size, 2) matrix rather than the full (900, 2) training matrix, so every forward and backward pass is much cheaper and each update step runs faster; the rough timing sketch below illustrates this. OK, everything is in place. The next chapter will finally get to optimizers, so stay tuned.
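As a back-of-the-envelope check of that claim, we can time just the first-layer product for a batch of 32 against one for all 900 training samples. A toy measurement (the (2, 25) weight shape is taken from the first entry of nn_cfg; absolute numbers will vary by machine):
W1 = np.random.randn(2, 25)      # first-layer weights, shape from nn_cfg
small = np.random.randn(32, 2)   # one mini-batch of inputs
big = np.random.randn(900, 2)    # the full training set of inputs

start = time.time()
for _ in range(10000):
    _ = small @ W1  # mini-batch forward product
print('batch product: {:.3f} seconds'.format(time.time() - start))

start = time.time()
for _ in range(10000):
    _ = big @ W1  # full-data forward product
print('full product:  {:.3f} seconds'.format(time.time() - start))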