Mini-batch gradient descent moves toward the minimum of the loss faster, which speeds up training of the network and may also use less memory. Faster training will pay off later when we tune hyperparameters.
The gradient descent we used before is called batch gradient descent: it lumps all training examples into one large batch and computes the loss over that entire batch. Mini-batch gradient descent instead splits the training set into smaller batches and performs one gradient-descent step for each batch.
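To make the contrast concrete, here is a minimal, hypothetical sketch on a toy one-parameter least-squares problem (not our MNIST network); it only illustrates the two loop structures:
import numpy as np

# Toy data: y = 3x plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
Y = 3.0 * X + rng.normal(scale=0.1, size=(1000, 1))

def gradient(X, Y, w):
    # Gradient of the mean squared error with respect to w
    return 2 * X.T @ (X @ w - Y) / X.shape[0]

lr, epochs, batch_size = 0.1, 5, 100

# Batch gradient descent: one update per epoch, over all samples
w = np.zeros((1, 1))
for epoch in range(epochs):
    w -= gradient(X, Y, w) * lr

# Mini-batch gradient descent: one update per mini-batch
w = np.zeros((1, 1))
for epoch in range(epochs):
    for start in range(0, X.shape[0], batch_size):
        x_b, y_b = X[start:start + batch_size], Y[start:start + batch_size]
        w -= gradient(x_b, y_b, w) * lr  # 10 updates per epoch instead of 1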
We continue to use the MNIST dataset. First, the whole training set has to be split into mini-batches and wrapped up in a function; the code continues from the earlier post MNIST数据集,图像识别(四)_悠悠海风的博客-CSDN博客.
In the model-training code, add the batching function:
# Split the training set into mini-batches
def prepare_batches(X_train, Y_train, batch_size):
    x_batches = []
    y_batches = []
    n_examples = X_train.shape[0]
    for batch in range(0, n_examples, batch_size):
        batch_end = batch + batch_size
        x_batches.append(X_train[batch:batch_end])
        y_batches.append(Y_train[batch:batch_end])
    return x_batches, y_batches
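As a quick sanity check, calling prepare_batches on small toy arrays (purely for illustration, not the real MNIST matrices) shows how the split behaves, including a smaller last batch when the sample count is not a multiple of batch_size:
import numpy as np

X_toy = np.arange(20).reshape(10, 2)   # 10 samples with 2 features each
Y_toy = np.arange(10).reshape(10, 1)
x_batches, y_batches = prepare_batches(X_toy, Y_toy, batch_size=4)
print([b.shape for b in x_batches])    # [(4, 2), (4, 2), (2, 2)]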
Then modify the report() and train() functions:
def report(epoch, batch, X_train, Y_train, X_test, Y_test, w1, w2):
    y_hat, _ = forward(X_train, w1, w2)
    training_loss = loss(Y_train, y_hat)
    classifications = classify(X_test, w1, w2)
    accuracy = np.average(classifications == Y_test) * 100.0
    print("%5d-%d, Loss: %.8f, Accuracy: %.2f%%" %
          (epoch, batch, training_loss, accuracy))
def train(X_train, Y_train, X_test, Y_test, n_hidden_nodes, epochs, batch_size, lr):
    n_input_variables = X_train.shape[1]
    n_classes = Y_train.shape[1]
    w1, w2 = initialize_weights(n_input_variables, n_hidden_nodes, n_classes)
    x_batches, y_batches = prepare_batches(X_train, Y_train, batch_size)
    # An epoch is one full pass over all mini-batches of the training set
    for epoch in range(epochs):
        # One gradient-descent step per mini-batch
        for batch in range(len(x_batches)):
            y_hat, h = forward(x_batches[batch], w1, w2)
            w1_gradient, w2_gradient = back(x_batches[batch], y_batches[batch], y_hat, w2, h)
            w1 = w1 - (w1_gradient * lr)
            w2 = w2 - (w2_gradient * lr)
            report(epoch, batch, X_train, Y_train, X_test, Y_test, w1, w2)
    return (w1, w2)
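Note that w1 and w2 are carried over from one mini-batch to the next and from epoch to epoch, and report() is now called after every mini-batch, so the printed loss (computed on the whole training set) and accuracy (on the test set) always reflect the latest weights.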
Now train the network. The number of epochs can be chosen fairly freely, while batch_size must not exceed the number of training examples; a learning rate lr of 0.01 was tried and works well:
# Run the following code only when this file is executed as a program
if __name__ == "__main__":
    w1, w2 = train(mi.X_train, mi.Y_train,
                   mi.X_test, mi.Y_test,
                   n_hidden_nodes=200, epochs=10, batch_size=20000, lr=0.01)
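With MNIST's 60,000 training images and batch_size=20000, prepare_batches produces 3 mini-batches, so each epoch performs 3 gradient-descent steps and report() prints 3 lines, for 30 weight updates in total over the 10 epochs.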
References:
Paolo Perrotta. Programming Machine Learning: From Coding to Deep Learning. 2021.