[DL] Mini-batch strategy

北境の守卫

Category: DL. Tags: batch, epoch

Article link: https://blog.csdn.net/baishuo8/article/details/90260099

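This article introduces basic deep learning concepts such as epoch, batch_size and iterations, and discusses the role each plays in training. batch_size is the core hyperparameter: its value directly affects training speed, hardware utilization and model performance. Too large a value can exhaust memory; too small a value can make gradient updates overly random. Based on an experimental comparison, the article suggests that, on the specific hardware used, batch_size=32 was the best choice for training on the MNIST dataset.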

Basic concepts

epoch : one pass in which every sample in the training set is used once is called one epoch

one epoch = one forward pass and one backward pass of all the training examples

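The corresponding parameter in code is n_epochs.

batch_size : the number of samples in one batch

A training set usually contains a large number of samples. Memory limits often make it impossible to load them all at once, and splitting the data also speeds up training, so the whole training set is divided into n_batch groups, each containing batch_size samples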

train_set = batch_size * n_batch

batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.

iterations : one training step that uses all the samples of a single batch is called one iteration

number of iterations = number of passes, each pass using batch_size number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes)
n_iterations = n_epochs * n_batch
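As a quick worked example of these relations, the sketch below counts batches and iterations. The helper name `count_iterations` and the numbers (60,000 samples, batch size 128, 10 epochs) are illustrative assumptions, not from the original post. It also assumes the last, smaller batch is kept when the training set does not divide evenly, which is why the batch count is rounded up.

```python
import math

def count_iterations(num_samples, batch_size, n_epochs):
    """Return (n_batch, n_iterations) for a training run.

    Assumes the final, possibly smaller, batch is kept, so the
    number of batches per epoch is rounded up.
    """
    n_batch = math.ceil(num_samples / batch_size)
    return n_batch, n_epochs * n_batch

n_batch, n_iterations = count_iterations(60_000, 128, 10)
print(n_batch, n_iterations)  # 469 4690
```

If the incomplete last batch were dropped instead, the count would be rounded down (468 batches per epoch in this example).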

The concrete procedure is:

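# number of epochs
n_epochs = 100
# total number of samples
numSamples = 100_000
# split the samples into n_batch groups
n_batch = 10
# number of samples per batch (integer division keeps it an int)
batch_size = numSamples // n_batch

def train(j):
    """Placeholder for one training step on the j-th batch."""
    pass

# run training
iterations = 0
for i in range(n_epochs):
    for j in range(n_batch):
        # train on the j-th batch
        train(j)
        # one more iteration done
        iterations += 1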
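The loop above leaves open how the batches are actually cut. A minimal sketch of one common approach follows: shuffle the sample indices once per epoch and slice them into consecutive batches. The function name `iterate_minibatches` and the toy data are assumptions made for illustration; a real project would more likely rely on the data loader of its framework.

```python
import random

def iterate_minibatches(samples, batch_size, rng):
    """Yield lists of samples that together cover the data once, in random order."""
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[k] for k in indices[start:start + batch_size]]

rng = random.Random(0)
samples = list(range(10))  # 10 toy samples standing in for real data

n_epochs = 2
iterations = 0
for epoch in range(n_epochs):
    for batch in iterate_minibatches(samples, batch_size=4, rng=rng):
        # a real training step (forward + backward pass) would go here
        iterations += 1

print(iterations)  # 6: 2 epochs * 3 batches (4 + 4 + 2 samples)
```

Reshuffling every epoch means each epoch sees the same samples grouped into different batches, and only one batch needs to be held in memory at a time.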

Hyperparameter settings

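Roughly speaking, n_epochs is a one-directional parameter: the more epochs, the longer training takes and the better the fit to the training set. Even if training is stuck in a poor minimum, a few more epochs will usually not make the training loss worse (overfitting on held-out data is a separate concern). n_iterations is a derived quantity.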

batch_size is the most central one, and it touches many aspects of both software and hardware:

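When the size is large, the data goes into memory and onto the GPU in one go, so hardware utilization is high, and the gradient direction better represents the gradient of the whole training set. However, each gradient update takes longer, so reaching the same accuracy may take a few more epochs. Increase it further and the hardware may run out of resources and raise an error, or training may fall into a poor minimum early and fail to escape.
When the size is small, hardware resources are wasted and the gradient updates are too random. On the other hand, training is more agile: after many quick gradient updates the result may be a tangled mess, or the noise may produce a surprisingly good result.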
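To make the "randomness" of small batches concrete, here is a small sketch that measures how much the mini-batch gradient fluctuates from batch to batch for several batch sizes. Everything in it is an assumption for illustration: the toy 1-D regression data, the squared-error loss, and the helper names `minibatch_gradient` and `gradient_spread`. It is not the experiment from the original post.

```python
import random
import statistics

rng = random.Random(0)

# Toy 1-D regression data: y = 3x + noise (made up for this illustration).
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]

def minibatch_gradient(w, batch_indices):
    """Gradient w.r.t. w of the batch-averaged loss 0.5 * (w*x - y)^2."""
    return sum((w * xs[k] - ys[k]) * xs[k] for k in batch_indices) / len(batch_indices)

def gradient_spread(w, batch_size, n_trials=500):
    """Standard deviation of the mini-batch gradient across random batches."""
    grads = [
        minibatch_gradient(w, rng.sample(range(len(xs)), batch_size))
        for _ in range(n_trials)
    ]
    return statistics.stdev(grads)

spread = {b: gradient_spread(w=0.0, batch_size=b) for b in (1, 32, 256)}
for b, s in spread.items():
    print(f"batch_size={b:4d}  gradient std={s:.3f}")
```

The spread should shrink as the batch grows, roughly in proportion to 1/sqrt(batch_size): larger batches give a gradient closer to the full-data gradient, while smaller batches give noisier but cheaper updates.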

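So this is a craft that takes experience and intuition. As an example, many hello-world demos use MNIST digit recognition as an introduction. They usually concatenate all the images into a single tensor and start with batch_size = 128. On my RTX2080 (8G), this fails with a GPU sync failed error after three epochs. Changing it to batch_size = 32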