1. Gradient Accumulation
When training a neural network, a larger batch size generally makes training more stable, but GPU memory often prevents us from setting a large batch size. To obtain the same training effect as a large batch size, we can train with gradient accumulation. In conventional training, the parameters are updated by one gradient-descent step after every batch; with gradient accumulation, we instead set an accumulation step $n$ and update the parameters only once every $n$ batches. For example, with a batch size of 32, the conventional algorithm updates the parameters after every 32 samples. If GPU memory only allows a batch size of 8, we can set the accumulation step to 4 and achieve the same effect as the conventional algorithm.
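To see why each small-batch loss is later divided by the accumulation step, assume the loss is an average over the samples in a batch (as with the default reduction of nn.MSELoss used below). A batch of 32 samples split into 4 sub-batches $B_1,\dots,B_4$ of 8 samples satisfies

$$L_{32}=\frac{1}{32}\sum_{i=1}^{32}\ell_i=\frac{1}{4}\sum_{k=1}^{4}\Big(\frac{1}{8}\sum_{i\in B_k}\ell_i\Big)=\frac{1}{4}\sum_{k=1}^{4}L_{B_k},$$

so $\nabla L_{32}=\frac{1}{4}\sum_{k=1}^{4}\nabla L_{B_k}$: summing the gradients of the 4 sub-batch losses, each divided by the accumulation step 4, reproduces the gradient of the full 32-sample batch.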
Take simple linear regression as an example:
1. The conventional algorithm
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Ground-truth parameters of the linear model y = x @ weights + bias
weights = torch.tensor([5, 10], dtype=torch.float32)
bias = torch.tensor(1, dtype=torch.float32)

# Synthetic dataset: 640 samples with 2 features each
data = torch.randn(size=(640, 2), dtype=torch.float32)
target = torch.matmul(data, weights) + bias
target = target.reshape(640, 1)

class MyDataset(Dataset):
    def __init__(self, data, target):
        self.data = data
        self.target = target

    def __getitem__(self, index):
        return self.data[index], self.target[index]

    def __len__(self):
        return len(self.data)

mydataset = MyDataset(data, target)
trainLoader = DataLoader(dataset=mydataset, batch_size=64)

model = nn.Linear(in_features=2, out_features=1)
loss = nn.MSELoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)

for _ in range(500):
    for data, target in trainLoader:
        optimizer.zero_grad()           # clear gradients from the previous batch
        y_hat = model(data)
        loss_val = loss(y_hat, target)
        loss_val.backward()             # compute gradients for this batch
        optimizer.step()                # update parameters after every batch
2. The gradient accumulation algorithm
trainLoader = DataLoader(dataset=mydataset, batch_size=16)
accumulation_step = 4  # accumulate gradients over 4 batches before each update

optimizer.zero_grad()  # make sure no stale gradients are carried over
for _ in range(500):
    for i, (data, target) in enumerate(trainLoader):
        y_hat = model(data)
        loss_val = loss(y_hat, target)
        # scale the loss so the accumulated gradient matches a batch of 64
        loss_val = loss_val / accumulation_step
        loss_val.backward()             # gradients accumulate in the .grad buffers
        if (i + 1) % accumulation_step == 0:
            optimizer.step()            # update parameters once every 4 batches
            optimizer.zero_grad()       # reset the accumulated gradients
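As a quick sanity check (a minimal sketch, not part of the example above; model_a, model_b and criterion are illustrative names), the gradient accumulated over 4 mini-batches of 8 samples, with each loss divided by 4, should match the gradient computed from a single batch of 32 samples:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 2)
w = torch.tensor([5.0, 10.0])
y = (torch.matmul(x, w) + 1.0).reshape(32, 1)

# two copies of the same linear model with identical initial weights
model_a = nn.Linear(in_features=2, out_features=1)
model_b = nn.Linear(in_features=2, out_features=1)
model_b.load_state_dict(model_a.state_dict())
criterion = nn.MSELoss()

# (a) one large batch of 32 samples
criterion(model_a(x), y).backward()

# (b) 4 accumulation steps of 8 samples each, loss divided by 4
for xb, yb in zip(x.chunk(4), y.chunk(4)):
    (criterion(model_b(xb), yb) / 4).backward()

# the accumulated gradient matches the large-batch gradient (up to float error)
print(torch.allclose(model_a.weight.grad, model_b.weight.grad, atol=1e-6))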