Today I noticed that a senior labmate's code uses the amp package from apex, but when I tried it myself I ran into problems getting apex to work. I later learned that PyTorch has AMP built in, so I studied how to use AMP in PyTorch.
Official documentation: https://pytorch.org/docs/stable/amp.html?highlight=amp
torch.cuda.amp
Purpose:
torch.cuda.amp provides convenience methods for mixed-precision training, which can speed it up. Within a network, some operations, such as linear layers and convolutions, run much faster in float16, while others, such as reductions, need the dynamic range of float32. Mixed precision tries to match each operation to its most appropriate dtype.
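To see this dtype selection in action, here is a minimal sketch (assuming a CUDA device is available): under autocast, a matrix multiply runs in float16, while a reduction such as sum is kept in float32.

import torch
from torch.cuda.amp import autocast

a = torch.randn(8, 8, device="cuda")   # float32 inputs
b = torch.randn(8, 8, device="cuda")

with autocast():
    c = torch.mm(a, b)   # matmul autocasts to float16
    s = c.sum()          # reductions such as sum autocast to float32

print(c.dtype)  # torch.float16
print(s.dtype)  # torch.float32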
Usage:
from torch.cuda.amp import autocast, GradScaler
- Typical Mixed Precision Training
# Create model and optimizer in default (float32) precision.
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

# Create a GradScaler once at the beginning of training.
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        # Run the forward pass under autocast so eligible ops use float16.
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        # Scale the loss and call backward() on it to avoid float16 gradient underflow.
        scaler.scale(loss).backward()
        # step() first unscales the gradients, then calls optimizer.step()
        # (the step is skipped if the gradients contain infs or NaNs).
        scaler.step(optimizer)
        # Update the scale factor for the next iteration.
        scaler.update()
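To make the pattern above concrete, here is a self-contained toy version; the model, data, and hyperparameters are made up purely for illustration.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler

# Toy model and random data, only to make the loop runnable.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler()

for epoch in range(3):
    for _ in range(10):
        input = torch.randn(32, 64, device="cuda")
        target = torch.randint(0, 10, (32,), device="cuda")

        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()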
- Working with Unscaled Gradients (gradient clipping)
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()

        # Unscale the gradients of the optimizer's params in place before
        # clipping, so clipping operates on the true gradient values.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # step() knows the gradients were already unscaled and will not unscale them again.
        scaler.step(optimizer)
        scaler.update()
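Note that scaler.unscale_(optimizer) should only be called once per optimizer per step, after all gradients for that step have been accumulated. Besides clipping, unscaling is also how you inspect gradients at their true magnitude; a minimal sketch, reusing model, optimizer, and scaler from the snippet above (the printout is purely for illustration):

scaler.scale(loss).backward()

# After unscale_, each param's .grad holds the true (unscaled) gradient values.
scaler.unscale_(optimizer)
grads = [p.grad.detach().norm() for p in model.parameters() if p.grad is not None]
total_norm = torch.norm(torch.stack(grads))
print(f"unscaled grad norm: {total_norm.item():.4f}")

scaler.step(optimizer)   # step() will not unscale these gradients again
scaler.update()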
- Working with Scaled Gradients (gradient accumulation)
scaler = GradScaler()

for epoch in epochs:
    for i, (input, target) in enumerate(data):
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
            # Divide by the number of accumulation steps so the accumulated
            # gradient matches the full effective batch.
            loss = loss / iters_to_accumulate

        # Accumulate scaled gradients.
        scaler.scale(loss).backward()

        if (i + 1) % iters_to_accumulate == 0:
            # Only step, update the scale, and clear gradients once per effective batch.
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
- Working with Multiple Models, Losses, and Optimizers
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer0.zero_grad()
        optimizer1.zero_grad()
        with autocast():
            output0 = model0(input)
            output1 = model1(input)
            loss0 = loss_fn(2 * output0 + 3 * output1, target)
            loss1 = loss_fn(3 * output0 - 5 * output1, target)

        # retain_graph is unrelated to amp; it is needed here because the
        # two backward() calls share parts of the graph.
        scaler.scale(loss0).backward(retain_graph=True)
        scaler.scale(loss1).backward()

        # You can choose which optimizers receive explicit unscaling,
        # e.g. to inspect or modify the gradients of the params they own.
        scaler.unscale_(optimizer0)

        scaler.step(optimizer0)
        scaler.step(optimizer1)

        scaler.update()