PyTorch Lightning usage notes

Latest recommended article published on 2024-07-18 20:02:41

真炎破天

Categories: Deep Learning Fundamentals, nlp. Tags: pytorch, deep learning

Article link: https://blog.csdn.net/u012409283/article/details/119912238

LightningModule API

Define 5 methods:

Methods:

training_step
training_step_end
training_epoch_end
validation_step
test_step
predict_step
configure_optimizers: returns an optimizer, or an optimizer together with an lr_scheduler; return type: a single optimizer, a List, or a Dict
freeze(): freezes all parameters for inference
log(): log a key, val pair
log_dict(): log a dictionary of values at once
manual_backward()
save_hyperparameters(): save arguments to hparams attribute
to_onnx(): save the model in ONNX format
to_torchscript()
Attributes:
self.current_epoch
self.device
self.global_rank
self.global_step
self.local_rank
self.hparams
self.logger
self.precision
self.trainer
self.amp: True if using Automatic Mixed Precision
self.automatic_optimization: When set to False, Lightning does not automate the optimization process. This means you are responsible for handling your optimizers. However, we do take care of precision and any accelerators used

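Use multiple worker processes in the DataLoader; a common rule of thumb is num_workers = 4 * num_GPU.
Use pinned memory. With pin_memory=True the DataLoader places fetched batches in page-locked (pinned) host memory, which makes copies from host to GPU faster. This is not equivalent to torch.cuda.empty_cache(); the referenced article actually advises against calling torch.cuda.empty_cache() unnecessarily.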

data_loader = DataLoader(dataset, num_workers=8, pin_memory=True)
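
As a rough sketch of the rule of thumb above, the worker count can be derived from the number of visible GPUs. `make_loader` and the toy `TensorDataset` are hypothetical names used only for this example. On a machine without GPUs it falls back to 0 workers (loading in the main process) and no pinning.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=32):
    """Build a DataLoader following the num_workers = 4 * num_GPU heuristic."""
    num_gpus = torch.cuda.device_count()
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=4 * num_gpus,              # 0 when no GPU is visible
        pin_memory=torch.cuda.is_available(),  # pinning only helps GPU transfers
    )


# Toy data: 64 samples with 3 features each, plus dummy targets.
toy_dataset = TensorDataset(torch.randn(64, 3), torch.zeros(64))
loader = make_loader(toy_dataset)
```

The factor of 4 is only a starting point. The best value depends on CPU core count and on how expensive each sample is to load.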

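Avoid unnecessary tensor transfers between CPU and GPU
1. Avoid calling .item(), .numpy(), or .cpu() where possible; when the goal is only to drop the computation graph, use .detach() instead. (Effect not verified; to be confirmed.)
2. Create tensors directly on the GPU: use t = torch.rand(2, 2, device=self.device) instead of t = torch.rand(2, 2).cuda().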
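
Both points can be sketched in plain PyTorch. The running-loss loop here is a made-up stand-in for a training loop, and `device` falls back to the CPU so the snippet also runs without a GPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate on the target device directly rather than on the CPU first.
t = torch.rand(2, 2, device=device)

# Accumulate a running loss as a tensor. .detach() drops the graph without
# forcing the host-device synchronization that .item() would.
running_loss = torch.zeros((), device=device)
weights = torch.rand(4, device=device, requires_grad=True)
for _ in range(3):
    loss = (weights ** 2).mean()
    running_loss += loss.detach()

mean_loss = running_loss / 3  # still a tensor on `device`

# Convert to a Python number once, only when it is actually needed.
print(float(mean_loss))
```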
Prefer DistributedDataParallel over DataParallel for parallel training.
Train with 16-bit precision for speed. (Not all parameters are converted to 16-bit in the process.)

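# Older 1.x argument names; newer Lightning releases use strategy='ddp', accelerator='gpu', devices=8 instead.
trainer = Trainer(distributed_backend='ddp', gpus=8, precision=16)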
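
The remark that not everything becomes 16-bit can be illustrated with plain PyTorch autocast. This is a sketch of the mixed-precision mechanism in PyTorch itself; that Lightning's precision=16 relies on the same mechanism is my assumption, not something these notes state. On the CPU the snippet uses bfloat16, since that is the 16-bit type CPU autocast supports.

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
# float16 on GPU; bfloat16 is the 16-bit type used for CPU autocast.
low_precision = torch.float16 if device_type == "cuda" else torch.bfloat16

layer = torch.nn.Linear(8, 4).to(device_type)
x = torch.randn(2, 8, device=device_type)

with torch.autocast(device_type=device_type, dtype=low_precision):
    y = layer(x)  # the matmul runs in 16-bit inside the autocast region

print(layer.weight.dtype)  # the stored parameters stay in float32
print(y.dtype)             # the activation is 16-bit
```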

https://towardsdatascience.com/7-tips-for-squeezing-maximum-performance-from-pytorch-ca4a40951259
https://pytorch-lightning.readthedocs.io/en/latest/guides/speed.html?highlight=numpy()#item-numpy-cpu
