LightningDataModule API
It defines five methods:
- prepare_data (how to download, tokenize, etc.)
  Use this method for work that writes to disk or that must be done in a single process.
- setup (how to split, etc.)
  Put here the data operations you want to run on every GPU, including but not limited to:
  (1) count number of classes
  (2) build vocabulary
  (3) perform train/val/test splits
  (4) apply transforms (defined explicitly in your datamodule)
- train_dataloader
- val_dataloader(s)
- test_dataloader(s)
LightningModule API
Methods:
- training_step
- training_step_end
- training_epoch_end
- validation_step
- test_step
- predict_step
- configure_optimizers: returns an optimizer, or optimizers together with lr_schedulers; the return value can be a single optimizer, a List, or a Dict
- freeze(): freeze all parameters for inference
- log(): log a key, val pair
- log_dict(): log a dictionary of values at once
- manual_backward()
- save_hyperparameters(): save arguments to hparams attribute
- to_onnx(): save the model in ONNX format
- to_torchscript()
Attributes:
- self.current_epoch
- self.device
- self.global_rank
- self.global_step
- self.local_rank
- self.hparams
- self.logger
- self.precision
- self.trainer
- self.use_amp: True if using Automatic Mixed Precision (AMP)
- self.automatic_optimization: When set to False, Lightning does not automate the optimization process. This means you are responsible for handling your optimizers. However, we do take care of precision and any accelerators used
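Tips summary
- Use multiple worker processes in the DataLoader. A common rule of thumb is num_workers = 4 * num_GPU.
- Use pinned memory. With pin_memory=True the DataLoader places fetched tensors in pinned (page-locked) host memory, which enables faster transfer to CUDA-enabled GPUs. This is not the same as torch.cuda.empty_cache(); the first reference below also advises against calling torch.cuda.empty_cache() unnecessarily.
  data_loader = DataLoader(dataset, num_workers=8, pin_memory=True)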
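- Avoid unnecessary tensor transfers between CPU and GPU:
  - Avoid calling .item(), .numpy(), or .cpu() where they are not needed, since each of them moves data from GPU to CPU; if the goal is only to drop the computational graph, use .detach() instead. (Effect not yet verified.)
  - Create tensors directly on the GPU. Use
    t = torch.rand(2, 2, device=self.device)
    instead of
    t = torch.rand(2, 2).cuda()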
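- Use DistributedDataParallel rather than DataParallel for parallel training.
- Train with 16-bit precision for speed. (Not all parameters are converted to 16-bit in the process.)
  # Argument names depend on the Lightning release: older releases use
  # distributed_backend='ddp' and gpus=8 (as below); newer releases use
  # strategy='ddp', accelerator='gpu', devices=8 instead.
  trainer = Trainer(distributed_backend='ddp', gpus=8, precision=16)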
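The DataLoader and tensor-placement tips above can be sketched in plain PyTorch as follows. The dataset, the helper `mean_loss`, and the fallback to CPU when no GPU is present are assumptions for the example, not part of the original tips.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fall back to CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_gpus = max(torch.cuda.device_count(), 1)

dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=4 * num_gpus,              # rule of thumb from the tips
    pin_memory=torch.cuda.is_available(),  # pinning only helps with CUDA
)

# Create the tensor on the target device directly, instead of creating it
# on the CPU and copying it over with .cuda().
t = torch.rand(2, 2, device=device)


def mean_loss(losses):
    """Average a list of scalar loss tensors without leaving their device."""
    total = torch.zeros((), device=losses[0].device)
    for loss in losses:
        # .detach() drops the graph; unlike .item() it does not move data
        # from the GPU to the CPU.
        total += loss.detach()
    return total / len(losses)
```

Calling `.item()` once on the result of `mean_loss`, for example at the end of an epoch, keeps the GPU-to-CPU transfers to one per epoch rather than one per step.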
References
- https://towardsdatascience.com/7-tips-for-squeezing-maximum-performance-from-pytorch-ca4a40951259
- https://pytorch-lightning.readthedocs.io/en/latest/guides/speed.html?highlight=numpy()#item-numpy-cpu