Key points for installing pytorch_lightning
Both torch and lightning are installed with pip.
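Install the GPU build of PyTorch (mind the CUDA version: `cu101` means CUDA 10.1): pip install torch==1.8.1+cu101 torchvision==0.9.1+cu101 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
(Copy the exact command from the official PyTorch website.)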
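Install pytorch_lightning (I installed version 1.4.9): python -m pip install pytorch-lightning==1.4.9
(or pip install pytorch-lightning==1.4.9. I don't remember which of the two I ran. What matters is pinning the version number. The 1.4.x releases are published on PyPI as `pytorch-lightning`, which provides the `pytorch_lightning` module imported below.)
After installing, print the torch version again to check it. Installing lightning can uninstall the GPU build of torch and replace it with a CPU build. This is the real trap, so watch out!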
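The version check above can be made concrete with a small script that reports which torch build is installed. It relies on `torch.version.cuda`, which is `None` for CPU-only builds. The helper name `describe_torch_build` is my own and is not part of any library.

```python
import importlib.util


def describe_torch_build():
    """Summarize the installed torch build, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,              # e.g. "1.8.1+cu101" for the GPU wheel above
        "cuda_build": torch.version.cuda,          # None means a CPU-only build was installed
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_torch_build()
    if info is None:
        print("torch is not installed")
    elif info["cuda_build"] is None:
        print(f"torch {info['version']} is a CPU-only build: reinstall the GPU wheel")
    else:
        print(f"torch {info['version']} built for CUDA {info['cuda_build']}, "
              f"GPU usable: {info['cuda_available']}")
```

Run it before and after installing lightning. If `cuda_build` turns into `None`, the GPU build was swapped out.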
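At this point I got an error: ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data'
Downgrade torchmetrics: pip install torchmetrics==0.6.0
(0.6.0 was honestly a guess, but the error went away. The message means the installed torchmetrics no longer provides `get_num_classes`. That name is most likely still imported by pytorch_lightning 1.4.9, so an older torchmetrics is needed.)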
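You don't have to guess at versions. You can check directly whether the installed torchmetrics still exposes the name from the error message. This is a minimal sketch that assumes only the module path shown in the ImportError. The helper `has_get_num_classes` is hypothetical and written for this note.

```python
import importlib
import importlib.util


def has_get_num_classes():
    """True/False if torchmetrics is installed, None if it is not."""
    if importlib.util.find_spec("torchmetrics") is None:
        return None
    # The same module the ImportError above points at
    data = importlib.import_module("torchmetrics.utilities.data")
    return hasattr(data, "get_num_classes")


if __name__ == "__main__":
    print("torchmetrics exposes get_num_classes:", has_get_num_classes())
```

If this prints `False`, importing pytorch_lightning 1.4.9 should fail as described, and an older torchmetrics is needed.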
Verify: import pytorch_lightning as pl
If no error appears, the installation succeeded. Alternatively, just run the minimal LightningModule below.
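A bare import is a weak check. A stronger one compares the installed versions against the pins used in this note. The sketch below uses `importlib.metadata` (Python 3.8+, matching the python3.8 environment in the log below). The `PINS` dict simply restates the versions above, and `check_pins` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the steps above (PyPI distribution names)
PINS = {
    "torch": "1.8.1+cu101",
    "torchvision": "0.9.1+cu101",
    "pytorch-lightning": "1.4.9",
    "torchmetrics": "0.6.0",
}


def check_pins(pins=PINS):
    """Return {package: (expected, installed)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is missing entirely
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        import pytorch_lightning as pl
        print("pytorch_lightning", pl.__version__, "imported OK")
```

A CPU-only torch that replaced the GPU wheel shows up here as a `torch` mismatch.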
- Implementing a minimal LightningModule
Reference: a "PyTorch Lightning minimal quick start" tutorial.
```python
import os

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl


class MNISTModel(pl.LightningModule):
    # Implementing training_step, train_dataloader and configure_optimizers is already the
    # simplest possible LightningModule. If any of the three is missing, Lightning raises:
    # No `xxx` method defined. Lightning `Trainer` expects as minimum a `training_step()`, `train_dataloader()` and `configure_optimizers()` to be defined

    def __init__(self):
        super(MNISTModel, self).__init__()
        # One linear layer mapping a flattened 28x28 image to 10 class scores
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        # Flatten (batch, 1, 28, 28) -> (batch, 784) before the linear layer
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def train_dataloader(self):
        # Downloads MNIST into the current working directory on the first run
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)


# Launching training takes just three lines: create the model, create the trainer, start training!
model = MNISTModel()
# pytorch_lightning 1.4.x selects GPUs with `gpus`; fall back to the CPU when CUDA is unavailable
trainer = pl.Trainer(gpus=1 if torch.cuda.is_available() else 0, max_epochs=2)
trainer.fit(model)
```
Output:
```
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name | Type   | Params
----------------------------------
0 | l1   | Linear | 7.9 K
----------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)
/home/jhk/anaconda3/envs/openg/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 1: 100%|████████████████████████████████| 1875/1875 [00:05<00:00, 338.58it/s, loss=0.709, v_num=7]
```
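The UserWarning in the log asks for a larger `num_workers` in the `DataLoader`. Here is a minimal sketch that picks a value from the CPU count. The cap of 8 and the helper name `loader_kwargs` are my own assumptions, not Lightning recommendations.

```python
import os


def loader_kwargs(batch_size=32, max_workers=8):
    """DataLoader keyword arguments with a worker count derived from the CPU count."""
    # os.cpu_count() can return None, so fall back to a single CPU
    num_workers = min(max_workers, os.cpu_count() or 1)
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        # Keep worker processes alive between epochs; DataLoader rejects this when num_workers == 0
        "persistent_workers": num_workers > 0,
    }


# In MNISTModel.train_dataloader one could then write:
#   return DataLoader(MNIST(os.getcwd(), train=True, download=True,
#                           transform=transforms.ToTensor()), **loader_kwargs())
```

Capping the count avoids starting one worker process per core on large machines (the log suggests 16). Passing `max_workers=0` restores the original single-process loading.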
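At this point, the installation should be working.
(It took me an entire morning of fiddling. Utterly maddening!)
The above is a record of my personal experience, for reference only.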