PyTorch Lightning - Pretrain, Finetune and Deploy AI Models



1. About PyTorch Lightning

PyTorch Lightning is the deep learning framework for pretraining, finetuning and deploying AI models.

NEW: Deploying models? Check out LitServe, PyTorch Lightning for model serving.
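As a minimal serving sketch (assuming LitServe's LitAPI/LitServer interface; install with pip install litserve and check the LitServe docs for the current API):

# serve.py
import litserve as ls

class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # build or load the model once per worker
        self.model = lambda x: x ** 2

    def decode_request(self, request):
        # pull the model input out of the JSON payload
        return request["input"]

    def predict(self, x):
        return self.model(x)

    def encode_response(self, output):
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto")
    server.run(port=8000)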


Lightning has 2 core packages:

PyTorch Lightning: train and deploy PyTorch at scale.
Lightning Fabric: expert control.

Lightning gives you granular control over how much abstraction you want to add over PyTorch.



2. Quick Start

Install Lightning:

pip install lightning

Advanced install options


Install with optional dependencies:
pip install lightning['extra']

Conda:
conda install lightning -c conda-forge

Install stable

Install future release from the source:

pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/release/stable.zip -U

Install bleeding-edge

Install nightly from the source (no guarantees):

pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U

or from testing PyPI:

pip install -U -i https://test.pypi.org/simple/ pytorch-lightning

PyTorch Lightning example

Define the training workflow. Here's a toy example (explore real examples):

# main.py
# ! pip install torchvision
import torch, torch.nn as nn, torch.utils.data as data, torchvision as tv, torch.nn.functional as F
import lightning as L

# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------
# A LightningModule (nn.Module subclass) defines a full *system*
# (ie: an LLM, diffusion model, autoencoder, or simple image classifier).

class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

# -------------------
# Step 2: Define data
# -------------------
dataset = tv.datasets.MNIST(".", download=True, transform=tv.transforms.ToTensor())
train, val = data.random_split(dataset, [55000, 5000])

# -------------------
# Step 3: Train
# -------------------
autoencoder = LitAutoEncoder()
trainer = L.Trainer()
trainer.fit(autoencoder, data.DataLoader(train), data.DataLoader(val))
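Note: the toy example passes a validation DataLoader, but without a validation_step hook Lightning simply skips the validation loop. A minimal sketch of the method you could add to LitAutoEncoder to enable it (mirrors training_step; names are illustrative):

    def validation_step(self, batch, batch_idx):
        # runs under torch.no_grad(); logs the metric that callbacks can monitor
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        val_loss = F.mse_loss(x_hat, x)
        self.log("val_loss", val_loss)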

Run the model on your terminal:

pip install torchvision
python main.py
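After fit() finishes, the Trainer saves checkpoints under lightning_logs/ by default. A short sketch of reloading one for inference (the checkpoint filename below is illustrative; it varies per run):

# path is illustrative: check lightning_logs/version_*/checkpoints/ for the real name
autoencoder = LitAutoEncoder.load_from_checkpoint("lightning_logs/version_0/checkpoints/epoch=0-step=100.ckpt")
autoencoder.eval()
with torch.no_grad():
    embedding = autoencoder(torch.randn(1, 28 * 28))  # forward() returns the encoding
print(embedding.shape)  # torch.Size([1, 3])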

3. Why PyTorch Lightning?

PyTorch Lightning is just organized PyTorch: Lightning disentangles PyTorch code to decouple the science from the engineering.



Examples

Explore the various types of training possible with PyTorch Lightning. Pretrain and finetune any kind of model to perform any task like classification, segmentation, summarization and more:

Task | Description | Run
---- | ----------- | ---
Hello world | Pretrain - Hello world example | Open in Studio
Image classification | Finetune - ResNet-34 model to classify images of cars | Open in Studio
Image segmentation | Finetune - ResNet-50 model to segment images | Open in Studio
Object detection | Finetune - Faster R-CNN model to detect objects | Open in Studio
Text classification | Finetune - text classifier (BERT model) | Open in Studio
Text summarization | Finetune - text summarization (Hugging Face transformers model) | Open in Studio
Audio generation | Finetune - audio generator (transformers model) | Open in Studio
LLM finetuning | Finetune - LLM (Meta Llama 3.1 8B) | Open in Studio
Image generation | Pretrain - image generator (diffusion model) | Open in Studio
Recommendation system | Train - recommendation system (factorization and embedding) | Open in Studio
Time-series forecasting | | Open in Studio


Advanced features

Lightning has 40+ advanced features designed for professional AI research at scale.

Here are some examples:



Train on 1000+ GPUs without code changes

# 8 GPUs
# no code changes needed
trainer = Trainer(accelerator="gpu", devices=8)

# 256 GPUs
trainer = Trainer(accelerator="gpu", devices=8, num_nodes=32)

Train on other accelerators like TPUs without code changes

# no code changes needed
trainer = Trainer(accelerator="tpu", devices=8)

16-bit precision

# no code changes needed
trainer = Trainer(precision=16)

Experiment managers

from lightning import loggers

# tensorboard
trainer = Trainer(logger=loggers.TensorBoardLogger("logs/"))

# weights and biases
trainer = Trainer(logger=loggers.WandbLogger())

# comet
trainer = Trainer(logger=loggers.CometLogger())

# mlflow
trainer = Trainer(logger=loggers.MLFlowLogger())

# neptune
trainer = Trainer(logger=loggers.NeptuneLogger())

# ... and dozens more

Early Stopping

from lightning.pytorch.callbacks import EarlyStopping

es = EarlyStopping(monitor="val_loss")
trainer = Trainer(callbacks=[es])

Checkpointing

from lightning.pytorch.callbacks import ModelCheckpoint

checkpointing = ModelCheckpoint(monitor="val_loss")
trainer = Trainer(callbacks=[checkpointing])
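After trainer.fit(...) completes, the callback keeps track of the best-scoring checkpoint, which can be reloaded directly:

# restore the weights that achieved the lowest monitored val_loss
best_model = LitAutoEncoder.load_from_checkpoint(checkpointing.best_model_path)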

Export to torchscript (JIT) (production use)

# torchscript
autoencoder = LitAutoEncoder()
torch.jit.save(autoencoder.to_torchscript(), "model.pt")
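The exported archive can be reloaded for inference without the original class definition:

# reload the TorchScript program and run it
scripted = torch.jit.load("model.pt")
scripted.eval()
out = scripted(torch.randn(1, 28 * 28))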

Export to ONNX (production use)

# onnx
import os, tempfile

with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as tmpfile:
    autoencoder = LitAutoEncoder()
    input_sample = torch.randn((1, 28 * 28))  # must match the encoder's expected input size
    autoencoder.to_onnx(tmpfile.name, input_sample, export_params=True)
    assert os.path.isfile(tmpfile.name)
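A quick way to sanity-check the exported graph is to run it with ONNX Runtime (assuming pip install onnxruntime):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(tmpfile.name)
input_name = session.get_inputs()[0].name
(onnx_embedding,) = session.run(None, {input_name: np.random.randn(1, 28 * 28).astype(np.float32)})
print(onnx_embedding.shape)  # (1, 3)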

Advantages over unstructured PyTorch

  • Models become hardware agnostic
  • Code is clear to read because engineering code is abstracted away
  • Easier to reproduce
  • Make fewer mistakes because Lightning handles the tricky engineering
  • Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
  • Lightning has dozens of integrations with popular machine learning tools
  • Rigorously tested with every new PR: every combination of supported PyTorch and Python versions, every OS, multi GPUs and even TPUs
  • Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch)

Read the PyTorch Lightning docs


4. Lightning Fabric: Expert Control

Run on any device at any scale with expert-level control over the PyTorch training loop and scaling strategy. You can even write your own Trainer.

Fabric is designed for the most complex models like foundation model scaling, LLMs, diffusion, transformers, reinforcement learning and active learning. Of any size.

What to change:

+ import lightning as L
  import torch; import torchvision as tv

 dataset = tv.datasets.CIFAR10("data", download=True,
                               train=True,
                               transform=tv.transforms.ToTensor())

+ fabric = L.Fabric()
+ fabric.launch()

  model = tv.models.resnet18()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model.to(device)
+ model, optimizer = fabric.setup(model, optimizer)

  dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
+ dataloader = fabric.setup_dataloaders(dataloader)

  model.train()
  num_epochs = 10
  for epoch in range(num_epochs):
      for batch in dataloader:
          inputs, labels = batch
-         inputs, labels = inputs.to(device), labels.to(device)
          optimizer.zero_grad()
          outputs = model(inputs)
          loss = torch.nn.functional.cross_entropy(outputs, labels)
-         loss.backward()
+         fabric.backward(loss)
          optimizer.step()
          print(loss.data)

Resulting Fabric code (copy me!):

import lightning as L
import torch; import torchvision as tv

dataset = tv.datasets.CIFAR10("data", download=True,
                              train=True,
                              transform=tv.transforms.ToTensor())

fabric = L.Fabric()
fabric.launch()

model = tv.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model, optimizer = fabric.setup(model, optimizer)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
dataloader = fabric.setup_dataloaders(dataloader)

model.train()
num_epochs = 10
for epoch in range(num_epochs):
    for batch in dataloader:
        inputs, labels = batch
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, labels)
        fabric.backward(loss)
        optimizer.step()
        print(loss.data)
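To checkpoint this loop, Fabric provides save/load helpers that work across its distributed strategies; a minimal sketch:

# save a checkpoint (Fabric consolidates state across processes/strategies)
state = {"model": model, "optimizer": optimizer, "epoch": num_epochs}
fabric.save("checkpoint.ckpt", state)

# ...and restore the same objects in place later
fabric.load("checkpoint.ckpt", state)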

Key features

Easily switch from running on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training

# Use your available hardware
# no code changes needed
fabric = Fabric()

# Run on GPUs (CUDA or MPS)
fabric = Fabric(accelerator="gpu")

# 8 GPUs
fabric = Fabric(accelerator="gpu", devices=8)

# 256 GPUs, multi-node
fabric = Fabric(accelerator="gpu", devices=8, num_nodes=32)

# Run on TPUs
fabric = Fabric(accelerator="tpu")

Use state-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box

# Use state-of-the-art distributed training techniques
fabric = Fabric(strategy="ddp")
fabric = Fabric(strategy="deepspeed")
fabric = Fabric(strategy="fsdp")

# Switch the precision
fabric = Fabric(precision="16-mixed")
fabric = Fabric(precision="64")

All the device logic boilerplate is handled for you

# no more of this!
- model.to(device)
- batch.to(device)
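If you do create tensors outside a set-up model or dataloader, Fabric still exposes a device-aware helper; a small sketch:

# move a manually created tensor (or collection of tensors) to the right device
extra_inputs = fabric.to_device(torch.randn(8, 3, 32, 32))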

Build your own custom Trainer using Fabric primitives for training, checkpointing, logging, and more (usage is sketched after the class below):

import lightning as L
import torch

class MyCustomTrainer:
    def __init__(self, accelerator="auto", strategy="auto", devices="auto", precision="32-true"):
        self.fabric = L.Fabric(accelerator=accelerator, strategy=strategy, devices=devices, precision=precision)

    def fit(self, model, optimizer, dataloader, max_epochs, loss_fn=torch.nn.functional.cross_entropy):
        self.fabric.launch()

        model, optimizer = self.fabric.setup(model, optimizer)
        dataloader = self.fabric.setup_dataloaders(dataloader)
        model.train()

        for epoch in range(max_epochs):
            for batch in dataloader:
                input, target = batch
                optimizer.zero_grad()
                output = model(input)
                loss = loss_fn(output, target)
                self.fabric.backward(loss)
                optimizer.step()
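Used like so, wiring in the same kind of model/optimizer/dataloader objects as the plain loop above (the dataset below is just for illustration):

# assemble the pieces and run the custom trainer
import torchvision as tv

dataset = tv.datasets.CIFAR10("data", download=True, train=True, transform=tv.transforms.ToTensor())
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
model = tv.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

trainer = MyCustomTrainer(accelerator="auto", devices="auto")
trainer.fit(model, optimizer, dataloader, max_epochs=10)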

You can find a more extensive example in our examples.


Examples


Self-supervised Learning

Convolutional Architectures

Reinforcement Learning

GANs

Classic ML

Continuous Integration

Lightning is rigorously tested across multiple CPUs, GPUs and TPUs and against major Python and PyTorch versions.


*Codecov is > 90%+ but build delays may show less
