Distributed Training, Model Parallelism: Pipeline Parallelism (PP) [not for speeding up training; for models that do not fit on a single card]

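In general, training larger networks yields better results on many tasks, for example higher accuracy in image classification. As parameter counts grow, however, the memory capacity of AI accelerators (such as GPU memory) and the coordination of computation across cards become the bottlenecks in training very large models. Pipeline parallelism addresses these problems from two angles: model partitioning and execution scheduling. The rest of this article uses PaddlePaddle (飞桨) pipeline parallelism as an example to introduce the basic principles and usage.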

I. Principles

[Figure: pipeline]

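Unlike data parallelism, pipeline parallelism places different layers of the model on different compute devices, which lowers the memory consumption of each individual device and makes it possible to train very large models. In the figure above, the example model has four layers. It is split into three parts, and each part is placed on a different compute device.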
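The partitioning described above can be sketched in plain Python. This is only an illustration and does not use the PaddlePaddle API: the helper names `partition_layers` and `run_pipeline` are hypothetical, the four "layers" are toy arithmetic functions, and the 1/2/1 split across the three devices is an assumption, since the text only says that four layers are split into three parts.

```python
from typing import Callable, List, Sequence, Tuple

# A "layer" here is just a function from a number to a number (toy stand-in).
Layer = Callable[[float], float]


def partition_layers(layers: Sequence[Layer], stage_sizes: Sequence[int]) -> List[List[Layer]]:
    """Split a flat list of layers into contiguous stages, one stage per device."""
    if sum(stage_sizes) != len(layers):
        raise ValueError("stage sizes must add up to the number of layers")
    stages: List[List[Layer]] = []
    start = 0
    for size in stage_sizes:
        stages.append(list(layers[start:start + size]))
        start += size
    return stages


def run_pipeline(stages: Sequence[Sequence[Layer]], x: float) -> Tuple[float, List[Tuple[int, float]]]:
    """Run one input through the stages in order.

    Each stage only holds its own layers; the value handed from one stage to
    the next stands in for the activation sent between devices.
    """
    trace: List[Tuple[int, float]] = []
    for device_id, stage in enumerate(stages):
        for layer in stage:
            x = layer(x)
        trace.append((device_id, x))  # output of this device's stage
    return x, trace


# Four toy layers, split 1/2/1 over three devices (the split is an assumption).
layers: List[Layer] = [
    lambda v: v + 1,
    lambda v: v * 2,
    lambda v: v - 3,
    lambda v: v * v,
]
stages = partition_layers(layers, [1, 2, 1])
output, trace = run_pipeline(stages, 2.0)

for device_id, value in trace:
    print(f"device {device_id} produced {value}")
```

Each device stores only the layers in its own stage, which is where the per-device memory saving comes from. The sketch processes a single input sequentially and does not model how stages are scheduled to overlap in time.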
