Horovod: A Distributed Deep Learning Framework

还卿一钵无情泪

Article link: https://blog.csdn.net/weixin_48185819/article/details/113700827

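Summary (auto-generated by CSDN): Horovod is a distributed deep learning framework open-sourced by Uber that supports TensorFlow, Keras, PyTorch, and Apache MXNet. It aims to simplify and speed up distributed training. The article notes that in the standard TensorFlow benchmarks, the hardware was not fully utilized even with 128 GPUs. By contrast, Horovod shows advantages in training efficiency and resource utilization, particularly on the Inception V3 and ResNet-101 models.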

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use.

https://github.com/horovod/horovod

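Horovod is another deep learning tool open-sourced by Uber. Its design draws on the strengths of Facebook's "ImageNet in one hour" paper and Baidu's Ring Allreduce, and it helps users set up distributed training.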
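To make the Ring Allreduce idea concrete, here is a minimal single-process sketch in plain Python. It only simulates the communication pattern (a scatter-reduce phase followed by an allgather phase around a ring of workers). It is not Horovod's implementation, and the function name `ring_allreduce` and the list-of-lists layout are assumptions made for illustration.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum ring allreduce over one equal-length gradient list per worker.

    Illustrative only: real systems exchange chunks over the network in parallel.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold gradients of the same length")

    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]
    data = [list(g) for g in worker_grads]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: scatter-reduce. In step s, worker r sends chunk (r - s) % n
    # to its right neighbour, which adds it to its own copy.
    for s in range(n - 1):
        # Snapshot payloads first so all sends in a step look simultaneous.
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                data[dst][i] += v

    # Now worker r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: allgather. Completed chunks travel around the ring and
    # overwrite the stale copies on each receiver.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                data[dst][i] = v

    return data


# Three simulated workers, each holding a 4-element gradient.
grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
result = ring_allreduce(grads)
print(result[0])
```

In this pattern each worker sends 2(n - 1) chunks of roughly 1/n of the gradient each, so the data sent per worker stays close to twice the gradient size regardless of how many workers join the ring.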

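Figure 1: The standard TensorFlow benchmark suite running the Inception V3 and ResNet-101 models on NVIDIA Pascal GPUs (from 1 to 128), compared in images processed per second against ideal distributed scaling (single-GPU throughput simply multiplied by the number of GPUs). The comparison shows that the standard approach struggles to unlock the full potential of the hardware.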

When we ran the standard TensorFlow benchmark suite on 128 NVIDIA Pascal GPUs (as shown in Figure 1), both Inception V3 and ResNet-101 wasted close to half of the available GPU compute.
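The "wasted half of the GPU compute" comparison comes down to scaling efficiency: measured throughput divided by the ideal throughput, where ideal means single-GPU throughput times the number of GPUs. A small sketch of that arithmetic follows. The throughput numbers are hypothetical placeholders chosen to give a round result, not measurements from the benchmark, and the function name `scaling_efficiency` is an assumption.

```python
def scaling_efficiency(measured_throughput, single_gpu_throughput, num_gpus):
    """Return measured throughput as a fraction of ideal linear scaling."""
    if single_gpu_throughput <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    ideal_throughput = single_gpu_throughput * num_gpus
    return measured_throughput / ideal_throughput


# Hypothetical numbers: 100 images/s on one GPU, 6400 images/s on 128 GPUs.
efficiency = scaling_efficiency(6400.0, 100.0, 128)
print(f"scaling efficiency: {efficiency:.0%}")
print(f"compute left unused: {1 - efficiency:.0%}")
```

An efficiency near 0.5 at 128 GPUs corresponds to the situation described above, where about half of the theoretical throughput is lost.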

Making full use of GPU resources is a major challenge in large-scale training today. Earlier, Facebook's one-hour ImageNet training paper, "Accurate, Large Minibatch SGD: Training ImageNet i
