Machine Learning Systems: Parameter Server

zealscott

Category: DistributionSystem. Tags: Parameter Server, machine learning, distributed systems

Article link: https://blog.csdn.net/crazy_scott/article/details/85796517

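This post introduces Parameter Server, a distributed machine learning system architecture.

Introduction to Parameter Server

Parameter Server (PS) is a parameter-centric machine learning system.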
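To make "parameter-centric" concrete, here is a minimal single-process sketch in which the shared state is a key-value store of parameters and workers only read and update that store. The class name `ParameterServer`, the `pull`/`push` method names, the learning rate, and the toy linear-regression data are all assumptions made for illustration; a real system would run servers and workers on separate machines.

```python
import random


class ParameterServer:
    """Hypothetical in-process stand-in for a server that owns the parameters."""

    def __init__(self):
        self.params = {}

    def pull(self, keys):
        # A worker fetches the current values of the parameters it needs.
        return {k: self.params.get(k, 0.0) for k in keys}

    def push(self, grads, lr=0.5):
        # A worker sends gradients; the server applies the update itself.
        for k, g in grads.items():
            self.params[k] = self.params.get(k, 0.0) - lr * g


def worker_step(server, shard):
    """One gradient step for y = w0*x0 + w1*x1 on this worker's data shard."""
    keys = ["w0", "w1"]
    w = server.pull(keys)
    grads = {k: 0.0 for k in keys}
    for (x0, x1), y in shard:
        err = w["w0"] * x0 + w["w1"] * x1 - y
        grads["w0"] += err * x0 / len(shard)
        grads["w1"] += err * x1 / len(shard)
    server.push(grads)


# Toy data (assumed): four shards generated from w0 = 2, w1 = -1.
rng = random.Random(0)
shards = []
for _ in range(4):
    shard = []
    for _ in range(20):
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        shard.append(((x0, x1), 2.0 * x0 - 1.0 * x1))
    shards.append(shard)

server = ParameterServer()
for _ in range(200):
    for shard in shards:  # workers take turns here; real workers run in parallel
        worker_step(server, shard)

learned = server.pull(["w0", "w1"])
```

The workers never exchange data with each other; everything flows through the parameters held by the server, which is the sense in which the design is centered on parameters.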

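Some learning algorithms have complex models with a very large number of parameters
- Complex models with billions or trillions of parameters
- e.g. LDA
Some learning processes are sequential and require synchronization
- Sequential ML jobs require barriers, which hurt performance by blocking
- The BSP (Bulk Synchronous Parallel) model is what we want, but how do we balance it against performance?
Fault tolerance is important, especially for the parameters
- At scale, fault tolerance is required because these jobs run in a cloud where machines are unreliable and jobs can be preempted
- Large-scale machine learning algorithms have many parameters, and these must be made fault tolerant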

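It is up to the algorithm designer to choose among the flexible consistency models
- This actually makes programming more complex
There is a trade-off between algorithm efficiency and system performance
- Computation (correctness) side
  - Asynchronous execution may produce wrong results
- Performance side
  - Asynchronous execution is faster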
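The performance side of this trade-off can be sketched with a small timing model. The function below is an illustration only: the name `makespan`, the `max_delay` parameter, and the step times are assumptions, not part of the original post. With `max_delay=0` every iteration ends in a barrier (BSP); with `max_delay=None` workers never wait for each other (fully asynchronous); values in between let a worker run ahead by a bounded number of iterations. The model captures running time only and says nothing about whether the stale parameters still give a correct result.

```python
def makespan(step_times, max_delay):
    """Total running time under a given synchronization rule.

    step_times[t][w] is the time worker w needs for iteration t.
    A worker may start iteration t once it has finished its own iteration
    t-1 and every worker has finished iteration t-1-max_delay.
    max_delay=None removes the second condition entirely.
    """
    n_workers = len(step_times[0])
    finish = []  # finish[t][w]: time at which worker w completes iteration t
    for t, row_times in enumerate(step_times):
        row = []
        for w in range(n_workers):
            start = finish[t - 1][w] if t > 0 else 0.0
            if max_delay is not None and t - 1 - max_delay >= 0:
                # Wait for the slowest worker of the iteration we depend on.
                start = max(start, max(finish[t - 1 - max_delay]))
            row.append(start + row_times[w])
        finish.append(row)
    return max(finish[-1])


# Two workers that alternate between being fast and slow (assumed numbers).
times = [[1, 3], [3, 1], [1, 3]]

bsp_total = makespan(times, max_delay=0)       # 9: each barrier waits for the slow worker
async_total = makespan(times, max_delay=None)  # 7: each worker runs back to back
```

In this toy case the barriers cost two extra time units because the fast worker idles at every iteration, which is the blocking cost that motivates relaxing consistency.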

Parameter Server uses consistent hashing together with replication (backups).
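A minimal sketch of how consistent hashing and replication fit together, assuming a simple ring without virtual nodes: each key is owned by the first server clockwise from its hash, and the next servers on the ring hold backup copies. The class `HashRing`, the server names, the key names, and the choice of MD5 as the hash are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def ring_hash(value):
    # Map a string to a position on the ring (MD5 is an arbitrary choice here).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=2):
        self.replicas = replicas
        self.ring = sorted((ring_hash(s), s) for s in servers)

    def owners(self, key):
        """Primary server for the key, followed by its backup servers."""
        points = [h for h, _ in self.ring]
        i = bisect.bisect_right(points, ring_hash(key)) % len(self.ring)
        n = min(self.replicas, len(self.ring))
        return [self.ring[(i + j) % len(self.ring)][1] for j in range(n)]

    def remove(self, server):
        # Simulate a server failure by dropping it from the ring.
        self.ring = [(h, s) for h, s in self.ring if s != server]


ring = HashRing(["server-0", "server-1", "server-2", "server-3"], replicas=2)
keys = ["w:%d" % i for i in range(100)]
before = {k: ring.owners(k) for k in keys}

failed = before["w:0"][0]  # fail the primary of one parameter key
ring.remove(failed)
after = {k: ring.owners(k) for k in keys}
```

After the failure, the old backup of `w:0` becomes its primary, so a copy of the parameter is already in place, and keys whose owners did not include the failed server keep exactly the same owners.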

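Does "sequential" mean the same thing in GraphLab and PS?
- GraphLab emphasizes the sequential computation dependencies among data points
- PS does not look at relationships among training data points; it emphasizes the sequential ordering across iterations
Does "consistency" mean the same thing in GraphLab and PS?
- Consistency in GraphLab addresses serializability
- Consistency in PS addresses synchronous versus asynchronous computation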
