Parallel Computing Course Self-Study Notes 1

This post explains how to perform synchronous parallel gradient computation with the MapReduce model, covering the broadcast, local-computation (map), and reduce phases. It discusses communication cost, synchronization cost, and the impact of node failures on performance, highlighting how data parallelism reduces per-worker computation and why the actual speedup is limited by the slowest node (the straggler effect).

Synchronous Parallel Gradient Computation with MapReduce

What Is MapReduce?

MapReduce is a programming model and software system developed by Google [1].

Characteristics: client-server architecture, message-passing communication, and bulk synchronous parallel execution.

Apache Hadoop [2] is an open-source implementation of MapReduce.

Apache Spark [3] is an improved open-source implementation of the MapReduce idea.

References

1. Dean and Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008.

2. Apache Hadoop: https://hadoop.apache.org/

3. Zaharia et al. Apache Spark: A unified engine for big data processing. Communications of the ACM, 2016.

Parallel Gradient Descent Using MapReduce

Steps for synchronous parallel gradient computation with MapReduce

Broadcast: The server broadcasts the up-to-date parameters $w_t$ to the workers.

Map (no communication needed): Workers do the computation locally.

Map $(x_i, y_i, w_t)$ to $g_i = (x_i^\top w_t - y_i)\, x_i$; this maps each data point to a gradient vector $g_i$.

Obtain $n$ vectors: $g_1, g_2, g_3, \ldots, g_n$.

Reduce: Compute the sum $g = \sum_{i=1}^{n} g_i$.

Every worker sums all the $g_i$ stored in its local memory to get one vector.

Then, the server sums the resulting $m$ vectors (there are $m$ workers).

The server updates the parameters: $w_{t+1} = w_t - \alpha \cdot g$.

Repeat these steps until the algorithm converges (a sketch follows below).
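Below is a minimal single-process sketch of these steps for least-squares regression, assuming NumPy. The loop over shards merely simulates the $m$ workers; in real Hadoop/Spark the shards would live on separate nodes, and helper names such as `local_map` are illustrative, not framework API.

```python
import numpy as np

def local_map(X_part, y_part, w):
    # Map: compute g_i = (x_i^T w - y_i) x_i for every local sample,
    # then pre-sum them (the worker-side part of the reduce).
    residuals = X_part @ w - y_part        # shape: (n_part,)
    return X_part.T @ residuals            # = sum_i (x_i^T w - y_i) x_i

def parallel_gradient_descent(X, y, m=4, alpha=1e-3, num_iters=500):
    n, d = X.shape
    w = np.zeros(d)
    # Data parallelism: each simulated worker stores 1/m of the data.
    X_parts = np.array_split(X, m)
    y_parts = np.array_split(y, m)
    for t in range(num_iters):
        # Broadcast: every worker receives the current w_t (implicit here).
        # Map: each worker computes its partial gradient locally.
        partials = [local_map(Xp, yp, w) for Xp, yp in zip(X_parts, y_parts)]
        # Reduce: the server sums the m partial gradients.
        g = np.sum(partials, axis=0)
        # Update: w_{t+1} = w_t - alpha * g.
        w = w - alpha * g
    return w

# Usage: recover a known weight vector from noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true
w_hat = parallel_gradient_descent(X, y)
print(np.max(np.abs(w_hat - w_true)))  # should be close to 0
```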

Every worker stores $\frac{1}{m}$ of the data.

Every worker does $\frac{1}{m}$ of the computation.

Is the runtime reduced to $\frac{1}{m}$ of the serial runtime?

No, because communication and synchronization overheads must also be considered.

Communication Cost

Communication complexity: how many words are transmitted between the server and the workers.

Proportional to the number of parameters.

Grows with the number of worker nodes.

Latency: how much time it takes for a packet of data to get from one point to another. (Determined by the computer network.)

Communication time = complexity / bandwidth + latency.
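As a quick sanity check on this formula (all numbers below are invented for illustration, not measurements):

```python
# Illustrative numbers only: 10 million parameters sent as 32-bit words
# over a 1 GB/s link with 1 ms latency.
num_params = 10_000_000            # communication complexity, in words
bytes_per_word = 4
bandwidth = 1e9                    # bytes per second
latency = 1e-3                     # seconds

comm_time = num_params * bytes_per_word / bandwidth + latency
print(f"communication time: {comm_time * 1e3:.0f} ms")  # 41 ms
```

Note that the bandwidth term scales with model size, while the latency term is paid on every round trip regardless of how little is sent.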

Synchronization Cost

Question: What if a node fails and then restarts?

This node will be much slower than all the others.

Such a node is called a straggler.

Straggler effect:

The wall-clock time is determined by the slowest node.

It is a consequence of synchronization.
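A toy simulation of this effect (all timings are invented for illustration): with synchronization, the per-iteration wall-clock time is the maximum over the workers, so a single restarted node dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10
# Healthy workers take about 1 s per iteration; worker 0 has restarted
# and is five times slower.
worker_times = rng.normal(loc=1.0, scale=0.05, size=m)
worker_times[0] = 5.0

print(f"mean worker time: {worker_times.mean():.2f} s")
print(f"wall-clock time:  {worker_times.max():.2f} s  (set by the straggler)")
```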

Recap

Gradient descent can be implemented using MapReduce.

Data parallelism: the data are partitioned among the workers.

One gradient descent step requires a broadcast, a map, and a reduce.

Costs: computation, communication, and synchronization.

Using $m$ workers, the speedup ratio is lower than $m$.
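A back-of-the-envelope model of why the speedup stays below $m$ (the per-step times below are assumed values, not measurements): the computation shrinks by $\frac{1}{m}$, but the communication and synchronization overheads do not.

```python
t_compute = 1.0   # serial computation time per step (assumed, seconds)
t_comm = 0.05     # communication time per step (assumed)
t_sync = 0.02     # synchronization / straggler waiting per step (assumed)

for m in [1, 4, 16, 64]:
    overhead = t_comm + t_sync if m > 1 else 0.0
    t_parallel = t_compute / m + overhead
    print(f"m = {m:3d}: speedup = {t_compute / t_parallel:5.2f}  (ideal: {m})")
```

With these numbers the speedup is about 3.1 at $m = 4$ but only about 11.7 at $m = 64$: the overhead terms put a ceiling on how much adding workers can help.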
