Parallel Computing Course Self-Study Notes 1

This post explains how to perform synchronous parallel gradient computation with the MapReduce model, covering the broadcast, local-computation (map), and reduce phases. It discusses communication cost, synchronization cost, and the impact of node failures on performance, highlighting how data parallelism reduces per-worker computation and why the actual speedup is limited by the slowest node (the straggler effect).

Synchronous Parallel Gradient Computation with MapReduce

What Is MapReduce?

MapReduce is a programming model and software system developed by Google [1].

Characteristics: client-server architecture, message-passing communication, and bulk synchronous parallel execution.

Apache Hadoop [2] is an open-source implementation of MapReduce.

Apache Spark [3] is an improved open-source implementation of the MapReduce idea.

References

1. Dean and Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008.

2. Apache Hadoop: https://hadoop.apache.org/

3. Zaharia et al. Apache Spark: A unified engine for big data processing. Communications of the ACM, 2016.

Parallel Gradient Descent Using MapReduce

Steps for synchronous parallel gradient computation with MapReduce

Broadcast: The server broadcasts the up-to-date parameters $w_t$ to the workers.

Map (no communication needed): Workers do the computation locally.

Map $(x_i, y_i, w_t)$ to $g_i = (x_i^\top w_t - y_i)\, x_i$; this maps each data point to a gradient vector $g_i$.

Obtain $n$ vectors: $g_1, g_2, g_3, \ldots, g_n$.

Reduce: Compute the sum $g = \sum_{i=1}^{n} g_i$.

Every worker sums all the $g_i$ stored in its local memory to get one vector.

Then, the server sums the resulting $m$ vectors (there are $m$ workers).

The server updates the parameters: $w_{t+1} = w_t - \alpha \cdot g$.

Repeat these steps until the algorithm converges (a sketch follows below).
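Below is a minimal single-process sketch of these steps for least-squares regression, assuming NumPy. The loop over shards merely simulates the $m$ workers; in real Hadoop/Spark the shards would live on separate nodes, and helper names such as `local_map` are illustrative, not framework API.

```python
import numpy as np

def local_map(X_part, y_part, w):
    # Map: compute g_i = (x_i^T w - y_i) x_i for every local sample,
    # then pre-sum them (the worker-side part of the reduce).
    residuals = X_part @ w - y_part        # shape: (n_part,)
    return X_part.T @ residuals            # = sum_i (x_i^T w - y_i) x_i

def parallel_gradient_descent(X, y, m=4, alpha=1e-3, num_iters=500):
    n, d = X.shape
    w = np.zeros(d)
    # Data parallelism: each simulated worker stores 1/m of the data.
    X_parts = np.array_split(X, m)
    y_parts = np.array_split(y, m)
    for t in range(num_iters):
        # Broadcast: every worker receives the current w_t (implicit here).
        # Map: each worker computes its partial gradient locally.
        partials = [local_map(Xp, yp, w) for Xp, yp in zip(X_parts, y_parts)]
        # Reduce: the server sums the m partial gradients.
        g = np.sum(partials, axis=0)
        # Update: w_{t+1} = w_t - alpha * g.
        w = w - alpha * g
    return w

# Usage: recover a known weight vector from noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true
w_hat = parallel_gradient_descent(X, y)
print(np.max(np.abs(w_hat - w_true)))  # should be close to 0
```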

Every worker stores $\frac{1}{m}$ of the data.

Every worker does $\frac{1}{m}$ of the computation.

Is the runtime reduced to $\frac{1}{m}$ of the serial runtime?

No, because communication and synchronization overheads must also be considered.

Communication Cost

Communication complexity: how many words are transmitted between the server and the workers.

Proportional to the number of parameters.

Grows with the number of worker nodes.

Latency: how much time it takes for a packet of data to get from one point to another. (Determined by the computer network.)

Communication time = complexity / bandwidth + latency.
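As a quick sanity check on this formula (all numbers below are invented for illustration, not measurements):

```python
# Illustrative numbers only: 10 million parameters sent as 32-bit words
# over a 1 GB/s link with 1 ms latency.
num_params = 10_000_000            # communication complexity, in words
bytes_per_word = 4
bandwidth = 1e9                    # bytes per second
latency = 1e-3                     # seconds

comm_time = num_params * bytes_per_word / bandwidth + latency
print(f"communication time: {comm_time * 1e3:.0f} ms")  # 41 ms
```

Note that the bandwidth term scales with model size, while the latency term is paid on every round trip regardless of how little is sent.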

Synchronization Cost

Question: What if a node fails and then restarts?

This node will be much slower than all the others.

Such a node is called a straggler.

Straggler effect:

The wall-clock time is determined by the slowest node.

It is a consequence of synchronization.
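A toy simulation of this effect (all timings are invented for illustration): with synchronization, the per-iteration wall-clock time is the maximum over the workers, so a single restarted node dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10
# Healthy workers take about 1 s per iteration; worker 0 has restarted
# and is five times slower.
worker_times = rng.normal(loc=1.0, scale=0.05, size=m)
worker_times[0] = 5.0

print(f"mean worker time: {worker_times.mean():.2f} s")
print(f"wall-clock time:  {worker_times.max():.2f} s  (set by the straggler)")
```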

Recap

Gradient descent can be implemented using MapReduce.

Data parallelism: the data are partitioned among the workers.

One gradient descent step requires a broadcast, a map, and a reduce.

Costs: computation, communication, and synchronization.

Using $m$ workers, the speedup ratio is lower than $m$.
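A back-of-the-envelope model of why the speedup stays below $m$ (the per-step times below are assumed values, not measurements): the computation shrinks by $\frac{1}{m}$, but the communication and synchronization overheads do not.

```python
t_compute = 1.0   # serial computation time per step (assumed, seconds)
t_comm = 0.05     # communication time per step (assumed)
t_sync = 0.02     # synchronization / straggler waiting per step (assumed)

for m in [1, 4, 16, 64]:
    overhead = t_comm + t_sync if m > 1 else 0.0
    t_parallel = t_compute / m + overhead
    print(f"m = {m:3d}: speedup = {t_compute / t_parallel:5.2f}  (ideal: {m})")
```

With these numbers the speedup is about 3.1 at $m = 4$ but only about 11.7 at $m = 64$: the overhead terms put a ceiling on how much adding workers can help.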
