Reining in the Outliers in Map-Reduce Clusters using Mantri

weixin_33777877

Tags: Big Data

Original link: http://blog.51cto.com/daisy8867/719826

From OSDI'10

Summary:

A growing body of work focuses on large-scale data-parallel computing.

This paper is the first to characterize the prevalence of stragglers (outlier tasks) in production clusters and their various causes: i) machine characteristics, covering both hardware reliability and run-time contention for processor, memory, and other resources; ii) network characteristics, with varying bandwidth and congestion along paths; iii) imbalance in workload among tasks. Building on these causes, Mantri addresses stragglers early and schedules duplicates only when there is a fair chance that the speculation saves both time and resources. As a result, it greatly reduces job completion time while using fewer resources than prior strategies that duplicate tasks towards the end of a phase.
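The idea of duplicating a task only when speculation is likely to save both time and resources can be sketched as a simple check. This is a minimal illustration under stated assumptions, not Mantri's implementation: the function name `should_duplicate`, the point estimates `t_rem` (time remaining for the running copy) and `t_new` (time a fresh copy would take), and the threshold `t_new * (c + 1) / c` for `c` running copies are simplifications chosen here. The summary above does not spell out the exact condition, and the task data is made up.

```python
def should_duplicate(t_rem: float, t_new: float, running_copies: int = 1) -> bool:
    """Return True if one more copy is expected to save both time and resources.

    Assumption (illustrative): with c copies already running, a duplicate is
    worthwhile only when t_rem > t_new * (c + 1) / c. With c = 1 this means the
    running copy must need more than twice as long as a fresh copy would.
    """
    if running_copies < 1:
        raise ValueError("running_copies must be at least 1")
    return t_rem > t_new * (running_copies + 1) / running_copies


# Hypothetical tasks: (name, estimated seconds remaining, estimated seconds for a fresh copy)
tasks = [
    ("t1", 100.0, 40.0),
    ("t2", 50.0, 40.0),
    ("t3", 90.0, 40.0),
]

# Evaluate each task as soon as it looks slow, rather than waiting for the end of the phase.
to_duplicate = [name for name, t_rem, t_new in tasks if should_duplicate(t_rem, t_new)]
print(to_duplicate)
```

In this sketch a task that is only slightly slow (such as `t2`) is left alone, because a second copy would consume extra slots without a realistic chance of finishing first.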

Related work mentioned in the paper that is worth reading:

Dryad: investigates programming models for writing parallel and distributed programs that scale from a small cluster to a large data center. (By Microsoft Research)

LATE (Longest Approximate Time to End): a speculative-execution scheduling algorithm that is highly robust to heterogeneity.

Hadoop, MapReduce
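For comparison with the related work above, the heuristic behind LATE's name can be sketched as follows: estimate each running task's time to completion from its progress so far, then speculate on the task expected to finish last. This is a rough sketch only. The function name `estimated_time_left` and the sample tasks are hypothetical, and the caps and thresholds that a real scheduler applies are omitted.

```python
def estimated_time_left(progress: float, elapsed: float) -> float:
    """Estimate remaining seconds from a progress score in (0, 1] and elapsed seconds.

    Assumes a constant progress rate: rate = progress / elapsed,
    time left = (1 - progress) / rate.
    """
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    if elapsed <= 0.0:
        raise ValueError("elapsed must be positive")
    rate = progress / elapsed
    return (1.0 - progress) / rate


# Hypothetical running tasks: name -> (progress score, seconds elapsed)
running = {
    "a": (0.9, 90.0),
    "b": (0.2, 80.0),
    "c": (0.5, 50.0),
}

# Pick the task with the longest approximate time to end as the speculation candidate.
candidate = max(running, key=lambda name: estimated_time_left(*running[name]))
print(candidate)
```

Note that the candidate is chosen by estimated finish time rather than by raw progress, which is what makes the heuristic tolerant of machines that run at different speeds.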

Reposted from: https://blog.51cto.com/daisy8867/719826