When to Use MapReduce

MapReduce is a good fit for problems that need to analyze the whole dataset in a batch fashion, particularly for ad hoc analysis.

MapReduce suits applications where the data is written once and read many times, whereas a relational database is good for datasets that are continually updated.

MapReduce works well on unstructured or semistructured data, since it is designed to interpret the data at processing time rather than requiring a schema when the data is loaded.
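
As a minimal sketch of what "interpreting the data at processing time" means, the map function below parses raw, unvalidated text lines only when the job runs. The log format, `RAW_LINES`, and `map_status` are hypothetical and invented for illustration; another job could read the same stored lines with a completely different interpretation.

```python
# Hypothetical raw input: log lines stored exactly as they arrived, with no schema.
RAW_LINES = [
    "2024-01-01 GET /index.html 200",
    "2024-01-01 GET /missing 404",
    "malformed line",
    "2024-01-02 POST /api 200",
]


def map_status(line):
    """Interpret one raw line at processing time and emit (status, 1) pairs."""
    fields = line.split()
    # The record layout (date, method, path, status) is imposed here, when the
    # job runs, not when the data was written.
    if len(fields) != 4 or not fields[3].isdigit():
        return  # skip lines this particular job cannot interpret
    yield fields[3], 1


pairs = [kv for line in RAW_LINES for kv in map_status(line)]
print(pairs)  # [('200', 1), ('404', 1), ('200', 1)]
```

Malformed records are simply skipped by this job instead of being rejected at load time, which is the practical difference from a database that enforces its schema on write.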

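MapReduce is a linearly scalable programming model: if the input data doubles, a job takes roughly twice as long, but if the cluster also doubles in size, the job runs about as fast as the original.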

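Approaches that move the data over the network to the compute nodes, as in traditional high-performance computing (HPC) with MPI, work well for compute-intensive jobs but become a problem when nodes need to access larger data volumes (hundreds of gigabytes, the point at which MapReduce really starts to shine): network bandwidth becomes the bottleneck, and compute nodes sit idle waiting for data.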
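
A back-of-the-envelope sketch makes the bottleneck concrete. All figures are made-up assumptions, and the model ignores seek times, replication, and contention; it only contrasts pulling data through one shared link with having each node read its share from a local disk. The function names are hypothetical.

```python
def shared_link_seconds(data_gb, link_gbit_per_s):
    """Seconds to move data_gb through one shared network link (1 GB = 8 Gbit)."""
    return data_gb * 8 / link_gbit_per_s


def local_read_seconds(data_gb, nodes, disk_mb_per_s):
    """Seconds if each node reads an equal share from its own disk (1 GB = 1000 MB)."""
    per_node_mb = data_gb * 1000 / nodes
    return per_node_mb / disk_mb_per_s


# Hypothetical figures: 500 GB of input, a 10 Gbit/s shared link,
# and nodes whose local disks each read at 100 MB/s.
print(shared_link_seconds(500, 10))       # 400.0, however many nodes wait on it
print(local_read_seconds(500, 100, 100))  # 50.0 with 100 nodes
print(local_read_seconds(500, 200, 100))  # 25.0 with 200 nodes
```

Adding nodes does nothing for the shared-link time, while the local-read time halves each time the node count doubles.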

MapReduce tries to colocate the data with the compute node, so data access is fast because it is local. This feature, known as data locality, is at the heart of MapReduce and is the reason for its good performance.
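
A simplified sketch of the idea: given a hypothetical map of which nodes hold a replica of each input block, a scheduler prefers to run each task on a free node that already stores the block, and falls back to a remote read otherwise. This is not Hadoop's actual scheduler (which also weighs rack locality, among other factors); `BLOCK_REPLICAS`, `assign_tasks`, and the greedy one-task-per-node strategy are illustrative assumptions.

```python
# Hypothetical placement: the nodes holding a replica of each input block.
BLOCK_REPLICAS = {
    "block-1": {"node-a", "node-b"},
    "block-2": {"node-b", "node-c"},
    "block-3": {"node-c", "node-a"},
}


def assign_tasks(replicas, free_nodes):
    """Greedily run each block's task on a free node that stores the block.

    Falls back to any free node (a remote read over the network) when no
    free node has a local copy. One task per node, for simplicity.
    """
    free = list(free_nodes)
    plan = {}
    for block in sorted(replicas):
        if not free:
            raise ValueError("more blocks than free nodes")
        local = [node for node in free if node in replicas[block]]
        node = local[0] if local else free[0]
        plan[block] = (node, "local" if local else "remote")
        free.remove(node)
    return plan


print(assign_tasks(BLOCK_REPLICAS, ["node-a", "node-b", "node-c"]))
print(assign_tasks(BLOCK_REPLICAS, ["node-d", "node-a", "node-e"]))
```

In the first call every task reads locally; in the second, only `block-1` finds a free node with a replica, so the other two tasks must pull their data across the network.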

MPI (Message Passing Interface) gives great control to the programmer, but requires that he or she explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs such as sockets, as well as the higher-level algorithm for the analysis. MapReduce operates only at the higher level: the programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.
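
To illustrate "functions of key and value pairs" with an implicit data flow, here is a minimal single-process sketch in Python (not the Hadoop Java API). `run_job` is a hypothetical stand-in for the framework: it groups the mapper's output by key before calling the reducer, so the word-count functions never touch sockets or data movement. The input keys mimic line offsets and are illustrative only.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Toy single-process stand-in for the framework: map, shuffle, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    # Reduce each key's values, visiting keys in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# The only code the programmer writes: two functions of keys and values.
def word_map(offset, line):
    for word in line.split():
        yield word.lower(), 1


def word_reduce(word, counts):
    return sum(counts)


records = [(0, "the quick fox"), (14, "The lazy dog"), (27, "fox")]
print(run_job(records, word_map, word_reduce))
# {'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 2}
```

Swapping `run_job` for a distributed implementation would leave `word_map` and `word_reduce` unchanged, which is the point of keeping the data flow out of the programmer's code.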

MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate-bandwidth interconnects.
