The relationship between mapred.tasktracker.map.tasks.maximum and the number of CPU cores

The number of tasks that can be run simultaneously on a tasktracker is related to the
number of processors available on the machine. Because MapReduce jobs are normally
I/O-bound, it makes sense to have more tasks than processors to get better
utilization. The amount of oversubscription depends on the CPU utilization of jobs
you run, but a good rule of thumb is to have a factor of between one and two more
tasks (counting both map and reduce tasks) than processors.
For example, if you had 8 processors and you wanted to run 2 processes on each
processor, then you could set each of mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum to 7 (not 8, since the datanode and the
tasktracker each take one slot). If you also increased the memory available to each child
task to 400 MB, then the total memory usage would be 7,600 MB.

 

                                                                    -- Hadoop: The Definitive Guide
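
The 7,600 MB figure in the quoted passage can be reproduced with simple arithmetic. The sketch below assumes that the datanode and the tasktracker each use 1,000 MB, which was the default daemon heap size in Hadoop releases of that era; the quote itself does not state this number, and the function name `worker_memory_mb` is hypothetical.

```python
def worker_memory_mb(map_slots, reduce_slots, child_heap_mb,
                     daemon_heap_mb=1000, daemons=2):
    """Estimate the total memory used on one worker node, in MB.

    daemon_heap_mb=1000 is an assumption (the old default daemon heap size);
    daemons=2 counts the datanode and the tasktracker.
    """
    task_memory = (map_slots + reduce_slots) * child_heap_mb
    daemon_memory = daemons * daemon_heap_mb
    return task_memory + daemon_memory


# 7 map slots + 7 reduce slots, 400 MB per child task
total = worker_memory_mb(map_slots=7, reduce_slots=7, child_heap_mb=400)
print(total)
```

With these inputs the 14 child tasks account for 5,600 MB and the two daemons for the remaining 2,000 MB.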

 

mapred.tasktracker.map.tasks.maximum is generally tied to the number of physical cores.

 

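With 8 cores, it is best to run 16 threads or processes, so that every two processes compete for one core and system CPU utilization is highest.
In addition, each tasktracker runs some other service threads (the ones that ship with Hadoop), so it is better to reserve 1 to 2 cores for them.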

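So with 8 cores you can run 2*8-2=14 tasks.
You can divide these 14 tasks according to your own needs, for example 8 map slots and 6 reduce slots.
In practice, considering cores alone is not enough; you also need to take memory, disk, and other resources into account.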

If you have 8 cores but only 1 GB of memory, then running 16 tasks leaves each task only 64 MB. Can your tasks even start?
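
The rule of thumb and the memory check above can be sketched as a small calculation. This is only an illustration of the arithmetic, not a Hadoop API; the function names `plan_slots` and `memory_per_task_mb` and their default arguments are hypothetical.

```python
def plan_slots(cores, tasks_per_core=2, reserved=2):
    """Total task slots: oversubscribe the cores, then hold some back
    for the Hadoop service threads running on the same node."""
    return cores * tasks_per_core - reserved


def memory_per_task_mb(total_memory_mb, tasks):
    """Memory left for each task if the node's memory is split evenly."""
    return total_memory_mb // tasks


slots = plan_slots(8)              # 2*8-2
map_slots = 8                      # one possible split, chosen by the operator
reduce_slots = slots - map_slots
per_task = memory_per_task_mb(1024, 16)

print(slots, map_slots, reduce_slots, per_task)
```

A node that passes the core check can still fail the memory check, which is why both numbers should be looked at before setting the slot maximums.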

The above is what I learned from asking dongxicheng. Thanks! http://dongxicheng.org/