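Hadoop performance tuning is not limited to Hadoop itself; it also covers the underlying hardware and the operating system. Each is covered in turn below:
1. Underlying hardware
Hadoop uses a master/slave architecture. The master (ResourceManager or NameNode) maintains metadata and handles scheduling, so its workload and importance are far greater than a slave's. Give the master nodes the highest hardware configuration you can.
2. Operating system
1) Raise the maximum number of file descriptors and the network connection limit (noticeable effect)
When many tasks are running, the OS kernel is constrained by these two limits.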
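ulimit -n 2000 # allow at most 2000 open file descriptors; the default on my system is 1024
sysctl -a # list all kernel parameters and their values
sysctl -w net.core.somaxconn=500 # the default is 128; should match the cluster's ipc.server.listen.queue.size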
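Before changing either limit it helps to see what a node currently has. The following is a minimal Python sketch, assuming a Linux host where `/proc/sys/net/core/somaxconn` is readable. The helper names and the targets 2000 and 500 (taken from the commands above) are illustrative only.

```python
import resource
from pathlib import Path


def current_nofile_limits():
    """Return the (soft, hard) open-file limits of this process, as `ulimit -n` reports them."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def current_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return net.core.somaxconn as an int, or None if it cannot be read (e.g. non-Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def needs_raise(current, target):
    """True when the current value is known, not unlimited, and below the target."""
    return (
        current is not None
        and current != resource.RLIM_INFINITY
        and current < target
    )


if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print("open files (soft/hard):", soft, hard)
    print("raise open-file limit to 2000?", needs_raise(soft, 2000))
    somaxconn = current_somaxconn()
    print("net.core.somaxconn:", somaxconn)
    print("raise somaxconn to 500?", needs_raise(somaxconn, 500))
```

The script only reports. The actual change is still made with `ulimit` and `sysctl` as shown above.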
3. Hadoop (version 2.5.1)
mapred-default.xml:
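1) Number of concurrent tasks per TaskTracker
Recommendation: map + reduce + 1 == num_cpu_cores, that is, the two maximums below plus one spare core should add up to the number of CPU cores on the node.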
mapreduce.tasktracker.map.tasks.maximum | 2 | The maximum number of map tasks that will be run simultaneously by a task tracker. |
mapreduce.tasktracker.reduce.tasks.maximum | 2 | The maximum number of reduce tasks that will be run simultaneously by a task tracker. |
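The rule map + reduce + 1 == num_cpu_cores can be worked through with a small helper. This is only a sketch: the function name and the `map_share` parameter (the fraction of the usable cores given to map tasks) are assumptions made for illustration, since the rule itself does not say how to divide the cores between map and reduce.

```python
import math


def split_slots(num_cpu_cores, map_share=0.5):
    """Return (map_slots, reduce_slots) such that map + reduce + 1 == num_cpu_cores.

    map_share is a hypothetical knob: the fraction of the usable cores
    (all cores minus the spare one) assigned to map tasks.
    """
    if num_cpu_cores < 3:
        raise ValueError("need at least 3 cores: one map slot, one reduce slot, one spare")
    usable = num_cpu_cores - 1
    # Keep at least one slot of each kind.
    map_slots = min(usable - 1, max(1, math.ceil(usable * map_share)))
    reduce_slots = usable - map_slots
    return map_slots, reduce_slots


if __name__ == "__main__":
    for cores in (4, 8, 16):
        m, r = split_slots(cores)
        print(f"{cores} cores -> map.tasks.maximum={m}, reduce.tasks.maximum={r}")
```

With 8 cores and an even split this gives 4 map slots and 3 reduce slots, and 4 + 3 + 1 == 8.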
2) Adjust the heartbeat interval; the interval can be changed to 300 (ms).
yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms | 1000 | The interval in ms at which the MR AppMaster should send heartbeats to the ResourceManager |
mapreduce.tasktracker.outofband.heartbeat | false | Expert: Set this to true to let the tasktracker send an out-of-band heartbeat on task-completion for better latency. |
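Values in mapred-default.xml are normally overridden in mapred-site.xml rather than edited in place. As a rough sketch, the following Python snippet renders such an override for the heartbeat interval of 300 suggested above. The function name is hypothetical, and the output should be merged by hand into an existing site file.

```python
import xml.etree.ElementTree as ET


def render_site_xml(overrides):
    """Render a {property name: value} mapping as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml({
        "yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms": 300,
    }))
```

The same helper can render any of the other properties listed in this section, since they all share the name/value layout.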
mapreduce.cluster.local.dir | ${hadoop.tmp.dir}/mapred/local | The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored. |
mapreduce.jobtracker.handler.count | 10 | The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes. |
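The "roughly 4% of the number of tasktracker nodes" guideline for mapreduce.jobtracker.handler.count is simple arithmetic. The sketch below rounds up and never goes below the shipped default of 10. That floor is an assumption of this example and is not stated in the property description.

```python
def suggested_handler_count(num_tasktracker_nodes, default=10):
    """Return about 4% of the node count (rounded up), but at least `default`.

    Using the default of 10 as a lower bound is an assumption of this sketch.
    """
    four_percent = (num_tasktracker_nodes * 4 + 99) // 100  # integer ceil of 4%
    return max(default, four_percent)


if __name__ == "__main__":
    for nodes in (100, 500, 1000):
        print(nodes, "nodes ->", suggested_handler_count(nodes), "handler threads")
```

For example, 500 nodes give 20 threads and 1000 nodes give 40, while a 100-node cluster stays at the default of 10.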
During the shuffle phase, each reduce task fetches the intermediate map output from the TaskTrackers over HTTP.
mapreduce.tasktracker.http.threads | 40 | The number of worker threads that for the http server. This is used for map output fetching |
mapreduce.ifile.readahead | true | Configuration key to enable/disable IFile readahead. |
mapreduce.ifile.readahead.bytes | 4194304 | Configuration key to set the IFile readahead length in bytes. |
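When cluster resources are tight, raise the following value so that reduces are scheduled later and do not hold resources while most maps are still running.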
mapreduce.job.reduce.slowstart.completedmaps | 0.05 | Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job. |
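To see what a given slowstart fraction means for a concrete job, the fraction can be turned into a number of maps. The sketch below assumes the threshold is rounded up to a whole map. That rounding is an assumption about scheduler behaviour, and the function name is illustrative.

```python
import math
from decimal import Decimal


def maps_before_reduces(total_maps, slowstart_fraction):
    """Return how many maps must finish before reduces become eligible for scheduling.

    Assumes the threshold is the fraction of total maps, rounded up.
    Decimal is used so that values such as 0.05 are not distorted by binary floats.
    """
    if not 0 <= slowstart_fraction <= 1:
        raise ValueError("slowstart_fraction must be between 0 and 1")
    return math.ceil(Decimal(str(slowstart_fraction)) * total_maps)


if __name__ == "__main__":
    for fraction in (0.05, 0.5, 0.8):
        needed = maps_before_reduces(200, fraction)
        print(f"slowstart={fraction}: reduces wait for {needed} of 200 maps")
```

For a 200-map job, the default of 0.05 lets reduces start after 10 maps, whereas 0.8 holds them back until 160 maps have finished.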