http://www.blogjava.net/paulwong/archive/2013/01/31/394997.html
This setup uses the Cloudera distributions of hadoop/hbase:
hadoop-0.20.2-cdh3u0
hbase-0.90.1-cdh3u0
zookeeper-3.3.3-cdh3u0
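These versions support the FairScheduler scheduling algorithm out of the box.
You only need to change the configuration so that the JobTracker uses FairScheduler instead of the default JobQueueTaskScheduler.
Configure fair-scheduler.xml (in $HADOOP_HOME/conf/):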
<?xml version="1.0"?>
<!-- The location of this file is set in mapred-site.xml via
     mapred.fairscheduler.allocation.file, not in this file. -->
<allocations>
  <pool name="qiji-task-pool">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>5</maxRunningJobs>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
  </pool>
  <user name="ecap">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <poolMaxJobsDefault>10</poolMaxJobsDefault>
  <userMaxJobsDefault>8</userMaxJobsDefault>
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
Configure $HADOOP_HOME/conf/mapred-site.xml by adding the following at the end (inside the <configuration> element):
<!-- Use the fair scheduler instead of the default JobQueueTaskScheduler (FIFO) -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<!-- Absolute path to the allocation file -->
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/opt/hadoop/conf/fair-scheduler.xml</value>
</property>
<!-- Allow both a map and a reduce task to be assigned in one heartbeat -->
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
<!-- Take job size into account when computing job weights -->
<property>
  <name>mapred.fairscheduler.sizebasedweight</name>
  <value>true</value>
</property>
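The allocation file defines the pool qiji-task-pool, but the post does not show how jobs end up in it. By default the fair scheduler puts each job in a pool named after the submitting user (user.name). One possible way to route jobs explicitly is the mapred.fairscheduler.poolnameproperty setting from the Hadoop fair scheduler documentation. The fragment below is a sketch under that assumption, not the author's confirmed setup; pool.name is only a conventional jobconf key chosen for illustration, not a built-in property.

```xml
<!-- Assumed addition to mapred-site.xml; not part of the original post. -->
<!-- Read the pool name from the jobconf key "pool.name". -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
<!-- Fall back to one pool per user when a job does not set pool.name. -->
<property>
  <name>pool.name</name>
  <value>${user.name}</value>
</property>
```

With this in place, a job could be submitted with -Dpool.name=qiji-task-pool on the bin/hadoop jar command line (assuming the job parses generic options, for example through ToolRunner), so that the limits of qiji-task-pool apply to it.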
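Then restart the cluster. When several jobs run concurrently (the configuration above allows 5 in parallel), a single job can no longer occupy all the map/reduce slots and leave the other jobs in the Pending state.
The concurrent-running status can be viewed at: http://<masterip>:50030/scheduler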
posted on 2013-01-31 17:30 by paulwong. Categories: HADOOP, Cloud Computing
Feedback
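# re: Configuring Hadoop M/R to Use the Fair Scheduler Instead of FIFO  2013-05-17 10:07  Christopher
Hello. I have also been configuring Hadoop's fair scheduler recently, but I ran into some problems. I am using cloudera-cdh-demo-vm-4.2.0-kvm. When configuring $HADOOP_HOME/conf/mapred-site.xml, I specified the pool allocation file: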
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/usr/lib/hadoop-0.20-mapreduce/conf/fair-scheduler.xml</value>
</property>
After restarting the cluster, I could not open http://<masterip>:50030/scheduler to view the concurrent-running status.