Configuring and Starting YARN

Following the tutorial at http://www.powerxing.com/install-hadoop/, we continue with configuring and running YARN.
First, edit the configuration file mapred-site.xml. It needs to be renamed first:

mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

Then edit it; as before, gedit is convenient: gedit ./etc/hadoop/mapred-site.xml

<configuration>
        <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
        </property>
</configuration>
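To sanity-check the edit without starting any daemons, you can simply grep the file for the property. A quick sketch; the here-doc below writes a stand-in copy under /tmp rather than touching the real ./etc/hadoop/mapred-site.xml:

```shell
# Write a stand-in copy of mapred-site.xml (assumes /tmp is writable),
# then confirm mapreduce.framework.name is set to yarn.
cat > /tmp/mapred-site.xml <<'EOF'
<configuration>
        <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
        </property>
</configuration>
EOF
# -A1 prints the matching <name> line plus the <value> line after it
grep -A1 '<name>mapreduce.framework.name</name>' /tmp/mapred-site.xml
```

On the real install you would run the same grep against ./etc/hadoop/mapred-site.xml.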

Next, edit the configuration file yarn-site.xml:

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
 <property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>

Note: the yarn-site.xml configuration in the original article contained only one property:

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
</configuration>
After configuring it that way, my MapReduce job got stuck. Once I switched to the two properties above, it ran successfully.

(After further testing, it turns out the single-property configuration also works.)

Now YARN can be started (./sbin/start-dfs.sh must have been run first):

./sbin/start-yarn.sh      # start YARN
./sbin/mr-jobhistory-daemon.sh start historyserver  # start the history server so job history is visible in the Web UI

After starting, running jps shows two additional daemons, NodeManager and ResourceManager (the full jps output is shown below).

After YARN is started, jobs are run exactly as before; only resource management and task scheduling differ. The logs show the difference: without YARN, jobs run under "mapred.LocalJobRunner", whereas with YARN they run under "mapred.YARNRunner". One benefit of starting YARN is that job status can be viewed in a Web UI at http://localhost:8088/cluster.

MapReduce job execution with YARN running:

hadoop@hadoop-virtual-machine:/usr/local/hadoop$ jps
11028 DataNode
10867 NameNode
14721 Jps
14685 JobHistoryServer
11233 SecondaryNameNode
14250 NodeManager
14409 ResourceManager
hadoop@hadoop-virtual-machine:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
17/04/11 09:43:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/11 09:43:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/11 09:43:35 INFO input.FileInputFormat: Total input paths to process : 8
17/04/11 09:43:35 INFO mapreduce.JobSubmitter: number of splits:8
17/04/11 09:43:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491874816300_0001
17/04/11 09:43:37 INFO impl.YarnClientImpl: Submitted application application_1491874816300_0001
17/04/11 09:43:37 INFO mapreduce.Job: The url to track the job: http://hadoop-virtual-machine:8088/proxy/application_1491874816300_0001/
17/04/11 09:43:37 INFO mapreduce.Job: Running job: job_1491874816300_0001
17/04/11 09:43:53 INFO mapreduce.Job: Job job_1491874816300_0001 running in uber mode : false
17/04/11 09:43:53 INFO mapreduce.Job:  map 0% reduce 0%
17/04/11 09:46:03 INFO mapreduce.Job:  map 13% reduce 0%
17/04/11 09:46:06 INFO mapreduce.Job:  map 25% reduce 0%
17/04/11 09:46:11 INFO mapreduce.Job:  map 50% reduce 0%
17/04/11 09:46:12 INFO mapreduce.Job:  map 75% reduce 0%
17/04/11 09:46:44 INFO mapreduce.Job:  map 88% reduce 0%
17/04/11 09:46:45 INFO mapreduce.Job:  map 100% reduce 0%
17/04/11 09:46:47 INFO mapreduce.Job:  map 100% reduce 100%
17/04/11 09:46:47 INFO mapreduce.Job: Job job_1491874816300_0001 completed successfully
17/04/11 09:46:47 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=115
        FILE: Number of bytes written=1073427
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=27718
        HDFS: Number of bytes written=219
        HDFS: Number of read operations=27
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Killed map tasks=1
        Launched map tasks=9
        Launched reduce tasks=1
        Data-local map tasks=9
        Total time spent by all maps in occupied slots (ms)=932519
        Total time spent by all reduces in occupied slots (ms)=18392
        Total time spent by all map tasks (ms)=932519
        Total time spent by all reduce tasks (ms)=18392
        Total vcore-milliseconds taken by all map tasks=932519
        Total vcore-milliseconds taken by all reduce tasks=18392
        Total megabyte-milliseconds taken by all map tasks=954899456
        Total megabyte-milliseconds taken by all reduce tasks=18833408
    Map-Reduce Framework
        Map input records=765
        Map output records=4
        Map output bytes=101
        Map output materialized bytes=157
        Input split bytes=957
        Combine input records=4
        Combine output records=4
        Reduce input groups=4
        Reduce shuffle bytes=157
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =8
        Failed Shuffles=0
        Merged Map outputs=8
        GC time elapsed (ms)=14455
        CPU time spent (ms)=84430
        Physical memory (bytes) snapshot=1797099520
        Virtual memory (bytes) snapshot=7518842880
        Total committed heap usage (bytes)=1612709888
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=26761
    File Output Format Counters 
        Bytes Written=219
17/04/11 09:46:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/11 09:46:48 INFO input.FileInputFormat: Total input paths to process : 1
17/04/11 09:46:48 INFO mapreduce.JobSubmitter: number of splits:1
17/04/11 09:46:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491874816300_0002
17/04/11 09:46:48 INFO impl.YarnClientImpl: Submitted application application_1491874816300_0002
17/04/11 09:46:48 INFO mapreduce.Job: The url to track the job: http://hadoop-virtual-machine:8088/proxy/application_1491874816300_0002/
17/04/11 09:46:48 INFO mapreduce.Job: Running job: job_1491874816300_0002
17/04/11 09:47:04 INFO mapreduce.Job: Job job_1491874816300_0002 running in uber mode : false
17/04/11 09:47:04 INFO mapreduce.Job:  map 0% reduce 0%
17/04/11 09:47:13 INFO mapreduce.Job:  map 100% reduce 0%
17/04/11 09:47:25 INFO mapreduce.Job:  map 100% reduce 100%
17/04/11 09:47:25 INFO mapreduce.Job: Job job_1491874816300_0002 completed successfully
17/04/11 09:47:25 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=115
        FILE: Number of bytes written=237641
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=350
        HDFS: Number of bytes written=77
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6886
        Total time spent by all reduces in occupied slots (ms)=7536
        Total time spent by all map tasks (ms)=6886
        Total time spent by all reduce tasks (ms)=7536
        Total vcore-milliseconds taken by all map tasks=6886
        Total vcore-milliseconds taken by all reduce tasks=7536
        Total megabyte-milliseconds taken by all map tasks=7051264
        Total megabyte-milliseconds taken by all reduce tasks=7716864
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Map output bytes=101
        Map output materialized bytes=115
        Input split bytes=131
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=115
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=77
        CPU time spent (ms)=2800
        Physical memory (bytes) snapshot=446259200
        Virtual memory (bytes) snapshot=1683599360
        Total committed heap usage (bytes)=276824064
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=219
    File Output Format Counters 
        Bytes Written=77

Then check the results:

hadoop@hadoop-virtual-machine:/usr/local/hadoop$ ./bin/hdfs dfs -cat output/*
17/04/11 09:53:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1   dfsadmin
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.datanode.data.dir
hadoop@hadoop-virtual-machine:/usr/local/hadoop$ 
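The job above is Hadoop's grep example searching the input for the regular expression 'dfs[a-z.]+'. You can try the same pattern locally with the standard grep tool to see why names like dfs.replication match. The sample lines below are invented for illustration, not the actual HDFS input:

```shell
# -o prints only the matched text; -E enables extended regular expressions.
# 'dfs' must be followed by one or more lowercase letters or dots.
printf '%s\n' 'dfs.replication' 'dfs.namenode.name.dir' 'run dfsadmin now' 'yarn.nodemanager' \
  | grep -oE 'dfs[a-z.]+'
# Prints:
# dfs.replication
# dfs.namenode.name.dir
# dfsadmin
```

Note that 'yarn.nodemanager' produces no match, and only the 'dfsadmin' token is extracted from the third line, since the space ends the match.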

If you do not want to start YARN, be sure to rename mapred-site.xml back to mapred-site.xml.template, and restore the name when you need it again. Otherwise, if this configuration file exists but YARN has not been started, running a job will keep failing with "Retrying connect to server: 0.0.0.0/0.0.0.0:8032". This is also why the file ships under the name mapred-site.xml.template in the first place.
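The toggle itself is just a rename in each direction. A sketch of both directions, using a scratch directory under /tmp in place of the real install (the paths here are illustrative, not the actual /usr/local/hadoop):

```shell
# Scratch directory standing in for the Hadoop config directory
mkdir -p /tmp/hadoop-conf-demo
touch /tmp/hadoop-conf-demo/mapred-site.xml

# Disable YARN submission: jobs fall back to mapred.LocalJobRunner
mv /tmp/hadoop-conf-demo/mapred-site.xml /tmp/hadoop-conf-demo/mapred-site.xml.template

# Re-enable YARN submission later
mv /tmp/hadoop-conf-demo/mapred-site.xml.template /tmp/hadoop-conf-demo/mapred-site.xml

ls /tmp/hadoop-conf-demo
```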

Summary: after repeated rounds of configuration and testing, installing and testing Hadoop on Linux finally succeeded. I ran into some problems along the way, and the process of solving them taught me quite a bit.
Next, I plan to test some algorithms of my own on the Hadoop framework.
