Following the tutorial at http://www.powerxing.com/install-hadoop/, the next step is to configure and run YARN.
First edit the configuration file mapred-site.xml, which must be renamed first:
mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
Then edit it; as before, gedit is convenient: gedit ./etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
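If you prefer a non-interactive edit over gedit, the same property can be spliced in with sed. A minimal sketch, run here against a scratch copy rather than the real install (the /tmp/hconf path is illustrative only):

```shell
# Create a scratch copy of an empty mapred-site.xml (illustrative path)
mkdir -p /tmp/hconf
cat > /tmp/hconf/mapred-site.xml <<'EOF'
<configuration>
</configuration>
EOF

# Insert the mapreduce.framework.name property just before the closing tag
# (GNU sed: \n in the replacement produces a newline)
sed -i 's|</configuration>|    <property>\n        <name>mapreduce.framework.name</name>\n        <value>yarn</value>\n    </property>\n</configuration>|' /tmp/hconf/mapred-site.xml

cat /tmp/hconf/mapred-site.xml
```

On a real install you would point this at ./etc/hadoop/mapred-site.xml instead of the scratch path.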
Next, edit the configuration file yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
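A malformed yarn-site.xml (for example, a missing closing tag) will make the daemons fail at startup, so it can be worth sanity-checking the XML before restarting. A small sketch using python3 as an ad-hoc XML checker (the /tmp path and the use of python3 are assumptions; xmllint or any parser works equally well):

```shell
# Write the same two properties configured above to a scratch file
cat > /tmp/yarn-site-check.xml <<'EOF'
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
EOF

# Parse it; a parse error here means the real file would break YARN startup
python3 - <<'EOF'
import xml.etree.ElementTree as ET
root = ET.parse('/tmp/yarn-site-check.xml').getroot()
props = {p.findtext('name'): p.findtext('value') for p in root.findall('property')}
print(props['yarn.nodemanager.aux-services'])  # prints: mapreduce_shuffle
EOF
```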
Note: in the original article, the yarn-site.xml configuration contains only one property:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
When I followed that, the MapReduce job hung during execution. After changing it to the two properties above, it ran successfully.
#### After further testing, I found that configuring only the one property also works ####
Now YARN can be started (./sbin/start-dfs.sh must have been run first):
./sbin/start-yarn.sh # start YARN
./sbin/mr-jobhistory-daemon.sh start historyserver # start the history server, required to view job status in the web UI
After starting, jps shows two additional daemons, NodeManager and ResourceManager, as shown in the figure below.
With YARN started, examples are run exactly as before; only resource management and job scheduling differ. The logs show the difference: without YARN, jobs are run by "mapred.LocalJobRunner"; with YARN, they are run by "mapred.YARNRunner". One benefit of starting YARN is that job status can be viewed in a web UI at http://localhost:8088/cluster, as shown below.
Execution of a MapReduce job with YARN running:
hadoop@hadoop-virtual-machine:/usr/local/hadoop$ jps
11028 DataNode
10867 NameNode
14721 Jps
14685 JobHistoryServer
11233 SecondaryNameNode
14250 NodeManager
14409 ResourceManager
hadoop@hadoop-virtual-machine:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
17/04/11 09:43:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/11 09:43:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/11 09:43:35 INFO input.FileInputFormat: Total input paths to process : 8
17/04/11 09:43:35 INFO mapreduce.JobSubmitter: number of splits:8
17/04/11 09:43:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491874816300_0001
17/04/11 09:43:37 INFO impl.YarnClientImpl: Submitted application application_1491874816300_0001
17/04/11 09:43:37 INFO mapreduce.Job: The url to track the job: http://hadoop-virtual-machine:8088/proxy/application_1491874816300_0001/
17/04/11 09:43:37 INFO mapreduce.Job: Running job: job_1491874816300_0001
17/04/11 09:43:53 INFO mapreduce.Job: Job job_1491874816300_0001 running in uber mode : false
17/04/11 09:43:53 INFO mapreduce.Job: map 0% reduce 0%
17/04/11 09:46:03 INFO mapreduce.Job: map 13% reduce 0%
17/04/11 09:46:06 INFO mapreduce.Job: map 25% reduce 0%
17/04/11 09:46:11 INFO mapreduce.Job: map 50% reduce 0%
17/04/11 09:46:12 INFO mapreduce.Job: map 75% reduce 0%
17/04/11 09:46:44 INFO mapreduce.Job: map 88% reduce 0%
17/04/11 09:46:45 INFO mapreduce.Job: map 100% reduce 0%
17/04/11 09:46:47 INFO mapreduce.Job: map 100% reduce 100%
17/04/11 09:46:47 INFO mapreduce.Job: Job job_1491874816300_0001 completed successfully
17/04/11 09:46:47 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=1073427
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=27718
		HDFS: Number of bytes written=219
		HDFS: Number of read operations=27
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Killed map tasks=1
		Launched map tasks=9
		Launched reduce tasks=1
		Data-local map tasks=9
		Total time spent by all maps in occupied slots (ms)=932519
		Total time spent by all reduces in occupied slots (ms)=18392
		Total time spent by all map tasks (ms)=932519
		Total time spent by all reduce tasks (ms)=18392
		Total vcore-milliseconds taken by all map tasks=932519
		Total vcore-milliseconds taken by all reduce tasks=18392
		Total megabyte-milliseconds taken by all map tasks=954899456
		Total megabyte-milliseconds taken by all reduce tasks=18833408
	Map-Reduce Framework
		Map input records=765
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=157
		Input split bytes=957
		Combine input records=4
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=157
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =8
		Failed Shuffles=0
		Merged Map outputs=8
		GC time elapsed (ms)=14455
		CPU time spent (ms)=84430
		Physical memory (bytes) snapshot=1797099520
		Virtual memory (bytes) snapshot=7518842880
		Total committed heap usage (bytes)=1612709888
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=26761
	File Output Format Counters
		Bytes Written=219
17/04/11 09:46:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/11 09:46:48 INFO input.FileInputFormat: Total input paths to process : 1
17/04/11 09:46:48 INFO mapreduce.JobSubmitter: number of splits:1
17/04/11 09:46:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491874816300_0002
17/04/11 09:46:48 INFO impl.YarnClientImpl: Submitted application application_1491874816300_0002
17/04/11 09:46:48 INFO mapreduce.Job: The url to track the job: http://hadoop-virtual-machine:8088/proxy/application_1491874816300_0002/
17/04/11 09:46:48 INFO mapreduce.Job: Running job: job_1491874816300_0002
17/04/11 09:47:04 INFO mapreduce.Job: Job job_1491874816300_0002 running in uber mode : false
17/04/11 09:47:04 INFO mapreduce.Job: map 0% reduce 0%
17/04/11 09:47:13 INFO mapreduce.Job: map 100% reduce 0%
17/04/11 09:47:25 INFO mapreduce.Job: map 100% reduce 100%
17/04/11 09:47:25 INFO mapreduce.Job: Job job_1491874816300_0002 completed successfully
17/04/11 09:47:25 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=237641
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=350
		HDFS: Number of bytes written=77
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6886
		Total time spent by all reduces in occupied slots (ms)=7536
		Total time spent by all map tasks (ms)=6886
		Total time spent by all reduce tasks (ms)=7536
		Total vcore-milliseconds taken by all map tasks=6886
		Total vcore-milliseconds taken by all reduce tasks=7536
		Total megabyte-milliseconds taken by all map tasks=7051264
		Total megabyte-milliseconds taken by all reduce tasks=7716864
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=115
		Input split bytes=131
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=115
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=77
		CPU time spent (ms)=2800
		Physical memory (bytes) snapshot=446259200
		Virtual memory (bytes) snapshot=1683599360
		Total committed heap usage (bytes)=276824064
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=219
	File Output Format Counters
		Bytes Written=77
Then check the results:
hadoop@hadoop-virtual-machine:/usr/local/hadoop$ ./bin/hdfs dfs -cat output/*
17/04/11 09:53:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
hadoop@hadoop-virtual-machine:/usr/local/hadoop$
If you do not want to start YARN, be sure to rename the configuration file mapred-site.xml back to mapred-site.xml.template, and rename it again when needed. Otherwise, if that file exists but YARN is not running, programs will fail with the error "Retrying connect to server: 0.0.0.0/0.0.0.0:8032". This is also why the file ships with the name mapred-site.xml.template.
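The rename toggle can be sketched as follows, demonstrated here in a scratch directory so it is safe to run anywhere (in a real install the path would be the Hadoop directory, e.g. /usr/local/hadoop/etc/hadoop):

```shell
# Set up a scratch stand-in for the Hadoop config directory (illustrative path)
mkdir -p /tmp/hadoop-demo/etc/hadoop
echo '<configuration/>' > /tmp/hadoop-demo/etc/hadoop/mapred-site.xml

# "Disable" YARN job submission by hiding the file behind the .template suffix;
# reverse the mv to enable it again
mv /tmp/hadoop-demo/etc/hadoop/mapred-site.xml \
   /tmp/hadoop-demo/etc/hadoop/mapred-site.xml.template

ls /tmp/hadoop-demo/etc/hadoop
```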
Summary: after repeated configuration and testing, installing and testing Hadoop on Linux has finally succeeded. I ran into some problems along the way, and working through them taught me a great deal.
Next, I plan to test some algorithms of my own on the Hadoop framework.