I. Running locally from Eclipse
See the demonstration in the chapter 【hadoop】 3002-mapreduce程序统计单词个数示例 (the MapReduce word-count example).
II. Running on the cluster as a jar
1. Package the data-processing job as a jar and upload it to the cluster node
[hadoop@cloud01 HDFSdemo]$ pwd
/home/hadoop/workspace/HDFSdemo
[hadoop@cloud01 HDFSdemo]$ ll
total 139844
drwxrwxr-x. 5 hadoop hadoop 4096 Feb 24 18:10 bin
-rw-rw-r--. 1 hadoop hadoop 440 Feb 20 06:56 core-site.xml
-rw-rw-r--. 1 hadoop hadoop 256 Feb 20 06:56 hdfs-site.xml
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 20 06:34 lib
-rw-rw-r--. 1 hadoop hadoop 253 Feb 20 06:56 mapred-site.xml
drwxrwxr-x. 5 hadoop hadoop 4096 Feb 24 18:10 src
-rw-rw-r--. 1 hadoop hadoop 143167974 Feb 24 21:41 wc.jar
-rw-rw-r--. 1 hadoop hadoop 434 Feb 20 06:56 yarn-site.xml
2. Start YARN by running the start-yarn.sh command
[hadoop@cloud01 HDFSdemo]$ start-yarn.sh
[hadoop@cloud01 HDFSdemo]$ jps
22901 Jps
17507 DataNode
22510 NodeManager
17414 NameNode
2721
22413 ResourceManager
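The jps listing above can also be checked mechanically before a job is submitted. The following is only a minimal sketch: it assumes the four daemon names shown in the listing (NameNode, DataNode, ResourceManager, NodeManager) are the ones expected on this node, and the function name missing_daemons is hypothetical.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints the name of
# every expected daemon that does not appear in it, one name per line.
# Assumption: the four names below are the daemons expected on this node.
missing_daemons() {
  md_listing="$(cat)"
  for md_name in NameNode DataNode ResourceManager NodeManager; do
    # Compare against the second field exactly, so that e.g.
    # SecondaryNameNode is not mistaken for NameNode.
    if ! printf '%s\n' "$md_listing" |
        awk -v n="$md_name" '$2 == n { found = 1 } END { exit !found }'; then
      printf '%s\n' "$md_name"
    fi
  done
}
```

Running `jps | missing_daemons` prints nothing when all four daemons are up. If it prints ResourceManager or NodeManager, YARN has not been started yet.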
3. Run wc.jar in distributed mode
[hadoop@cloud01 ~]$ hadoop jar workspace/HDFSdemo/wc.jar mapreduce.WordCount
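3.1 Log output during execution
-- Connect to the ResourceManager: client.RMProxy: Connecting to ResourceManager
-- Enumerate the input paths from which the splits are computed; each split corresponds to one map task: input.FileInputFormat: Total input paths to process : 1
-- Generate the job ID for this run: mapreduce.JobSubmitter: Submitting tokens for job: job_1424843731958_0002
-- Run the job from the submitted jar: mapreduce.Job: Running job: job_1424843731958_0002
-- Show map and reduce progress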
15/02/24 22:09:30 INFO mapreduce.Job: map 0% reduce 0%
15/02/24 22:09:39 INFO mapreduce.Job: map 100% reduce 0%
15/02/24 22:09:52 INFO mapreduce.Job: map 100% reduce 100%
15/02/24 22:09:53 INFO mapreduce.Job: Job job_1424843731958_0002 completed successfully
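The progress lines have a fixed shape, so the latest map and reduce percentages can be pulled out of a saved client log. This is a rough sketch that assumes the log format shown above; the function name last_progress is hypothetical.

```shell
# Hypothetical helper: reads a job client log on stdin and prints the most
# recent progress as "MAP REDUCE" (percentages without the % sign).
# Assumption: progress lines look like
#   ... INFO mapreduce.Job: map 100% reduce 0%
last_progress() {
  sed -n 's/.*mapreduce\.Job: *map \([0-9][0-9]*\)% reduce \([0-9][0-9]*\)%.*/\1 \2/p' |
    tail -n 1
}
```

For example, if the console output of the job was saved to a file, `last_progress < job.log` prints the last reported percentages (job.log is an assumed file name). Lines that are not progress reports are ignored.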
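3.2 Process changes over the course of the MR job
ResourceManager, NodeManager->RunJar->MRAppMaster->YarnChild
As the MR program progresses, the corresponding processes exit in turn, in the following order: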
YarnChild->MRAppMaster->RunJar
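One way to watch these processes come and go is to filter repeated jps snapshots down to the three job-related names. This is a small sketch under the assumption that jps prints "PID Name" per line, as in the listing in step 2; the function name mr_processes is hypothetical.

```shell
# Hypothetical helper: reads `jps` output on stdin and keeps only the
# processes that belong to a running MapReduce job, printed as "Name PID".
mr_processes() {
  awk '$2 == "RunJar" || $2 == "MRAppMaster" || $2 == "YarnChild" { print $2, $1 }'
}
```

In a second terminal, a loop such as `while true; do jps | mr_processes; echo ---; sleep 2; done` shows the set growing after the job is submitted and shrinking again as it finishes.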
3.3 Diagrams of the corresponding processing flow
Figure 1
Figure 2
Common problems
1. INFO ipc.Client: Retrying connect to server: cloud01/192.168.2.31:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
This happens because YARN has not been started; run start-yarn.sh.
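The symptom can be recognised in a saved client log. This is a minimal sketch that assumes the ResourceManager listens on port 8032, as in the message above; the function name rm_unreachable is hypothetical.

```shell
# Hypothetical helper: reads a job client log on stdin and succeeds (exit 0)
# if it contains connection retries against port 8032, the ResourceManager
# address that appears in the message above.
rm_unreachable() {
  grep -q 'ipc\.Client: Retrying connect to server: .*:8032'
}
```

A possible use, with job.log as an assumed file name: save the console output with `hadoop jar workspace/HDFSdemo/wc.jar mapreduce.WordCount 2>&1 | tee job.log`, then run `rm_unreachable < job.log && echo "YARN is not reachable; run start-yarn.sh"`.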