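1. Locate the example jar
Go to the share directory under the Hadoop installation and find the example jar, hadoop-mapreduce-examples-2.10.1.jar.
# Directory containing the jar
/root/dong/program/hadoop-2.10.1/share/hadoop/mapreduce
# List the directory and look for hadoop-mapreduce-examples-2.10.1.jar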
root@hecs-x-large-2-linux-20200618145835:~/dong/program/hadoop-2.10.1/share/hadoop/mapreduce# ll
total 5256
drwxr-xr-x 6 1000 qa 4096 Sep 14 21:39 ./
drwxr-xr-x 9 1000 qa 4096 Sep 14 21:39 ../
-rw-r--r-- 1 1000 qa 586815 Sep 14 21:39 hadoop-mapreduce-client-app-2.10.1.jar
-rw-r--r-- 1 1000 qa 787989 Sep 14 21:39 hadoop-mapreduce-client-common-2.10.1.jar
-rw-r--r-- 1 1000 qa 1613911 Sep 14 21:39 hadoop-mapreduce-client-core-2.10.1.jar
-rw-r--r-- 1 1000 qa 199675 Sep 14 21:39 hadoop-mapreduce-client-hs-2.10.1.jar
-rw-r--r-- 1 1000 qa 32779 Sep 14 21:39 hadoop-mapreduce-client-hs-plugins-2.10.1.jar
-rw-r--r-- 1 1000 qa 72212 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1.jar
-rw-r--r-- 1 1000 qa 1652223 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1-tests.jar
-rw-r--r-- 1 1000 qa 84008 Sep 14 21:39 hadoop-mapreduce-client-shuffle-2.10.1.jar
-rw-r--r-- 1 1000 qa 303324 Sep 14 21:39 hadoop-mapreduce-examples-2.10.1.jar
drwxr-xr-x 2 1000 qa 4096 Sep 14 21:39 jdiff/
drwxr-xr-x 2 1000 qa 4096 Sep 14 21:39 lib/
drwxr-xr-x 2 1000 qa 4096 Sep 14 21:39 lib-examples/
drwxr-xr-x 2 1000 qa 4096 Sep 14 21:39 sources/
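Instead of scanning the listing by eye, the jar can be located with a glob. The sketch below uses a hypothetical helper, find_examples_jar (it is not part of Hadoop), and assumes the standard share/hadoop/mapreduce layout shown above. The Hadoop installation directory is passed as the only argument.

```shell
# Hypothetical helper: print the path of the MapReduce examples jar.
# Assumes the standard layout <hadoop_home>/share/hadoop/mapreduce.
find_examples_jar() {
    hadoop_home="$1"
    for jar in "$hadoop_home"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        # An unmatched glob is left unexpanded, so confirm the file really exists.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}
```

For the installation used here, the call would be find_examples_jar /root/dong/program/hadoop-2.10.1.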
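2. Run the example jar
# Run the jar: pi selects the example program to run (a program name, not a class name), the first 3 is the number of map tasks, the second 3 is the number of samples per map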
# hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 3 3
Number of Maps = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
21/01/16 11:03:47 INFO conf.Configuration: resource-types.xml not found
21/01/16 11:03:47 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job: map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job: map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job: map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=72
FILE: Number of bytes written=835625
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=792
HDFS: Number of bytes written=215
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=17266
Total time spent by all reduces in occupied slots (ms)=8882
Total time spent by all map tasks (ms)=17266
Total time spent by all reduce tasks (ms)=8882
Total vcore-milliseconds taken by all map tasks=17266
Total vcore-milliseconds taken by all reduce tasks=8882
Total megabyte-milliseconds taken by all map tasks=17680384
Total megabyte-milliseconds taken by all reduce tasks=9095168
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=54
Map output materialized bytes=84
Input split bytes=438
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=84
Reduce input records=6
Reduce output records=0
Spilled Records=12
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=486
CPU time spent (ms)=2020
Physical memory (bytes) snapshot=1026621440
Virtual memory (bytes) snapshot=7678922752
Total committed heap usage (bytes)=701497344
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=354
File Output Format Counters
Bytes Written=97
Job Finished in 31.276 seconds
Estimated value of Pi is 3.5555555555555555555
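The estimate is crude because only 3 x 3 = 9 sample points were used. Assuming the example forms its estimate the usual Monte Carlo way, as 4 x (points inside the circle) / (total points), the value above corresponds to 8 of the 9 points landing inside the circle. The count of 8 is inferred from the result, not read from the log, and pi_estimate is a hypothetical helper name.

```shell
# Hypothetical helper: Monte Carlo style estimate of Pi from point counts.
# usage: pi_estimate <points_inside_circle> <total_points>
pi_estimate() {
    awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.4f\n", 4 * inside / total }'
}

# 9 total points = 3 maps x 3 samples per map; 8 inside is an inferred value.
pi_estimate 8 9    # prints 3.5556
```

Raising either argument of the pi command increases the total number of points and should bring the estimate closer to 3.14159.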
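3. Observations
To get a feel for what Hadoop can do, start by running a working Hadoop MapReduce example. What does its output tell us?
1. Observation: the number of map tasks is configurable
The command specified 3 map tasks and 3 samples per map.
The output confirms that the number of maps can be set to whatever the job needs: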
Number of Maps = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
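The total number of sample points is the product of the two arguments. As a small sketch, the hypothetical helper total_samples below reads the job output on standard input and multiplies the two header values. It assumes the "Number of Maps = N" and "Samples per Map = N" line format shown above.

```shell
# Hypothetical helper: total sample points = maps x samples per map.
# Reads the job output on stdin.
total_samples() {
    awk -F' = ' '
        /Number of Maps/  { maps = $2 }
        /Samples per Map/ { samples = $2 }
        END { print maps * samples }
    '
}
```

For the run above this yields 9, which is why the estimate of Pi is so coarse.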
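2. Observation: what happens when the job is submitted
After the job starts:
Step 1: the client-side proxy (RMProxy) connects to the ResourceManager.
Step 2: FileInputFormat reports three input files to process.
Step 3: JobSubmitter reports three input splits.
Step 4: the job is assigned the ID job_1610510670587_0001 and submitted.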
Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
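When scripting around a submission, the job ID and the split count are usually the two values worth capturing from these lines. A minimal sketch, assuming the log format shown above; job_id and split_count are hypothetical helper names, and both read the log on standard input.

```shell
# Print the first job ID (job_<digits>_<digits>) found on stdin.
job_id() {
    grep -o 'job_[0-9]*_[0-9]*' | head -n 1
}

# Print the number of input splits reported by JobSubmitter.
split_count() {
    sed -n 's/.*number of splits:\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

The captured ID can then be passed to other tooling, for example to look the job up in the ResourceManager web UI.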
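3. Observation: job execution flow
Step 1: the YARN client (YarnClientImpl) submits the application.
Step 2: MapReduce prints a URL where the job's progress can be tracked.
Step 3: job_1610510670587_0001 starts running.
Step 4: in this run, map and reduce progress one after the other rather than at the same time: reduce stays at 0% until map reaches 100%, and the map output is then handed to the reduce phase.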
21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job: map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job: map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job: map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
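To make the ordering easier to see, the progress lines can be reduced to a small time/map/reduce table. The sketch below assumes the "map N% reduce N%" line format shown above; progress_table is a hypothetical helper name and reads the log on standard input.

```shell
# Hypothetical helper: keep only the progress lines and print
# "<time> <map %> <reduce %>" for each of them.
progress_table() {
    awk '$4 == "mapreduce.Job:" && $5 == "map" && $7 == "reduce" { print $2, $6, $8 }'
}
```

For the log above this gives three rows: map finishes at 11:04:03 while reduce is still at 0%, and reduce finishes at 11:04:15.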
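4. Observation: which components take part in the job from start to finish
The counter groups printed at the end of the job show them:
1. File System
2. Job
3. Map-Reduce Framework
4. Shuffle
5. File Input Format
6. File Output Format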
21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=72
FILE: Number of bytes written=835625
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=792
HDFS: Number of bytes written=215
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=17266
Total time spent by all reduces in occupied slots (ms)=8882
Total time spent by all map tasks (ms)=17266
Total time spent by all reduce tasks (ms)=8882
Total vcore-milliseconds taken by all map tasks=17266
Total vcore-milliseconds taken by all reduce tasks=8882
Total megabyte-milliseconds taken by all map tasks=17680384
Total megabyte-milliseconds taken by all reduce tasks=9095168
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=54
Map output materialized bytes=84
Input split bytes=438
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=84
Reduce input records=6
Reduce output records=0
Spilled Records=12
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=486
CPU time spent (ms)=2020
Physical memory (bytes) snapshot=1026621440
Virtual memory (bytes) snapshot=7678922752
Total committed heap usage (bytes)=701497344
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=354
File Output Format Counters
Bytes Written=97
Job Finished in 31.276 seconds
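The counters can also be cross-checked against each other. For the map tasks, the vcore-milliseconds counter equals the total task time (17266), and dividing the megabyte-milliseconds counter by that time gives 17680384 / 17266 = 1024. This suggests each map container was allocated 1 vcore and 1024 MB in this run, which is an inference from the counters rather than something the log states directly. A minimal sketch with a hypothetical counter_value helper that reads the counter dump on standard input:

```shell
# Hypothetical helper: print the value of the counter whose name matches $1 exactly.
# Reads "name=value" lines on stdin; whitespace around the name is ignored.
counter_value() {
    awk -F= -v name="$1" '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim surrounding whitespace
            if (key == name) { print $2; exit }
        }
    '
}
```

With the full job output saved to a file, the ratio can be computed by reading both counters with counter_value and dividing them using shell arithmetic. The same check on the reduce counters (9095168 / 8882) also gives 1024.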