Testing Hadoop: running a MapReduce example program

1. Locate the example jar

Go into the `share` directory under the Hadoop installation and find the `hadoop-mapreduce-examples-2.10.1.jar` example jar.

# Target directory
/root/dong/program/hadoop-2.10.1/share/hadoop/mapreduce

# List the directory and locate the hadoop-mapreduce-examples-2.10.1.jar example jar
root@hecs-x-large-2-linux-20200618145835:~/dong/program/hadoop-2.10.1/share/hadoop/mapreduce# ll
total 5256
drwxr-xr-x 6 1000 qa    4096 Sep 14 21:39 ./
drwxr-xr-x 9 1000 qa    4096 Sep 14 21:39 ../
-rw-r--r-- 1 1000 qa  586815 Sep 14 21:39 hadoop-mapreduce-client-app-2.10.1.jar
-rw-r--r-- 1 1000 qa  787989 Sep 14 21:39 hadoop-mapreduce-client-common-2.10.1.jar
-rw-r--r-- 1 1000 qa 1613911 Sep 14 21:39 hadoop-mapreduce-client-core-2.10.1.jar
-rw-r--r-- 1 1000 qa  199675 Sep 14 21:39 hadoop-mapreduce-client-hs-2.10.1.jar
-rw-r--r-- 1 1000 qa   32779 Sep 14 21:39 hadoop-mapreduce-client-hs-plugins-2.10.1.jar
-rw-r--r-- 1 1000 qa   72212 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1.jar
-rw-r--r-- 1 1000 qa 1652223 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1-tests.jar
-rw-r--r-- 1 1000 qa   84008 Sep 14 21:39 hadoop-mapreduce-client-shuffle-2.10.1.jar
-rw-r--r-- 1 1000 qa  303324 Sep 14 21:39 hadoop-mapreduce-examples-2.10.1.jar
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 jdiff/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 lib/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 lib-examples/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 sources/

2. Run the example jar

# Run the jar: "pi" selects the example program, the first 3 is the number of map tasks, the second 3 is the number of samples per map
# hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 3 3
Number of Maps  = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
21/01/16 11:03:47 INFO conf.Configuration: resource-types.xml not found
21/01/16 11:03:47 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job:  map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job:  map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job:  map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=72
                FILE: Number of bytes written=835625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=792
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=17266
                Total time spent by all reduces in occupied slots (ms)=8882
                Total time spent by all map tasks (ms)=17266
                Total time spent by all reduce tasks (ms)=8882
                Total vcore-milliseconds taken by all map tasks=17266
                Total vcore-milliseconds taken by all reduce tasks=8882
                Total megabyte-milliseconds taken by all map tasks=17680384
                Total megabyte-milliseconds taken by all reduce tasks=9095168
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=54
                Map output materialized bytes=84
                Input split bytes=438
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=84
                Reduce input records=6
                Reduce output records=0
                Spilled Records=12
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=486
                CPU time spent (ms)=2020
                Physical memory (bytes) snapshot=1026621440
                Virtual memory (bytes) snapshot=7678922752
                Total committed heap usage (bytes)=701497344
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=354
        File Output Format Counters 
                Bytes Written=97
Job Finished in 31.276 seconds
Estimated value of Pi is 3.5555555555555555555
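The estimate is rough because only 3 × 3 = 9 sample points were used. The Hadoop pi example throws points at the unit square and counts how many land inside the quarter circle (it uses a Halton quasi-random sequence; the sketch below is a minimal stand-in using Python's `random` module, so its numbers will differ from the job's):

```python
import random

def estimate_pi(num_maps, samples_per_map, seed=42):
    """Toy stand-in for the Hadoop pi example: each 'map' throws darts
    at the unit square and counts hits inside the quarter circle; the
    'reduce' step sums the hits and derives the estimate."""
    random.seed(seed)
    inside = 0
    total = num_maps * samples_per_map
    for _ in range(total):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / total

print(estimate_pi(3, 3))       # 9 samples: a rough estimate
print(estimate_pi(100, 1000))  # 100,000 samples: much closer to pi
```

Increasing either the map count or the samples per map raises the total sample count and tightens the estimate, at the cost of more cluster work.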

3. Observations

To get a feel for what Hadoop can do, we first ran a normal Hadoop MR example. What can we learn from its output?

1. Observation: the number of map tasks is configurable

The command specified 3 map tasks and 3 samples per map, for 3 × 3 = 9 samples in total.
This shows the number of maps can be set as needed when submitting the job.

Number of Maps  = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2

2. Observation: what happens after the job is submitted

After the job starts:
Step 1: the client proxy connects to the ResourceManager.
Step 2: FileInputFormat reports three input files to process.
Step 3: JobSubmitter computes three input splits.
Step 4: a job ID is assigned and job_1610510670587_0001 is submitted.

Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001

3. Observation: the job-execution flow

Job-execution flow:
Step 1: the YARN client submits the application.
Step 2: MapReduce prints a URL for tracking the application.
Step 3: MR runs job_1610510670587_0001.
Step 4: in this run, map and reduce executed in sequence (reduce only made progress after all maps finished); the two phases did not run concurrently.

21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job:  map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job:  map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job:  map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
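The progress lines show reduce stuck at 0% until the maps hit 100%. A minimal in-process sketch of that map → shuffle → reduce ordering (using word count as the classic demonstration, not the pi job):

```python
from collections import defaultdict

def run_job(records, map_fn, reduce_fn):
    """Toy sequential MapReduce: reduce runs strictly after every map
    finishes, mirroring the phase ordering seen in the job log."""
    # Map phase: every input record is mapped before anything else runs.
    intermediate = []
    for rec in records:
        intermediate.extend(map_fn(rec))
    # Shuffle phase: group intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: consumes the grouped values.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

docs = ["hadoop runs maps", "maps feed reduces"]
result = run_job(
    docs,
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda key, values: sum(values),
)
print(result)  # "maps" appears twice, every other word once
```

Real Hadoop can start the shuffle for reduces before all maps finish (the "slowstart" setting), but the reduce function itself still waits for complete map output, which matches what this log shows.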

4. Observation: which counter groups appear over the whole run

1. File System Counters

2. Job Counters

3. Map-Reduce Framework

4. Shuffle Errors

5. File Input Format Counters

6. File Output Format Counters

21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=72
                FILE: Number of bytes written=835625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=792
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=17266
                Total time spent by all reduces in occupied slots (ms)=8882
                Total time spent by all map tasks (ms)=17266
                Total time spent by all reduce tasks (ms)=8882
                Total vcore-milliseconds taken by all map tasks=17266
                Total vcore-milliseconds taken by all reduce tasks=8882
                Total megabyte-milliseconds taken by all map tasks=17680384
                Total megabyte-milliseconds taken by all reduce tasks=9095168
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=54
                Map output materialized bytes=84
                Input split bytes=438
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=84
                Reduce input records=6
                Reduce output records=0
                Spilled Records=12
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=486
                CPU time spent (ms)=2020
                Physical memory (bytes) snapshot=1026621440
                Virtual memory (bytes) snapshot=7678922752
                Total committed heap usage (bytes)=701497344
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=354
        File Output Format Counters 
                Bytes Written=97
Job Finished in 31.276 seconds
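These counter groups follow a regular indented "Group / name=value" layout, so they are easy to pull into a structure for comparison across runs. A small hypothetical helper (not part of Hadoop; the parsing rules are my own assumption about the log format above):

```python
import re

def parse_counters(log_text):
    """Parse the indented counter lines that mapreduce.Job prints at the
    end of a run into {group: {counter_name: int_value}}."""
    counters = {}
    group = None
    for line in log_text.splitlines():
        m = re.match(r"\s+(.+?)=(-?\d+)\s*$", line)
        if m and group:
            # Indented "name=value" line inside the current group.
            counters.setdefault(group, {})[m.group(1)] = int(m.group(2))
        elif line.startswith((" ", "\t")) and "=" not in line:
            # Indented line with no "=" starts a new counter group.
            group = line.strip()
    return counters

sample = """        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
        Map-Reduce Framework
                Map input records=3
"""
print(parse_counters(sample))
```

With the full log from section 2, this yields all six groups listed above, ready for asserting on values like `Launched map tasks=3` in an automated smoke test.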