Learning Hadoop 2.7.0: Benchmarks
Hadoop comes with a set of benchmark programs that can be run with minimal setup. They are packaged in the tests JAR file; invoking that JAR with no arguments prints the list of available benchmarks.
(from Hadoop: The Definitive Guide)
Listing the benchmarks
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar
Output
[root@hadoop-master hadoop-2.7.0]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
Write throughput test
Write 10 files of 10 MB each:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 10MB
[root@hadoop-master hadoop-2.7.0]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 10MB
16/08/05 10:11:42 INFO fs.TestDFSIO: TestDFSIO.1.8
16/08/05 10:11:42 INFO fs.TestDFSIO: nrFiles = 10
16/08/05 10:11:42 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
16/08/05 10:11:42 INFO fs.TestDFSIO: bufferSize = 1000000
16/08/05 10:11:42 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
16/08/05 10:11:43 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 10 files
16/08/05 10:11:44 INFO fs.TestDFSIO: created control files for: 10 files
16/08/05 10:11:45 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/192.168.20.141:8032
16/08/05 10:11:45 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/192.168.20.141:8032
16/08/05 10:11:46 INFO mapred.FileInputFormat: Total input paths to process : 10
16/08/05 10:11:46 INFO mapreduce.JobSubmitter: number of splits:10
16/08/05 10:11:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470403413050_0001
16/08/05 10:11:46 INFO impl.YarnClientImpl: Submitted application application_1470403413050_0001
16/08/05 10:11:47 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1470403413050_0001/
16/08/05 10:11:47 INFO mapreduce.Job: Running job: job_1470403413050_0001
16/08/05 10:11:57 INFO mapreduce.Job: Job job_1470403413050_0001 running in uber mode : false
16/08/05 10:11:57 INFO mapreduce.Job: map 0% reduce 0%
16/08/05 10:12:27 INFO mapreduce.Job: map 40% reduce 0%
16/08/05 10:12:53 INFO mapreduce.Job: map 47% reduce 13%
16/08/05 10:12:54 INFO mapreduce.Job: map 53% reduce 13%
16/08/05 10:13:03 INFO mapreduce.Job: map 60% reduce 13%
16/08/05 10:13:04 INFO mapreduce.Job: map 67% reduce 13%
16/08/05 10:13:06 INFO mapreduce.Job: map 80% reduce 13%
16/08/05 10:13:08 INFO mapreduce.Job: map 83% reduce 13%
16/08/05 10:13:10 INFO mapreduce.Job: map 90% reduce 13%
16/08/05 10:13:11 INFO mapreduce.Job: map 100% reduce 13%
16/08/05 10:13:12 INFO mapreduce.Job: map 100% reduce 17%
16/08/05 10:13:18 INFO mapreduce.Job: map 100% reduce 20%
16/08/05 10:13:30 INFO mapreduce.Job: map 100% reduce 67%
16/08/05 10:13:36 INFO mapreduce.Job: map 100% reduce 100%
16/08/05 10:13:37 INFO mapreduce.Job: Job job_1470403413050_0001 completed successfully
16/08/05 10:13:37 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=840
FILE: Number of bytes written=1269539
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2390
HDFS: Number of bytes written=104857676
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Killed map tasks=2
Launched map tasks=13
Launched reduce tasks=1
Data-local map tasks=11
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=646035
Total time spent by all reduces in occupied slots (ms)=66302
Total time spent by all map tasks (ms)=646035
Total time spent by all reduce tasks (ms)=66302
Total vcore-seconds taken by all map tasks=646035
Total vcore-seconds taken by all reduce tasks=66302
Total megabyte-seconds taken by all map tasks=661539840
Total megabyte-seconds taken by all reduce tasks=67893248
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=734
Map output materialized bytes=894
Input split bytes=1270
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=894
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=8218
CPU time spent (ms)=15840
Physical memory (bytes) snapshot=2184921088
Virtual memory (bytes) snapshot=9295441920
Total committed heap usage (bytes)=1374396416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=76
16/08/05 10:13:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/08/05 10:13:37 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
16/08/05 10:13:37 INFO fs.TestDFSIO: Date & time: Fri Aug 05 10:13:37 EDT 2016
16/08/05 10:13:37 INFO fs.TestDFSIO: Number of files: 10
16/08/05 10:13:37 INFO fs.TestDFSIO: Total MBytes processed: 100.0
16/08/05 10:13:37 INFO fs.TestDFSIO: Throughput mb/sec: 1.4243796826482067
16/08/05 10:13:37 INFO fs.TestDFSIO: Average IO rate mb/sec: 6.660604000091553
16/08/05 10:13:37 INFO fs.TestDFSIO: IO rate std deviation: 8.936692949846902
16/08/05 10:13:37 INFO fs.TestDFSIO: Test exec time sec: 112.884
16/08/05 10:13:37 INFO fs.TestDFSIO:
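Note the gap in the summary between "Throughput" (1.42 MB/s) and "Average IO rate" (6.66 MB/s). As far as I can tell from TestDFSIO's reporting, Throughput divides the total data volume by the sum of all per-task IO times, while Average IO rate is the plain arithmetic mean of the per-file rates, so slow straggler tasks drag Throughput down much harder. A local sketch of the two formulas, using hypothetical per-file numbers (not taken from the run above):

```shell
# Hypothetical measurements: one "size_MB time_sec" pair per file
printf '10 4\n10 6\n' | awk '
  { size += $1; time += $2; sum_rate += $1 / $2; n++ }
  END {
    # Throughput: aggregate bytes over aggregate time
    printf "Throughput mb/sec: %.4f\n", size / time
    # Average IO rate: arithmetic mean of the per-file rates
    printf "Average IO rate mb/sec: %.4f\n", sum_rate / n
  }'
```

With these two sample files the aggregate throughput is 2.0 MB/s while the mean of the per-file rates is about 2.08 MB/s; the more uneven the per-task times, the wider the gap.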
Read throughput test
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 10MB
Output
[root@hadoop-master hadoop-2.7.0]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 10MB
16/08/05 10:25:55 INFO fs.TestDFSIO: TestDFSIO.1.8
16/08/05 10:25:55 INFO fs.TestDFSIO: nrFiles = 10
16/08/05 10:25:55 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
16/08/05 10:25:55 INFO fs.TestDFSIO: bufferSize = 1000000
16/08/05 10:25:55 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
16/08/05 10:25:56 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 10 files
16/08/05 10:25:57 INFO fs.TestDFSIO: created control files for: 10 files
16/08/05 10:25:57 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/192.168.20.141:8032
16/08/05 10:25:57 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/192.168.20.141:8032
16/08/05 10:25:58 INFO mapred.FileInputFormat: Total input paths to process : 10
16/08/05 10:25:58 INFO mapreduce.JobSubmitter: number of splits:10
16/08/05 10:25:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470403413050_0002
16/08/05 10:25:59 INFO impl.YarnClientImpl: Submitted application application_1470403413050_0002
16/08/05 10:25:59 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1470403413050_0002/
16/08/05 10:25:59 INFO mapreduce.Job: Running job: job_1470403413050_0002
16/08/05 10:26:07 INFO mapreduce.Job: Job job_1470403413050_0002 running in uber mode : false
16/08/05 10:26:07 INFO mapreduce.Job: map 0% reduce 0%
16/08/05 10:26:36 INFO mapreduce.Job: map 20% reduce 0%
16/08/05 10:26:37 INFO mapreduce.Job: map 30% reduce 0%
16/08/05 10:26:38 INFO mapreduce.Job: map 50% reduce 0%
16/08/05 10:26:39 INFO mapreduce.Job: map 100% reduce 0%
16/08/05 10:26:44 INFO mapreduce.Job: map 100% reduce 100%
16/08/05 10:26:46 INFO mapreduce.Job: Job job_1470403413050_0002 completed successfully
16/08/05 10:26:46 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=832
FILE: Number of bytes written=1269501
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=104859990
HDFS: Number of bytes written=77
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=291173
Total time spent by all reduces in occupied slots (ms)=5582
Total time spent by all map tasks (ms)=291173
Total time spent by all reduce tasks (ms)=5582
Total vcore-seconds taken by all map tasks=291173
Total vcore-seconds taken by all reduce tasks=5582
Total megabyte-seconds taken by all map tasks=298161152
Total megabyte-seconds taken by all reduce tasks=5715968
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=726
Map output materialized bytes=886
Input split bytes=1270
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=886
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=3177
CPU time spent (ms)=7570
Physical memory (bytes) snapshot=2147774464
Virtual memory (bytes) snapshot=9255862272
Total committed heap usage (bytes)=1374396416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=77
16/08/05 10:26:46 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/08/05 10:26:46 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
16/08/05 10:26:46 INFO fs.TestDFSIO: Date & time: Fri Aug 05 10:26:46 EDT 2016
16/08/05 10:26:46 INFO fs.TestDFSIO: Number of files: 10
16/08/05 10:26:46 INFO fs.TestDFSIO: Total MBytes processed: 100.0
16/08/05 10:26:46 INFO fs.TestDFSIO: Throughput mb/sec: 29.197080291970803
16/08/05 10:26:46 INFO fs.TestDFSIO: Average IO rate mb/sec: 47.35454177856445
16/08/05 10:26:46 INFO fs.TestDFSIO: IO rate std deviation: 29.781282953365924
16/08/05 10:26:46 INFO fs.TestDFSIO: Test exec time sec: 48.942
16/08/05 10:26:46 INFO fs.TestDFSIO:
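On this cluster the read run is roughly twenty times faster than the write run. That direction of asymmetry is expected: an HDFS write pushes every block through the replication pipeline to multiple DataNodes, while a read streams from a single (often local) replica. A quick check using the two Throughput values reported above:

```shell
# Divide the read Throughput by the write Throughput
# (values copied from the two TestDFSIO summaries above)
awk 'BEGIN { printf "read/write throughput ratio: %.1f\n", 29.197080291970803 / 1.4243796826482067 }'
```

which prints a ratio of about 20.5. The exact numbers are, of course, specific to this small test cluster.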
Viewing results from the log file
TestDFSIO appends its summary to TestDFSIO_results.log in the local working directory; from the installation directory run:
cat TestDFSIO_results.log
Output
[root@hadoop-master hadoop-2.7.0]# cat TestDFSIO_results.log
----- TestDFSIO ----- : write
Date & time: Fri Aug 05 10:13:37 EDT 2016
Number of files: 10
Total MBytes processed: 100.0
Throughput mb/sec: 1.4243796826482067
Average IO rate mb/sec: 6.660604000091553
IO rate std deviation: 8.936692949846902
Test exec time sec: 112.884
----- TestDFSIO ----- : read
Date & time: Fri Aug 05 10:26:46 EDT 2016
Number of files: 10
Total MBytes processed: 100.0
Throughput mb/sec: 29.197080291970803
Average IO rate mb/sec: 47.35454177856445
IO rate std deviation: 29.781282953365924
Test exec time sec: 48.942
Viewing via the web UI
The runs can also be inspected in the ResourceManager web UI, via the tracking URL printed in the job log (http://hadoop-master:8088/).
Cleaning up the test data
Delete the benchmark data (this removes /benchmarks/TestDFSIO from HDFS):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar TestDFSIO -clean
Word count test
Create a file named words.txt:
vim words.txt
Contents:
hello hadoop hbase mytest
hadoop-node1
hadoop-master
hadoop-node2
this is my test
Upload the file to HDFS
bin/hadoop fs -put words.txt /tmp/
Count the words with MapReduce
Count the words in the given file and write the results to the specified output path. Despite the .txt suffix, /tmp/words_result.txt is created as a directory, and it must not already exist:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /tmp/words.txt /tmp/words_result.txt
Result
hadoop 1
hadoop-master 1
hadoop-node1 1
hadoop-node2 1
hbase 1
hello 1
is 1
my 1
mytest 1
test 1
this 1
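As a sanity check, the same counts can be reproduced locally without Hadoop: tr plays the role of the map (split each line into words), sort is the shuffle (group identical keys), and uniq -c is the reduce (count each group). The input lines below are the contents of words.txt above:

```shell
# map: one word per line; shuffle: sort; reduce: count each group
printf '%s\n' 'hello hadoop hbase mytest' \
              'hadoop-node1' 'hadoop-master' 'hadoop-node2' \
              'this is my test' \
  | tr -s ' ' '\n' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
```

In the C locale this prints the same eleven word/count pairs, in the same order, as the part-r-00000 output; other collation locales may order the hadoop-* entries differently.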
Viewing via shell commands
List the root directory:
bin/hadoop fs -ls /
View the output (wordcount writes its result to the part-r-00000 file inside the output directory):
bin/hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00000
Output
[root@hadoop-master hadoop-2.7.0]# bin/hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00000
hadoop 1
hadoop-master 1
hadoop-node1 1
hadoop-node2 1
hbase 1
hello 1
is 1
my 1
mytest 1
test 1
this 1
More common HDFS shell commands for Hadoop 2.7.0:
http://www.cuiweiyou.com/1405.html