Hadoop Benchmarks
Using TestDFSIO
TestDFSIO is a benchmark that ships with Hadoop, located under $HADOOP_HOME/share/hadoop/mapreduce, and is mainly used to measure DFS I/O performance. Running the test jar without arguments lists the available programs:
[bc110 mapreduce]$ yarn jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
1. Write test: write 10 files of 5 GB each to DFS
[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 10 -size 5GB -resFile /tmp/dfsio.txt
14/02/21 11:37:26 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/21 11:37:26 INFO fs.TestDFSIO: nrFiles = 10
14/02/21 11:37:26 INFO fs.TestDFSIO: nrBytes (MB) = 5120.0
14/02/21 11:37:26 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/21 11:37:26 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
.....
.....
14/02/21 11:41:11 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
14/02/21 11:41:11 INFO fs.TestDFSIO: Date & time: Fri Feb 21 11:41:11 CST 2014
14/02/21 11:41:11 INFO fs.TestDFSIO: Number of files: 10
14/02/21 11:41:11 INFO fs.TestDFSIO: Total MBytes processed: 51200.0
14/02/21 11:41:11 INFO fs.TestDFSIO: Throughput mb/sec: 29.51076681883156
14/02/21 11:41:11 INFO fs.TestDFSIO: Average IO rate mb/sec: 32.61860275268555
14/02/21 11:41:11 INFO fs.TestDFSIO: IO rate std deviation: 12.55715046895572
14/02/21 11:41:11 INFO fs.TestDFSIO: Test exec time sec: 223.124
14/02/21 11:41:11 INFO fs.TestDFSIO:
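A note on reading these numbers: TestDFSIO's "Throughput" is the total megabytes divided by the summed I/O time of all tasks, while "Average IO rate" is the mean of each file's individual rate, which is why the two differ when task times are uneven. A small Python sketch with made-up per-file figures:

```python
# Sketch of how TestDFSIO aggregates its results.
# Each tuple is (megabytes written, I/O seconds for that file) -- made-up numbers.
files = [(5120.0, 160.0), (5120.0, 140.0), (5120.0, 180.0)]

total_mb = sum(mb for mb, _ in files)
total_sec = sum(sec for _, sec in files)

# "Throughput mb/sec": total data over total task I/O time.
throughput = total_mb / total_sec

# "Average IO rate mb/sec": mean of each file's individual rate.
rates = [mb / sec for mb, sec in files]
avg_io_rate = sum(rates) / len(rates)

print(round(throughput, 2), round(avg_io_rate, 2))  # 32.0 32.34
```

The slow tasks drag "Throughput" down more than "Average IO rate", and a large gap between the two (or a large std deviation, as in the log above) points at uneven performance across nodes.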
2. Clean up the files written by the test
[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
3. Read test: read the 10 5 GB files back from HDFS
[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 10 -size 5GB -resFile /tmp/dfsio.txt
14/02/21 11:48:57 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/21 11:48:57 INFO fs.TestDFSIO: nrFiles = 10
14/02/21 11:48:57 INFO fs.TestDFSIO: nrBytes (MB) = 5120.0
14/02/21 11:48:57 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/21 11:48:57 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
.....
.....
14/02/21 11:51:31 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
14/02/21 11:51:31 INFO fs.TestDFSIO: Date & time: Fri Feb 21 11:51:31 CST 2014
14/02/21 11:51:31 INFO fs.TestDFSIO: Number of files: 10
14/02/21 11:51:31 INFO fs.TestDFSIO: Total MBytes processed: 51200.0
14/02/21 11:51:31 INFO fs.TestDFSIO: Throughput mb/sec: 86.95209143555866
14/02/21 11:51:31 INFO fs.TestDFSIO: Average IO rate mb/sec: 216.0893096923828
14/02/21 11:51:31 INFO fs.TestDFSIO: IO rate std deviation: 158.02797196752692
14/02/21 11:51:31 INFO fs.TestDFSIO: Test exec time sec: 152.721
14/02/21 11:51:31 INFO fs.TestDFSIO:
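The read pass came out roughly three times faster than the write pass, and the large standard deviation suggests some tasks read from local replicas while others went over the network. The ratio can be checked directly from the logged figures:

```python
# Throughput figures copied from the write and read logs above.
write_tp = 29.51076681883156   # write "Throughput mb/sec"
read_tp = 86.95209143555866    # read "Throughput mb/sec"

speedup = read_tp / write_tp
print(round(speedup, 2))  # 2.95
```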
Using terasort
1) Generate the data. teragen writes data row by row, 100 bytes per row, so to generate 100 GB of data you just need 100 GB / 100 B rows. The generation command is:
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teragen -Dmapred.map.tasks=20 1073741824 /tmp/teradata
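The row count on the command line can be sanity-checked: at 100 bytes per row, 1073741824 rows comes out to exactly 100 GiB.

```python
# Sanity-check the teragen row count: 100 bytes per row, target 100 GiB.
ROW_BYTES = 100
target_bytes = 100 * 1024 ** 3      # 100 GiB
rows = target_bytes // ROW_BYTES    # the row count passed to teragen

print(rows, rows * ROW_BYTES / 1024 ** 3)  # 1073741824 100.0
```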
2) Sort the data
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar terasort -Dmapred.reduce.tasks=18 /tmp/teradata /tmp/teraout
3) Validate the sorted output
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teravalidate /tmp/teraout /tmp/teravalidate
Using nnbench. The option names are self-explanatory, so they are not covered in detail here:
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar nnbench \
-operation create_write \
-maps 18 \
-reduces 6 \
-blockSize 1 \
-bytesToWrite 0 \
-numberOfFiles 10000 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true \
-baseDir /benchmarks/nnbench
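If -numberOfFiles is interpreted per map task (this is an assumption; check how your NNBench version counts it), the run above would issue on the order of 18 × 10000 create operations against the NameNode, which is the point of the benchmark:

```python
# Rough load estimate for the nnbench invocation above,
# ASSUMING -numberOfFiles counts files per map task.
maps = 18
files_per_map = 10000

total_creates = maps * files_per_map
print(total_creates)  # 180000 create_write operations hitting the NameNode
```

With -bytesToWrite 0, almost all of that load is NameNode metadata traffic rather than DataNode I/O, which is what makes nnbench a NameNode stress test.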
mrbench is also simple to use:
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -baseDir /tmp/mrbench -maps 100 -reduces 100 -numRuns 1
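mrbench's headline output is the average completion time of the small jobs it launches over -numRuns repetitions; the aggregation is just a mean, sketched here with made-up run times:

```python
# Made-up per-run job times in milliseconds; mrbench reports their average.
run_times_ms = [23000, 21000, 25000]

avg_time = sum(run_times_ms) / len(run_times_ms)
print(avg_time)  # 23000.0
```

Raising -numRuns smooths out scheduler jitter at the cost of a longer benchmark.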