Hadoop Benchmarking

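Using TestDFSIO
TestDFSIO is a benchmark that ships with Hadoop. It lives in the tests JAR under $HADOOP_HOME/share/hadoop/mapreduce and measures the I/O performance of HDFS. Running the JAR without arguments lists the available programs: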
[bc110 mapreduce]$ yarn jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar 
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  
1. Write test: write 10 files of 5 GB each to HDFS. The summary is also written to the local file given by -resFile.
	[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 10 -size 5GB -resFile /tmp/dfsio.txt
	14/02/21 11:37:26 INFO fs.TestDFSIO: TestDFSIO.1.7
	14/02/21 11:37:26 INFO fs.TestDFSIO: nrFiles = 10
	14/02/21 11:37:26 INFO fs.TestDFSIO: nrBytes (MB) = 5120.0
	14/02/21 11:37:26 INFO fs.TestDFSIO: bufferSize = 1000000
	14/02/21 11:37:26 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
	.....
	.....
	14/02/21 11:41:11 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
	14/02/21 11:41:11 INFO fs.TestDFSIO:            Date & time: Fri Feb 21 11:41:11 CST 2014
	14/02/21 11:41:11 INFO fs.TestDFSIO:        Number of files: 10
	14/02/21 11:41:11 INFO fs.TestDFSIO: Total MBytes processed: 51200.0
	14/02/21 11:41:11 INFO fs.TestDFSIO:      Throughput mb/sec: 29.51076681883156
	14/02/21 11:41:11 INFO fs.TestDFSIO: Average IO rate mb/sec: 32.61860275268555
	14/02/21 11:41:11 INFO fs.TestDFSIO:  IO rate std deviation: 12.55715046895572
	14/02/21 11:41:11 INFO fs.TestDFSIO:     Test exec time sec: 223.124
	14/02/21 11:41:11 INFO fs.TestDFSIO: 
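2. Clean up the test files. Run this only after the read test in step 3: the read test reads the files produced by the write test, and -clean deletes them from the benchmark base directory (/benchmarks/TestDFSIO in the logs here).
	[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean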
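3. Read test: read 10 files of 5 GB each from HDFS. These are the files created by the write test in step 1, so they must still exist.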
	[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 10 -size 5GB -resFile /tmp/dfsio.txt
	14/02/21 11:48:57 INFO fs.TestDFSIO: TestDFSIO.1.7
	14/02/21 11:48:57 INFO fs.TestDFSIO: nrFiles = 10
	14/02/21 11:48:57 INFO fs.TestDFSIO: nrBytes (MB) = 5120.0
	14/02/21 11:48:57 INFO fs.TestDFSIO: bufferSize = 1000000
	14/02/21 11:48:57 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
	.....
	.....
	14/02/21 11:51:31 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
	14/02/21 11:51:31 INFO fs.TestDFSIO:            Date & time: Fri Feb 21 11:51:31 CST 2014
	14/02/21 11:51:31 INFO fs.TestDFSIO:        Number of files: 10
	14/02/21 11:51:31 INFO fs.TestDFSIO: Total MBytes processed: 51200.0
	14/02/21 11:51:31 INFO fs.TestDFSIO:      Throughput mb/sec: 86.95209143555866
	14/02/21 11:51:31 INFO fs.TestDFSIO: Average IO rate mb/sec: 216.0893096923828
	14/02/21 11:51:31 INFO fs.TestDFSIO:  IO rate std deviation: 158.02797196752692
	14/02/21 11:51:31 INFO fs.TestDFSIO:     Test exec time sec: 152.721
	14/02/21 11:51:31 INFO fs.TestDFSIO: 
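The summary lines can be pulled out of the console log or the -resFile output with a short script. The sketch below is one way to do it, assuming the lines keep the "label: value" layout shown in the logs above. The function names dfsio_field and wall_clock_mbps are made up for this example. As far as I can tell, "Throughput mb/sec" divides the total data size by the summed I/O time of all map tasks, so it is closer to a per-task figure than to cluster-wide throughput. Dividing "Total MBytes processed" by "Test exec time sec" gives a rough aggregate number instead, though that one also includes job startup overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading TestDFSIO summary lines.

# dfsio_field LABEL [FILE...]: print whatever follows "LABEL: " on each
# matching line. Reads stdin when no file is given. It only looks at the
# text after the label, so the log4j prefix on console lines is ignored.
dfsio_field() {
  local label="$1"
  shift
  awk -v label="$label: " '
    { i = index($0, label) }
    i > 0 { print substr($0, i + length(label)) }
  ' "$@"
}

# wall_clock_mbps TOTAL_MB EXEC_SEC: aggregate MB/s over the whole run.
wall_clock_mbps() {
  awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", mb / sec }'
}

# Write-test numbers from the log above.
wall_clock_mbps 51200.0 223.124   # prints 229.47
# Read-test numbers from the log above.
wall_clock_mbps 51200.0 152.721   # prints 335.25
```

For example, `dfsio_field "Throughput mb/sec" /tmp/dfsio.txt` would print one value per recorded run, assuming that file holds the summaries of the runs above.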
Using TeraSort


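1) Generate the data. teragen writes the data row by row and each row is 100 bytes, so the row count to pass is the target size divided by 100. For 100 GiB that is 100 × 2^30 / 100 = 1073741824 rows: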
	yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teragen -Dmapred.map.tasks=20 1073741824 /tmp/teradata
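The row count for other data sizes follows from the same division. A minimal sketch, assuming sizes are given in GiB (2^30 bytes); the helper name teragen_rows is made up for this example:

```shell
#!/usr/bin/env bash
# teragen_rows SIZE_GIB: number of 100-byte rows needed for SIZE_GIB GiB.
# Hypothetical helper; the 100 bytes per row comes from the text above.
teragen_rows() {
  local size_gib="$1"
  local row_bytes=100
  # Integer division: any remainder smaller than one row is dropped.
  echo $(( size_gib * 1024 * 1024 * 1024 / row_bytes ))
}

teragen_rows 100   # prints 1073741824
```

If 100 GB is meant in the decimal sense (10^11 bytes), the row count is simply 1000000000.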
2) Sort the data
	yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar terasort -Dmapred.reduce.tasks=18 /tmp/teradata /tmp/teraout
3) Validate the sorted output
	yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teravalidate /tmp/teraout /tmp/teravalidate
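The three phases have to run in this order, with each one consuming the previous phase's output directory. The sketch below only prints the commands, so the paths can be checked before anything is submitted. The function name terasort_steps is made up for this example; the JAR path and task counts are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the three TeraSort phases in order.
# JAR is the examples JAR path used in the commands above.
JAR=hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar

# terasort_steps ROWS DATA_DIR OUT_DIR REPORT_DIR
terasort_steps() {
  local rows="$1" data="$2" out="$3" report="$4"
  echo "yarn jar $JAR teragen -Dmapred.map.tasks=20 $rows $data"
  echo "yarn jar $JAR terasort -Dmapred.reduce.tasks=18 $data $out"
  echo "yarn jar $JAR teravalidate $out $report"
}

# Dry run: list the commands for the 100 GiB example.
terasort_steps 1073741824 /tmp/teradata /tmp/teraout /tmp/teravalidate
```

The output can be redirected to a file and run with `bash -e` so that a failed phase stops the chain.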
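Using nnbench
nnbench stresses the NameNode. The option names are largely self-explanatory, so only a sample invocation is shown: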
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar nnbench \
-operation create_write \
-maps 18 \
-reduces 6 \
-blockSize 1 \
-bytesToWrite 0 \
-numberOfFiles 10000 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true \
-baseDir /benchmarks/nnbench
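Using mrbench
mrbench is just as simple to run. The example below does a single run with 100 map tasks and 100 reduce tasks: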
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -baseDir /tmp/mrbench -maps 100 -reduces 100 -numRuns 1 

