Hadoop Benchmarks
Using TestDFSIO
TestDFSIO is a benchmark that ships with Hadoop, located under $HADOOP_HOME/share/hadoop/mapreduce, and is mainly used to measure DFS I/O performance. Running the test jar without arguments lists the available programs:
[bc110 mapreduce]$ yarn jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
1. Write test: write 10 files of 5 GB each to DFS
[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 10 -size 5GB -resFile /tmp/dfsio.txt
14/02/21 11:37:26 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/21 11:37:26 INFO fs.TestDFSIO: nrFiles = 10
14/02/21 11:37:26 INFO fs.TestDFSIO: nrBytes (MB) = 5120.0
14/02/21 11:37:26 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/21 11:37:26 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
.....
.....
14/02/21 11:41:11 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
14/02/21 11:41:11 INFO fs.TestDFSIO: Date & time: Fri Feb 21 11:41:11 CST 2014
14/02/21 11:41:11 INFO fs.TestDFSIO: Number of files: 10
14/02/21 11:41:11 INFO fs.TestDFSIO: Total MBytes processed: 51200.0
14/02/21 11:41:11 INFO fs.TestDFSIO: Throughput mb/sec: 29.51076681883156
14/02/21 11:41:11 INFO fs.TestDFSIO: Average IO rate mb/sec: 32.61860275268555
14/02/21 11:41:11 INFO fs.TestDFSIO: IO rate std deviation: 12.55715046895572
14/02/21 11:41:11 INFO fs.TestDFSIO: Test exec time sec: 223.124
14/02/21 11:41:11 INFO fs.TestDFSIO:
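A note on reading these numbers: TestDFSIO's "Throughput" is the total megabytes divided by the summed I/O time of all tasks, while "Average IO rate" is the mean of each file's individual rate, which is why the two differ when task times are uneven. A small Python sketch with made-up per-file figures:

```python
# Sketch of how TestDFSIO aggregates its results.
# Each tuple is (megabytes written, I/O seconds for that file) -- made-up numbers.
files = [(5120.0, 160.0), (5120.0, 140.0), (5120.0, 180.0)]

total_mb = sum(mb for mb, _ in files)
total_sec = sum(sec for _, sec in files)

# "Throughput mb/sec": total data over total task I/O time.
throughput = total_mb / total_sec

# "Average IO rate mb/sec": mean of each file's individual rate.
rates = [mb / sec for mb, sec in files]
avg_io_rate = sum(rates) / len(rates)

print(round(throughput, 2), round(avg_io_rate, 2))  # 32.0 32.34
```

The slow tasks drag "Throughput" down more than "Average IO rate", and a large gap between the two (or a large std deviation, as in the log above) points at uneven performance across nodes.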
2. Clean up the files written by the test
[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
3. Read test: read the 10 5 GB files back from HDFS
[bc110 ~]$ yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 10 -size 5GB -resFile /tmp/dfsio.txt
14/02/21 11:48:57 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/21 11:48:57 INFO fs.TestDFSIO: nrFiles = 10
14/02/21 11:48:57 INFO fs.TestDFSIO: nrBytes (MB) = 5120.0
14/02/21 11:48:57 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/21 11:48:57 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
.....
.....
14/02/21 11:51:31 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
14/02/21 11:51:31 INFO fs.TestDFSIO: Date & time: Fri Feb 21 11:51:31 CST 2014
14/02/21 11:51:31 INFO fs.TestDFSIO: Number of files: 10
14/02/21 11:51:31 INFO fs.TestDFSIO: Total MBytes processed: 51200.0
14/02/21 11:51:31 INFO fs.TestDFSIO: Throughput mb/sec: 86.95209143555866
14/02/21 11:51:31 INFO fs.TestDFSIO: Average IO rate mb/sec: 216.0893096923828
14/02/21 11:51:31 INFO fs.TestDFSIO: IO rate std deviation: 158.02797196752692
14/02/21 11:51:31 INFO fs.TestDFSIO: Test exec time sec: 152.721
14/02/21 11:51:31 INFO fs.TestDFSIO:
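The read pass came out roughly three times faster than the write pass, and the large standard deviation suggests some tasks read from local replicas while others went over the network. The ratio can be checked directly from the logged figures:

```python
# Throughput figures copied from the write and read logs above.
write_tp = 29.51076681883156   # write "Throughput mb/sec"
read_tp = 86.95209143555866    # read "Throughput mb/sec"

speedup = read_tp / write_tp
print(round(speedup, 2))  # 2.95
```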
Using terasort
1) Generate the data. teragen writes data row by row, 100 bytes per row, so to generate 100 GB of data you just need 100 GB / 100 B rows. The generation command is:
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teragen -Dmapred.map.tasks=20 1073741824 /tmp/teradata
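The row count on the command line can be sanity-checked: at 100 bytes per row, 1073741824 rows comes out to exactly 100 GiB.

```python
# Sanity-check the teragen row count: 100 bytes per row, target 100 GiB.
ROW_BYTES = 100
target_bytes = 100 * 1024 ** 3      # 100 GiB
rows = target_bytes // ROW_BYTES    # the row count passed to teragen

print(rows, rows * ROW_BYTES / 1024 ** 3)  # 1073741824 100.0
```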
2) Sort the data
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar terasort -Dmapred.reduce.tasks=18 /tmp/teradata /tmp/teraout
3) Validate the sorted output
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teravalidate /tmp/teraout /tmp/teravalidate
Using nnbench. The option names are self-explanatory, so they are not covered in detail here:
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar nnbench \
-operation create_write \
-maps 18 \
-reduces 6 \
-blockSize 1 \
-bytesToWrite 0 \
-numberOfFiles 10000 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true \
-baseDir /benchmarks/nnbench
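If -numberOfFiles is interpreted per map task (this is an assumption; check how your NNBench version counts it), the run above would issue on the order of 18 × 10000 create operations against the NameNode, which is the point of the benchmark:

```python
# Rough load estimate for the nnbench invocation above,
# ASSUMING -numberOfFiles counts files per map task.
maps = 18
files_per_map = 10000

total_creates = maps * files_per_map
print(total_creates)  # 180000 create_write operations hitting the NameNode
```

With -bytesToWrite 0, almost all of that load is NameNode metadata traffic rather than DataNode I/O, which is what makes nnbench a NameNode stress test.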
mrbench is also simple to use:
yarn jar hadoop220/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -baseDir /tmp/mrbench -maps 100 -reduces 100 -numRuns 1
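mrbench's headline output is the average completion time of the small jobs it launches over -numRuns repetitions; the aggregation is just a mean, sketched here with made-up run times:

```python
# Made-up per-run job times in milliseconds; mrbench reports their average.
run_times_ms = [23000, 21000, 25000]

avg_time = sum(run_times_ms) / len(run_times_ms)
print(avg_time)  # 23000.0
```

Raising -numRuns smooths out scheduler jitter at the cost of a longer benchmark.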