Hadoop Cluster Benchmarking

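This article introduces several benchmark tools that ship with Hadoop, including TestDFSIO, nnbench, and mrbench. It shows how to use them, with examples, to measure HDFS read/write performance, NameNode load, and the execution efficiency of MapReduce jobs.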

Hadoop ships with several benchmarks. This article uses hadoop-2.6.0.


I. Hadoop Test benchmarks

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar  
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill


These programs test Hadoop from several angles. TestDFSIO, mrbench, and nnbench are the three most widely used.


1. TestDFSIO test

① TestDFSIO write

Measures HDFS write throughput.


TestDFSIO usage:

Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]


Write 10 files of 10 MB each to HDFS. The files are stored under /benchmarks/TestDFSIO/io_data.

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -write -nrFiles 10 -size 10MB


[Figure: console output of the TestDFSIO write job]



View the write results:

[root@master hadoop-2.6.0]# cat TestDFSIO_results.log
----- TestDFSIO ----- : write
           Date & time: Fri Sep 23 19:21:01 CST 2016
       Number of files: 10
Total MBytes processed: 100.0
     Throughput mb/sec: 1.7217037980785785
Average IO rate mb/sec: 1.9971516132354736
 IO rate std deviation: 0.9978736646901237
    Test exec time sec: 81.711
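
The three rate figures in this log are easy to conflate. The sketch below shows one way they can be computed, assuming (from my reading of the TestDFSIO source, not from anything shown in this article) that Throughput is total data divided by the summed per-task I/O time, while Average IO rate is the mean of the per-task rates. The per-task numbers are hypothetical, since the log only contains aggregates.

```python
import math

MB = 1024 * 1024  # assumption: rates are reported in binary megabytes


def summarize(tasks):
    """tasks: list of (bytes_processed, io_time_seconds), one per map task."""
    total_bytes = sum(b for b, _ in tasks)
    total_time = sum(t for _, t in tasks)
    rates = [b / MB / t for b, t in tasks]  # per-task MB/s

    throughput = total_bytes / MB / total_time      # aggregate bytes over aggregate time
    avg_rate = sum(rates) / len(rates)              # mean of per-task rates
    mean_sq = sum(r * r for r in rates) / len(rates)
    std_dev = math.sqrt(abs(mean_sq - avg_rate ** 2))
    return throughput, avg_rate, std_dev


# Hypothetical per-task measurements: two 10 MB files, one fast task, one slow task.
tasks = [(10 * MB, 2.0), (10 * MB, 10.0)]
throughput, avg_rate, std_dev = summarize(tasks)
print(f"throughput={throughput:.3f} avg_rate={avg_rate:.3f} std_dev={std_dev:.3f}")
```

Under these assumptions a single slow task pulls Throughput down more than it pulls Average IO rate down, which is one reason the two values differ in the log above. Both are based on time spent inside each task's I/O loop, so neither can be derived from Test exec time alone.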


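② TestDFSIO read

Measures HDFS read throughput.

Read back 10 files of 10 MB each from HDFS. TestDFSIO appends each run to TestDFSIO_results.log, so the log below still begins with the earlier write result.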

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -read -nrFiles 10 -size 10


[root@master hadoop-2.6.0]# cat TestDFSIO_results.log

----- TestDFSIO ----- : write
           Date & time: Fri Sep 23 19:21:01 CST 2016
       Number of files: 10
Total MBytes processed: 100.0
     Throughput mb/sec: 1.7217037980785785
Average IO rate mb/sec: 1.9971516132354736
 IO rate std deviation: 0.9978736646901237
    Test exec time sec: 81.711


----- TestDFSIO ----- : read
           Date & time: Fri Sep 23 19:37:21 CST 2016
       Number of files: 10
Total MBytes processed: 100.0
     Throughput mb/sec: 14.85001485001485
Average IO rate mb/sec: 16.221948623657227
 IO rate std deviation: 4.983088493832205
    Test exec time sec: 50.188
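
Because the results file accumulates one block per run, comparing runs by eye gets tedious. The following is a minimal sketch of a parser for that file, assuming the layout is exactly as shown above; `parse_results` is a hypothetical helper name, and the sample text is copied from this article's log.

```python
import re

SAMPLE_LOG = """\
----- TestDFSIO ----- : write
           Date & time: Fri Sep 23 19:21:01 CST 2016
       Number of files: 10
Total MBytes processed: 100.0
     Throughput mb/sec: 1.7217037980785785
Average IO rate mb/sec: 1.9971516132354736
 IO rate std deviation: 0.9978736646901237
    Test exec time sec: 81.711

----- TestDFSIO ----- : read
           Date & time: Fri Sep 23 19:37:21 CST 2016
       Number of files: 10
Total MBytes processed: 100.0
     Throughput mb/sec: 14.85001485001485
Average IO rate mb/sec: 16.221948623657227
 IO rate std deviation: 4.983088493832205
    Test exec time sec: 50.188
"""


def parse_results(text):
    """Return one dict per run, keyed by the labels used in the log."""
    runs = []
    for line in text.splitlines():
        header = re.match(r"-+ TestDFSIO -+ : (\w+)", line.strip())
        if header:
            runs.append({"operation": header.group(1)})
        elif ":" in line and runs:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            runs[-1][key.strip()] = value.strip()
    return runs


runs = parse_results(SAMPLE_LOG)
# If an operation was run several times, the last run wins here.
by_op = {r["operation"]: float(r["Throughput mb/sec"]) for r in runs}
ratio = by_op["read"] / by_op["write"]
print(f"read/write throughput ratio: {ratio:.1f}x")
```

To use it on a real cluster, replace `SAMPLE_LOG` with the contents of TestDFSIO_results.log.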


③ Clean up the test data

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -clean


[Figure: console output of TestDFSIO -clean]



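2. nnbench test [NameNode benchmark (nnbench)]

nnbench tests the load on the NameNode. It generates a large number of HDFS-related requests to put the NameNode under heavy pressure.

The benchmark can exercise file create, read, rename, and delete operations on HDFS.


nnbench usage: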

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench 
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
        -operation <Available operations are create_write open_read rename delete. This option is mandatory>
         * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
        -maps <number of maps. default is 1. This is not mandatory>
        -reduces <number of reduces. default is 1. This is not mandatory>
        -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory 
        -blockSize <Block size in bytes. default is 1. This is not mandatory>
        -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
        -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
        -numberOfFiles <number of files to create. default is 1. This is not mandatory>
        -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
        -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
        -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
        -help: Display the help statement


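The following example uses 10 mappers and 5 reducers to create 1000 files with a replication factor of 3. According to the usage text above, -readFileAfterOpen is only valid with the open_read operation, so it has no effect in this create_write run.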

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation create_write -maps 10 -reduces 5 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true 
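
The usage text imposes two constraints that are easy to miss: create_write must run before the other operations, and -readFileAfterOpen only applies to open_read. A small sketch of a command builder that enforces both is shown below. `nnbench_cmd` and `nnbench_sequence` are hypothetical helper names; only the flags listed in the usage output above are used, and the commands are printed rather than executed.

```python
import shlex

JAR = "share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar"
OPERATIONS = ("create_write", "open_read", "rename", "delete")


def nnbench_cmd(operation, maps=10, reduces=5, files=1000,
                replication=3, read_after_open=False):
    """Build an nnbench argument list using only flags from the usage text."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if read_after_open and operation != "open_read":
        raise ValueError("-readFileAfterOpen is only valid with open_read")
    cmd = ["hadoop", "jar", JAR, "nnbench",
           "-operation", operation,
           "-maps", str(maps),
           "-reduces", str(reduces),
           "-numberOfFiles", str(files),
           "-replicationFactorPerFile", str(replication)]
    if read_after_open:
        cmd += ["-readFileAfterOpen", "true"]
    return cmd


def nnbench_sequence():
    """create_write first, then open_read against the files it created."""
    return [nnbench_cmd("create_write"),
            nnbench_cmd("open_read", read_after_open=True)]


for cmd in nnbench_sequence():
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed lines can be pasted into a shell in the Hadoop installation directory, or passed to `subprocess.run` if automation is wanted.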



3. mrbench test [MapReduce benchmark (mrbench)]

mrbench runs a small job many times over. It checks whether small jobs on the cluster run repeatably and efficiently.

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench --help
MRBenchmark.0.0.2
Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]


The following example runs a small job 50 times:

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench -numRuns 50


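Since no other options are given, each of the 50 runs uses the defaults listed in the usage text: 2 maps, 1 reduce, and 1 input line.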



II. Hadoop Examples programs

[root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar 
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.


The most commonly used of these is wordcount.
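
For readers who have not run it, the computation wordcount performs can be sketched in a few lines of plain Python. This is an in-memory illustration of the map, shuffle, and reduce steps, not the Java source of the Hadoop example. The function names and the whitespace tokenization are assumptions chosen for simplicity.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) for every whitespace-separated token in a line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


lines = ["hello hadoop", "hello hdfs"]
pairs = [kv for line in lines for kv in map_phase(line)]
result = dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
print(result)
```

On a cluster the same three steps are spread across map and reduce tasks, with input and output stored in HDFS.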

