hdfs benchmark

shahaizimxm

1. TestDFSIO

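1) Introduction

This benchmark measures HDFS read and write throughput and helps expose performance bottlenecks in the underlying network. Its default output directory on HDFS is /benchmarks/TestDFSIO (changeable by setting -D test.build.data). The result summary is written locally to the current directory; the default file name is TestDFSIO_results.log (changeable with the -resFile option). One map task is launched per file.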
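The local results log mentioned above (TestDFSIO_results.log by default) can be read by eye, but pulling a single metric out of it is handy when comparing runs. The following is a minimal sketch, assuming each metric appears in the log as a "label: value" line such as "Throughput mb/sec: ..."; the function name dfsio_field is hypothetical and not part of Hadoop.

```shell
#!/bin/sh
# Print the value of one "label: value" field from a TestDFSIO results log.
# Assumption: metrics are written as left-padded "label: value" lines.
# dfsio_field is a hypothetical helper name, not a Hadoop command.
dfsio_field() {
    label="$1"                              # e.g. "Throughput mb/sec"
    logfile="${2:-TestDFSIO_results.log}"   # default local results file
    awk -F': *' -v want="$label" '
        { key = $1; sub(/^[ \t]+/, "", key) }   # strip left padding from the label
        key == want { print $2 }                # print the value after the colon
    ' "$logfile"
}
```

For example, `dfsio_field "Throughput mb/sec"` reads the default log in the current directory. Because results from successive runs are appended to the same file, the function may print one value per recorded run.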

2) Tests

# Write test: 10 output files of 1 GB each

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

# Read test: 10 files of 1 GB each

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

# Remove the test files from HDFS

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -clean

Figure 1.1

# Write test: 10 output files of 100 MB each

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 10 -fileSize 100

# Read test: 10 files of 100 MB each

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -read -nrFiles 10 -fileSize 100

Figure 1.2

# Write test: 100 output files of 100 MB each

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 100

# Read test: 100 files of 100 MB each

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -read -nrFiles 100 -fileSize 100

Figure 1.3

3) Formulas
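As a hedged sketch of how the two headline metrics in the results log are usually understood: assume N map tasks, where task i processes filesize_i MB in time_i seconds (these symbols are introduced here for illustration). Throughput divides the total data by the total task time, while the average IO rate is the mean of the per-task rates.

```latex
\[
\text{Throughput}(N) = \frac{\sum_{i=1}^{N} \text{filesize}_i}{\sum_{i=1}^{N} \text{time}_i}
\qquad
\text{Average IO rate}(N) = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{filesize}_i}{\text{time}_i}
\]
```

The two values coincide when every task runs at the same rate; a large gap between them suggests that some tasks were much slower than others.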

2. TeraSort benchmark suite

1) Introduction

Typical areas where TeraSort is helpful are determining whether your map and reduce slot assignments are sound (as they depend on variables such as the number of cores per TaskTracker node and the available RAM), whether other MapReduce-related parameters such as io.sort.mb and mapred.child.java.opts are set to proper values, and whether the FairScheduler configuration you came up with really behaves as expected.

2) Tests

# Generate the input data with TeraGen (arguments: number of 100-byte rows, output directory)

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar teragen 10000000 /user/leixf/terasort-input

Figure 1.4
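TeraGen takes a row count rather than a byte size, and each row is 100 bytes, so the first argument for a target dataset size can be worked out with shell arithmetic. A minimal sketch, in which the function name teragen_rows is hypothetical:

```shell
#!/bin/sh
# Convert a target dataset size in bytes into the row count TeraGen expects.
# Each TeraGen row is 100 bytes. teragen_rows is a hypothetical helper name.
teragen_rows() {
    echo $(( $1 / 100 ))
}

teragen_rows 1000000000   # 1 GB (decimal) -> prints 10000000
```

This matches the command above: 10000000 rows of 100 bytes amount to about 1 GB of input.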

# Run TeraSort on the input data

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar terasort /user/leixf/terasort-input /user/leixf/terasort-output

Figure 1.5

# Validate the sorted output to make sure the data is globally sorted

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar teravalidate /user/leixf/terasort-output /user/leixf/teravalidate-output

Figure 1.6
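The three steps above (TeraGen, TeraSort, TeraValidate) are usually run back to back, so they can be chained in a small script that stops at the first failure. This is only a sketch: the function name run_terasort_suite and the HADOOP and EXAMPLES_JAR variables are assumptions made for illustration, and the jar path is the one used in this article.

```shell
#!/bin/sh
# Run TeraGen, TeraSort and TeraValidate in order, stopping at the first failure.
# HADOOP and EXAMPLES_JAR are illustrative variables; override them as needed.
HADOOP="${HADOOP:-hadoop}"
EXAMPLES_JAR="${EXAMPLES_JAR:-/usr/lib/hadoop-0.20/hadoop-examples.jar}"

# run_terasort_suite ROWS BASE_DIR (hypothetical helper name)
run_terasort_suite() {
    rows="$1"    # number of 100-byte rows to generate
    base="$2"    # HDFS directory that holds the input and output directories
    "$HADOOP" jar "$EXAMPLES_JAR" teragen "$rows" "$base/terasort-input" &&
    "$HADOOP" jar "$EXAMPLES_JAR" terasort "$base/terasort-input" "$base/terasort-output" &&
    "$HADOOP" jar "$EXAMPLES_JAR" teravalidate "$base/terasort-output" "$base/teravalidate-output"
}
```

Calling `run_terasort_suite 10000000 /user/leixf` reproduces the commands shown above. The output directories must not already exist, since each job refuses to overwrite an existing directory.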

3. Viewing job run history

$ hadoop job -history all /user/leixf/teravalidate-output

4. NameNode benchmark (nnbench)

1) Introduction

Load-tests the NameNode hardware and configuration.

2) Tests

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`

Figure 1.7

5. MapReduce benchmark (mrbench)

1) Introduction

Checks whether small jobs on the cluster are responsive and run efficiently.

2) Tests

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 30

Figure 1.8

$ hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 50

Figure 1.9