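After deploying a new cluster, upgrading an existing one, or tuning its performance parameters, we want to see how cluster performance has changed. For that we need cluster benchmarking tools.
Hadoop ships with a test jar that contains many test tools; among them, TestDFSIO, mrbench and nnbench are widely used. Running the jar without arguments lists the available programs: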
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
(and so on...)
Usage guide: http://www.voidcn.com/article/p-gvogohdn-bqy.html
For example:
Write test: $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar TestDFSIO -write -nrFiles 10 -size 1000MB
Read test (run after the write test, with the same file count and size): $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar TestDFSIO -read -nrFiles 10 -size 1000MB
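The write and read runs above can be chained into one small script, sketched below. The variable names (JAR, NR_FILES, FILE_SIZE, RUN) and the dfsio helper are hypothetical and not part of Hadoop; only the TestDFSIO flags -write, -read, -clean, -nrFiles and -size come from the tool itself. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: TestDFSIO write -> read -> clean with one shared file count and size.
# JAR, NR_FILES, FILE_SIZE and RUN are illustrative names chosen for this example.

JAR="${JAR:-share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar}"
NR_FILES="${NR_FILES:-10}"
FILE_SIZE="${FILE_SIZE:-1000MB}"
RUN="${RUN-echo}"   # default: dry run (print only); export RUN= (empty) to execute

dfsio() {
  # $1 is the TestDFSIO mode flag: -write, -read or -clean
  if [ "$1" = "-clean" ]; then
    # -clean removes the benchmark files from HDFS and takes no size arguments
    $RUN bin/hadoop jar "$JAR" TestDFSIO -clean
  else
    $RUN bin/hadoop jar "$JAR" TestDFSIO "$1" -nrFiles "$NR_FILES" -size "$FILE_SIZE"
  fi
}

# The read test needs the files produced by the write test, so order matters.
dfsio -write && dfsio -read && dfsio -clean
```

Running it from the Hadoop installation directory with RUN set to an empty value executes the three steps for real. Keeping the file count and size in one place avoids a read test that covers less data than was written.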
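Another benchmark for measuring cluster performance is TeraSort, which also ships with Hadoop. It uses the time needed to sort 1 TB of data as the benchmark.
TeraSort consists of three parts: TeraGen generates the data, TeraSort performs the sort, and TeraValidate verifies the result.
First generate 1 TB of data by running TeraGen (the argument is the number of rows; each row is 100 bytes):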
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen 10000000000 /user/data/terasortinput
Note: the path /user/data/terasortinput is a path on HDFS.
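Because TeraGen takes a row count rather than a byte size, it can help to compute the argument from the target data volume. The sketch below assumes the fixed 100-byte TeraGen row and decimal units (1 TB = 10^12 bytes); the function name teragen_rows is made up for this example.

```shell
#!/usr/bin/env bash
# TeraGen writes fixed-size 100-byte rows, so rows = target bytes / 100.
ROW_BYTES=100

teragen_rows() {
  # $1: target data size in bytes (decimal units, 1 TB = 10^12 bytes)
  echo $(( $1 / ROW_BYTES ))
}

teragen_rows 1000000000000   # 1 TB  -> 10000000000
teragen_rows 10000000000     # 10 GB -> 100000000
```

A smaller value such as the 10 GB case is handy for a quick trial run before committing the cluster to a full 1 TB sort.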
Run TeraSort:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort /user/data/terasortinput /user/data/terasortoutput
Run the validation:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teravalidate /user/data/terasortoutput /user/data/terasortvalidate
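Other benchmarking tools:
Intel's HiBench
hadoop GridMix: a benchmark tool that ships with Hadoop
TPC-DS: widely used for evaluating SQL-on-Hadoop products
Berkeley Big Data Benchmark: a big-data benchmark suite developed by AMPLab after Spark was released. Website: https://amplab.cs.berkeley.edu/benchmark/
Testing disk I/O speed: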
$ hdparm -t /dev/sda1
The measured read speed should be at least 70 MB/s.
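The 70 MB/s check can be automated with a small filter, sketched below. It assumes the hdparm -t result line ends in "= <number> MB/sec"; the helper names parse_mbps and meets_threshold are hypothetical, and the sample line uses made-up numbers purely to show the assumed format.

```shell
#!/usr/bin/env bash
MIN_MBPS=70   # threshold taken from the text above

# Extract the MB/sec figure from a line ending in "= <number> MB/sec".
parse_mbps() {
  awk '/MB\/sec/ { print $(NF-1) }'
}

# Succeeds (exit 0) when the measured value is >= MIN_MBPS.
meets_threshold() {
  # $1: measured MB/sec (may be fractional)
  awk -v v="$1" -v min="$MIN_MBPS" 'BEGIN { exit !(v >= min) }'
}

# Example line in the assumed format (numbers are invented for illustration):
sample=" Timing buffered disk reads: 254 MB in  3.00 seconds =  84.60 MB/sec"
mbps="$(printf '%s\n' "$sample" | parse_mbps)"
if meets_threshold "$mbps"; then
  echo "OK: ${mbps} MB/sec"
else
  echo "SLOW: ${mbps} MB/sec"
fi
```

On a real machine the sample line would be replaced by the output of hdparm -t /dev/sda1 piped into parse_mbps; hdparm usually needs root privileges.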