2020/11/27 sunhaiqi@bonc.com.cn
Hadoop Benchmarking
1. Debugging the Cluster
Before starting any benchmarks, the HDFS and YARN services should be running.
When starting the YARN service, the ResourceManager failed to come up. The log revealed the error:
2020-11-23 15:56:44,775 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2020-11-23 15:56:45,408 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/home/bduser101/modules/hadoop/etc/hadoop/core-site.xml
2020-11-23 15:56:45,931 FATAL org.apache.hadoop.conf.Configuration: error parsing conf java.io.BufferedInputStream@74294adb
org.xml.sax.SAXParseException; lineNumber: 19; columnNumber: 38; An 'include' failed, and no 'fallback' element was found.
This error comes from the include file added to core-site.xml when HDFS federation was configured. After copying the properties from mountTable.xml directly into core-site.xml and syncing that file to all nodes, the YARN service started normally.
[Source of the error]
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="mountTable.xml"/>
After the correction:
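The corrected file itself is not reproduced above; presumably it drops the XInclude and carries the mountTable.xml properties inline. A hedged sketch (the mount point shown is hypothetical; the mount-table name and NameNode address are the ones used elsewhere in this post):

```xml
<configuration>
  <!-- Former contents of mountTable.xml, copied inline so the XInclude
       (and its failure mode) goes away. Mount points are examples only. -->
  <property>
    <name>fs.viewfs.mounttable.my-cluser.link./user</name>
    <value>hdfs://node101:8020/user</value>
  </property>
  <!-- ...remaining mount-table entries and the original core-site properties... -->
</configuration>
```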
2. Benchmark Components
After deploying a new cluster, upgrading one, or tuning its performance parameters, we want to observe how cluster performance changes, and for that we need benchmarking tools.
Hadoop ships with a test jar that contains many such tools; DFSCIOTest, mrbench, and nnbench are the most widely used. Running the jar without arguments lists the available programs:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
- DFSCIOTest: Distributed i/o benchmark of libhdfs. (libhdfs is a shared library that provides HDFS file services to C/C++ applications.)
- DistributedFSCheck: Distributed checkup of the file system consistency.
- JHLogAnalyzer: Job History Log analyzer.
- MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures.
- NNdataGenerator: Generate the data to be used by NNloadGenerator.
- NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR.
- NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job.
- NNstructureGenerator: Generate the structure to be used by NNdataGenerator.
- SliveTest: HDFS Stress Test and Live Data Verification.
- TestDFSIO: Distributed i/o benchmark.
- fail: a job that always fails.
- filebench: Benchmark SequenceFile(Input|Output)Format (block, record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed).
- largesorter: Large-Sort tester.
- loadgen: Generic map/reduce load generator.
- mapredtest: A map/reduce test check.
- minicluster: Single process HDFS and MR cluster.
- mrbench: A map/reduce benchmark that can create many small jobs.
- nnbench: A benchmark that stresses the namenode.
- sleep: A job that sleeps at each map and reduce task.
- testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce.
- testfilesystem: A test for FileSystem read/write.
- testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
- testsequencefile: A test for flat files of binary key value pairs.
- testsequencefileinputformat: A test for sequence file input format.
- testtextinputformat: A test for text input format.
- threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill.
2.1 TestDFSIO
TestDFSIO tests the I/O performance of HDFS. It uses a MapReduce job to run reads and writes in parallel: each map task reads or writes one file, the map output collects statistics about the files processed, and the reduce task accumulates those statistics and produces the summary.
Running it without arguments prints the usage:
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO
Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
When the test finishes, a file named TestDFSIO_results.log is written to the local working directory; it records the results of each run.
2.1.1 Writing 10 files of 100MB each to HDFS
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
20/11/23 17:03:48 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/23 17:03:48 INFO fs.TestDFSIO: nrFiles = 10
20/11/23 17:03:48 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/23 17:03:48 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/23 17:03:48 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/23 17:03:50 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
If you hit the warning WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method), don't panic; judging from most reports online, this is a Hadoop bug.
20/11/23 17:04:39 INFO mapreduce.Job: map 0% reduce 0%
20/11/23 17:05:03 INFO mapreduce.Job: map 13% reduce 0%
20/11/23 17:05:20 INFO mapreduce.Job: map 17% reduce 0%
20/11/23 17:05:21 INFO mapreduce.Job: map 20% reduce 0%
20/11/23 17:05:35 INFO mapreduce.Job: map 20% reduce 7%
20/11/23 17:05:44 INFO mapreduce.Job: map 27% reduce 7%
20/11/23 17:05:50 INFO mapreduce.Job: map 30% reduce 10%
20/11/23 17:05:58 INFO mapreduce.Job: map 77% reduce 10%
20/11/23 17:06:52 INFO mapreduce.Job: map 80% reduce 10%
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull
If you hit the error org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull, this is a bug related to the balancer; see https://issues.apache.org/jira/browse/hdfs-8093 for details.
Things to check when troubleshooting:
- Does the system / HDFS have free space? Here, core-site.xml was modified, changing fs.default.name from viewfs://my-cluser to hdfs://node101:8020, and then the report was checked:
  hadoop/bin$>./hdfs dfsadmin -report
  The report showed that the cluster still had plenty of free space.
- Is the number of live datanodes normal?
- Is the cluster in safe mode?
- Is the firewall disabled?
- Check the configuration.
- Clear the NameNode's tmp directory and reformat the NameNode.
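The core-site.xml change mentioned in the first check can be sketched as follows (fs.default.name is the legacy name of fs.defaultFS; the NameNode address is the one used by this cluster):

```xml
<!-- core-site.xml: point the default filesystem at the NameNode directly
     instead of at the viewfs federation mount table -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://node101:8020</value>
</property>
```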
20/11/23 17:06:53 INFO mapreduce.Job: map 77% reduce 10%
20/11/23 17:07:14 INFO mapreduce.Job: map 80% reduce 10%
20/11/23 17:07:27 INFO mapreduce.Job: map 90% reduce 13%
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files.
If you hit the LeaseExpiredException above ("No lease on ... File does not exist ... does not have any open files"):
- The root cause is that a file is deleted while a data-stream operation on it is still in progress, usually because several MapReduce tasks operate on the same file and one task deletes it after finishing.
- The error is tied to a Hadoop feature: Hadoop does not try to diagnose and repair slow-running tasks; instead it detects them (speculation) and runs backup tasks for them. When a task runs slowly, Hadoop launches a second task doing the same work (in this case, writing data into HDFS). Whichever of the two identical tasks finishes first deletes some temporary files; when the other task finishes, it tries to delete the same files, which produces this error.
- The error itself does not affect the results of the benchmark and can be ignored. It can also be avoided by turning off speculative execution in Hadoop (and in Spark, where applicable).
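A minimal sketch of the Hadoop side of that fix (property names are those of Hadoop 2.x's mapred-site.xml; adjust for your version):

```xml
<!-- mapred-site.xml: disable speculative execution so no backup task
     deletes temporary files out from under a still-running attempt -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```

The same properties can be passed per-job on the command line via -D, which avoids changing cluster-wide configuration just for a benchmark run.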
When the job finishes, the following information is printed, including the MapReduce counters, throughput, and I/O rates for the run:
20/11/23 17:07:28 INFO mapreduce.Job: map 87% reduce 13%
20/11/23 17:07:31 INFO mapreduce.Job: map 90% reduce 13%
20/11/23 17:07:32 INFO mapreduce.Job: map 93% reduce 13%
20/11/23 17:07:33 INFO mapreduce.Job: map 97% reduce 13%
20/11/23 17:07:34 INFO mapreduce.Job: map 100% reduce 13%
20/11/23 17:07:36 INFO mapreduce.Job: map 100% reduce 100%
20/11/23 17:07:37 INFO mapreduce.Job: Job job_1606119502234_0004 completed successfully
20/11/23 17:07:38 INFO mapreduce.Job: Counters: 57
File System Counters
FILE: Number of bytes read=857
FILE: Number of bytes written=1377714
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2330
HDFS: Number of bytes written=1048576078
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Failed map tasks=2
Killed map tasks=6
Launched map tasks=19
Launched reduce tasks=1
Other local map tasks=1
Data-local map tasks=18
Total time spent by all maps in occupied slots (ms)=1723294
Total time spent by all reduces in occupied slots (ms)=133402
Total time spent by all map tasks (ms)=1723294
Total time spent by all reduce tasks (ms)=133402
Total vcore-milliseconds taken by all map tasks=1723294
Total vcore-milliseconds taken by all reduce tasks=133402
Total megabyte-milliseconds taken by all map tasks=1764653056
Total megabyte-milliseconds taken by all reduce tasks=136603648
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=751
Map output materialized bytes=911
Input split bytes=1210
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=911
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=71333
CPU time spent (ms)=66340
Physical memory (bytes) snapshot=1764884480
Virtual memory (bytes) snapshot=22712225792
Total committed heap usage (bytes)=2045894656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log:
20/11/23 17:07:38 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/11/23 17:07:38 INFO fs.TestDFSIO: Date & time: Mon Nov 23 17:07:38 CST 2020
20/11/23 17:07:38 INFO fs.TestDFSIO: Number of files: 10
20/11/23 17:07:38 INFO fs.TestDFSIO: Total MBytes processed: 1000
20/11/23 17:07:38 INFO fs.TestDFSIO: Throughput mb/sec: 2.09
20/11/23 17:07:38 INFO fs.TestDFSIO: Average IO rate mb/sec: 3.48
20/11/23 17:07:38 INFO fs.TestDFSIO: IO rate std deviation: 2.52
20/11/23 17:07:38 INFO fs.TestDFSIO: Test exec time sec: 226.16
20/11/23 17:07:38 INFO fs.TestDFSIO:
Statistical analysis after running the same test 10 times on the company's test cluster:
2.1.2 Reading 10 files of 100MB each from HDFS
The write test above should be run first, so that the data to be read exists.
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 100
20/11/24 15:16:24 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/24 15:16:24 INFO fs.TestDFSIO: nrFiles = 10
20/11/24 15:16:24 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/24 15:16:24 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/24 15:16:24 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/24 15:16:26 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
The same exception seen earlier during the write test appears again: WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method)