Hadoop Performance Testing - Benchmarking

2020/11/27 sunhaiqi@bonc.com.cn

Hadoop Benchmarking

1. Debugging the Cluster

Before starting the benchmarks, the HDFS and YARN services must be up and running.
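
A minimal sketch of bringing both services up with the stock scripts; the Hadoop home directory below is the one used later in this document, so adjust the path to your installation:

$>cd /home/bduser101/modules/hadoop
$>sbin/start-dfs.sh     # starts the NameNode(s) and DataNodes (and JournalNodes/ZKFC if configured)
$>sbin/start-yarn.sh    # starts the ResourceManager and the NodeManagers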

When starting YARN, the ResourceManager failed to come up. The logs showed the following error:

2020-11-23 15:56:44,775 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2020-11-23 15:56:45,408 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/home/bduser101/modules/hadoop/etc/hadoop/core-site.xml
2020-11-23 15:56:45,931 FATAL org.apache.hadoop.conf.Configuration: error parsing conf java.io.BufferedInputStream@74294adb
org.xml.sax.SAXParseException; lineNumber: 19; columnNumber: 38; An 'include' failed, and no 'fallback' element was found.

This error traces back to the XInclude reference that was added to core-site.xml when configuring HDFS federation. After copying the contents of mountTable.xml directly into core-site.xml and syncing the file to every node, the YARN service started normally.

[Source of the error]
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="mountTable.xml"/>

After the correction:
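
A minimal sketch of the corrected file, assuming the mount-table name my-cluser referenced later in this document; the link entry and NameNode address shown are illustrative, not the cluster's actual values:

<configuration>
  <!-- the xi:include of mountTable.xml is removed; its properties are copied in directly -->
  <property>
    <name>fs.default.name</name>
    <value>viewfs://my-cluser</value>
  </property>
  <!-- illustrative ViewFs mount-table entry copied from mountTable.xml -->
  <property>
    <name>fs.viewfs.mounttable.my-cluser.link./data</name>
    <value>hdfs://node101:8020/data</value>
  </property>
</configuration>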

2. Test Components

After deploying a new cluster, upgrading it, or tuning its performance parameters, we need benchmarking tools to observe how the cluster's performance changes.

Hadoop ships with a test JAR that bundles many benchmarking tools; among them, DFSCIOTest, mrbench, and nnbench are widely used (example invocations are sketched after the list below). Running the JAR without naming a program prints the available tools:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.6-tests.jar
  1. DFSCIOTest: Distributed i/o benchmark of libhdfs.

    (A benchmark of distributed I/O through libhdfs, a shared library that provides HDFS file access for C/C++ applications.)

  2. DistributedFSCheck: Distributed checkup of the file system consistency.

  3. JHLogAnalyzer: Job History Log analyzer.

  4. MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures

  5. NNdataGenerator: Generate the data to be used by NNloadGenerator

  6. NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR

  7. NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job

  8. NNstructureGenerator: Generate the structure to be used by NNdataGenerator

  9. SliveTest: HDFS Stress Test and Live Data Verification.

  10. TestDFSIO: Distributed i/o benchmark.

  11. fail: a job that always fails

  12. filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)

  13. largesorter: Large-Sort tester

  14. loadgen: Generic map/reduce load generator

  15. mapredtest: A map/reduce test check.

  16. minicluster: Single process HDFS and MR cluster.

  17. mrbench: A map/reduce benchmark that can create many small jobs

  18. nnbench: A benchmark that stresses the namenode.

  19. sleep: A job that sleeps at each map and reduce task.

  20. testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce

  21. testfilesystem: A test for FileSystem read/write.

  22. testmapredsort: A map/reduce program that validates the map-reduce framework’s sort.

  23. testsequencefile: A test for flat files of binary key value pairs.

  24. testsequencefileinputformat: A test for sequence file input format.

  25. testtextinputformat: A test for text input format.

  26. threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

    (Compares the performance of map tasks that produce multiple spills with map tasks that produce a single spill.)
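
As a side note on the two other tools singled out above as widely used, mrbench and nnbench are run from the same jar. A sketch using option names as found in the Hadoop 2.x versions of these tools; the run sizes are arbitrary examples:

# mrbench: run 50 small MapReduce jobs and report the average completion time
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar mrbench -numRuns 50

# nnbench: stress the NameNode by creating 1000 zero-byte files with 12 map tasks
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar nnbench \
    -operation create_write -maps 12 -reduces 6 \
    -numberOfFiles 1000 -bytesToWrite 0 -baseDir /benchmarks/NNBench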

2.1 TestDFSIO

TestDFSIO measures HDFS I/O performance. It uses a MapReduce job to run reads and writes concurrently: each map task reads or writes one file, the map output carries statistics about the file it processed, and the reduce task aggregates those statistics and produces a summary.

TestDFSIO is invoked as follows:

$> hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO
Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

After the test program finishes, it writes a TestDFSIO_results.log file in the local working directory, where the results of the run can be reviewed.
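
The benchmark keeps its data under /benchmarks/TestDFSIO on HDFS (the baseDir shown in the logs below); per the usage line above, it can be removed between runs with the -clean flag:

$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -clean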

2.1.1 Writing 10 files of 100 MB each to HDFS
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
20/11/23 17:03:48 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/23 17:03:48 INFO fs.TestDFSIO: nrFiles = 10
20/11/23 17:03:48 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/23 17:03:48 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/23 17:03:48 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/23 17:03:50 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files

If you hit WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method), don't panic: judging by most reports online, this is a known Hadoop bug.

20/11/23 17:04:39 INFO mapreduce.Job:  map 0% reduce 0%
20/11/23 17:05:03 INFO mapreduce.Job:  map 13% reduce 0%
20/11/23 17:05:20 INFO mapreduce.Job:  map 17% reduce 0%
20/11/23 17:05:21 INFO mapreduce.Job:  map 20% reduce 0%
20/11/23 17:05:35 INFO mapreduce.Job:  map 20% reduce 7%
20/11/23 17:05:44 INFO mapreduce.Job:  map 27% reduce 7%
20/11/23 17:05:50 INFO mapreduce.Job:  map 30% reduce 10%
20/11/23 17:05:58 INFO mapreduce.Job:  map 77% reduce 10%
20/11/23 17:06:52 INFO mapreduce.Job:  map 80% reduce 10%
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull

The error org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull is a bug related to the balancer; see the official issue at https://issues.apache.org/jira/browse/hdfs-8093 for details.

Things to rule out (the commands after this list sketch how to check them):

  • Whether the system or HDFS has free space

    To check, modify core-site.xml and change fs.default.name from viewfs://my-cluser to hdfs://node101:8020, then run:

    hadoop/bin$>./hdfs dfsadmin -report
    

    The report showed that the cluster still had plenty of free space.

  • Whether the expected number of DataNodes is alive

  • Whether the NameNode is in safe mode

  • Whether the firewall is turned off

  • Configuration problems

  • Clearing the NameNode's tmp directory and reformatting the NameNode (this wipes HDFS metadata, so only as a last resort)
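
A minimal sketch of commands for the checks above; the firewall command assumes a systemd-based OS running firewalld:

$>hdfs dfsadmin -report          # free HDFS capacity and number of live DataNodes
$>df -h                          # local disk space on each node
$>hdfs dfsadmin -safemode get    # whether the NameNode is in safe mode
$>systemctl status firewalld     # firewall status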

20/11/23 17:06:53 INFO mapreduce.Job:  map 77% reduce 10%
20/11/23 17:07:14 INFO mapreduce.Job:  map 80% reduce 10%
20/11/23 17:07:27 INFO mapreduce.Job:  map 90% reduce 13%
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files.

If you encounter Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files:

  • In essence, the file was deleted while a data-stream operation on it was still in progress, usually because several MapReduce task attempts operate on the same file and one attempt deletes it after finishing.

  • The error is tied to a Hadoop feature: Hadoop does not try to diagnose and repair slow-running tasks; instead it detects (speculates on) them and launches backup attempts that do the same work (in this case, writing data into HDFS). When one of the two identical attempts finishes it deletes some temporary files, and when the other attempt finishes later and tries to delete the same files, this error is raised.

  • The error itself does not affect the benchmark results and can be ignored. It can also be avoided by disabling speculative execution in Hadoop (and in Spark, if Spark jobs are involved); a sketch follows below.
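
A minimal sketch of turning speculation off, assuming the standard MRv2 property names; for a single run they can be passed as generic options, which the TestDFSIO usage line above accepts:

# per job: generic -D options go before the test-specific flags
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO \
    -D mapreduce.map.speculative=false \
    -D mapreduce.reduce.speculative=false \
    -write -nrFiles 10 -fileSize 100

# cluster-wide: set the same two properties to false in mapred-site.xml;
# for Spark jobs the corresponding setting is spark.speculation=false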

When the program finishes, it prints the following information, including the counters for the MapReduce job and the throughput and I/O rate figures for the run:

20/11/23 17:07:28 INFO mapreduce.Job:  map 87% reduce 13%
20/11/23 17:07:31 INFO mapreduce.Job:  map 90% reduce 13%
20/11/23 17:07:32 INFO mapreduce.Job:  map 93% reduce 13%
20/11/23 17:07:33 INFO mapreduce.Job:  map 97% reduce 13%
20/11/23 17:07:34 INFO mapreduce.Job:  map 100% reduce 13%
20/11/23 17:07:36 INFO mapreduce.Job:  map 100% reduce 100%
20/11/23 17:07:37 INFO mapreduce.Job: Job job_1606119502234_0004 completed successfully
20/11/23 17:07:38 INFO mapreduce.Job: Counters: 57
        File System Counters
                FILE: Number of bytes read=857
                FILE: Number of bytes written=1377714
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2330
                HDFS: Number of bytes written=1048576078
                HDFS: Number of read operations=43
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=12
                VIEWFS: Number of bytes read=0
                VIEWFS: Number of bytes written=0
                VIEWFS: Number of read operations=0
                VIEWFS: Number of large read operations=0
                VIEWFS: Number of write operations=0
        Job Counters
                Failed map tasks=2
                Killed map tasks=6
                Launched map tasks=19
                Launched reduce tasks=1
                Other local map tasks=1
                Data-local map tasks=18
                Total time spent by all maps in occupied slots (ms)=1723294
                Total time spent by all reduces in occupied slots (ms)=133402
                Total time spent by all map tasks (ms)=1723294
                Total time spent by all reduce tasks (ms)=133402
                Total vcore-milliseconds taken by all map tasks=1723294
                Total vcore-milliseconds taken by all reduce tasks=133402
                Total megabyte-milliseconds taken by all map tasks=1764653056
                Total megabyte-milliseconds taken by all reduce tasks=136603648
        Map-Reduce Framework
                Map input records=10
                Map output records=50
                Map output bytes=751
                Map output materialized bytes=911
                Input split bytes=1210
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=911
                Reduce input records=50
                Reduce output records=5
                Spilled Records=100
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=71333
                CPU time spent (ms)=66340
                Physical memory (bytes) snapshot=1764884480
                Virtual memory (bytes) snapshot=22712225792
                Total committed heap usage (bytes)=2045894656
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0

Test log:

20/11/23 17:07:38 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/11/23 17:07:38 INFO fs.TestDFSIO:             Date & time: Mon Nov 23 17:07:38 CST 2020
20/11/23 17:07:38 INFO fs.TestDFSIO:         Number of files: 10
20/11/23 17:07:38 INFO fs.TestDFSIO:  Total MBytes processed: 1000
20/11/23 17:07:38 INFO fs.TestDFSIO:       Throughput mb/sec: 2.09
20/11/23 17:07:38 INFO fs.TestDFSIO:  Average IO rate mb/sec: 3.48
20/11/23 17:07:38 INFO fs.TestDFSIO:   IO rate std deviation: 2.52
20/11/23 17:07:38 INFO fs.TestDFSIO:      Test exec time sec: 226.16
20/11/23 17:07:38 INFO fs.TestDFSIO:
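
To read these figures: as commonly documented for TestDFSIO, each map task handles one file and records its own size and elapsed time, from which the summary is derived roughly as

    Throughput (MB/s)      = sum(size_i) / sum(time_i)
    Average IO rate (MB/s) = (1/N) * sum(size_i / time_i)

With equal file sizes the average IO rate is therefore always at least the throughput, and a large gap between the two (here 3.48 vs 2.09) suggests that I/O times varied considerably across the files.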

The same test was run 10 times on the company's test cluster for statistical analysis.

2.1.2 Reading 10 files of 100 MB each from HDFS

Before running the read test, run the preceding write test so that the data to be read exists; the file count and size should match what was written.

$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 100

20/11/24 15:16:24 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/24 15:16:24 INFO fs.TestDFSIO: nrFiles = 10
20/11/24 15:16:24 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/24 15:16:24 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/24 15:16:24 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/24 15:16:26 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files

As with the write test, you may still hit the same exception: WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method).
