Hadoop 2.8.0 Benchmark Tests
1. Inspecting the test jar
2. Generating 100 MB of unordered data
3. Sorting
4. Deleting files
1.
Execute:
hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar
Result:
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block, record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode w/ MR.
nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
timelineperformance: A job that launches mappers to test timline server performance.
Analysis:
When the jar is invoked without any arguments, it lists all of the available test programs.
2.
Execute:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 10MB
Result:
17/08/15 15:37:47 INFO fs.TestDFSIO: TestDFSIO.1.8
17/08/15 15:37:47 INFO fs.TestDFSIO: nrFiles = 10
17/08/15 15:37:47 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
17/08/15 15:37:47 INFO fs.TestDFSIO: bufferSize = 1000000
17/08/15 15:37:47 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/08/15 15:37:48 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 10 files
17/08/15 15:37:52 INFO fs.TestDFSIO: created control files for: 10 files
17/08/15 15:37:52 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 15:37:52 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 15:37:54 INFO mapred.FileInputFormat: Total input files to process : 10
17/08/15 15:37:55 INFO mapreduce.JobSubmitter: number of splits:10
17/08/15 15:37:55 INFO Configuration.deprecation: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
17/08/15 15:37:55 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
17/08/15 15:37:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0001
17/08/15 15:37:56 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0001
17/08/15 15:37:56 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0001/
17/08/15 15:37:56 INFO mapreduce.Job: Running job: job_1502782638345_0001
17/08/15 15:38:02 INFO mapreduce.Job: Job job_1502782638345_0001 running in uber mode : false
17/08/15 15:38:02 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 15:38:10 INFO mapreduce.Job: map 60% reduce 0%
17/08/15 15:38:15 INFO mapreduce.Job: map 90% reduce 0%
17/08/15 15:38:16 INFO mapreduce.Job: map 100% reduce 100%
17/08/15 15:38:18 INFO mapreduce.Job: Job job_1502782638345_0001 completed successfully
17/08/15 15:38:18 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=830
FILE: Number of bytes written=1508153
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2320
HDFS: Number of bytes written=104857678
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=52666
Total time spent by all reduces in occupied slots (ms)=4329
Total time spent by all map tasks (ms)=52666
Total time spent by all reduce tasks (ms)=4329
Total vcore-milliseconds taken by all map tasks=52666
Total vcore-milliseconds taken by all reduce tasks=4329
Total megabyte-milliseconds taken by all map tasks=53929984
Total megabyte-milliseconds taken by all reduce tasks=4432896
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=724
Map output materialized bytes=884
Input split bytes=1200
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=884
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1142
CPU time spent (ms)=4490
Physical memory (bytes) snapshot=2901250048
Virtual memory (bytes) snapshot=23245361152
Total committed heap usage (bytes)=2125987840
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=78
17/08/15 15:38:18 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/08/15 15:38:18 INFO fs.TestDFSIO: Date & time: Tue Aug 15 15:38:18 CST 2017
17/08/15 15:38:18 INFO fs.TestDFSIO: Number of files: 10
17/08/15 15:38:18 INFO fs.TestDFSIO: Total MBytes processed: 100
17/08/15 15:38:18 INFO fs.TestDFSIO: Throughput mb/sec: 25.58
17/08/15 15:38:18 INFO fs.TestDFSIO: Total Throughput mb/sec: 0
17/08/15 15:38:18 INFO fs.TestDFSIO: Average IO rate mb/sec: 42.08
17/08/15 15:38:18 INFO fs.TestDFSIO: IO rate std deviation: 30.58
17/08/15 15:38:18 INFO fs.TestDFSIO: Test exec time sec: 26.26
17/08/15 15:38:18 INFO fs.TestDFSIO:
Analysis:
Hadoop's built-in write-speed test: it wrote 10 files of 10 MB each.
Throughput mb/sec: 25.58 (aggregate throughput, 25.58 MB/s)
Average IO rate mb/sec: 42.08 (average per-file write rate, 42.08 MB/s)
IO rate std deviation: 30.58 (standard deviation, 30.58)
Test exec time sec: 26.26 (total run time, 26.26 s)
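Why the two rates differ: as I understand TestDFSIO's aggregation, Throughput divides the total megabytes by the sum of all tasks' I/O times, while Average IO rate is the mean of each file's individual rate, so a few slow files drag the first number down much more. A quick awk sketch with made-up per-file times (three hypothetical 10 MB files written in 0.2 s, 0.5 s and 1.0 s):

```shell
# Sketch of the two TestDFSIO aggregates (per-file sizes/times are invented):
#   Throughput      = sum(size) / sum(time)
#   Average IO rate = mean(size_i / time_i)
rates=$(awk 'BEGIN {
  n = 3
  size[1] = 10; time[1] = 0.2
  size[2] = 10; time[2] = 0.5
  size[3] = 10; time[3] = 1.0
  for (i = 1; i <= n; i++) { ts += size[i]; tt += time[i]; ravg += size[i] / time[i] }
  printf "%.2f %.2f", ts / tt, ravg / n
}')
throughput=${rates% *}
avg_rate=${rates#* }
echo "Throughput mb/sec: $throughput"
echo "Average IO rate mb/sec: $avg_rate"
```

With those toy numbers the aggregate throughput is 30/1.7 ≈ 17.65 MB/s while the per-file average is (50+20+10)/3 ≈ 26.67 MB/s, which mirrors the 25.58 vs. 42.08 gap above.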
Timing details:
17/08/15 15:38:02 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 15:38:10 INFO mapreduce.Job: map 60% reduce 0%
17/08/15 15:38:15 INFO mapreduce.Job: map 90% reduce 0%
17/08/15 15:38:16 INFO mapreduce.Job: map 100% reduce 100%
Log (written to /usr/local/hadoop/sbin/TestDFSIO_results.log):
----- TestDFSIO ----- : write
Date & time: Tue Aug 15 15:38:18 CST 2017
Number of files: 10
Total MBytes processed: 100
Throughput mb/sec: 25.58
Total Throughput mb/sec: 0
Average IO rate mb/sec: 42.08
IO rate std deviation: 30.58
Test exec time sec: 26.26
3.
Execute:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 10MB
Result:
17/08/15 16:21:57 INFO fs.TestDFSIO: TestDFSIO.1.8
17/08/15 16:21:57 INFO fs.TestDFSIO: nrFiles = 10
17/08/15 16:21:57 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
17/08/15 16:21:57 INFO fs.TestDFSIO: bufferSize = 1000000
17/08/15 16:21:57 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/08/15 16:21:58 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 10 files
17/08/15 16:22:00 INFO fs.TestDFSIO: created control files for: 10 files
17/08/15 16:22:00 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 16:22:00 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 16:22:02 INFO mapred.FileInputFormat: Total input files to process : 10
17/08/15 16:22:02 INFO mapreduce.JobSubmitter: number of splits:10
17/08/15 16:22:03 INFO Configuration.deprecation: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
17/08/15 16:22:03 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
17/08/15 16:22:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0002
17/08/15 16:22:03 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0002
17/08/15 16:22:03 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0002/
17/08/15 16:22:03 INFO mapreduce.Job: Running job: job_1502782638345_0002
17/08/15 16:22:08 INFO mapreduce.Job: Job job_1502782638345_0002 running in uber mode : false
17/08/15 16:22:08 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 16:22:15 INFO mapreduce.Job: map 60% reduce 0%
17/08/15 16:22:19 INFO mapreduce.Job: map 80% reduce 0%
17/08/15 16:22:20 INFO mapreduce.Job: map 100% reduce 0%
17/08/15 16:22:22 INFO mapreduce.Job: map 100% reduce 100%
17/08/15 16:22:23 INFO mapreduce.Job: Job job_1502782638345_0002 completed successfully
17/08/15 16:22:23 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=842
FILE: Number of bytes written=1508155
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=104859920
HDFS: Number of bytes written=78
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=47151
Total time spent by all reduces in occupied slots (ms)=3028
Total time spent by all map tasks (ms)=47151
Total time spent by all reduce tasks (ms)=3028
Total vcore-milliseconds taken by all map tasks=47151
Total vcore-milliseconds taken by all reduce tasks=3028
Total megabyte-milliseconds taken by all map tasks=48282624
Total megabyte-milliseconds taken by all reduce tasks=3100672
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=736
Map output materialized bytes=896
Input split bytes=1200
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=896
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1119
CPU time spent (ms)=3350
Physical memory (bytes) snapshot=2875269120
Virtual memory (bytes) snapshot=23217418240
Total committed heap usage (bytes)=2098724864
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=78
17/08/15 16:22:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/08/15 16:22:23 INFO fs.TestDFSIO: Date & time: Tue Aug 15 16:22:23 CST 2017
17/08/15 16:22:23 INFO fs.TestDFSIO: Number of files: 10
17/08/15 16:22:23 INFO fs.TestDFSIO: Total MBytes processed: 100
17/08/15 16:22:23 INFO fs.TestDFSIO: Throughput mb/sec: 301.2
17/08/15 16:22:23 INFO fs.TestDFSIO: Total Throughput mb/sec: 0
17/08/15 16:22:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 388.55
17/08/15 16:22:23 INFO fs.TestDFSIO: IO rate std deviation: 189.54
17/08/15 16:22:23 INFO fs.TestDFSIO: Test exec time sec: 22.47
17/08/15 16:22:23 INFO fs.TestDFSIO:
Analysis:
Hadoop's built-in read-speed test: it read 10 files of 10 MB each.
Throughput mb/sec: 301.2 (aggregate throughput, 301.2 MB/s)
Average IO rate mb/sec: 388.55 (average per-file read rate, 388.55 MB/s)
IO rate std deviation: 189.54 (standard deviation, 189.54)
Test exec time sec: 22.47 (total run time, 22.47 s)
Timing details:
17/08/15 16:22:08 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 16:22:15 INFO mapreduce.Job: map 60% reduce 0%
17/08/15 16:22:19 INFO mapreduce.Job: map 80% reduce 0%
17/08/15 16:22:20 INFO mapreduce.Job: map 100% reduce 0%
17/08/15 16:22:22 INFO mapreduce.Job: map 100% reduce 100%
Log (written to /usr/local/hadoop/sbin/TestDFSIO_results.log):
----- TestDFSIO ----- : read
Date & time: Tue Aug 15 16:22:23 CST 2017
Number of files: 10
Total MBytes processed: 100
Throughput mb/sec: 301.2
Total Throughput mb/sec: 0
Average IO rate mb/sec: 388.55
IO rate std deviation: 189.54
Test exec time sec: 22.47
4.
Execute:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar TestDFSIO -clean
Analysis:
Deletes the benchmark's test files.
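Put together, the whole TestDFSIO cycle from this section can be sketched as one loop. This is a dry run that just prints each command (the echo is mine; drop it to actually execute):

```shell
# Dry run of the TestDFSIO write/read/clean cycle used above.
JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar
for op in write read clean; do
  # -clean takes no size arguments; write/read use the same file set
  if [ "$op" = clean ]; then args=""; else args=" -nrFiles 10 -fileSize 10MB"; fi
  cmd="hadoop jar $JAR TestDFSIO -$op$args"
  echo "$cmd"   # dry run: print instead of execute
done
```

Running read right after write works because both operate on the same control files under /benchmarks/TestDFSIO.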
TeraSort:
A complete TeraSort benchmark runs in three steps:
1. Generate the random input data with TeraGen
2. Run TeraSort on the input data
3. Validate the sorted output with TeraValidate
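The three steps can be sketched as one script. This is a dry run (the run helper echoes the commands instead of launching them); the jar path matches the examples jar used in steps 2 and 3 below, and the short HDFS paths are placeholders of my own choosing:

```shell
# TeraSort pipeline sketch: teragen -> terasort -> teravalidate (dry run).
JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar
ROWS=1000000                              # TeraGen rows are 100 bytes each, so ~100 MB total
IN=/terasort/in; OUT=/terasort/out; REPORT=/terasort/report

run() { echo "hadoop jar $JAR $*"; }      # swap echo for the real launcher to execute

gen_cmd=$(run teragen "$ROWS" "$IN")
sort_cmd=$(run terasort "$IN" "$OUT")
check_cmd=$(run teravalidate "$OUT" "$REPORT")
printf '%s\n' "$gen_cmd" "$sort_cmd" "$check_cmd"
```

Keeping the output directories short also avoids the long-path mishap described in step 1 below.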
1.
Execute: hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce- teragen 1000000 /usr/local/hadoop/share/hadoop/mapreduce/in
Result:
17/08/15 20:10:44 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 20:10:45 INFO terasort.TeraGen: Generating 1000000 using 2
17/08/15 20:10:46 INFO mapreduce.JobSubmitter: number of splits:2
17/08/15 20:10:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0004
17/08/15 20:10:46 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0004
17/08/15 20:10:46 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0004/
17/08/15 20:10:46 INFO mapreduce.Job: Running job: job_1502782638345_0004
17/08/15 20:10:52 INFO mapreduce.Job: Job job_1502782638345_0004 running in uber mode : false
17/08/15 20:10:52 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 20:10:56 INFO mapreduce.Job: map 50% reduce 0%
17/08/15 20:10:57 INFO mapreduce.Job: map 100% reduce 0%
17/08/15 20:10:59 INFO mapreduce.Job: Job job_1502782638345_0004 completed successfully
17/08/15 20:10:59 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=272344
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=100000000
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=4678
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4678
Total vcore-milliseconds taken by all map tasks=4678
Total megabyte-milliseconds taken by all map tasks=4790272
Map-Reduce Framework
Map input records=1000000
Map output records=1000000
Input split bytes=167
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=91
CPU time spent (ms)=2210
Physical memory (bytes) snapshot=343617536
Virtual memory (bytes) snapshot=4223668224
Total committed heap usage (bytes)=220200960
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=2148987642402270
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=100000000
Analysis:
My brain glitched and I typed out that absurdly long path.
Only after the run finished did I realize it gets written inside HDFS,
so HDFS now contains a directory with that same long path, /usr/local/hadoop/share/hadoop/mapreduce/in/.
Inside are two files of 47.7 MB each??? Shouldn't that be 50 MB?
Puzzled, I set it aside for a bit.
Oh, right: what was generated is 1000000 rows of 100 bytes each,
and you have to divide by 1024...
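Sanity-checking that figure: 1,000,000 rows of 100 bytes is 100,000,000 bytes; split across the 2 map tasks and reported in binary megabytes (dividing by 1024 twice), each file comes out at about 47.7 MB:

```shell
# 1,000,000 rows x 100 bytes, 2 output files, size shown in MiB (1024*1024 bytes)
per_file_mb=$(awk 'BEGIN { printf "%.1f", 1000000 * 100 / 2 / 1024 / 1024 }')
echo "$per_file_mb MB per file"
```

So the "missing" 2.3 MB per file is purely the decimal-vs-binary megabyte difference.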
2.
Execute:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar terasort /usr/local/hadoop/share/hadoop/mapreduce/in /usr/local/hadoop/share/hadoop/mapreduce/out
Result:
17/08/15 20:31:21 INFO terasort.TeraSort: starting
17/08/15 20:31:22 INFO input.FileInputFormat: Total input files to process : 2
Spent 62ms computing base-splits.
Spent 1ms computing TeraScheduler splits.
Computing input splits took 63ms
Sampling 2 splits of 2
Making 1 from 100000 sampled records
Computing parititions took 9196ms
Spent 9261ms computing partitions.
17/08/15 20:31:31 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 20:31:33 INFO mapreduce.JobSubmitter: number of splits:2
17/08/15 20:31:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0006
17/08/15 20:31:33 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0006
17/08/15 20:31:33 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0006/
17/08/15 20:31:33 INFO mapreduce.Job: Running job: job_1502782638345_0006
17/08/15 20:31:37 INFO mapreduce.Job: Job job_1502782638345_0006 running in uber mode : false
17/08/15 20:31:37 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 20:31:42 INFO mapreduce.Job: map 100% reduce 0%
17/08/15 20:31:48 INFO mapreduce.Job: map 100% reduce 100%
17/08/15 20:31:50 INFO mapreduce.Job: Job job_1502782638345_0006 completed successfully
17/08/15 20:31:50 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=104000012
FILE: Number of bytes written=208412532
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=100000278
HDFS: Number of bytes written=100000000
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=6067
Total time spent by all reduces in occupied slots (ms)=2850
Total time spent by all map tasks (ms)=6067
Total time spent by all reduce tasks (ms)=2850
Total vcore-milliseconds taken by all map tasks=6067
Total vcore-milliseconds taken by all reduce tasks=2850
Total megabyte-milliseconds taken by all map tasks=6212608
Total megabyte-milliseconds taken by all reduce tasks=2918400
Map-Reduce Framework
Map input records=1000000
Map output records=1000000
Map output bytes=102000000
Map output materialized bytes=104000012
Input split bytes=278
Combine input records=0
Combine output records=0
Reduce input groups=1000000
Reduce shuffle bytes=104000012
Reduce input records=1000000
Reduce output records=1000000
Spilled Records=2000000
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=142
CPU time spent (ms)=5590
Physical memory (bytes) snapshot=723681280
Virtual memory (bytes) snapshot=6335279104
Total committed heap usage (bytes)=501743616
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=100000000
File Output Format Counters
Bytes Written=100000000
17/08/15 20:31:50 INFO terasort.TeraSort: done
Analysis:
Timing:
17/08/15 20:31:37 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 20:31:42 INFO mapreduce.Job: map 100% reduce 0%
17/08/15 20:31:48 INFO mapreduce.Job: map 100% reduce 100%
3.
Execute:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar teravalidate /usr/local/hadoop/share/hadoop/mapreduce/out /report
Result:
17/08/15 20:56:06 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 20:56:06 INFO input.FileInputFormat: Total input files to process : 1
Spent 11ms computing base-splits.
Spent 1ms computing TeraScheduler splits.
17/08/15 20:56:07 INFO mapreduce.JobSubmitter: number of splits:1
17/08/15 20:56:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0007
17/08/15 20:56:08 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0007
17/08/15 20:56:08 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0007/
17/08/15 20:56:08 INFO mapreduce.Job: Running job: job_1502782638345_0007
17/08/15 20:56:12 INFO mapreduce.Job: Job job_1502782638345_0007 running in uber mode : false
17/08/15 20:56:12 INFO mapreduce.Job: map 0% reduce 0%
17/08/15 20:56:16 INFO mapreduce.Job: map 100% reduce 0%
17/08/15 20:56:20 INFO mapreduce.Job: map 100% reduce 100%
17/08/15 20:56:22 INFO mapreduce.Job: Job job_1502782638345_0007 completed successfully
17/08/15 20:56:22 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=93
FILE: Number of bytes written=273301
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=100000140
HDFS: Number of bytes written=23
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1832
Total time spent by all reduces in occupied slots (ms)=1974
Total time spent by all map tasks (ms)=1832
Total time spent by all reduce tasks (ms)=1974
Total vcore-milliseconds taken by all map tasks=1832
Total vcore-milliseconds taken by all reduce tasks=1974
Total megabyte-milliseconds taken by all map tasks=1875968
Total megabyte-milliseconds taken by all reduce tasks=2021376
Map-Reduce Framework
Map input records=1000000
Map output records=3
Map output bytes=81
Map output materialized bytes=93
Input split bytes=140
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=93
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=78
CPU time spent (ms)=1470
Physical memory (bytes) snapshot=447696896
Virtual memory (bytes) snapshot=4226416640
Total committed heap usage (bytes)=311427072
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=100000000
File Output Format Counters
Bytes Written=23
Analysis:
TeraValidate checks whether the output is correctly sorted, and writes any out-of-order records it finds to the output directory.
References:
http://7543154.blog.51cto.com/7533154/1243883
---- Note: that post covers a much older version and a lot has changed since; it is best served together with the link below. Tastes better that way.
http://blog.csdn.net/flygoa/article/details/52127382
------------------ Above: the Hadoop tests ----------------------------------------------------------------------------
------------------ Below: the Spark tests ----------------------------------------------------------------------------
spark-bench tests:
I won't belabor the install process:
git clone https://github.com/synhershko/wikixmlj.git
cd wikixmlj
mvn package install
git clone https://github.com/SparkTC/spark-bench
Run ./bin/build-all.sh
Spark-Bench is configured by editing env.sh in the conf directory under SPARK_BENCH_HOME:
SPARK_HOME=/usr/local/spark
HADOOP_HOME=/usr/local/hadoop
SPARK_MASTER=spark://master:7077
HDFS_MASTER=hdfs://master:9000/
Go straight into the bin directory of the workload you want, generate the test data first, then run it.
A workload is a test scenario: logistic regression and the like.
Details:
Machine Learning Workloads:
- Logistic Regression
- Support Vector Machine
- Matrix Factorization
Graph Computation Workloads:
- PageRank
- SVD++
- Triangle Count
SQL Workloads:
- Hive
- RDD Relation
Streaming Workloads:
- Twitter Tag
- Page View
Other Workloads:
- KMeans, LinearRegression, DecisionTree, ShortestPaths, LabelPropagation, ConnectedComponent, StronglyConnectedComponent, PregelOperation
Supported Apache Spark releases:
- Spark 2.0.1, this code is branched for release 2.0.1, note that these versions need a later version of scala and as such there are changes to pom files.
<SPARK_BENCH_HOME>/<Workload>/bin/gen_data.sh
<SPARK_BENCH_HOME>/<Workload>/bin/run.sh
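The gen_data.sh/run.sh pair above can be driven from a tiny wrapper. A sketch (the SPARK_BENCH_HOME default, the workload choices, and the dry-run echo are all my own placeholders):

```shell
# Dry-run driver: print the gen_data/run pair for each chosen workload.
SPARK_BENCH_HOME=${SPARK_BENCH_HOME:-/usr/local/spark-bench}
cmds=""
for wl in KMeans LinearRegression; do    # workload dirs; pick any from the list above
  for step in gen_data run; do
    cmds="$cmds$SPARK_BENCH_HOME/$wl/bin/$step.sh
"
  done
done
printf '%s' "$cmds"   # replace the dry run with direct calls to actually execute
```

Generating data once and rerunning run.sh several times is the usual way to average out a noisy first run.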
You can check the final results directly under the <SPARK_BENCH_HOME>/num directory.
But the result is just a single line: Mean Squared Error = 0.003946153574496333
Mean squared error?
And then... I have no idea either....
That's roughly how it goes.
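For the record, mean squared error is just the average of the squared gaps between predictions and labels, so smaller is better. A toy computation (the prediction/label pairs are made up):

```shell
# MSE over three made-up (prediction, label) pairs.
mse=$(awk 'BEGIN {
  p[1] = 0.9; y[1] = 1
  p[2] = 0.2; y[2] = 0
  p[3] = 0.6; y[3] = 1
  for (i = 1; i <= 3; i++) s += (p[i] - y[i]) ^ 2
  printf "%.4f", s / 3
}')
echo "Mean Squared Error = $mse"
```

So the 0.0039 reported by the workload just means its model's predictions sit very close to the true values on its generated data.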
The full configuration file:
[root@master spark-bench]# cat ./conf/env.sh
# global settings
#master="pts00450-vm16"
master="master"
#A list of machines where the spark cluster is running
#MC_LIST="pts00450-vm22 pts00450-vm23"
MC_LIST="master"
#####
[ -z "$HADOOP_HOME" ] && export HADOOP_HOME=/usr/local/hadoop
# base dir for DataSet
#####:9000
HDFS_URL="hdfs://${master}:9000"
SPARK_HADOOP_FS_LOCAL_BLOCK_SIZE=536870912
#DATA_HDFS="hdfs://${master}:9000/SparkBench","file:///home/`whoami`/SparkBench"
DATA_HDFS="hdfs://${master}:9000/SparkBench"
#Local dataset optional
DATASET_DIR=/home/`whoami`/SparkBench/dataset
SPARK_VERSION=2.0.1 #1.5.1
[ -z "$SPARK_HOME" ] && export SPARK_HOME=/usr/local/spark
#SPARK_MASTER=local
#SPARK_MASTER=local[K]
#SPARK_MASTER=local[*]
#SPARK_MASTER=spark://HOST:PORT
##SPARK_MASTER=mesos://HOST:PORT
##SPARK_MASTER=yarn-client
#####SPARK_MASTER=yarn
MASTER=yarn
YARN_DEPLOY_MODE=client # or cluster, this will go to spark submit as --deploy-mode
SPARK_RPC_ASKTIMEOUT=500
SPARK_MASTER=spark://${master}:7077
# Spark config in environment variable or arguments of spark-submit
# - SPARK_SERIALIZER, --conf spark.serializer
# - SPARK_RDD_COMPRESS, --conf spark.rdd.compress
# - SPARK_IO_COMPRESSION_CODEC, --conf spark.io.compression.codec
# - SPARK_DEFAULT_PARALLELISM, --conf spark.default.parallelism
SPARK_SERIALIZER=org.apache.spark.serializer.KryoSerializer
SPARK_RDD_COMPRESS=false
SPARK_IO_COMPRESSION_CODEC=lzf
# Spark options in system.property or arguments of spark-submit
# - SPARK_EXECUTOR_MEMORY, --conf spark.executor.memory
# - SPARK_STORAGE_MEMORYFRACTION, --conf spark.storage.memoryfraction
#SPARK_STORAGE_MEMORYFRACTION=0.5
SPARK_EXECUTOR_MEMORY=1g
#export MEM_FRACTION_GLOBAL=0.005
# Spark options in YARN client mode
# - SPARK_DRIVER_MEMORY, --driver-memory
# - SPARK_EXECUTOR_INSTANCES, --num-executors
# - SPARK_EXECUTOR_CORES, --executor-cores
# - SPARK_DRIVER_MEMORY, --driver-memory
#export EXECUTOR_GLOBAL_MEM=2g
#export executor_cores=2
export SPARK_DRIVER_MEMORY=2g
export SPARK_EXECUTOR_INSTANCES=4
export SPARK_EXECUTOR_CORES=1
# Storage levels, see http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/StorageLevels.html
# - STORAGE_LEVEL, set MEMORY_AND_DISK,MEMORY_AND_DISK_SER, MEMORY_ONLY, MEMORY_ONLY_SER, or DISK_ONLY
STORAGE_LEVEL=MEMORY_AND_DISK
# for data generation
NUM_OF_PARTITIONS=2
# for running
NUM_TRIALS=1
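One thing worth eyeballing in the settings above: the total executor memory the cluster has to supply is the instance count times the per-executor memory (YARN also adds a per-container overhead on top, which this quick check ignores):

```shell
# From SPARK_EXECUTOR_INSTANCES=4 and SPARK_EXECUTOR_MEMORY=1g above:
instances=4
per_executor_gb=1
total_gb=$((instances * per_executor_gb))
echo "total executor memory requested: ${total_gb}g"
```

If that total (plus overhead) exceeds what the NodeManagers offer, YARN simply schedules fewer executors than requested.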
In the web UI at master:8080 I can see
that these jobs really did run on Spark.