Hadoop and Spark benchmarks (1)

Hadoop 2.8.0 benchmarks

1. Listing the test programs in the jar

2. Generating 100 MB of unordered data

3. Sorting it

4. Deleting the files

 

 

1.

Run:

hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar

Output:

An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block, record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timeline server performance.

Analysis:

Invoked without arguments, the jar lists every available test program.

 

 

2.

Run:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 10MB

Output:

17/08/15 15:37:47 INFO fs.TestDFSIO: TestDFSIO.1.8
17/08/15 15:37:47 INFO fs.TestDFSIO: nrFiles = 10
17/08/15 15:37:47 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
17/08/15 15:37:47 INFO fs.TestDFSIO: bufferSize = 1000000
17/08/15 15:37:47 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/08/15 15:37:48 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 10 files
17/08/15 15:37:52 INFO fs.TestDFSIO: created control files for: 10 files
17/08/15 15:37:52 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 15:37:52 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 15:37:54 INFO mapred.FileInputFormat: Total input files to process : 10
17/08/15 15:37:55 INFO mapreduce.JobSubmitter: number of splits:10
17/08/15 15:37:55 INFO Configuration.deprecation: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
17/08/15 15:37:55 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
17/08/15 15:37:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0001
17/08/15 15:37:56 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0001
17/08/15 15:37:56 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0001/
17/08/15 15:37:56 INFO mapreduce.Job: Running job: job_1502782638345_0001
17/08/15 15:38:02 INFO mapreduce.Job: Job job_1502782638345_0001 running in uber mode : false
17/08/15 15:38:02 INFO mapreduce.Job:  map 0% reduce 0%
17/08/15 15:38:10 INFO mapreduce.Job:  map 60% reduce 0%
17/08/15 15:38:15 INFO mapreduce.Job:  map 90% reduce 0%
17/08/15 15:38:16 INFO mapreduce.Job:  map 100% reduce 100%
17/08/15 15:38:18 INFO mapreduce.Job: Job job_1502782638345_0001 completed successfully
17/08/15 15:38:18 INFO mapreduce.Job: Counters: 49

       File System Counters
                FILE: Number of bytes read=830
                FILE: Number of bytes written=1508153
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2320
                HDFS: Number of bytes written=104857678
                HDFS: Number of read operations=43
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=12
       Job Counters
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=52666
                Total time spent by all reduces in occupied slots (ms)=4329
                Total time spent by all map tasks (ms)=52666
                Total time spent by all reduce tasks (ms)=4329
                Total vcore-milliseconds taken by all map tasks=52666
                Total vcore-milliseconds taken by all reduce tasks=4329
                Total megabyte-milliseconds taken by all map tasks=53929984
                Total megabyte-milliseconds taken by all reduce tasks=4432896
       Map-Reduce Framework
                Map input records=10
                Map output records=50
                Map output bytes=724
                Map output materialized bytes=884
                Input split bytes=1200
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=884
                Reduce input records=50
                Reduce output records=5
                Spilled Records=100
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=1142
                CPU time spent (ms)=4490
                Physical memory (bytes) snapshot=2901250048
                Virtual memory (bytes) snapshot=23245361152
                Total committed heap usage (bytes)=2125987840
       Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
       File Input Format Counters
                Bytes Read=1120
       File Output Format Counters
                Bytes Written=78

17/08/15 15:38:18 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/08/15 15:38:18 INFO fs.TestDFSIO:            Date & time: Tue Aug 15 15:38:18 CST 2017
17/08/15 15:38:18 INFO fs.TestDFSIO:        Number of files: 10
17/08/15 15:38:18 INFO fs.TestDFSIO: Total MBytes processed: 100
17/08/15 15:38:18 INFO fs.TestDFSIO:      Throughput mb/sec: 25.58
17/08/15 15:38:18 INFO fs.TestDFSIO: Total Throughput mb/sec: 0
17/08/15 15:38:18 INFO fs.TestDFSIO: Average IO rate mb/sec: 42.08
17/08/15 15:38:18 INFO fs.TestDFSIO:  IO rate std deviation: 30.58
17/08/15 15:38:18 INFO fs.TestDFSIO:     Test exec time sec: 26.26
17/08/15 15:38:18 INFO fs.TestDFSIO:

Analysis:

This is Hadoop's built-in write-speed test; it wrote 10 files of 10 MB each.

Throughput mb/sec: 25.58    (aggregate throughput, 25.58 MB/s)

Average IO rate mb/sec: 42.08    (average per-file write rate, 42.08 MB/s)

IO rate std deviation: 30.58    (standard deviation of the per-file rates)

Test exec time sec: 26.26    (total run time, 26.26 s)
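It is worth understanding why the two rates differ: TestDFSIO's aggregate throughput divides the total data by the sum of the tasks' IO time, while the average IO rate is the plain mean of each file's own rate, so a few slow writers drag the aggregate figure well below the average (the pattern above, 25.58 vs 42.08, with a large std deviation). A minimal sketch of the two formulas on invented per-file numbers (none of these sizes or times come from this run):

```shell
#!/bin/sh
# Two hypothetical 10 MB files: one written fast, one slow.
# rate_i = size_i / time_i ; all numbers here are made up.
awk 'BEGIN {
  size[1] = 10; time[1] = 0.2;   # 50 MB/s
  size[2] = 10; time[2] = 1.0;   # 10 MB/s
  for (i = 1; i <= 2; i++) {
    total_mb  += size[i];
    total_sec += time[i];
    rate_sum  += size[i] / time[i];
  }
  # Aggregate throughput: total data over total task time.
  printf "throughput  = %.2f MB/s\n", total_mb / total_sec;
  # Average IO rate: mean of the per-file rates.
  printf "avg io rate = %.2f MB/s\n", rate_sum / 2;
}'
```

With these made-up numbers it prints a throughput of 16.67 MB/s against an average IO rate of 30.00 MB/s: the slow file dominates the aggregate.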

 

Timing details:

17/08/15 15:38:02 INFO mapreduce.Job:  map 0% reduce 0%

17/08/15 15:38:10 INFO mapreduce.Job:  map 60% reduce 0%

17/08/15 15:38:15 INFO mapreduce.Job:  map 90% reduce 0%

17/08/15 15:38:16 INFO mapreduce.Job:  map 100% reduce 100%

 

Log (appended to /usr/local/hadoop/sbin/TestDFSIO_results.log):

----- TestDFSIO ----- : write

           Date & time: Tue Aug 15 15:38:18 CST 2017

       Number of files: 10

 Total MBytes processed: 100

     Throughput mb/sec: 25.58

Total Throughput mb/sec: 0

 Average IO rate mb/sec: 42.08

  IO rate std deviation: 30.58

    Test exec time sec: 26.26
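Since every run appends a block like this to TestDFSIO_results.log, the headline numbers can be pulled out with a little awk. The snippet below writes a sample block to /tmp first so it is self-contained; point it at your real log instead:

```shell
#!/bin/sh
# Extract the rate lines from a TestDFSIO results block.
log=/tmp/TestDFSIO_results.log
cat > "$log" <<'EOF'
----- TestDFSIO ----- : write
     Throughput mb/sec: 25.58
 Average IO rate mb/sec: 42.08
  IO rate std deviation: 30.58
EOF
# Split on ": ", keep only lines mentioning mb/sec, trim leading spaces.
awk -F': ' '/mb\/sec/ { gsub(/^ +/, "", $1); print $1 " = " $2 }' "$log"
```

On the sample block it prints `Throughput mb/sec = 25.58` and `Average IO rate mb/sec = 42.08`, one per line.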

 

3.

Run:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 10MB

Output:

17/08/15 16:21:57 INFO fs.TestDFSIO: TestDFSIO.1.8
17/08/15 16:21:57 INFO fs.TestDFSIO: nrFiles = 10
17/08/15 16:21:57 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
17/08/15 16:21:57 INFO fs.TestDFSIO: bufferSize = 1000000
17/08/15 16:21:57 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/08/15 16:21:58 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 10 files
17/08/15 16:22:00 INFO fs.TestDFSIO: created control files for: 10 files
17/08/15 16:22:00 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 16:22:00 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 16:22:02 INFO mapred.FileInputFormat: Total input files to process : 10
17/08/15 16:22:02 INFO mapreduce.JobSubmitter: number of splits:10
17/08/15 16:22:03 INFO Configuration.deprecation: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
17/08/15 16:22:03 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
17/08/15 16:22:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0002
17/08/15 16:22:03 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0002
17/08/15 16:22:03 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0002/
17/08/15 16:22:03 INFO mapreduce.Job: Running job: job_1502782638345_0002
17/08/15 16:22:08 INFO mapreduce.Job: Job job_1502782638345_0002 running in uber mode : false
17/08/15 16:22:08 INFO mapreduce.Job:  map 0% reduce 0%
17/08/15 16:22:15 INFO mapreduce.Job:  map 60% reduce 0%
17/08/15 16:22:19 INFO mapreduce.Job:  map 80% reduce 0%
17/08/15 16:22:20 INFO mapreduce.Job:  map 100% reduce 0%
17/08/15 16:22:22 INFO mapreduce.Job:  map 100% reduce 100%
17/08/15 16:22:23 INFO mapreduce.Job: Job job_1502782638345_0002 completed successfully
17/08/15 16:22:23 INFO mapreduce.Job: Counters: 50

       File System Counters
                FILE: Number of bytes read=842
                FILE: Number of bytes written=1508155
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=104859920
                HDFS: Number of bytes written=78
                HDFS: Number of read operations=53
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
       Job Counters
                Killed map tasks=1
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=47151
                Total time spent by all reduces in occupied slots (ms)=3028
                Total time spent by all map tasks (ms)=47151
                Total time spent by all reduce tasks (ms)=3028
                Total vcore-milliseconds taken by all map tasks=47151
                Total vcore-milliseconds taken by all reduce tasks=3028
                Total megabyte-milliseconds taken by all map tasks=48282624
                Total megabyte-milliseconds taken by all reduce tasks=3100672
       Map-Reduce Framework
                Map input records=10
                Map output records=50
                Map output bytes=736
                Map output materialized bytes=896
                Input split bytes=1200
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=896
                Reduce input records=50
                Reduce output records=5
                Spilled Records=100
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=1119
                CPU time spent (ms)=3350
                Physical memory (bytes) snapshot=2875269120
                Virtual memory (bytes) snapshot=23217418240
                Total committed heap usage (bytes)=2098724864
       Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
       File Input Format Counters
                Bytes Read=1120
       File Output Format Counters
                Bytes Written=78

17/08/15 16:22:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/08/15 16:22:23 INFO fs.TestDFSIO:            Date & time: Tue Aug 15 16:22:23 CST 2017
17/08/15 16:22:23 INFO fs.TestDFSIO:        Number of files: 10
17/08/15 16:22:23 INFO fs.TestDFSIO: Total MBytes processed: 100
17/08/15 16:22:23 INFO fs.TestDFSIO:      Throughput mb/sec: 301.2
17/08/15 16:22:23 INFO fs.TestDFSIO: Total Throughput mb/sec: 0
17/08/15 16:22:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 388.55
17/08/15 16:22:23 INFO fs.TestDFSIO:  IO rate std deviation: 189.54
17/08/15 16:22:23 INFO fs.TestDFSIO:     Test exec time sec: 22.47
17/08/15 16:22:23 INFO fs.TestDFSIO:

Analysis:

Hadoop's built-in read-speed test; it read back the 10 files of 10 MB each.

Throughput mb/sec: 301.2    (aggregate throughput, 301.2 MB/s)

Average IO rate mb/sec: 388.55    (average per-file read rate, 388.55 MB/s)

IO rate std deviation: 189.54    (standard deviation of the per-file rates)

Test exec time sec: 22.47    (total run time, 22.47 s)

 

 

Timing details:

17/08/15 16:22:08 INFO mapreduce.Job:  map 0% reduce 0%

17/08/15 16:22:15 INFO mapreduce.Job:  map 60% reduce 0%

17/08/15 16:22:19 INFO mapreduce.Job:  map 80% reduce 0%

17/08/15 16:22:20 INFO mapreduce.Job:  map 100% reduce 0%

17/08/15 16:22:22 INFO mapreduce.Job:  map 100% reduce 100%

 

Log (appended to /usr/local/hadoop/sbin/TestDFSIO_results.log):

 

----- TestDFSIO ----- : read

           Date & time: Tue Aug 15 16:22:23 CST 2017

       Number of files: 10

 Total MBytes processed: 100

     Throughput mb/sec: 301.2

Total Throughput mb/sec: 0

 Average IO rate mb/sec: 388.55

  IO rate std deviation: 189.54

    Test exec time sec: 22.47


4.

Run:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar TestDFSIO -clean

Analysis:

Deletes the benchmark files from HDFS.

 

 

 

TeraSort:

A complete TeraSort benchmark runs in three steps:

1. Generate random input data with TeraGen

2. Run TeraSort on that input

3. Verify the sorted output with TeraValidate
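Before burning cluster time, the generate/sort/validate pattern can be rehearsed locally with plain shell tools. This is only an analogy for the three-phase flow, not the real MapReduce jobs; the file names and record count are made up:

```shell
#!/bin/sh
# Local stand-in for the TeraGen -> TeraSort -> TeraValidate flow.
tmp=$(mktemp -d)
# "TeraGen": produce 1000 pseudo-random records.
awk 'BEGIN { srand(42); for (i = 0; i < 1000; i++) print rand() }' > "$tmp/in"
# "TeraSort": sort the records.
sort "$tmp/in" > "$tmp/out"
# "TeraValidate": confirm the output really is in order.
sort -c "$tmp/out" && echo "output is sorted"
rm -r "$tmp"
```

`sort -c` exits non-zero on the first out-of-order record, which is the same contract TeraValidate enforces at cluster scale.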

 

1.

Run: hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce- teragen 1000000 /usr/local/hadoop/share/hadoop/mapreduce/in

Output:

17/08/15 20:10:44 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 20:10:45 INFO terasort.TeraGen: Generating 1000000 using 2
17/08/15 20:10:46 INFO mapreduce.JobSubmitter: number of splits:2
17/08/15 20:10:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0004
17/08/15 20:10:46 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0004
17/08/15 20:10:46 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0004/
17/08/15 20:10:46 INFO mapreduce.Job: Running job: job_1502782638345_0004
17/08/15 20:10:52 INFO mapreduce.Job: Job job_1502782638345_0004 running in uber mode : false
17/08/15 20:10:52 INFO mapreduce.Job:  map 0% reduce 0%
17/08/15 20:10:56 INFO mapreduce.Job:  map 50% reduce 0%
17/08/15 20:10:57 INFO mapreduce.Job:  map 100% reduce 0%
17/08/15 20:10:59 INFO mapreduce.Job: Job job_1502782638345_0004 completed successfully
17/08/15 20:10:59 INFO mapreduce.Job: Counters: 31
       File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=272344
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=167
                HDFS: Number of bytes written=100000000
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
       Job Counters
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=4678
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=4678
                Total vcore-milliseconds taken by all map tasks=4678
                Total megabyte-milliseconds taken by all map tasks=4790272
       Map-Reduce Framework
                Map input records=1000000
                Map output records=1000000
                Input split bytes=167
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=91
                CPU time spent (ms)=2210
                Physical memory (bytes) snapshot=343617536
                Virtual memory (bytes) snapshot=4223668224
                Total committed heap usage (bytes)=220200960
       org.apache.hadoop.examples.terasort.TeraGen$Counters
                CHECKSUM=2148987642402270
       File Input Format Counters
                Bytes Read=0
       File Output Format Counters
                Bytes Written=100000000

Analysis:

I wasn't thinking and typed that absurdly long output path. Only after the job finished did I realize it had been written inside HDFS, which now contains the long (and otherwise empty) directory /usr/local/hadoop/share/hadoop/mapreduce/in/.

It holds two files of 47.7 MB each??? Shouldn't they be 50 MB? That puzzled me for a moment.

Oh, right: TeraGen generated 1,000,000 records of 100 bytes each, and file sizes are reported in binary megabytes, so you have to divide by 1024...
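The arithmetic, spelled out: TeraGen writes fixed 100-byte records, so 1,000,000 records is 100,000,000 bytes, split here across two map output files, with sizes shown in binary megabytes:

```shell
#!/bin/sh
# 1,000,000 records x 100 bytes each, split across 2 map output files,
# sizes reported in binary MB (1 MB = 1048576 bytes).
awk 'BEGIN {
  total    = 1000000 * 100;   # 100000000 bytes in all
  per_file = total / 2;       # 50000000 bytes per file
  printf "per-file size = %.1f MB\n", per_file / 1048576;
}'
```

It prints `per-file size = 47.7 MB`, matching what the HDFS browser showed.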

 

 

2.

Run:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar terasort /usr/local/hadoop/share/hadoop/mapreduce/in /usr/local/hadoop/share/hadoop/mapreduce/out

Output:

17/08/15 20:31:21 INFO terasort.TeraSort: starting
17/08/15 20:31:22 INFO input.FileInputFormat: Total input files to process : 2
Spent 62ms computing base-splits.
Spent 1ms computing TeraScheduler splits.
Computing input splits took 63ms
Sampling 2 splits of 2
Making 1 from 100000 sampled records
Computing parititions took 9196ms
Spent 9261ms computing partitions.

17/08/15 20:31:31 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 20:31:33 INFO mapreduce.JobSubmitter: number of splits:2
17/08/15 20:31:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0006
17/08/15 20:31:33 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0006
17/08/15 20:31:33 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0006/
17/08/15 20:31:33 INFO mapreduce.Job: Running job: job_1502782638345_0006
17/08/15 20:31:37 INFO mapreduce.Job: Job job_1502782638345_0006 running in uber mode : false
17/08/15 20:31:37 INFO mapreduce.Job:  map 0% reduce 0%
17/08/15 20:31:42 INFO mapreduce.Job:  map 100% reduce 0%
17/08/15 20:31:48 INFO mapreduce.Job:  map 100% reduce 100%
17/08/15 20:31:50 INFO mapreduce.Job: Job job_1502782638345_0006 completed successfully
17/08/15 20:31:50 INFO mapreduce.Job: Counters: 49

       File System Counters
                FILE: Number of bytes read=104000012
                FILE: Number of bytes written=208412532
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=100000278
                HDFS: Number of bytes written=100000000
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
       Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=6067
                Total time spent by all reduces in occupied slots (ms)=2850
                Total time spent by all map tasks (ms)=6067
                Total time spent by all reduce tasks (ms)=2850
                Total vcore-milliseconds taken by all map tasks=6067
                Total vcore-milliseconds taken by all reduce tasks=2850
                Total megabyte-milliseconds taken by all map tasks=6212608
                Total megabyte-milliseconds taken by all reduce tasks=2918400
       Map-Reduce Framework
                Map input records=1000000
                Map output records=1000000
                Map output bytes=102000000
                Map output materialized bytes=104000012
                Input split bytes=278
                Combine input records=0
                Combine output records=0
                Reduce input groups=1000000
                Reduce shuffle bytes=104000012
                Reduce input records=1000000
                Reduce output records=1000000
                Spilled Records=2000000
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=142
                CPU time spent (ms)=5590
                Physical memory (bytes) snapshot=723681280
                Virtual memory (bytes) snapshot=6335279104
                Total committed heap usage (bytes)=501743616
       Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
       File Input Format Counters
                Bytes Read=100000000
       File Output Format Counters
                Bytes Written=100000000
17/08/15 20:31:50 INFO terasort.TeraSort: done

Analysis:

Timing:

17/08/15 20:31:37 INFO mapreduce.Job:  map 0% reduce 0%

17/08/15 20:31:42 INFO mapreduce.Job:  map 100% reduce 0%

17/08/15 20:31:48 INFO mapreduce.Job:  map 100% reduce 100%

 

3.

Run:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar teravalidate /usr/local/hadoop/share/hadoop/mapreduce/out /report

Output:

17/08/15 20:56:06 INFO client.RMProxy: Connecting to ResourceManager at master/172.28.94.34:8050
17/08/15 20:56:06 INFO input.FileInputFormat: Total input files to process : 1
Spent 11ms computing base-splits.
Spent 1ms computing TeraScheduler splits.
17/08/15 20:56:07 INFO mapreduce.JobSubmitter: number of splits:1
17/08/15 20:56:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1502782638345_0007
17/08/15 20:56:08 INFO impl.YarnClientImpl: Submitted application application_1502782638345_0007
17/08/15 20:56:08 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1502782638345_0007/
17/08/15 20:56:08 INFO mapreduce.Job: Running job: job_1502782638345_0007
17/08/15 20:56:12 INFO mapreduce.Job: Job job_1502782638345_0007 running in uber mode : false
17/08/15 20:56:12 INFO mapreduce.Job:  map 0% reduce 0%
17/08/15 20:56:16 INFO mapreduce.Job:  map 100% reduce 0%
17/08/15 20:56:20 INFO mapreduce.Job:  map 100% reduce 100%
17/08/15 20:56:22 INFO mapreduce.Job: Job job_1502782638345_0007 completed successfully
17/08/15 20:56:22 INFO mapreduce.Job: Counters: 49

       File System Counters

          FILE: Number of bytes read=93

          FILE: Number of bytes written=273301

          FILE: Number of read operations=0

          FILE: Number of large read operations=0

          FILE: Number of write operations=0

          HDFS: Number of bytes read=100000140

          HDFS: Number of bytes written=23

          HDFS: Number of read operations=6

          HDFS: Number of large read operations=0

          HDFS: Number of write operations=2

       Job Counters

          Launched map tasks=1

          Launched reduce tasks=1

          Data-local map tasks=1

          Total time spent by all maps in occupied slots (ms)=1832

          Total time spent by all reduces in occupied slots (ms)=1974

          Total time spent by all map tasks (ms)=1832

          Total time spent by all reduce tasks (ms)=1974

          Total vcore-milliseconds taken by all map tasks=1832

          Total vcore-milliseconds taken by all reduce tasks=1974

          Total megabyte-milliseconds taken by all map tasks=1875968

          Total megabyte-milliseconds taken by all reduce tasks=2021376

       Map-Reduce Framework

          Map input records=1000000

          Map output records=3

          Map output bytes=81

          Map output materialized bytes=93

          Input split bytes=140

          Combine input records=0

          Combine output records=0

          Reduce input groups=3

          Reduce shuffle bytes=93

          Reduce input records=3

          Reduce output records=1

          Spilled Records=6

          Shuffled Maps =1

          Failed Shuffles=0

          Merged Map outputs=1

          GC time elapsed (ms)=78

          CPU time spent (ms)=1470

          Physical memory (bytes) snapshot=447696896

          Virtual memory (bytes) snapshot=4226416640

          Total committed heap usage (bytes)=311427072

       Shuffle Errors

          BAD_ID=0

          CONNECTION=0

          IO_ERROR=0

          WRONG_LENGTH=0

          WRONG_MAP=0

          WRONG_REDUCE=0

       File Input Format Counters

          Bytes Read=100000000

       File Output Format Counters

          Bytes Written=23

Analysis:

TeraValidate checks whether the output is correctly sorted, and writes any offending records to the given output directory (/report here).

 

 

 

 

References:

http://7543154.blog.51cto.com/7533154/1243883

(Note: that post covers a much older release and a lot has changed since; read it together with the link below for best results.)

http://blog.csdn.net/flygoa/article/details/52127382

 

 

------------------ End of the Hadoop tests ------------------

 

------------------ The Spark tests follow ------------------

 

spark-bench test:

I won't belabor the installation:

git clone https://github.com/synhershko/wikixmlj.git

cd wikixmlj

mvn package install

git clone https://github.com/SparkTC/spark-bench

Then run ./bin/build-all.sh

 

 

 

The Spark-Bench environment is configured by editing env.sh under the conf directory of SPARK_BENCH_HOME, e.g.:

SPARK_HOME=/usr/local/spark

HADOOP_HOME=/usr/local/hadoop

SPARK_MASTER=spark://master:7077

HDFS_MASTER=hdfs://master:9000/

 

 

 

Go into the bin directory of the workload you want, generate the test data first, then run the workload.

A workload is a benchmark scenario (logistic regression and the like); the full list:

Machine Learning Workloads:

  • Logistic Regression
  • Support Vector Machine
  • Matrix Factorization

Graph Computation Workloads:

  • PageRank
  • SVD++
  • Triangle Count

SQL Workloads:

  • Hive
  • RDD Relation

Streaming Workloads:

  • Twitter Tag
  • Page View

Other Workloads:

  • KMeans, LinearRegression, DecisionTree, ShortestPaths, LabelPropagation, ConnectedComponent, StronglyConnectedComponent, PregelOperation

Supported Apache Spark releases:

  • Spark 2.0.1, this code is branched for release 2.0.1, note that these versions need a later version of scala and as such there are changes to pom files.

 

<SPARK_BENCH_HOME>/<Workload>/bin/gen_data.sh

<SPARK_BENCH_HOME>/<Workload>/bin/run.sh

 

 

 

 

 

You can look at the final result directly under <SPARK_BENCH_HOME>/num,

but the result is just a single line: Mean Squared Error = 0.003946153574496333

The model's mean squared error?

Beyond that... I don't know either....
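For the record, that figure is the standard mean squared error: the average of the squared differences between predicted and true values, so smaller is better. A toy computation on invented predictions and labels (nothing below comes from the spark-bench run):

```shell
#!/bin/sh
# MSE = (1/n) * sum((pred_i - label_i)^2), on three made-up points.
awk 'BEGIN {
  pred[1] = 1.1; label[1] = 1.0;
  pred[2] = 1.9; label[2] = 2.0;
  pred[3] = 3.2; label[3] = 3.0;
  for (i = 1; i <= 3; i++) { d = pred[i] - label[i]; sum += d * d }
  printf "Mean Squared Error = %.4f\n", sum / 3;
}'
```

With these numbers it prints `Mean Squared Error = 0.0200`; spark-bench's 0.0039 is the same statistic computed over its own test split.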

 

That's roughly how it goes.


The full config file:

[root@master spark-bench]# cat ./conf/env.sh

# global settings

 

 

#master="pts00450-vm16"

master="master"

#A list of machines where the spark cluster is running

#MC_LIST="pts00450-vm22pts00450-vm23"

MC_LIST="master"

#####

[ -z "$HADOOP_HOME" ] && export HADOOP_HOME=/usr/local/hadoop

# base dir for DataSet

#####:9000

HDFS_URL="hdfs://${master}:9000"

SPARK_HADOOP_FS_LOCAL_BLOCK_SIZE=536870912

 

#DATA_HDFS="hdfs://${master}:9000/SparkBench","file:///home/`whoami`/SparkBench"

DATA_HDFS="hdfs://${master}:9000/SparkBench"

 

#Local dataset optional

DATASET_DIR=/home/`whoami`/SparkBench/dataset

 

SPARK_VERSION=2.0.1  #1.5.1

[ -z "$SPARK_HOME" ] && export SPARK_HOME=/usr/local/spark

 

#SPARK_MASTER=local

#SPARK_MASTER=local[K]

#SPARK_MASTER=local[*]

#SPARK_MASTER=spark://HOST:PORT

##SPARK_MASTER=mesos://HOST:PORT

##SPARK_MASTER=yarn-client

#####SPARK_MASTER=yarn

MASTER=yarn

YARN_DEPLOY_MODE=client # or cluster, this will go to spark-submit as --deploy-mode

SPARK_RPC_ASKTIMEOUT=500

SPARK_MASTER=spark://${master}:7077

 

 

 

# Spark config in environment variable or arguments of spark-submit
# - SPARK_SERIALIZER, --conf spark.serializer
# - SPARK_RDD_COMPRESS, --conf spark.rdd.compress
# - SPARK_IO_COMPRESSION_CODEC, --conf spark.io.compression.codec
# - SPARK_DEFAULT_PARALLELISM, --conf spark.default.parallelism

SPARK_SERIALIZER=org.apache.spark.serializer.KryoSerializer

SPARK_RDD_COMPRESS=false

SPARK_IO_COMPRESSION_CODEC=lzf

 

# Spark options in system.property or arguments of spark-submit
# - SPARK_EXECUTOR_MEMORY, --conf spark.executor.memory
# - SPARK_STORAGE_MEMORYFRACTION, --conf spark.storage.memoryfraction

#SPARK_STORAGE_MEMORYFRACTION=0.5

SPARK_EXECUTOR_MEMORY=1g

#export MEM_FRACTION_GLOBAL=0.005

 

# Spark options in YARN client mode

# - SPARK_DRIVER_MEMORY, --driver-memory

# - SPARK_EXECUTOR_INSTANCES,--num-executors

# - SPARK_EXECUTOR_CORES, --executor-cores

# - SPARK_DRIVER_MEMORY, --driver-memory

#export EXECUTOR_GLOBAL_MEM=2g

#export executor_cores=2

export SPARK_DRIVER_MEMORY=2g

export SPARK_EXECUTOR_INSTANCES=4

export SPARK_EXECUTOR_CORES=1

 

# Storage levels, see http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/StorageLevels.html
# - STORAGE_LEVEL, set MEMORY_AND_DISK, MEMORY_AND_DISK_SER, MEMORY_ONLY, MEMORY_ONLY_SER, or DISK_ONLY

STORAGE_LEVEL=MEMORY_AND_DISK

 

# for data generation

NUM_OF_PARTITIONS=2

# for running

NUM_TRIALS=1

 

 

 

In the Spark web UI at master:8080 I can see that these jobs really did run on Spark.
