Running HOP (Hadoop Online Prototype)

Original post, 2011-01-22 17:37:00

1.

Run the wordcount example bundled with Hadoop, under both stock Hadoop and HOP, to count word frequencies in Ulysses:

The input file, Ulysses, is 1.5 MB (1,573,044 bytes).



1.1 Hadoop:


hadoop@hadoop:~$ hadoop jar /home/hadoop/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount dft dft-output
10/12/28 16:58:22 INFO input.FileInputFormat: Total input paths to process : 1
10/12/28 16:58:22 INFO mapred.JobClient: Running job: job_201012281638_0002
10/12/28 16:58:23 INFO mapred.JobClient:  map 0% reduce 0%
10/12/28 16:58:33 INFO mapred.JobClient:  map 100% reduce 0%
10/12/28 16:58:45 INFO mapred.JobClient:  map 100% reduce 100%
10/12/28 16:58:47 INFO mapred.JobClient: Job complete: job_201012281638_0002
10/12/28 16:58:47 INFO mapred.JobClient: Counters: 17
10/12/28 16:58:47 INFO mapred.JobClient:   Job Counters
10/12/28 16:58:47 INFO mapred.JobClient:     Launched reduce tasks=1
10/12/28 16:58:47 INFO mapred.JobClient:     Launched map tasks=1
10/12/28 16:58:47 INFO mapred.JobClient:     Data-local map tasks=1
10/12/28 16:58:47 INFO mapred.JobClient:   FileSystemCounters
10/12/28 16:58:47 INFO mapred.JobClient:     FILE_BYTES_READ=1480478
10/12/28 16:58:47 INFO mapred.JobClient:     HDFS_BYTES_READ=1573044
10/12/28 16:58:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2220746
10/12/28 16:58:47 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=527522
10/12/28 16:58:47 INFO mapred.JobClient:   Map-Reduce Framework
10/12/28 16:58:47 INFO mapred.JobClient:     Reduce input groups=50091
10/12/28 16:58:47 INFO mapred.JobClient:     Combine output records=51302
10/12/28 16:58:47 INFO mapred.JobClient:     Map input records=33055
10/12/28 16:58:47 INFO mapred.JobClient:     Reduce shuffle bytes=740236
10/12/28 16:58:47 INFO mapred.JobClient:     Reduce output records=50091
10/12/28 16:58:47 INFO mapred.JobClient:     Spilled Records=153906
10/12/28 16:58:47 INFO mapred.JobClient:     Map output bytes=2601773
10/12/28 16:58:47 INFO mapred.JobClient:     Combine input records=267975
10/12/28 16:58:47 INFO mapred.JobClient:     Map output records=267975
10/12/28 16:58:47 INFO mapred.JobClient:     Reduce input records=51302


From start to finish, the map phase took 10 seconds, reduce took 12 seconds, and the whole job took 25 seconds.
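These figures are read straight off the JobClient timestamps above; the arithmetic can be sketched in bash (timestamps copied from the log):

```shell
# Derive the phase durations from the JobClient progress timestamps (HH:MM:SS).
to_secs() { IFS=: read -r h m s <<< "$1"; echo $(( 10#$h*3600 + 10#$m*60 + 10#$s )); }

job_start=$(to_secs 16:58:22)   # "Running job"
map_start=$(to_secs 16:58:23)   # "map 0% reduce 0%"
map_end=$(to_secs 16:58:33)     # "map 100% reduce 0%"
reduce_end=$(to_secs 16:58:45)  # "map 100% reduce 100%"
job_end=$(to_secs 16:58:47)     # "Job complete"

echo "map:    $(( map_end - map_start ))s"     # 10s
echo "reduce: $(( reduce_end - map_end ))s"    # 12s
echo "job:    $(( job_end - job_start ))s"     # 25s
```

The same arithmetic applied to the later runs yields the durations quoted in each section.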




1.2 HOP:


hadoop@hadoop:~$ /home/hadoop/hadoop-hop-0.2/bin/hadoop jar /home/hadoop/hadoop-hop-0.2/hadoop-hop-0.2-examples.jar wordcount input output
10/12/28 17:08:23 INFO mapred.FileInputFormat: Total input paths to process : 1
10/12/28 17:08:23 INFO mapred.JobClient: Running job: job_201012281706_0001
10/12/28 17:08:23 INFO mapred.JobClient: Job configuration (HOP):
     map pipeline    = true
     reduce pipeline = true
     snapshot input  = false
     snapshot freq   = 0.1
10/12/28 17:08:24 INFO mapred.JobClient:  map 0% reduce 0%
10/12/28 17:08:28 INFO mapred.JobClient:  map 50% reduce 0%
10/12/28 17:08:29 INFO mapred.JobClient:  map 100% reduce 0%
10/12/28 17:08:31 INFO mapred.JobClient:  map 100% reduce 100%
10/12/28 17:08:32 INFO mapred.JobClient: Job complete: job_201012281706_0001
10/12/28 17:08:32 INFO mapred.JobClient: Elapsed time: 9523 msec.
10/12/28 17:08:32 INFO mapred.JobClient: Counters: 13
10/12/28 17:08:32 INFO mapred.JobClient:   File Systems
10/12/28 17:08:32 INFO mapred.JobClient:     HDFS bytes read=1577051
10/12/28 17:08:32 INFO mapred.JobClient:     Local bytes read=1590902
10/12/28 17:08:32 INFO mapred.JobClient:     Local bytes written=3926147
10/12/28 17:08:32 INFO mapred.JobClient:   Job Counters
10/12/28 17:08:32 INFO mapred.JobClient:     Launched reduce tasks=1
10/12/28 17:08:32 INFO mapred.JobClient:     Rack-local map tasks=2
10/12/28 17:08:32 INFO mapred.JobClient:     Launched map tasks=2
10/12/28 17:08:32 INFO mapred.JobClient:   Map-Reduce Framework
10/12/28 17:08:32 INFO mapred.JobClient:     Reduce input groups=0
10/12/28 17:08:32 INFO mapred.JobClient:     Combine output records=0
10/12/28 17:08:32 INFO mapred.JobClient:     Map input records=33055
10/12/28 17:08:32 INFO mapred.JobClient:     Reduce output records=0
10/12/28 17:08:32 INFO mapred.JobClient:     Map input bytes=1573044
10/12/28 17:08:32 INFO mapred.JobClient:     Combine input records=0
10/12/28 17:08:32 INFO mapred.JobClient:     Reduce input records=0


Map took 5 seconds and reduce 2 seconds; the whole job took about 8 seconds by the progress lines (the log reports an elapsed time of 9,523 ms).


2.


Running on two machines; the data file is about 250 MB.

2.1 Hadoop:

hadoop@hadoop:~$ hadoop jar /home/hadoop/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount input output
11/01/22 14:29:51 INFO input.FileInputFormat: Total input paths to process : 1
11/01/22 14:29:51 INFO mapred.JobClient: Running job: job_201101221428_0001
11/01/22 14:29:52 INFO mapred.JobClient:  map 0% reduce 0%
11/01/22 14:30:06 INFO mapred.JobClient:  map 12% reduce 0%
11/01/22 14:30:09 INFO mapred.JobClient:  map 34% reduce 0%
11/01/22 14:30:12 INFO mapred.JobClient:  map 51% reduce 0%
11/01/22 14:30:15 INFO mapred.JobClient:  map 69% reduce 0%
11/01/22 14:30:18 INFO mapred.JobClient:  map 87% reduce 0%
11/01/22 14:30:21 INFO mapred.JobClient:  map 97% reduce 0%
11/01/22 14:30:24 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 14:30:39 INFO mapred.JobClient:  map 100% reduce 16%
11/01/22 14:30:45 INFO mapred.JobClient:  map 100% reduce 25%
11/01/22 14:30:55 INFO mapred.JobClient:  map 100% reduce 69%
11/01/22 14:30:58 INFO mapred.JobClient:  map 100% reduce 72%
11/01/22 14:31:01 INFO mapred.JobClient:  map 100% reduce 76%
11/01/22 14:31:04 INFO mapred.JobClient:  map 100% reduce 79%
11/01/22 14:31:07 INFO mapred.JobClient:  map 100% reduce 83%
11/01/22 14:31:10 INFO mapred.JobClient:  map 100% reduce 87%
11/01/22 14:31:13 INFO mapred.JobClient:  map 100% reduce 90%
11/01/22 14:31:16 INFO mapred.JobClient:  map 100% reduce 94%
11/01/22 14:31:22 INFO mapred.JobClient:  map 100% reduce 100%
11/01/22 14:31:24 INFO mapred.JobClient: Job complete: job_201101221428_0001
11/01/22 14:31:24 INFO mapred.JobClient: Counters: 18
11/01/22 14:31:24 INFO mapred.JobClient:   Job Counters
11/01/22 14:31:24 INFO mapred.JobClient:     Launched reduce tasks=1
11/01/22 14:31:24 INFO mapred.JobClient:     Rack-local map tasks=2
11/01/22 14:31:24 INFO mapred.JobClient:     Launched map tasks=4
11/01/22 14:31:24 INFO mapred.JobClient:     Data-local map tasks=2
11/01/22 14:31:24 INFO mapred.JobClient:   FileSystemCounters
11/01/22 14:31:24 INFO mapred.JobClient:     FILE_BYTES_READ=899092936
11/01/22 14:31:24 INFO mapred.JobClient:     HDFS_BYTES_READ=264087722
11/01/22 14:31:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1262226400
11/01/22 14:31:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=297057516
11/01/22 14:31:24 INFO mapred.JobClient:   Map-Reduce Framework
11/01/22 14:31:24 INFO mapred.JobClient:     Reduce input groups=16518949
11/01/22 14:31:24 INFO mapred.JobClient:     Combine output records=33037898
11/01/22 14:31:24 INFO mapred.JobClient:     Map input records=16522439
11/01/22 14:31:24 INFO mapred.JobClient:     Reduce shuffle bytes=276839460
11/01/22 14:31:24 INFO mapred.JobClient:     Reduce output records=16518949
11/01/22 14:31:24 INFO mapred.JobClient:     Spilled Records=57419405
11/01/22 14:31:24 INFO mapred.JobClient:     Map output bytes=330165187
11/01/22 14:31:24 INFO mapred.JobClient:     Combine input records=33041388
11/01/22 14:31:24 INFO mapred.JobClient:     Map output records=16522439
11/01/22 14:31:24 INFO mapred.JobClient:     Reduce input records=16518949

Map took 32 seconds, reduce 58 seconds, and the whole job 93 seconds.

2.2 HOP:

hadoop@hadoop:~$ /home/hadoop/hadoop-hop-0.2/bin/hadoop jar /home/hadoop/hadoop-hop-0.2/hadoop-hop-0.2-examples.jar wordcount input output
11/01/22 14:36:57 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/22 14:36:57 INFO mapred.JobClient: Running job: job_201101221435_0001
11/01/22 14:36:57 INFO mapred.JobClient: Job configuration (HOP):
     map pipeline    = true
     reduce pipeline = true
     snapshot input  = false
     snapshot freq   = 0.1
11/01/22 14:36:58 INFO mapred.JobClient:  map 0% reduce 0%
11/01/22 14:37:14 INFO mapred.JobClient:  map 9% reduce 0%
11/01/22 14:37:18 INFO mapred.JobClient:  map 24% reduce 0%
11/01/22 14:37:19 INFO mapred.JobClient:  map 33% reduce 0%
11/01/22 14:37:23 INFO mapred.JobClient:  map 46% reduce 0%
11/01/22 14:37:24 INFO mapred.JobClient:  map 56% reduce 0%
11/01/22 14:37:28 INFO mapred.JobClient:  map 86% reduce 0%
11/01/22 14:37:31 INFO mapred.JobClient:  map 91% reduce 0%
11/01/22 14:37:33 INFO mapred.JobClient:  map 99% reduce 0%
11/01/22 14:37:40 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 14:37:44 INFO mapred.JobClient:  map 100% reduce 11%
11/01/22 14:38:19 INFO mapred.JobClient:  map 100% reduce 18%
11/01/22 14:39:04 INFO mapred.JobClient:  map 100% reduce 23%
11/01/22 14:39:58 INFO mapred.JobClient:  map 100% reduce 30%
11/01/22 14:41:03 INFO mapred.JobClient:  map 100% reduce 37%
11/01/22 14:42:24 INFO mapred.JobClient:  map 100% reduce 45%
11/01/22 14:44:04 INFO mapred.JobClient:  map 100% reduce 52%
11/01/22 14:44:09 INFO mapred.JobClient:  map 100% reduce 56%
11/01/22 14:54:13 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 14:54:14 INFO mapred.JobClient: Task Id : attempt_201101221435_0001_r_000000_0, Status : FAILED
Task attempt_201101221435_0001_r_000000_0 failed to report status for 601 seconds. Killing!


Map took 42 seconds, but the first 52% of reduce took 6 min 25 s (under stock Hadoop, reduce went from 16% to 69% in 16 s). The whole job failed, and every repeated run failed at the same point, around 56% of reduce. The cause was not clear at the time; the OutOfMemoryError traces in the reruns below suggest the pipelined reducer was exhausting its heap here as well.
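The "failed to report status for 601 seconds" message matches Hadoop's default task timeout (`mapred.task.timeout`, 600,000 ms): the reducer stopped reporting progress and the framework killed it. Raising the timeout would only postpone the kill, not fix the hang, but for experimentation it can be overridden per job — assuming the example driver accepts generic `-D` options; otherwise set it in `conf/mapred-site.xml`. The 1,800,000 ms value below is an arbitrary example:

```shell
# Untested tweak: give the pipelined reducer more time before the framework
# kills it (mapred.task.timeout defaults to 600000 ms in Hadoop 0.20).
# This masks the hang rather than fixing it.
/home/hadoop/hadoop-hop-0.2/bin/hadoop jar \
  /home/hadoop/hadoop-hop-0.2/hadoop-hop-0.2-examples.jar wordcount \
  -D mapred.task.timeout=1800000 \
  input output
```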


2.2.1

Switching to a different data file, this time about 266 MB, gave the following result:

hadoop@hadoop:~$ /home/hadoop/hadoop-hop-0.2/bin/hadoop jar /home/hadoop/hadoop-hop-0.2/hadoop-hop-0.2-examples.jar wordcount input output
11/01/22 15:29:25 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/22 15:29:25 INFO mapred.JobClient: Running job: job_201101221528_0001
11/01/22 15:29:25 INFO mapred.JobClient: Job configuration (HOP):
     map pipeline    = true
     reduce pipeline = true
     snapshot input  = false
     snapshot freq   = 0.1
11/01/22 15:29:26 INFO mapred.JobClient:  map 0% reduce 0%
11/01/22 15:29:42 INFO mapred.JobClient:  map 13% reduce 0%
11/01/22 15:29:46 INFO mapred.JobClient:  map 27% reduce 0%
11/01/22 15:29:47 INFO mapred.JobClient:  map 38% reduce 0%
11/01/22 15:29:51 INFO mapred.JobClient:  map 56% reduce 0%
11/01/22 15:29:52 INFO mapred.JobClient:  map 58% reduce 0%
11/01/22 15:29:56 INFO mapred.JobClient:  map 63% reduce 0%
11/01/22 15:29:57 INFO mapred.JobClient:  map 94% reduce 0%
11/01/22 15:29:58 INFO mapred.JobClient:  map 95% reduce 0%
11/01/22 15:30:00 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 15:30:11 INFO mapred.JobClient:  map 100% reduce 24%
11/01/22 15:30:26 INFO mapred.JobClient:  map 100% reduce 44%
11/01/22 15:41:04 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 15:41:04 INFO mapred.JobClient: Task Id : attempt_201101221528_0001_r_000000_0, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.buffer.impl.JOutputBuffer.malloc(JOutputBuffer.java:738)
    at org.apache.hadoop.mapred.ReduceTask.reduce(ReduceTask.java:449)
    at org.apache.hadoop.mapred.ReduceTask.copy(ReduceTask.java:395)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
    at org.apache.hadoop.mapred.Child.main(Child.java:180)

Task attempt_201101221528_0001_r_000000_0 failed to report status for 601 seconds. Killing!
attempt_201101221528_0001_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.buffer.impl.JInputBuffer).
attempt_201101221528_0001_r_000000_0: log4j:WARN Please initialize the log4j system properly.

The job failed again at this point. Incidentally, the map phase took 35 s.
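The stack trace points at HOP's `JOutputBuffer` exhausting the reduce task's JVM heap (the Hadoop 0.20-era default for `mapred.child.java.opts` is `-Xmx200m`). A plausible, untested mitigation is to give the child JVMs a larger heap; `-Xmx512m` below is an arbitrary example, and again this assumes the example driver passes generic `-D` options through (otherwise set the property in `conf/mapred-site.xml`):

```shell
# Untested mitigation for the JOutputBuffer OutOfMemoryError:
# enlarge the task JVM heap (default is -Xmx200m in Hadoop 0.20).
/home/hadoop/hadoop-hop-0.2/bin/hadoop jar \
  /home/hadoop/hadoop-hop-0.2/hadoop-hop-0.2-examples.jar wordcount \
  -D mapred.child.java.opts=-Xmx512m \
  input output
```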


Stock Hadoop under the same conditions:
hadoop@hadoop:~$ hadoop jar /home/hadoop/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount input output
11/01/22 16:00:32 INFO input.FileInputFormat: Total input paths to process : 1
11/01/22 16:00:33 INFO mapred.JobClient: Running job: job_201101221547_0001
11/01/22 16:00:34 INFO mapred.JobClient:  map 0% reduce 0%
11/01/22 16:00:47 INFO mapred.JobClient:  map 14% reduce 0%
11/01/22 16:00:50 INFO mapred.JobClient:  map 36% reduce 0%
11/01/22 16:00:53 INFO mapred.JobClient:  map 60% reduce 0%
11/01/22 16:00:56 INFO mapred.JobClient:  map 78% reduce 0%
11/01/22 16:00:59 INFO mapred.JobClient:  map 79% reduce 0%
11/01/22 16:01:02 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 16:01:08 INFO mapred.JobClient:  map 100% reduce 20%
11/01/22 16:01:12 INFO mapred.JobClient:  map 100% reduce 26%
11/01/22 16:01:21 INFO mapred.JobClient:  map 100% reduce 72%
11/01/22 16:01:24 INFO mapred.JobClient:  map 100% reduce 81%
11/01/22 16:01:27 INFO mapred.JobClient:  map 100% reduce 91%
11/01/22 16:01:33 INFO mapred.JobClient:  map 100% reduce 100%
11/01/22 16:01:35 INFO mapred.JobClient: Job complete: job_201101221547_0001
11/01/22 16:01:35 INFO mapred.JobClient: Counters: 18
11/01/22 16:01:35 INFO mapred.JobClient:   Job Counters
11/01/22 16:01:35 INFO mapred.JobClient:     Launched reduce tasks=1
11/01/22 16:01:35 INFO mapred.JobClient:     Rack-local map tasks=3
11/01/22 16:01:35 INFO mapred.JobClient:     Launched map tasks=5
11/01/22 16:01:35 INFO mapred.JobClient:     Data-local map tasks=2
11/01/22 16:01:35 INFO mapred.JobClient:   FileSystemCounters
11/01/22 16:01:35 INFO mapred.JobClient:     FILE_BYTES_READ=564627888
11/01/22 16:01:35 INFO mapred.JobClient:     HDFS_BYTES_READ=279755261
11/01/22 16:01:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=848808480
11/01/22 16:01:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=260717055
11/01/22 16:01:35 INFO mapred.JobClient:   Map-Reduce Framework
11/01/22 16:01:35 INFO mapred.JobClient:     Reduce input groups=4745187
11/01/22 16:01:35 INFO mapred.JobClient:     Combine output records=9947422
11/01/22 16:01:35 INFO mapred.JobClient:     Map input records=4301230
11/01/22 16:01:35 INFO mapred.JobClient:     Reduce shuffle bytes=215654605
11/01/22 16:01:35 INFO mapred.JobClient:     Reduce output records=4745187
11/01/22 16:01:35 INFO mapred.JobClient:     Spilled Records=14864138
11/01/22 16:01:35 INFO mapred.JobClient:     Map output bytes=305910102
11/01/22 16:01:35 INFO mapred.JobClient:     Combine input records=11573827
11/01/22 16:01:35 INFO mapred.JobClient:     Map output records=6543121
11/01/22 16:01:35 INFO mapred.JobClient:     Reduce input records=4916716


Going by the progress lines, the whole Hadoop job took about 62 s, of which the map phase took about 28 s — so Hadoop's map was actually faster than the 35 s under HOP, though HOP's reduce never finished. Running HOP a second time under the same conditions gave:
hadoop@hadoop:~$ /home/hadoop/hadoop-hop-0.2/bin/hadoop jar /home/hadoop/hadoop-hop-0.2/hadoop-hop-0.2-examples.jar wordcount input output
11/01/22 16:15:44 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/22 16:15:45 INFO mapred.JobClient: Running job: job_201101221611_0001
11/01/22 16:15:45 INFO mapred.JobClient: Job configuration (HOP):
     map pipeline    = true
     reduce pipeline = true
     snapshot input  = false
     snapshot freq   = 0.1
11/01/22 16:15:46 INFO mapred.JobClient:  map 0% reduce 0%
11/01/22 16:15:58 INFO mapred.JobClient:  map 12% reduce 0%
11/01/22 16:16:02 INFO mapred.JobClient:  map 26% reduce 0%
11/01/22 16:16:03 INFO mapred.JobClient:  map 38% reduce 0%
11/01/22 16:16:06 INFO mapred.JobClient:  map 45% reduce 0%
11/01/22 16:16:07 INFO mapred.JobClient:  map 63% reduce 0%
11/01/22 16:16:11 INFO mapred.JobClient:  map 67% reduce 10%
11/01/22 16:16:12 INFO mapred.JobClient:  map 71% reduce 10%
11/01/22 16:16:13 INFO mapred.JobClient:  map 97% reduce 10%
11/01/22 16:16:14 INFO mapred.JobClient:  map 100% reduce 10%
11/01/22 16:27:17 INFO mapred.JobClient:  map 100% reduce 0%
11/01/22 16:27:17 INFO mapred.JobClient: Task Id : attempt_201101221611_0001_r_000000_0, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.buffer.impl.JOutputBuffer.<init>(JOutputBuffer.java:632)
    at org.apache.hadoop.mapred.ReduceTask.reduce(ReduceTask.java:446)
    at org.apache.hadoop.mapred.ReduceTask.copy(ReduceTask.java:395)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
    at org.apache.hadoop.mapred.Child.main(Child.java:180)

Task attempt_201101221611_0001_r_000000_0 failed to report status for 602 seconds. Killing!
attempt_201101221611_0001_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.buffer.impl.JInputBuffer).
attempt_201101221611_0001_r_000000_0: log4j:WARN Please initialize the log4j system properly.
11/01/22 16:27:27 INFO mapred.JobClient:  map 100% reduce 11%
