Giraph V1.2 DiskMessage: Runtime Settings


Giraph V1.2 includes specific optimizations for out-of-core (OOC) execution.


First, enable the following option:

<property>
 <name>giraph.useOutOfCoreGraph</name>
 <value>true</value>
</property>

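A common pitfall here involves the flowcontrol package: the server side defaults to No_OP, while the worker side defaults to CreditBasedFlowControl. Because the two sides generate responses through different mechanisms, the responseIds they produce do not match, and the job fails with an error. The following option therefore also needs to be set: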

<property>
 <name>giraph.waitForPerWorkerRequests</name>
 <value>true</value>
</property>

This ensures that both sides use CreditBasedFlowControl.
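Since a missing second option only surfaces as a runtime error, a pre-flight check can be handy. The following is a minimal sketch, assuming the two properties live in a Hadoop-style file with a `<configuration>` root element; that wrapper, the embedded sample text, and the helper names `read_properties` and `missing_settings` are illustrative assumptions, not part of Giraph.

```python
import xml.etree.ElementTree as ET

# Hypothetical config content. The <configuration> wrapper is an assumption
# based on the usual Hadoop-style site file layout.
CONFIG_XML = """
<configuration>
  <property>
    <name>giraph.useOutOfCoreGraph</name>
    <value>true</value>
  </property>
  <property>
    <name>giraph.waitForPerWorkerRequests</name>
    <value>true</value>
  </property>
</configuration>
"""

# The two options this post says must both be enabled.
REQUIRED = ("giraph.useOutOfCoreGraph", "giraph.waitForPerWorkerRequests")


def read_properties(xml_text):
    """Return a {name: value} dict built from <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def missing_settings(props, required=REQUIRED):
    """List the required options that are absent or not set to 'true'."""
    return [key for key in required if props.get(key, "").lower() != "true"]


if __name__ == "__main__":
    problems = missing_settings(read_properties(CONFIG_XML))
    print("missing or disabled:", problems)
```

An empty list means both options are enabled; any name in the list is an option that still needs to be set before submitting the job.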

In addition, if too few workers are configured, the job fails with an out-of-memory (OOM) error:

java.lang.IllegalStateException: Exception occurred
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:274)
	at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:821)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:365)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:271)
	... 10 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:82)
	at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:73)
	at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1188)
	at org.apache.giraph.utils.io.ExtendedDataInputOutput.<init>(ExtendedDataInputOutput.java:47)
	at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createMessagesInputOutput(ImmutableClassesGiraphConfiguration.java:1177)
	at org.apache.giraph.comm.messages.primitives.IntByteArrayMessageStore.getDataInputOutput(IntByteArrayMessageStore.java:124)
	at org.apache.giraph.comm.messages.primitives.IntByteArrayMessageStore.addPartitionMessages(IntByteArrayMessageStore.java:181)
	at org.apache.giraph.ooc.data.DiskBackedMessageStore.addEntryToInMemoryPartitionData(DiskBackedMessageStore.java:283)
	at org.apache.giraph.ooc.data.DiskBackedMessageStore.addEntryToInMemoryPartitionData(DiskBackedMessageStore.java:1)
	at org.apache.giraph.ooc.data.DiskBackedDataStore.addEntry(DiskBackedDataStore.java:200)
	at org.apache.giraph.ooc.data.DiskBackedMessageStore.addPartitionMessages(DiskBackedMessageStore.java:136)
	at org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:94)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:472)
	at org.apache.giraph.comm.SendMessageCache.flush(SendMessageCache.java:257)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:404)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:253)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:1)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

As the trace shows, the failure is caused by running out of memory while receiving local messages.
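The same reading of the trace can be done mechanically: take the innermost `Caused by:` section and look at which frames touch the message store. The sketch below works on an abbreviated copy of the trace above; the helper `root_cause` and the keyword filter are illustrative assumptions, not an established tool.

```python
# Abbreviated copy of the stack trace from this post (frames omitted for brevity).
TRACE = """\
java.lang.IllegalStateException: Exception occurred
\tat org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:274)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
\tat java.util.concurrent.FutureTask.report(FutureTask.java:122)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
\tat org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:82)
\tat org.apache.giraph.ooc.data.DiskBackedMessageStore.addPartitionMessages(DiskBackedMessageStore.java:136)
\tat org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:94)
\tat org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:253)
"""

PREFIX = "Caused by:"


def root_cause(trace):
    """Return (cause, frames) for the innermost 'Caused by:' section."""
    lines = trace.splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith(PREFIX)]
    start = starts[-1] if starts else 0  # fall back to the top-level exception
    cause = lines[start]
    if cause.startswith(PREFIX):
        cause = cause[len(PREFIX):]
    frames = [
        line.strip()[len("at "):]
        for line in lines[start + 1:]
        if line.strip().startswith("at ")
    ]
    return cause.strip(), frames


if __name__ == "__main__":
    cause, frames = root_cause(TRACE)
    print("root cause:", cause)
    # Frames on the message-receiving path (keyword choice is a heuristic).
    for frame in frames:
        if "MessageStore" in frame or "SendWorkerMessagesRequest" in frame:
            print("  message path:", frame)
```

The `ComputeCallable` frame beneath `SendWorkerMessagesRequest.doRequest` is what indicates the request was handled locally by the compute thread rather than arriving over the network.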


With these settings in place, run the command:

 giraph ../giraph-core-1.2.0.jar  org.apache.giraph.benchmark.PageRankComputation -vif  org.apache.giraph.io.formats.IntFloatNullTextInputFormat -vip /test/youTube.txt  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /output  -w 3
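The command above relies on the two properties being present in the configuration. As an alternative, the `giraph` launcher (GiraphRunner) accepts custom configuration entries through its `-ca name=value` option; whether you prefer that over the XML file is a matter of taste. The sketch below only assembles and prints such a command line, and the helper name `build_command` is hypothetical.

```python
import shlex

# The two options from this post, to be passed as -ca name=value pairs.
CUSTOM_ARGS = {
    "giraph.useOutOfCoreGraph": "true",
    "giraph.waitForPerWorkerRequests": "true",
}


def build_command(workers, custom_args):
    """Assemble the argv list for the PageRank run used in this post."""
    argv = [
        "giraph", "../giraph-core-1.2.0.jar",
        "org.apache.giraph.benchmark.PageRankComputation",
        "-vif", "org.apache.giraph.io.formats.IntFloatNullTextInputFormat",
        "-vip", "/test/youTube.txt",
        "-vof", "org.apache.giraph.io.formats.IdWithValueTextOutputFormat",
        "-op", "/output",
        "-w", str(workers),
    ]
    # One -ca flag per option keeps each setting easy to spot in shell history.
    for key, value in custom_args.items():
        argv += ["-ca", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes arguments safely for copy-pasting.
    print(shlex.join(build_command(3, CUSTOM_ARGS)))
```

Nothing is executed here; the printed line can be pasted into a shell on the cluster's client node.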

Result:

No HADOOP_CONF_DIR set, using /opt/hadoop-1.2.1/conf 
16/12/12 00:02:26 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
16/12/12 00:02:26 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
16/12/12 00:02:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
16/12/12 00:02:32 INFO job.GiraphJob: Tracking URL: http://mu02:50030/jobdetails.jsp?jobid=job_201612092054_0044
16/12/12 00:02:32 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 4 mappers
16/12/12 00:03:19 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer c02b13:22181 --zkNode /_hadoopBsp/job_201612092054_0044/_haltComputation'
16/12/12 00:03:19 INFO mapred.JobClient: Running job: job_201612092054_0044
16/12/12 00:03:20 INFO mapred.JobClient:  map 100% reduce 0%
16/12/12 00:03:34 INFO mapred.JobClient: Job complete: job_201612092054_0044
16/12/12 00:03:34 INFO mapred.JobClient: Counters: 47
16/12/12 00:03:34 INFO mapred.JobClient:   Zookeeper halt node
16/12/12 00:03:34 INFO mapred.JobClient:     /_hadoopBsp/job_201612092054_0044/_haltComputation=0
16/12/12 00:03:34 INFO mapred.JobClient:   Zookeeper base path
16/12/12 00:03:34 INFO mapred.JobClient:     /_hadoopBsp/job_201612092054_0044=0
16/12/12 00:03:34 INFO mapred.JobClient:   Job Counters 
16/12/12 00:03:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=135763
16/12/12 00:03:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Launched map tasks=4
16/12/12 00:03:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
16/12/12 00:03:34 INFO mapred.JobClient:   Giraph Timers
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 5 PageRankComputation (ms)=1325
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 0 PageRankComputation (ms)=1086
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 3 PageRankComputation (ms)=1685
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 1 PageRankComputation (ms)=2478
16/12/12 00:03:34 INFO mapred.JobClient:     Input superstep (ms)=5188
16/12/12 00:03:34 INFO mapred.JobClient:     Total (ms)=26390
16/12/12 00:03:34 INFO mapred.JobClient:     Shutdown (ms)=10014
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 4 PageRankComputation (ms)=2016
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 2 PageRankComputation (ms)=1958
16/12/12 00:03:34 INFO mapred.JobClient:     Initialize (ms)=14028
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 6 PageRankComputation (ms)=567
16/12/12 00:03:34 INFO mapred.JobClient:     Setup (ms)=69
16/12/12 00:03:34 INFO mapred.JobClient:   Zookeeper server:port
16/12/12 00:03:34 INFO mapred.JobClient:     c02b13:22181=0
16/12/12 00:03:34 INFO mapred.JobClient:   Giraph Stats
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate bytes loaded from local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Sent message bytes=0
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate bytes stored to local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Current workers=3
16/12/12 00:03:34 INFO mapred.JobClient:     Last checkpointed superstep=0
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate sent messages=17925744
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate finished vertices=1134890
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate vertices=1134890
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate edges=2987624
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep=7
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate sent message bytes=143419884
16/12/12 00:03:34 INFO mapred.JobClient:     Current master task partition=0
16/12/12 00:03:34 INFO mapred.JobClient:     Sent messages=0
16/12/12 00:03:34 INFO mapred.JobClient:     Lowest percentage of graph in memory so far (out-of-core)=100
16/12/12 00:03:34 INFO mapred.JobClient:   File Output Format Counters 
16/12/12 00:03:34 INFO mapred.JobClient:     Bytes Written=0
16/12/12 00:03:34 INFO mapred.JobClient:   FileSystemCounters
16/12/12 00:03:34 INFO mapred.JobClient:     HDFS_BYTES_READ=29531257
16/12/12 00:03:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=490099
16/12/12 00:03:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20394140
16/12/12 00:03:34 INFO mapred.JobClient:   File Input Format Counters 
16/12/12 00:03:34 INFO mapred.JobClient:     Bytes Read=0
16/12/12 00:03:34 INFO mapred.JobClient:   Map-Reduce Framework
16/12/12 00:03:34 INFO mapred.JobClient:     Map input records=4
16/12/12 00:03:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1087893504
16/12/12 00:03:34 INFO mapred.JobClient:     Spilled Records=0
16/12/12 00:03:34 INFO mapred.JobClient:     CPU time spent (ms)=184690
16/12/12 00:03:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=722993152
16/12/12 00:03:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3353665536
16/12/12 00:03:34 INFO mapred.JobClient:     Map output records=0
16/12/12 00:03:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=176


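As the timers show, each superstep takes a fairly long time, which is attributed to data being written to disk. (In this particular run, however, the out-of-core counters report 0 bytes stored to local disks and 100% of the graph kept in memory.)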
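The JobClient output lists the Giraph timers in no particular order, which makes the per-superstep times hard to compare at a glance. The following is a minimal sketch that parses a few of the counter lines from the log above, sorts the superstep timers, and pulls out the out-of-core counters; the helper names and regular expressions are illustrative assumptions about this log format, not a Giraph utility.

```python
import re

# A subset of the counter lines from the job output in this post.
LOG = """\
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 5 PageRankComputation (ms)=1325
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 0 PageRankComputation (ms)=1086
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 3 PageRankComputation (ms)=1685
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 1 PageRankComputation (ms)=2478
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 4 PageRankComputation (ms)=2016
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 2 PageRankComputation (ms)=1958
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 6 PageRankComputation (ms)=567
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate bytes loaded from local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate bytes stored to local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Lowest percentage of graph in memory so far (out-of-core)=100
"""

COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(.+)=(\d+)\s*$")
SUPERSTEP_RE = re.compile(r"^Superstep (\d+) \S+ \(ms\)$")


def parse_counters(log_text):
    """Map each counter name to its integer value."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def superstep_times(counters):
    """Return [(superstep, milliseconds)] sorted by superstep number."""
    times = []
    for name, value in counters.items():
        match = SUPERSTEP_RE.match(name)
        if match:
            times.append((int(match.group(1)), value))
    return sorted(times)


if __name__ == "__main__":
    counters = parse_counters(LOG)
    for step, ms in superstep_times(counters):
        print(f"superstep {step}: {ms} ms")
    print("compute total (ms):", sum(ms for _, ms in superstep_times(counters)))
    for name, value in counters.items():
        if "(out-of-core)" in name:
            print(f"{name} = {value}")
```

Comparing the sorted timings from runs with and without `giraph.useOutOfCoreGraph`, alongside the out-of-core byte counters, gives a more direct picture of how much of the superstep time is actually spent on disk I/O.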
