Giraph 1.2 includes dedicated optimizations for out-of-core (OOC) processing.
First, enable the option:
<property>
<name>giraph.useOutOfCoreGraph</name>
<value>true</value>
</property>
A common pitfall here lies in the flow_control package: the server side defaults to NoOpFlowControl, while the worker side defaults to CreditBasedFlowControl. Because the two sides generate responses under different mechanisms, the responseIds they produce do not match and an error is thrown. You therefore also need to set:
<property>
<name>giraph.waitForPerWorkerRequests</name>
<value>true</value>
</property>
This ensures that both sides use CreditBasedFlowControl.
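If you submit jobs programmatically rather than through giraph-site.xml, the same two keys can also be set on the job configuration. The sketch below is only illustrative (it assumes the standard GiraphConfiguration API from Giraph 1.2; the string keys are exactly the property names shown above):

import org.apache.giraph.conf.GiraphConfiguration;

public class OocConfig {
  /** Build a configuration with out-of-core support and matching flow control. */
  public static GiraphConfiguration buildOocConf() {
    GiraphConfiguration conf = new GiraphConfiguration();
    // Enable the out-of-core graph support introduced in 1.2.
    conf.setBoolean("giraph.useOutOfCoreGraph", true);
    // Per the note above, this makes both server and worker use
    // CreditBasedFlowControl, so the generated responseIds stay consistent.
    conf.setBoolean("giraph.waitForPerWorkerRequests", true);
    return conf;
  }
}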
In addition, if too few workers are configured, the job fails with an out-of-memory (OOM) error like the following:
java.lang.IllegalStateException: Exception occurred
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:274)
at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:821)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:365)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:271)
... 10 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:82)
at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:73)
at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1188)
at org.apache.giraph.utils.io.ExtendedDataInputOutput.<init>(ExtendedDataInputOutput.java:47)
at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createMessagesInputOutput(ImmutableClassesGiraphConfiguration.java:1177)
at org.apache.giraph.comm.messages.primitives.IntByteArrayMessageStore.getDataInputOutput(IntByteArrayMessageStore.java:124)
at org.apache.giraph.comm.messages.primitives.IntByteArrayMessageStore.addPartitionMessages(IntByteArrayMessageStore.java:181)
at org.apache.giraph.ooc.data.DiskBackedMessageStore.addEntryToInMemoryPartitionData(DiskBackedMessageStore.java:283)
at org.apache.giraph.ooc.data.DiskBackedMessageStore.addEntryToInMemoryPartitionData(DiskBackedMessageStore.java:1)
at org.apache.giraph.ooc.data.DiskBackedDataStore.addEntry(DiskBackedDataStore.java:200)
at org.apache.giraph.ooc.data.DiskBackedMessageStore.addPartitionMessages(DiskBackedMessageStore.java:136)
at org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:94)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:472)
at org.apache.giraph.comm.SendMessageCache.flush(SendMessageCache.java:257)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:404)
at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:253)
at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:1)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
As the stack trace shows, the worker runs out of memory while storing locally received messages.
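Besides enabling out-of-core support, the usual mitigations are to raise the worker count (-w) or to give each mapper a larger heap. A hedged sketch continuing the configuration above (setWorkerConfiguration is part of GiraphConfiguration, mapred.child.java.opts is the Hadoop 1.x mapper heap setting; the concrete numbers are only examples):

import org.apache.giraph.conf.GiraphConfiguration;

public class OocMemoryTuning {
  /** Spread the graph over more workers and enlarge each mapper's heap. */
  public static void tune(GiraphConfiguration conf) {
    // Equivalent to "-w 3" on the command line: min/max workers plus the
    // percentage of workers that must respond before the job starts.
    conf.setWorkerConfiguration(3, 3, 100.0f);
    // Hadoop 1.x child JVM options; -Xmx2g is only an illustrative value.
    conf.set("mapred.child.java.opts", "-Xmx2g");
  }
}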
After applying these settings, run the command:
giraph ../giraph-core-1.2.0.jar org.apache.giraph.benchmark.PageRankComputation -vif org.apache.giraph.io.formats.IntFloatNullTextInputFormat -vip /test/youTube.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /output -w 3
Output:
No HADOOP_CONF_DIR set, using /opt/hadoop-1.2.1/conf
16/12/12 00:02:26 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
16/12/12 00:02:26 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
16/12/12 00:02:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
16/12/12 00:02:32 INFO job.GiraphJob: Tracking URL: http://mu02:50030/jobdetails.jsp?jobid=job_201612092054_0044
16/12/12 00:02:32 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 4 mappers
16/12/12 00:03:19 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer c02b13:22181 --zkNode /_hadoopBsp/job_201612092054_0044/_haltComputation'
16/12/12 00:03:19 INFO mapred.JobClient: Running job: job_201612092054_0044
16/12/12 00:03:20 INFO mapred.JobClient: map 100% reduce 0%
16/12/12 00:03:34 INFO mapred.JobClient: Job complete: job_201612092054_0044
16/12/12 00:03:34 INFO mapred.JobClient: Counters: 47
16/12/12 00:03:34 INFO mapred.JobClient: Zookeeper halt node
16/12/12 00:03:34 INFO mapred.JobClient: /_hadoopBsp/job_201612092054_0044/_haltComputation=0
16/12/12 00:03:34 INFO mapred.JobClient: Zookeeper base path
16/12/12 00:03:34 INFO mapred.JobClient: /_hadoopBsp/job_201612092054_0044=0
16/12/12 00:03:34 INFO mapred.JobClient: Job Counters
16/12/12 00:03:34 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=135763
16/12/12 00:03:34 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/12 00:03:34 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/12/12 00:03:34 INFO mapred.JobClient: Launched map tasks=4
16/12/12 00:03:34 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
16/12/12 00:03:34 INFO mapred.JobClient: Giraph Timers
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 5 PageRankComputation (ms)=1325
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 0 PageRankComputation (ms)=1086
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 3 PageRankComputation (ms)=1685
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 1 PageRankComputation (ms)=2478
16/12/12 00:03:34 INFO mapred.JobClient: Input superstep (ms)=5188
16/12/12 00:03:34 INFO mapred.JobClient: Total (ms)=26390
16/12/12 00:03:34 INFO mapred.JobClient: Shutdown (ms)=10014
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 4 PageRankComputation (ms)=2016
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 2 PageRankComputation (ms)=1958
16/12/12 00:03:34 INFO mapred.JobClient: Initialize (ms)=14028
16/12/12 00:03:34 INFO mapred.JobClient: Superstep 6 PageRankComputation (ms)=567
16/12/12 00:03:34 INFO mapred.JobClient: Setup (ms)=69
16/12/12 00:03:34 INFO mapred.JobClient: Zookeeper server:port
16/12/12 00:03:34 INFO mapred.JobClient: c02b13:22181=0
16/12/12 00:03:34 INFO mapred.JobClient: Giraph Stats
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate bytes loaded from local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient: Sent message bytes=0
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate bytes stored to local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient: Current workers=3
16/12/12 00:03:34 INFO mapred.JobClient: Last checkpointed superstep=0
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate sent messages=17925744
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate finished vertices=1134890
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate vertices=1134890
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate edges=2987624
16/12/12 00:03:34 INFO mapred.JobClient: Superstep=7
16/12/12 00:03:34 INFO mapred.JobClient: Aggregate sent message bytes=143419884
16/12/12 00:03:34 INFO mapred.JobClient: Current master task partition=0
16/12/12 00:03:34 INFO mapred.JobClient: Sent messages=0
16/12/12 00:03:34 INFO mapred.JobClient: Lowest percentage of graph in memory so far (out-of-core)=100
16/12/12 00:03:34 INFO mapred.JobClient: File Output Format Counters
16/12/12 00:03:34 INFO mapred.JobClient: Bytes Written=0
16/12/12 00:03:34 INFO mapred.JobClient: FileSystemCounters
16/12/12 00:03:34 INFO mapred.JobClient: HDFS_BYTES_READ=29531257
16/12/12 00:03:34 INFO mapred.JobClient: FILE_BYTES_WRITTEN=490099
16/12/12 00:03:34 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=20394140
16/12/12 00:03:34 INFO mapred.JobClient: File Input Format Counters
16/12/12 00:03:34 INFO mapred.JobClient: Bytes Read=0
16/12/12 00:03:34 INFO mapred.JobClient: Map-Reduce Framework
16/12/12 00:03:34 INFO mapred.JobClient: Map input records=4
16/12/12 00:03:34 INFO mapred.JobClient: Physical memory (bytes) snapshot=1087893504
16/12/12 00:03:34 INFO mapred.JobClient: Spilled Records=0
16/12/12 00:03:34 INFO mapred.JobClient: CPU time spent (ms)=184690
16/12/12 00:03:34 INFO mapred.JobClient: Total committed heap usage (bytes)=722993152
16/12/12 00:03:34 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3353665536
16/12/12 00:03:34 INFO mapred.JobClient: Map output records=0
16/12/12 00:03:34 INFO mapred.JobClient: SPLIT_RAW_BYTES=176