Notes: Fixing Errors in Hama 0.7.1


1. Vertex NullPointerException

While Hama was running, the following error log appeared:

20171110152105951 INFO  org.apache.hama.ipc.Server#run() L346: Starting Socket Reader #1 for port 61001
20171110152105953 INFO  org.apache.hama.ipc.Server#run() L639: IPC Server Responder: starting
20171110152105953 INFO  org.apache.hama.ipc.Server#run() L468: IPC Server listener on 61001: starting
20171110152105954 INFO  org.apache.hama.ipc.Server#run() L1182: IPC Server handler 0 on 61001: starting
20171110152105956 INFO  apache.hama.bsp.message.HamaMessageManagerImpl#startServer() L101: BSPPeer address:master port:61001
20171110152105956 INFO  org.apache.hama.ipc.Server#run() L1182: IPC Server handler 4 on 61001: starting
20171110152105955 INFO  org.apache.hama.ipc.Server#run() L1182: IPC Server handler 3 on 61001: starting
20171110152105955 INFO  org.apache.hama.ipc.Server#run() L1182: IPC Server handler 2 on 61001: starting
20171110152105954 INFO  org.apache.hama.ipc.Server#run() L1182: IPC Server handler 1 on 61001: starting
20171110152106230 INFO  apache.hama.bsp.sync.ZKSyncClient#initialize() L65: Initializing ZK Sync Client
20171110152106230 INFO  apache.hama.bsp.sync.ZooKeeperSyncClientImpl#init() L85: Start connecting to Zookeeper! At master/192.168.1.201:61001
20171110152106286 INFO  org.apache.hama.graph.GraphJobRunner#loadVertices() L399: EE: org.apache.hama.graph.MapVerticesInfo@73163d48
20171110152106287 INFO  org.apache.hama.graph.GraphJobRunner#loadVertices() L400: EE: org.apache.hama.bsp.BSPPeerImpl@58c34bb3
20171110152134081 INFO  org.apache.hama.graph.GraphJobRunner#loadVertices() L460: 3363000 vertices are loaded into master:61001
20171110152134081 INFO  org.apache.hama.graph.GraphJobRunner#setup() L136: Total time spent for loading vertices: 27795 ms
20171110152137279 INFO  org.apache.hama.graph.GraphJobRunner#setup() L140: Total time spent for broadcasting global vertex count: 3195 ms
20171110152200413 INFO  org.apache.hama.graph.GraphJobRunner#setup() L145: Total time spent for initial superstep: 23133 ms
20171110152205319 INFO  org.apache.hama.graph.GraphJobRunner#bsp() L166: Total time spent for broadcasting aggregation values: 19 ms
20171110152209237 ERROR org.apache.hama.bsp.BSPTask#runBSP() L173: Error running bsp setup and bsp function.
java.lang.NullPointerException
    at org.apache.hama.util.UnsafeByteArrayInputStream.<init>(UnsafeByteArrayInputStream.java:63)
    at org.apache.hama.util.WritableUtils.unsafeDeserialize(WritableUtils.java:63)
    at org.apache.hama.graph.MapVerticesInfo.get(MapVerticesInfo.java:101)
    at org.apache.hama.graph.GraphJobRunner$ComputeRunnable.<init>(GraphJobRunner.java:322)
    at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:247)
    at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:174)
    at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:171)
    at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
    at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
20171110152216601 INFO  org.apache.hama.ipc.Server#stop() L1393: Stopping server on 61001
20171110152216601 INFO  org.apache.hama.ipc.Server#run() L1260: IPC Server handler 0 on 61001: exiting
20171110152216601 INFO  org.apache.hama.ipc.Server#run() L1260: IPC Server handler 3 on 61001: exiting
20171110152216602 INFO  org.apache.hama.ipc.Server#run() L707: Stopping IPC Server Responder
20171110152216601 INFO  org.apache.hama.ipc.Server#run() L1260: IPC Server handler 1 on 61001: exiting
20171110152216601 INFO  org.apache.hama.ipc.Server#run() L1260: IPC Server handler 4 on 61001: exiting
20171110152216601 INFO  org.apache.hama.ipc.Server#run() L1260: IPC Server handler 2 on 61001: exiting
20171110152216601 INFO  org.apache.hama.ipc.Server#run() L503: Stopping IPC Server listener on 61001
20171110152216603 ERROR org.apache.hama.bsp.BSPTask#stopPingingGroom() L133: Shutting down ping service.
20171110152216604 FATAL org.apache.hama.bsp.GroomServer#main() L1270: Error running child
java.lang.NullPointerException
    at org.apache.hama.util.UnsafeByteArrayInputStream.<init>(UnsafeByteArrayInputStream.java:63)
    at org.apache.hama.util.WritableUtils.unsafeDeserialize(WritableUtils.java:63)
    at org.apache.hama.graph.MapVerticesInfo.get(MapVerticesInfo.java:101)
    at org.apache.hama.graph.GraphJobRunner$ComputeRunnable.<init>(GraphJobRunner.java:322)
    at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:247)
    at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:174)
    at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:171)
    at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
    at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)

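Solution:
Modify the loadVertices method of the org.apache.hama.graph.GraphJobRunner class inside Hama's hama-graph-0.7.1.jar.
(1). Download the source code of this jar and add print statements with LOG.info(): add the lines marked TODO to the source of loadVertices, as shown below.
(2). Repackage hama-graph under the same jar name and use it to replace the original hama-graph-0.7.1.jar under the Hama installation directory.
Note: LOG output only appears if Hama's log level for this class is INFO or lower (for example DEBUG).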

@SuppressWarnings("unchecked")
private void loadVertices(BSPPeer<Writable, Writable, Writable, Writable, GraphJobMessage> peer) throws IOException, SyncException, InterruptedException {
    LOG.info(vertices.toString()); // TODO: added print
    LOG.info(peer.toString()); // TODO: added print

    for (int i = 0; i < peer.getNumPeers(); i++) {
        partitionMessages.put(i, new GraphJobMessage());
    }

    VertexInputReader<Writable, Writable, V, E, M> reader = (VertexInputReader<Writable, Writable, V, E, M>) ReflectionUtils.newInstance(conf.getClass(Constants.RUNTIME_PARTITION_RECORDCONVERTER, VertexInputReader.class));
    LOG.info(reader.toString()); // TODO: added print

    ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newCachedThreadPool();
    executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
    executor.setRejectedExecutionHandler(retryHandler);

    KeyValuePair<Writable, Writable> next = null;

    while ((next = peer.readNext()) != null) {
        Vertex<V, E, M> vertex = GraphJobRunner.<V, E, M>newVertexInstance(VERTEX_CLASS);

        boolean vertexFinished = false;
        try {
            vertexFinished = reader.parseVertex(next.getKey(), next.getValue(), vertex);
        } catch (Exception e) {
            throw new IOException("Parse exception occurred: " + e);
        }

        if (!vertexFinished) {
            continue;
        }

        LOG.info(vertex.getVertexID()); // TODO: print the vertex ID

        Runnable worker = new Parser(vertex);
        executor.execute(worker);
    }

    executor.shutdown();
    executor.awaitTermination(60, TimeUnit.SECONDS);

    Iterator<Entry<Integer, GraphJobMessage>> it;
    it = partitionMessages.entrySet().iterator();
    while (it.hasNext()) {
        Entry<Integer, GraphJobMessage> e = it.next();
        it.remove();
        GraphJobMessage msg = e.getValue();
        msg.setFlag(GraphJobMessage.PARTITION_FLAG);
        peer.send(getHostName(e.getKey()), msg);
    }

    peer.sync();

    executor = (ThreadPoolExecutor) Executors.newCachedThreadPool();
    executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
    executor.setRejectedExecutionHandler(retryHandler);

    GraphJobMessage msg;
    while ((msg = peer.getCurrentMessage()) != null) {
        executor.execute(new AddVertex(msg));
    }

    executor.shutdown();
    executor.awaitTermination(60, TimeUnit.SECONDS);

    LOG.info(vertices.size() + " vertices are loaded into " + peer.getPeerName());
}
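
The note about log levels can be illustrated with a small, self-contained sketch. Hama's own LOG is configured separately (typically through its log4j settings); this example swaps in the JDK's java.util.logging purely for demonstration, and the class LogLevelDemo is hypothetical. It shows that an INFO message, like the LOG.info() lines added above, is emitted only when the logger's level is INFO or finer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogLevelDemo {
    // Logs one INFO and one FINE message through a logger set to `level`
    // and returns the messages that passed the logger's level filter.
    static List<String> emitted(Level level) {
        Logger log = Logger.getLogger("demo.GraphJobRunner." + level.getName());
        log.setUseParentHandlers(false); // keep the demo off the console
        log.setLevel(level);

        List<String> seen = new ArrayList<>();
        Handler capture = new Handler() {
            @Override public void publish(LogRecord r) { seen.add(r.getMessage()); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        log.addHandler(capture);
        try {
            log.info("vertex loaded"); // plays the role of LOG.info(...) in the patch
            log.fine("debug detail");  // FINE roughly corresponds to log4j's DEBUG
        } finally {
            log.removeHandler(capture);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(emitted(Level.INFO));    // [vertex loaded]
        System.out.println(emitted(Level.WARNING)); // []
        System.out.println(emitted(Level.FINE));    // [vertex loaded, debug detail]
    }
}
```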

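The approach above has a drawback: the vertex print sits inside the while loop, so it runs once for every vertex in the graph (the run above loaded 3,363,000 vertices on a single peer). This makes the log grow rapidly, and if it exceeds Hama's default memory allocation, an out-of-memory error is thrown.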
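
One way to keep the diagnostic without flooding the log is to sample it. The sketch below is a hypothetical helper (SampledVertexLogger is not part of Hama) that logs one line per N vertices instead of one line per vertex. Inside loadVertices it could replace the per-vertex LOG.info call, with LOG::info passed as the sink, assuming LOG is the class's commons-logging Log whose info method accepts an Object. Because the parsing loop in loadVertices runs on one thread before vertices are handed to the executor, a plain counter is enough there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: emits one line per `every` vertices instead of one
// line per vertex. Not thread-safe; call it only from a single reading loop.
public class SampledVertexLogger {
    private final long every;            // log once per `every` vertices
    private final Consumer<String> sink; // stand-in for LOG::info
    private long count = 0;

    public SampledVertexLogger(long every, Consumer<String> sink) {
        if (every <= 0) {
            throw new IllegalArgumentException("every must be positive");
        }
        this.every = every;
        this.sink = sink;
    }

    // Counts the vertex and logs only when the count reaches a multiple of `every`.
    public void onVertex(Object vertexId) {
        count++;
        if (count % every == 0) {
            sink.accept(count + " vertices parsed, latest ID: " + vertexId);
        }
    }

    public long count() {
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        SampledVertexLogger logger = new SampledVertexLogger(1_000_000, lines::add);
        for (long id = 0; id < 3_363_000; id++) {
            logger.onVertex(id); // same vertex count as the run above
        }
        lines.forEach(System.out::println); // 3 lines instead of 3,363,000
        System.out.println("total: " + logger.count());
    }
}
```

In the patched method, the TODO line would then become a call such as sampler.onVertex(vertex.getVertexID()), where sampler is created once before the while loop.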

2. Out of Memory: Java Heap Space

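This error is mainly caused by an insufficient heap allocation. In the trace below, the OutOfMemoryError is thrown in the client's main thread while BSPJobClient.getTaskLogs reads task log output with BufferedReader.readLine, which is consistent with an oversized task log.
If you added the vertex print described in section 1, you will see the program continuously printing vertex IDs to the console instead of writing them to the log file; normally these messages go to the configured log file.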

A typical error log looks like this:

20171202214956380 INFO  org.apache.hama.bsp.FileInputFormat#listStatus() L165: Total input paths to process : 1
20171202214956866 INFO  org.apache.hama.bsp.BSPJobClient#monitorAndPrintJob() L663: Running job: job_201712022149_0001
20171202214959873 INFO  org.apache.hama.bsp.BSPJobClient#monitorAndPrintJob() L674: Current supersteps number: 0
20171202215141969 INFO  org.apache.hama.bsp.BSPJobClient#monitorAndPrintJob() L674: Current supersteps number: 1
20171202215147975 INFO  org.apache.hama.bsp.BSPJobClient#monitorAndPrintJob() L674: Current supersteps number: 2
20171202220712455 INFO  org.apache.hama.bsp.BSPJobClient#monitorAndPrintJob() L674: Current supersteps number: 0
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
        at java.lang.StringBuffer.append(StringBuffer.java:367)
        at java.io.BufferedReader.readLine(BufferedReader.java:370)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at org.apache.hama.bsp.BSPJobClient.getTaskLogs(BSPJobClient.java:729)
        at org.apache.hama.bsp.BSPJobClient.displayTaskLogs(BSPJobClient.java:716)
        at org.apache.hama.bsp.BSPJobClient.monitorAndPrintJob(BSPJobClient.java:685)
        at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:230)
        at ProbMatch.main(ProbMatch.java:138)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hama.util.RunJar.main(RunJar.java:146)

Solution:
Edit the HAMA_HEAPSIZE entry in conf/hama-env.sh under the Hama installation directory. The default is 1000 MB:

# The maximum amount of heap to use, in MB. Default is 1000.
# export HAMA_HEAPSIZE=1000
export HAMA_HEAPSIZE=<desired size in MB>

After making the change, restart the Hama services.
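
To confirm that the new value took effect, a minimal sketch is to print the JVM's maximum heap at the start of your job's main method (ProbMatch.main in the trace above), since that is the thread that ran out of memory. It assumes that bin/hama passes HAMA_HEAPSIZE to the JVM it launches as an -Xmx option; the class HeapCheck is hypothetical. Depending on the garbage collector, the reported value can be somewhat below the configured -Xmx.

```java
// Hypothetical check: prints the maximum heap of the current JVM so you can
// confirm that the HAMA_HEAPSIZE change reached the process after a restart.
public class HeapCheck {
    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if the JVM has no limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```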
