Method 1: Read the logs.
Method 2: Attach a remote debugger to the child JVM by adding these properties:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m -Xdebug -Xrunjdwp:transport=dt_socket,address=8792,server=y,suspend=y</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
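Even with the -w parameter set to 1, GiraphX still runs two map tasks: one master and one worker. The worker handles registration and the actual computation, while the master aggregates the data.
With these changes PageRank runs fine and, in theory, should also be debuggable, but in practice it does not seem to work: the master's map task and the worker's map task compete for the same debug port.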
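The port clash can be reproduced outside Hadoop. The sketch below is only an illustration, not Giraph or Hadoop code: the class name PortClashDemo is made up, and an OS-assigned free port stands in for 8792. It shows that a second listener on a port that is already being listened on is rejected, which is presumably what happens when both child JVMs start with the same address= value.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortClashDemo {
    /** Returns true if a second listener on an already-bound port is rejected. */
    static boolean secondBindFails() {
        // Port 0 asks the OS for any free port; it stands in for 8792 here.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // unexpected: both listeners obtained the port
            } catch (BindException e) {
                return true;  // the second listener lost the race for the port
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```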
Method 3:
IsolationRunner
Add to mapred-site.xml:
<property>
<name>keep.failed.task.files</name>
<value>true</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/opt/hadoop-1.2.1/tmp/mapred</value>
</property>
Change to this directory:
/opt/hadoop-1.2.1/tmp/mapred/taskTracker/liuqiang2/jobcache/job_201603171716_0003/attempt_201603171716_0003_m_000001_0/work
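The directory layout can be read off the path above. As a small sketch, assuming the layout is always mapred.local.dir/taskTracker/user/jobcache/job id/attempt id/work (inferred only from this one path, not from the Hadoop documentation), a hypothetical helper could build it like this:

```java
public class AttemptWorkDir {
    /**
     * Builds the work directory of a kept task attempt.
     * The layout is inferred from the single path shown in this post.
     */
    static String workDir(String localDir, String user, String jobId, String attemptId) {
        return String.join("/", localDir, "taskTracker", user, "jobcache", jobId, attemptId, "work");
    }

    public static void main(String[] args) {
        System.out.println(workDir(
                "/opt/hadoop-1.2.1/tmp/mapred",
                "liuqiang2",
                "job_201603171716_0003",
                "attempt_201603171716_0003_m_000001_0"));
    }
}
```

The job.xml passed to IsolationRunner sits one level up from this directory, which is why the command refers to ../job.xml.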
Run:
[liuqiang2@mu02 work]$ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.ifExists(LocalDirAllocator.java:508)
at org.apache.hadoop.fs.LocalDirAllocator.ifExists(LocalDirAllocator.java:216)
at org.apache.hadoop.mapred.IsolationRunner.run(IsolationRunner.java:195)
at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:238)
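The failure is inside LocalDirAllocator, so I added one line of code to ifExists (the context.confChanged(conf) call, which needs a try/catch because it throws IOException):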
public boolean ifExists(String pathStr, Configuration conf) {
  AllocatorPerContext context = obtainContext(contextCfgItemName);
  try {
    // Added: initialize the context from conf before it is queried
    context.confChanged(conf);
  } catch (IOException e) {
    e.printStackTrace();
  }
  return context.ifExists(pathStr, conf);
}
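One plausible reading of the NullPointerException is that the per-context state is filled in lazily and ifExists was reached before that had happened. The sketch below illustrates that pattern only. DirContext and its fields are hypothetical stand-ins, not the real Hadoop classes, and the prefix check is an arbitrary placeholder for the real lookup.

```java
import java.util.Arrays;
import java.util.List;

public class LazyDirsDemo {
    /** Hypothetical stand-in for a lazily initialized allocator context. */
    static class DirContext {
        private List<String> dirs; // stays null until confChanged() runs

        void confChanged(String commaSeparatedDirs) {
            dirs = Arrays.asList(commaSeparatedDirs.split(","));
        }

        boolean ifExists(String path) {
            for (String dir : dirs) { // throws NullPointerException if dirs is null
                if (path.startsWith(dir)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Queries the context without initializing it; reports whether that blew up. */
    static boolean failsWithoutInit(String path) {
        try {
            new DirContext().ifExists(path);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Initializes the context first, mirroring the one-line fix. */
    static boolean checkWithInit(String dirs, String path) {
        DirContext context = new DirContext();
        context.confChanged(dirs);
        return context.ifExists(path);
    }

    public static void main(String[] args) {
        System.out.println("fails without init: " + failsWithoutInit("/tmp/a/job.xml"));
        System.out.println("found after init: " + checkWithInit("/tmp/a,/tmp/b", "/tmp/a/job.xml"));
    }
}
```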
Running it again showed that the Giraph jars were not on the classpath, so I modified the classpath in the hadoop script; see http://blog.csdn.net/cloudeagle_bupt/article/details/50916686
After that it runs:
[liuqiang2@mu02 work]$ pwd
/opt/hadoop-1.2.1/tmp/mapred/taskTracker/liuqiang2/jobcache/job_201603171947_0001/attempt_201603171947_0001_m_000001_0/work
[liuqiang2@mu02 work]$ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
Result:
[liuqiang2@mu02 work]$ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
Listening for transport dt_socket at address: 8792
16/03/17 20:40:19 WARN bsp.BspOutputFormat: getOutputCommitter: Returning ImmutableOutputCommiter (does nothing).
16/03/17 20:40:19 INFO util.ProcessTree: setsid exited with exit code 0
16/03/17 20:40:19 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@13b3625
16/03/17 20:40:19 INFO mapred.MapTask: Processing split: 'org.apache.giraph.bsp.BspInputSplit, index=-1, num=-1
16/03/17 20:40:19 INFO graph.GraphTaskManager: setup: Log level remains at info
16/03/17 20:40:19 INFO zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201603171947_0001
16/03/17 20:40:19 INFO zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201603171947_0001/_zkServer
16/03/17 20:40:19 INFO zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201603171947_0001/_task/mu02 1
16/03/17 20:40:19 INFO zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_mu02 0 ' (polling period is 3000)
16/03/17 20:40:19 INFO zk.ZooKeeperManager: getZooKeeperServerList: Found [mu02, 0] 2 hosts in filename 'zkServerList_mu02 0 '
16/03/17 20:40:19 INFO zk.ZooKeeperManager: onlineZooKeeperServers: Got [mu02] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
16/03/17 20:40:19 INFO graph.GraphTaskManager: setup: Starting up BspServiceWorker...
16/03/17 20:40:19 INFO bsp.BspService: BspService: Path to create to halt is /_hadoopBsp/job_201603171947_0001/_haltComputation
16/03/17 20:40:19 INFO bsp.BspService: BspService: Connecting to ZooKeeper with job job_201603171947_0001, 1 on mu02:22181
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:host.name=mu02
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_79
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.home=/home/liuqiang2/jdk/jdk1.7.0_79/jre
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-1.2.1/libexec/../conf:/home/liu ............ (a long list of jars)
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-1.2.1/libexec/../lib/native/Linux-amd64-64
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.el6.x86_64
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:user.name=liuqiang2
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/liuqiang2
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/hadoop-1.2.1/tmp/mapred/taskTracker/liuqiang2/jobcache/job_201603171947_0001/attempt_201603171947_0001_m_000001_0/work
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=mu02:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@623cc34d
16/03/17 20:40:19 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:19 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:20 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:21 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:22 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:22 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:23 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:23 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:24 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:24 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
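At this point the map task, running as a child process, needs to communicate with ZooKeeper. Because only a single task is being run in isolation, the connection is refused and the task cannot continue. The goal of testing a single task, however, has been achieved.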
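A "Connection refused" like the one in the log usually means nothing is listening on the target port. A quick way to confirm that before launching the task is a plain TCP probe. The sketch below is an assumption-laden illustration: the class PortProbe and its methods are made up for this post, and it only checks that a port accepts connections, not that a healthy ZooKeeper server is behind it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    /** Self-contained check: start a local listener and probe it. */
    static boolean demo() {
        try (ServerSocket server = new ServerSocket(0)) {
            return isListening("127.0.0.1", server.getLocalPort(), 1000);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Pass host and port on the command line, e.g. the ZooKeeper address from the log.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 22181;
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 1000));
    }
}
```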