近期一直在研究基于hadoop-2.2和hbase-0.96的安装配置,并结合官方文档简单的编写了一些mapreduce的例子。为了更深入了解hadoop-2.2的实现,决定研究一下如何对hadoop进行远程调试。
下面介绍利用eclipse远程调试工具进行hadoop的调试:
-
- 在Shell脚本中运行命令:export YARN_CLIENT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8788"
- 运行mapreduce job:yarn jar hbase-demo-0.0.1-SNAPSHOT.jar,命令行会显示一条信息:Listening for transport dt_socket at address: 8788,直到有客户端连接上调试后,程序才会继续往下进行。
- 在eclipse中选择Run>Debug Configurations,在Remote Java Application下新建一个配置,选择mapreduce的实现工程,Host:[运行job机器的ip],Port:8788,然后点击debug,和server建立debug连接,进入debug状态。
备注:调试hadoop不同的程序,需要设置不同的环境变量(YARN_CLIENT_OPTS),如何确定这个环境变量呢?请看hadoop/bin/yarn的shell脚本实现。在这里可以找到需要设置的相应环境变量和对应的主程序。
# figure out which class to run if [ "$COMMAND" = "classpath" ] ; then echo $CLASSPATH exit elif [ "$COMMAND" = "rmadmin" ] ; then CLASS='org.apache.hadoop.yarn.client.cli.RMAdminCLI' YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" elif [ "$COMMAND" = "application" ] ; then CLASS=org.apache.hadoop.yarn.client.cli.ApplicationCLI YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" elif [ "$COMMAND" = "node" ] ; then CLASS=org.apache.hadoop.yarn.client.cli.NodeCLI YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" elif [ "$COMMAND" = "resourcemanager" ] ; then CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager' YARN_OPTS="$YARN_OPTS $YARN_RESOURCEMANAGER_OPTS" if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m" fi elif [ "$COMMAND" = "nodemanager" ] ; then CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/nm-config/log4j.properties CLASS='org.apache.hadoop.yarn.server.nodemanager.NodeManager' YARN_OPTS="$YARN_OPTS -server $YARN_NODEMANAGER_OPTS" if [ "$YARN_NODEMANAGER_HEAPSIZE" != "" ]; then JAVA_HEAP_MAX="-Xmx""$YARN_NODEMANAGER_HEAPSIZE""m" fi elif [ "$COMMAND" = "proxyserver" ] ; then CLASS='org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer' YARN_OPTS="$YARN_OPTS $YARN_PROXYSERVER_OPTS" if [ "$YARN_PROXYSERVER_HEAPSIZE" != "" ]; then JAVA_HEAP_MAX="-Xmx""$YARN_PROXYSERVER_HEAPSIZE""m" fi elif [ "$COMMAND" = "version" ] ; then CLASS=org.apache.hadoop.util.VersionInfo YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" elif [ "$COMMAND" = "jar" ] ; then CLASS=org.apache.hadoop.util.RunJar YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" elif [ "$COMMAND" = "logs" ] ; then CLASS=org.apache.hadoop.yarn.client.cli.LogsCLI YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" elif [ "$COMMAND" = "daemonlog" ] ; then CLASS=org.apache.hadoop.log.LogLevel YARN_OPTS="$YARN_OPTS $YARN_CLIENT_OPTS" else CLASS=$COMMAND fi