【Hadoop cluster setup and code debugging】Summary, 2017-02-18

Organized in configuration/execution order
~~~~~~~~~~~~~~~~~~~~~~~~Step 1: set up a debug listening port on the Hadoop cluster
【Debugging】Listen on port 9999 on the Hadoop cluster and debug the Java code remotely from Eclipse
【Steps】
export YARN_NODEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y"
export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y"
tail -f logs/yarn-root-resourcemanager-localhost.log
【Result】
export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=9999"
[root@amaster /]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [amaster]
amaster: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-amaster.out
aslave-4: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-4.out
aslave-1: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-1.out
aslave-2: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-2.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/logs/yarn-root-resourcemanager-amaster.out
Listening for transport dt_socket at address: 9999
【Note】Do not export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=9999" more than once, or the following problem occurs.
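The double-export problem can be avoided with a small guard; a sketch (variable name and port taken from the export commands above):

```shell
# Append the JDWP agent to YARN_RESOURCEMANAGER_OPTS only if it is not
# already there, so sourcing this snippet twice cannot duplicate the agent.
JDWP="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=9999"
case " $YARN_RESOURCEMANAGER_OPTS " in
    *" $JDWP "*) ;;  # already present; do nothing
    *) export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS $JDWP" ;;
esac
```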
----------------------
【Problem】After running export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y"
and then start-all.sh, startup fails with: ERROR: transport error 202: bind failed: Address already in use
【Details】
Ran export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y"
then start-all.sh, and got:

/opt/hadoop/sbin/yarn-daemon.sh: line 103: 17912 Aborted                 nohup nice -n $YARN_NICENESS "$HADOOP_YARN_HOME"/bin/yarn --config $YARN_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [debugInit.c:750]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
【Solution】
Free the occupied port on CentOS (as below):
[root@amaster centos]# netstat -tlnp|grep 9999
tcp        0      0 0.0.0.0:9999            0.0.0.0:*               LISTEN      11038/java          
[root@amaster centos]# 
[root@amaster centos]# 
[root@amaster centos]# kill 11038
[root@amaster centos]# netstat -tlnp|grep 9999
[root@amaster centos]# netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 10.10.89.119:19888      0.0.0.0:*               LISTEN      2008/java           
tcp        0      0 0.0.0.0:10033           0.0.0.0:*               LISTEN      2008/java           
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      24512/java          
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      9499/sshd           
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1871/master         
tcp        0      0 10.10.89.119:10020      0.0.0.0:*               LISTEN      2008/java           
tcp        0      0 10.10.89.119:9000       0.0.0.0:*               LISTEN      24512/java          
tcp6       0      0 :::22                   :::*                    LISTEN      9499/sshd           
tcp6       0      0 ::1:25                  :::*                    LISTEN      1871/master 
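The netstat/kill steps above can be scripted; a minimal sketch of extracting the PID, keyed to the `netstat -tlnp` field layout shown above:

```shell
# pid_from_netstat PORT: read `netstat -tlnp` output on stdin and print the
# PID of the process listening on PORT (the manual kill step, automated).
pid_from_netstat() {
    port=$1
    awk -v p=":$port" '$4 ~ (p "$") { split($NF, a, "/"); print a[1]; exit }'
}
# usage: netstat -tlnp | pid_from_netstat 9999 | xargs -r kill
```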
----------------------
【Problem】After logging into a slave node, jps does not show the datanode or nodemanager processes
【Details】
【Solution】
Not logged in as root: jps lists only the current user's JVM processes, and the daemons were started as root.
Run sudo passwd root to set a root password (xxx),
then su root to switch to root and run jps again.
----------------------
Log debugging
【Steps】
Edit $HADOOP_CONF_DIR/log4j.properties and set hadoop.root.logger=ALL,console
or:
Enable:  export HADOOP_ROOT_LOGGER=DEBUG,console
Disable: export HADOOP_ROOT_LOGGER=INFO,console
~~~~~~~~~~~~~~~~~~~~~~~~Step 2: configure Eclipse
【Configuration】The eclipse.ini file
【Steps】
-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.200.v20150204-1316
-product
org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
-vm
/work/jdk1.7.0/bin/java
--launcher.appendVmargs
-vmargs
-Dosgi.requiredJavaVersion=1.6
-XX:MaxPermSize=256m
-Xms128m
-Xmx1024m
>>>>>>>>>>>>Above is the current version; below is the previous version
-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.200.v20150204-1316
-product
org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
--launcher.appendVmargs
-vmargs
-Dosgi.requiredJavaVersion=1.6
-XX:MaxPermSize=256m
-Xms40m
-Xmx512m
----------------------
【Problem】 -Dmaven.multiModuleProjectDirectory system property is not set. Check $M2_HOME environment variable and mvn script match.
【Details】
【Solution】
1. Add the M2_HOME environment variable.
2. In Preferences -> Java -> Installed JREs -> Edit, select a JDK,
then add the VM argument -Dmaven.multiModuleProjectDirectory=$MAVEN_HOME
~~~~~~~~~~~~~~~~~~~~~~~~Step 3: build the Hadoop source
【Debugging】Build the Hadoop source with Maven
【Steps】
mvn package -Pdist,native,docs -DskipTests -Dtar
【Note】
The first mvn package run succeeds, but running package a second time fails.
For any subsequent build, run clean before package:
mvn clean package -Pdist -DskipTests -Dtar
----------------------
【Submodule】Rebuilding a single submodule
【Steps】
cd /work/hadoop-2.7.3-src2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/
mvn package -Pdist -DskipTests -Dtar
~~~~~~~~~~~~~~~~~~~~~~~~Step 4: remote Java debugging
【Debugging】Remote debugging from Eclipse
【Steps】
After disconnecting, start a new debug session. At first the Suspend button is enabled; once a job is run on the Hadoop cluster, Eclipse switches to the debug perspective and the Resume button becomes enabled (Suspend is greyed out).

hadoop jar  /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar randomwriter -D mapreduce.randomwriter.mapsperhost=1 -D mapreduce.randomwriter.totalbytes=5000000000 /user/ubuntu/sort/5G-input

hadoop jar  /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sort -r 4 /user/ubuntu/sort/5G-input /user/ubuntu/sort/1G-output-task48-1
【Note】
Is the line a breakpoint sits on executed when Eclipse stops at it?
No: that line only runs after you step past the breakpoint.
【Note】
17/01/13 02:04:42 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1484209289884_0010 is still in SUBMITTED

If you finish debugging one job and, without disconnecting first, immediately run another job, it hangs as shown above (stuck in the SUBMITTED state).
----------------------
【Debugging】Breakpoint debugging in Eclipse
【Steps】
1. Window -> Show View has a dedicated Breakpoints view, where breakpoints can be managed in every way.
2. Window -> Show View -> Expressions opens the debug Expressions view.

~~~~~~~~~~~~~~~~~~~~~~~~Step 5: restart the Hadoop cluster
【Rebuild】Deploy the recompiled jars to the Hadoop cluster
【Steps】
Place the rebuilt jars in /opt/hadoop/share/hadoop/yarn
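The jar swap can be done with a small helper that keeps the previous copy as a fallback; a sketch (deploy_jar is a hypothetical helper, not a Hadoop tool, and the paths are this cluster's layout):

```shell
# deploy_jar JAR DEST_DIR: back up any existing copy as *.bak, then install
# the freshly built jar into the directory the YARN daemons load from.
deploy_jar() {
    jar=$1; dest=$2
    name=$(basename "$jar")
    if [ -f "$dest/$name" ]; then
        mv "$dest/$name" "$dest/$name.bak"   # fallback if the new build is broken
    fi
    cp "$jar" "$dest/"
}
# e.g. deploy_jar target/hadoop-yarn-server-resourcemanager-2.7.3.jar \
#                 /opt/hadoop/share/hadoop/yarn
```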
----------------------
【Configuration】Configure a new scheduler on the Hadoop cluster
【Steps】
vi /opt/hadoop/etc/hadoop/yarn-site.xml   (set the scheduler to DependencyScheduler)
scp -r /opt/hadoop/etc/hadoop/yarn-site.xml aslave-4:/opt/hadoop/etc/hadoop/
//stop-all.sh
stop-yarn.sh
export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y"
//start-all.sh
start-yarn.sh

tail -f /home/hadoop/logs/yarn-centos-resourcemanager-amaster.log

hadoop jar  /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sort -r 4 /user/ubuntu/sort/1G-input /user/ubuntu/sort/1G-output-task48-8
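The yarn-site.xml edit above selects the ResourceManager's scheduler class; a sketch of the property (the fully qualified class name of DependencyScheduler below is an assumption for illustration; substitute the actual package path):

```xml
<!-- yarn-site.xml: select the scheduler implementation the RM loads -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- package path is a guess; use your scheduler class's real FQN -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.DependencyScheduler</value>
</property>
```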
----------------------
【Normal】Start the Hadoop cluster, including HDFS
【Result】
[root@amaster hadoop]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [amaster]
amaster: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-amaster.out
aslave-1: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-1.out
aslave-2: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-2.out
aslave-4: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-4.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/logs/yarn-root-resourcemanager-amaster.out
aslave-4: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-aslave-4.out
aslave-2: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-aslave-2.out
aslave-1: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-aslave-1.out
----------------------
【Problem】Running a Hadoop benchmark prints: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
【Details】
[root@amaster opt]# hadoop jar  /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar randomwriter -D mapreduce.randomwriter.mapsperhost=1 -D mapreduce.randomwriter.totalbytes=1073741800 /user/ubuntu/sort/1G-input-1
17/01/04 12:22:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/04 12:22:05 INFO client.RMProxy: Connecting to ResourceManager at amaster/10.10.89.119:8032
Running 1 maps.
Job started: Wed Jan 04 12:22:06 UTC 2017
17/01/04 12:22:06 INFO client.RMProxy: Connecting to ResourceManager at amaster/10.10.89.119:8032
17/01/04 12:22:06 INFO mapreduce.JobSubmitter: number of splits:1
17/01/04 12:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1483530318384_0002
17/01/04 12:22:07 INFO impl.YarnClientImpl: Submitted application application_1483530318384_0002
17/01/04 12:22:07 INFO mapreduce.Job: The url to track the job: http://amaster:8088/proxy/application_1483530318384_0002/
17/01/04 12:22:07 INFO mapreduce.Job: Running job: job_1483530318384_0002
17/01/04 12:22:12 INFO mapreduce.Job: Job job_1483530318384_0002 running in uber mode : false
17/01/04 12:22:12 INFO mapreduce.Job:  map 0% reduce 0%
17/01/04 12:22:25 INFO mapreduce.Job: Task Id : attempt_1483530318384_0002_m_000000_0, Status : FAILED
【Solution】
My approach: I built without the native profile, so the resulting tarball had no lib/native directory. Dropping in the lib/native directory from someone else's compatible build also seems to work.

The approach found online is as follows:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/NativeLibraries.html
http://blog.csdn.net/lalaguozhe/article/details/10580727
After installing the official Hadoop 2.1.0-beta, every hadoop command prints this warning:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-Java classes where applicable

Raise the log level to see the underlying cause:

export HADOOP_ROOT_LOGGER=DEBUG,console (set it back to INFO to restore)
The debug output shows "wrong ELFCLASS32": is the loaded .so built for the wrong architecture?
Run file libhadoop.so.1.0.0:
libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
Sure enough, Intel 80386: a 32-bit build, while the Hadoop host runs a 64-bit OS.
It turns out the prebuilt Hadoop downloads from the Apache mirrors ship 32-bit native libraries; 64-bit support requires rebuilding from source, which is painful given that almost every production environment runs a 64-bit OS.
A passage in the official documentation on native libraries confirms this:
"The pre-built 32-bit i386-Linux native hadoop library is available as part of the hadoop distribution and is located in the lib/native directory."

Check out the source code again:
svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta/
Build with the native profile enabled; the native library is then built for the current OS architecture:
mvn package -Pdist,native -DskipTests -Dtar
Checking the file types under the native directory again shows they are all 64-bit now; replace the files on the cluster and the warning disappears.
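The 32-bit vs 64-bit check above can also be done without the file utility, by reading the ELF header's class byte directly; a sketch:

```shell
# elf_class FILE: print 32 or 64 based on byte 5 of the ELF header
# (EI_CLASS: 1 = 32-bit, 2 = 64-bit), e.g. for files under lib/native.
elf_class() {
    c=$(od -An -j4 -N1 -tu1 "$1" | tr -d ' \n')
    case "$c" in
        1) echo 32 ;;
        2) echo 64 ;;
    esac
}
# usage: for so in /opt/hadoop/lib/native/*.so*; do
#            echo "$so: $(elf_class "$so")-bit"
#        done
```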
----------------------
【Hadoop command】Find where a file's blocks are stored in HDFS
【Steps】
[root@amaster centos]# hadoop fsck /user/ubuntu/sort/1G-input -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://amaster:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fuser%2Fubuntu%2Fsort%2F1G-input
FSCK started by root (auth:SIMPLE) from /10.10.89.119 for path /user/ubuntu/sort/1G-input at Sat Jan 14 02:29:59 UTC 2017
/user/ubuntu/sort/1G-input <dir>
/user/ubuntu/sort/1G-input/_SUCCESS 0 bytes, 0 block(s):  OK

/user/ubuntu/sort/1G-input/part-m-00000 1077280348 bytes, 9 block(s):  OK
0. BP-1046137187-10.10.89.119-1483530277701:blk_1073742041_1217 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
1. BP-1046137187-10.10.89.119-1483530277701:blk_1073742042_1218 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
2. BP-1046137187-10.10.89.119-1483530277701:blk_1073742043_1219 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
3. BP-1046137187-10.10.89.119-1483530277701:blk_1073742044_1220 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
4. BP-1046137187-10.10.89.119-1483530277701:blk_1073742045_1221 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
5. BP-1046137187-10.10.89.119-1483530277701:blk_1073742046_1222 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
6. BP-1046137187-10.10.89.119-1483530277701:blk_1073742047_1223 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
7. BP-1046137187-10.10.89.119-1483530277701:blk_1073742048_1224 len=134217728 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]
8. BP-1046137187-10.10.89.119-1483530277701:blk_1073742049_1225 len=3538524 repl=3 [DatanodeInfoWithStorage[10.10.89.121:50010,DS-51240f86-23d3-46d3-9e6e-92476ad23720,DISK], DatanodeInfoWithStorage[10.10.89.123:50010,DS-8e42b641-901a-451e-a60e-d4de833965fb,DISK], DatanodeInfoWithStorage[10.10.89.120:50010,DS-292772b8-9190-4863-9bc5-b5ec20352232,DISK]]

Status: HEALTHY
 Total size: 1077280348 B
 Total dirs: 1
 Total files: 2
 Total symlinks: 0
 Total blocks (validated): 9 (avg. block size 119697816 B)
 Minimally replicated blocks: 9 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 3
 Number of racks: 1
FSCK ended at Sat Jan 14 02:29:59 UTC 2017 in 1 milliseconds

The filesystem under path '/user/ubuntu/sort/1G-input' is HEALTHY
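The per-block datanode list in the fsck output above can be extracted with a small filter, keyed to the DatanodeInfoWithStorage format shown there; a sketch:

```shell
# block_ips: read `hadoop fsck ... -blocks -locations` output on stdin and
# print the datanode IP of every replica location that appears.
block_ips() {
    grep -o 'DatanodeInfoWithStorage\[[0-9.]*' | sed 's/.*\[//'
}
# usage: hadoop fsck /user/ubuntu/sort/1G-input -files -blocks -locations \
#            | block_ips | sort | uniq -c
```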

----------------------









