Take two nodes, backtest11 and backtest12, as an example.
Generate a key pair in SecureCRT (with a non-empty passphrase) and append the public key to each node's ~/.ssh/authorized_keys:
echo '<public key contents>' >> ~/.ssh/authorized_keys
At this point scp -P 32200 (or -P 22) and ssh jumps between the nodes already work.
/etc/hosts on each node has also been updated:
127.0.0.1 localhost
172.19.102.11 backtest11
172.19.102.12 backtest12
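The /etc/hosts entries above can be sanity-checked with a small helper; a sketch that runs against a scratch copy of the file, so it is safe anywhere (names and IPs are the ones from this cluster):

```shell
#!/bin/sh
# Check that a hosts file maps each cluster hostname to the intended IP.
check_hosts() {
    file=$1; ip=$2; name=$3
    grep -Eq "^$ip[[:space:]]+$name" "$file"
}

# Scratch copy of the expected entries:
hosts=$(mktemp)
printf '127.0.0.1 localhost\n172.19.102.11 backtest11\n172.19.102.12 backtest12\n' > "$hosts"

check_hosts "$hosts" 172.19.102.11 backtest11 && echo "backtest11 ok"
check_hosts "$hosts" 172.19.102.12 backtest12 && echo "backtest12 ok"
```

On a real node the same function can be pointed at the live file: check_hosts /etc/hosts 172.19.102.11 backtest11.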
----------
Next, configure passwordless login for the Hadoop cluster, with backtest12 as the master.
On backtest12, set up local login first:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost -p 32200
Local login now works without a password.
Copy the public key to the master and to each slave node, one by one:
huangshaobin@backtest12:~$ scp -P 32200 /home/huangshaobin/.ssh/id_dsa.pub backtest11:~/master_key
Then on backtest11:
cat ~/master_key >> ~/.ssh/authorized_keys
Now verify the login from backtest12 to backtest11:
ssh 172.19.102.11 -p 32200
ssh backtest11 -p 32200
No password prompt means success.
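The append step above is worth pairing with the permission fixes sshd insists on: under the default StrictModes, a group- or world-writable ~/.ssh or authorized_keys makes sshd silently ignore the keys. A sketch in a throwaway directory (master_key stands in for the scp'd id_dsa.pub; the key body is a fake placeholder):

```shell
#!/bin/sh
# Append the copied master key and tighten permissions so sshd accepts it.
home=$(mktemp -d)
mkdir -p "$home/.ssh"
echo 'ssh-dss AAAAfakekey huangshaobin@backtest12' > "$home/master_key"

cat "$home/master_key" >> "$home/.ssh/authorized_keys"
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"
```

On the real backtest11 the same three lines run in $HOME after the scp from backtest12.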
-----------------
Separately: is it actually necessary for backtest11 to be able to log in to backtest12? Trying
ssh 172.19.102.12 -p 32200
ssh backtest12 -p 32200
gives this error:
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
8b:04:cb:64:ef:f0:4d:6a:0c:0c:00:4b:0b:1c:93:05.
Please contact your system administrator.
Add correct host key in /home/huangshaobin/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/huangshaobin/.ssh/known_hosts:3
remove with: ssh-keygen -f "/home/huangshaobin/.ssh/known_hosts" -R [172.19.102.12]:32200
RSA host key for [172.19.102.12]:32200 has changed and you have requested strict checking.
Host key verification failed.
Following the hint:
ssh-keygen -f "/home/huangshaobin/.ssh/known_hosts" -R [172.19.102.12]:32200
This was leftover state from before (backtest12's OS had been reinstalled once). After that, both
ssh 172.19.102.12 -p 32200
ssh backtest12 -p 32200
work.
-------------------------
Finally, edit the corresponding hadoopXXX/conf/hadoop-env.sh to make ssh use port 32200, since ssh defaults to port 22. Add after the JAVA_HOME line:
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.22   (version and path may differ)
export HADOOP_SSH_OPTS="-p 32200"
Then run bin/start-all.sh on backtest12; jps on each node shows the daemons came up successfully.
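An alternative (or complement) to HADOOP_SSH_OPTS is a per-host entry in ~/.ssh/config, which makes plain ssh and scp pick up the non-standard port as well; a sketch using the host names and user from these notes:

```
Host backtest11 backtest12
    Port 32200
    User huangshaobin
```

With this in place, ssh backtest11 and scp file backtest12:~/ work without the -p/-P flag.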
-----------------------------
20121224
But why do start-all.sh and stop-all.sh on backtest12 both work fine, while submitting a job on backtest11
bin/hadoop jar hadoop-examples-1.0.3.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
12/12/24 12:31:40 INFO ipc.Client: Retrying connect to server: backtest12/172.19.102.12:9000. Already tried 0 time(s).
12/12/24 12:31:41 INFO ipc.Client: Retrying connect to server: backtest12/172.19.102.12:9000. Already tried 1 time(s).
12/12/24 12:31:42 INFO ipc.Client: Retrying connect to server: backtest12/172.19.102.12:9000. Already tried 2 time(s).
12/12/24 12:31:43 INFO ipc.Client: Retrying connect to server: backtest12/172.19.102.12:9000. Already tried 3 time(s).
never succeeds? Submitting on backtest12 works, but the job will not run from backtest11. Why??
After a reboot, some things under /tmp get cleaned out, which is why it could not start. Add the following to conf/core-site.xml (it had earlier been put into hdfs-site.xml by mistake):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/huangshaobin/hadoop-brian/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
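For reference, the NameNode address that the clients on backtest11 were retrying (backtest12:9000 in the errors above) also lives in core-site.xml. A sketch of the whole file with both properties, with the fs.default.name value inferred from those connection errors:

```
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://backtest12:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/huangshaobin/hadoop-brian/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
```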
Then format the namenode and restart:
bin/hadoop namenode -format
bin/start-all.sh
On backtest11:
huangshaobin@backtest11:~/hadoop-1.0.3$ bin/stop-all.sh
no jobtracker to stop
backtest11: stopping tasktracker
backtest12: stopping tasktracker
no namenode to stop
backtest11: stopping datanode
backtest12: stopping datanode
backtest11: stopping secondarynamenode
huangshaobin@backtest11:~/hadoop-1.0.3$ jps
In effect no namenode was found (although the tasktracker and datanode on backtest12 could still be stopped)!!!! Which configuration governs this???
After stop-all.sh, format the namenode
bin/hadoop namenode -format
and, if necessary, each node's datanode
bin/hadoop datanode -format
typing an uppercase "Y" at the re-format prompt.
start-all.sh showed that the datanode on backtest12 did not come up. Starting the backtest12 DataNode on its own:
huangshaobin@backtest12:~/hadoop-1.0.3$ bin/hadoop-daemon.sh start DataNode
starting DataNode, logging to /home/huangshaobin/hadoop-1.0.3/libexec/../logs/hadoop-huangshaobin-DataNode-backtest12.out
Exception in thread "main" java.lang.NoClassDefFoundError: DataNode
Caused by: java.lang.ClassNotFoundException: DataNode
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: DataNode. Program will exit.
(Side note: the NoClassDefFoundError above is because the daemon name must be lowercase, i.e. bin/hadoop-daemon.sh start datanode.)
Where to look at the logs??? In the logs directory under the Hadoop install directory. There:
the namenode namespaceID and the datanode namespaceID do not match!!!
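The mismatched IDs can be pulled straight out of the datanode log. A sketch on a fabricated log line shaped like what Hadoop 1.x prints for this failure ("Incompatible namespaceIDs"); on the real node, point the grep at the datanode log under hadoop-1.0.3/logs instead:

```shell
#!/bin/sh
# Extract both namespaceIDs from a (fabricated) datanode error line.
log=$(mktemp)
echo 'ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/huangshaobin/hadoop-brian/hdfs/data: namenode namespaceID = 1240677757; datanode namespaceID = 1400508718' > "$log"

grep -o 'namespaceID = [0-9]*' "$log"
# → namespaceID = 1240677757
# → namespaceID = 1400508718
```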
See also:
http://blog.csdn.net/wh62592855/article/details/5752199
On each node:
vim /home/huangshaobin/hadoop-brian/hdfs/data/current/VERSION
Change namespaceID=1400508718 to 1240677757 (the namenode's ID),
then stop-all.sh and start-all.sh again.
That fixed it (would starting just the datanode alone not have worked???).
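The VERSION edit above can be scripted instead of done in vim. A sketch on a scratch copy of a VERSION file (fields other than namespaceID are made up; on each datanode the same sed would target /home/huangshaobin/hadoop-brian/hdfs/data/current/VERSION):

```shell
#!/bin/sh
# Rewrite a datanode's namespaceID to match the namenode's ID.
ver=$(mktemp)
cat > "$ver" <<'EOF'
#Mon Dec 24 12:00:00 CST 2012
namespaceID=1400508718
storageID=DS-placeholder
cTime=0
layoutVersion=-32
EOF

sed -i 's/^namespaceID=.*/namespaceID=1240677757/' "$ver"
grep '^namespaceID=' "$ver"   # → namespaceID=1240677757
```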
Now http://172.19.102.12:50070/dfshealth.jsp
shows Live Nodes as 2 rather than 1.
Running a job on backtest11
bin/hadoop jar hadoop-examples-1.0.3.jar pi 2 5
works now, and the map tasks are visible on port 50030: