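Assume a two-machine Hadoop distributed cluster: 192.168.11.13 is the master server (namenode) and 192.168.11.17 is the data node (datanode).
1. Configure passwordless SSH public keys
On 192.168.11.13:
Log in as root.
Create the user linleran: adduser linleran
Set its password: passwd linleran
Switch to that user: su linleran
In linleran's home directory (/home/linleran), create the .ssh directory: mkdir .ssh
Set the permissions of .ssh: [linleran@centos ~]$ chmod 755 .ssh
Generate the SSH key pair, pressing Enter at every prompt so that the passphrase stays empty.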
[linleran@centos ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/linleran/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/linleran/.ssh/id_rsa.
Your public key has been saved in /home/linleran/.ssh/id_rsa.pub.
The key fingerprint is:
df:99:37:84:a1:04:34:06:60:45:b9:ce:43:af:54:77 linleran@centos.test
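Go into the .ssh directory and append the contents of id_rsa.pub to authorized_keys.
Set the permissions of authorized_keys: [linleran@centos .ssh]$ chmod 644 authorized_keys
Repeat the steps above on 192.168.11.17, then append its id_rsa.pub to authorized_keys on 192.168.11.13 and append the id_rsa.pub from 192.168.11.13 to authorized_keys on 192.168.11.17, so that the two machines can SSH to each other.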
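The append-and-chmod steps above can be sketched as a small shell function. This is only an illustration: install_pubkey is a hypothetical helper name (not part of OpenSSH), and it assumes the public key file holds a single line, as id_rsa.pub normally does.

```shell
# Append a public key to authorized_keys (once) and apply the
# permissions used in this guide. Hypothetical helper, not an OpenSSH tool.
install_pubkey() {
    pubkey_file=$1               # e.g. ~/.ssh/id_rsa.pub, or a copy of the other machine's key
    ssh_dir=${2:-$HOME/.ssh}     # target .ssh directory (defaults to the current user's)
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Append only if this exact key line is not already present.
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 755 "$ssh_dir"
    chmod 644 "$ssh_dir/authorized_keys"
}
```

On each machine it would be called once with the local id_rsa.pub and once with a copy of the other machine's id_rsa.pub, for example install_pubkey "$HOME/.ssh/id_rsa.pub". Running it twice with the same key leaves a single entry.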
2. Install Hadoop on the master server
Extract the archive: tar zxvf hadoop-0.18.2.tar.gz
Create the directories:
/home/linleran/hadoop-0.18.2/hadoopfs/name
/home/linleran/hadoop-0.18.2/hadoopfs/data
/home/linleran/hadoop-0.18.2/tmp/
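As a minimal sketch, the three directories can be created in one step. make_hadoop_dirs is a hypothetical function name; the argument is assumed to be the extracted Hadoop directory.

```shell
# Create the HDFS name/data directories and the temp directory
# under the given Hadoop home. Hypothetical helper for illustration.
make_hadoop_dirs() {
    base=$1    # e.g. /home/linleran/hadoop-0.18.2
    mkdir -p "$base/hadoopfs/name" "$base/hadoopfs/data" "$base/tmp"
}
```

For the layout in this guide the call would be make_hadoop_dirs /home/linleran/hadoop-0.18.2. Because mkdir -p is used, running it again on existing directories is harmless.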
In conf/hadoop-env.sh, set the JDK path: export JAVA_HOME=/home/linleran/jdk1.5.0_15
Configure conf/hadoop-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.11.13:9000/</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.11.13:9001</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/linleran/hadoop-0.18.2/hadoopfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/linleran/hadoop-0.18.2/hadoopfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/linleran/hadoop-0.18.2/tmp/</value>
</property>
</configuration>
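To double-check what was actually written to the file, a property value can be read back with a few lines of shell. This is a rough sketch, not an XML parser: get_property is a hypothetical helper, and it assumes the layout shown above, with each <name> and <value> on its own line.

```shell
# Print the <value> that follows a given <name> in hadoop-site.xml.
# Assumes one tag per line, as in the file above. Hypothetical helper.
get_property() {
    # $1 = path to hadoop-site.xml, $2 = property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # drop the 7-character <value> and the 8-character </value> tags
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

For example, get_property conf/hadoop-site.xml fs.default.name should print the hdfs:// address configured above, and an unknown property name prints nothing.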
Configure conf/masters
Add the master's IP: 192.168.11.13
Configure conf/slaves
Add the datanode's IP: 192.168.11.17
Deploy the configured Hadoop directory to the node with scp.
scp -r /home/linleran/hadoop-0.18.2 192.168.11.17:/home/linleran/hadoop-0.18.2
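With more than one data node, the same copy has to be repeated per host. The sketch below prints one scp command for every host listed in a slaves file, as a dry run. print_deploy_commands is a hypothetical helper name. As a deliberate variation on the command in the text, it copies into the parent directory on the remote side, so the result is the same whether or not the target directory already exists there.

```shell
# Print one scp command per host in a slaves file (dry run only).
# Hypothetical helper for illustration; it executes nothing itself.
print_deploy_commands() {
    slaves_file=$1    # e.g. conf/slaves
    src_dir=$2        # e.g. /home/linleran/hadoop-0.18.2
    while IFS= read -r host || [ -n "$host" ]; do
        # skip blank lines
        [ -n "$host" ] || continue
        echo "scp -r $src_dir $host:$(dirname "$src_dir")/"
    done < "$slaves_file"
}
```

After reviewing the printed commands, they can be executed by piping the output to sh, for example print_deploy_commands conf/slaves /home/linleran/hadoop-0.18.2 | sh.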
3. Start Hadoop on the master server
Format a new distributed file system:
bin/hadoop namenode -format
Start the services:
bin/start-all.sh
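One way to confirm that the daemons came up is to look at the Java process list printed by the JDK's jps tool. The sketch below is an assumption-laden helper, not part of Hadoop: missing_daemons is a hypothetical name, and the daemon names (NameNode, SecondaryNameNode and JobTracker on the master, DataNode and TaskTracker on the data node) are what this setup is expected to show.

```shell
# Read `jps` output on stdin and print every expected daemon name
# that does not appear in it. Hypothetical helper for illustration.
missing_daemons() {
    # "$@" = expected daemon class names
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second field exactly
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

On the master this could be run as jps | missing_daemons NameNode SecondaryNameNode JobTracker, and on the data node as jps | missing_daemons DataNode TaskTracker. No output means every expected process was found.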
4. Test
Test the distributed file system and run a sample job:
mkdir test-in
echo "hello world" > test-in/file1.txt
echo "hello hadoop" > test-in/file2.txt
bin/hadoop dfs -put test-in input
bin/hadoop jar hadoop-0.18.2-examples.jar wordcount input output
bin/hadoop dfs -get output result
cat result/*
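To know what to expect from the job, the same count can be reproduced locally with standard tools. This is only a sanity-check sketch: local_wordcount is a hypothetical helper, and it assumes the wordcount example splits on whitespace and writes one word, a tab, and the count per line, sorted by word.

```shell
# Count words in all files of a directory and print "word<TAB>count",
# sorted by word. Hypothetical local stand-in for the wordcount job.
local_wordcount() {
    # $1 = directory of plain-text input files
    cat "$1"/* | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c |
        awk 'NF == 2 { print $2 "\t" $1 }'
}
```

For two input files containing "hello world" and "hello hadoop", local_wordcount test-in prints hadoop 1, hello 2 and world 1, one per line, which can be compared with the contents of result.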
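5. Problems encountered during configuration
Passwordless SSH access fails. The .ssh directory needs 755 permissions and authorized_keys needs 644; otherwise SSH keeps prompting for a password.
The Linux firewall is on. Either open the ports Hadoop uses (including 9000 and 9001 from the configuration above) or turn the firewall off; otherwise the node can never connect to the master, and the node's log keeps retrying: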
INFO org.apache.hadoop.ipc.Client: Retrying connect to server…
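A data node may also fail to connect to the master because hostnames were used; IP addresses are the safer choice.
The distributed file system reports java.lang.IllegalArgumentException: Wrong FS:. Check that hadoop-site.xml is correct: the value of fs.default.name has the form hdfs://IP:port/ and the value of mapred.job.tracker has the form IP:port.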
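The two value formats from the last item can be checked mechanically before starting the cluster. The functions below are a sketch with hypothetical names. They test only the textual form (hdfs://IP:port/ and IP:port with a dotted numeric IP), not whether the address is reachable.

```shell
# Succeed if the argument looks like hdfs://IP:port/ (format check only).
is_fs_default_name() {
    printf '%s\n' "$1" |
        grep -Eq '^hdfs://[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+/$'
}

# Succeed if the argument looks like IP:port (format check only).
is_job_tracker() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+$'
}
```

For example, is_fs_default_name "hdfs://192.168.11.13:9000/" succeeds, while is_job_tracker "hdfs://192.168.11.13:9001/" fails because mapred.job.tracker takes no scheme.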
References:
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/index.html
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/index.html
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/index.html
http://blog.csdn.net/cenwenchu79/archive/2008/08/29/2847529.aspx
http://rdc.taobao.com/blog/dw/archives/206