Notes on Configuring a Hadoop Distributed Cluster

 

Assume a two-machine hadoop distributed cluster: 192.168.11.13 is the master server (namenode) and 192.168.11.17 is the data node (datanode).
1. Configure passwordless SSH with public keys
192.168.11.13
Log in as root.
Create the user linleran: adduser linleran
Set its password: passwd linleran
Switch to that user: su linleran
In linleran's home directory (/home/linleran), create a .ssh directory: mkdir .ssh
Set the directory's permissions to 755: [linleran@centos ~]$ chmod 755 .ssh
Generate the key pair, pressing Enter at every prompt:
[linleran@centos ~]$ ssh-keygen -t rsa 
Generating public/private rsa key pair. 
Enter file in which to save the key (/home/linleran/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/linleran/.ssh/id_rsa. 
Your public key has been saved in /home/linleran/.ssh/id_rsa.pub. 
The key fingerprint is: 
df:99:37:84:a1:04:34:06:60:45:b9:ce:43:af:54:77 linleran@centos.test 
Enter the .ssh directory and append the contents of id_rsa.pub to authorized_keys.
Set the file's permissions to 644: [linleran@centos .ssh]$ chmod 644 authorized_keys
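For example, the append step can be done from inside ~/.ssh with:
[linleran@centos .ssh]$ cat id_rsa.pub >> authorized_keys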

Repeat the same steps on 192.168.11.17, then append its id_rsa.pub to authorized_keys on 192.168.11.13, and append 192.168.11.13's id_rsa.pub to authorized_keys on 192.168.11.17, so that the two machines can ssh to each other without a password.
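One way to do the exchange (a sketch: it assumes password-based scp still works at this point, and the /tmp filenames are placeholders):
# on 192.168.11.17: send its public key to the master
scp ~/.ssh/id_rsa.pub linleran@192.168.11.13:/tmp/id_rsa.17.pub
# on 192.168.11.13: append it, then send the master's key back
cat /tmp/id_rsa.17.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub linleran@192.168.11.17:/tmp/id_rsa.13.pub
# on 192.168.11.17: append the master's key
cat /tmp/id_rsa.13.pub >> ~/.ssh/authorized_keys
Afterwards, ssh in each direction should no longer prompt for a password.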

2. Install hadoop on the master server
Unpack the tarball: tar zxvf hadoop-0.18.2.tar.gz
Create the following directories:
/home/linleran/hadoop-0.18.2/hadoopfs/name 
/home/linleran/hadoop-0.18.2/hadoopfs/data 
/home/linleran/hadoop-0.18.2/tmp/ 
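All three can be created in one step:
mkdir -p /home/linleran/hadoop-0.18.2/hadoopfs/name \
         /home/linleran/hadoop-0.18.2/hadoopfs/data \
         /home/linleran/hadoop-0.18.2/tmp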
Edit conf/hadoop-env.sh and set the JDK path: export JAVA_HOME=/home/linleran/jdk1.5.0_15
Configure conf/hadoop-site.xml:

<configuration> 
   <property>   
      <name>fs.default.name</name>  
      <value>hdfs://192.168.11.13:9000/</value> 
   </property> 
   <property> 
      <name>mapred.job.tracker</name> 
      <value>192.168.11.13:9001</value> 
   </property> 
   <property> 
      <name>dfs.name.dir</name> 
      <value>/home/linleran/hadoop-0.18.2/hadoopfs/name</value> 
   </property> 
   <property> 
     <name>dfs.data.dir</name> 
     <value>/home/linleran/hadoop-0.18.2/hadoopfs/data</value> 
   </property> 
   <property> 
      <name>dfs.replication</name> 
      <value>1</value> 
   </property> 
   <property> 
      <name>hadoop.tmp.dir</name> 
      <value>/home/linleran/hadoop-0.18.2/tmp/</value> 
   </property> 
</configuration> 

Configure conf/masters
Add the master's IP: 192.168.11.13

Configure conf/slaves
Add the datanode's IP: 192.168.11.17
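Each file holds one address per line, so both can be written directly from the hadoop directory:
echo 192.168.11.13 > conf/masters
echo 192.168.11.17 > conf/slaves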

Deploy the configured hadoop to the data node with scp:
scp -r /home/linleran/hadoop-0.18.2 192.168.11.17:/home/linleran/hadoop-0.18.2

3. Start hadoop on the master server
Format a new distributed filesystem:
bin/hadoop namenode -format
Start all the daemons:
bin/start-all.sh
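To check that everything came up, running jps (bundled with the JDK) on each machine should list the daemons; with the layout above, the master should show NameNode, SecondaryNameNode and JobTracker, and the data node should show DataNode and TaskTracker:
[linleran@centos hadoop-0.18.2]$ jps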

4. Test
Exercise the distributed filesystem with the wordcount example:
mkdir test-in
echo "hello world" > test-in/file1.txt
echo "hello hadoop" > test-in/file2.txt
bin/hadoop dfs -put test-in input
bin/hadoop jar hadoop-0.18.2-examples.jar wordcount input output
bin/hadoop dfs -get output result
cat result/*
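Since the two input files are fixed, the word counts are deterministic; cat result/* should print something like:
hadoop  1
hello   2
world   1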

5. Problems encountered along the way
Passwordless SSH did not work at first. The .ssh directory must have mode 755 and authorized_keys mode 644; otherwise ssh keeps prompting for a password.
The Linux firewall was on. Open the ports hadoop needs, or turn the firewall off; otherwise the data node can never connect to the master, and its log retries endlessly:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server… 
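On CentOS, for example, you can open the two ports configured above as root, or simply stop the firewall on a throwaway test cluster (hadoop's datanode and web-UI ports would otherwise need opening too, which is why stopping it is simpler):
iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
service iptables save
# or, bluntly:
service iptables stop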
A data node that cannot reach the master may also be caused by using machine names; plain IP addresses are more reliable.
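If you do want machine names, they must resolve identically on both hosts, e.g. via /etc/hosts (the names below are made-up placeholders):
192.168.11.13  master.test
192.168.11.17  node1.test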
The distributed filesystem throws java.lang.IllegalArgumentException: Wrong FS: when hadoop-site.xml is misconfigured; the value of fs.default.name must be hdfs://IP:port/ and the value of mapred.job.tracker must be IP:port.
