The previous post covered the pseudo-distributed installation of hadoop-1.2.1 and successfully ran the wordcount example. Now let's complete a fully distributed hadoop-1.2.1 setup.
If pseudo-distributed mode already works for you, fully distributed mode is straightforward.
Let's go!
1. Test environment
Three CentOS 6.4 32-bit VMs in VirtualBox. I had wanted four, but memory was already maxed out; hardware is the bottleneck, so three will have to do.
After creating the three VMs, install Java and Hadoop on each (see the previous post for installation and configuration).
2. Master-slave roles of the three VMs
master ip: 192.168.31.100 caixen (hostname)
slave ip: 192.168.31.101 caixen-1 (hostname)
slave ip: 192.168.31.102 caixen-2 (hostname)
3. For fully distributed mode, only the master needs to be configured; the configuration is then pushed to the slaves with scp and everything is done.
3.1 Configure each VM's network interface, /etc/hosts, and /etc/resolv.conf using the IPs and hostnames from step 2 (see the previous post for details).
One thing deserves special attention here: the master's hosts file needs entries for every slave's IP and hostname, while each slave only needs its own entry. Like so:
---------master--------
[root@caixen ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.100 caixen
192.168.31.101 caixen-1
192.168.31.102 caixen-2
Save and exit
------slave-------
[root@caixen-1 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.101 caixen-1
Save and exit
[root@caixen-2 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.102 caixen-2
Save and exit
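The hosts edits above can also be scripted. A minimal sketch, using the IPs from step 2; it writes to a scratch file so it can be tried safely, whereas on the real master the target would be /etc/hosts (and the cluster entries would be appended, not overwritten):

```shell
# Generate the master's cluster entries (all three nodes).
# /tmp/hosts.demo is a stand-in for /etc/hosts on the real master.
HOSTS_DEMO=/tmp/hosts.demo
cat > "$HOSTS_DEMO" <<'EOF'
192.168.31.100 caixen
192.168.31.101 caixen-1
192.168.31.102 caixen-2
EOF
cat "$HOSTS_DEMO"
```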
3.2 Configure core-site.xml on the master
[root@caixen ~]# vi /usr/hadoop/hadoop-1.2.1/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.31.100:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/caixen/hadooptmpdir</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
Save and exit
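fs.default.name is the URI every node in the cluster uses to reach HDFS, so it must point at the master; this is also why the hosts entries in 3.1 matter. Since /etc/hosts resolves the hostnames, a hypothetical equivalent is to use the hostname instead of the raw IP:

```xml
<!-- Equivalent fs.default.name value using the hostname mapped in /etc/hosts -->
<property>
<name>fs.default.name</name>
<value>hdfs://caixen:9000</value>
</property>
```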
3.3 Configure hdfs-site.xml on the master (dfs.replication is set to 2 because there are only two slaves, i.e. two DataNodes)
[root@caixen ~]# vi /usr/hadoop/hadoop-1.2.1/conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Save and exit
3.4 Configure mapred-site.xml on the master
[root@caixen ~]# vi /usr/hadoop/hadoop-1.2.1/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.31.100:9001</value>
</property>
</configuration>
Save and exit
3.5 Configure the masters file on the master (despite its name, in Hadoop 1.x this file lists the host that runs the SecondaryNameNode)
[root@caixen ~]# vi /usr/hadoop/hadoop-1.2.1/conf/masters
192.168.31.100
Save and exit
3.6 Configure the slaves file on the master (the nodes that will run DataNode and TaskTracker)
[root@caixen ~]# vi /usr/hadoop/hadoop-1.2.1/conf/slaves
192.168.31.101
192.168.31.102
Save and exit
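As an aside, because /etc/hosts maps the hostnames, the masters and slaves files can just as well list hostnames instead of IPs. A sketch of that variant of the slaves file (written to a scratch file here for illustration):

```shell
# Hypothetical hostname-based variant of conf/slaves.
cat > /tmp/slaves.demo <<'EOF'
caixen-1
caixen-2
EOF
cat /tmp/slaves.demo
```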
3.7 Set up passwordless SSH login from the master to the slaves
3.7.1 Run ssh-keygen on the master to generate a public/private key pair; -P '' gives the key an empty passphrase, which is what makes the login passwordless. (If the command is unavailable, install openssh-server and openssh-clients.)
[root@caixen ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
56:7f:77:11:fa:ed:8b:c3:83:f2:a4:ed:9a:03:be:12 root@caixen
The key's randomart image is:
+--[ RSA 2048]----+
| . |
| . .|
| . . . |
| . . . o|
| S . ..+|
| E o . o.|
| o . .o .|
| . . o=. +. .|
| ...+*+ .o. |
+-----------------+
3.7.2 Copy the master's id_rsa.pub to caixen-1 as authorized_keys.
[root@caixen ~]# scp .ssh/id_rsa.pub 192.168.31.101:/root/.ssh/authorized_keys
root@192.168.31.101's password:
id_rsa.pub 100% 393 0.4KB/s 00:00
3.7.3 On slave caixen-1, append to .ssh/authorized_keys and fix its permissions. (The scp above already wrote the master's key into authorized_keys; the cat here appends caixen-1's own public key, if it has one, and chmod 600 locks the file down so sshd will accept it.)
[root@caixen-1 ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[root@caixen-1 ~]# chmod 600 .ssh/authorized_keys
Test passwordless login from the master (caixen) to caixen-1:
[root@caixen ~]# ssh caixen-1
Last login: Sun Feb 1 11:45:48 2015 from caixen-pc
[root@caixen-1 ~]#
Success!
3.7.4 Copy the master's id_rsa.pub to caixen-2 as authorized_keys.
[root@caixen ~]# scp .ssh/id_rsa.pub 192.168.31.102:/root/.ssh/authorized_keys
root@192.168.31.102's password:
id_rsa.pub 100% 393 0.4KB/s 00:00
3.7.5 On slave caixen-2, append to .ssh/authorized_keys and fix its permissions (same as 3.7.3)
[root@caixen-2 ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[root@caixen-2 ~]# chmod 600 .ssh/authorized_keys
Test passwordless login from the master (caixen) to caixen-2:
[root@caixen ~]# ssh caixen-2
Last login: Sun Feb 1 11:45:50 2015 from caixen-pc
[root@caixen-2 ~]#
Success!
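Steps 3.7.2 through 3.7.5 can also be collapsed into a single command per slave with ssh-copy-id, which does the copy, append, and permission fix itself. A sketch; the loop below only prints the commands into a scratch file, so remove the echo to actually run them (you'd enter each slave's password once):

```shell
# ssh-copy-id bundles the scp + append + chmod steps above.
# Printed rather than executed here; drop the 'echo' to run for real.
for host in caixen-1 caixen-2; do
  echo ssh-copy-id "root@$host"
done > /tmp/sshcopy.demo
cat /tmp/sshcopy.demo
```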
3.8 Copy the master's hadoop-1.2.1 directory to the slaves caixen-1 and caixen-2 with scp.
In my case I instead edited core-site.xml, hdfs-site.xml, and mapred-site.xml directly on caixen-1 and caixen-2 to match the master.
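The scp step in 3.8 is not shown above; a sketch of what it would look like (paths as used in this post, run on the master after the SSH setup in 3.7; here the loop only prints the commands into a scratch file rather than executing them):

```shell
# Push the configured hadoop-1.2.1 tree from the master to each slave.
# Remove the 'echo' to actually run the copies.
for host in caixen-1 caixen-2; do
  echo scp -r /usr/hadoop/hadoop-1.2.1 "root@$host:/usr/hadoop/"
done > /tmp/scp-hadoop.demo
cat /tmp/scp-hadoop.demo
```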
4. Configuration is done; format the NameNode on the master (caixen).
[root@caixen /]# hadoop namenode -format
15/02/01 13:00:01 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = caixen/192.168.31.100
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_71
************************************************************/
Re-format filesystem in /home/caixen/hadooptmpdir/dfs/name ? (Y or N) n
Format aborted in /home/caixen/hadooptmpdir/dfs/name
15/02/01 13:00:12 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at caixen/192.168.31.100
************************************************************/
Note: I had formatted before, so I answered n and did not re-format here. (Re-formatting a NameNode that already has DataNode data generates a new namespaceID, and the existing DataNodes will refuse to join until their data directories are cleared.)
5. Before starting all Hadoop services, check the current state with jps, and stop the firewall so the web UIs can be reached from a browser.
--------master------------
[root@caixen /]# jps
1152 Jps
[root@caixen /]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
--------slave---------
[root@caixen-1 ~]# jps
1145 Jps
[root@caixen-1 ~]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@caixen-2 ~]# jps
1132 Jps
[root@caixen-2 ~]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
6. Start all Hadoop services from the master (caixen) with start-all.sh
[root@caixen /]# start-all.sh
starting namenode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-caixen.out
192.168.31.101: starting datanode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-caixen-1.out
192.168.31.102: starting datanode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-caixen-2.out
192.168.31.100: starting secondarynamenode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-caixen.out
starting jobtracker, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-caixen.out
192.168.31.101: starting tasktracker, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-caixen-1.out
192.168.31.102: starting tasktracker, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-caixen-2.out
[root@caixen /]# jps
1289 NameNode
1431 SecondaryNameNode
1615 Jps
1516 JobTracker
As the start-all.sh output above shows, the slave daemons were launched as well.
Next, check which services are running on the slaves caixen-1 and caixen-2:
[root@caixen-1 ~]# jps
1342 DataNode
1339 TaskTracker
1386 Jps
[root@caixen-2 ~]# jps
1366 Jps
1242 DataNode
1319 TaskTracker
The fully distributed setup is complete! To double-check, run hadoop dfsadmin -report on the master (it should list two live DataNodes), or open the NameNode web UI at http://192.168.31.100:50070 and the JobTracker UI at http://192.168.31.100:50030.