1. Cluster Layout
Hostname \ Process | NameNode | ResourceManager | NodeManager | DataNode
hadoop0            | Y        | Y               | N           | N
hadoop1            | N        | N               | Y           | Y
hadoop2            | N        | N               | Y           | Y
Hostname \ Process | Master  | ZooKeeper | RegionServer
hadoop0            | Y       | Y         | N
hadoop1            | backup  | Y         | Y
hadoop2            | N       | Y         | Y
Hostname | IP address     | User/Password
hadoop0  | 192.168.56.101 | hadoop/hadoop
hadoop1  | 192.168.56.102 | hadoop/hadoop
hadoop2  | 192.168.56.103 | hadoop/hadoop
2. Experiment Environment
Windows 7 64-bit host OS + VirtualBox + three 64-bit CentOS 6.5 virtual machines + jdk-7u67-linux-x64.gz + hadoop-2.5.1.tar.gz + hbase-0.98.6.1-hadoop2-bin.tar.gz
Note: the environment matters. Make sure the version numbers match the ones listed above; other combinations may be compatible, but version mismatches are a common source of problems.
3. Preparation
① Set the hostname, IP address, and hadoop user/password on the three CentOS hosts as listed in the tables above,
then edit /etc/hosts on all three hosts so that it reads as follows:
[hadoop@hadoop0 .ssh]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 hadoop0
192.168.56.102 hadoop1
192.168.56.103 hadoop2
Note: to keep iptables and SELinux from filtering cluster traffic, turn both of them off:
service iptables stop; setenforce 0
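To keep these settings across reboots as well (an extra precaution, not an explicit step above):
chkconfig iptables off
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config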
② Configure the three hosts so that they can SSH into one another without a password.
Taking hadoop0 as the example:
[hadoop@hadoop0 ~]$ service sshd start
[hadoop@hadoop0 ~]$ ssh-keygen (accept all the defaults by pressing Enter)
This generates a public/private key pair in ~/.ssh:
[hadoop@hadoop0 .ssh]$ ls
id_rsa id_rsa.pub
Append the public key to the authorized_keys file (permissions 644):
[hadoop@hadoop0 .ssh]$ cat id_rsa.pub >>authorized_keys
[hadoop@hadoop0 .ssh]$ chmod 644 authorized_keys
Generate key pairs on hadoop1 and hadoop2 in the same way (be sure to do this as the hadoop user), collect all three public keys into a single authorized_keys file, and distribute that file to all three hosts, as shown below.
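One way to gather the hadoop1 and hadoop2 public keys on hadoop0 is scp (the first command is run on hadoop1, the second on hadoop2; the destination file names simply match the listing that follows):
[hadoop@hadoop1 ~]$ scp ~/.ssh/id_rsa.pub hadoop@hadoop0:/home/hadoop/.ssh/hadoop1_pub_key
[hadoop@hadoop2 ~]$ scp ~/.ssh/id_rsa.pub hadoop@hadoop0:/home/hadoop/.ssh/hadoop2_pub_key
The .ssh directory on hadoop0 then contains: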
[hadoop@hadoop0 .ssh]$ ls
authorized_keys hadoop1_pub_key hadoop2_pub_key id_rsa id_rsa.pub
[hadoop@hadoop0 .ssh]$ cat hadoop1_pub_key >>authorized_keys
[hadoop@hadoop0 .ssh]$ cat hadoop2_pub_key >>authorized_keys
[hadoop@hadoop0 .ssh]$ scp authorized_keys hadoop@hadoop1:/home/hadoop/.ssh/
The authenticity of host 'hadoop1 (192.168.56.102)' can't be established.
RSA key fingerprint is 4c:a7:c7:70:a1:d5:c4:be:76:4d:f8:33:5b:99:7f:ac.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop1,192.168.56.102' (RSA) to the list of known hosts.
hadoop@hadoop1's password:
authorized_keys 100% 1188 1.2KB/s 00:00
[hadoop@hadoop0 .ssh]$ scp authorized_keys hadoop@hadoop2:/home/hadoop/.ssh/
Test by logging in to hadoop1 with ssh:
[hadoop@hadoop0 .ssh]$ ssh hadoop1
Last login: Thu Oct 23 20:28:57 2014 from hadoop0
[hadoop@hadoop1 ~]$
The login succeeds without a password prompt, so passwordless SSH is working.
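Before starting the cluster, it may also help to connect once from every host to every other host (and to itself), so that all host keys are already in known_hosts when the Hadoop start scripts log in over SSH; this is a precaution rather than a step from the original procedure:
[hadoop@hadoop0 ~]$ ssh hadoop0 hostname
[hadoop@hadoop0 ~]$ ssh hadoop1 hostname
[hadoop@hadoop0 ~]$ ssh hadoop2 hostname
(repeat the same three commands on hadoop1 and hadoop2)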
③ Install the same JDK version on all three hosts, and place the other software packages on them at the same time.
hadoop0 is again used as the example, but the same steps must be performed on all three hosts:
If Java is already installed, it can be removed with: yum erase java
Place jdk-7u67-linux-x64.gz, hadoop-2.5.1.tar.gz, and hbase-0.98.6.1-hadoop2-bin.tar.gz in the /usr/local/ directory, extract them, and create symbolic links named java, hadoop, and hbase pointing to the extracted directories.
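A minimal sketch of the extraction and linking, run as root in /usr/local (it assumes the JDK archive, despite its .gz name, is a gzip-compressed tarball):
cd /usr/local
tar -zxf jdk-7u67-linux-x64.gz
tar -zxf hadoop-2.5.1.tar.gz
tar -zxf hbase-0.98.6.1-hadoop2-bin.tar.gz
ln -s jdk1.7.0_67 java
ln -s hadoop-2.5.1 hadoop
ln -s hbase-0.98.6.1-hadoop2 hbase
The resulting listing of /usr/local: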
lrwxrwxrwx. 1 root root 12 Oct 23 20:48 hadoop -> hadoop-2.5.1
drwxr-xr-x. 9 hadoop hadoop 4096 Oct 23 20:44 hadoop-2.5.1
-rwxr-x---. 1 hadoop hadoop 148199785 Oct 23 20:38 hadoop-2.5.1.tar.gz
lrwxrwxrwx. 1 root root 22 Oct 23 20:49 hbase -> hbase-0.98.6.1-hadoop2
drwxr-xr-x. 7 hadoop hadoop 4096 Oct 23 20:44 hbase-0.98.6.1-hadoop2
-rwxr-x---. 1 hadoop hadoop 82107040 Oct 23 20:38 hbase-0.98.6.1-hadoop2-bin.tar.gz
lrwxrwxrwx. 1 root root 11 Oct 23 20:52 java -> jdk1.7.0_67
drwxr-xr-x. 8 uucp 143 4096 Jul 26 00:51 jdk1.7.0_67
Configure the Java environment variables at the end of /etc/profile:
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$PATH:$JAVA_HOME/bin
Load the environment variables: $ source /etc/profile
Test: # java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
This confirms the JDK is set up correctly.
4. Configure, Start, and Verify Hadoop
(1) As the hadoop user on all three hosts, edit the Hadoop configuration files (all of them are under /usr/local/hadoop/etc/hadoop/, since the Hadoop tarball was extracted under /usr/local):
In hadoop-env.sh:
export JAVA_HOME=/usr/local/java
In yarn-env.sh:
export JAVA_HOME=/usr/local/java
Make sure the slaves file contains only:
hadoop1
hadoop2
In core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop0:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
In hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop0:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
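The local directories referenced in core-site.xml and hdfs-site.xml are normally created by Hadoop itself during formatting and startup, but they can also be created up front on each host (optional; the name directory is only used on hadoop0 and the data directory only on hadoop1/hadoop2, though creating all of them everywhere is harmless):
mkdir -p /home/hadoop/tmp /home/hadoop/dfs/name /home/hadoop/dfs/data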
In mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>hadoop0:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop0:19888</value>
</property>
</configuration>
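The job history addresses configured above are served by the MapReduce JobHistory server, which the start-all.sh script used later does not launch; if its web UI is wanted, it can be started separately with the stock Hadoop 2.x daemon script (an extra step, not part of the original walkthrough), run from /usr/local/hadoop on hadoop0:
sbin/mr-jobhistory-daemon.sh start historyserver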
In yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop0:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop0:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop0:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop0:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop0:8088</value>
</property>
</configuration>
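Rather than editing the files by hand on every host, the finished configuration directory can be copied from hadoop0 to the other two hosts (a sketch; it assumes the hadoop user owns /usr/local/hadoop on each machine, as in the directory listing above):
[hadoop@hadoop0 ~]$ scp /usr/local/hadoop/etc/hadoop/* hadoop@hadoop1:/usr/local/hadoop/etc/hadoop/
[hadoop@hadoop0 ~]$ scp /usr/local/hadoop/etc/hadoop/* hadoop@hadoop2:/usr/local/hadoop/etc/hadoop/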
(2) Start Hadoop from hadoop0:
On hadoop0, as the hadoop user, run the following commands from the /usr/local/hadoop/ directory:
bin/hdfs namenode -format (formats the HDFS filesystem)
The format completes successfully.
sbin/start-all.sh (starts Hadoop)
Status during startup.
(3) Verification
Status on hadoop0:
Status on hadoop1:
Status on hadoop2:
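One way to check the status on each host is jps; the daemon set to expect follows from the layout table in section 1 and the configuration above (the comments are expectations, not captured output):
[hadoop@hadoop0 ~]$ jps    # expect NameNode, SecondaryNameNode, ResourceManager (plus Jps itself)
[hadoop@hadoop1 ~]$ jps    # expect DataNode, NodeManager
[hadoop@hadoop2 ~]$ jps    # expect DataNode, NodeManager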
Open http://192.168.56.101:8088 in a browser: two active NodeManagers are listed.
Open http://192.168.56.101:50070: two live DataNodes are listed.
This confirms Hadoop is up and reachable.
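As an optional extra check (not part of the original steps), a simple HDFS operation can be run from /usr/local/hadoop; the directory name /test is just an example:
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -ls /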
5. Configure, Start, and Verify HBase on Hadoop
(1) On all three hosts, edit the HBase configuration files in /usr/local/hbase/conf (the HBase tarball was already extracted under /usr/local earlier):
In hbase-env.sh:
export JAVA_HOME=/usr/local/java
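Because a ZooKeeper quorum is configured below and no separate ZooKeeper installation is used, HBase must manage its own ZooKeeper. That is the default behavior, so the following hbase-env.sh line only makes the assumption explicit:
export HBASE_MANAGES_ZK=true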
In hbase-site.xml:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop0:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop0,hadoop1,hadoop2</value>
</property>
</configuration>
Make sure the regionservers file contains only:
hadoop1
hadoop2
Create a backup-masters file containing the single line: hadoop1
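For example:
echo hadoop1 > /usr/local/hbase/conf/backup-masters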
(2) Start HBase from hadoop0:
Status during startup.
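The standard start script, run as the hadoop user from /usr/local/hbase, would be:
bin/start-hbase.sh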
(3) Verification
Status on hadoop0 after startup:
Status on hadoop1 after startup:
Status on hadoop2 after startup:
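Again jps can be used; based on the layout table in section 1, the HBase processes expected on top of the Hadoop daemons are (expectations, not captured output):
[hadoop@hadoop0 ~]$ jps    # adds HMaster and HQuorumPeer
[hadoop@hadoop1 ~]$ jps    # adds HRegionServer, HQuorumPeer, and the backup HMaster
[hadoop@hadoop2 ~]$ jps    # adds HRegionServer and HQuorumPeer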
Open http://192.168.56.101:60010 to see the Master's running status:
Open http://192.168.56.103:60030 to see the RegionServer's running status:
Finally, run a command from the Hadoop directory to view the HBase database files stored in HDFS; see the sketch below.
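Given the hbase.rootdir of hdfs://hadoop0:9000/hbase configured above, a likely form of that command (run from /usr/local/hadoop) is:
bin/hdfs dfs -ls /hbase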