Hadoop 2.0 HA Cluster Setup Steps
I. Preparation
1. Create six virtual machines
1.1 Clone an existing VM to quickly create the six machines
[url=http://jingyan.baidu.com/article/6b97984d9798f11ca2b0bfcd.html]Installation guide[/url]
The advantage of cloning is that software already installed on the source machine does not need to be reinstalled; the clones can use it as-is.
Choose "full clone".
If an installation goes wrong, delete the system and reinstall it.
[url=http://jingyan.baidu.com/article/c275f6bad61838e33c75676a.html]How to delete a system installed in VMware[/url]
1.2 Network configuration
VMware -- select the cloned VM -- right-click -- Settings -- Network -- Advanced -- MAC address
This is the MAC address VMware assigned to the clone,
but the clone also copied the source machine's MAC address into its network configuration files.
[url=http://www.cnblogs.com/raphael5200/p/5114727.html]How to configure networking on a cloned VM[/url]
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Delete the UUID and MAC address (HWADDR) entries, then configure the IP address.
rm -rf /etc/udev/rules.d/70-persistent-net.rules
Delete the existing device mapping; after a reboot the system regenerates it for the current machine.
vi /etc/sysconfig/network
Set the hostname.
shutdown -r now
Reboot the machine.
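After removing the UUID and HWADDR lines, a minimal static configuration for the first node might look like the following sketch. The IP matches the 192.168.76.x addresses used below; the netmask and gateway are assumptions, so adjust them to your network.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 sketch for the first node
# (NETMASK and GATEWAY are assumptions; verify against your VM network)
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.76.151
NETMASK=255.255.255.0
GATEWAY=192.168.76.2
```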
2. Disable the firewall
service iptables stop
chkconfig iptables off
3. Modify the hosts file
vim /etc/hosts
The hostname configured in the previous step only takes effect after a reboot.
To apply it immediately without rebooting, run hostname xxxx (temporary; lost on reboot).
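For reference, a minimal /etc/sysconfig/network (the file edited in step 1.2) for the first node; the hostname is assumed to follow the linux04–linux09 naming used throughout:

```shell
# /etc/sysconfig/network sketch; hostname follows the linux04..linux09 naming
NETWORKING=yes
HOSTNAME=linux04
```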
4. Passwordless SSH
Generate a key pair:
ssh-keygen
Press Enter at every prompt to accept the defaults.
[img]http://dl2.iteye.com/upload/attachment/0125/3829/bfb66a02-56b4-324b-bd55-a77818141fd7.png[/img]
Generated files: /root/.ssh/id_rsa (private key) and /root/.ssh/id_rsa.pub (public key).
Copy the public key to another machine:
ssh-copy-id -i /root/.ssh/id_rsa.pub root@linux05
Repeat so that every machine can log in to every other machine without a password.
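The repetition above can be scripted. This sketch only prints the ssh-copy-id commands for one source machine (a dry run); remove the echo to actually push the key, and repeat the loop on every machine:

```shell
# Print the key-distribution commands for one machine (dry run).
# Remove 'echo' to actually push the public key; run the loop on each host.
peers="linux04 linux05 linux06 linux07 linux08 linux09"
for h in $peers; do
  echo ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$h"
done
```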
5. File distribution
scp /etc/hosts root@linux05:/etc/hosts
Contents of /etc/hosts:
192.168.76.151 linux04
192.168.76.152 linux05
192.168.76.153 linux06
192.168.76.154 linux07
192.168.76.155 linux08
192.168.76.156 linux09
Keep the hosts file identical on every virtual machine.
II. Cluster node roles (Park01–Park06 correspond to linux04–linux09 below)
Park01
ZooKeeper
NameNode (active)
ResourceManager (active)
Park02
ZooKeeper
NameNode (standby)
Park03
ZooKeeper
ResourceManager (standby)
Park04
DataNode
NodeManager
JournalNode
Park05
DataNode
NodeManager
JournalNode
Park06
DataNode
NodeManager
JournalNode
III. Install ZooKeeper
Machines: the first three, linux04 / linux05 / linux06.
Upload the tarball to linux04 and unpack it:
tar -zxvf zookeeper.tar.gz
Enter the configuration directory:
cd /zookeeper/conf
Copy the sample configuration:
cp zoo_sample.cfg zoo.cfg
Edit the configuration file:
vim zoo.cfg
[img]http://dl2.iteye.com/upload/attachment/0125/3833/0944eb36-bd1c-3463-80e9-0c60eb84cf51.png[/img]
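The screenshot shows the edited file. A zoo.cfg consistent with this layout would look roughly like the fragment below; the timing values and the 2888/3888 ports are ZooKeeper defaults, dataDir matches the tmp directory created in the next step, and the exact values in the screenshot may differ, so treat this as an assumption:

```properties
# zoo.cfg sketch (values assumed; 2888/3888 are ZooKeeper default ports)
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/zookeeper/tmp
server.1=linux04:2888:3888
server.2=linux05:2888:3888
server.3=linux06:2888:3888
```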
Create the data directory and the myid file:
cd ..
mkdir tmp
cd tmp
vim myid
1
The value is this machine's ZooKeeper server id (used in leader election); 1 marks the first machine.
scp -r /zookeeper root@linux05:/
scp -r /zookeeper root@linux06:/
Send the configured zookeeper directory to the other two machines.
After the transfer, edit myid on those machines to 2 and 3 respectively.
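A scripted form of the id assignment, as a dry run that prints the per-host commands (ids follow the order linux04=1, linux05=2, linux06=3; drop the echo wrapper to execute over ssh):

```shell
# Print the command that writes each host's ZooKeeper id (dry run).
i=1
for h in linux04 linux05 linux06; do
  echo "ssh root@$h \"echo $i > /zookeeper/tmp/myid\""
  i=$((i + 1))
done
```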
IV. Install Hadoop
Machine: linux04.
Upload the tarball, unpack it, then:
cd ./hadoop/etc/hadoop
vim hadoop-env.sh
Set the JDK path:
export JAVA_HOME=/usr/java/jdk1.8.0_121
Set the configuration directory:
export HADOOP_CONF_DIR=/usr/local/software/hadoop-2.7.1/etc/hadoop
Apply immediately:
source hadoop-env.sh
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/software/hadoop-2.7.1/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>linux04:2181,linux05:2181,linux06:2181</value>
</property>
</configuration>
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>linux04:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>linux04:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>linux05:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>linux05:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://linux07:8485;linux08:8485;linux09:8485/ns</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/software/hadoop-2.7.1/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/software/hadoop-2.7.1/tmp/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/software/hadoop-2.7.1/tmp/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>linux04</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>linux06</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>linux04:2181,linux05:2181,linux06:2181</value>
<description>For multiple zk services, separate them with comma</description>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>linux04</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Configure the slaves file (etc/hadoop/slaves):
vim slaves
linux07
linux08
linux09
Configure environment variables:
vim /etc/profile
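The original does not show the exact additions; a sketch consistent with the install paths used above (treat the exact paths as assumptions matching your layout):

```shell
# Assumed /etc/profile additions; paths match the install locations used earlier
export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_HOME=/usr/local/software/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```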
Create the working directories (under the Hadoop install directory):
mkdir journal
mkdir tmp
cd tmp
mkdir datanode
mkdir namenode
Send the configured Hadoop directory and profile to the other five machines:
scp -r hadoop root@linux05 ...
scp -r /etc/profile root@linux05 ...
Then on each machine: source /etc/profile
V. Start the cluster
1. Start ZooKeeper on the three ZooKeeper machines:
cd zookeeper/bin
sh zkServer.sh start
Verify that startup succeeded:
sh zkServer.sh status
2. On one NameNode machine (e.g. linux04), initialize the HA state in ZooKeeper:
hdfs zkfc -formatZK
3. On linux07, linux08 and linux09, start the JournalNodes:
cd /hadoop/sbin
sh hadoop-daemon.sh start journalnode
jps
Confirm that a JournalNode process is listed.
4. On linux04, format and start the NameNode:
hadoop namenode -format
hadoop-daemon.sh start namenode
5. On linux05, bootstrap the standby NameNode and start it:
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
6. On linux07/08/09, start the DataNodes:
hadoop-daemon.sh start datanode
7. On linux04 and linux05, start the ZKFC failover controllers:
hadoop-daemon.sh start zkfc
8. On linux04, start YARN:
start-yarn.sh
9. On linux06, start the standby ResourceManager:
yarn-daemon.sh start resourcemanager
VI. Verify
Open linux04:50070 (HDFS web UI) and linux04:8088 (YARN web UI) in a browser.