Why the NameNode matters
- Reasons
– The NameNode is the core of HDFS, and HDFS is the core component of Hadoop, so the NameNode is critical to the whole cluster
– If the NameNode goes down, the cluster becomes unavailable; if its data is lost, the entire cluster's data is lost. Since NameNode metadata changes frequently, making the NameNode highly available is essential
Solutions
- Hadoop offers two official solutions
– HDFS with NFS
– HDFS with QJM
- The two schemes side by side

| NFS | QJM |
|---|---|
| NN | NN |
| ZK | ZK |
| ZKFailoverController | ZKFailoverController |
| NFS | JournalNode |
- Comparing the two HA schemes
– Both provide hot standby
– Both run one Active NN and one Standby NN
– Both use ZooKeeper and ZKFC for automatic failure recovery
– Both use the configured fencing methods to isolate the former Active NN during failover
– NFS keeps the shared edits on shared storage, so the NFS server itself also needs an HA design
– QJM needs no shared storage, but every DN must know the addresses of both NNs and send block reports and heartbeats to both the Active and the Standby NN
Chosen solution
- Why QJM
– It eliminates the NameNode single point of failure
– Hadoop's HDFS HA design: HDFS runs two NameNodes, one Active and one Standby. The Active NameNode serves clients (for example RPC requests), while the Standby serves no traffic and only mirrors the Active NameNode's state so it can take over when the Active fails
- A typical HA cluster
– The two NameNodes are deployed on two separate machines; at any moment one is active and the other is standby
– The active NameNode answers all clients in the cluster; the standby acts as a replica, ready to provide a fast takeover when needed
NameNode HA
- HA architecture
– To keep the Standby in sync with the Active, both nodes communicate with a group of independent daemons called JournalNodes (JNs). When the Active updates the namespace, it writes the edit log to a majority of the JNs; the Standby reads those edits from the JNs and continuously watches them for changes
– The Standby applies the edits to its own namespace. When a failover happens, the Standby makes sure it has read every remaining edit from the JNs before promoting itself to Active, so at the moment of failover its namespace is fully in sync with the Active's
– Because the NameNode changes frequently, a fast failover also requires the Standby to hold the cluster's latest block locations. To achieve this, every DataNode is configured with both NameNode addresses, keeps a heartbeat with both, and reports block locations to both
– At any moment only one NameNode may be Active; otherwise cluster operations become inconsistent, the two NameNodes diverge into different states, and data may be lost or corrupted. This situation is called "split-brain" (a communication partition in which different DataNodes see different Active NameNodes)
– The JNs only ever accept one NameNode as writer. During failover the former Standby takes over all Active duties, including writing edits to the JNs, which prevents any other NameNode from acting as Active
- NameNode HA architecture diagram
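The majority-write rule above can be made concrete with a small arithmetic sketch (illustration only, no Hadoop involved): with N JournalNodes, an edit is durable once N/2 + 1 of them acknowledge it, so the ensemble survives the loss of (N-1)/2 nodes.

```shell
# Illustration only: an edit is committed once a majority (N/2 + 1) of the
# JournalNodes acknowledge it, so the ensemble tolerates (N-1)/2 failures.
for n in 3 5 7; do
    majority=$(( n / 2 + 1 ))
    tolerated=$(( (n - 1) / 2 ))
    echo "JNs=$n majority=$majority tolerated-failures=$tolerated"
done
```

With the three JournalNodes planned below, the cluster therefore tolerates the loss of one JN.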
- Cluster plan

| Host | Role | Software |
|---|---|---|
| 192.168.1.60 | NameNode1 | Hadoop |
| 192.168.1.66 | NameNode2 | Hadoop |
| 192.168.1.61 | DataNode, JournalNode, ZooKeeper | HDFS, ZooKeeper |
| 192.168.1.62 | DataNode, JournalNode, ZooKeeper | HDFS, ZooKeeper |
| 192.168.1.63 | DataNode, JournalNode, ZooKeeper | HDFS, ZooKeeper |
Configuring Hadoop HA
Stop all Hadoop services
cd /usr/local/hadoop/
./sbin/stop-all.sh #stop all services
Start ZooKeeper on all hosts (start them one at a time)
/usr/local/zookeeper/bin/zkServer.sh start
/usr/local/zookeeper/bin/zkServer.sh status
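Starting ZooKeeper on each node can be scripted. A sketch, assuming root SSH access and the node1..node3 hostnames defined in /etc/hosts below; printed as a dry run, so remove the leading `echo` to actually execute the commands:

```shell
# Dry run: print the command that would start ZooKeeper on each node.
# Remove the leading "echo" to execute the commands over ssh.
for h in node1 node2 node3; do
    echo ssh root@"$h" /usr/local/zookeeper/bin/zkServer.sh start
done
```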
Add a new machine, hadoop02, and install java-1.8.0-openjdk-devel on it
Edit /etc/hosts on all hosts
vim /etc/hosts
192.168.1.60 hadoop01
192.168.1.66 hadoop02
192.168.1.61 node1
192.168.1.62 node2
192.168.1.63 node3
Set up SSH trust
Note: hadoop01 and hadoop02 (the hosts referred to as nn01 and nn02 in the configuration below) must be able to SSH to each other without a password, and nn02 must likewise reach itself and node1, node2, node3 without a password
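Distributing the key can be sketched as follows (hostnames as in /etc/hosts above; shown as a dry run, so remove the leading `echo` to execute; run it from each NameNode after generating a key with ssh-keygen):

```shell
# Dry run: print the ssh-copy-id command for every host that must be
# reachable without a password. Remove "echo" to actually copy the key.
for h in hadoop01 hadoop02 node1 node2 node3; do
    echo ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$h"
done
```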
Delete /var/hadoop/* on all hosts
rm -rf /var/hadoop/* #the data directory set by hadoop.tmp.dir
Configure core-site
vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://nsdcluster</value>
<!-- nsdcluster is an arbitrary nameservice name; clients address this logical group instead of a single NameNode -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value> <!-- ZooKeeper ensemble addresses -->
</property>
<property>
<name>hadoop.proxyuser.nfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.nfs.hosts</name>
<value>*</value>
</property>
</configuration>
Configure hdfs-site
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>nsdcluster</value>
</property>
<property>
<name>dfs.ha.namenodes.nsdcluster</name>
<!-- nn1 and nn2 are fixed NameNode IDs inside the nsdcluster nameservice -->
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nsdcluster.nn1</name>
<!-- declares nn1: port 8020 is the RPC port of host nn01 -->
<value>nn01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nsdcluster.nn2</name>
<!-- declares nn2: the RPC port of host nn02 -->
<value>nn02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nsdcluster.nn1</name>
<!-- HTTP port of nn01 -->
<value>nn01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.nsdcluster.nn2</name>
<!-- HTTP port of nn02 -->
<value>nn02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<!-- where the NameNodes store shared edit metadata on the journalnodes -->
<value>qjournal://node1:8485;node2:8485;node3:8485/nsdcluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<!-- local path where each journalnode stores its edit files -->
<value>/var/hadoop/journal</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nsdcluster</name>
<!-- Java class HDFS clients use to connect to the active namenode -->
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name> <!-- fence the old active over SSH -->
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name> <!-- location of the SSH private key -->
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name> <!-- enable automatic failover -->
<value>true</value>
</property>
</configuration>
Configure yarn-site
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name> <!-- rm1 and rm2 correspond to nn01 and nn02 -->
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>nn01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>nn02</value>
</property>
</configuration>
Sync to hadoop02, node1, node2, node3
for i in {61..63} 66
do
rsync -aXSH --delete /usr/local/hadoop 192.168.1.$i:/usr/local/
done
Delete /usr/local/hadoop/logs on every machine to simplify troubleshooting
for i in {60..63} 66; do ssh 192.168.1.$i rm -rf /usr/local/hadoop/logs ; done
Verifying high availability
Initialize the HA state in ZooKeeper
/usr/local/hadoop/bin/hdfs zkfc -formatZK
...
18/09/11 15:43:35 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/nsdcluster in ZK
#"Successfully" in the output means it worked
...
Start the journalnode service on node1, node2, node3 (node1 shown as the example)
/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node1.out
jps
29262 JournalNode
26895 QuorumPeerMain
29311 Jps
Format the NameNode on hadoop01 (the journalnodes on node1, node2, node3 must be running before formatting)
/usr/local/hadoop/bin/hdfs namenode -format
#"Successfully" in the output means it worked
ls /var/hadoop/
dfs
Sync the metadata to nn02's local /var/hadoop/dfs (run on nn02)
cd /var/hadoop/
rsync -aXSH hadoop01:/var/hadoop /var/
ls
dfs
Initialize the shared edits (JNs) on hadoop01
/usr/local/hadoop/bin/hdfs namenode -initializeSharedEdits
18/09/11 16:26:15 INFO client.QuorumJournalManager: Successfully started new epoch 1
#"Successfully started new epoch 1" means the shared edits were initialized
Stop the journalnode service (node1, node2, node3)
/usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
stopping journalnode
jps
29346 Jps
26895 QuorumPeerMain
Starting the cluster
Run on hadoop01
/usr/local/hadoop/sbin/start-all.sh #start the whole cluster
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [nn01 nn02]
nn01: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nn01.out
nn02: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nn02.out
node2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node2.out
node3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node3.out
node1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node1.out
Starting journal nodes [node1 node2 node3]
node1: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node1.out
node3: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node3.out
node2: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node2.out
Starting ZK Failover Controllers on NN hosts [nn01 nn02]
nn01: starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-nn01.out
nn02: starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-nn02.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nn01.out
node2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node2.out
node1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node1.out
node3: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node3.out
Run on hadoop02
/usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
active
/usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
standby
/usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
active
/usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2
standby
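The four state checks above can be wrapped in a small helper. A sketch, assuming the install path used throughout this document; the probe is factored into its own function purely so it can be stubbed, and `hdfs haadmin -getServiceState` is the real CLI call:

```shell
# Probe one NameNode's HA state; factored out so it can be stubbed.
get_state() {
    /usr/local/hadoop/bin/hdfs haadmin -getServiceState "$1" 2>/dev/null
}

# Print the ID of the NameNode currently reporting "active", if any.
active_nn() {
    for id in nn1 nn2; do
        if [ "$(get_state "$id")" = "active" ]; then
            echo "$id"
            return 0
        fi
    done
    return 1
}
```

While nn1 is active, active_nn prints nn1; after the failover exercise later in this document it would print nn2.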
Check that the nodes have joined
/usr/local/hadoop/bin/hdfs dfsadmin -report
...
Live datanodes (3): #three live DataNodes are expected
...
/usr/local/hadoop/bin/yarn node -list
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node2:43307 RUNNING node2:8042 0
node1:34606 RUNNING node1:8042 0
node3:36749 RUNNING node3:8042 0
Accessing the cluster
List and create directories
/usr/local/hadoop/bin/hadoop fs -ls /
/usr/local/hadoop/bin/hadoop fs -mkdir /aa #create /aa
/usr/local/hadoop/bin/hadoop fs -ls / #list again
Found 1 items
drwxr-xr-x - root supergroup 0 2018-09-11 16:54 /aa
/usr/local/hadoop/bin/hadoop fs -put *.txt /aa
/usr/local/hadoop/bin/hadoop fs -ls hdfs://nsdcluster/aa
#the same files can also be listed through the nameservice name
Found 3 items
-rw-r--r-- 2 root supergroup 86424 2018-09-11 17:00 hdfs://nsdcluster/aa/LICENSE.txt
-rw-r--r-- 2 root supergroup 14978 2018-09-11 17:00 hdfs://nsdcluster/aa/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2018-09-11 17:00 hdfs://nsdcluster/aa/README.txt
Verify HA by stopping the active NameNode
/usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
active
/usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode
stopping namenode
/usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
#querying nn1 again now returns an error
/usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
#nn02 has changed from standby to active
active
/usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
active
/usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
#stop the resourcemanager
/usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2
active
Recovering the nodes
/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
#start the namenode
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
#start the resourcemanager
/usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
#check the state
standby
/usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
#check the state
standby
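A freshly restarted daemon can take a moment before it reports a state. A hypothetical polling helper; the hdfs path is parameterized (an assumption, purely so the probe can be stubbed), and `hdfs haadmin -getServiceState` is the real CLI call:

```shell
# Path to the hdfs CLI; overridable so the probe can be stubbed.
HDFS="${HDFS:-/usr/local/hadoop/bin/hdfs}"

# Poll until the given NameNode ID reports the wanted state, e.g.
#   wait_for_state nn1 standby   (the restarted nn1 comes back as standby)
wait_for_state() {
    want="$2"
    tries=0
    while [ "$tries" -lt 10 ]; do
        state=$($HDFS haadmin -getServiceState "$1" 2>/dev/null)
        [ "$state" = "$want" ] && return 0
        tries=$((tries + 1))
        sleep 2
    done
    return 1
}
```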