After a long pause, I can finally get back to studying big data. Today's topic is automatic failover for an HDFS cluster. Before working through this section, you should already be familiar with ZooKeeper and with HDFS HA using QJM (Quorum Journal Manager).

       

        Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  • Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.

  • Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

    

        The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  • Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.

  • ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special “lock” znode. This lock uses ZooKeeper’s support for “ephemeral” nodes; if the session expires, the lock node will be automatically deleted.

  • ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has “won the election”, and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.
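
        In ZooKeeper this state lives under /hadoop-ha/<nameservice>. As a rough illustration (the child znode names shown are what is typically seen on Hadoop 2.x, not output captured from this cluster), the lock can be inspected with the ZooKeeper CLI once the cluster configured below is up and running:

#connect to any one of the ZooKeeper servers
[zookeeper@hadoop01 ~]$ zkCli.sh -server hadoop01:2181
[zk: hadoop01:2181(CONNECTED) 0] ls /hadoop-ha/mycluster
[ActiveBreadCrumb, ActiveStandbyElectorLock]

        ActiveStandbyElectorLock is the ephemeral lock znode held by the ZKFC of the active NameNode, and ActiveBreadCrumb records the last known active NameNode so it can be fenced after an unclean failover.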


        To enable automatic HDFS failover, the HDFS cluster must be stopped first. The detailed steps are given below.
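
        If the cluster is still running from the earlier manual-HA setup, stop it before editing the configuration; a minimal sketch (stop-all.sh also stops YARN, stop-dfs.sh stops only the HDFS daemons):

#stop all HDFS and YARN daemons that were started with start-all.sh
[hadoop@hadoop01 ~]$ stop-all.sh
#or stop only the HDFS daemons
[hadoop@hadoop01 ~]$ stop-dfs.sh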


1. Edit the hdfs-site.xml file on the hadoop01 server. The complete file is pasted below; the property added for this step is the one wrapped in the <!-- add start 20160713 --> / <!-- add end 20160713 --> comments (the earlier dated comments mark changes made in previous articles).

[hadoop@hadoop01 hadoop]$ pwd
/home/hadoop/hadoop-2.7.2/etc/hadoop
[hadoop@hadoop01 hadoop]$ vi hdfs-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <!-- add start 20160712 -->
   <property>
     <name>dfs.nameservices</name>
     <value>mycluster</value>
   </property>
   <property>
     <name>dfs.ha.namenodes.mycluster</name>
     <value>nn1,nn2</value>
   </property>

   <property>
     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
     <value>hadoop01:8020</value>
   </property>
   <property>
     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
     <value>hadoop02:8020</value>
   </property>

   <property>
     <name>dfs.namenode.http-address.mycluster.nn1</name>
     <value>hadoop01:50070</value>
   </property>
   <property>
     <name>dfs.namenode.http-address.mycluster.nn2</name>
     <value>hadoop02:50070</value>
   </property>

  <property>
     <name>dfs.namenode.shared.edits.dir</name>
     <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
  </property>

  <property>
     <name>dfs.client.failover.proxy.provider.mycluster</name>
     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
  </property>

  <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_dsa</value>
  </property>

  <!-- add end 20160712 -->

  <!-- add start 20160713 -->
  <property>
     <name>dfs.ha.automatic-failover.enabled</name>
     <value>true</value>

  </property>

  <!-- add end 20160713 -->

    <!-- add start 20160623 -->
    <property>
            <name>dfs.replication</name>
            <!-- modify start 20160627
            <value>1</value>  -->
            <!-- modify start 20170712
            <value>2</value>-->
            <value>3</value>
            <!-- modify end 20170712-->
            <!-- modify end 20160627 -->
    </property>
    <!-- add end  20160623 -->

    <!-- add start 20160627 -->
    <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/home/hadoop/dfs/name</value>
    </property>
    <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home/hadoop/dfs/data</value>
    </property>
    <!-- add end by  20160627 -->
</configuration>
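
        The sshfence method configured above assumes that the hadoop user can SSH from each NameNode host to the other one without a password, using the private key listed in dfs.ha.fencing.ssh.private-key-files. A quick sanity check (hostnames follow this cluster's layout; adjust as needed):

#the ZKFC on hadoop01 must be able to reach hadoop02 over SSH to fence it
[hadoop@hadoop01 ~]$ ssh -i /home/hadoop/.ssh/id_dsa hadoop02 hostname
hadoop02
#and the ZKFC on hadoop02 must be able to reach hadoop01
[hadoop@hadoop02 ~]$ ssh -i /home/hadoop/.ssh/id_dsa hadoop01 hostname
hadoop01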


2. Edit the core-site.xml file on the hadoop01 server. The complete file is pasted below; the property added for this step is the one wrapped in the <!-- add start 20160713 --> / <!-- add end 20160713 --> comments.

[hadoop@hadoop01 hadoop]$ pwd
/home/hadoop/hadoop-2.7.2/etc/hadoop
[hadoop@hadoop01 hadoop]$ vi core-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!--add start 20160623 -->
    <property>
            <name>fs.defaultFS</name>
            <!-- modify start 20160627
            <value>hdfs://localhost:9000</value>   -->
            <!--modify start 20160712
            <value>hdfs://hadoop01:9000</value> -->
            <value>hdfs://mycluster</value>
            <!-- modify end 20160712 -->
            <!-- modify end -->
    </property>
    <!--add end 20160623 -->

    <!-- add start 20160712 -->
    <property>
       <name>dfs.journalnode.edits.dir</name>
       <value>/home/hadoop/dfs/journaldata</value>
    </property>
    <!-- add end 20160712 -->


    <!-- add start 20160713 -->
    <property>
       <name>ha.zookeeper.quorum</name>
       <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>

    <!-- add end 20160713 -->

    <!--add start 20160627 -->
    <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
    </property>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/home/hadoop/tmp</value>
    </property>
    <!--add end by 20160627 -->
</configuration>
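
        Once both files are saved, the effective values can be double-checked from the command line with hdfs getconf; a minimal sketch (run as the hadoop user):

[hadoop@hadoop01 ~]$ hdfs getconf -confKey dfs.ha.automatic-failover.enabled
true
[hadoop@hadoop01 ~]$ hdfs getconf -confKey ha.zookeeper.quorum
hadoop01:2181,hadoop02:2181,hadoop03:2181
[hadoop@hadoop01 ~]$ hdfs getconf -namenodes
hadoop01 hadoop02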

3. Copy the configured hdfs-site.xml and core-site.xml files from hadoop01 to the hadoop02 and hadoop03 servers.

[hadoop@hadoop01 hadoop]$ pwd
/home/hadoop/hadoop-2.7.2/etc/hadoop
[hadoop@hadoop01 hadoop]$ scp hdfs-site.xml core-site.xml hadoop02:$PWD
hdfs-site.xml                                                             100% 2973     2.9KB/s   00:00   
core-site.xml                                                             100% 1906     1.9KB/s   00:00   
[hadoop@hadoop01 hadoop]$ scp hdfs-site.xml core-site.xml hadoop03:$PWD
hdfs-site.xml                                                             100% 2973     2.9KB/s   00:00   
core-site.xml                                                             100% 1906     1.9KB/s   00:00   
[hadoop@hadoop01 hadoop]$

4. Start the ZooKeeper service on all three servers.

#Log in to the hadoop01 server as the zookeeper user

[zookeeper@hadoop01 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED


#Log in to the hadoop02 server as the zookeeper user

[zookeeper@hadoop02 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED


#Log in to the hadoop03 server as the zookeeper user

[zookeeper@hadoop03 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
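
        Before moving on, it is worth confirming that the three ZooKeeper servers have formed a quorum. zkServer.sh status reports each server's role; exactly one server should report "Mode: leader" and the other two "Mode: follower" (which one becomes the leader varies):

#run on each of the three servers
[zookeeper@hadoop01 ~]$ zkServer.sh status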


5. Initialize the HA state in ZooKeeper (create the znode used by the failover controllers) by running the following command on hadoop01 as the hadoop user:

[hadoop@hadoop01 ~]$ hdfs zkfc -formatZK

16/07/03 10:25:13 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at hadoop01/192.168.0.201:8020
16/07/03 10:25:14 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
16/07/03 10:25:14 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop01
16/07/03 10:25:14 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_92
16/07/03 10:25:14 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/07/03 10:25:14 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_92/jre

... (log output omitted) ...

Proceed formatting /hadoop-ha/mycluster? (Y or N) Y

16/07/03 10:25:21 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...
16/07/03 10:25:21 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.
16/07/03 10:25:22 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
16/07/03 10:25:22 INFO zookeeper.ZooKeeper: Session: 0x155ae929cb60000 closed
16/07/03 10:25:22 INFO zookeeper.ClientCnxn: EventThread shut down
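
        To confirm that the znode was created, connect with the ZooKeeper CLI; a minimal sketch:

[zookeeper@hadoop01 ~]$ zkCli.sh -server hadoop01:2181
[zk: hadoop01:2181(CONNECTED) 0] ls /hadoop-ha
[mycluster]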

6. Run start-all.sh to start the HDFS HA cluster. The two NameNodes will come up with one in the Active state and the other in Standby.

[hadoop@hadoop01 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop01 hadoop02]
hadoop02: starting namenode, ......
hadoop01: starting namenode, ......
hadoop02: starting datanode, ......
hadoop01: starting datanode, ......
hadoop03: starting datanode, ......
Starting journal nodes [hadoop01 hadoop02 hadoop03]
hadoop01: starting journalnode, ......
hadoop02: starting journalnode, ......
hadoop03: starting journalnode, ......
Starting ZK Failover Controllers on NN hosts [hadoop01 hadoop02]
hadoop02: starting zkfc, ......
hadoop01: starting zkfc, ......
starting yarn daemons
starting resourcemanager, ......
hadoop01: starting nodemanager, ......
hadoop02: starting nodemanager, ......
hadoop03: starting nodemanager, ......

7. Confirm which processes are now running on each of the three servers:

#Run the jps command on the hadoop01 server

[hadoop@hadoop01 ~]$ jps
2882 NameNode
4034 Jps
2995 DataNode
3466 ResourceManager
3578 NodeManager
3371 DFSZKFailoverController
3213 JournalNode


#Run the jps command on the hadoop02 server

[hadoop@hadoop02 ~]$ jps
2791 NameNode
2955 JournalNode
3068 DFSZKFailoverController
3421 Jps
3166 NodeManager
2862 DataNode


#Run the jps command on the hadoop03 server

[hadoop@hadoop03 ~]$ jps
2946 NodeManager
3159 Jps
2792 DataNode
2863 JournalNode
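
        Besides the web UI check in the next step, the state of each NameNode can also be queried from the command line with hdfs haadmin (nn1 and nn2 are the IDs defined in dfs.ha.namenodes.mycluster; which one ends up Active depends on which ZKFC wins the election):

[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn1
active
[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn2
standby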

8. Open the NameNode web UI (port 50070) and check the NameNode state on both servers:

     • The NameNode on hadoop01 is in the Active state.


     • The NameNode on hadoop02 is in the Standby state.


9. Kill the NameNode process on hadoop01 and confirm that failover is triggered automatically.

[hadoop@hadoop01 ~]$ jps|grep NameNode
nnnn  NameNode
[hadoop@hadoop01 ~]$ kill -9 nnnn
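
        Here nnnn stands for the PID that jps printed for the NameNode process. As a convenience, the lookup and the kill can also be combined into a single line; a sketch:

#find the NameNode PID via jps and kill it in one step
[hadoop@hadoop01 ~]$ kill -9 $(jps | awk '$2=="NameNode" {print $1}')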

10. After simulating the failure, check the NameNode state on both servers:

     • The hadoop01 web UI can no longer be reached.



        • On hadoop02, the NameNode has automatically switched to Active.



11. To bring the killed NameNode on hadoop01 back, start the daemon on its own; it will rejoin the cluster in the Standby state:

[hadoop@hadoop01 ~]$ hadoop-daemon.sh start namenode

starting namenode, logging to /home/hadoop/hadoop-2.7.2//logs/hadoop-hadoop-namenode-hadoop01.out
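
        Once it is back up, the roles can be confirmed the same way as before: hadoop01 (nn1) should now report standby while hadoop02 (nn2) stays active:

[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn2
active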