4. Hadoop HDFS 2.x High Availability Setup

Architecture Overview

HDFS 2.x HA

HDFS High Availability Using the Quorum Journal Manager

(Architecture diagram for HDFS 2.x HA with the Quorum Journal Manager; original image unavailable)

Setup Overview

Host      NN-1  NN-2  DN  ZK  ZKFC  JNN
node01    *     -     -   -   *     *
node02    -     *     *   *   *     *
node03    -     -     *   *   -     *
node04    -     -     *   *   -     -

Setup Steps

Official documentation: https://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

  1. Install the JDK and Hadoop, and configure the environment variables (see the sketch below).
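
    A minimal sketch of the environment variables, appended to /etc/profile on every node. The Hadoop path follows the slaves path used in step 5 (/opt/hadoop-2.6.5); the JDK path is an assumption, so adjust both to your installation:

    # Assumed installation paths - adjust to your environment
    export JAVA_HOME=/usr/java/jdk1.8.0_191
    export HADOOP_HOME=/opt/hadoop-2.6.5
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    # Reload so the current shell picks up the variables; daemons started over SSH
    # typically also need JAVA_HOME set in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    source /etc/profile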

  2. Set up passwordless SSH login; node01 and node02 must be able to reach each other without a password (a sketch follows).
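
    A minimal sketch for the mutual passwordless access, assuming the root user and DSA keys (the hdfs-site.xml below points dfs.ha.fencing.ssh.private-key-files at /root/.ssh/id_dsa). node01 also needs passwordless access to node02-node04 so that start-dfs.sh (step 19) can reach them:

    # On node01 and node02: generate a passphrase-less DSA key and authorize it locally
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    # On node01: push the public key to the other nodes
    # (mirror this on node02, at least for node01)
    for host in node02 node03 node04; do
      ssh-copy-id -i ~/.ssh/id_dsa.pub root@$host
    done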

  3. Configure hdfs-site.xml and core-site.xml as described in the official documentation:

    Configuration details

    To configure HA NameNodes, you must add several configuration options to your hdfs-site.xml configuration file.

    The order in which you set these configurations is unimportant, but the values you choose for dfs.nameservices and dfs.ha.namenodes.[nameservice ID] will determine the keys of those that follow. Thus, you should decide on these values before setting the rest of the configuration options.

    • dfs.nameservices

    - the logical name for this new nameservice

    Choose a logical name for this nameservice, for example “mycluster”, and use this logical name for the value of this config option. The name you choose is arbitrary. It will be used both for configuration and as the authority component of absolute HDFS paths in the cluster.

    Note: If you are also using HDFS Federation, this configuration setting should also include the list of other nameservices, HA or otherwise, as a comma-separated list.

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    
    • dfs.ha.namenodes.[nameservice ID]

    - unique identifiers for each NameNode in the nameservice

    Configure with a list of comma-separated NameNode IDs. This will be used by DataNodes to determine all the NameNodes in the cluster. For example, if you used “mycluster” as the nameservice ID previously, and you wanted to use “nn1” and “nn2” as the individual IDs of the NameNodes, you would configure this as such:

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    

    Note: Currently, only a maximum of two NameNodes may be configured per nameservice.

    • dfs.namenode.rpc-address.[nameservice ID].[name node ID]

    - the fully-qualified RPC address for each NameNode to listen on

    For both of the previously-configured NameNode IDs, set the full address and IPC port of the NameNode process. Note that this results in two separate configuration options. For example:

    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>machine1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>machine2.example.com:8020</value>
    </property>
    

    Note: You may similarly configure the “servicerpc-address” setting if you so desire.

    • dfs.namenode.http-address.[nameservice ID].[name node ID]

    - the fully-qualified HTTP address for each NameNode to listen on

    Similarly to rpc-address above, set the addresses for both NameNodes’ HTTP servers to listen on. For example:

    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>machine1.example.com:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>machine2.example.com:50070</value>
    </property>
    

    Note: If you have Hadoop’s security features enabled, you should also set the https-address similarly for each NameNode.

    • dfs.namenode.shared.edits.dir

    - the URI which identifies the group of JNs where the NameNodes will write/read edits

    This is where one configures the addresses of the JournalNodes which provide the shared edits storage, written to by the Active NameNode and read by the Standby NameNode to stay up-to-date with all the file system changes the Active NameNode makes. Though you must specify several JournalNode addresses, you should only configure one of these URIs. The URI should be of the form: “qjournal://host1:port1;host2:port2;host3:port3/journalId”. The Journal ID is a unique identifier for this nameservice, which allows a single set of JournalNodes to provide storage for multiple federated namesystems. Though not a requirement, it’s a good idea to reuse the nameservice ID for the journal identifier.

    For example, if the JournalNodes for this cluster were running on the machines “node1.example.com”, “node2.example.com”, and “node3.example.com” and the nameservice ID were “mycluster”, you would use the following as the value for this setting (the default port for the JournalNode is 8485):

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
    </property>
    
    • dfs.client.failover.proxy.provider.[nameservice ID]

    - the Java class that HDFS clients use to contact the Active NameNode

    Configure the name of the Java class which will be used by the DFS Client to determine which NameNode is the current Active, and therefore which NameNode is currently serving client requests. The only implementation which currently ships with Hadoop is the ConfiguredFailoverProxyProvider, so use this unless you are using a custom one. For example:

    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    
    • dfs.ha.fencing.methods

    - a list of scripts or Java classes which will be used to fence the Active NameNode during a failover

    It is desirable for correctness of the system that only one NameNode be in the Active state at any given time. Importantly, when using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata from a split-brain scenario. However, when a failover occurs, it is still possible that the previous Active NameNode could serve read requests to clients, which may be out of date until that NameNode shuts down when trying to write to the JournalNodes. For this reason, it is still desirable to configure some fencing methods even when using the Quorum Journal Manager. However, to improve the availability of the system in the event the fencing mechanisms fail, it is advisable to configure a fencing method which is guaranteed to return success as the last fencing method in the list. Note that if you choose to use no actual fencing methods, you still must configure something for this setting, for example “shell(/bin/true)”.

    The fencing methods used during a failover are configured as a carriage-return-separated list, which will be attempted in order until one indicates that fencing has succeeded. There are two methods which ship with Hadoop: shell and sshfence. For information on implementing your own custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.

    • sshfence

      - SSH to the Active NameNode and kill the process

      The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service’s TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Thus, one must also configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files. For example:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/exampleuser/.ssh/id_rsa</value>
      </property>
      

      Optionally, one may configure a non-standard username or port to perform the SSH. One may also configure a timeout, in milliseconds, for the SSH, after which this fencing method will be considered to have failed. It may be configured like so:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence([[username][:port]])</value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
      </property>
      
    • shell

      - run an arbitrary shell command to fence the Active NameNode

      The shell fencing method runs an arbitrary shell command. It may be configured like so:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
      </property>
      

      The string between ‘(’ and ‘)’ is passed directly to a bash shell and may not include any closing parentheses.

      The shell command will be run with an environment set up to contain all of the current Hadoop configuration variables, with the ‘_’ character replacing any ‘.’ characters in the configuration keys. The configuration used has already had any namenode-specific configurations promoted to their generic forms – for example dfs_namenode_rpc-address will contain the RPC address of the target node, even though the configuration may specify that variable as dfs.namenode.rpc-address.ns1.nn1.

      Additionally, the following variables referring to the target node to be fenced are also available:

      $target_host           hostname of the node to be fenced
      $target_port           IPC port of the node to be fenced
      $target_address        the above two, combined as host:port
      $target_nameserviceid  the nameservice ID of the NN to be fenced
      $target_namenodeid     the namenode ID of the NN to be fenced

      These environment variables may also be used as substitutions in the shell command itself. For example:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
      </property>
      

      If the shell command returns an exit code of 0, the fencing is determined to be successful. If it returns any other exit code, the fencing was not successful and the next fencing method in the list will be attempted.

      Note: This fencing method does not implement any timeout. If timeouts are necessary, they should be implemented in the shell script itself (eg by forking a subshell to kill its parent in some number of seconds).

    • fs.defaultFS

    - the default path prefix used by the Hadoop FS client when none is given

    Optionally, you may now configure the default path for Hadoop clients to use the new HA-enabled logical URI. If you used “mycluster” as the nameservice ID earlier, this will be the value of the authority portion of all of your HDFS paths. This may be configured like so, in your core-site.xml file:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    
    • dfs.journalnode.edits.dir

    - the path where the JournalNode daemon will store its local state

    This is the absolute path on the JournalNode machines where the edits and other local state used by the JNs will be stored. You may only use a single path for this configuration. Redundancy for this data is provided by running multiple separate JournalNodes, or by configuring this directory on a locally-attached RAID array. For example:

    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/path/to/journal/node/local/data</value>
    </property>
    

    hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.nameservices</name>
            <value>mycluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.mycluster</name>
            <value>nn1,nn2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn1</name>
            <value>node01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn2</name>
            <value>node02:8020</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn1</name>
            <value>node01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn2</name>
            <value>node02:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/root/.ssh/id_dsa</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/var/hadoop/ha/journalnode</value>
        </property>
    </configuration>
    
    

    core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://mycluster</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/var/hadoop/ha</value>
        </property>
    </configuration>
    
  4. Per the official documentation, using ZooKeeper for automatic failover requires adding the automatic-failover settings to hdfs-site.xml and core-site.xml:

    Configuring automatic failover

    The configuration of automatic failover requires the addition of two new parameters to your configuration. In your hdfs-site.xml file, add:

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    

    This specifies that the cluster should be set up for automatic failover. In your core-site.xml file, add:

    <property>
      <name>ha.zookeeper.quorum</name>
      <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
    </property>
    

    This lists the host-port pairs running the ZooKeeper service.

    As with the parameters described earlier in the document, these settings may be configured on a per-nameservice basis by suffixing the configuration key with the nameservice ID. For example, in a cluster with federation enabled, you can explicitly enable automatic failover for only one of the nameservices by setting dfs.ha.automatic-failover.enabled.my-nameservice-id.

    There are also several other configuration parameters which may be set to control the behavior of automatic failover; however, they are not necessary for most installations. Please refer to the configuration key specific documentation for details.

    hdfs-site.xml

     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
    
    

    core-site.xml

     <property>
       <name>ha.zookeeper.quorum</name>
       <value>node02:2181,node03:2181,node04:2181</value>
     </property>
    
  5. Configure the worker (slave) nodes

    # Replace localhost in the slaves file with the worker node hostnames
    vi /opt/hadoop-2.6.5/etc/hadoop/slaves 
    node02
    node03
    node04
    
  6. Distribute core-site.xml, hdfs-site.xml, and slaves to the other nodes (see the sketch below).
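
    A minimal scp sketch, assuming the configuration directory from step 5 and the root user:

    # Run on node01: copy the edited configuration files to the other nodes
    for host in node02 node03 node04; do
      scp /opt/hadoop-2.6.5/etc/hadoop/{core-site.xml,hdfs-site.xml,slaves} \
          root@$host:/opt/hadoop-2.6.5/etc/hadoop/
    done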

  7. Install ZooKeeper

    tar xf zookeeper-3.4.6.tar.gz -C /opt/
    
  8. Edit the ZooKeeper configuration file

    cd /opt/zookeeper-3.4.6/conf
    mv zoo_sample.cfg zoo.cfg
    vi zoo.cfg
    

    In zoo.cfg, change dataDir to the data directory you want to use, and append the server addresses at the end, each with its leader-follower communication port and leader-election port:

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial 
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just 
    # example sakes.
    dataDir=/var/zookeeper
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the 
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
    server.1=node02:2888:3888
    server.2=node03:2888:3888
    server.3=node04:2888:3888
    
    
  9. Distribute the entire ZooKeeper directory to the other ZooKeeper nodes (a sketch follows).
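
    A minimal scp sketch, assuming zoo.cfg was edited on node02 and ZooKeeper lives at /opt/zookeeper-3.4.6 (from step 7):

    # Run on node02: copy the whole ZooKeeper directory to the other ZooKeeper nodes
    for host in node03 node04; do
      scp -r /opt/zookeeper-3.4.6 root@$host:/opt/
    done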

  10. On each ZooKeeper node, create the directory specified by dataDir and create the myid file (the node's ZooKeeper server ID, matching the server.N entries in zoo.cfg). For example, on node02:

    mkdir /var/zookeeper
    echo 1 > /var/zookeeper/myid
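
    The IDs must match the zoo.cfg entries (server.1=node02, server.2=node03, server.3=node04), so on the other two nodes:

    # On node03
    mkdir /var/zookeeper && echo 2 > /var/zookeeper/myid
    # On node04
    mkdir /var/zookeeper && echo 3 > /var/zookeeper/myid
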
  11. Add ZooKeeper to the environment variables and propagate the change to the other ZooKeeper nodes (a sketch follows).
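
    A minimal sketch, again using /etc/profile and the install path from step 7:

    # Append to /etc/profile on node02, node03 and node04, then reload it
    export ZOOKEEPER_HOME=/opt/zookeeper-3.4.6
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    source /etc/profile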

  12. Start ZooKeeper: zkServer.sh start (run on each ZooKeeper node).
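
    A quick sanity check once all three ZooKeeper servers are started:

    # One node should report "Mode: leader", the others "Mode: follower"
    zkServer.sh status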

  13. Start the JournalNodes: hadoop-daemon.sh start journalnode (run on node01, node02 and node03; a quick check follows).
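
    A quick check that each JournalNode is listening (the RPC port 8485 comes from dfs.namenode.shared.edits.dir):

    # Run on node01, node02 and node03
    ss -nal | grep 8485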

  14. Format one NameNode (node01 here): hdfs namenode -format

  15. Start the NameNode that was just formatted (a step the official documentation does not call out): hadoop-daemon.sh start namenode

  16. Per the official documentation, after formatting, run the sync operation on the standby NameNode (note: the NameNode that was NOT formatted, i.e. node02): hdfs namenode -bootstrapStandby

    Deployment details

    After all of the necessary configuration options have been set, you must start the JournalNode daemons on the set of machines where they will run. This can be done by running the command “hadoop-daemon.sh start journalnode” and waiting for the daemon to start on each of the relevant machines.

    Once the JournalNodes have been started, one must initially synchronize the two HA NameNodes’ on-disk metadata.

    • If you are setting up a fresh HDFS cluster, you should first run the format command (hdfs namenode -format) on one of NameNodes.
    • If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command “hdfs namenode -bootstrapStandby” on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes.
    • If you are converting a non-HA NameNode to be HA, you should run the command “hdfs namenode -initializeSharedEdits”, which will initialize the JournalNodes with the edits data from the local NameNode edits directories.

    At this point you may start both of your HA NameNodes as you normally would start a NameNode.

    You can visit each of the NameNodes’ web pages separately by browsing to their configured HTTP addresses. You should notice that next to the configured address will be the HA state of the NameNode (either “standby” or “active”.) Whenever an HA NameNode starts, it is initially in the Standby state.

  17. On the primary (active) NameNode, initialize the HA state in ZooKeeper: hdfs zkfc -formatZK

    Initializing HA state in ZooKeeper

    After the configuration keys have been added, the next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.

    $ hdfs zkfc -formatZK
    

    This will create a znode in ZooKeeper inside of which the automatic failover system stores its data.

  18. The registration can be verified with the ZooKeeper client zkCli.sh:

    WatchedEvent state:SyncConnected type:None path:null
    [zk: localhost:2181(CONNECTED) 0] ls /
    [hadoop-ha, zookeeper]
    [zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha/mycluster
    []
    [zk: localhost:2181(CONNECTED) 2] 
    
    
  19. On the primary NameNode (the node with passwordless SSH to the others), start the whole cluster: start-dfs.sh

    [root@node01 hadoop]# start-dfs.sh
    Starting namenodes on [node01 node02]
    node02: starting namenode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-namenode-node02.out
    node01: namenode running as process 1754. Stop it first.
    node03: starting datanode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-node03.out
    node04: starting datanode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-node04.out
    node02: starting datanode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-node02.out
    Starting journal nodes [node01 node02 node03]
    node01: journalnode running as process 1464. Stop it first.
    node03: journalnode running as process 1288. Stop it first.
    node02: journalnode running as process 1565. Stop it first.
    Starting ZK Failover Controllers on NN hosts [node01 node02]
    node01: starting zkfc, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-zkfc-node01.out
    node02: starting zkfc, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-zkfc-node02.out
    
    
  20. After startup, use jps to check that the expected Java processes are running (a rough guide follows).
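
    A rough guide to what jps should show, based on the role table at the top (process names as reported by jps):

    # node01: NameNode, JournalNode, DFSZKFailoverController
    # node02: NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain
    # node03: DataNode, JournalNode, QuorumPeerMain
    # node04: DataNode, QuorumPeerMain
    jps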

  21. Test failover by stopping the active NameNode: hadoop-daemon.sh stop namenode

    Alternatively, stop the active node's ZKFC with hadoop-daemon.sh stop zkfc and check that the standby NameNode takes over (a quick check is sketched below).
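
    A minimal check with the HA admin tool, assuming the NameNode IDs nn1 and nn2 from hdfs-site.xml; after the kill, the surviving NameNode should report "active":

    # Query the HA state of each NameNode
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2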

Other Commands

  • ZooKeeper

    zkServer.sh start - start the server

    zkServer.sh stop - stop the server

    zkServer.sh status - show the server status

    zkCli.sh - client shell; ls / lists the root znodes

  • Startup order: ZK >> JN >> HDFS (see the recap sketch after this list)

  • ss -nal lists listening ports

  • start-dfs.sh starts the HDFS cluster
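
  A condensed recap of the startup sequence, assuming the node roles from the table above (ZooKeeper on node02-node04, JournalNodes on node01-node03):

    # 1. Start ZooKeeper on node02, node03 and node04
    zkServer.sh start
    # 2. Start the JournalNodes on node01, node02 and node03
    #    (start-dfs.sh also starts them, but they must be up before the first format)
    hadoop-daemon.sh start journalnode
    # 3. On node01 (the node with passwordless SSH to the others), start the rest:
    #    NameNodes, DataNodes, JournalNodes and ZKFCs
    start-dfs.sh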
