Introduction to Hadoop High Availability

Hadoop High Availability

Single point of failure: the cluster contains a NameNode (NN) and a SecondaryNameNode (SNN). The SNN only helps the NN merge the fsimage and the edit log; it cannot act as a hot standby. When the NN goes down, the cluster can no longer serve requests and clients cannot find their data.
The SNN does hold a copy of the edit log and the image, but restoring from it takes time, and during that period the cluster is still unavailable.
In the high-availability design there are two NameNodes. One is in the Active state: the live NN that is serving client requests.
The other is in the Standby state: it does not serve requests, but is ready to replace the Active NN at any moment.

ZKFC: monitors the hardware, the software (the NN process) and the operating system of the node its NN runs on, and maintains the connection to ZooKeeper (ZK). In the HA design there are only two NNs, and each NN has its own ZKFC.

How the Active and Standby states are decided
Both NNs try to register an ephemeral ZNode in the ZK ensemble; whichever registers first becomes Active, and the other becomes Standby.

When the node hosting the Active NN fails:
1. The Active ZKFC tells ZK to delete the ephemeral ZNode (or its ZK session simply expires).
2. The Standby ZKFC watches that ephemeral ZNode; as soon as it disappears it immediately notifies the Standby NN.
3. The Standby NN logs in to the Active NN's node remotely and runs kill -9 on the Active NN process (fencing).
4. The Standby NN then has its ZKFC register the ephemeral ZNode in ZK and takes over as the new Active NN.
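For reference, the lock the two ZKFCs compete for can be inspected with the ZooKeeper command-line client. This is only an illustrative check, assuming the nameservice is named cluster1 as configured later in this article; the znode names are the ones Hadoop uses for automatic failover.
bin/zkCli.sh -server node01:2181
ls /hadoop-ha/cluster1
#[ActiveBreadCrumb, ActiveStandbyElectorLock]
#ActiveStandbyElectorLock is the ephemeral znode held by the Active side;
#its data records which NameNode currently holds the Active role
get /hadoop-ha/cluster1/ActiveStandbyElectorLock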

How is the metadata of the two NameNodes kept in sync quickly?

==JournalNode (JN):== an efficient storage service that supports very fast writes and reads. The JNs form a small cluster (a small file system), and the number of nodes must be odd (1, 3, 5, ...).
The Active NN writes its edit log to the JNs in real time; the Standby NN continuously reads that data from the JNs and applies it, keeping the metadata of the two nodes in sync.
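As a rough illustration of what the JNs hold: each JournalNode keeps the shared edit-log segments on its local disk under the directory set by dfs.journalnode.edits.dir (configured further below). A quick look, assuming the paths used in this article:
#on any journalnode, list the shared edits for nameservice cluster1
ls /export/servers/hadoop-2.6.0-cdh5.14.0/journaldata/cluster1/current/
#expect files named like edits_0000...-0000... plus an edits_inprogress_... segment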
The ResourceManager also has a single point of failure.
Solution: ResourceManager HA.
There are two ResourceManagers, one Active and one Standby.
Switching between the two ResourceManagers is likewise based on registering an ephemeral node in ZK.

Both of these depend on ZK. If ZK itself goes down, doesn't everything fail?
Answer: yes. Solution: deploy ZK as an independent cluster and give it enough nodes.
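A quick way to confirm every node of an independent ZK ensemble is alive is ZooKeeper's four-letter-word commands, which 3.4.x answers on the client port. A minimal sketch, assuming the node01-03 hostnames used below:
for h in node01 node02 node03; do
  echo "== $h =="
  echo ruok | nc $h 2181              #prints "imok" when the server is up
  echo srvr | nc $h 2181 | grep Mode  #Mode: leader or Mode: follower
done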
HA cluster deployment
Node role planning for the cluster (3 nodes)
------------------
node01 namenode resourcemanager zkfc nodemanager datanode zookeeper journal node
node02 namenode resourcemanager zkfc nodemanager datanode zookeeper journal node
node03 datanode nodemanager zookeeper journal node
--------------------------------------------------------------
Installation steps:
1. Install and configure the ZooKeeper cluster
1.1 Unpack
tar -zxvf zookeeper-3.4.5.tar.gz -C /home/hadoop/app/
1.2 Edit the configuration
cd /home/hadoop/app/zookeeper-3.4.5/conf/
cp zoo_sample.cfg zoo.cfg

**vim zoo.cfg**
Change: dataDir=/home/hadoop/app/zookeeper-3.4.5/tmp
Append at the end:
server.1=hadoop05:2888:3888
server.2=hadoop06:2888:3888
server.3=hadoop07:2888:3888
Save and exit.
Then create the tmp directory:
mkdir /home/hadoop/app/zookeeper-3.4.5/tmp
echo 1 > /home/hadoop/app/zookeeper-3.4.5/tmp/myid
1.3 Copy the configured zookeeper to the other nodes (first create a hadoop directory under the root directory on hadoop06 and hadoop07: mkdir /hadoop)
scp -r /home/hadoop/app/zookeeper-3.4.5/ hadoop06:/home/hadoop/app/
scp -r /home/hadoop/app/zookeeper-3.4.5/ hadoop07:/home/hadoop/app/
Note: update the content of /home/hadoop/app/zookeeper-3.4.5/tmp/myid on hadoop06 and hadoop07 accordingly:
hadoop06:
echo 2 > /home/hadoop/app/zookeeper-3.4.5/tmp/myid
hadoop07:
echo 3 > /home/hadoop/app/zookeeper-3.4.5/tmp/myid
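Optionally verify that every node ended up with the right myid (a small check, assuming you can ssh between the nodes):
for h in hadoop05 hadoop06 hadoop07; do
  echo -n "$h: "
  ssh $h cat /home/hadoop/app/zookeeper-3.4.5/tmp/myid   #expect 1, 2, 3
done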
2. Install and configure the Hadoop cluster
2.1 Unpack
tar -zxvf hadoop-2.6.4.tar.gz -C /home/hadoop/app/
2.2 Configure HDFS (in Hadoop 2.x all configuration files live under the $HADOOP_HOME/etc/hadoop directory)
		#add hadoop to the environment variables
		export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

		#all Hadoop 2.x configuration files are under $HADOOP_HOME/etc/hadoop
		cd /home/hadoop/app/hadoop-2.6.4/etc/hadoop
		
		2.2.1 Edit hadoop-env.sh
		export JAVA_HOME=/export/servers/jdk1.8.0_65

#######################################
2.2.2 Edit core-site.xml

<configuration>
<!-- The cluster (nameservice) name is specified here; it must match the value configured in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster1</value>
</property>
<!-- Default parent directory for the data of the NameNode, DataNode, JournalNode, etc. -->
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop-2.6.0-cdh5.14.0/HAhadoopDatas/tmp</value>
</property>

<!-- Address and port of the ZooKeeper ensemble; the number of nodes must be odd and at least three -->
<property>
<name>ha.zookeeper.quorum</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
</configuration>
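Note that fs.defaultFS points at the logical nameservice rather than at a specific host, so clients never need to know which NameNode is currently Active. For example, once the cluster is running:
hadoop fs -ls hdfs://cluster1/    #addressed by nameservice, works whichever NN is Active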

#######################################
2.2.3 Edit hdfs-site.xml

<configuration>
<!-- Set the HDFS nameservice to cluster1; it must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>cluster1</value>
</property>
<!-- cluster1 has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.cluster1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.cluster1.nn1</name>
<value>node01:8020</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.cluster1.nn1</name>
<value>node01:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.cluster1.nn2</name>
<value>node02:8020</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.cluster1.nn2</name>
<value>node02:50070</value>
</property>
<!-- Where the NameNode edits are stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node01:8485;node02:8485;node03:8485/cluster1</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/export/servers/hadoop-2.6.0-cdh5.14.0/journaldata</value>
</property>
<!-- Enable automatic failover on NameNode failure -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- The class that performs client-side failover to the Active NameNode when a failure occurs -->
<property>
<name>dfs.client.failover.proxy.provider.cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
</value>
</property>
<!-- The sshfence method requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- Timeout for the sshfence method -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
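Once the file is in place you can sanity-check that Hadoop picks the values up, for example with hdfs getconf (an optional check, not a required step):
hdfs getconf -confKey dfs.nameservices            #cluster1
hdfs getconf -confKey dfs.ha.namenodes.cluster1   #nn1,nn2
hdfs getconf -namenodes                           #node01 node02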

#######################################
2.2.4 Edit mapred-site.xml

<configuration>
<!-- Run the MapReduce framework on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>	

#######################################
2.2.5 Edit yarn-site.xml

<configuration>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id of the RM -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node02</value>
</property>
<!-- Address of the ZK ensemble -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
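After YARN has been started (section 2.10 below), the RM HA state can be queried from the command line, analogous to the NameNode state. A small check using the rm1/rm2 ids configured above:
yarn rmadmin -getServiceState rm1   #active or standby
yarn rmadmin -getServiceState rm2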

#######################################
2.2.6 Edit slaves
node01
node02
node03

Copy the software to all nodes:
scp -r hadoop-2.6.0-cdh5.14.0 node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.0 node03:$PWD
2.2.7 Configure passwordless SSH login
#first configure passwordless login from node01 to node01, node02 and node03
#generate a key pair on node01
ssh-keygen
#copy the public key to the other nodes, including node01 itself
ssh-copy-id node01
ssh-copy-id node02
ssh-copy-id node03
#note: the two namenodes must also be able to ssh into each other without a password; this is required for the ssh fencing (remote kill)
#generate a key pair on node02
ssh-keygen
#copy the public key to node01
ssh-copy-id node01
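A quick way to confirm the passwordless logins work before moving on (each command should print the hostname without asking for a password):
#on node01
for h in node01 node02 node03; do ssh $h hostname; done
#on node02 (needed for sshfence against node01)
ssh node01 hostname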
2.5 Start the ZooKeeper cluster (on node01, node02 and node03)
bin/zkServer.sh start
#check the status: one leader and two followers
bin/zkServer.sh status
2.6 Manually start the journalnodes (run on node01, node02 and node03)
hadoop-daemon.sh start journalnode
#run jps to verify that node01, node02 and node03 each now have a JournalNode process
2.7 Format the namenode
#on node01 run:
hdfs namenode -format
#formatting creates the initial HDFS metadata under the directory configured by hadoop.tmp.dir in core-site.xml;
#copy everything in that directory to the machine hosting the other namenode
scp -r tmp/ node02:/home/hadoop/app/hadoop-2.6.4/
##alternatively (recommended): hdfs namenode -bootstrapStandby
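If you prefer the bootstrapStandby route instead of copying the directory by hand, the usual sequence looks roughly like this (the first NameNode must be running so the second one can pull the initial image from it):
#on node01: start the freshly formatted namenode
hadoop-daemon.sh start namenode
#on node02: pull the initial fsimage from node01
hdfs namenode -bootstrapStandby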
2.8 Format ZKFC (run it on the node that will be Active)
hdfs zkfc -formatZK
2.9 Start HDFS (on node01)
start-dfs.sh
2.10 Start YARN
start-yarn.sh
#the standby resourcemanager still has to be started manually on the standby node
yarn-daemon.sh start resourcemanager
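At this stage a jps on each machine should show roughly the daemons planned in the role table above; one way to check all three nodes at once:
for h in node01 node02 node03; do
  echo "== $h =="
  ssh $h jps
done
#node01/node02: NameNode, DFSZKFailoverController, ResourceManager, DataNode, NodeManager, JournalNode, QuorumPeerMain
#node03:        DataNode, NodeManager, JournalNode, QuorumPeerMain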
At this point hadoop-2.6.4 is fully configured; open a browser and visit:
http://node01:50070
NameNode 'node01:8020' (active)
http://node02:50070
NameNode 'node02:8020' (standby)
Verify HDFS HA
First upload a file to HDFS:
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Then kill the active NameNode:
kill -9 <pid of the active NameNode process>
Visit http://node02:50070 in the browser:
NameNode 'node02:8020' (active)
The NameNode on node02 has now become active.
Run the command again:
hadoop fs -ls /
-rw-r--r--   3 root supergroup       1926 2014-02-06 15:36 /profile
The file uploaded earlier is still there!
Manually start the NameNode that was killed:
hadoop-daemon.sh start namenode
Visit http://node01:50070 in the browser:
NameNode 'node01:8020' (standby)
Verify YARN:
Run the WordCount program from the demos shipped with Hadoop:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /profile /out
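When the job finishes, you can look at its output to confirm that YARN really ran it (/out is the output path passed above):
hadoop fs -ls /out
hadoop fs -cat /out/part-r-00000   #word counts computed from /profile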
Some commands for checking the cluster state:
hdfs dfsadmin -report                      check the status of every HDFS node
bin/hdfs haadmin -getServiceState nn1      get the HA state of one namenode
sbin/hadoop-daemon.sh start namenode       start a single namenode process
./hadoop-daemon.sh start zkfc              start a single zkfc process
