Hadoop HA Cluster Setup

This installation uses three pay-as-you-go Alibaba Cloud ECS instances.

1. Versions

Component | Version                 | Notes / download
OS        | CentOS 7.2 64-bit       | lsb_release -a shows the OS version; file /bin/ls shows the bitness
JDK       | java version "1.8.0_45" | any Java 1.8 build works
Hadoop    | hadoop-2.6.0-cdh5.15.1  | use a precompiled release
ZooKeeper | zookeeper-3.4.6.tar.gz  |

2. Host Plan

IP             | Host      | Software          | Processes
172.19.252.139 | hadoop001 | hadoop, zookeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, JobHistoryServer, NodeManager, QuorumPeerMain
172.19.252.140 | hadoop002 | hadoop, zookeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, QuorumPeerMain
172.19.252.141 | hadoop003 | hadoop, zookeeper | JournalNode, DataNode, NodeManager, QuorumPeerMain

3. Environment Preparation

① Check the firewall and disable it (on all three machines)

[root@hadoop001 ~]# firewall-cmd --state
not running
[root@hadoop001 ~]# systemctl stop firewalld.service
# Disable start on boot
[root@hadoop001 ~]# systemctl disable firewalld.service

② Bind IPs to hostnames

[root@hadoop001 ~]# vi /etc/hosts
172.19.252.139 hadoop001
172.19.252.140 hadoop002
172.19.252.141 hadoop003

Verify: ping hadoop002

③ Set up passwordless SSH among the three machines

[root@hadoop001 ~]# yum install -y lrzsz
# Note: the user created here has no password set
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ ssh-keygen
Press Enter through the prompts to accept the defaults.

[hadoop@hadoop001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop001 ~]$ chmod 0600 ~/.ssh/authorized_keys
[hadoop@hadoop001 ~]$ sz authorized_keys
Locally, append the public keys generated on the other two machines to this file, then upload it to /home/hadoop/.ssh/ on all three machines.

Verify (run the 3 commands below on each machine; if you only have to type yes and never a password, the three machines trust each other):
ssh hadoop@hadoop001 date
ssh hadoop@hadoop002 date
ssh hadoop@hadoop003 date
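The collect-and-merge step can be sketched as a small helper (a sketch: the file names are placeholders for the id_rsa.pub files downloaded from each node via sz):

```shell
# Sketch: merge the public keys collected from each node into one
# authorized_keys file, deduplicating in case a key was collected twice.
merge_keys() {
  # $1 = output file, remaining args = collected id_rsa.pub files
  out=$1; shift
  cat "$@" | sort -u > "$out"
  chmod 0600 "$out"   # sshd refuses group/world-readable authorized_keys
}
# Usage (placeholder file names):
#   merge_keys authorized_keys hadoop001.pub hadoop002.pub hadoop003.pub
# then upload authorized_keys to /home/hadoop/.ssh/ on all three machines.
```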

④ Install the JDK

The steps are omitted here. Install under /usr/java; CDH requires this path.

4. Install the ZooKeeper Cluster

Covered in a previous post: ZooKeeper cluster installation.

5. Install Hadoop (NameNode HA + ResourceManager HA)

① Extract into the app directory

[hadoop@hadoop001 software]$ tar -xzvf hadoop-2.6.0-cdh5.15.1.tar.gz -C /home/hadoop/app/
[hadoop@hadoop001 app]$ ln -s hadoop-2.6.0-cdh5.15.1/ hadoop

② Configure environment variables (e.g. in the hadoop user's ~/.bash_profile)

export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
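To make the profile change repeatable, the exports can be appended idempotently (a sketch; the profile path and the guard pattern are assumptions):

```shell
# Sketch: append the Hadoop exports to a profile file only once, guarded
# by checking whether HADOOP_HOME is already set in the file.
add_hadoop_env() {
  profile=$1
  grep -q 'HADOOP_HOME=/home/hadoop/app' "$profile" 2>/dev/null && return 0
  cat >> "$profile" <<'EOF'
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
}
# Usage: add_hadoop_env ~/.bash_profile && source ~/.bash_profile
```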

③ Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh

export JAVA_HOME="/usr/java/jdk1.8.0_45"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"

④ Edit $HADOOP_HOME/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- YARN uses fs.defaultFS to locate the NameNode URI -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://ruozeclusterg7</value>
        </property>
        <!--============================== Trash ======================================= -->
        <property>
                <!-- How often (minutes) the checkpointer running on the NameNode creates a checkpoint from the Current folder; default 0 means it follows fs.trash.interval -->
                <name>fs.trash.checkpoint.interval</name>
                <value>0</value>
        </property>
        <property>
                <!-- Minutes before a checkpoint under .Trash is deleted; the server-side setting overrides the client's; default 0 = trash disabled -->
                <name>fs.trash.interval</name>
                <value>1440</value>
        </property>

         <!-- Hadoop temporary directory. hadoop.tmp.dir is the base many other paths depend on; if hdfs-site.xml does not configure the NameNode and DataNode storage locations, they default under this path -->
        <property>   
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/tmp/hadoop</value>
        </property>

         <!-- ZooKeeper quorum addresses -->
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
        </property>
         <!-- ZooKeeper session timeout, in milliseconds -->
        <property>
                <name>ha.zookeeper.session-timeout.ms</name>
                <value>2000</value>
        </property>

        <property>
           <name>hadoop.proxyuser.hadoop.hosts</name>
           <value>*</value> 
        </property> 
        <property> 
            <name>hadoop.proxyuser.hadoop.groups</name> 
            <value>*</value> 
       </property> 
      <property>
		  <name>io.compression.codecs</name>
		  <value>org.apache.hadoop.io.compress.GzipCodec,
			org.apache.hadoop.io.compress.DefaultCodec,
			org.apache.hadoop.io.compress.BZip2Codec,
			org.apache.hadoop.io.compress.SnappyCodec
		  </value>
      </property>
</configuration>

⑤ Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- HDFS superuser group -->
	<property>
		<name>dfs.permissions.superusergroup</name>
		<value>hadoop</value>
	</property>

	<!-- Enable WebHDFS -->
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/hadoop/data/dfs/name</value>
		<description>Local directory where the NameNode stores the name table (fsimage) (change as needed)</description>
	</property>
	<property>
		<name>dfs.namenode.edits.dir</name>
		<value>${dfs.namenode.name.dir}</value>
		<description>Local directory where the NameNode stores the transaction files (edits) (change as needed)</description>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/hadoop/data/dfs/data</value>
		<description>Local directory where the DataNode stores blocks (change as needed)</description>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<!-- Block size 128M (the default) -->
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
	<!--======================================================================= -->
	<!-- HDFS high-availability configuration -->
	<!-- Nameservice ID ruozeclusterg7; must match the one used in core-site.xml -->
	<property>
		<name>dfs.nameservices</name>
		<value>ruozeclusterg7</value>
	</property>
	<property>
		<!-- NameNode IDs; this version supports at most two NameNodes -->
		<name>dfs.ha.namenodes.ruozeclusterg7</name>
		<value>nn1,nn2</value>
	</property>

	<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC address -->
	<property>
		<name>dfs.namenode.rpc-address.ruozeclusterg7.nn1</name>
		<value>hadoop001:8020</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.ruozeclusterg7.nn2</name>
		<value>hadoop002:8020</value>
	</property>

	<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP address -->
	<property>
		<name>dfs.namenode.http-address.ruozeclusterg7.nn1</name>
		<value>hadoop001:50070</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.ruozeclusterg7.nn2</name>
		<value>hadoop002:50070</value>
	</property>

	<!--================== NameNode editlog sync ============================================ -->
	<!-- Ensures the edit log is replicated so data can be recovered -->
	<property>
		<name>dfs.journalnode.http-address</name>
		<value>0.0.0.0:8480</value>
	</property>
	<property>
		<name>dfs.journalnode.rpc-address</name>
		<value>0.0.0.0:8485</value>
	</property>
	<property>
		<!-- JournalNode servers used by the QuorumJournalManager to store the editlog -->
		<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; port matches dfs.journalnode.rpc-address -->
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/ruozeclusterg7</value>
	</property>

	<property>
		<!-- Directory where the JournalNode stores its data -->
		<name>dfs.journalnode.edits.dir</name>
		<value>/home/hadoop/data/dfs/jn</value>
	</property>
	<!--================== Client failover ============================================ -->
	<property>
		<!-- How DataNodes and clients identify and select the active NameNode -->
		<!-- Implementation class for automatic client-side failover -->
		<name>dfs.client.failover.proxy.provider.ruozeclusterg7</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<!--================== NameNode fencing =============================================== -->
	<!-- After a failover, prevent the old NameNode from starting again and creating two active services -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/home/hadoop/.ssh/id_rsa</value>
	</property>
	<property>
		<!-- Milliseconds after which fencing is considered failed -->
		<name>dfs.ha.fencing.ssh.connect-timeout</name>
		<value>30000</value>
	</property>

	<!--================== NameNode auto failover via ZKFC and ZooKeeper ====================== -->
	<!-- Enable ZooKeeper-based automatic failover -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<!-- File listing the DataNodes permitted to connect to the NameNode -->
	 <property>
	   <name>dfs.hosts</name>
	   <value>/home/hadoop/app/hadoop/etc/hadoop/slaves</value>
	 </property>
</configuration>

⑥ Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- NodeManager configuration ================================================= -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.nodemanager.localizer.address</name>
		<value>0.0.0.0:23344</value>
		<description>Address where the localizer IPC is.</description>
	</property>
	<property>
		<name>yarn.nodemanager.webapp.address</name>
		<value>0.0.0.0:23999</value>
		<description>NM Webapp address.</description>
	</property>

	<!-- HA configuration =============================================================== -->
	<!-- Resource Manager Configs -->
	<property>
		<name>yarn.resourcemanager.connect.retry-interval.ms</name>
		<value>2000</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<!-- Enable embedded automatic failover; in an HA setup it works with the ZKRMStateStore to handle fencing -->
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
		<value>true</value>
	</property>
	<!-- Cluster name, so HA elections are scoped to this cluster -->
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yarn-cluster</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>

    <!-- (Optional) the RM's own ID must be set per node, e.g. on the standby:
	<property>
		 <name>yarn.resourcemanager.ha.id</name>
		 <value>rm2</value>
	 </property>
	 -->
	<property>
		<name>yarn.resourcemanager.scheduler.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.recovery.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
		<value>5000</value>
	</property>
	<!-- ZKRMStateStore configuration -->
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk.state-store.address</name>
		<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
	</property>
	<!-- RPC address clients use to reach the RM (applications manager interface) -->
	<property>
		<name>yarn.resourcemanager.address.rm1</name>
		<value>hadoop001:23140</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address.rm2</name>
		<value>hadoop002:23140</value>
	</property>
	<!-- RPC address AMs use to reach the RM (scheduler interface) -->
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm1</name>
		<value>hadoop001:23130</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm2</name>
		<value>hadoop002:23130</value>
	</property>
	<!-- RM admin interface -->
	<property>
		<name>yarn.resourcemanager.admin.address.rm1</name>
		<value>hadoop001:23141</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm2</name>
		<value>hadoop002:23141</value>
	</property>
	<!-- RPC port NMs use to reach the RM -->
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
		<value>hadoop001:23125</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
		<value>hadoop002:23125</value>
	</property>
	<!-- RM web application address -->
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>hadoop001:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>hadoop002:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.https.address.rm1</name>
		<value>hadoop001:23189</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.https.address.rm2</name>
		<value>hadoop002:23189</value>
	</property>

	<property>
	   <name>yarn.log-aggregation-enable</name>
	   <value>true</value>
	</property>
	<property>
		 <name>yarn.log.server.url</name>
		 <value>http://hadoop001:19888/jobhistory/logs</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>2048</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>1024</value>
		<description>Minimum memory a single task can request; default 1024 MB</description>
	 </property>

  
  <property>
	<name>yarn.scheduler.maximum-allocation-mb</name>
	<value>2048</value>
	<description>Maximum memory a single task can request; default 8192 MB</description>
  </property>

   <property>
       <name>yarn.nodemanager.resource.cpu-vcores</name>
       <value>2</value>
    </property>

</configuration>

⑦ Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- MapReduce application configuration -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<!-- JobHistory Server ============================================================== -->
	<!-- MapReduce JobHistory Server address; default port 10020 -->
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>hadoop001:10020</value>
	</property>
	<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>hadoop001:19888</value>
	</property>

<!-- Compress map-side output with Snappy -->
  <property>
      <name>mapreduce.map.output.compress</name> 
      <value>true</value>
  </property>
              
  <property>
      <name>mapreduce.map.output.compress.codec</name> 
      <value>org.apache.hadoop.io.compress.SnappyCodec</value>
   </property>
</configuration>

⑧ Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh

#Yarn Daemon Options
#export YARN_RESOURCEMANAGER_OPTS
#export YARN_NODEMANAGER_OPTS
#export YARN_PROXYSERVER_OPTS
#export HADOOP_JOB_HISTORYSERVER_OPTS
#Yarn Logs
export YARN_LOG_DIR="/home/hadoop/app/hadoop/logs"

⑨ Edit slaves

[hadoop@hadoop001 hadoop]$ vi slaves
hadoop001
hadoop002
hadoop003

⑩ Miscellaneous


 1. Copy these modified configuration files to the other two machines.
 2. Create the temporary directory and distribute the Hadoop directory:

[hadoop@hadoop001 hadoop]$ mkdir -p /home/hadoop/tmp
[hadoop@hadoop001 hadoop]$ chmod -R 777 /home/hadoop/tmp
[hadoop@hadoop001 hadoop]$ chown -R hadoop:hadoop /home/hadoop/tmp
[hadoop@hadoop001 hadoop]$ scp -r hadoop hadoop@hadoop002:/home/hadoop/app/
[hadoop@hadoop001 hadoop]$ scp -r hadoop hadoop@hadoop003:/home/hadoop/app/
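The repeated scp commands generalize to a small helper that emits one command per worker host (a sketch; on its own it only prints the commands, so it doubles as a dry run):

```shell
# Sketch: generate the distribution commands for any number of hosts.
# Pipe the output to sh to actually run them.
dist_cmds() {
  src=$1; shift                       # $1 = source directory to copy
  for h in "$@"; do                   # remaining args = target hosts
    echo "scp -r $src hadoop@$h:/home/hadoop/app/"
  done
}
# Usage: dist_cmds /home/hadoop/app/hadoop hadoop002 hadoop003 | sh
```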

6. Starting the Cluster

1. Start ZooKeeper (on all three machines)

command: zkServer.sh start|stop|status

2. Start Hadoop (HDFS + YARN)
a. Before formatting, start the JournalNode process on each JournalNode machine

hadoop-daemon.sh start journalnode
[hadoop@hadoop001 sbin]$ jps
4016 Jps
3683 QuorumPeerMain
3981 JournalNode

b. Format the NameNode

[hadoop@hadoop001 hadoop]$ hadoop namenode -format

c. Sync the NameNode metadata
Copy the metadata from hadoop001 to hadoop002:
mainly dfs.namenode.name.dir and dfs.namenode.edits.dir. Also make sure the shared edits
directory (dfs.namenode.shared.edits.dir) contains all of the NameNode's metadata.

[hadoop@hadoop001 ~]$ scp -r data/dfs/name/ hadoop@hadoop002:/home/hadoop/data/dfs/name/

d. Initialize ZKFC
[hadoop@hadoop001 bin]$ hdfs zkfc -formatZK

e. Start the HDFS distributed storage system
[hadoop@hadoop001 sbin]$ start-dfs.sh

f. Verify the NameNode, DataNode, and ZKFC processes

[hadoop@hadoop001 data]$ jps
7696 Jps
4784 JournalNode
4951 NameNode
5480 ResourceManager
2265 QuorumPeerMain
5083 DataNode
5371 DFSZKFailoverController
5580 NodeManager

[hadoop@hadoop002 ~]$ jps
4241 DFSZKFailoverController
5203 Jps
3907 JournalNode
4005 NameNode
4502 ResourceManager
4359 NodeManager
4108 DataNode
2253 QuorumPeerMain

[hadoop@hadoop003 ~]$ jps
29264 JournalNode
29360 DataNode
29496 NodeManager
29848 Jps
2251 QuorumPeerMain

The jps output matches the host plan:
hadoop001 and hadoop002 each run a NameNode.
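The manual jps check can be automated by comparing a node's output against the expected process list from the host plan (a sketch; the helper name is illustrative):

```shell
# Sketch: verify that every expected process name appears in jps output.
check_procs() {
  # $1 = jps output, remaining args = expected process names
  out=$1; shift
  for p in "$@"; do
    echo "$out" | grep -qw "$p" || { echo "missing: $p"; return 1; }
  done
  echo "all expected processes running"
}
# Usage, e.g. on hadoop003:
#   check_procs "$(jps)" JournalNode DataNode NodeManager QuorumPeerMain
```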

Web UI: http://hadoop001:50070 and http://hadoop002:50070 (one NameNode active, one standby)
g. Start the YARN framework
[hadoop@hadoop001 hadoop]$ start-yarn.sh

Start the standby RM on hadoop002:
[hadoop@hadoop002 ~]$ yarn-daemon.sh start resourcemanager

#### Starting daemons individually ####

  1. ResourceManager (hadoop001, hadoop002)
    yarn-daemon.sh start resourcemanager
  2. NodeManager (hadoop001, hadoop002, hadoop003)
    yarn-daemon.sh start nodemanager

Verify with jps as before: both hadoop001 and hadoop002 should show a ResourceManager process.

ResourceManager (Active):  http://XXX:8088
ResourceManager (Standby): http://XXX:8088/cluster/cluster
JobHistory:                http://XXX:19888/jobhistory


7. Shutting Down the Cluster

1. Stop Hadoop (YARN first, then HDFS)
[hadoop@hadoop001 sbin]$ stop-yarn.sh
[hadoop@hadoop002 sbin]$ yarn-daemon.sh stop resourcemanager
[hadoop@hadoop001 sbin]$ stop-dfs.sh

2. Stop ZooKeeper
[hadoop@hadoop001 bin]$ zkServer.sh stop
[hadoop@hadoop002 bin]$ zkServer.sh stop
[hadoop@hadoop003 bin]$ zkServer.sh stop

8. Monitoring the Cluster

[hadoop@hadoop001 ~]$ hdfs dfsadmin -report
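The report can be filtered for a quick health check. A sketch, assuming the Hadoop 2.x report format that includes a line such as "Live datanodes (3):":

```shell
# Sketch: extract the live-DataNode count from a dfsadmin report on stdin.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9]*\)).*/\1/p'
}
# Usage: hdfs dfsadmin -report | live_datanodes   # expect 3 on this cluster
```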

9. Common Issues

a.
The authenticity of host 'hadoop001 (172.19.252.139)' can't be established.
hadoop001: Host key verification failed.
Fix:

Edit /etc/ssh/ssh_config
and append at the end:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null

Or: ssh -o StrictHostKeyChecking=no 192.168.0.xxx

(use this only between fully trusted servers on an internal network)

b. The slaves file exists but Hadoop reports it cannot be found
Fix: the file was saved with Windows (CRLF) line endings; convert it:

yum install -y dos2unix
dos2unix slaves
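If dos2unix is not available, the same fix can be applied with sed (a sketch; GNU sed as shipped with CentOS 7):

```shell
# Sketch: strip trailing carriage returns from a file in place,
# equivalent to dos2unix for this purpose.
strip_crlf() {
  sed -i 's/\r$//' "$1"
}
# Usage: strip_crlf slaves
```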

