I. Environment Configuration
1. Environment Overview
OS: CentOS 7.9
JDK version: 8u291
Hadoop version: 2.10.1
2. Resource Allocation
-------------- Hadoop nodes ---------------
nn01, nn02:
CPU: 4 cores
Memory: 8 GB
Data disk: 400 GB
jn01, jn02, jn03:
CPU: 2 cores
Memory: 2 GB
rm01, rm02:
CPU: 8 cores
Memory: 8 GB
dn01–dn05:
CPU: 16 cores
Memory: 16 GB
Data disk: 2 TB
-------------- ZooKeeper nodes ---------------
zk01, zk02, zk03:
CPU: 4 cores
Memory: 8 GB
Data disk: 300 GB
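Before going further, it can help to confirm each VM actually received the planned resources. A minimal check with standard Linux tools (no Hadoop required; /data may not exist yet at this point):

```shell
# Quick sanity check of the resources each node actually received.
echo "cores:  $(nproc)"
awk '/MemTotal/ {printf "memory: %.1f GB\n", $2/1024/1024}' /proc/meminfo
# The data disk may not be mounted yet; fall back to a note if df fails.
df -h /data 2>/dev/null || echo "/data not mounted yet"
```

Compare the output against the allocation table above on every node.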
3. Hostnames and IP Addresses
Note: configure both /etc/hosts and /etc/hostname accordingly on every node.
#zookeeper node
10.99.27.11 zk01.wtown.com
10.99.27.12 zk02.wtown.com
10.99.27.13 zk03.wtown.com
#namenode
10.99.27.21 nn01.wtown.com
10.99.27.22 nn02.wtown.com
#journalnode
10.99.27.31 jn01.wtown.com
10.99.27.32 jn02.wtown.com
10.99.27.33 jn03.wtown.com
#resourcemanager
10.99.27.41 rm01.wtown.com
10.99.27.42 rm02.wtown.com
#datanode
10.99.27.51 dn01.wtown.com
10.99.27.52 dn02.wtown.com
10.99.27.53 dn03.wtown.com
10.99.27.54 dn04.wtown.com
10.99.27.55 dn05.wtown.com
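Since the host numbering is regular, the /etc/hosts block above can be generated with a short loop instead of typed by hand. A sketch using the IP/hostname pairs from the table above:

```shell
# Emit the cluster's /etc/hosts entries; append the output to /etc/hosts on every node.
gen_hosts() {
  local i
  for i in 1 2 3; do echo "10.99.27.1$i zk0$i.wtown.com"; done
  for i in 1 2;   do echo "10.99.27.2$i nn0$i.wtown.com"; done
  for i in 1 2 3; do echo "10.99.27.3$i jn0$i.wtown.com"; done
  for i in 1 2;   do echo "10.99.27.4$i rm0$i.wtown.com"; done
  for i in 1 2 3 4 5; do echo "10.99.27.5$i dn0$i.wtown.com"; done
}
gen_hosts   # review first, then: gen_hosts | sudo tee -a /etc/hosts
```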
4. Mount the Data Disk at /data
Note: create the /data directory on every Hadoop cluster node.
https://blog.csdn.net/zyj81092211/article/details/118054000
5. Install the JDK
https://blog.csdn.net/zyj81092211/article/details/118055068
6. Install the ZooKeeper Cluster
https://blog.csdn.net/zyj81092211/article/details/118066724
7. Create the hadoop User and Grant It sudo Privileges (all Hadoop nodes)
useradd hadoop
echo hadoop | passwd --stdin hadoop
visudo
Add the following line:
hadoop ALL=(ALL) NOPASSWD:ALL
8. Configure Passwordless SSH (all nodes)
As root:
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@nn01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@nn02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@jn01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@jn02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@jn03.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@rm01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@rm02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@dn01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@dn02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@dn03.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@dn04.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub root@dn05.wtown.com
As the hadoop user:
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub hadoop@nn01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@nn02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@jn01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@jn02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@jn03.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@rm01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@rm02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@dn01.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@dn02.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@dn03.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@dn04.wtown.com
ssh-copy-id -i .ssh/id_rsa.pub hadoop@dn05.wtown.com
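The twelve ssh-copy-id invocations (for either user) can be collapsed into a loop. A sketch; copy_keys only prints the commands so you can review them first, and piping its output to sh executes them for real:

```shell
# Print the ssh-copy-id command for every cluster node (dry run).
copy_keys() {
  local h
  for h in nn01 nn02 jn01 jn02 jn03 rm01 rm02 dn01 dn02 dn03 dn04 dn05; do
    echo "ssh-copy-id -i ~/.ssh/id_rsa.pub ${USER:-$(id -un)}@$h.wtown.com"
  done
}
copy_keys          # review the commands
# copy_keys | sh   # uncomment to actually distribute the keys
```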
9. Edit the Hadoop Configuration Files
Note: the configuration files are under hadoop/etc/hadoop in the Hadoop distribution.
(1) Edit hadoop-env.sh
export JAVA_HOME=/usr/local/java
(2) Edit core-site.xml
Add the following inside the configuration tag:
<configuration>
<!-- Set the HDFS nameservice to ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/data</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>zk01.wtown.com:2181,zk02.wtown.com:2181,zk03.wtown.com:2181</value>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>3000</value>
</property>
</configuration>
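A malformed *-site.xml will only surface when a daemon fails to start, so a quick well-formedness check after editing is worthwhile. A sketch: use xmllint when it is installed, otherwise fall back to comparing tag counts (which assumes one tag per line, as in the blocks above):

```shell
# Check a Hadoop *-site.xml file for unbalanced tags before the daemons see it.
check_site_xml() {
  local f=$1
  if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$f"   # full well-formedness check
  else
    # Fallback: the <property> open/close line counts must at least match.
    local open close
    open=$(grep -c '<property>' "$f")
    close=$(grep -c '</property>' "$f")
    if [ "$open" -eq "$close" ]; then
      echo "ok: $open properties"
    else
      echo "mismatch: $open <property> vs $close </property>" >&2
      return 1
    fi
  fi
}
# Example: check_site_xml /data/hadoop/etc/hadoop/core-site.xml
```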
(3) Edit hdfs-site.xml
Add the following inside the configuration tag:
<configuration>
<!-- HDFS nameservice ns1; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 has two NameNodes, nn01 and nn02; separate multiple NameNodes with commas -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn01,nn02</value>
</property>
<!-- RPC address of nn01 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn01</name>
<value>nn01.wtown.com:8020</value>
</property>
<!-- HTTP address of nn01 -->
<property>
<name>dfs.namenode.http-address.ns1.nn01</name>
<value>nn01.wtown.com:50070</value>
</property>
<!-- RPC address of nn02 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn02</name>
<value>nn02.wtown.com:8020</value>
</property>
<!-- HTTP address of nn02 -->
<property>
<name>dfs.namenode.http-address.ns1.nn02</name>
<value>nn02.wtown.com:50070</value>
</property>
<!-- Where the NameNode edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://jn01.wtown.com:8485;jn02.wtown.com:8485;jn03.wtown.com:8485/ns1</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Client-side failover proxy provider -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; separate multiple methods with newlines -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH; use the hadoop user's key -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!-- NameNode namespace (fsimage) storage location -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/name</value>
</property>
<!-- DataNode block storage location -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/data</value>
</property>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
(4) Edit mapred-site.xml
Add the following inside the configuration tag:
<configuration>
<!-- Run MapReduce on the YARN framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory Server address; default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<!-- JobHistory Server web UI address; default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
(5) Edit yarn-site.xml (configure YARN ResourceManager HA)
Add the following inside the configuration tag:
<configuration>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- ResourceManager cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<!-- ResourceManager ids -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm01,rm02</value>
</property>
<!-- Hostname of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm01</name>
<value>rm01.wtown.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm02</name>
<value>rm02.wtown.com</value>
</property>
<!-- Web UI address of each ResourceManager -->
<property>
<name>yarn.resourcemanager.webapp.address.rm01</name>
<value>rm01.wtown.com:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm02</name>
<value>rm02.wtown.com:8088</value>
</property>
<!-- Recover running jobs automatically after a ResourceManager restart -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Persist ResourceManager state to ZooKeeper -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Base znode path used to store leader information during ZooKeeper leader election -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zk01.wtown.com:2181,zk02.wtown.com:2181,zk03.wtown.com:2181</value>
</property>
<!-- Auxiliary service on the NodeManager; must be mapreduce_shuffle for MapReduce jobs to run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
(6) Edit the slaves file
Note: the slaves file lists the worker hosts: for HDFS they run the DataNode daemon, and for YARN they run the NodeManager daemon.
Add the following:
dn01.wtown.com
dn02.wtown.com
dn03.wtown.com
dn04.wtown.com
dn05.wtown.com
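Since the DataNode hostnames follow a fixed pattern, the slaves file can be generated as well (a sketch; the file is written into the current directory and should then be placed in hadoop/etc/hadoop):

```shell
# Generate the slaves file listing the five DataNode/NodeManager hosts.
for i in 1 2 3 4 5; do
  echo "dn0$i.wtown.com"
done > slaves
cat slaves
```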
10. Create the Directories Referenced in the Configuration Files
Hadoop temporary directory: hadoop/data
HDFS directories: hadoop/hdfs/name and hadoop/hdfs/data
JournalNode data directory: hadoop/journaldata
------------------------ The Hadoop package is now ready ------------------------
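The four directories from step 10 can be created in one pass. A sketch; ROOT stands for the unpacked Hadoop directory, which on the cluster nodes ends up at /data/hadoop:

```shell
# Create every data directory referenced in core-site.xml and hdfs-site.xml,
# relative to the package root (override ROOT for the real path).
ROOT="${ROOT:-./hadoop}"
mkdir -p "$ROOT/data" "$ROOT/hdfs/name" "$ROOT/hdfs/data" "$ROOT/journaldata"
find "$ROOT" -type d | sort
```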
11. Upload the Hadoop Package to /data on All Hadoop Nodes and Change Its Ownership to hadoop (all Hadoop nodes)
chown -R hadoop:hadoop /data/
12. Symlink the Package to /usr/local/hadoop (all Hadoop nodes; optional, a matter of convention)
ln -s /data/hadoop /usr/local/hadoop
13. Set the Hadoop Environment Variables (all Hadoop nodes)
vi /etc/profile
Add the following:
# hadoop environment
export HADOOP_HOME=/data/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the profile:
source /etc/profile
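After reloading the profile, a quick check confirms the variables from step 13 actually took effect in the current shell:

```shell
# Verify the Hadoop environment variables are set and on PATH.
export HADOOP_HOME=/data/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
echo "HADOOP_HOME=$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *) echo "PATH is missing $HADOOP_HOME/bin" ;;
esac
```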
14. Make the Hadoop Scripts Executable (all Hadoop nodes)
chmod +x /data/hadoop/bin/*
chmod +x /data/hadoop/sbin/*
II. Creating the Cluster
1. Cluster Initialization
(1) Start the ZooKeeper cluster (ZooKeeper nodes)
zkServer.sh start
Status: normal
(2) As the hadoop user, start the JournalNodes on jn01, jn02, and jn03 (JournalNode nodes)
hadoop-daemon.sh start journalnode
Normal status:
(3) As the hadoop user, format HDFS; run this only on nn01 (NameNode nn01)
hdfs namenode -format
(4) As the hadoop user, format ZKFC (NameNode nn01)
hdfs zkfc -formatZK
(5) As the hadoop user, start HDFS (NameNode nn01)
start-dfs.sh
NameNode nn01 status:
DataNode host status:
(6) As the hadoop user, bootstrap and start the standby NameNode (NameNode nn02)
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
NameNode nn02 status:
(7) As the hadoop user, start the primary YARN ResourceManager (ResourceManager rm01)
start-yarn.sh
ResourceManager rm01 status:
DataNode host status:
(8) As the hadoop user, start the standby ResourceManager (ResourceManager rm02)
yarn-daemon.sh start resourcemanager
ResourceManager rm02 status:
2. Runtime Status of All Nodes
HDFS nodes:
YARN nodes:
3. Failover Testing
(1) HDFS NameNode failover test
On nn01, kill the NameNode process (PID 1714 in this run):
kill -9 1714
Confirm that nn02 has become active.
Restart the NameNode service on nn01:
hadoop-daemon.sh start namenode
Confirm that nn01 is now standby.
(2) YARN ResourceManager failover test
On rm01, kill the ResourceManager process (PID 1509 in this run):
kill -9 1509
Confirm that rm02 has become active.
Restart the ResourceManager service on rm01:
yarn-daemon.sh start resourcemanager
Confirm that rm01 is now standby.
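Both failover tests hardcode a PID that will differ on every run. A sketch of a small helper (kill_daemon is our own name, not a Hadoop command) that resolves the PID from jps output instead:

```shell
# Kill a Hadoop daemon by class name instead of a hardcoded PID.
# jps prints "<pid> <ClassName>" per line; match the class name exactly.
kill_daemon() {
  local name=$1 pid
  pid=$(jps | awk -v n="$name" '$2 == n { print $1 }')
  if [ -n "$pid" ]; then
    echo "killing $name (pid $pid)"
    kill -9 $pid
  else
    echo "$name not running"
    return 1
  fi
}
# On nn01: kill_daemon NameNode
# On rm01: kill_daemon ResourceManager
```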