CentOS 7 ZooKeeper Distributed Cluster Installation and Deployment
Preface
ZooKeeper is a distributed, open-source coordination service for distributed applications, and an important component of Hadoop and HBase.
1. Choosing an Installation Mode
ZooKeeper supports three installation modes:
- Standalone mode: ZooKeeper runs on a single server; suitable for test environments.
- Pseudo-cluster mode: multiple ZooKeeper instances run on one physical machine; suitable for test environments.
- Distributed mode: ZooKeeper runs across a cluster of machines; suitable for production.
This walkthrough uses distributed mode, reproduced on virtual machines.
2. Choosing a Version
When picking a Hadoop version for a personal lab environment, the following factors are usually worth considering:
1. Whether it is open-source software, i.e. free to use.
2. Whether a stable release exists; the project's official site usually states this.
3. Whether it has been proven in practice, which can be checked by seeing whether large companies already run it in production.
4. Whether it has strong community support, so that problems can be solved quickly through community forums and other online resources.
The most widely used Hadoop distributions today:
- Cloudera's Distribution Including Apache Hadoop (CDH)
- Apache Foundation Hadoop
Cloudera releases: https://archive.cloudera.com/
Apache ZooKeeper releases: https://zookeeper.apache.org/
3. Cluster Plan
3.1 Software Plan

| Node | Software |
|---|---|
| hdp1 | apache-zookeeper-3.7.1, Hadoop-3.1.3, jdk-8u162-linux-x64 |
| hdp2 | apache-zookeeper-3.7.1, Hadoop-3.1.3, jdk-8u162-linux-x64 |
| hdp3 | apache-zookeeper-3.7.1, Hadoop-3.1.3, jdk-8u162-linux-x64 |
3.2 User Plan

| Node | Group | User |
|---|---|---|
| hdp1 | hadoop | hadoop |
| hdp2 | hadoop | hadoop |
| hdp3 | hadoop | hadoop |
3.3 Directory Plan

| Purpose | Path |
|---|---|
| Software directory | /home/hadoop/app |
| Script directory | /home/hadoop/tools |
| Data directory | /home/hadoop/data |
4. NTP Time Synchronization
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime   # set the time zone
rpm -qa | grep ntp                                    # check whether ntp is installed
yum install -y ntp                                    # install without prompting
ntpdate pool.ntp.org                                  # synchronize the clock
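Keeping the cluster clocks closely aligned matters because large skew between quorum members can cause confusing session-timeout and log-ordering behavior. The sketch below shows only the comparison logic, with simulated timestamps; on a real cluster you would collect each node's time with something like `ssh hdpN 'date +%s'` (the 5-second threshold is an assumption for illustration, not a ZooKeeper default):

```shell
# Compare two epoch timestamps and flag skew above a threshold.
t1=1700000000   # epoch seconds reported by node A (simulated)
t2=1700000003   # epoch seconds reported by node B (simulated)
skew=$(( t1 > t2 ? t1 - t2 : t2 - t1 ))
threshold=5     # tolerated skew in seconds (illustrative assumption)
if [ "$skew" -le "$threshold" ]; then
  echo "clock skew ${skew}s: within ${threshold}s"
else
  echo "clock skew ${skew}s: exceeds ${threshold}s"
fi
```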
5. Hosts File Configuration
vi /etc/hosts
Add the planned host entries:
192.168.34.140 hdp1
192.168.34.141 hdp2
192.168.34.142 hdp3
6. Passwordless SSH Between Cluster Nodes
Disable the firewall on every node:
[root@hdp1 ~]# systemctl stop firewalld
[root@hdp1 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Switch to the hadoop user to configure passwordless login:
[hadoop@hdp1 ~]$ su root
Password:
[root@hdp1 hadoop]# su hadoop
# Generate a key pair (run on all three nodes)
[hadoop@hdp1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UmNN+QarkUDP6Tv3gcgyS5U71FrfR2muPd7uh1Ga6MA hadoop@hdp1
The key's randomart image is:
+---[RSA 2048]----+
| .. .. |
| .o +o |
| .B..+ |
| +o+. o o|
| . S+o. . *.|
| =.*Eo..*. |
| + O ooo .+.|
| . + + ...+oo|
| . ...=*|
+----[SHA256]-----+
# Create the authorized_keys file (run on all three nodes)
[hadoop@hdp1 ~]$ cd /home/hadoop/.ssh/
[hadoop@hdp1 .ssh]$ ll
total 8
-rw-------. 1 hadoop hadoop 1675 xxx xx xx:xx id_rsa
-rw-r--r--. 1 hadoop hadoop 393 xxx xx xx:xx id_rsa.pub
[hadoop@hdp1 .ssh]$ cp id_rsa.pub authorized_keys
# Restrict permissions on .ssh
[hadoop@hdp1 .ssh]$ cd
[hadoop@hdp1 ~]$ chmod 700 .ssh
[hadoop@hdp1 ~]$ chmod 600 .ssh/*
# Append hdp2's public key id_rsa.pub to the authorized_keys file on hdp1
[hadoop@hdp2 ~]$ cat ~/.ssh/id_rsa.pub | ssh hadoop@hdp1 'cat >> ~/.ssh/authorized_keys'
The authenticity of host 'hdp1 (192.168.34.140)' can't be established.
ECDSA key fingerprint is SHA256:/ssRb/WEz4uGURV3g6Qbbqx6VnaG+/szWU10GK+xxxx.
ECDSA key fingerprint is MD5:c0:49:5e:8c:d6:45:45:92:3f:a7:7a:a1:c8:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)?yes
Warning: Permanently added '192.168.34.140' (ECDSA) to the list of known hosts.
hadoop@hdp1's password:
# Append hdp3's public key id_rsa.pub to the authorized_keys file on hdp1
[hadoop@hdp3 ~]$ cat ~/.ssh/id_rsa.pub | ssh hadoop@hdp1 'cat >> ~/.ssh/authorized_keys'
The authenticity of host 'hdp1 (192.168.34.140)' can't be established.
ECDSA key fingerprint is SHA256:/ssRb/WEz4uGURV3g6Qbbqx6VnaG+/szWU10GK+xxxx.
ECDSA key fingerprint is MD5:c0:49:5e:8c:d6:45:45:92:3f:a7:7a:a1:c8:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)?yes
Warning: Permanently added '192.168.34.140' (ECDSA) to the list of known hosts.
hadoop@hdp1's password:
# Distribute the authorized_keys file from hdp1 to hdp2 and hdp3
[hadoop@hdp1 ~]$ scp -r .ssh/authorized_keys hadoop@hdp2:~/.ssh/
[hadoop@hdp1 ~]$ scp -r .ssh/authorized_keys hadoop@hdp3:~/.ssh/
# From hdp1, test SSH login to the other cluster nodes (repeat from the other nodes)
[hadoop@hdp1 ~]$ ssh hdp2
Last login: xxx xxx
[hadoop@hdp2 ~]$ exit
logout
Connection to hdp2 closed.
[hadoop@hdp1 ~]$ ssh hdp3
Last login: xxx xxx
[hadoop@hdp3 ~]$ exit
logout
Connection to hdp3 closed.
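The chmod 700/600 steps above are not optional: sshd silently ignores authorized_keys when ~/.ssh or the key files are group- or world-writable, and you fall back to password prompts. A small local sketch of that permission check, run against a throwaway directory rather than a real ~/.ssh:

```shell
# Create a throwaway ".ssh"-like directory and verify the permission
# bits sshd expects: 700 on the directory, 600 on the key files.
tmp=$(mktemp -d)
touch "$tmp/authorized_keys"
chmod 700 "$tmp"
chmod 600 "$tmp/authorized_keys"
dir_mode=$(stat -c %a "$tmp")              # GNU stat, as on CentOS
key_mode=$(stat -c %a "$tmp/authorized_keys")
echo "dir=$dir_mode key=$key_mode"
[ "$dir_mode" = "700" ] && [ "$key_mode" = "600" ] && echo "permissions OK"
rm -rf "$tmp"
```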
7. Defining Cluster Roles in a Configuration File
Create the script directory /home/hadoop/tools on hdp1:
[hadoop@hdp1 ~]$ mkdir tools
[hadoop@hdp1 ~]$ cd tools
[hadoop@hdp1 tools]$ pwd
/home/hadoop/tools
Write the script configuration file deploy.conf (the cluster role plan):
[hadoop@hdp1 tools]$ vi deploy.conf
[hadoop@hdp1 tools]$ cat deploy.conf
hdp1,master,all,zk,
hdp2,slave,all,zk,
hdp3,slave,all,zk,
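The scripts in the next section select hosts by grepping deploy.conf for the tag wrapped in commas; the trailing comma on each line is what allows the last tag to match. A quick local demonstration of that selection logic, using a temporary copy of the file:

```shell
# Reproduce the host-selection logic used by deploy.sh and runRemoteCmd.sh:
# keep lines containing ",<tag>," and print the first field (the hostname).
conf=$(mktemp)
cat > "$conf" <<'EOF'
hdp1,master,all,zk,
hdp2,slave,all,zk,
hdp3,slave,all,zk,
EOF
masters=$(grep -v '^#' "$conf" | grep ',master,' | awk -F',' '{print $1}')
slaves=$(grep -v '^#' "$conf" | grep ',slave,' | awk -F',' '{print $1}' | xargs)
echo "master hosts: $masters"
echo "slave hosts: $slaves"
rm -f "$conf"
```

Running this prints `master hosts: hdp1` and `slave hosts: hdp2 hdp3`, which is exactly the host set the `master` and `slave` tags expand to below.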
8. Preparing the Cluster Scripts
Write the cluster file-distribution shell script deploy.sh:
#!/bin/bash
# deploy.sh - copy a file or directory to every host whose line in the
# role config file contains the given tag.
if [ $# -lt 3 ]
then
  echo "Usage: ./deploy.sh srcFile(or Dir) destFile(or Dir) MachineTag"
  echo "Usage: ./deploy.sh srcFile(or Dir) destFile(or Dir) MachineTag confFile"
  exit 1
fi
src=$1
dest=$2
tag=$3
# Use the default config file unless a fourth argument is given
if [ -z "$4" ]
then
  confFile=/home/hadoop/tools/deploy.conf
else
  confFile=$4
fi
if [ -f "$confFile" ]
then
  if [ -f "$src" ]
  then
    for server in $(grep -v '^#' "$confFile" | grep ",$tag," | awk -F',' '{print $1}')
    do
      scp "$src" "$server:$dest"
    done
  elif [ -d "$src" ]
  then
    for server in $(grep -v '^#' "$confFile" | grep ",$tag," | awk -F',' '{print $1}')
    do
      scp -r "$src" "$server:$dest"
    done
  else
    echo "Error: source file or directory does not exist"
  fi
else
  echo "Error: please specify a config file"
fi
Make the script executable:
[hadoop@hdp1 tools]$ chmod u+x deploy.sh
Write the cluster remote-execution shell script runRemoteCmd.sh:
#!/bin/bash
# runRemoteCmd.sh - run a command over SSH on every host whose line in the
# role config file contains the given tag.
if [ $# -lt 2 ]
then
  echo "Usage: ./runRemoteCmd.sh Command MachineTag"
  echo "Usage: ./runRemoteCmd.sh Command MachineTag confFile"
  exit 1
fi
cmd=$1
tag=$2
# Use the default config file unless a third argument is given
if [ -z "$3" ]
then
  confFile=/home/hadoop/tools/deploy.conf
else
  confFile=$3
fi
if [ -f "$confFile" ]
then
  for server in $(grep -v '^#' "$confFile" | grep ",$tag," | awk -F',' '{print $1}')
  do
    echo "*****************$server**********************"
    ssh "$server" "source ~/.bashrc; $cmd"
  done
else
  echo "Error: please specify a config file"
fi
Make the script executable:
[hadoop@hdp1 tools]$ chmod u+x runRemoteCmd.sh
Add the script directory to the hadoop user's PATH in .bashrc.
Append at the end:
PATH=/home/hadoop/tools:$PATH
export PATH
[hadoop@hdp1 tools]$ cd
[hadoop@hdp1 ~]$ vi .bashrc
[hadoop@hdp1 ~]$ source ~/.bashrc
[hadoop@hdp1 ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
PATH=/home/hadoop/tools:$PATH
export PATH
9. Java JDK Configuration
Use the remote-execution script to create the planned app and data directories:
[hadoop@hdp1 ~]$ runRemoteCmd.sh "mkdir /home/hadoop/app" all
*****************hdp1**********************
*****************hdp2**********************
*****************hdp3**********************
[hadoop@hdp1 ~]$ runRemoteCmd.sh "mkdir /home/hadoop/data" all
*****************hdp1**********************
*****************hdp2**********************
*****************hdp3**********************
[hadoop@hdp1 ~]$ runRemoteCmd.sh "ls" all
*****************hdp1**********************
app
data
tools
*****************hdp2**********************
app
data
*****************hdp3**********************
app
data
JDK setup
Download the matching JDK from the official archive:
https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html
This walkthrough uses jdk-8u162-linux-x64.
[hadoop@hdp1 ~]$ cd app
[hadoop@hdp1 app]$ tar -zxvf jdk-8u162-linux-x64.tar.gz    # extract as the hadoop user so ownership stays correct
[hadoop@hdp1 app]$ ls
jdk1.8.0_162 jdk-8u162-linux-x64.tar.gz
[hadoop@hdp1 app]$ rm jdk-8u162-linux-x64.tar.gz
Configure the Java environment variables:
[hadoop@hdp1 app]$ vi ~/.bashrc
Remove the two PATH lines added earlier and append at the end:
JAVA_HOME=/home/hadoop/app/jdk1.8.0_162
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH
[hadoop@hdp1 app]$ source ~/.bashrc
[hadoop@hdp1 app]$ java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
Configure Java on the other nodes:
[hadoop@hdp1 app]$ deploy.sh jdk1.8.0_162 /home/hadoop/app/ slave
[hadoop@hdp2 app]$ vi ~/.bashrc
Append at the end:
JAVA_HOME=/home/hadoop/app/jdk1.8.0_162
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
[hadoop@hdp3 app]$ vi ~/.bashrc
Append at the end:
JAVA_HOME=/home/hadoop/app/jdk1.8.0_162
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Run source ~/.bashrc on hdp2 and hdp3 so the new environment variables take effect:
[hadoop@hdp2 app]$ source ~/.bashrc
[hadoop@hdp2 app]$ java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
[hadoop@hdp3 app]$ source ~/.bashrc
[hadoop@hdp3 app]$ java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
10. ZooKeeper Configuration
This walkthrough uses apache-zookeeper-3.7.1-bin.
Extract the package:
[hadoop@hdp1 app]$ tar -zxvf apache-zookeeper-3.7.1-bin.tar.gz    # extract as the hadoop user so ownership stays correct
[hadoop@hdp1 app]$ ls
apache-zookeeper-3.7.1-bin  apache-zookeeper-3.7.1-bin.tar.gz  jdk1.8.0_162
[hadoop@hdp1 app]$
[hadoop@hdp1 app]$ cd apache-zookeeper-3.7.1-bin
[hadoop@hdp1 apache-zookeeper-3.7.1-bin]$ ls
bin conf docs lib LICENSE.txt NOTICE.txt README.md README_packaging.md
[hadoop@hdp1 apache-zookeeper-3.7.1-bin]$ cd conf
[hadoop@hdp1 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hdp1 conf]$
[hadoop@hdp1 conf]$ vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hdp1:2888:3888
server.2=hdp2:2888:3888
server.3=hdp3:2888:3888
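The tickTime, initLimit, and syncLimit settings in zoo.cfg combine into real timeouts: a follower has tickTime * initLimit milliseconds to connect and sync with the leader, and tickTime * syncLimit milliseconds to stay in sync afterward. A quick check of what the values above work out to:

```shell
# Effective timeouts implied by the zoo.cfg values used above.
tickTime=2000   # ms per tick
initLimit=10    # ticks allowed for the initial follower-to-leader sync
syncLimit=5     # ticks allowed between a request and its acknowledgement
init_ms=$(( tickTime * initLimit ))
sync_ms=$(( tickTime * syncLimit ))
echo "initial sync timeout: ${init_ms} ms ($(( init_ms / 1000 )) s)"
echo "sync timeout: ${sync_ms} ms ($(( sync_ms / 1000 )) s)"
```

So with this configuration, followers get 20 seconds for the initial sync and 10 seconds to keep up with the leader; on slow networks or with large snapshots these are the first values worth raising.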
Distribute the entire ZooKeeper installation directory to the other nodes:
[hadoop@hdp1 conf]$ cd /home/hadoop/app
[hadoop@hdp1 app]$ deploy.sh apache-zookeeper-3.7.1-bin /home/hadoop/app/ slave
Create the ZooKeeper data directories on every node:
[hadoop@hdp1 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all
*****************hdp1**********************
*****************hdp2**********************
*****************hdp3**********************
[hadoop@hdp1 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all
*****************hdp1**********************
*****************hdp2**********************
*****************hdp3**********************
Set the server id on each node
!!! The myid file must contain only the digit 1, 2, or 3, with no spaces or extra newlines !!!
[hadoop@hdp1 app]$ cd /home/hadoop/data/zookeeper/zkdata
[hadoop@hdp1 zkdata]$ vi myid
[hadoop@hdp1 zkdata]$ cat myid
1
[hadoop@hdp1 zkdata]$
[hadoop@hdp2 app]$ cd /home/hadoop/data/zookeeper/zkdata
[hadoop@hdp2 zkdata]$ vi myid
[hadoop@hdp2 zkdata]$ cat myid
2
[hadoop@hdp3 app]$ cd /home/hadoop/data/zookeeper/zkdata
[hadoop@hdp3 zkdata]$ vi myid
[hadoop@hdp3 zkdata]$ cat myid
3
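Since a stray space or blank line in myid is a common cause of startup failures, it is worth verifying the file content byte-for-byte. A local sketch of that check, using a temporary file standing in for /home/hadoop/data/zookeeper/zkdata/myid:

```shell
# Write a myid-style file without a trailing newline and verify it holds
# exactly one byte: the digit. printf (unlike echo) appends no newline.
f=$(mktemp)
printf '1' > "$f"
size=$(wc -c < "$f")
content=$(cat "$f")
if [ "$size" -eq 1 ] && [ "$content" = "1" ]; then
  echo "myid OK: '$content' ($size byte)"
else
  echo "myid malformed: $size bytes"
fi
rm -f "$f"
```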
Start ZooKeeper
The start command below assumes a /home/hadoop/app/zookeeper symlink pointing at the versioned install directory; create it on every node first:
[hadoop@hdp1 zkdata]$ runRemoteCmd.sh "ln -s /home/hadoop/app/apache-zookeeper-3.7.1-bin /home/hadoop/app/zookeeper" all
[hadoop@hdp1 zkdata]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all
*****************hdp1**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
*****************hdp2**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
*****************hdp3**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Check the ZooKeeper processes:
[hadoop@hdp1 zkdata]$ runRemoteCmd.sh "jps" all
*****************hdp1**********************
88755 Jps
88686 QuorumPeerMain
*****************hdp2**********************
5077 QuorumPeerMain
5157 Jps
*****************hdp3**********************
3795 QuorumPeerMain
3863 Jps
Check the ZooKeeper status:
[hadoop@hdp1 zkdata]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh status" all
*****************hdp1**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
*****************hdp2**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
*****************hdp3**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
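The status output shows one leader and two followers, which matches ZooKeeper's majority rule: an ensemble of n servers needs floor(n/2) + 1 agreeing votes, so it survives the loss of n minus quorum servers. The arithmetic for the 3-node cluster here:

```shell
# Majority-quorum arithmetic for a ZooKeeper ensemble.
n=3                          # ensemble size (hdp1..hdp3)
quorum=$(( n / 2 + 1 ))      # minimum servers that must agree
tolerated=$(( n - quorum ))  # servers that can fail without losing quorum
echo "ensemble=$n quorum=$quorum tolerated_failures=$tolerated"
```

This prints `ensemble=3 quorum=2 tolerated_failures=1`: the cluster stays available if any single node goes down, and this is also why ensembles are usually sized with an odd number of servers.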
Stop ZooKeeper:
[hadoop@hdp1 zkdata]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh stop" all
*****************hdp1**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
*****************hdp2**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
*****************hdp3**********************
ZooKeeper JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
Summary
Although ZooKeeper is easy to get started with, real-world deployments usually require tuning many parameters. ZooKeeper's reliability and performance depend on correct configuration, so it is important to understand how ZooKeeper works and what each parameter actually means. With sensible time and quorum settings, ZooKeeper can adapt to a wide range of network topologies.