Following the previous three posts, the master-server.tar.gz and agent-server.tar.gz image archives are ready, along with the prepared hadoop_CDH.tar.gz, which bundles the Hadoop-ecosystem components plus the separately built Flink component and the MySQL driver package. Next, copy these packages into the production environment and install them.
The Docker environment on the production host is assumed to be set up in advance. The cluster runs three nodes on a single host: server001, server002, and server003.
1. Create the cluster network
Create a custom network so the three nodes can communicate with each other.
docker network create --subnet=172.30.0.0/24 cdh-net \
&& docker network ls \
&& docker network inspect cdh-net
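Before hard-coding node IPs into the docker run commands below, it can be worth a quick sanity check that each planned IP falls inside the cdh-net subnet. A minimal sketch, assuming the 172.30.0.0/24 prefix chosen above:

```shell
# Planned node IPs must sit inside cdh-net's /24 (172.30.0.0/24).
in_subnet() {
  case "$1" in
    172.30.0.*) return 0 ;;   # matches the cdh-net prefix
    *)          return 1 ;;
  esac
}

for ip in 172.30.0.11 172.30.0.12 172.30.0.13; do
  in_subnet "$ip" && echo "$ip: inside cdh-net" || echo "$ip: OUTSIDE cdh-net"
done
```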
2. Create the hosts file
Bind-mount each container's /etc/hosts to /usr/local/src/hosts on the host; otherwise /etc/hosts is re-initialized every time a container starts.
Create the /usr/local/src/hosts file on the host in advance:
172.30.0.11 server001
172.30.0.12 server002
172.30.0.13 server003
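To keep the IPs and hostnames in sync, the file can also be generated from a single list instead of typed by hand. A small sketch; the NODES list simply mirrors the three entries above:

```shell
# ip:hostname pairs for the cluster; edit in one place only.
NODES="172.30.0.11:server001 172.30.0.12:server002 172.30.0.13:server003"

hosts_entries() {
  for n in $NODES; do
    printf '%s %s\n' "${n%%:*}" "${n##*:}"   # split "ip:host" into "ip host"
  done
}

hosts_entries
# On the host, write the shared file with: hosts_entries > /usr/local/src/hosts
```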
3. Start the server001 node container
server001 acts as the master container. Start a container from the master-server image and set its hostname to server001.
Map the ports the container needs to expose to the host with -p host_port:container_port; a given host port can be mapped to only one container.
Use -v to bind files between the host and the container, so a change on either side is visible on both.
Use -e to set the container's time zone.
# Load the image into Docker:
docker load -i master-server.tar.gz && docker images
Result:
REPOSITORY TAG IMAGE ID CREATED SIZE
server001/cdh 6.3.2 8797163b052f 11 minutes ago 3.62GB
# Create and start the container:
docker run \
--restart always \
-d --name server001 \
--hostname server001 \
--net cdh-net \
--ip 172.30.0.11 \
-p 8020:8020 \
-p 8088:8088 \
-p 19888:19888 \
-p 9870:9870 \
-p 9000:9000 \
-p 7180:7180 \
-p 2181:2181 \
-p 10000:10000 \
--privileged=true \
-v /usr/local/src/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
server001/cdh:6.3.2 \
/usr/sbin/init \
&& docker ps
# Enter the container:
docker exec -ti --privileged=true server001 /bin/bash
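With --restart always plus /usr/sbin/init the container should come up on its own, but any scripting that follows (docker exec, service startup) can race it. A small retry helper avoids fixed sleeps; this is only a sketch, and the commented docker inspect line assumes the server001 container started above:

```shell
# wait_for <attempts> <command...>: retry a command once per second
# until it succeeds or the attempt budget is exhausted.
wait_for() {
  attempts="$1"; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$attempts" ] && return 1
    sleep 1
  done
}

# Example (requires docker): block until server001 reports "running".
# wait_for 30 sh -c "docker inspect -f '{{.State.Status}}' server001 2>/dev/null | grep -q running"
```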
4. Start the server002 node container
Start a worker-node container from the agent-server image and name it server002.
# Load the image into Docker:
docker load -i agent-server.tar.gz && docker images
# Create and start the container:
docker run -d \
--restart always \
--hostname server002 \
--name server002 \
--net cdh-net \
--ip 172.30.0.12 \
--privileged=true \
-v /usr/local/src/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
agent-server/cdh:6.3.2 \
/usr/sbin/init \
&& docker ps
# Enter the container:
docker exec -ti --privileged=true server002 /bin/bash
5. Start the server003 node container
Start a worker-node container from the agent-server image and name it server003.
# Create and start the container:
docker run -d \
--restart always \
--hostname server003 \
--name server003 \
--net cdh-net \
--ip 172.30.0.13 \
--privileged=true \
-v /usr/local/src/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
agent-server/cdh:6.3.2 \
/usr/sbin/init \
&& docker ps
# Enter the container:
docker exec -ti --privileged=true server003 /bin/bash
6. Test connectivity with ping (in each container)
Since /etc/hosts in every container already has the hostname mappings, the nodes can ping each other by hostname:
ping server001 -c 3 && ping server002 -c 3 && ping server003 -c 3
Result:
PING server001 (172.30.0.11) 56(84) bytes of data.
64 bytes from server001 (172.30.0.11): icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from server001 (172.30.0.11): icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from server001 (172.30.0.11): icmp_seq=3 ttl=64 time=0.066 ms
--- server001 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.033/0.051/0.066/0.016 ms
PING server002 (172.30.0.12) 56(84) bytes of data.
64 bytes from server002 (172.30.0.12): icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from server002 (172.30.0.12): icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from server002 (172.30.0.12): icmp_seq=3 ttl=64 time=0.084 ms
--- server002 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.074/0.080/0.084/0.004 ms
PING server003 (172.30.0.13) 56(84) bytes of data.
64 bytes from server003 (172.30.0.13): icmp_seq=1 ttl=64 time=0.073 ms
64 bytes from server003 (172.30.0.13): icmp_seq=2 ttl=64 time=0.053 ms
64 bytes from server003 (172.30.0.13): icmp_seq=3 ttl=64 time=0.082 ms
--- server003 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.053/0.069/0.082/0.013 ms
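The three-way ping can also be scripted so a broken mapping fails loudly instead of being missed in the output. A hedged sketch; check_hosts takes the probe command as a parameter so the same loop can later be reused with other per-host checks:

```shell
# check_hosts <probe-command> <host...>: run the probe against each host,
# stop at the first failure.
check_hosts() {
  probe="$1"; shift
  for host in "$@"; do
    if $probe "$host" > /dev/null 2>&1; then
      echo "$host OK"
    else
      echo "$host FAIL"
      return 1
    fi
  done
}

# Inside any container:
# check_hosts "ping -c 3" server001 server002 server003
```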
7. Configure SSH (in each container)
Send the public key to every node; repeat this on each of the three nodes:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa \
&& ssh-copy-id server001 \
&& ssh-copy-id server002 \
&& ssh-copy-id server003
# root password: 123456
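Typing the password three times per node gets tedious as the cluster grows. If sshpass happens to be available in the images (an assumption, not something the build above guarantees), the copy can be scripted; the function below only prints the commands, so you can review them before piping to sh:

```shell
NODES="server001 server002 server003"

# Print one ssh-copy-id command per node (root password 123456, as set in the images).
distribute_key() {
  for host in $NODES; do
    echo "sshpass -p 123456 ssh-copy-id -o StrictHostKeyChecking=no root@$host"
  done
}

distribute_key          # dry run: inspect the commands first
# distribute_key | sh   # then actually push the key from this node
```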
8. Copy the installation packages
Extract hadoop_CDH.tar.gz on the host, then copy the packages into the server001 container; everything from here on happens on server001.
First, on the host, this is the layout to be copied into /root of server001:
[root@cdhuser ~]# tree /root/hadoop_CDH/
hadoop_CDH/
├── flink-csd
│ ├── FLINK-1.10.2.jar
│ └── FLINK_ON_YARN-1.10.2.jar
├── mysql-jdbc
│ └── mysql-connector-java.jar
└── parcel
├── CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
├── FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
└── manifest.json
3 directories, 6 files
Run on the host to start the copy:
[root@cdhuser ~]# docker cp /root/hadoop_CDH/ server001:/root
9. Copy the MySQL JDBC driver
The driver lets the CDH services connect to the MySQL database and initialize their data.
# back inside the server001 container
mkdir -p /usr/share/java/ \
&& cp /root/hadoop_CDH/mysql-jdbc/mysql-connector-java.jar /usr/share/java/ \
&& rm -rf /root/hadoop_CDH/mysql-jdbc/ \
&& ls /usr/share/java/
Result:
mysql-connector-java.jar
10. Set up the parcel repository
During CDH installation, components are downloaded from the parcel repository and distributed to each node for installation and configuration.
cd /opt/cloudera/parcel-repo/ && mv /root/hadoop_CDH/parcel/* ./ \
&& sha1sum CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel| awk '{ print $1 }' > CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha \
&& sha1sum FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel | awk '{ print $1 }' > FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel.sha \
&& rm -rf /root/hadoop_CDH/parcel/ \
&& chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/* \
&& ll /opt/cloudera/parcel-repo/
Result:
total 2330364
-rw-r--r-- 1 cloudera-scm cloudera-scm 2082186246 Jul 9 09:56 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
-rw-r--r-- 1 root root 41 Jul 9 10:07 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm 304055379 Jul 9 09:55 FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
-rw-r--r-- 1 root root 41 Jul 9 10:07 FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm 34411 Jul 9 09:55 manifest.json
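Since the .sha files are generated locally rather than downloaded, a quick round-trip check guards against a parcel that was corrupted during the docker cp. A small sketch, meant to be run in /opt/cloudera/parcel-repo/:

```shell
# verify_parcel <file>: recompute the SHA-1 and compare it to the .sha file
# that sits next to the parcel.
verify_parcel() {
  parcel="$1"
  expected="$(cat "$parcel.sha")"
  actual="$(sha1sum "$parcel" | awk '{ print $1 }')"
  if [ "$expected" = "$actual" ]; then
    echo "$parcel OK"
  else
    echo "$parcel MISMATCH"
    return 1
  fi
}

# verify_parcel CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
# verify_parcel FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
```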
11. Move the Flink CSD jars
The Flink parcel has already been copied into /opt/cloudera/parcel-repo/ above; now copy the Flink CSD jars into /opt/cloudera/csd/.
cp /root/hadoop_CDH/flink-csd/* /opt/cloudera/csd/ \
&& ll /opt/cloudera/csd/ \
&& rm -rf /root/hadoop_CDH/flink-csd/
Result:
total 20
-rw-r--r-- 1 root root 7737 Jul 9 10:01 FLINK-1.10.2.jar
-rw-r--r-- 1 root root 8260 Jul 9 10:01 FLINK_ON_YARN-1.10.2.jar
12. Initialize the scm database
MySQL and CDH live in the same container here, which makes migration easier. I previously tried running MySQL as a separate container, but ran into unexplained problems during migration, so that approach is shelved for now.
CDH's operational data can also be stored in other databases such as Oracle, as scm_prepare_database.sh shows;
the database connection info can be seen in /etc/cloudera-scm-server/db.properties.
With MySQL and CDH in the same container:
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm 123456
For a MySQL database outside the container, specify -h host (default: local) and -P port (default: 3306):
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql -hlocalhost scm scm 123456
Result:
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.8.0_181-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Tue Jul 06 08:58:16 UTC 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
13. Start the cloudera-scm-agent service on all nodes
systemctl restart cloudera-scm-agent
14. Start the cloudera-scm-server service on server001
systemctl start cloudera-scm-server \
&& sleep 2 \
&& tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
Result: wait for startup to complete
2021-07-06 09:01:33,685 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.   <- this line marks a successful start
2021-07-06 09:02:23,792 INFO avro-servlet-hb-processor-2:com.cloudera.server.common.AgentAvroServlet: (5 skipped) AgentAvroServlet: heartbeat processing stats: average=46ms, min=11ms, max=192ms.
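Instead of watching tail -f for the "Started Jetty server" line by eye, the wait can be scripted. A sketch; the log path is the one tailed above:

```shell
# Block until the Cloudera Manager startup marker appears in the server log.
wait_for_cm() {
  log="$1"
  until grep -q 'Started Jetty server' "$log" 2>/dev/null; do
    sleep 5
  done
  echo "Cloudera Manager is up"
}

# wait_for_cm /var/log/cloudera-scm-server/cloudera-scm-server.log
```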
15. Verify startup
# Run inside the container:
[root@server001 ~]# curl http://172.30.0.11:7180
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>
# Run on the host:
[root@cdhuser ~]# curl http://192.168.60.100:7180
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>
This shows scm has started successfully and the CM console can be reached.
Login credentials: admin/admin
At this point the CDH master-server and agent-server are both up. The next step is to configure CDH and use it to install the big-data components.
Questions are welcome in the comments; your questions are the best support you can give me.