Deploying an Apache Big Data Warehouse: Component-by-Component Setup
Chapter 1: Environment Preparation
1. Machine planning
Prepare three servers for the cluster deployment. CentOS 7+ is recommended, with at least 2 CPU cores and 8 GB of RAM per node.
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
[root@hadoop101 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@hadoop101 ~]# hostname
hadoop101
2. Download the installation packages
Installation packages for the warehouse components:
Link: https://pan.baidu.com/s/1Wjx6TNkedMTmmnuWREW-OQ
Extraction code: bpk0
All components have been uploaded to this network drive; you can also download them yourself from each project's official site.
3. Configure /etc/hosts on the servers
Hostname resolution entries in /etc/hosts on all three machines:
# hadoop101
[root@hadoop101 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# hadoop cluster
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
# hadoop102
[root@hadoop102 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# hadoop cluster
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
# hadoop103
[root@hadoop103 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# hadoop cluster
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
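The three /etc/hosts files above must stay identical on every node. As a sanity check, a small helper like this can confirm a given hosts file contains all three entries (a sketch, not part of the original guide; the IP/hostname pairs are taken from the plan above):

```shell
#!/bin/sh
# Verify that a hosts file contains all three cluster entries.
check_hosts() {
    file="$1"
    for entry in "172.19.195.228 hadoop101" \
                 "172.19.195.229 hadoop102" \
                 "172.19.195.230 hadoop103"; do
        # fail fast on the first missing entry
        grep -q "$entry" "$file" || { echo "missing: $entry"; return 1; }
    done
    echo "hosts ok"
}
```

Run it as `check_hosts /etc/hosts` on each node (or over ssh once passwordless login is set up).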
4. Configure passwordless SSH between the servers
# Generate a key pair
[root@hadoop101 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:pgAtkJ9Tmf8sqBYOkK2gr/d7woIPXDguOiHRxRHDVH4 root@hadoop101
The key's randomart image is:
+---[RSA 2048]----+
|.. +=*. |
|.. .Bo |
| =o+... E |
|= Bo .. |
|+= o.. oS |
|O + ...oo |
|+O + .. |
|=.B o . |
|o*.oo+ |
+----[SHA256]-----+
# Distribute the public key
[root@hadoop101 ~]# ssh-copy-id hadoop101
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop101's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop101'"
and check to make sure that only the key(s) you wanted were added.
[root@hadoop101 ~]# ssh-copy-id hadoop102
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop102's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop102'"
and check to make sure that only the key(s) you wanted were added.
[root@hadoop101 ~]# ssh-copy-id hadoop103
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop103's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop103'"
and check to make sure that only the key(s) you wanted were added.
# Repeat the same operations on hadoop102 and hadoop103
[root@hadoop102 ~]# ssh-keygen -t rsa
[root@hadoop102 ~]# ssh-copy-id hadoop101
[root@hadoop102 ~]# ssh-copy-id hadoop102
[root@hadoop102 ~]# ssh-copy-id hadoop103
[root@hadoop103 ~]# ssh-keygen -t rsa
[root@hadoop103 ~]# ssh-copy-id hadoop101
[root@hadoop103 ~]# ssh-copy-id hadoop102
[root@hadoop103 ~]# ssh-copy-id hadoop103
# Verify
[root@hadoop101 ~]# for i in hadoop101 hadoop102 hadoop103;do ssh $i hostname;done
hadoop101
hadoop102
hadoop103
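The nine ssh-copy-id invocations above form a full mesh: each of the three nodes ends up trusting all three. As an illustrative sketch (not in the original guide), this dry run prints every source/target pair rather than executing anything:

```shell
# Dry run of the full-mesh key exchange: print each src -> dst pair.
# Drop the surrounding echo/plan scaffolding to actually run ssh-copy-id.
HOSTS="hadoop101 hadoop102 hadoop103"
plan=$(for src in $HOSTS; do
    for dst in $HOSTS; do
        echo "on $src: run ssh-copy-id $dst"
    done
done)
echo "$plan"
```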
5. Create the package and application directories
[root@hadoop101 ~]# for i in hadoop101 hadoop102 hadoop103;do ssh $i mkdir /opt/{software,module};done
6. Upload the packages to the server
# Upload all installation packages to /opt/software on hadoop101
[root@hadoop101 ~]# cd /opt/software/
[root@hadoop101 software]# ll
total 1489048
-rw-r--r-- 1 root root 278813748 Aug 26 15:13 apache-hive-3.1.2-bin.tar.gz
-rw-r--r-- 1 root root 9136463 Aug 26 15:13 apache-maven-3.6.1-bin.tar.gz
-rw-r--r-- 1 root root 9311744 Aug 26 15:13 apache-zookeeper-3.5.7-bin.tar.gz
-rw-r--r-- 1 root root 338075860 Aug 26 15:14 hadoop-3.1.3.tar.gz
-rw-r--r-- 1 root root 314030393 Aug 26 15:14 hue.tar
-rw-r--r-- 1 root root 194990602 Aug 26 15:15 jdk-8u211-linux-x64.tar.gz
-rw-r--r-- 1 root root 70057083 Aug 26 15:14 kafka_2.11-2.4.0.tgz
-rw-r--r-- 1 root root 77807942 Aug 26 15:15 mysql-libs.zip
-rw-r--r-- 1 root root 232530699 Aug 26 15:15 spark-2.4.5-bin-hadoop2.7.tgz
7. Disable the firewall on every server
# Stop and disable firewalld
[root@hadoop101 software]# systemctl stop firewalld.service
[root@hadoop101 software]# systemctl disable firewalld.service
[root@hadoop102 ~]# systemctl stop firewalld.service
[root@hadoop102 ~]# systemctl disable firewalld.service
[root@hadoop103 ~]# systemctl stop firewalld.service
[root@hadoop103 ~]# systemctl disable firewalld.service
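Once firewalld is stopped on every node, the per-node results can be summarized in one pass. The helper below is an assumed convenience (not from the guide); pipe one `systemctl is-active firewalld` line per node into it:

```shell
# Summarize "systemctl is-active firewalld" results, one line per node, e.g.:
#   for i in hadoop101 hadoop102 hadoop103; do
#       ssh $i systemctl is-active firewalld
#   done | summarize_firewalld
summarize_firewalld() {
    awk '$1 != "inactive" { bad++ }
         END { if (bad) print bad " node(s) still have firewalld active";
               else print "firewalld inactive on all nodes" }'
}
```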
8. Configure hosts on your local Windows machine (optional)
[Note] Without this, you must use each server's IP address instead of its hostname when opening web UIs in a browser.
C:\Windows\System32\drivers\etc\hosts
# hadoop cluster
139.224.229.107 hadoop101
139.224.66.13 hadoop102
139.224.228.144 hadoop103
9. Install Java (JDK 1.8)
# Extract the archive
[root@hadoop101 software]# tar -xf jdk-8u211-linux-x64.tar.gz -C /opt/module/
[root@hadoop101 software]# ls /opt/module/
jdk1.8.0_211
# Add the environment variable configuration
[root@hadoop101 software]# vim /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin
# Copy the JDK directory to hadoop102 and hadoop103
[root@hadoop101 software]# scp -r /opt/module/jdk1.8.0_211 hadoop102:/opt/module/
[root@hadoop101 software]# scp -r /opt/module/jdk1.8.0_211 hadoop103:/opt/module/
# Copy the environment configuration to hadoop102 and hadoop103
[root@hadoop101 software]# scp /etc/profile hadoop102:/etc/
[root@hadoop101 software]# scp /etc/profile hadoop103:/etc/
# Source the environment on hadoop101, hadoop102, and hadoop103
[root@hadoop101 software]# source /etc/profile && java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
[root@hadoop102 ~]# source /etc/profile && java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
[root@hadoop103 ~]# source /etc/profile && java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
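To avoid eyeballing each node's output, the `java -version` text can be piped through a small matcher. This is a sketch (the helper name is made up); note that `java -version` writes to stderr, so merge it with `2>&1` before piping, e.g. `ssh hadoop102 java -version 2>&1 | check_java_version`:

```shell
# Check that the JDK version string on stdin matches the expected build.
check_java_version() {
    expected="1.8.0_211"
    if grep -q "\"$expected\""; then
        echo "java ok: $expected"
    else
        echo "java version mismatch (expected $expected)"
    fi
}
```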
Chapter 2: ZooKeeper Installation and Deployment
For a complete introduction to ZooKeeper, see the companion article covering ZooKeeper concepts, deployment, and internals.
1. Extract the ZooKeeper package
[root@hadoop101 software]# tar -xf apache-zookeeper-3.5.7-bin.tar.gz -C /opt/module/
2. Create the zkData directory
[root@hadoop101 software]# mkdir -p /opt/module/apache-zookeeper-3.5.7-bin/zkData
3. Set this node's myid
[root@hadoop101 software]# echo 1 > /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid
4. Edit the zoo.cfg configuration file
[root@hadoop101 software]# cd /opt/module/apache-zookeeper-3.5.7-bin/conf/
[root@hadoop101 conf]# ll
total 12
-rw-r--r-- 1 502 games 535 May 4 2018 configuration.xsl
-rw-r--r-- 1 502 games 2712 Feb 7 2020 log4j.properties
-rw-r--r-- 1 502 games 922 Feb 7 2020 zoo_sample.cfg
[root@hadoop101 conf]# mv zoo_sample.cfg zoo.cfg
[root@hadoop101 conf]# vim zoo.cfg
# Comment lines removed below for readability; keeping them is also fine
[root@hadoop101 conf]# cat zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/module/apache-zookeeper-3.5.7-bin/zkData
clientPort=2181
server.1=hadoop101:2888:3888
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
[root@hadoop101 conf]#
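Before distributing zoo.cfg, it is worth confirming that all three quorum members are declared. A minimal sketch (the helper is an assumption, not part of the guide):

```shell
# Confirm zoo.cfg declares server.1 through server.3 before shipping it out.
check_zoo_cfg() {
    for s in server.1 server.2 server.3; do
        grep -q "^$s=" "$1" || { echo "missing $s in $1"; return 1; }
    done
    echo "zoo.cfg ok"
}
```

Run it as `check_zoo_cfg /opt/module/apache-zookeeper-3.5.7-bin/conf/zoo.cfg`.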
5. Distribute the application directory
[root@hadoop101 module]# scp -r apache-zookeeper-3.5.7-bin hadoop102:/opt/module/
[root@hadoop101 module]# scp -r apache-zookeeper-3.5.7-bin hadoop103:/opt/module/
6. Set the myid on the other nodes
[root@hadoop102 ~]# echo 2 > /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid
[root@hadoop103 ~]# echo 3 > /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid
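Each node's myid simply follows its position in the `server.N` list in zoo.cfg. As an illustrative dry run (it prints the per-node commands instead of executing them over ssh):

```shell
# Dry run: derive each node's myid from its position in the host list.
ZKDATA=/opt/module/apache-zookeeper-3.5.7-bin/zkData
myid_plan=$(i=1
for h in hadoop101 hadoop102 hadoop103; do
    echo "ssh $h \"echo $i > $ZKDATA/myid\""
    i=$((i + 1))
done)
echo "$myid_plan"
```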
7. Verify the myid on every node
[root@hadoop101 module]# for i in hadoop101 hadoop102 hadoop103;do ssh $i cat /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid;done
1
2
3
8. Start the service on each cluster node
[root@hadoop101 module]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop102 module]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop103 ~]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
9. Check that the service is running
[root@hadoop101 module]# for i in hadoop101 hadoop102 hadoop103;do ssh $i $JAVA_HOME/bin/jps -l|grep -v jps;done
5856 org.apache.zookeeper.server.quorum.QuorumPeerMain
5747 org.apache.zookeeper.server.quorum.QuorumPeerMain
5754 org.apache.zookeeper.server.quorum.QuorumPeerMain
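jps only shows that QuorumPeerMain is up; `zkServer.sh status` additionally reports each node's role, and a healthy 3-node ensemble has exactly one leader and two followers. The summarizer below is an assumed helper: run `zkServer.sh status` on each node (as above, over ssh) and pipe the combined output into it.

```shell
# Count "Mode: leader" / "Mode: follower" lines from zkServer.sh status output.
summarize_zk() {
    awk '/^Mode:/ { n[$2]++ }
         END { printf "leaders=%d followers=%d\n", n["leader"], n["follower"] }'
}
```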
Chapter 3: Hadoop Cluster Installation and Deployment
For a complete introduction to Hadoop, see the companion Hadoop introduction and deployment document.
1. Extract the Hadoop package
[root@hadoop101 software]# tar -xf hadoop-3.1.3.tar.gz -C /opt/module/
2. Configure the Hadoop environment variables
# Append the following at the end of the file
[root@hadoop101 software]# vim /etc/profile
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3. Configuration file locations
[root@hadoop101 software]# cd /opt/module/hadoop-3.1.3/etc/hadoop/
[root@hadoop101 hadoop]# ll
total 176
-rw-r--r-- 1 1000 1000 8260 Sep 12 2019 capacity-scheduler.xml
-rw-r--r-- 1 1000 1000 1335 Sep 12 2019 configuration.xsl
-rw-r--r-- 1 1000 1000 1940 Sep 12 2019 container-executor.cfg
-rw-r--r-- 1 1000 1000 1353 Aug 26 16:29 core-site.xml
-rw-r--r-- 1 1000 1000 3999 Sep 12 2019 hadoop-env.cmd
-rw-r--r-- 1 1000 1000 15946 Aug 26 16:42 hadoop-env.sh
-rw-r--r-- 1 1000 1000 3323 Sep 12 2019 hadoop-metrics2.properties
-rw-r--r-- 1 1000 1000 11392 Sep 12 2019 hadoop-policy.xml
-rw-r--r-- 1 1000 1000 3414 Sep 12 2019 hadoop-user-functions.sh.example
-rw-r--r-- 1 1000 1000 2956 Aug 26 16:28 hdfs-site.xml
-rw-r--r-- 1 1000 1000 1484 Sep 12 2019 httpfs-env.sh
-rw-r--r-- 1 1000 1000 1657 Sep 12 2019 httpfs-log4j.properties
-rw-r--r-- 1 1000 1000 21 Sep 12 2019 httpfs-signature.secret
-rw-r--r-- 1 1000 1000 620 Sep 12 2019 httpfs-site.xml
-rw-r--r-- 1 1000 1000 3518 Sep 12 2019 kms-acls.xml
-rw-r--r-- 1 1000 1000 1351 Sep 12 2019 kms-env.sh
-rw-r--r-- 1 1000 1000 1747 Sep 12 2019 kms-log4j.properties
-rw-r--r-- 1 1000 1000 682 Sep 12 2019 kms-site.xml
-rw-r--r-- 1 1000 1000 13326 Sep 12 2019 log4j.properties
-rw-r--r-- 1 1000 1000 951 Sep 12 2019 mapred-env.cmd
-rw-r--r-- 1 1000 1000 1764 Sep 12 2019 mapred-env.sh
-rw-r--r-- 1 1000 1000 4113 Sep 12 2019 mapred-queues.xml.template
-rw-r--r-- 1 1000 1000 758 Sep 12 2019 mapred-site.xml
drwxr-xr-x 2 1000 1000 4096 Sep 12 2019 shellprofile.d
-rw-r--r-- 1 1000 1000 2316 Sep 12 2019 ssl-client.xml.example
-rw-r--r-- 1 1000 1000 2697 Sep 12 2019 ssl-server.xml.example
-rw-r--r-- 1 1000 1000 2642 Sep 12 2019 user_ec_policies.xml.template
-rw-r--r-- 1 1000 1000 30 Aug 26 16:33 workers
-rw-r--r-- 1 1000 1000 2250 Sep 12 2019 yarn-env.cmd
-rw-r--r-- 1 1000 1000 6056 Sep 12 2019 yarn-env.sh
-rw-r--r-- 1 1000 1000 2591 Sep 12 2019 yarnservice-log4j.properties
-rw-r--r-- 1 1000 1000 2029 Aug 26 16:32 yarn-site.xml
4. Edit the hdfs-site.xml configuration file
[root@hadoop101 hadoop]# vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Nameservice ID -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- Multiple NameNodes -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2,nn3</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop101:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop102:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn3</name>
<value>hadoop103:8020</value>
</property>
<!-- HTTP endpoint for each NameNode -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop101:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn3</name>
<value>hadoop103:9870</value>
</property>
<!-- JournalNode quorum the NameNodes use to share edit logs -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop101:8485;hadoop102:8485;hadoop103:8485/mycluster</value>
</property>
<!-- Java class HDFS clients use to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method: ensures only one NameNode serves requests at a time -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- Disable permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
5. Edit the core-site.xml configuration file
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Default filesystem -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- JournalNode edits storage path -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/module/hadoop-3.1.3/JN/data</value>
</property>
<!-- Hadoop runtime temporary files -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/tmp</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop101:2181,hadoop102:2181,hadoop103:2181</value>
</property>
</configuration>
6. Edit the yarn-site.xml configuration file
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- YARN high-availability configuration -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop101</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop101:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop103:8088</value>
</property>
<property>
<name>hadoop.zk.address</name>
<value>hadoop101:2181,hadoop102:2181,hadoop103:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state in the ZooKeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
7. Edit workers to list the cluster nodes
[root@hadoop101 hadoop]# vim workers
hadoop101
hadoop102
hadoop103
8. Set JAVA_HOME in hadoop-env.sh
# Adding one line at the end of the file is enough
[root@hadoop101 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_211
9. Startup script locations
[root@hadoop101 hadoop]# cd /opt/module/hadoop-3.1.3/sbin/
[root@hadoop101 sbin]# ll
total 112
-rwxr-xr-x 1 1000 1000 2756 Sep 12 2019 distribute-exclude.sh
drwxr-xr-x 4 1000 1000 4096 Sep 12 2019 FederationStateStore
-rwxr-xr-x 1 1000 1000 1983 Sep 12 2019 hadoop-daemon.sh
-rwxr-xr-x 1 1000 1000 2522 Sep 12 2019 hadoop-daemons.sh
-rwxr-xr-x 1 1000 1000 1542 Sep 12 2019 httpfs.sh
-rwxr-xr-x 1 1000 1000 1500 Sep 12 2019 kms.sh
-rwxr-xr-x 1 1000 1000 1841 Sep 12 2019 mr-jobhistory-daemon.sh
-rwxr-xr-x 1 1000 1000 2086 Sep 12 2019 refresh-namenodes.sh
-rwxr-xr-x 1 1000 1000 1779 Sep 12 2019 start-all.cmd
-rwxr-xr-x 1 1000 1000 2221 Sep 12 2019 start-all.sh
-rwxr-xr-x 1 1000 1000 1880 Sep 12 2019 start-balancer.sh
-rwxr-xr-x 1 1000 1000 1401 Sep 12 2019 start-dfs.cmd
-rwxr-xr-x 1 1000 1000 5325 Aug 26 16:37 start-dfs.sh
-rwxr-xr-x 1 1000 1000 1793 Sep 12 2019 start-secure-dns.sh
-rwxr-xr-x 1 1000 1000 1571 Sep 12 2019 start-yarn.cmd
-rwxr-xr-x 1 1000 1000 3427 Aug 26 16:39 start-yarn.sh
-rwxr-xr-x 1 1000 1000 1770 Sep 12 2019 stop-all.cmd
-rwxr-xr-x 1 1000 1000 2166 Sep 12 2019 stop-all.sh
-rwxr-xr-x 1 1000 1000 1783 Sep 12 2019 stop-balancer.sh
-rwxr-xr-x 1 1000 1000 1455 Sep 12 2019 stop-dfs.cmd
-rwxr-xr-x 1 1000 1000 4053 Aug 26 16:38 stop-dfs.sh
-rwxr-xr-x 1 1000 1000 1756 Sep 12 2019 stop-secure-dns.sh
-rwxr-xr-x 1 1000 1000 1642 Sep 12 2019 stop-yarn.cmd
-rwxr-xr-x 1 1000 1000 3168 Aug 26 16:40 stop-yarn.sh
-rwxr-xr-x 1 1000 1000 1982 Sep 12 2019 workers.sh
-rwxr-xr-x 1 1000 1000 1814 Sep 12 2019 yarn-daemon.sh
-rwxr-xr-x 1 1000 1000 2328 Sep 12 2019 yarn-daemons.sh
10. Edit the start-dfs.sh script to add user variables
# Add the variable configuration at the top of the script
[root@hadoop101 sbin]# vim start-dfs.sh
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
11. Edit the stop-dfs.sh script to add user variables
# Add the variable configuration at the top of the script
[root@hadoop101 sbin]# vim stop-dfs.sh
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
12. Edit the start-yarn.sh script to add user variables
# Add the variable configuration at the top of the script
[root@hadoop101 sbin]# vim start-yarn.sh
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
13. Edit the stop-yarn.sh script to add user variables
# Add the variable configuration at the top of the script
[root@hadoop101 sbin]# vim stop-yarn.sh
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
14. Distribute the installation directory
[root@hadoop101 module]# scp -r hadoop-3.1.3 hadoop102:/opt/module/
[root@hadoop101 module]# scp -r hadoop-3.1.3 hadoop103:/opt/module/
15. Distribute the /etc/profile environment file
# Copy it to the other nodes
[root@hadoop101 module]# scp /etc/profile hadoop102:/etc/
[root@hadoop101 module]# scp /etc/profile hadoop103:/etc/
# Source it on each node
[root@hadoop101 ~]# source /etc/profile
[root@hadoop102 ~]# source /etc/profile
[root@hadoop103 ~]# source /etc/profile
16. Start the JournalNode service on each node
[root@hadoop101 module]# hadoop-daemon.sh start journalnode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
[root@hadoop102 module]# hadoop-daemon.sh start journalnode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
[root@hadoop103 ~]# hadoop-daemon.sh start journalnode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
17. Format the NameNode on nn1 (hadoop101)
[root@hadoop101 module]# hdfs namenode -format
2021-08-26 16:53:52,236 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop101/172.19.195.228
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.1.3
...
...
2021-08-26 16:53:54,483 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2021-08-26 16:53:54,484 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop101/172.19.195.228
************************************************************/
18. Start the NameNode on nn1, then sync nn2 and nn3
[root@hadoop101 module]# hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[root@hadoop101 module]# jps
5856 QuorumPeerMain
9681 NameNode # started successfully
8379 JournalNode
9790 Jps
# Sync the standbys
[root@hadoop102 module]# hdfs namenode -bootstrapStandby
[root@hadoop103 module]# hdfs namenode -bootstrapStandby
# Start
[root@hadoop102 module]# hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[root@hadoop103 ~]# hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
19. Stop all HDFS/YARN services
[root@hadoop101 module]# stop-all.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Stopping namenodes on [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 15:30:42 CST 2021 from 172.19.195.228 on pts/1
Stopping datanodes
Last login: Thu Aug 26 17:19:24 CST 2021 on pts/0
Stopping journal nodes [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:19:25 CST 2021 on pts/0
Stopping ZK Failover Controllers on NN hosts [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:19:27 CST 2021 on pts/0
Stopping nodemanagers
Last login: Thu Aug 26 17:19:30 CST 2021 on pts/0
Stopping resourcemanagers on [ hadoop101 hadoop103]
Last login: Thu Aug 26 17:19:30 CST 2021 on pts/0
[root@hadoop101 module]#
20. Initialize the HA state in ZooKeeper
[root@hadoop101 module]# hdfs zkfc -formatZK
2021-08-26 17:20:32,622 INFO tools.DFSZKFailoverController: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DFSZKFailoverController
STARTUP_MSG: host = hadoop101/172.19.195.228
STARTUP_MSG: args = [-formatZK]
STARTUP_MSG: version = 3.1.3
...
...
2021-08-26 17:20:33,263 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x100004943610000
2021-08-26 17:20:33,265 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at hadoop101/172.19.195.228
************************************************************/
21. Start all HDFS/YARN services
[root@hadoop101 module]# start-all.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:19:34 CST 2021 on pts/0
Starting datanodes
Last login: Thu Aug 26 17:21:17 CST 2021 on pts/0
Starting journal nodes [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:21:20 CST 2021 on pts/0
Starting ZK Failover Controllers on NN hosts [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:21:25 CST 2021 on pts/0
Starting resourcemanagers on [ hadoop101 hadoop103]
Last login: Thu Aug 26 17:21:29 CST 2021 on pts/0
Starting nodemanagers
Last login: Thu Aug 26 17:21:36 CST 2021 on pts/0
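After start-all.sh, every node in this HA layout should run the same set of daemons (NameNode, DataNode, JournalNode, DFSZKFailoverController, NodeManager, plus ZooKeeper's QuorumPeerMain; the ResourceManager additionally runs on hadoop101 and hadoop103). A per-node check can be sketched as a helper that reads one node's jps output on stdin, e.g. `ssh hadoop102 jps | check_daemons` (the helper is an assumption, not from the guide):

```shell
# Confirm the daemons expected on every node appear in the jps output on stdin.
check_daemons() {
    out=$(cat)
    for d in NameNode DataNode JournalNode DFSZKFailoverController NodeManager QuorumPeerMain; do
        case "$out" in
            *"$d"*) ;;                       # present, keep going
            *) echo "missing: $d"; return 1 ;;
        esac
    done
    echo "daemons ok"
}
```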
22. Verify the web UIs
NameNode web UI:
http://hadoop101:9870
http://hadoop102:9870
http://hadoop103:9870
YARN ResourceManager UI (applications page): port 8088
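Besides the web UIs, the HA state can be checked from the command line: `hdfs haadmin -getServiceState nn1` (likewise nn2, nn3) prints `active` or `standby`, and exactly one NameNode should be active. The parser below is an assumed helper for the combined output of those three calls:

```shell
# Expect exactly one "active" line among the three NameNode states on stdin, e.g.:
#   for nn in nn1 nn2 nn3; do hdfs haadmin -getServiceState $nn; done | check_ha
check_ha() {
    awk '$1 == "active" { a++ }
         END { if (a == 1) print "HA ok: one active NameNode";
               else print "HA problem: " a+0 " active NameNode(s)" }'
}
```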
Chapter 4: MySQL Installation and Deployment
[Note] MySQL is shared middleware; deploying it on a single server is sufficient.
1. Remove mysql-libs
On CentOS 7 this also removes the conflicting mariadb-libs package, which provides mysql-libs.
[root@hadoop101 hadoop]# yum remove mysql-libs
2. Install common MySQL dependencies
[root@hadoop101 hadoop]# yum install -y libaio autoconf
3. Download and install the matching shared-library packages
# Change to the package directory
[root@hadoop101 hadoop]# cd /opt/software/
# Download the dependencies from the official MySQL archive
[root@hadoop101 software]# wget https://downloads.mysql.com/archives/get/p/23/file/MySQL-shared-compat-5.6.24-1.el6.x86_64.rpm
[root@hadoop101 software]# wget https://downloads.mysql.com/archives/get/p/23/file/MySQL-shared-5.6.24-1.el7.x86_64.rpm
# Install the downloaded dependency packages
[root@hadoop101 software]# rpm -ivh MySQL-shared-5.6.24-1.el7.x86_64.rpm
[root@hadoop101 software]# rpm -ivh MySQL-shared-compat-5.6.24-1.el6.x86_64.rpm
4. Extract the MySQL package
# Install the unzip tool
[root@hadoop101 software]# yum install -y unzip
# Extract the uploaded MySQL archive
[root@hadoop101 software]# unzip mysql-libs.zip
Archive: mysql-libs.zip
creating: mysql-libs/
inflating: mysql-libs/MySQL-client-5.6.24-1.el6.x86_64.rpm
inflating: mysql-libs/mysql-connector-java-5.1.27.tar.gz
inflating: mysql-libs/MySQL-server-5.6.24-1.el6.x86_64.rpm
[root@hadoop101 software]#
[root@hadoop101 software]# cd mysql-libs
[root@hadoop101 mysql-libs]# ls
MySQL-client-5.6.24-1.el6.x86_64.rpm mysql-connector-java-5.1.27.tar.gz MySQL-server-5.6.24-1.el6.x86_64.rpm
5. Install the MySQL server
# Install the server package
[root@hadoop101 mysql-libs]# rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
# Check the root user's initial password; here it is wQIhYE7v2l6Jg6Y6
[root@hadoop101 mysql-libs]# cat /root/.mysql_secret
# The random password set for the root user at Fri Aug 27 09:41:28 2021 (local time): wQIhYE7v2l6Jg6Y6
[root@hadoop101 mysql-libs]#
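The generated password always appears after "(local time): " in /root/.mysql_secret, so it can be extracted rather than copied by hand. A sketch assuming that file format:

```shell
# Pull the generated root password out of a .mysql_secret-style file.
get_initial_password() {
    sed -n 's/.*(local time): //p' "$1"
}
```

Usage: `get_initial_password /root/.mysql_secret`.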
# Start the server
[root@hadoop101 mysql-libs]# service mysql start
Starting MySQL. SUCCESS!
# Check that the server is running
[root@hadoop101 mysql-libs]# service mysql status
SUCCESS! MySQL running (7950)
[root@hadoop101 mysql-libs]# mysqld -V
mysqld Ver 5.6.24 for Linux on x86_64 (MySQL Community Server (GPL))
6. Install the MySQL client
# Install the client package
[root@hadoop101 mysql-libs]# rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
# Try connecting to the server
[root@hadoop101 mysql-libs]# mysql -uroot -pwQIhYE7v2l6Jg6Y6
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.24
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
# Change the initial root password to something easier to use
mysql> SET PASSWORD=PASSWORD('123456');
Query OK, 0 rows affected (0.00 sec)
# Exit mysql
mysql> exit;
Bye
7. Adjust the MySQL user permissions and connection policy
[root@hadoop101 mysql-libs]# mysql -uroot -p123456
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.24 MySQL Community Server (GPL)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.00 sec)
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select User, Host, Password from user;
+------+-----------+-------------------------------------------+
| User | Host | Password |
+------+-----------+-------------------------------------------+
| root | localhost | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| root | hadoop101 | *E288B1DC67ADA34893E6D82AFAAFA408E6BB29D4 |
| root | 127.0.0.1 | *E288B1DC67ADA34893E6D82AFAAFA408E6BB29D4 |
| root | ::1 | *E288B1DC67ADA34893E6D82AFAAFA408E6BB29D4 |
+------+-----------+-------------------------------------------+
4 rows in set (0.00 sec)
mysql> update user set host='%' where host='localhost';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> delete from user where host!='%';
Query OK, 3 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> select User, Host, Password from user;
+------+------+-------------------------------------------+
| User | Host | Password |
+------+------+-------------------------------------------+
| root | % | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+------+------+-------------------------------------------+
1 row in set (0.00 sec)
mysql> exit;
Bye
Chapter 5: Hive Installation and Deployment
For a complete introduction to Hive, see the companion article on Hive concepts, deployment, internals, and usage.
1. Extract the Hive package
[root@hadoop101 mysql-libs]# cd /opt/software/
[root@hadoop101 software]# tar -xf apache-hive-3.1.2-bin.tar.gz -C /opt/module/
2. Copy the mysql-connector driver into Hive's lib directory
# The MySQL zip package from Chapter 4 contains the mysql-connector JAR
[root@hadoop101 software]# cd mysql-libs
[root@hadoop101 mysql-libs]# ls
MySQL-client-5.6.24-1.el6.x86_64.rpm mysql-connector-java-5.1.27.tar.gz MySQL-server-5.6.24-1.el6.x86_64.rpm
[root@hadoop101 mysql-libs]# tar -xf mysql-connector-java-5.1.27.tar.gz
[root@hadoop101 mysql-libs]# cd mysql-connector-java-5.1.27
[root@hadoop101 mysql-connector-java-5.1.27]# ls
build.xml CHANGES COPYING docs mysql-connector-java-5.1.27-bin.jar README README.txt src
[root@hadoop101 mysql-connector-java-5.1.27]# cp mysql-connector-java-5.1.27-bin.jar /opt/module/apache-hive-3.1.2-bin/lib/
3. Define the Hive configuration
# The main goal is to store Hive's metadata in MySQL
[root@hadoop101 mysql-connector-java-5.1.27]# cd /opt/module/apache-hive-3.1.2-bin/conf/
[root@hadoop101 conf]# vim hive-site.xml
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop101:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop101:9083</value>
</property>
<!-- hiveserver2 -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop101</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.server2.active.passive.ha.enable</name>
<value>true</value>
</property>
</configuration>
4. Configure Hive environment variables
# Append the Hive settings to the end of /etc/profile
[root@hadoop101 conf]# vim /etc/profile
#HIVE
export HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
# Reload the profile so the changes take effect
[root@hadoop101 conf]# source /etc/profile
5. Replace the guava jar in Hive
# Check the guava version bundled with Hive
[root@hadoop101 conf]# cd /opt/module/apache-hive-3.1.2-bin/lib/
[root@hadoop101 lib]# ls *guava*
guava-19.0.jar jersey-guava-2.25.1.jar
# Check the guava version bundled with Hadoop
[root@hadoop101 lib]# cd /opt/module/hadoop-3.1.3/share/hadoop/common/lib/
[root@hadoop101 lib]# ls *guava*
guava-27.0-jre.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
#######################
# Hadoop bundles guava-27.0-jre.jar
# Hive bundles guava-19.0.jar
# Delete Hive's old guava-19.0.jar and copy Hadoop's guava-27.0-jre.jar in its place
#######################
[root@hadoop101 lib]# cp guava-27.0-jre.jar /opt/module/apache-hive-3.1.2-bin/lib/
[root@hadoop101 lib]# cd /opt/module/apache-hive-3.1.2-bin/lib/
[root@hadoop101 lib]# ls -l *guava*
-rw-r--r-- 1 root root 2308517 Sep 27 2018 guava-19.0.jar
-rw-r--r-- 1 root root 2747878 Aug 27 10:21 guava-27.0-jre.jar
-rw-r--r-- 1 root root 971309 May 21 2019 jersey-guava-2.25.1.jar
[root@hadoop101 lib]# rm -f guava-19.0.jar
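The check-and-swap above can be sketched as a small script. The jar names are the ones observed in this guide, and parsing the major version out of the file name is a simplification:

```shell
# Sketch: decide whether Hive's bundled guava is older than Hadoop's
# by comparing the major version encoded in the jar file names.
hive_guava="guava-19.0.jar"        # found in $HIVE_HOME/lib
hadoop_guava="guava-27.0-jre.jar"  # found in $HADOOP_HOME/share/hadoop/common/lib
major() { echo "$1" | sed -E 's/guava-([0-9]+).*/\1/'; }
if [ "$(major "$hive_guava")" -lt "$(major "$hadoop_guava")" ]; then
  echo "hive guava is older: replace it with hadoop's copy"
fi
```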
6. Start the metastore service and run it in the background
[root@hadoop101 apache-hive-3.1.2-bin]# nohup hive --service metastore > metastore.log 2>&1 &
[1] 10656
[root@hadoop101 apache-hive-3.1.2-bin]# nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
[2] 10830
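When backgrounding services under nohup, the usual pattern is `> logfile 2>&1`, so that both stdout and stderr end up in the log. A minimal demonstration with stand-in commands:

```shell
# Demonstrate the redirect pattern used for the Hive services:
# ">" sends stdout to the log and "2>&1" folds stderr into the same file.
{ echo "service started"; echo "a warning" >&2; } > /tmp/redirect-demo.log 2>&1
cat /tmp/redirect-demo.log
```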
[Note]: Hive 2.x and later require starting two services: metastore and hiveserver2.
7. Start the Hive CLI to verify
[root@hadoop101 apache-hive-3.1.2-bin]# hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/module/jdk1.8.0_211/bin:/opt/module/jdk1.8.0_211/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/jdk1.8.0_211/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = a1dcf9b4-2d47-4a68-9505-b53267e45438
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 5a498398-2233-402e-9720-cd066aa68add
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> show tables;
OK
tab_name
Time taken: 0.725 seconds
hive (default)> create table blizzard(id int,game string);
OK
Time taken: 2.648 seconds
hive (default)> insert into blizzard values(1,'wow');
hive (default)> insert into blizzard values(2,'ow');
hive (default)> select * from blizzard;
OK
blizzard.id blizzard.game
1 wow
2 ow
Time taken: 1.209 seconds, Fetched: 2 row(s)
hive (default)>
Chapter 6: Hue visualization tool installation and deployment
1. Prepare the installation package
[root@hadoop101 apache-hive-3.1.2-bin]# cd /opt/software/
[root@hadoop101 software]# scp hue.tar hadoop102:/opt/software/
hue.tar
[Note]:
- This hue.tar is a tarball of an installation that was already configured and verified working, so it works out of the box if you follow this cluster layout; otherwise you must compile and package Hue yourself and adjust the configuration in many places.
- Hue is deployed on the node named hadoop102, because the configuration files reference hadoop102.
2. Extract the Hue package
[root@hadoop102 ~]# cd /opt/software/
[root@hadoop102 software]# ll
total 306676
-rw-r--r-- 1 root root 314030393 Aug 27 10:58 hue.tar
[root@hadoop102 software]# tar -xf hue.tar -C /opt/module/
3. Install dependencies and related components
[root@hadoop102 software]# yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
4. Modify the Hadoop cluster configuration
[root@hadoop102 software]# cd /opt/module/hadoop-3.1.3/etc/hadoop/
[Note]: all added settings go inside the <configuration></configuration> tags
# Add the following
[root@hadoop102 hadoop]# vim hdfs-site.xml
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
# Add the following
[root@hadoop102 hadoop]# vim core-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
# Add the following
[root@hadoop102 hadoop]# vim httpfs-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
5. Distribute the Hadoop configuration to hadoop101 and hadoop103
[root@hadoop102 hadoop]# scp hdfs-site.xml core-site.xml httpfs-site.xml hadoop101:/opt/module/hadoop-3.1.3/etc/hadoop/
[root@hadoop102 hadoop]# scp hdfs-site.xml core-site.xml httpfs-site.xml hadoop103:/opt/module/hadoop-3.1.3/etc/hadoop/
6. Hue configuration file
[root@hadoop102 hadoop]# cd /opt/module/hue-master/desktop/conf
[root@hadoop102 conf]# ll
total 164
-rw-r--r-- 1 root root 2155 Feb 21 2020 log.conf
-rw-r--r-- 1 root root 77979 Feb 26 2020 pseudo-distributed.ini
-rw-r--r-- 1 root root 78005 Feb 21 2020 pseudo-distributed.ini.tmpl
If you compiled and packaged Hue from source yourself, the options in pseudo-distributed.ini must be compared and set one by one; there are many of them, for example:
Hue service host and port
# Webserver listens on this address and port
http_host=hadoop102
http_port=8000
HDFS cluster information
fs_defaultfs=hdfs://mycluster:8020
# NameNode logical name.
logical_name=mycluster
Database information
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
...
...
...
7. Make the hue user own the Hue application
# Create the hue user and set its password to 123456
[root@hadoop102 conf]# useradd hue
[root@hadoop102 conf]# passwd hue
Changing password for user hue.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
# Change the owner of the application directory to hue
[root@hadoop102 conf]# chown -R hue:hue /opt/module/hue-master/
[root@hadoop102 conf]# cd /opt/module/
[root@hadoop102 module]# ll
total 16
drwxr-xr-x 8 root root 4096 Aug 26 16:12 apache-zookeeper-3.5.7-bin
drwxr-xr-x 11 root root 4096 Aug 26 17:15 hadoop-3.1.3
drwxr-xr-x 14 hue hue 4096 Feb 24 2020 hue-master
drwxr-xr-x 7 root root 4096 Aug 26 15:49 jdk1.8.0_211
[root@hadoop102 module]#
8. Restart the HDFS cluster
# Restart the cluster services on hadoop101
[root@hadoop101 software]# stop-all.sh
[root@hadoop101 software]# start-all.sh
9. Create the hue database in MySQL
[root@hadoop101 software]# mysql -uroot -p123456
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 66
Server version: 5.6.24 MySQL Community Server (GPL)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database hue;
Query OK, 1 row affected (0.00 sec)
10. Initialize the Hue database
[root@hadoop102 module]# cd /opt/module/hue-master/
[root@hadoop102 hue-master]# build/env/bin/hue syncdb
[root@hadoop102 hue-master]# build/env/bin/hue migrate
11. Start the Hue service
[root@hadoop102 hue-master]# nohup build/env/bin/supervisor > supervisor.log 2>&1 &
12. Verify via the web UI
Visit hadoop102 on port 8000.
The default username and password are both admin / admin.
Common Hive operations can be performed conveniently from the Hue UI.
Chapter 7: Kafka installation and deployment
A complete, detailed introduction to Kafka is available at https://blog.csdn.net/wt334502157/article/details/116518259
1. Install the Kafka package
[root@hadoop101 ~]# cd /opt/software/
[root@hadoop101 software]# tar -xf kafka_2.11-2.4.0.tgz -C /opt/module/
2. Create the Kafka logs directory
[root@hadoop101 software]# cd /opt/module/kafka_2.11-2.4.0/
[root@hadoop101 kafka_2.11-2.4.0]# mkdir logs
3. Modify the Kafka configuration file
[root@hadoop101 kafka_2.11-2.4.0]# cd config/
[root@hadoop101 config]# vim server.properties
# Globally unique broker id; must be different on each node
broker.id=0
# Enable topic deletion
delete.topic.enable=true
# Log storage path
log.dirs=/opt/module/kafka_2.11-2.4.0/logs
# ZooKeeper cluster address (with a /kafka_2.4 chroot)
zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181/kafka_2.4
# The full set of effective settings:
[root@hadoop101 config]# cat server.properties | grep -vE "^$|^#"
broker.id=0
delete.topic.enable=true
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/opt/module/kafka_2.11-2.4.0/logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181/kafka_2.4
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
4. Distribute the Kafka application directory
[root@hadoop101 config]# cd /opt/module/
[root@hadoop101 module]# scp -r kafka_2.11-2.4.0 hadoop102:/opt/module/
[root@hadoop101 module]# scp -r kafka_2.11-2.4.0 hadoop103:/opt/module/
5. Modify broker.id on the other nodes
[root@hadoop102 ~]# cd /opt/module/kafka_2.11-2.4.0/config/
[root@hadoop102 config]# vim server.properties
broker.id=1
[root@hadoop103 ~]# cd /opt/module/kafka_2.11-2.4.0/config/
[root@hadoop103 config]# vim server.properties
broker.id=2
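Since only broker.id differs between the copied configs, the per-node edit can be scripted. A sketch using local stand-in files (on the real cluster you would edit config/server.properties on each host):

```shell
# Sketch: stamp a distinct broker.id into each node's copy of server.properties.
tmpdir=$(mktemp -d)
printf 'broker.id=0\ndelete.topic.enable=true\n' > "$tmpdir/server.properties"
id=1
for host in hadoop102 hadoop103; do
  sed "s/^broker\.id=.*/broker.id=$id/" "$tmpdir/server.properties" \
    > "$tmpdir/server-$host.properties"
  id=$((id + 1))
done
grep '^broker.id=' "$tmpdir/server-hadoop102.properties" "$tmpdir/server-hadoop103.properties"
```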
6. Check the ZooKeeper cluster status
[root@hadoop101 module]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@hadoop102 config]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[root@hadoop103 config]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
7. Start the Kafka service on each node in turn
[root@hadoop101 module]# /opt/module/kafka_2.11-2.4.0/bin/kafka-server-start.sh -daemon /opt/module/kafka_2.11-2.4.0/config/server.properties
[root@hadoop102 config]# /opt/module/kafka_2.11-2.4.0/bin/kafka-server-start.sh -daemon /opt/module/kafka_2.11-2.4.0/config/server.properties
[root@hadoop103 config]# /opt/module/kafka_2.11-2.4.0/bin/kafka-server-start.sh -daemon /opt/module/kafka_2.11-2.4.0/config/server.properties
8. Check Kafka's registration info in ZooKeeper
[root@hadoop101 module]# /opt/module/apache-zookeeper-3.5.7-bin/bin/zkCli.sh
Connecting to localhost:2181
Welcome to ZooKeeper!
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, kafka_2.4, rmstore, yarn-leader-election, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /kafka_2.4
[admin, brokers, cluster, config, consumers, controller, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification]
[zk: localhost:2181(CONNECTED) 2] quit
[root@hadoop101 module]#
9. Simple usage verification
# Create topics
[root@hadoop101 module]# /opt/module/kafka_2.11-2.4.0/bin/kafka-topics.sh --zookeeper hadoop101:2181/kafka_2.4 --create --replication-factor 2 --partitions 3 --topic test0827
Created topic test0827.
[root@hadoop101 module]# /opt/module/kafka_2.11-2.4.0/bin/kafka-topics.sh --zookeeper hadoop101:2181/kafka_2.4 --create --replication-factor 2 --partitions 3 --topic wangt
Created topic wangt.
[root@hadoop101 module]# /opt/module/kafka_2.11-2.4.0/bin/kafka-topics.sh --zookeeper hadoop101:2181/kafka_2.4 --create --replication-factor 2 --partitions 3 --topic wow
Created topic wow.
# List the topics
[root@hadoop101 module]# /opt/module/kafka_2.11-2.4.0/bin/kafka-topics.sh --list --zookeeper hadoop101:2181/kafka_2.4
test0827
wangt
wow
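A quick sanity check on the replica math for the topics created above: 3 partitions with replication factor 2 give 6 partition replicas, spread over 3 brokers, so each broker hosts 2 on average:

```shell
# Replica distribution for a --partitions 3 --replication-factor 2 topic
# on the 3-broker cluster built in this chapter.
partitions=3
replication=2
brokers=3
total_replicas=$(( partitions * replication ))
echo "replicas per broker: $(( total_replicas / brokers ))"
```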
Chapter 8: Spark installation and deployment
A complete, detailed introduction to Spark is available at https://blog.csdn.net/wt334502157/article/details/119205087
1. Install the Spark package
[root@hadoop101 module]# cd /opt/software/
[root@hadoop101 software]# tar -xf spark-2.4.5-bin-hadoop2.7.tgz -C /opt/module/
2. Check Hadoop's native compression support
# List the compression codecs Hadoop supports
[root@hadoop101 conf]# hadoop checknative
2021-08-27 15:09:02,783 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2021-08-27 15:09:02,787 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2021-08-27 15:09:02,792 WARN zstd.ZStandardCompressor: Error loading zstandard native libraries: java.lang.InternalError: Cannot load libzstd.so.1 (libzstd.so.1: cannot open shared object file: No such file or directory)!
2021-08-27 15:09:02,802 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
Native library checking:
hadoop: true /opt/module/hadoop-3.1.3/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
zstd : false
snappy: true /lib64/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
ISA-L: false libhadoop was built without ISA-L support
3. Modify the spark-env settings
[root@hadoop101 software]# cd /opt/module/spark-2.4.5-bin-hadoop2.7/conf/
[root@hadoop101 conf]# mv spark-env.sh.template spark-env.sh
# Append the following to the end of the file
[root@hadoop101 conf]# vim spark-env.sh
YARN_CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
4. Copy the Hive configuration file into Spark
[root@hadoop101 conf]# cp /opt/module/apache-hive-3.1.2-bin/conf/hive-site.xml /opt/module/spark-2.4.5-bin-hadoop2.7/conf/
[root@hadoop101 conf]# ls
docker.properties.template fairscheduler.xml.template hive-site.xml log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh
5. Configure environment variables
[root@hadoop101 conf]# vim /etc/profile
#SPARK_HOME
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@hadoop101 conf]#
[root@hadoop101 conf]# source /etc/profile
6. Modify the Hadoop configuration for Spark
[root@hadoop101 conf]# cd /opt/module/hadoop-3.1.3/etc/hadoop
# Add the following to yarn-site.xml
[root@hadoop101 hadoop]# vim yarn-site.xml
<!-- Whether a thread checks each task's physical memory usage and kills tasks that exceed their allocation; on well-provisioned servers this can be left unset. Default: true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether a thread checks each task's virtual memory usage and kills tasks that exceed their allocation. Default: true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- HDFS directory for aggregated logs -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/opt/module/hadoop-3.1.3/yarn-logs</value>
</property>
<!-- Log retention: 7 days, in seconds -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
# Add the following to mapred-site.xml
[root@hadoop101 hadoop]# vim mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce on YARN</description>
</property>
<!-- JobHistory server settings -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
<description>History server port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
<description>History server web UI port</description>
</property>
[Note]: the history services are all configured to run on hadoop102, so they must be started and stopped on hadoop102.
7. Distribute the configuration files
[root@hadoop101 hadoop]# scp yarn-site.xml mapred-site.xml hadoop102:/opt/module/hadoop-3.1.3/etc/hadoop/
[root@hadoop101 hadoop]# scp yarn-site.xml mapred-site.xml hadoop103:/opt/module/hadoop-3.1.3/etc/hadoop/
8. Restart the Hadoop cluster from hadoop101
[root@hadoop101 hadoop]# stop-all.sh
[root@hadoop101 hadoop]#
[root@hadoop101 hadoop]# start-all.sh
9. Start the Hadoop history service on hadoop102
[root@hadoop102 config]# mr-jobhistory-daemon.sh start historyserver
WARNING: Use of this script to start the MR JobHistory daemon is deprecated.
WARNING: Attempting to execute replacement "mapred --daemon start" instead.
[root@hadoop102 config]#
[root@hadoop102 config]# jps | grep JobHistory
29311 JobHistoryServer
[root@hadoop102 config]#
10. Configure the Spark history service
[root@hadoop101 hadoop]# cd /opt/module/spark-2.4.5-bin-hadoop2.7/conf/
[root@hadoop101 conf]# ls
docker.properties.template fairscheduler.xml.template hive-site.xml log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh
[root@hadoop101 conf]# mv spark-defaults.conf.template spark-defaults.conf
[root@hadoop101 conf]# vim spark-defaults.conf
spark.yarn.historyServer.address=hadoop102:18080
spark.yarn.historyServer.allowTracking=true
spark.eventLog.dir=hdfs://mycluster/spark_historylog
spark.eventLog.enabled=true
spark.history.fs.logDirectory=hdfs://mycluster/spark_historylog
spark.executor.extraLibraryPath=/opt/module/hadoop-3.1.3/lib/native/
The directory referenced by spark.history.fs.logDirectory=hdfs://mycluster/spark_historylog must be created on HDFS:
[root@hadoop101 conf]# hadoop dfs -mkdir -p /spark_historylog
11. Distribute the Spark installation directory
[root@hadoop101 module]# scp -r spark-2.4.5-bin-hadoop2.7 hadoop102:/opt/module/
[root@hadoop101 module]# scp -r spark-2.4.5-bin-hadoop2.7 hadoop103:/opt/module/
12. Configure environment variables on the remaining nodes
[root@hadoop102 config]# vim /etc/profile
#SPARK_HOME
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@hadoop102 config]# source /etc/profile
[root@hadoop103 config]# vim /etc/profile
#SPARK_HOME
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@hadoop103 config]# source /etc/profile
13. Start the Spark history service on hadoop102
[root@hadoop102 config]# start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/module/spark-2.4.5-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-hadoop102.out
[root@hadoop102 config]# jps | grep History
30192 HistoryServer
29311 JobHistoryServer
14. Verify the Spark service
[root@hadoop102 config]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop102:4040
Spark context available as 'sc' (master = local[*], app id = local-1630050046140).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.5
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("SELECT * FROM blizzard").show
2021-08-27 15:41:40,070 WARN conf.HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
2021-08-27 15:41:40,071 WARN conf.HiveConf: HiveConf of name hive.server2.active.passive.ha.enable does not exist
+---+----+
| id|game|
+---+----+
| 1| wow|
| 2| ow|
+---+----+
Chapter 9: Configuration and tuning
Set the physical-to-virtual core ratio: these VMs have 2 physical cores, so expose them as 4 vcores for a 1:2 ratio
yarn-site.xml
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
Cap the CPU request per container: at job submission (e.g. spark-submit), the executor-cores parameter must not exceed 4
yarn-site.xml
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>4</value>
</property>
Set per-container and per-node memory limits: cap the memory each submitted container may request, and the total memory YARN may use on a node
yarn-site.xml
<!-- Maximum memory per task container -->
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<!-- Maximum memory YARN may use on each node -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>7168</value>
</property>
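A quick sanity check of the sizing above, using the numbers configured in this chapter: 2 physical cores at a 1:2 ratio give 4 vcores, and a 7168 MB NodeManager fits one maximum-size 4096 MB container:

```shell
# Sanity-check the YARN sizing configured above.
physical_cores=2
vcore_ratio=2
vcores=$(( physical_cores * vcore_ratio ))   # yarn.nodemanager.resource.cpu-vcores
node_mem_mb=7168                             # yarn.nodemanager.resource.memory-mb
max_container_mb=4096                        # yarn.scheduler.maximum-allocation-mb
echo "vcores=$vcores, max-size containers per node=$(( node_mem_mb / max_container_mb ))"
```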
Configure capacity scheduler queues: the capacity scheduler ships with a single root.default queue; here spark and hive queues are added under default, with 80% of the resources for spark and 20% for hive
yarn-site.xml
<!-- Use the capacity scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.queues</name>
<value>spark,hive</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.spark.capacity</name>
<value>80</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.hive.capacity</name>
<value>20</value>
</property>
Assign Hive to its queue
hive-site.xml
<!-- Route Hive jobs to the hive queue -->
<property>
<name>mapred.job.queue.name</name>
<value>hive</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>hive</value>
</property>
<property>
<name>mapred.queue.names</name>
<value>hive</value>
</property>
Configure the HDFS trash (fs.trash.interval is in minutes; 30 keeps deleted files for 30 minutes)
core-site.xml
<property>
<name>fs.trash.interval</name>
<value>30</value>
</property>
Have HDFS clients connect to DataNodes by hostname
hdfs-site.xml
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
Configure HADOOP_MAPRED_HOME
mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>