操作系统 | IP地址 | 主机名 | 主机名 | 主机名 |
---|---|---|---|---|
CentOS 7.2.1511 | 192.168.42.131 | node01 | jdk1.8.0_91 | scala-2.12.13 |
CentOS 7.2.1511 | 192.168.42.132 | node02 | jdk1.8.0_91 | scala-2.12.13 |
CentOS 7.2.1511 | 192.168.42.139 | node03 | jdk1.8.0_91 | scala-2.12.13 |
安装的组件 | 版本 |
---|---|
zookeeper | 3.4.6 |
hadoop | 3.2.2 |
hbase | 2.0.6 |
hive | 3.1.2 |
kafka | 2.11-2.0.0 |
solr | 8.9.0 |
atlas | 2.1.0 |
spark | 3.0.3 |
sqoop | 1.4.6 |
flume | 1.9.0 |
elasticsearch | 7.14.1 |
kibana | 7.14.1 |
注:本文所需所有安装包请于我的资源下载(附带本文的PDF版文档):大数据各组件安装(数据中台搭建)所需安装包
一、基础环境配置(三台机器都操作)
1.修改主机名:
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
vim /etc/hosts
# 增加对应的内容:
192.168.42.131 node01
192.168.42.132 node02
192.168.42.139 node03
2.关闭防火墙:
# Centos 7:
systemctl stop firewalld.service
# 设置开机不启动:
systemctl disable firewalld.service
# 查看防火墙状态
firewall-cmd --state
# Centos 6:
/etc/init.d/iptables stop
# 设置开机不启动:
chkconfig iptables off
# 查看防火墙状态
service iptables status
3.关闭Selinux:
# 查看SELinux状态:如果SELinux status参数为enabled即为开启状态
sestatus
# 临时关闭,不用重启机器
setenforce 0
# 永久关闭
vim /etc/selinux/config
# 修改如下配置项:
SELINUX=disabled
4.文件描述符配置:
Linux操作系统会对每个进程能打开的文件数进行限制(某用户下某进程),Hadoop生态系统的很多组件一般都会打开大量的文件,因此要调大相关参数(生产环境必须调大,学习环境稍微大点就可以)。检查文件描述符限制数,可以用如下命令检查当前用户下一个进程能打开的文件数:
ulimit -Sn
1024
ulimit -Hn
4096
建议的最大打开文件描述符数为10000或更多:
vim /etc/security/limits.conf
# 直接添加内容:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
5.关闭 THP:
如果不关闭THP,Hadoop的系统CPU使用率很高。
# 查看:
[root@node01 mnt]# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never
[root@node01 mnt]# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# 关闭:
vim /etc/rc.d/rc.local
# 在文末加入一段代码:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
# 保存退出,然后赋予rc.local文件执行权限:
chmod +x /etc/rc.d/rc.local
三台机器重启生效:reboot
[root@node01 ~]# getenforce
Disabled
[root@node01 ~]# ulimit -Sn
65536
[root@node01 ~]# ulimit -Hn
65536
[root@node01 ~]# cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]
[root@node01 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
6.自定义 JDK 安装:
6.1 删除默认openJDK:
删除所有机器上默认openJDK,注意这里是只写了一个。一些开发版的centos会自带jdk,我们一般用自己的jdk,把自带的删除。先看看有没有安装:
[root@node01 ~]# java -version
openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
查找安装位置:
[root@node01 ~]# rpm -qa | grep java
java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
tzdata-java-2015g-1.el7.noarch
java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
删除全部,noarch文件可以不用删除:
[root@node01 ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
[root@node01 ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
[root@node01 ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
[root@node01 ~]# rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
# 检查有没有删除:
[root@node01 ~]# java -version
-bash: /usr/bin/java: No such file or directory
6.2 安装jdk1.8.0_91:
mkdir /opt/java
tar -zxvf jdk-8u91-linux-x64.tar.gz -C /opt/java/
vim /etc/profile
export JAVA_HOME=/opt/java/jdk1.8.0_91
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
source /etc/profile
[root@node01 ~]# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
7.创建Hadoop用户:
[root@node01 ~]# adduser hadoop
[root@node01 ~]# passwd hadoop(密码123456)
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
8.配置SSH免密登录:
# 三台都做
[hadoop@node01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:omdYd47S9iPHrABsycOTFaJc3xowcTOYokKtf0bAKZg root@node03
The key's randomart image is:
+---[RSA 2048]----+
|..o B+= |
|E+.Bo* = |
|..=.. + . |
|o. + = o |
|. . % + S . |
| o X + + |
| = = +o. |
| o +..= |
| .+.. |
+----[SHA256]-----+
# 之后你会发现,在/root/.ssh目录下生成了公钥文件:
[hadoop@node01 ~]# ll /root/.ssh
total 8
-rw-------. 1 root root 1679 Mar 24 06:47 id_rsa
-rw-r--r--. 1 root root 393 Mar 24 06:47 id_rsa.pub
ssh-copy-id node01
ssh-copy-id node02
ssh-copy-id node03
# 检查免密登录是否设置成功:
ssh node01
ssh node02
ssh node03
二、大数据组件安装
1.安装Zookeeper:
[hadoop@node01 ~]$ tar -zxvf /mnt/zookeeper-3.4.6.tar.gz -C .
将zookeeper-3.4.6/conf目录下面的 zoo_sample.cfg修改为zoo.cfg:
[hadoop@node01 ~]$ cd zookeeper-3.4.6/conf
[hadoop@node01 conf]$ mv zoo_sample.cfg zoo.cfg
[hadoop@node01 conf]$ vim zoo.cfg
# 添加:
dataDir=/home/hadoop/zookeeper-3.4.6/data
dataLogDir=/home/hadoop/zookeeper-3.4.6/log
server.1=192.168.42.131:2888:3888
server.2=192.168.42.132:2888:3888
server.3=192.168.42.133:2888:3888
# 注:2888端口号是zookeeper服务之间通信的端口,而3888是zookeeper与其他应用程序通信的端口
注意:原配置文件中已经有:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
创建目录:
[hadoop@node01 conf]$ cd ..
[hadoop@node01 zookeeper-3.4.6]$ mkdir -pv data log
拷贝给所有节点:
scp -r /home/hadoop/zookeeper-3.4.6 node02:/home/hadoop
scp -r /home/hadoop/zookeeper-3.4.6 node03:/home/hadoop
在节点1上设置myid为1,节点2上设置myid为2,节点3上设置myid为3:
[hadoop@node01 zookeeper-3.4.6]$ vim /home/hadoop/zookeeper-3.4.6/data/myid
1
[hadoop@node02 ~]$ vim /home/hadoop/zookeeper-3.4.6/data/myid
2
[hadoop@node03 ~]$ vim /home/hadoop/zookeeper-3.4.6/data/myid
3
所有节点都切换到root用户,/var 目录有其他用户写权限:
[root@node01 ~]# chmod 777 /var
[root@node02 ~]# chmod 777 /var
[root@node03 ~]# chmod 777 /var
启动:
[hadoop@node01 ~]$ cd zookeeper-3.4.6/bin/
[hadoop@node01 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@node02 bin]$ ./zkServer.sh start
[hadoop@node03 bin]$ ./zkServer.sh start
分别在3个节点上查看状态:
[hadoop@node01 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[hadoop@node02 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
[hadoop@node03 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[hadoop@node01 bin]$ jps
19219 QuorumPeerMain
19913 Jps
[hadoop@node02 bin]$ jps
20016 Jps
19292 QuorumPeerMain
[hadoop@node03 bin]$ jps
19195 QuorumPeerMain
19900 Jpsnode01
注:你会发现node01和node03是follower,node02是leader,你会有所疑问node03的myid不是最大他为什么不是leader呢?
zookeeper是两个两个比较,当node01和node02比较时node02的myid大就将node02选举为leader了,就不和后面的node03做比较了。。。
注意:在Centos和RedHat中一定要将防火墙和selinux关闭,否则在查看状态的时候会报这个错:
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
2.安装Hadoop:
[hadoop@node01 ~]$ tar -zxvf /mnt/hadoop-3.2.2.tar.gz -C .
配置核心组件文件(core-site.xml):
[hadoop@node01 ~]$ vim hadoop-3.2.2/etc/hadoop/core-site.xml
#修改 etc/hadoop/core-site.xml 文件,增加以下配置(一开始的配置文件没有内容的!!!)
<configuration>
<!-- 指定HDFS老大(namenode)的通信地址 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
<!-- 指定hadoop运行时产生文件的存储路径 -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-3.2.2/tmp</value>
</property>
</configuration>
配置文件系统(hdfs-site.xml):
[hadoop@node01 ~]$ vim hadoop-3.2.2/etc/hadoop/hdfs-site.xml
#修改 etc/hadoop/hdfs-site.xml 文件,增加以下配置
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop-3.2.2/hdfs/name</value>
<description>namenode上存储hdfs名字空间元数据 </description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop-3.2.2/hdfs/data</value>
<description>datanode上数据块的物理存储位置</description>
</property>
<!-- 设置hdfs副本数量 -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
配置MapReduce计算框架文件:
[hadoop@node01 ~]$ vim hadoop-3.2.2/etc/hadoop/mapred-site.xml
#增加以下配置
<configuration>
<!-- 通知框架MR使用YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!--
可以设置AM【AppMaster】端的环境变量
如果上面缺少配置,可能会造成mapreduce失败
-->
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${
HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${
HADOOP_HOME}</value>
</property>
</configuration>
配置yarn-site.xml文件:
[hadoop@node01 ~]$ vim hadoop-3.2.2/etc/hadoop/yarn-site.xml
#增加以下配置
<configuration>
<!--集群master,-->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node01</value>
</property>
<!-- reducer取数据的方式是mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- 关闭内存检测,虚拟机需要,不配会报错-->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
配置Hadoop环境变量:
[hadoop@node01 ~]$ vim hadoop-3.2.2/etc/hadoop/hadoop-env.sh
# 添加如下内容:
export JAVA_HOME=/opt/java/jdk1.8.0_91
【选】配置workers文件(hadoop3.x修改workers):
vim etc/hadoop/workers
node01
node02
node03
【选】配置slaves文件(hadoop2.x修改slaves):
vim etc/hadoop/slaves
node02
node03
复制node01上的Hadoop到node02和node03节点上:
[hadoop@node01 ~]$ scp -r hadoop-3.2.2/ node02:/home/hadoop/
[hadoop@node01 ~]$ scp -r hadoop-3.2.2/ node03:/home/hadoop/
配置操作系统环境变量(需要在所有节点上进行,且使用一般用户权限):
[hadoop@node01 ~]$ vim ~/.bash_profile
#以下是新添加入代码
export JAVA_HOME=/opt/java/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop-3.2.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[hadoop@node02 ~]$ vim ~/.bash_profile
[hadoop@node03 ~]$ vim ~/.bash_profile
# 使配置文件生效(三台都执行):
source ~/.bash_profile
格式化文件系统(主节点进行):
[hadoop@node01 ~]$ cd hadoop-3.2.2/
[hadoop@node01 hadoop-3.2.2]$ ./bin/hdfs namenode -format
启动Hadoop:
[hadoop@node01 hadoop-3.2.2]$ ./sbin/start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [node01]
Starting datanodes
node03: WARNING: /home/hadoop/hadoop-3.2.2/logs does not exist. Creating.
node02: WARNING: /home/hadoop/hadoop-3.2.2/logs does not exist. Creating.
Starting secondary namenodes [node01]
Starting resourcemanager
Starting nodemanagers
启动成功结果:
[hadoop@node01 hadoop-3.2.2]$ jps
24625 NodeManager
19219 QuorumPeerMain
23766 DataNode
25095 Jps
23547 NameNode
24092 SecondaryNameNode
24476 ResourceManager
[hadoop@node02 ~]$ jps
22196 DataNode
22394 NodeManager
22555 Jps
19292 QuorumPeerMain
[hadoop&#