1. Host Preparation
1.1 Host plan
Host | IP | Hostname | CPU | Memory | User | Password |
---|---|---|---|---|---|---|
hadoop181 | 192.168.207.181 | hadoop181 | 4 CORE | 8G | hadoop | hadoop |
hadoop182 | 192.168.207.182 | hadoop182 | 4 CORE | 4G | hadoop | hadoop |
hadoop183 | 192.168.207.183 | hadoop183 | 4 CORE | 4G | hadoop | hadoop |
1.2 Host initialization
(1) Clone three virtual machines
(2) Create the hadoop user and set its password (on all three machines)
groupadd hadoop
useradd -s /bin/bash -d /home/hadoop -g hadoop hadoop
passwd hadoop # set the account password to hadoop
(3) Disable the firewall (on all three machines)
systemctl disable firewalld
systemctl stop firewalld
(4) Disable SELinux (on all three machines)
# edit the SELinux config and set SELINUX=disabled
vim /etc/selinux/config
SELINUX=disabled
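To also turn SELinux off for the current session without rebooting, the standard command is:
setenforce 0  # switches SELinux to permissive mode until the next reboot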
(5) Configure a static IP address (on all three machines)
vim /etc/sysconfig/network-scripts/ifcfg-ens33
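A minimal sketch of the static-IP settings for hadoop181 (the interface name ens33 comes from the command above; the gateway and DNS values are assumptions for a typical NAT network, and IPADDR must be adjusted per host):
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.207.181
PREFIX=24
GATEWAY=192.168.207.2
DNS1=192.168.207.2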
(6) Set the hostname (one command per machine)
hostnamectl set-hostname hadoop181 # first machine
hostnamectl set-hostname hadoop182 # second machine
hostnamectl set-hostname hadoop183 # third machine
(7) Configure sudo privileges (on all three machines): edit /etc/sudoers (e.g. with visudo) and add the hadoop line shown below
## Next comes the main part: which users can run what software on
## which machines (the sudoers file can be shared between multiple
## systems).
## Syntax:
##
## user MACHINE=COMMANDS
##
## The COMMANDS section may have other options added to it.
##
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
hadoop ALL=(ALL) NOPASSWD:ALL # add this line
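A quick sanity check after logging back in as the hadoop user (usage sketch):
sudo whoami  # should print root without prompting for a password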
2. Pre-installation Host Preparation
2.1 Cluster command scripts
- Cluster batch SSH execution script (xssh): see my other post on batch SSH operations across a cluster; a rough sketch follows this list.
- Cluster file distribution script (xsync): see my other post on distributing files across the cluster.
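For reference, a minimal sketch of what such an xssh wrapper might look like (the real scripts live in the linked posts; the host list and installation path are assumptions):
#!/bin/bash
# xssh: run the given command on every cluster node over ssh
HOSTS="hadoop181 hadoop182 hadoop183"   # assumed host list
for host in $HOSTS; do
    echo "[DEBUG] ssh to $host to execute commands [ $* ]"
    ssh "$host" "$*"
done
xsync works the same way, looping rsync over the hosts to copy a file or directory to each of them.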
2.2 Modify the hosts file and append the host entries
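Append the three cluster entries below to /etc/hosts on every node (the xssh output that follows shows the expected result):
192.168.207.181 hadoop181
192.168.207.182 hadoop182
192.168.207.183 hadoop183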
[root@hadoop181 ~]# xssh cat /etc/hosts
[DEBUG] 1 command is :cat /etc/hosts
[DEBUG] ssh to hadoop181 to execute commands [ cat /etc/hosts]
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.207.181 hadoop181
192.168.207.182 hadoop182
192.168.207.183 hadoop183
[DEBUG] ssh to hadoop182 to execute commands [ cat /etc/hosts]
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.207.181 hadoop181
192.168.207.182 hadoop182
192.168.207.183 hadoop183
[DEBUG] ssh to hadoop183 to execute commands [ cat /etc/hosts]
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.207.181 hadoop181
192.168.207.182 hadoop182
192.168.207.183 hadoop183
[root@hadoop181 ~]#
2.3 Configure passwordless SSH between cluster nodes for the hadoop user
Source host | Target 1 | Target 2 | Target 3 |
---|---|---|---|
hadoop181 | hadoop181 | hadoop182 | hadoop183 |
hadoop182 | hadoop181 | hadoop182 | hadoop183 |
hadoop183 | hadoop181 | hadoop182 | hadoop183 |
# generate a key pair on each node
[hadoop@hadoop181 ~]$ ssh-keygen -t rsa
[hadoop@hadoop182 ~]$ ssh-keygen -t rsa
[hadoop@hadoop183 ~]$ ssh-keygen -t rsa
# copy hadoop181's public key to every node (including itself)
[hadoop@hadoop181 ~]$ ssh-copy-id hadoop@hadoop181
[hadoop@hadoop181 ~]$ ssh-copy-id hadoop@hadoop182
[hadoop@hadoop181 ~]$ ssh-copy-id hadoop@hadoop183
# copy hadoop182's public key to every node
[hadoop@hadoop182 ~]$ ssh-copy-id hadoop@hadoop181
[hadoop@hadoop182 ~]$ ssh-copy-id hadoop@hadoop182
[hadoop@hadoop182 ~]$ ssh-copy-id hadoop@hadoop183
# copy hadoop183's public key to every node
[hadoop@hadoop183 ~]$ ssh-copy-id hadoop@hadoop181
[hadoop@hadoop183 ~]$ ssh-copy-id hadoop@hadoop182
[hadoop@hadoop183 ~]$ ssh-copy-id hadoop@hadoop183
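A quick check that the key exchange worked (usage example; hostnames from the plan above):
[hadoop@hadoop181 ~]$ ssh hadoop182 hostname  # should print hadoop182 without asking for a password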
2.4 Cluster time synchronization (omitted)
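Although omitted above, a minimal sketch with chrony (assuming hadoop181 serves time to the other two nodes; the package name and subnet are assumptions based on the host plan):
# on hadoop181 (time source): allow the cluster subnet to query it
sudo yum install -y chrony
echo "allow 192.168.207.0/24" | sudo tee -a /etc/chrony.conf
sudo systemctl enable --now chronyd
# on hadoop182 and hadoop183: add "server hadoop181 iburst" to /etc/chrony.conf, then
sudo systemctl enable --now chronyd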
3. Hadoop Cluster Setup
3.1 Service plan
Service | hadoop181 | hadoop182 | hadoop183 |
---|---|---|---|
NameNode | √ | | |
DataNode | √ | √ | √ |
ResourceManager | | √ | |
NodeManager | √ | √ | √ |
HistoryServer | √ | | |
Zookeeper | √ | √ | √ |
Secondary NameNode | | | √ |
3.2 Package preparation
(1) Download the Hadoop package from the official Hadoop website
(2) Upload the downloaded package to /home/hadoop/ on hadoop181
(this installation uses the 3.x series; 3.1.3 in the examples below)
3.3 JDK installation
(1) Extract the JDK
## extract
[hadoop@hadoop181 ~]$ tar -zxvf jdk-8u144-linux-x64.tar.gz
## enter the extracted directory
[hadoop@hadoop181 ~]$ cd jdk1.8.0_144
## get the absolute path of the directory
[hadoop@hadoop181 jdk1.8.0_144]$ pwd
/home/hadoop/jdk1.8.0_144
(2) Configure the environment variables
[hadoop@hadoop181 ~]$ vim .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
# JAVA HOME
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
(3) Apply the environment variables
[hadoop@hadoop181 ~]$ source .bashrc
[hadoop@hadoop181 ~]$
# verify that the Java environment works
[hadoop@hadoop181 ~]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[hadoop@hadoop181 ~]$
(4) Distribute the JDK and the environment variables to the other nodes
[hadoop@hadoop181 ~]$ xsync jdk1.8.0_144
[hadoop@hadoop181 ~]$ xsync .bashrc
(5) Verify on all nodes
[hadoop@hadoop181 ~]$ xssh java -version
[DEBUG] 1 command is :java -version
[DEBUG] ssh to hadoop181 to execute commands [ java -version]
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[DEBUG] ssh to hadoop182 to execute commands [ java -version]
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[DEBUG] ssh to hadoop183 to execute commands [ java -version]
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[hadoop@hadoop181 ~]$
3.4 Cluster configuration
3.4.0 Unpack the installation package
(1) Extract the Hadoop package
[hadoop@hadoop181 ~]$ tar -zxvf hadoop-3.1.3.tar.gz
(2) Configure the environment variables
# get the extracted directory's absolute path
[hadoop@hadoop181 hadoop-3.1.3]$ pwd
/home/hadoop/hadoop-3.1.3
# edit the environment variables
[hadoop@hadoop181 ~]$ vim ~/.bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
# JAVA HOME
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP HOME
export HADOOP_HOME=/home/hadoop/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/bin
(3) Apply the configuration
[hadoop@hadoop181 ~]$ source .bashrc
3.4.1 Core configuration files
(1) Edit the hadoop-env.sh file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Set the JAVA_HOME path:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
(2) Edit the core-site.xml file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop181:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-3.1.3/data/tmp</value>
</property>
</configuration>
3.4.2 HDFS configuration file
(1) Edit the hdfs-site.xml file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following inside <configuration>:
<!-- HDFS replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Secondary NameNode host and HTTP port -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop183:50090</value>
</property>
3.4.3 YARN configuration files
(1) Edit the yarn-env.sh file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/yarn-env.sh
Set the JAVA_HOME path:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
(2) Edit the yarn-site.xml file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following inside <configuration>:
<!-- shuffle service so reducers can fetch map output -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop182</value>
</property>
3.4.4 MapReduce configuration files
(1) Edit the mapred-env.sh file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/mapred-env.sh
Add the JAVA_HOME setting:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
(2) Edit the mapred-site.xml file
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following inside <configuration>:
<!-- run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
3.4.5 History server and log aggregation
(1) JobHistory server configuration
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following:
<!-- JobHistory server RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop181:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop181:19888</value>
</property>
(2) Log aggregation
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following:
<!-- enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- retain logs for 7 days (604800 seconds) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
3.4.6 Distribute the files to the other nodes
[hadoop@hadoop181 ~]$ xsync .bashrc
[hadoop@hadoop181 ~]$ xsync hadoop-3.1.3
3.5 Format the cluster
# run the format command on the NameNode host (hadoop181); format only once -- before reformatting, clear the data/tmp directories on every node
[hadoop@hadoop181 ~]$ hdfs namenode -format
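A quick way to confirm the format succeeded (the path follows from the hadoop.tmp.dir set in core-site.xml): the NameNode metadata directory should now contain a current/ directory with a VERSION file holding a clusterID.
[hadoop@hadoop181 ~]$ ls /home/hadoop/hadoop-3.1.3/data/tmp/dfs/name/current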
3.6 Start the services manually
(1) Start the NameNode (on hadoop181)
Option 1
hadoop-daemon.sh start namenode
Option 2
hdfs --daemon start namenode
(2) Start the DataNode (on all three machines)
Option 1
hadoop-daemon.sh start datanode
Option 2
hdfs --daemon start datanode
(3) Check that the daemons are up; each node should list the processes assigned to it in the service plan
xssh jps -l
(4) Start the ResourceManager (on hadoop182)
Option 1
yarn-daemon.sh start resourcemanager
Option 2
yarn --daemon start resourcemanager
(5) Start the NodeManager (on all three machines)
Option 1
yarn-daemon.sh start nodemanager
Option 2
yarn --daemon start nodemanager
(6) Start the JobHistory server (on hadoop181)
Option 1
mr-jobhistory-daemon.sh start historyserver
Option 2
mapred --daemon start historyserver
(7) Start the Secondary NameNode (on hadoop183)
Option 1
hadoop-daemon.sh start secondarynamenode
Option 2
hdfs --daemon start secondarynamenode
3.7 Start the cluster with the batch scripts
(1) Add the workers configuration
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/workers
Add the following content (the file must be distributed to all machines; see the command after the list):
hadoop181
hadoop182
hadoop183
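For example, distributing it with the xsync helper from section 2.1 (usage sketch):
[hadoop@hadoop181 ~]$ xsync $HADOOP_HOME/etc/hadoop/workers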
(2) Start/stop HDFS (run on hadoop181, the NameNode host)
start-dfs.sh
stop-dfs.sh
(3) Start/stop YARN (run on hadoop182, the ResourceManager host)
start-yarn.sh
stop-yarn.sh
(4) Start/stop everything
start-all.sh
stop-all.sh
3.8 Check the cluster
(1) HDFS web UI
URL: http://hadoop181:9870/
(2) Secondary NameNode status page
http://hadoop183:50090/status.html
NOTE:
The SecondaryNameNode status page may show no data; this can be fixed by editing the dfs-dust.js file:
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/share/hadoop/hdfs/webapps/static/dfs-dust.js
[hadoop@hadoop181 ~]$ xsync $HADOOP_HOME/share/hadoop/hdfs/webapps/static/dfs-dust.js
Make the following change: comment out the line that calls moment and add a new line that returns the value formatted with Date instead.
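Roughly, the 'date_tostring' helper in dfs-dust.js ends up looking like this (a sketch; the exact surrounding code may differ between Hadoop versions):
'date_tostring' : function (v) {
  // return moment(Number(v)).format('ddd MMM DD HH:mm:ss ZZ YYYY');
  return new Date(Number(v)).toLocaleString();
},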