Hadoop Fully Distributed Mode Setup
1. Prepare the Template Machine
- Install a CentOS Stream virtual machine and switch to a faster mirror source
- Set the VM's IP to 192.168.10.100 and its hostname to hadoop100
- Install software on the VM
sudo dnf update
sudo dnf install epel-release
sudo dnf install net-tools
sudo dnf install vim
- Disable the system firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
- Create the bigdata user and set its password
sudo useradd bigdata
sudo passwd bigdata
- Grant the user sudo privileges
sudo vim /etc/sudoers
bigdata ALL=(ALL) NOPASSWD:ALL
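On a stock CentOS sudoers file this line is usually placed below the %wheel entries so that, if bigdata is also in the wheel group, the NOPASSWD rule is not overridden (using visudo instead of vim also catches syntax errors). The relevant section then looks roughly like this:
## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)       ALL
## Custom rule for the bigdata user
bigdata ALL=(ALL)       NOPASSWD:ALL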
- Create the working directories
sudo mkdir /opt/module
sudo mkdir /opt/software
- Change the directories' ownership
sudo chown -R bigdata:bigdata /opt/module/
sudo chown -R bigdata:bigdata /opt/software/
- Uninstall the JDK bundled with the machine (see the sketch below)
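If the image ships with a pre-installed OpenJDK, one common way to remove it is shown below; this is a sketch, since package names differ between images. Check the query output first, and skip this step on a minimal install that has no JDK at all:
rpm -qa | grep -i java
rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps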
- Download the JDK and Hadoop into /opt/software
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
- Extract the archives into /opt/module
tar -zxvf hadoop-3.3.4.tar.gz -C /opt/module/
tar -zxvf jdk-11.0.17_linux-x64_bin.tar.gz -C /opt/module/
- Configure environment variables
sudo vim /etc/profile.d/env_profile.sh
# for bash
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk-11.0.17
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.3.4
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# for fish
# JAVA_HOME
set -x JAVA_HOME /opt/module/jdk-11.0.17
fish_add_path /opt/module/jdk-11.0.17/bin
# HADOOP_HOME
set -x HADOOP_HOME /opt/module/hadoop-3.3.4
fish_add_path /opt/module/hadoop-3.3.4/bin
fish_add_path /opt/module/hadoop-3.3.4/sbin
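After saving the file, reload the environment in the current bash session and verify that both tools resolve:
source /etc/profile.d/env_profile.sh
java -version
hadoop version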
- Edit the VM's hosts file
sudo vim /etc/hosts
192.168.10.100 hadoop100
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
192.168.10.104 hadoop104
192.168.10.105 hadoop105
192.168.10.106 hadoop106
192.168.10.107 hadoop107
192.168.10.108 hadoop108
- Shut down the VM in preparation for cloning
2. Clone the Virtual Machines
- Clone three virtual machines: hadoop101, hadoop102, and hadoop103
- Change each clone's IP to the matching 192.168.10.10* address (see the sketch after this list)
- Change each clone's hostname to the matching hadoop10*
sudo vim /etc/hostname
- Reboot the virtual machines
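A minimal sketch of changing the static IP on a clone with NetworkManager, here for hadoop101; the connection name ens33 and the gateway 192.168.10.2 are assumptions, so check them with nmcli connection show and your VM network settings:
sudo nmcli connection modify ens33 ipv4.method manual ipv4.addresses 192.168.10.101/24 ipv4.gateway 192.168.10.2 ipv4.dns 192.168.10.2
sudo nmcli connection up ens33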
3. Configure the Hadoop Cluster
3.1 Write a Cluster Distribution Script
- Check whether the remote sync tool rsync is installed
# installed by default on CentOS, so this usually does not need to be run
sudo dnf install rsync
- Create the script file
mkdir /home/bigdata/bin
cd /home/bigdata/bin
vim xrsync
#!/bin/bash
# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop101 hadoop102 hadoop103
do
    echo ==================== $host ====================
    # 3. Loop over every file/directory given and send them one by one
    for file in $@
    do
        # 4. Check that the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory (resolving symlinks)
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done
- Make the script executable
sudo chmod +x xrsync
- Copy it into the system PATH so it can be run from anywhere
sudo cp xrsync /bin/
3.2 Configure Passwordless SSH Login
cd /home/bigdata/.ssh
ssh-keygen -t rsa
ssh-copy-id hadoop101
ssh-copy-id hadoop102
ssh-copy-id hadoop103
Repeat the same steps on hadoop102 and hadoop103
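A quick check that the keys work, run from each node; the commands should print the remote hostnames without asking for a password:
ssh hadoop102 hostname
ssh hadoop103 hostname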
3.3 Modify the Configuration Files
Cluster deployment plan:

|      | hadoop101          | hadoop102                    | hadoop103                   |
|------|--------------------|------------------------------|-----------------------------|
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |
cd /opt/module/hadoop-3.3.4/etc/hadoop
sudo vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.3.4/data</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>bigdata</value>
    </property>
</configuration>
sudo vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop101:9870</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop103:9868</value>
    </property>
</configuration>
sudo vim yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop102</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
sudo vim mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
sudo vim workers
hadoop101
hadoop102
hadoop103
Distribute the configuration files to the other nodes
xrsync /opt/module/hadoop-3.3.4/etc/hadoop
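Optionally spot-check that the files arrived on the other nodes, for example:
ssh hadoop102 cat /opt/module/hadoop-3.3.4/etc/hadoop/workers
ssh hadoop103 cat /opt/module/hadoop-3.3.4/etc/hadoop/core-site.xml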
3.4 Start the Cluster
hdfs namenode -format # format the NameNode before the first start (run on hadoop101)
sbin/start-dfs.sh # run on hadoop101
sbin/start-yarn.sh # run on hadoop102
Visit the web UIs to check the graphical interfaces (see the note below)
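To confirm the daemons match the plan in 3.3, run jps on each node; roughly the following processes should appear:
jps # hadoop101: NameNode, DataNode, NodeManager
jps # hadoop102: ResourceManager, NodeManager, DataNode
jps # hadoop103: SecondaryNameNode, DataNode, NodeManager
The HDFS NameNode UI is at http://hadoop101:9870 (as configured in hdfs-site.xml) and the YARN ResourceManager UI is at http://hadoop102:8088 (the default port).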