Machine Configuration
1. Change the hostname on each of the three machines
vim /etc/hostname
hadoop1002
hadoop1003
hadoop1004
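On CentOS 7 and later, hostnamectl does the same thing without editing the file by hand; run the matching command on each machine, e.g.:
hostnamectl set-hostname hadoop1002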
2. Add the other hosts (the current machine should use its internal address!!!)
vim /etc/hosts
192.168.122.232 hadoop1002
192.168.122.233 hadoop1003
192.168.122.234 hadoop1004
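To sanity-check the name resolution, ping each peer from every machine, e.g.:
ping -c 1 hadoop1003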
3. Install epel-release
yum install -y epel-release
If it fails with an error:
1. cd /etc/yum.repos.d
2. rm -rf ./*
3. wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-vault-8.5.2111.repo
4. yum makecache
4. Create a user account
useradd roy
passwd roy    (press Enter, then type the new password when prompted)
5. Grant the account root privileges: edit /etc/sudoers and add the following line below the %wheel line
vim /etc/sudoers
roy ALL=(ALL) NOPASSWD:ALL
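After the edit, the relevant region of /etc/sudoers should look roughly like this (the %wheel line already exists; only the roy line is new). It must sit below %wheel because sudoers applies the last matching rule, so a line placed above it would be overridden if roy were later added to the wheel group:
%wheel  ALL=(ALL)       ALL
roy     ALL=(ALL)       NOPASSWD:ALL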
6. Create the working directories
mkdir /opt/module
mkdir /opt/software
7. Change ownership of the directories
chown roy:roy /opt/module
chown roy:roy /opt/software
8. Remove the preinstalled JDK
rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps
9. Reboot
reboot
Cluster Setup
1. Install JDK & Hadoop
Switch to the roy user first!!!
Extract the archives:
Java:
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/
Hadoop:
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
Note: /etc/profile sources every *.sh file under /etc/profile.d/, so we can create a single my_env.sh there to hold all of our environment settings.
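For reference, the loop in CentOS's /etc/profile that does this looks roughly like the following (simplified sketch):
for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        . "$i"
    fi
done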
Configure environment variables:
1. sudo vim /etc/profile.d/my_env.sh
2. Add the following:
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
3. Reload the environment variables
source /etc/profile
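To confirm both took effect, the version commands should now work from any directory:
java -version
hadoop version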
2. Run Hadoop
Run in local (standalone) mode:
1. cd /opt/module/hadoop-3.1.3
2.mkdir wcinput
3. vim wcinput/word.txt
hadoop yarn
hadoop mapreduce
roy roy
4.hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput
5. Check the result
cat wcoutput/part-r-00000
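Given the word.txt above, the output should look like this (counts sorted by word):
hadoop	2
mapreduce	1
roy	2
yarn	1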
3. Distribute and sync configuration:
When distributing files, syncing configuration, and setting up passwordless SSH, make sure you are using matching accounts and passwords on every machine.
Distributing files
1. Push copy:
On hadoop1002, copy hadoop1002's JDK to hadoop1003:/opt/module
scp -r /opt/module/jdk1.8.0_212 roy@hadoop1003:/opt/module
2. Pull copy:
On hadoop1003, pull /opt/module/hadoop-3.1.3 from hadoop1002 into hadoop1003's /opt/module/
scp -r roy@hadoop1002:/opt/module/hadoop-3.1.3 /opt/module/
3. Relay copy (run on a third machine):
On hadoop1003, copy everything under hadoop1002's /opt/module to hadoop1004's /opt/module
scp -r roy@hadoop1002:/opt/module/* roy@hadoop1004:/opt/module
Syncing configuration
1. On hadoop1002, sync everything under hadoop-3.1.3/ to hadoop1003:/opt/module/hadoop-3.1.3
(rsync only transfers the differences, so it is fast)
rsync -av hadoop-3.1.3/ roy@hadoop1003:/opt/module/hadoop-3.1.3
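The trailing slash on hadoop-3.1.3/ tells rsync to sync the directory's contents rather than the directory itself; -a preserves permissions, timestamps and symlinks, and -v lists what is transferred. To preview a sync without changing anything, add -n (dry run):
rsync -avn hadoop-3.1.3/ roy@hadoop1003:/opt/module/hadoop-3.1.3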
2. echo $PATH
/home/roy/bin is on the PATH, so scripts placed in that directory can be run from anywhere; we create our sync script there.
cd /home/roy
mkdir bin
cd bin
vim xsync
... (paste in the xsync shell script below)
chmod 777 xsync
Usage: cd /home/roy
xsync bin/
Check on the other machines that the files arrived; mind which user you are logged in as.
3. Sync the environment variables:
cd /home/roy
sudo ./bin/xsync /etc/profile.d/my_env.sh
4. Reload the environment variables (on every machine):
source /etc/profile
Note: rsync must be installed on every machine:
sudo yum install rsync -y
xsync
#!/bin/bash
# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop1002 hadoop1003 hadoop1004
do
    echo ==================== $host ====================
    # 3. Loop over every file/directory given and send each one
    for file in "$@"
    do
        # 4. Only send files that actually exist
        if [ -e "$file" ]
        then
            # 5. Resolve the parent directory (-P follows symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. Get the file's base name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
4. Passwordless SSH setup
Regular user setup:
Keys are per-account; each user that needs passwordless SSH has to repeat this setup.
If the .ssh directory does not exist yet, run ssh hadoop1003 once to create it.
1. Change to the directory
cd /home/roy/.ssh
2. Generate the key pair
ssh-keygen -t rsa
3. Copy the public key to each machine
ssh-copy-id hadoop1002
ssh-copy-id hadoop1003
ssh-copy-id hadoop1004
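Since every machine needs its key on every other machine (itself included), a small loop, run once per machine after its ssh-keygen, saves some typing:
for host in hadoop1002 hadoop1003 hadoop1004
do
    ssh-copy-id $host
done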
root user setup:
1. Change to the directory
cd /root/.ssh
2. Generate the key pair
ssh-keygen -t rsa
3. Copy the public key to each machine
ssh-copy-id hadoop1002
ssh-copy-id hadoop1003
ssh-copy-id hadoop1004
How SSH key auth works: the client proves its identity by signing with its private key; the server verifies the signature against the stored public key.
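A quick way to test the keys: this should print the remote hostname without asking for a password:
ssh hadoop1003 hostname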
5. Cluster configuration
myhadoop
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit
fi
case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo " --------------- Starting HDFS ---------------"
    ssh hadoop1002 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
    echo " --------------- Starting YARN ---------------"
    ssh hadoop1003 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
    echo " --------------- Starting historyserver ---------------"
    ssh hadoop1002 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- Stopping historyserver ---------------"
    ssh hadoop1002 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
    echo " --------------- Stopping YARN ---------------"
    ssh hadoop1003 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
    echo " --------------- Stopping HDFS ---------------"
    ssh hadoop1002 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
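Like xsync, this script goes in /home/roy/bin; make it executable and the whole cluster can be started and stopped from one machine:
chmod +x myhadoop
myhadoop start
myhadoop stop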
jpsall
#!/bin/bash
for host in hadoop1002 hadoop1003 hadoop1004
do
    echo =============== $host ===============
    ssh $host jps
done
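With the configuration below (NameNode and JobHistoryServer on hadoop1002, ResourceManager on hadoop1003, SecondaryNameNode on hadoop1004, DataNode and NodeManager on all three workers), jpsall after myhadoop start should report roughly the following (PIDs omitted):
=============== hadoop1002 ===============
NameNode
DataNode
NodeManager
JobHistoryServer
Jps
=============== hadoop1003 ===============
ResourceManager
DataNode
NodeManager
Jps
=============== hadoop1004 ===============
SecondaryNameNode
DataNode
NodeManager
Jps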
freem
#!/bin/bash
for host in hadoop1002 hadoop1003 hadoop1004
do
    echo =============== $host ===============
    ssh $host free -m
done
core-site.xml
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1002:8020</value>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
</property>
<!-- Static user for the HDFS web UI -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>roy</value>
</property>
hdfs-site.xml
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1002:9870</value>
</property>
<!-- Secondary NameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1004:9868</value>
</property>
yarn-site.xml
<!-- Use mapreduce_shuffle as the MR auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1003</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop1002:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1002:10020</value>
</property>
<!-- JobHistory web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1002:19888</value>
</property>
workers
hadoop1002
hadoop1003
hadoop1004
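The edited config files and the workers file must be identical on every node; assuming the xsync script from earlier, distribute the whole Hadoop config directory once:
xsync /opt/module/hadoop-3.1.3/etc/hadoop/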
For the first startup, format the NameNode on hadoop1002 only (this creates the data and logs directories). If you ever need to reformat, stop all daemons and delete the data and logs directories on every machine first, so the old cluster ID does not clash with the new one.
hdfs namenode -format
6. Run wordcount
Create the input file:
vim word.txt
java java
Create the HDFS directory:
hadoop fs -mkdir /wcinput
Upload the input file:
hadoop fs -put word.txt /wcinput
Run the job (from /opt/module/hadoop-3.1.3, since the jar path is relative):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /wcinput /wcoutput
Check the result:
hadoop fs -cat /wcoutput/part-r-00000
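Given the word.txt above, the output should be simply:
java	2
The web UIs configured earlier are also worth a look: the NameNode at http://hadoop1002:9870, job history at http://hadoop1002:19888, and the ResourceManager at http://hadoop1003:8088 (YARN's default web port).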
Note: if the three servers come from different cloud providers, then when editing /etc/hosts each machine should use its own internal (private) address for itself.