Configure the system environment
Edit the /etc/hosts file
192.168.36.128 master
192.168.36.129 worker1
192.168.36.130 worker2
192.168.36.128, 192.168.36.129, and 192.168.36.130 are all Ubuntu 20 Server LTS virtual machines.
In addition to editing the hosts file, the hostname of each machine must be changed with the command below. Otherwise, when you open the Hadoop web UI at 192.168.36.128:50070, the Overview panel may report 2 live nodes (the two workers) while the Datanodes panel lists only 1 node; the likely cause is that the two workers share the same hostname.
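The hosts-file step can be sketched as a small script. HOSTS_FILE is a stand-in path so the sketch can run anywhere; on a real node it would be /etc/hosts, edited as root:

```shell
#!/bin/sh
# Append the cluster name mappings to a hosts file.
HOSTS_FILE=./hosts.demo      # stand-in; use /etc/hosts (as root) on a real node
cat >> "$HOSTS_FILE" <<'EOF'
192.168.36.128 master
192.168.36.129 worker1
192.168.36.130 worker2
EOF
grep -c worker "$HOSTS_FILE"   # prints 2 on a fresh file
```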
hostnamectl set-hostname master   # on the workers, use worker1 / worker2 instead
Then reboot for the change to take effect.
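The duplicate-hostname symptom can be checked mechanically: collect the `hostname` output of every node and pass it through `sort | uniq -d`, which prints only repeated entries. A sketch with illustrative sample data standing in for what the three nodes would report:

```shell
#!/bin/sh
# uniq -d prints only duplicated lines; any output here means two
# nodes share a hostname. The sample simulates two un-renamed workers.
printf '%s\n' master worker1 worker1 | sort | uniq -d   # prints worker1
```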
Install SSH and enable passwordless login
Confirm that SSH is installed on every machine, then configure it.
Edit the SSH daemon configuration file, located at /etc/ssh/sshd_config, as follows:
Port 22
PermitRootLogin yes
PubkeyAuthentication yes
PasswordAuthentication yes
UsePAM yes
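Rather than editing by hand, settings like these can be enforced with sed. A sketch against a throwaway copy (FILE is a stand-in; on a real node it would be /etc/ssh/sshd_config, edited as root, followed by the sshd restart below):

```shell
#!/bin/sh
# Flip PasswordAuthentication to yes in a copy of the config.
FILE=./sshd_config.demo      # stand-in for /etc/ssh/sshd_config
printf 'Port 22\nPasswordAuthentication no\n' > "$FILE"
sed -i 's/^PasswordAuthentication .*/PasswordAuthentication yes/' "$FILE"
grep PasswordAuthentication "$FILE"   # PasswordAuthentication yes
```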
After the changes, restart the sshd service:
systemctl restart sshd
Generate an SSH key pair on master and copy the public key to each worker node, here worker1 and worker2 (hadoop4…):
ssh-keygen -t rsa    # just press Enter at each of the three password prompts
scp ~/.ssh/id_rsa.pub root@worker1:~/.ssh/
scp ~/.ssh/id_rsa.pub root@worker2:~/.ssh/
Note: if ~/.ssh/ does not exist on a node, ssh from that node to itself once; the directory will then be created automatically.
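When scripting the key generation, the three Enter presses can be skipped by passing an empty passphrase (-N '') and an explicit output path (-f). A sketch using a throwaway directory (KEY_DIR is a stand-in; on the real master the key pair belongs in ~/.ssh):

```shell
#!/bin/sh
# Generate an RSA key pair non-interactively into a demo directory.
KEY_DIR=./keys-demo          # stand-in for ~/.ssh
mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"
ssh-keygen -q -t rsa -N '' -f "$KEY_DIR/id_rsa"
ls "$KEY_DIR"                # id_rsa and id_rsa.pub
```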
On every node, append the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
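Running the append twice would duplicate the key line. A guarded variant adds the key only when it is not already present; the paths here are local stand-ins so the sketch runs anywhere, but on a real node they would be ~/.ssh/id_rsa.pub and ~/.ssh/authorized_keys:

```shell
#!/bin/sh
# Idempotent append: the key line is added only if absent.
SSH_DIR=./ssh-demo                        # stand-in for ~/.ssh
mkdir -p "$SSH_DIR"
echo 'ssh-rsa AAAAdemo root@master' > "$SSH_DIR/id_rsa.pub"   # fake key
AUTH="$SSH_DIR/authorized_keys"
touch "$AUTH"
for i in 1 2; do                          # second pass shows idempotence
    grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$AUTH" ||
        cat "$SSH_DIR/id_rsa.pub" >> "$AUTH"
done
wc -l < "$AUTH"                           # 1: nothing was duplicated
```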
Test that the node can ssh to itself:
[root@localhost sbin]# ssh master
Note: if you see the following error:
ECDSA host key for hadoop1 has changed and you have requested strict checking.
Host key verification failed.
then run the command below; the original known_hosts file will be saved as known_hosts.old and a new known_hosts will be generated:
[root@localhost sbin]# ssh-keygen -R master
# Host hadoop1 found: line 2
/root/.ssh/known_hosts updated.
Original contents retained as /root/.ssh/known_hosts.old
Connecting again should now succeed. Perform the same SSH configuration for hadoop2 as well.
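Once every node is configured, a quick loop can confirm passwordless access; `BatchMode=yes` makes ssh fail instead of prompting for a password, so a misconfigured or unreachable node shows up immediately. Host names and the root user follow the setup above:

```shell
#!/bin/sh
# One status line per node; "FAILED" means passwordless login did not work.
# Results are also saved to ssh-report.txt via tee.
for host in master worker1 worker2; do
    if ssh -o BatchMode=yes -o ConnectTimeout=3 "root@$host" true 2>/dev/null; then
        echo "$host ok"
    else
        echo "$host FAILED"
    fi
done | tee ssh-report.txt
```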
Install the big data components
Obtain the components
Link: https://pan.baidu.com/s/1oVtPeiKUnAfrcofHO5VfoQ
Extraction code: atbk
The archive contains: hadoop (3.3.4), jdk (1.8), scala (2.12.17), zookeeper (3.6.1), the template folder, and setup.sh. Note that the setup.sh below also checks for a spark directory.
Configure with the simple deployment script setup.sh
Run the deployment script on each machine to finish configuring the components:
#!/bin/bash
function log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*"
}
CRTDIR=$(pwd)
JDK_DIR="$CRTDIR/jdk"
HADOOP_DIR="$CRTDIR/hadoop"
SCALA_DIR="$CRTDIR/scala"
SPARK_DIR="$CRTDIR/spark"
TEMPLATE_CONF_DIR="$CRTDIR/template/"
ZK_DIR="$CRTDIR/zookeeper"
MASTER="master"
WORKERS="worker1,worker2"
ENV="/etc/profile"
log "checking components jdk,hadoop,scala,spark,zookeeper..."
for dir in "$JDK_DIR" "$HADOOP_DIR" "$SCALA_DIR" "$SPARK_DIR" "$ZK_DIR"; do
    if [ ! -d "$dir" ]; then
        log "$(basename "$dir") is not found"
        exit 1   # exit codes must be 0-255; -1 is not portable
    fi
done
log "Installing JDK..."
if [ -z "$JAVA_HOME" ]; then
    log "JAVA_HOME is not set; using the JDK from the package..."
    log "Configuring the JDK environment..."
    echo "export JAVA_HOME=$JDK_DIR" >>"$ENV"
    echo 'export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib' >>"$ENV"
    echo 'export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin' >>"$ENV"
else
    log "JDK already installed"
fi
log "The JDK environment was configured successfully"
source "$ENV"
java -version
#----------------------------------------------Hadoop start------------------------------------------------
log "Installing Hadoop..."
log "Configuring core-site.xml..."
FILE_NAME="core-site.xml"
CORE_SITE_TMP="$TEMPLATE_CONF_DIR$FILE_NAME"
CORE_SITE_TEMPLATE="$CORE_SITE_TMP.template"
cp "$CORE_SITE_TEMPLATE" "$CORE_SITE_TMP"
sed -i "s|MASTER|$MASTER|g" "$CORE_SITE_TMP"
sed -i "s|CRTDIR|$CRTDIR|g" "$CORE_SITE_TMP"
CORE_SITE="$HADOOP_DIR/etc/hadoop/$FILE_NAME"
CORE_SITE_BAK="$CORE_SITE.bak"
rm -f "$CORE_SITE_BAK"
mv "$CORE_SITE" "$CORE_SITE_BAK"
cp "$CORE_SITE_TMP" "$CORE_SITE"
log "Configuring hdfs-site.xml..."
FILE_NAME="hdfs-site.xml"
HDFS_SITE_TMP="$TEMPLATE_CONF_DIR$FILE_NAME"
HDFS_SITE_TEMPLATE="$HDFS_SITE_TMP.template"
cp "$HDFS_SITE_TEMPLATE" "$HDFS_SITE_TMP"
sed -i "s|MASTER|$MASTER|g" "$HDFS_SITE_TMP"
sed -i "s|CRTDIR|$CRTDIR|g" "$HDFS_SITE_TMP"
HDFS_SITE="$HADOOP_DIR/etc/hadoop/$FILE_NAME"
HDFS_SITE_BAK="$HDFS_SITE.bak"
rm -f "$HDFS_SITE_BAK"
mv "$HDFS_SITE" "$HDFS_SITE_BAK"
cp "$HDFS_SITE_TMP" "$HDFS_SITE"
log "Configuring mapred-site.xml..."
FILE_NAME="mapred-site.xml"
MAPRED_SI