Hadoop Environment Setup
1. Downloading the installation package
- Download in a browser: https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
- Or download directly on the Linux server with:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
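Before extracting, it is worth verifying the download. Apache publishes a `.sha512` file alongside each release tarball in the same directory, and `sha512sum -c` checks the archive against it. The sketch below demonstrates the workflow with a stand-in file (`hadoop-demo.tar.gz` is made up here, so the real download is not required):

```shell
# Illustrative only: a stand-in file plays the role of hadoop-3.3.4.tar.gz.
# In practice, also fetch hadoop-3.3.4.tar.gz.sha512 from the same
# dlcdn.apache.org directory and run sha512sum -c against the real tarball.
echo "pretend tarball contents" > hadoop-demo.tar.gz
sha512sum hadoop-demo.tar.gz > hadoop-demo.tar.gz.sha512  # the checksum file
sha512sum -c hadoop-demo.tar.gz.sha512                    # prints "... OK" on success
```

If the checksum does not match, the download is corrupt or tampered with and should be re-fetched.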
2. Creating a Linux user
- Create a non-root user (to simulate real-world environments where you have no root privileges).
- Skip this step if a non-root user already exists.
[root@maggie ~]# useradd hadoop
[root@maggie ~]# cd /home/hadoop/
[root@maggie hadoop]# ll -a
total 20
drwx------ 2 hadoop hadoop 4096 Nov 1 00:00 .
drwxr-xr-x. 4 root root 4096 Nov 1 00:00 ..
-rw-r--r-- 1 hadoop hadoop 18 Aug 8 2019 .bash_logout
-rw-r--r-- 1 hadoop hadoop 193 Aug 8 2019 .bash_profile
-rw-r--r-- 1 hadoop hadoop 231 Aug 8 2019 .bashrc
[root@maggie hadoop]# mkdir software app log data lib tmp source
[root@maggie hadoop]# ll -a
total 48
drwx------ 9 hadoop hadoop 4096 Nov 1 00:01 .
drwxr-xr-x. 4 root root 4096 Nov 1 00:00 ..
drwxr-xr-x 2 root root 4096 Nov 1 00:01 app 【extracted software packages go here】
-rw-r--r-- 1 hadoop hadoop 18 Aug 8 2019 .bash_logout
-rw-r--r-- 1 hadoop hadoop 193 Aug 8 2019 .bash_profile
-rw-r--r-- 1 hadoop hadoop 231 Aug 8 2019 .bashrc
drwxr-xr-x 2 root root 4096 Nov 1 00:01 data
drwxr-xr-x 2 root root 4096 Nov 1 00:01 lib
drwxr-xr-x 2 root root 4096 Nov 1 00:01 log 【log files】
drwxr-xr-x 2 root root 4096 Nov 1 00:01 software 【uploaded tarballs go here】
drwxr-xr-x 2 root root 4096 Nov 1 00:01 source 【source code】
drwxr-xr-x 2 root root 4096 Nov 1 00:01 tmp
[root@maggie ~]# chown -R hadoop:hadoop /home/hadoop/*
[root@maggie ~]# mv /root/software/hadoop-3.3.4.tar.gz /home/hadoop/software/
[root@maggie ~]# chown -R hadoop:hadoop /home/hadoop/software/
[root@maggie ~]# cd /home/hadoop/software/
[root@maggie software]# ll
total 679164
-rw-r--r-- 1 hadoop hadoop 695457782 Oct 31 11:39 hadoop-3.3.4.tar.gz
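The directory setup above can also be done without the follow-up `chown -R` steps: run the `mkdir` as the `hadoop` user itself (e.g. after `su - hadoop`), and every directory is created with the right owner from the start. A minimal sketch, using a temporary directory in place of /home/hadoop so it can be tried anywhere:

```shell
# Sketch: create the whole working layout in one loop. A temp dir stands in
# for /home/hadoop; when run as the hadoop user (su - hadoop), each directory
# is owned by hadoop:hadoop immediately, so no chown -R is needed afterwards.
HADOOP_HOME_DIR=$(mktemp -d)
for d in software app log data lib tmp source; do
  mkdir -p "$HADOOP_HOME_DIR/$d"
done
ls "$HADOOP_HOME_DIR"
```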
3. Extracting the Hadoop tarball
- Extract: tar -xzvf hadoop-3.3.4.tar.gz -C ../app/
- Create a symbolic link: ln -s hadoop-3.3.4 hadoop
[hadoop@maggie software]$ tar -xzvf hadoop-3.3.4.tar.gz -C ../app/
......(extraction output omitted)
[hadoop@maggie app]$ ln -s hadoop-3.3.4 hadoop 【create the symlink】
[hadoop@maggie app]$ cd hadoop
[hadoop@maggie hadoop]$ ll
total 116
drwxr-xr-x 2 hadoop hadoop 4096 Jul 29 21:44 bin 【executable commands, e.g. hadoop, hdfs】
drwxr-xr-x 3 hadoop hadoop 4096 Jul 29 20:35 etc 【configuration files; the important ones are hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, etc.】
drwxr-xr-x 2 hadoop hadoop 4096 Jul 29 21:44 include
drwxr-xr-x 3 hadoop hadoop 4096 Jul 29 21:44 lib
drwxr-xr-x 4 hadoop hadoop 4096 Jul 29 21:44 libexec
-rw-rw-r-- 1 hadoop hadoop 24707 Jul 29 04:30 LICENSE-binary
drwxr-xr-x 2 hadoop hadoop 4096 Jul 29 21:44 licenses-binary
-rw-rw-r-- 1 hadoop hadoop 15217 Jul 17 02:20 LICENSE.txt
-rw-rw-r-- 1 hadoop hadoop 29473 Jul 17 02:20 NOTICE-binary
-rw-rw-r-- 1 hadoop hadoop 1541 Apr 22 2022 NOTICE.txt
-rw-rw-r-- 1 hadoop hadoop 175 Apr 22 2022 README.txt
drwxr-xr-x 3 hadoop hadoop 4096 Jul 29 20:35 sbin 【start/stop scripts: start/stop-dfs.sh, start/stop-all.sh, start/stop-yarn.sh】
drwxr-xr-x 4 hadoop hadoop 4096 Jul 29 22:21 share 【jars shipped by the project, including example cases for testing】
The $HADOOP_HOME/sbin directory contains the following start and stop scripts:
# start HDFS / start YARN
start-dfs.sh
start-yarn.sh
start-all.sh
# stop
stop-all.sh
stop-dfs.sh
stop-yarn.sh
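The `hadoop` symlink created above is what makes future upgrades painless: `HADOOP_HOME`, PATH entries, and any scripts all point at the link, so switching versions is a single `ln -sfn`. A sketch of the pattern in a throwaway directory (version 3.3.6 here is hypothetical, purely for illustration):

```shell
# Demo of the symlink-based upgrade pattern in a temp directory.
APP_DIR=$(mktemp -d)
cd "$APP_DIR"
mkdir hadoop-3.3.4 hadoop-3.3.6   # 3.3.6 is a made-up future version
ln -s hadoop-3.3.4 hadoop         # initial install
readlink hadoop                   # -> hadoop-3.3.4
ln -sfn hadoop-3.3.6 hadoop       # "upgrade": just repoint the link
readlink hadoop                   # -> hadoop-3.3.6
```

The `-n` flag makes `ln` replace the link itself rather than creating a new link inside the directory it points to.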
4. Configuring the JDK
Set up the environment following the official Single Node Setup guide:
- https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
- Deployment dependency: JDK (installation omitted here)
For supported JDK versions, see the official compatibility page: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions
- As the official docs recommend, even if JAVA_HOME is already set in the Linux environment, it must still be set explicitly for Hadoop (the daemons are started over ssh in non-login shells that do not inherit your profile):
- vi etc/hadoop/hadoop-env.sh and add export JAVA_HOME=/usr/java/jdk1.8.0_333 【JDK path】
[root@maggie java]# echo $JAVA_HOME
/usr/java/jdk1.8.0_333
[hadoop@maggie ~]$ pwd
/home/hadoop
[hadoop@maggie ~]$ cd app/hadoop/etc/hadoop/
[hadoop@maggie hadoop]$ vi hadoop-env.sh
Add the following line:
export JAVA_HOME=/usr/java/jdk1.8.0_333
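To avoid appending a duplicate line if the step is re-run, the edit can be scripted idempotently. A hedged sketch, using a temp file to stand in for etc/hadoop/hadoop-env.sh (the JDK path is the one used on this machine; adjust to your own):

```shell
# Idempotent sketch: only append JAVA_HOME if no export line exists yet.
ENV_FILE=$(mktemp)                 # stands in for etc/hadoop/hadoop-env.sh
JDK_PATH=/usr/java/jdk1.8.0_333    # path from this document; adjust locally
grep -q '^export JAVA_HOME=' "$ENV_FILE" \
  || echo "export JAVA_HOME=$JDK_PATH" >> "$ENV_FILE"
grep '^export JAVA_HOME=' "$ENV_FILE"   # confirm the line is present
```

Running the snippet twice leaves exactly one `export JAVA_HOME=` line in the file.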
5. Setting up public/private keys (passwordless ssh login)
Prerequisite: add the cluster machines (cloud hosts purchased together) to /etc/hosts:
[xiaofeng@maggie303 ~]$ vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
(Some machines use internal addresses like 192.168.x.x; run ifconfig and check the inet field under eth0 for the LAN address.)
10.0.X.X maggie101
10.0.X.X maggie102
- ssh-keygen 【generate the public/private key pair under ~/.ssh in the home directory】
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 【append the public key into a new authorized_keys file】
- chmod 0600 ~/.ssh/authorized_keys 【restrict permissions on the new file】
If ssh still prompts for a password after this, the usual cause is overly permissive modes: ~/.ssh must be 700, authorized_keys 600, and neither the home directory nor ~/.ssh may be group- or world-writable (sshd refuses the key otherwise).
[hadoop@maggie101 hadoop]$ ssh-keygen 【press Enter three times at the prompts】
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
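The three key-setup steps can also be run without any prompts: `ssh-keygen -N ""` supplies an empty passphrase and `-f` the output path. A sketch with the keys written into a temporary directory, so an existing ~/.ssh is left untouched (in real use, drop the temp dir and let the files land in ~/.ssh):

```shell
# Non-interactive version of the steps above; a temp dir stands in for ~/.ssh.
KEY_DIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$KEY_DIR/id_rsa" -q          # no prompts, empty passphrase
cat "$KEY_DIR/id_rsa.pub" >> "$KEY_DIR/authorized_keys"  # authorize the key
chmod 600 "$KEY_DIR/authorized_keys"                     # sshd requires tight modes
ls -l "$KEY_DIR"
```

This form is convenient when preparing several cluster nodes from a script.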