Hadoop Cluster Installation
Preparation
Prepare three machines:
- 192.168.100.133 hadoop1 (master)
- 192.168.100.134 hadoop2
- 192.168.100.135 hadoop3
Three prerequisites:
- First, make sure the JDK is installed on all three virtual machines and that `JAVA_HOME` is configured. The JDK can be installed as follows, or by any other method:

```
[root@localhost ~]# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
```
Installing the JDK via `yum` does not configure the `JAVA_HOME` environment variable automatically, so it has to be set by hand. First locate the actual installation directory:

```
[root@dc6-80-283 ~]# which java
/usr/bin/java
[root@dc6-80-283 ~]# ll /usr/bin/java
lrwxrwxrwx. 1 root root 22 Jun 1 11:33 /usr/bin/java -> /etc/alternatives/java
[root@dc6-80-283 ~]# ll /etc/alternatives/java
lrwxrwxrwx. 1 root root 74 Jun 1 11:33 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64/jre/bin/java
[root@dc6-80-283 ~]# ll /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
total 184
-rw-r--r--. 1 root root   1522 May 10 22:52 ASSEMBLY_EXCEPTION
drwxr-xr-x. 2 root root   4096 Jun 1 11:33 bin
drwxr-xr-x. 3 root root    132 Jun 1 11:33 include
drwxr-xr-x. 4 root root     95 Jun 1 11:33 jre
drwxr-xr-x. 3 root root    146 Jun 1 11:33 lib
-rw-r--r--. 1 root root  19274 May 10 22:52 LICENSE
drwxr-xr-x. 2 root root    208 Jun 1 11:33 tapset
-rw-r--r--. 1 root root 157063 May 10 22:52 THIRD_PARTY_README
[root@dc6-80-283 ~]#
```
This confirms that `/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64` is the directory to use as `JAVA_HOME`. Set the environment variable by editing the `/etc/profile` file:

```
vim /etc/profile
```
At the end of the profile, press `i` to switch to insert mode and add the following configuration:

```
# java
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
```
After modifying the environment variables, reload the file so they take effect:

```
source /etc/profile
```
Check the result with `echo $JAVA_HOME`; output like the following means it worked:

```
[root@dc6-80-283 ~]# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
```
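As one more sanity check, `java -version` should confirm that the runtime itself runs (the exact output depends on the installed build):

```
java -version
```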
- Configure the node addresses and hostnames (hadoop1, hadoop2, hadoop3) in `/etc/hosts` on every machine:

```
[root@dc6-80-283 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.133 hadoop1
192.168.100.134 hadoop2
192.168.100.135 hadoop3
```
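A quick way to verify the mappings (a minimal check, run from each node against the other two):

```
ping -c 1 hadoop2
ping -c 1 hadoop3
```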
- Make sure the three servers can SSH into one another without a password (see the sketch after this list).
  Reference tutorial: https://blog.csdn.net/u010698107/article/details/119079821
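A minimal sketch of one way to set up passwordless SSH, assuming the root account is used on all three nodes as in the rest of this guide:

```
# On each of the three nodes: generate a key pair (accept the defaults)
ssh-keygen -t rsa

# Then push the public key to every node, including the node itself
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3
```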
Download the Hadoop Package
Hadoop official site: http://hadoop.apache.org
Download address for the Hadoop version used here: http://archive.apache.org/dist/hadoop/core/hadoop-3.3.1/
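For example, the tarball can be fetched directly with wget (assuming the file name at that address is hadoop-3.3.1.tar.gz):

```
wget http://archive.apache.org/dist/hadoop/core/hadoop-3.3.1/hadoop-3.3.1.tar.gz
```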
Extract
- Create the directory `/opt/hadoop`:

```
mkdir /opt/hadoop
```
- Extract the downloaded package into the `/opt/hadoop` directory:

```
tar -zxvf hadoop-3.3.1.tar.gz -C /opt/hadoop
```
- Rename the extracted folder:

```
cd /opt/hadoop
mv hadoop-3.3.1 hadoop
```
- In the end, Hadoop should live at the following path on all three machines: `/opt/hadoop/hadoop`
Edit the Configuration Files
- Change into the directory `/opt/hadoop/hadoop/etc/hadoop`.
- core-site.xml

```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoopdata</value>
  </property>
</configuration>
```
- hdfs-site.xml

```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
- yarn-site.xml

```
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop1:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop1:18088</value>
  </property>
</configuration>
```
- mapred-site.xml

```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
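Before moving on, the four edited files can be checked for XML typos (an optional quick check, assuming `xmllint` from libxml2 is installed):

```
cd /opt/hadoop/hadoop/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml
```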
- Configure the `workers` file. Note: in Hadoop 3.0 the `slaves` file was renamed to `workers`.

```
vi /opt/hadoop/hadoop/etc/hadoop/workers
```
- Add the worker hostnames, one per line; if the file contains a `localhost` line, delete it:

```
hadoop1
hadoop2
hadoop3
```
Configure System Environment Variables
This step must be done on all three nodes; unless otherwise noted, the other steps only need to be done on the master node.
```
cd /opt/hadoop
vim ~/.bash_profile
```
Add the following:

```
#HADOOP
export HADOOP_HOME=/opt/hadoop/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```
After modifying the environment variables, reload the file so they take effect. Run all of the above on all three nodes:

```
source ~/.bash_profile
```
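With the variables in place, the `hadoop` command should resolve from any directory; a quick sanity check on each node:

```
hadoop version
```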
Create the Data Directory
Create the directory configured as `hadoop.tmp.dir` in core-site.xml:

```
mkdir /opt/hadoop/hadoopdata
```
Format the File System
On the master node, format HDFS (in Hadoop 3.x this form prints a deprecation warning; `hdfs namenode -format` is the current equivalent):

```
hadoop namenode -format
```
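If the format succeeds, the NameNode metadata directory is created under `hadoop.tmp.dir`; a minimal check, given the paths configured above:

```
ls /opt/hadoop/hadoopdata/dfs/name/current
```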
Copy the Installation to the Worker Nodes
Copy the entire `/opt/hadoop` directory (Hadoop plus its configuration) to the worker nodes:

```
scp -r /opt/hadoop root@hadoop2:/opt
scp -r /opt/hadoop root@hadoop3:/opt
```
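To spot-check that the copy landed, list the directory over SSH (a quick optional check, assuming the passwordless SSH set up earlier):

```
ssh root@hadoop2 ls /opt/hadoop/hadoop
ssh root@hadoop3 ls /opt/hadoop/hadoop
```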
Starting and Stopping the Hadoop Cluster
Start commands:

```
cd /opt/hadoop/hadoop/sbin
start-all.sh
```
```
[root@dc6-80-283 sbin]# start-all.sh
Starting namenodes on [hadoop1]
Last login: Wed Jun 1 15:10:33 CST 2022 on pts/3
Starting datanodes
Last login: Wed Jun 1 15:10:39 CST 2022 on pts/3
Starting secondary namenodes [dc6-80-283.novalocal]
Last login: Wed Jun 1 15:10:41 CST 2022 on pts/3
2022-06-01 15:10:59,223 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Last login: Wed Jun 1 15:10:44 CST 2022 on pts/3
Starting nodemanagers
Last login: Wed Jun 1 15:11:00 CST 2022 on pts/3
```
Stop commands:

```
cd /opt/hadoop/hadoop/sbin
stop-all.sh
```
```
[root@dc6-80-283 sbin]# stop-all.sh
Stopping namenodes on [hadoop1]
Last login: Wed Jun 1 15:11:02 CST 2022 on pts/3
Stopping datanodes
Last login: Wed Jun 1 15:12:26 CST 2022 on pts/3
Stopping secondary namenodes [dc6-80-283.novalocal]
Last login: Wed Jun 1 15:12:26 CST 2022 on pts/3
2022-06-01 15:12:32,120 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping nodemanagers
Last login: Wed Jun 1 15:12:28 CST 2022 on pts/3
hadoop3: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
hadoop2: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
Last login: Wed Jun 1 15:12:32 CST 2022 on pts/3
```
Check Whether Startup Succeeded
Run `jps` on each node:

```
# master node
[root@dc6-80-283 sbin]# jps
10672 NodeManager
10145 SecondaryNameNode
10966 Jps
9912 DataNode
# worker node
[root@dc6-80-275 hadoop]# jps
26458 NodeManager
26286 DataNode
26639 Jps
```
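The web UIs offer another quick check. With the ports configured above, the YARN ResourceManager UI should answer at http://hadoop1:18088 (per yarn.resourcemanager.webapp.address), and in Hadoop 3.x the NameNode UI listens on port 9870 by default:

```
curl -s http://hadoop1:18088/cluster | head
curl -s http://hadoop1:9870/ | head
```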
Pitfalls
ERROR: Attempting to operate on hdfs namenode as root
- Description: with hadoop-3.3.1, starting the cluster may also report the following error:

```
[root@localhost sbin]# start-all.sh
Starting namenodes on [hadoop]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [hadoop]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
2018-07-16 05:45:04,628 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
```
- Solution 1
  Open the environment variable file:

```
vim /etc/profile
```

Add the following:

```
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```

Apply the change:

```
source /etc/profile
```
- Solution 2
  Add the following parameters at the top of the `start-dfs.sh` and `stop-dfs.sh` scripts (in the sbin directory of the Hadoop installation):

```
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
```

Add the following at the top of `start-yarn.sh` and `stop-yarn.sh` (same directory):

```
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
```
ERROR: JAVA_HOME is not set and could not be found
- Problem description: startup shows the following error:

```
[root@dc6-80-283 sbin]# start-all.sh
Starting namenodes on [hadoop1]
Last login: Wed Jun 1 14:47:53 CST 2022 on pts/3
Starting datanodes
Last login: Wed Jun 1 14:58:55 CST 2022 on pts/3
hadoop2: Warning: Permanently added 'hadoop2' (ECDSA) to the list of known hosts.
hadoop2: ERROR: JAVA_HOME is not set and could not be found.
hadoop3: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [dc6-80-283.novalocal]
Last login: Wed Jun 1 14:58:57 CST 2022 on pts/3
2022-06-01 14:59:19,231 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Last login: Wed Jun 1 14:58:58 CST 2022 on pts/3
Starting nodemanagers
Last login: Wed Jun 1 14:59:20 CST 2022 on pts/3
hadoop3: ERROR: JAVA_HOME is not set and could not be found.
hadoop2: ERROR: JAVA_HOME is not set and could not be found.
```
- Solution
  Check the `JAVA_HOME` directory:

```
[root@dc6-80-283 ~]# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
```
In Hadoop's configuration directory etc/hadoop (here, `/opt/hadoop/hadoop/etc/hadoop/`), edit the `hadoop-env.sh` file:

```
vim /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh
```
Set `JAVA_HOME` to the correct directory (`/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64`):

```
# Technically, the only required environment variable is JAVA_HOME.
# All others are optional.  However, the defaults are probably not
# preferred.  Many sites configure these options outside of Hadoop,
# such as in /etc/profile.d

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
```
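Since the errors in this case came from hadoop2 and hadoop3, the edited file also has to reach those nodes. One way to push it, assuming the same JDK path on every node:

```
scp /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh root@hadoop2:/opt/hadoop/hadoop/etc/hadoop/
scp /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh root@hadoop3:/opt/hadoop/hadoop/etc/hadoop/
```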
References
Reference tutorial: https://blog.csdn.net/xiaoluo520112/article/details/118576034