Hadoop Cluster Installation

Installing a Hadoop Cluster Environment

Preparation

Prepare three machines:

  • 192.168.100.133 hadoop1(master)
  • 192.168.100.134 hadoop2
  • 192.168.100.135 hadoop3

Three prerequisites:

  • First, make sure the JDK is installed on all three virtual machines and that JAVA_HOME is configured

    Install the JDK, for example as follows (other installation methods also work):

    [root@localhost ~]# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
    

    Installing the JDK via yum does not configure the JAVA_HOME environment variable automatically; set JAVA_HOME manually as follows:

    [root@dc6-80-283 ~]# which java
    /usr/bin/java
    [root@dc6-80-283 ~]# ll /usr/bin/java
    lrwxrwxrwx. 1 root root 22 Jun  1 11:33 /usr/bin/java -> /etc/alternatives/java
    [root@dc6-80-283 ~]# ll /etc/alternatives/java
    lrwxrwxrwx. 1 root root 74 Jun  1 11:33 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64/jre/bin/java
    [root@dc6-80-283 ~]# ll /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
    total 184
    -rw-r--r--. 1 root root   1522 May 10 22:52 ASSEMBLY_EXCEPTION
    drwxr-xr-x. 2 root root   4096 Jun  1 11:33 bin
    drwxr-xr-x. 3 root root    132 Jun  1 11:33 include
    drwxr-xr-x. 4 root root     95 Jun  1 11:33 jre
    drwxr-xr-x. 3 root root    146 Jun  1 11:33 lib
    -rw-r--r--. 1 root root  19274 May 10 22:52 LICENSE
    drwxr-xr-x. 2 root root    208 Jun  1 11:33 tapset
    -rw-r--r--. 1 root root 157063 May 10 22:52 THIRD_PARTY_README
    [root@dc6-80-283 ~]# 
    

    This confirms that /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64 is the JAVA_HOME. Next, update the environment variables by editing the /etc/profile file.

    vim /etc/profile
    

    At the end of the profile file, press i to switch to insert mode and add the following configuration:

    # java
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
    export JRE_HOME=$JAVA_HOME/jre
    export PATH=$PATH:$JAVA_HOME/bin
    

    After modifying the environment variables, reload them so the change takes effect:

    source /etc/profile
    

    Verify the result with the echo $JAVA_HOME command; output like the following indicates success.

    [root@dc6-80-283 ~]# echo $JAVA_HOME
    /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
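
    The symlink chain inspected above can also be resolved in one step. A minimal sketch, assuming java is on the PATH and readlink -f is available (as on most Linux distributions):

    # resolve /usr/bin/java -> .../jre/bin/java, then strip jre/bin/java to get JAVA_HOME
    dirname "$(dirname "$(dirname "$(readlink -f "$(which java)")")")"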
    
  • Configure the node IPs and their hostnames (hadoop1, hadoop2, hadoop3) in /etc/hosts

    [root@dc6-80-283 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.100.133 hadoop1
    192.168.100.134 hadoop2
    192.168.100.135 hadoop3
    
  • Make sure the three servers can SSH into each other without a password (one common setup is sketched below the reference link).

    Reference tutorial: https://blog.csdn.net/u010698107/article/details/119079821
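
    A minimal sketch of one common way to set this up, assuming the root account and that ssh-keygen/ssh-copy-id are installed; run it on each of the three nodes:

    # generate a key pair without a passphrase (skip if ~/.ssh/id_rsa already exists)
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    # append this node's public key to authorized_keys on every node, including itself
    for host in hadoop1 hadoop2 hadoop3; do
        ssh-copy-id root@$host
    done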

Download the Hadoop Package

Hadoop website: http://hadoop.apache.org
Download URL for the Hadoop 3.3.1 release used here: http://archive.apache.org/dist/hadoop/core/hadoop-3.3.1/
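
For example, the tarball can be fetched directly from the archive URL above (assuming wget is installed):

wget http://archive.apache.org/dist/hadoop/core/hadoop-3.3.1/hadoop-3.3.1.tar.gz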

Extract

  • Create the directory /opt/hadoop

    mkdir /opt/hadoop
    
  • Extract the downloaded package into the /opt/hadoop directory.

    tar -zxvf hadoop-3.3.1.tar.gz -C /opt/hadoop
    
  • Rename the directory

    mv /opt/hadoop/hadoop-3.3.1 /opt/hadoop/hadoop
    
  • In the end, Hadoop lives at the following path on all three machines

    /opt/hadoop/hadoop
    

Edit the Configuration Files

  • Enter the directory /opt/hadoop/hadoop/etc/hadoop

  • core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
        http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
    	<property>
    		<name>fs.defaultFS</name>
    		<value>hdfs://hadoop1:9000</value>
    	</property>
    	<property>
    		<name>hadoop.tmp.dir</name>
    		<value>/opt/hadoop/hadoopdata</value>
    	</property>
    </configuration>
    
  • hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
        http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
    	<property>
    		<name>dfs.replication</name>
    		<value>1</value>
    	</property>
    </configuration>
    
  • yarn-site.xml

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
        http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
     
    <!-- Site specific YARN configuration properties -->
     
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>hadoop1:18040</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>hadoop1:18030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>hadoop1:18025</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>hadoop1:18141</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>hadoop1:18088</value>
        </property>
     
    </configuration>
    
  • mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
        http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    	<property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
    	</property>
        
    </configuration>
    
  • Configure the workers file

    Note: as of Hadoop 3.0, the slaves file has been renamed to workers.

    vi /opt/hadoop/hadoop/etc/hadoop/workers
    
  • Add the worker hostnames; if the file contains a localhost line, delete it

    hadoop1
    hadoop2
    hadoop3
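
    Equivalently, the file can be written non-interactively; a small sketch assuming the same three hostnames:

    # overwrite workers with one hostname per line
    printf '%s\n' hadoop1 hadoop2 hadoop3 > /opt/hadoop/hadoop/etc/hadoop/workers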
    

Configure System Environment Variables

This step must be performed on all three nodes; unless noted otherwise, all other steps only need to be done on the master node.

cd /opt/hadoop 
vim ~/.bash_profile

Add the following content to it:

#HADOOP
export HADOOP_HOME=/opt/hadoop/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

After modifying the environment variables, reload them so the change takes effect:

source ~/.bash_profile

Run the above commands on all three nodes.
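
Once the variables are loaded, the hadoop command should resolve from the PATH; a quick sanity check:

# expected: /opt/hadoop/hadoop/bin/hadoop
which hadoop
# prints the version banner (Hadoop 3.3.1 here)
hadoop version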

Create the Data Directory

mkdir /opt/hadoop/hadoopdata

Format the File System

hdfs namenode -format
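
Formatting only needs to be done once, on the master node. With the default layout, the NameNode metadata lands under hadoop.tmp.dir (set to /opt/hadoop/hadoopdata in core-site.xml above); a hedged check:

# should contain VERSION and an initial fsimage if formatting succeeded
ls /opt/hadoop/hadoopdata/dfs/name/current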

Copy the Configuration to the Worker Nodes

scp -r /opt/hadoop root@hadoop2:/opt
scp -r /opt/hadoop root@hadoop3:/opt

Starting and Stopping the Hadoop Cluster

Start command

cd /opt/hadoop/hadoop/sbin
start-all.sh
[root@dc6-80-283 sbin]# start-all.sh
Starting namenodes on [hadoop1]
Last login: Wed Jun  1 15:10:33 CST 2022 on pts/3
Starting datanodes
Last login: Wed Jun  1 15:10:39 CST 2022 on pts/3
Starting secondary namenodes [dc6-80-283.novalocal]
Last login: Wed Jun  1 15:10:41 CST 2022 on pts/3
2022-06-01 15:10:59,223 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Last login: Wed Jun  1 15:10:44 CST 2022 on pts/3
Starting nodemanagers
Last login: Wed Jun  1 15:11:00 CST 2022 on pts/3

Stop command

cd /opt/hadoop/hadoop/sbin
stop-all.sh
[root@dc6-80-283 sbin]# stop-all.sh
Stopping namenodes on [hadoop1]
Last login: Wed Jun  1 15:11:02 CST 2022 on pts/3
Stopping datanodes
Last login: Wed Jun  1 15:12:26 CST 2022 on pts/3
Stopping secondary namenodes [dc6-80-283.novalocal]
Last login: Wed Jun  1 15:12:26 CST 2022 on pts/3
2022-06-01 15:12:32,120 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping nodemanagers
Last login: Wed Jun  1 15:12:28 CST 2022 on pts/3
hadoop3: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
hadoop2: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
Last login: Wed Jun  1 15:12:32 CST 2022 on pts/3

Check Whether Startup Succeeded

jps
# master node
[root@dc6-80-283 sbin]# jps
10672 NodeManager
10145 SecondaryNameNode
10966 Jps
9912 DataNode
# worker node
[root@dc6-80-275 hadoop]# jps
26458 NodeManager
26286 DataNode
26639 Jps
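
Beyond jps, a quick smoke test confirms that HDFS and YARN actually accept work. A sketch assuming the paths used in this guide (the examples jar ships with the Hadoop distribution):

# list the live DataNodes as seen by the NameNode
hdfs dfsadmin -report
# run a tiny sample MapReduce job (estimating pi) on YARN
hadoop jar /opt/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 2 10

The ResourceManager web UI should also be reachable at http://hadoop1:18088, the port configured in yarn-site.xml above.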

Pitfalls

Fixing ERROR: Attempting to operate on hdfs namenode as root

  • Description: when starting the cluster with hadoop-3.3.1, you may also run into the following errors

    [root@localhost sbin]# start-all.sh
    Starting namenodes on [hadoop]
    ERROR: Attempting to operate on hdfs namenode as root
    ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
    Starting datanodes
    ERROR: Attempting to operate on hdfs datanode as root
    ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
    Starting secondary namenodes [hadoop]
    ERROR: Attempting to operate on hdfs secondarynamenode as root
    ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
    2018-07-16 05:45:04,628 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting resourcemanager
    ERROR: Attempting to operate on yarn resourcemanager as root
    ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
    Starting nodemanagers
    ERROR: Attempting to operate on yarn nodemanager as root
    ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
    
  • Solution 1

    Run the following command to add the configuration below to the environment variables:

    vim /etc/profile
    

    Then add the following content:

    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    

    Run the following command to apply the change:

    source /etc/profile
    
  • Solution 2

    Add the following parameters at the top of both start-dfs.sh and stop-dfs.sh (in the sbin directory of the Hadoop installation):

    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    

    Add the following parameters at the top of both start-yarn.sh and stop-yarn.sh (in the sbin directory of the Hadoop installation):

    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root
    
  • Reference

    https://blog.csdn.net/a6661314/article/details/124454376

ERROR: JAVA_HOME is not set and could not be found

  • Problem description: the following errors appear at startup

    [root@dc6-80-283 sbin]# start-all.sh
    Starting namenodes on [hadoop1]
    Last login: Wed Jun  1 14:47:53 CST 2022 on pts/3
    Starting datanodes
    Last login: Wed Jun  1 14:58:55 CST 2022 on pts/3
    hadoop2: Warning: Permanently added 'hadoop2' (ECDSA) to the list of known hosts.
    hadoop2: ERROR: JAVA_HOME is not set and could not be found.
    hadoop3: ERROR: JAVA_HOME is not set and could not be found.
    Starting secondary namenodes [dc6-80-283.novalocal]
    Last login: Wed Jun  1 14:58:57 CST 2022 on pts/3
    2022-06-01 14:59:19,231 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting resourcemanager
    Last login: Wed Jun  1 14:58:58 CST 2022 on pts/3
    Starting nodemanagers
    Last login: Wed Jun  1 14:59:20 CST 2022 on pts/3
    hadoop3: ERROR: JAVA_HOME is not set and could not be found.
    hadoop2: ERROR: JAVA_HOME is not set and could not be found.
    
  • Solution

    Check the JAVA_HOME directory:

    [root@dc6-80-283 ~]# echo $JAVA_HOME
    /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
    

    In the Hadoop configuration directory etc/hadoop (here /opt/hadoop/hadoop/etc/hadoop/), edit the hadoop-env.sh configuration:

    vim /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh
    

    JAVA_HOME换成正确的目录(/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64

    # Technically, the only required environment variable is JAVA_HOME.
    # All others are optional.  However, the defaults are probably not
    # preferred.  Many sites configure these options outside of Hadoop,
    # such as in /etc/profile.d
    
    # The java implementation to use. By default, this environment
    # variable is REQUIRED on ALL platforms except OS X!
     export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.aarch64
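
    Since the errors above come from hadoop2 and hadoop3, the edited hadoop-env.sh must be present on those nodes as well; a sketch assuming the same paths everywhere:

    # push the fixed hadoop-env.sh to the worker nodes
    scp /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh root@hadoop2:/opt/hadoop/hadoop/etc/hadoop/
    scp /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh root@hadoop3:/opt/hadoop/hadoop/etc/hadoop/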
    

    Reference tutorial: https://blog.csdn.net/xiaoluo520112/article/details/118576034
