Hadoop: Building an HDFS Cluster --- Study Notes

Preparation:

  1. Create three virtual machines (master x1, worker x2); I am using Ubuntu Server here.
  2. Install the JDK
    # Install JDK 8
    sudo apt-get install openjdk-8-jdk
    
    # Check the installation path
    sudo update-alternatives --config java
    
    # Set JAVA_HOME
    sudo vim ~/.bashrc
    
    # Append the following at the end of the file
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    
    # Verify the installation
    java -version
    

  3. Configure passwordless SSH login

    See: Linux passwordless SSH login study notes: https://blog.csdn.net/HongzhuoO/article/details/123451766
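    The linked notes cover the details; as a quick sketch, the usual sequence looks like this (the key type, key path, and the user/host names below are example assumptions matching this guide's cluster):

```shell
# On the master: generate a key pair (default path, empty passphrase)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node (user and host names are examples)
ssh-copy-id ubuntu@hdfs_name1
ssh-copy-id ubuntu@hdfs_name2

# Verify: this should print the remote host name without a password prompt
ssh ubuntu@hdfs_name1 hostname
```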

  4. Download the Hadoop tarball and create a directory to hold it.
  5. Extract it; I am using version 3.3.1 here:
    tar -xzvf hadoop-3.3.1-aarch64.tar.gz

  6. Edit the shell profile and append the Hadoop paths
    # Edit the user's profile file
    sudo vim ~/.bashrc
    
    # Append the following entries
    export HADOOP_HOME=$HOME/hadoop_ss/hadoop-3.3.1
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    
    After editing, it is recommended to log out and back in once (or reboot), or run the command below to reload the profile:
    . ~/.bashrc
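    To confirm the profile actually took effect, a quick check (this assumes the paths appended above):

```shell
# Reload the profile in the current shell
. ~/.bashrc

# Should print the Hadoop install path configured above
echo "$HADOOP_HOME"

# Should print the Hadoop version banner if PATH was extended correctly
hadoop version
```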

  7. Enter the Hadoop home directory and create the storage directories
    # Enter the Hadoop home directory
    cd $HADOOP_HOME
    
    # Create the dfs and tmp directories
    mkdir dfs tmp
    
    # Enter the dfs directory just created
    cd dfs
    
    # Create the data and name directories
    mkdir data name

  8. Switch to the Hadoop configuration directory to start configuring
    # Switch to the Hadoop configuration directory
    cd $HADOOP_HOME/etc/hadoop

  9. Edit the workers file (called slaves in Hadoop 2.x; renamed to workers in 3.x)
    # Edit workers
    vim workers
    
    # Remove the default localhost and add all host names (the ones configured earlier in the hosts file)
    # For example:
    
    hdfs_master
    hdfs_name1
    hdfs_name2
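    For the workers entries above to resolve, every node's /etc/hosts needs matching entries. A sketch (the IP addresses here are placeholders; use your machines' real ones):

```shell
# Append the cluster host names to /etc/hosts on every node
# (IP addresses below are placeholders for your own)
sudo tee -a /etc/hosts <<'EOF'
192.168.1.10  hdfs_master
192.168.1.11  hdfs_name1
192.168.1.12  hdfs_name2
EOF
```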
    
    
    

  10. Edit the hadoop-env.sh script and fill in the JAVA_HOME path
    # Find JAVA_HOME, uncomment it, and set it to the host's actual JAVA_HOME path
    
    ##
    ## THIS FILE ACTS AS THE MASTER FILE FOR ALL HADOOP PROJECTS.
    ## SETTINGS HERE WILL BE READ BY ALL HADOOP COMMANDS.  THEREFORE,
    ## ONE CAN USE THIS FILE TO SET YARN, HDFS, AND MAPREDUCE
    ## CONFIGURATION OPTIONS INSTEAD OF xxx-env.sh.
    ##
    ## Precedence rules:
    ##
    ## {yarn-env.sh|hdfs-env.sh} > hadoop-env.sh > hard-coded defaults
    ##
    ## {YARN_xyz|HDFS_xyz} > HADOOP_xyz > hard-coded defaults
    ##
    
    # Many of the options here are built from the perspective that users
    # may want to provide OVERWRITING values on the command line.
    # For example:
    #
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    #
    # Therefore, the vast majority (BUT NOT ALL!) of these defaults
    # are configured for substitution and not append.  If append
    # is preferable, modify this file accordingly.

  11. Edit the core-site.xml configuration file
    # Add the following inside the configuration element:
    
    
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                   <!-- devhdfsmaster is the master node's host name -->
                   <name>fs.defaultFS</name>
                   <value>hdfs://devhdfsmaster:9000</value>
            </property>
            <property>
                   <name>io.file.buffer.size</name>
                   <value>131072</value>
            </property>
            <property>
                   <!-- Path to the tmp directory under Hadoop; if it is under your home directory, include the home directory in the path -->
                   <name>hadoop.tmp.dir</name>
                   <value>file:(your home directory)/hadoop_ss/hadoop-3.3.1/tmp</value>
            </property>
    </configuration>
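    A typo in these XML files only surfaces at daemon startup with an unhelpful stack trace, so it is worth checking well-formedness after each edit. One way, using Python's standard library (an assumption: python3 is installed, which it is by default on recent Ubuntu Server):

```shell
# Parse the file; a traceback / non-zero exit means the XML is malformed
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' \
    "$HADOOP_HOME/etc/hadoop/core-site.xml" && echo "core-site.xml is well-formed"
```

    The same one-liner works for hdfs-site.xml, mapred-site.xml, and yarn-site.xml in the following steps.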

  12. Edit the hdfs-site.xml configuration file
    # Modify the contents of the configuration element as shown below.
    
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    
    <!-- ===================================================================
         The following property is for development/testing convenience ONLY;
         do NOT use it in production or pre-release environments.
    ==================================================================== -->
            <property>
                    <name>dfs.permissions.enabled</name>
                    <value>false</value>
            </property>
    <!-- =================================================================== -->
    
    
    
    
            <property>
                   <!-- Path to Hadoop's name directory; if it is under your home directory, include the home directory in the path -->
                   <name>dfs.namenode.name.dir</name>
                   <value>file:(your home directory)/hadoop_ss/hadoop-3.3.1/dfs/name</value>
            </property>
            <property>
                   <!-- Path to Hadoop's data directory; same rule as above -->
                   <name>dfs.datanode.data.dir</name>
                   <value>file:(your home directory)/hadoop_ss/hadoop-3.3.1/dfs/data</value>
            </property>
            <property>
                   <name>dfs.replication</name>
                   <!-- Replication factor; here set to the number of worker nodes -->
                   <value>2</value>
            </property>
            <property>
                   <!-- Legacy key name; in Hadoop 3.x the preferred key is dfs.namenode.http-address -->
                   <name>dfs.http.address</name>
                   <value>(master host name):50070</value>
            </property>
            <property>
                   <name>dfs.namenode.secondary.http-address</name>
                   <value>(master host name):50090</value>
            </property>
    </configuration>

  13. Edit the mapred-site.xml configuration file
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                   <name>mapreduce.framework.name</name>
                   <value>yarn</value>
            </property>
            <property>
                   <name>mapreduce.jobhistory.address</name>
                   <value>(master host name):10020</value>
            </property>
            <property>
                   <name>mapreduce.jobhistory.webapp.address</name>
                   <value>(master host name):19888</value>
            </property>
    </configuration>

  14. Edit the yarn-site.xml configuration file
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
            <property>
                   <name>yarn.resourcemanager.address</name>
                   <value>(master host name):8032</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.scheduler.address</name>
                   <value>(master host name):8030</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.resource-tracker.address</name>
                   <value>(master host name):8031</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.admin.address</name>
                   <value>(master host name):8033</value>
            </property>
            <property>
                   <name>yarn.resourcemanager.webapp.address</name>
                   <value>(master host name):8088</value>
            </property>
            <property>
                   <name>yarn.nodemanager.aux-services</name>
                   <value>mapreduce_shuffle</value>
            </property>
            <property>
                   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
    </configuration>

  15. Send the configured Hadoop directory to the other nodes
    scp -r (hadoop directory) (user name)@(host name):

    Don't forget to repeat [step 6] (editing the shell profile to append the Hadoop paths) on every other node as well.
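    With the example host names used earlier in the workers file (the user name `ubuntu` is an assumption; adjust to your cluster), the copy could look like:

```shell
# Copy the whole Hadoop directory tree to each worker's home directory
# (user name and host names are examples)
scp -r ~/hadoop_ss ubuntu@hdfs_name1:~/
scp -r ~/hadoop_ss ubuntu@hdfs_name2:~/
```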

  16. Adjust the permissions of the Hadoop directory on each node
    chmod -R a+w hadoop_ss/hadoop-3.3.1

  17. Format the NameNode
    $HADOOP_HOME/bin/hdfs namenode -format
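    If the format succeeded, the NameNode has written its metadata under the name directory created in step 7; a quick check (assuming the directory layout from this guide):

```shell
# A successful format creates current/VERSION under the name directory;
# it records, among other things, the clusterID and layout version
cat "$HADOOP_HOME/dfs/name/current/VERSION"
```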

  18. Common commands
    # Start all daemons (HDFS and YARN)
    start-all.sh
    
    # Stop all daemons
    stop-all.sh
    
    # Upload an image into the /TestPhoto directory
    hadoop fs -put test.png /TestPhoto
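    After start-all.sh, it is worth checking that the expected daemons actually came up (jps ships with the JDK installed in step 2):

```shell
# On the master, jps should list NameNode, SecondaryNameNode and ResourceManager
# (plus DataNode/NodeManager if the master is also listed in workers)
jps

# On each worker, jps should list DataNode and NodeManager

# Cluster-wide view: number of live DataNodes and their capacity
hdfs dfsadmin -report
```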

  19. View the web console in a browser at http://(master node IP):50070 (this is the port set via dfs.http.address in step 12; the stock Hadoop 3.x default is 9870).

Notes:

If a web page or program hits upload/download errors, or cannot resolve the cluster's host names, try giving the machine running the program or web server a copy of the Hadoop servers' hosts entries (on Windows the hosts file lives in C:\Windows\System32\drivers\etc). This is probably still not a good solution; I hope to find a better one later.
