Hadoop 3.3.6 Cluster Setup


1. Prerequisites

  1. Servers: three machines running CentOS 7, one master and two workers

     | IP | hostname | Role |
     | --- | --- | --- |
     | 192.168.108.137 | centos137 | master |
     | 192.168.108.138 | centos138 | node |
     | 192.168.108.139 | centos139 | node |

     All three servers must be able to reach each other by hostname.

    # change the hostname (repeat on each server with its own name)
    hostnamectl set-hostname centos137
    # on all three servers, append the following entries to /etc/hosts
    192.168.108.137 centos137
    192.168.108.138 centos138
    192.168.108.139 centos139
    # reboot
    reboot
    
  2. A Hadoop cluster (version 2.2+) with the HDFS service installed

  3. JDK 1.8+ (installing the JDK yourself is recommended; the JAVA_HOME environment variable must be set, see the sketch after this list)

This guide uses Hadoop 3.3.6.
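
A minimal sketch for the JAVA_HOME prerequisite, assuming the JDK has been unpacked to /export/servers/jdk (the same path referenced later in hadoop-env.sh); adjust the path to your actual JDK location:

# append JAVA_HOME to /etc/profile (run as root on every server)
echo 'export JAVA_HOME=/export/servers/jdk' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile
# verify
java -version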

2. Role Assignment

Node role assignment

| Node | IP | NN | SNN | DN | RM | NM | HS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| centos137 | 192.168.108.137 | ✓ |   | ✓ | ✓ | ✓ |   |
| centos138 | 192.168.108.138 |   | ✓ | ✓ |   | ✓ | ✓ |
| centos139 | 192.168.108.139 |   |   | ✓ |   | ✓ |   |

Role legend

| HDFS | YARN | MapReduce |
| --- | --- | --- |
| NameNode (NN) | ResourceManager (RM) | HistoryServer (HS) |
| SecondaryNameNode (SNN) | NodeManager (NM) |   |
| DataNode (DN) |   |   |

Default port list by component

| Component | Port | Description |
| --- | --- | --- |
| HDFS | 8020 | NameNode |
| HDFS | 50010, 50020, 50075 | DataNode (Hadoop 2.x defaults; in 3.x these become 9866, 9867, 9864) |
| YARN | 8032 | ResourceManager |
| YARN | 8088 | ResourceManager web UI |
| YARN | 8040 | NodeManager protocol |
| YARN | 8042 | NodeManager web UI |
| MapReduce | 10020 | HistoryServer protocol |
| MapReduce | 19888 | HistoryServer web UI |
| Hadoop Common | 49152~65535 | Inter-process communication |
| ZooKeeper | 2181 | Coordination service for the Hadoop cluster |
| Hadoop web UI | 9870 | NameNode web UI |
| Hadoop web UI | 8088 | ResourceManager web UI |
| Hadoop web UI | 19888 | JobHistoryServer web UI |
| Hadoop RPC | 8019 | Remote procedure call |
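
The original steps do not mention the firewall, but on CentOS 7 firewalld usually blocks these ports between nodes. A common approach during initial setup (an assumption, not part of the original guide) is to stop firewalld, or to open only the ports you need:

# run as root on every server
systemctl stop firewalld
systemctl disable firewalld
# alternatively, keep the firewall and open individual ports, e.g. the NameNode web UI:
firewall-cmd --permanent --add-port=9870/tcp
firewall-cmd --reload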

Installation package, mirror in China: Index of /apache/hadoop/common/hadoop-3.3.6 (tsinghua.edu.cn)

2.1 Software Installation (all servers)

Passwordless SSH login

Create a hadoop user (on every server):

useradd hadoop
passwd hadoop

You can ignore the warning that the password is too short.

# switch to the hadoop user
su hadoop

First log in to localhost with the password once so that the ~/.ssh directory gets created:

ssh localhost

Generate the key pair:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Distribute the public key (run on the master node):
# the argument is the hostname of the node you want passwordless login to
ssh-copy-id centos137
ssh-copy-id centos138
ssh-copy-id centos139

Test that centos137 can log in to the other nodes without a password, for example log in to centos138:

ssh centos138

On every VM, create an export directory under the filesystem root, with data, servers, and software subdirectories:

mkdir -p /export/data
mkdir -p /export/servers
mkdir -p /export/software

1. Unpack
tar -zxvf hadoop-3.3.6.tar.gz -C /export/servers/
2. Configure environment variables
vi /etc/profile
# append at the end of the file
export HADOOP_HOME=/export/servers/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# apply the environment variables
source /etc/profile

3. Verify
[root@centos137 servers]# hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /export/servers/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar

** Mind the user/environment switch when running the following commands **
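
A quick check for the note above: /etc/profile is read by login shells, so after switching to the hadoop user confirm the variables are still set (a sketch, assuming the hadoop user runs the cluster):

su - hadoop              # login shell, re-reads /etc/profile
echo $HADOOP_HOME        # should print /export/servers/hadoop-3.3.6
hadoop version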

2.2 Master Node Configuration

Go to the Hadoop configuration directory
cd /export/servers/hadoop-3.3.6/etc/hadoop
Edit hadoop-env.sh
vim hadoop-env.sh
# add JAVA_HOME
export JAVA_HOME=/export/servers/jdk

Edit workers

vim workers

Add:
centos137
centos138
centos139

Resulting contents:

[hadoop@centos138 sbin]$ cat workers 
centos137
centos138
centos139
Edit core-site.xml
vim core-site.xml
  • Configure the HDFS URI and the temporary directory
  • Set the static user for HDFS web UI login
  • Add the following configuration:
<configuration>
    <!--setting HDFS-->
    <property>
        <name>fs.defaultFS</name>
        <!--setting namenode-->
        <value>hdfs://centos137:9000</value>
    </property>
    <!-- temp directory; default: /tmp/hadoop-${user.name} -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-3.3.6/tmp</value>
    </property>
    <!-- static user for HDFS web UI login -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>

Edit hdfs-site.xml
vim hdfs-site.xml
  • Set the HDFS replication factor

  • Configure the secondary namenode address

  • Add the following configuration:

<configuration>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!--setting secondary namenode-->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>centos138:50090</value>
    </property>
</configuration>
Edit mapred-site.xml
vim mapred-site.xml
  • Specify the framework MapReduce runs on, here YARN (the default is local)
  • History server addresses

Add the following configuration:

<configuration>
        <!-- MapReduce execution framework: yarn or local -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>centos138:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>centos138:19888</value>
        </property>
</configuration>
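
Not part of the original configuration: on Hadoop 3.x, MapReduce jobs submitted to YARN sometimes fail with an error that org.apache.hadoop.mapreduce.v2.app.MRAppMaster cannot be found. If that happens, a commonly added, optional fix is to point HADOOP_MAPRED_HOME at the install directory in mapred-site.xml (shown here as a hedged example, not as part of the original guide):

        <property>
                <name>yarn.app.mapreduce.am.env</name>
                <value>HADOOP_MAPRED_HOME=/export/servers/hadoop-3.3.6</value>
        </property>
        <property>
                <name>mapreduce.map.env</name>
                <value>HADOOP_MAPRED_HOME=/export/servers/hadoop-3.3.6</value>
        </property>
        <property>
                <name>mapreduce.reduce.env</name>
                <value>HADOOP_MAPRED_HOME=/export/servers/hadoop-3.3.6</value>
        </property>
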
Edit yarn-site.xml
  • Specify the hostname of the YARN ResourceManager
  • Enable the MapReduce shuffle auxiliary service
  • Enable log aggregation
  • Set the log server address
  • Set log retention to 7 days
vim yarn-site.xml
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>centos137</value>
        </property>
        <property>
                <name>yarn.nodemanager.env-whitelist</name>
                <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!-- open log aggregation -->
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <!-- log server -->
        <property>
                <name>yarn.log.server.url</name>
                <value>http://centos138:19888/jobhistory/logs</value>
        </property>
        <!-- log retention: 604800 s = 7 days -->
        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>604800</value>
        </property>
</configuration>

Directory ownership

Grant ownership of the installation directories to the hadoop user (run as root):

chown -R hadoop:hadoop /export/

2.3 File Distribution

Distribute the configured installation from the master to the worker nodes:

scp -r /export/servers centos138:/export
scp -r /export/servers centos139:/export

2.4 Starting the Hadoop Cluster

Format the NameNode

hdfs namenode -format

Formatting the NameNode generates a new cluster ID. If the DataNodes still hold an old cluster ID, it will no longer match the NameNode's new one and the DataNodes will be unable to find the NameNode. So before formatting the NameNode again, first delete the data and logs directories on every node, then format. Normally this is done only once, when the cluster is first set up.
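
A cleanup sketch for re-formatting, based on the directories configured above (hadoop.tmp.dir is /export/servers/hadoop-3.3.6/tmp, and logs default to the logs directory under the install path); stop the cluster first and run this on every node before formatting again:

rm -rf /export/servers/hadoop-3.3.6/tmp
rm -rf /export/servers/hadoop-3.3.6/logs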

Run on the master (centos137):

# start the cluster
/export/servers/hadoop-3.3.6/sbin/start-all.sh
# stop the cluster
/export/servers/hadoop-3.3.6/sbin/stop-all.sh
[hadoop@centos137 hadoop-3.3.6]$ /export/servers/hadoop-3.3.6/sbin/start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [centos137]
Starting datanodes
Starting secondary namenodes [centos138]
Starting resourcemanager
Starting nodemanagers

If there is error output during startup, check that the directories distributed in step 2.3 are correct.

Start the history server (on the centos138 node):

 mapred --daemon start historyserver
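
The matching stop command, if needed later:

 mapred --daemon stop historyserver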

Alternatively, start HDFS and YARN separately:

# start
start-dfs.sh
start-yarn.sh
# stop
stop-dfs.sh
stop-yarn.sh

2.5 Cluster Deployment Verification

  • Run the jps command on every node to verify that the started roles are correct

    Run: jps

    centos137 roles: NN, RM, NM, DN

    [hadoop@centos137 hadoop-3.3.6]$ jps
    34082 ResourceManager
    34228 NodeManager
    33638 DataNode
    33497 NameNode
    37790 Jps


centos138 roles: SNN, DN, NM, HS

[hadoop@centos138 sbin]$ jps
32530 SecondaryNameNode
26679 JobHistoryServer
33321 Jps
32733 NodeManager
32383 DataNode

centos139 roles: NM, DN

[hadoop@centos139 hadoop]$ jps
51088 NodeManager
51685 Jps
50823 DataNode

Access the web UIs according to the default port list in section 2.
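
Beyond jps, a simple smoke test (a sketch, not part of the original guide) is to open the web UIs and run the bundled example job from any node:

# Web UIs (from the role assignment and port list above)
#   NameNode:         http://192.168.108.137:9870
#   ResourceManager:  http://192.168.108.137:8088
#   JobHistoryServer: http://192.168.108.138:19888
# HDFS check: write and list a file
hdfs dfs -mkdir -p /tmp/smoke
hdfs dfs -put /etc/hosts /tmp/smoke/
hdfs dfs -ls /tmp/smoke
# MapReduce check: run the bundled pi example
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 10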


References:
https://blog.csdn.net/weixin_43655425/article/details/134751084
https://blog.csdn.net/tang5615/article/details/120382513
