Requirements
Hadoop can run in three modes:
Local (standalone) mode: for local testing.
Pseudo-distributed mode: sometimes used by small companies.
Fully distributed mode: the usual choice for large companies.
Goal: install a fully distributed 3-node Hadoop cluster, with these placement constraints:
Do not install the NameNode and the SecondaryNameNode on the same server.
For memory reasons, do not place the ResourceManager on the same machine as the NameNode or the SecondaryNameNode.
VM installation
Build a template VM first; the remaining cluster machines are then created by cloning it.
Linux version: CentOS 7.5
Network configuration
Virtual network settings
Host NIC settings
VM network settings
/etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.10.100
GATEWAY=192.168.10.2
DNS1=192.168.10.2
Set the hostname
/etc/hostname
hadoop100
Host mappings
/etc/hosts
192.168.10.100 hadoop100
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
192.168.10.104 hadoop104
Reboot the VM
reboot
# check the VM's IP address
ip addr
ping www.baidu.com  # check outbound connectivity
Install dependencies
# extra packages (EPEL repository)
yum install -y epel-release
# network tools (ifconfig etc.); optional
yum install -y net-tools
# text editor; optional
yum install -y vim
# remote sync tool; optional here, but needed by the xsync script later
yum install -y rsync
Disable the firewall
# stop the firewall now
systemctl stop firewalld
# keep the firewall from starting on boot
systemctl disable firewalld.service
Create a regular user
If a regular user was already created during the CentOS install, skip this step.
useradd atiaisi
passwd atiaisi
Grant the atiaisi user sudo privileges
# /etc/sudoers
# NOPASSWD:ALL means sudo will not prompt this user for a password
atiaisi ALL=(ALL) NOPASSWD:ALL
Create the directories that will hold the Hadoop software
mkdir -p /opt/module
mkdir -p /opt/software
chown -R atiaisi:atiaisi /opt/module
chown -R atiaisi:atiaisi /opt/software
Log in to the VM as atiaisi.
Upload hadoop-3.1.3.tar.gz and jdk-8u212-linux-x64.tar.gz to /opt/software.
Configure the JDK
Extract the archive
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module
Configure the JDK environment variables
# /etc/profile.d/my_env.sh
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
Apply the environment variables
source /etc/profile
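Why a file under /etc/profile.d works: /etc/profile sources every *.sh file in that directory, so the variables take effect in any login shell (or after `source /etc/profile`). A minimal demonstration of the mechanism, using a throwaway directory under /tmp instead of the real /etc:

```shell
# Demonstration only: mimic /etc/profile picking up a profile.d snippet.
# The real file is /etc/profile.d/my_env.sh with the same two lines.
mkdir -p /tmp/profile.d
cat > /tmp/profile.d/my_env.sh <<'EOF'
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
EOF
# /etc/profile does the equivalent of this loop over /etc/profile.d/*.sh:
for f in /tmp/profile.d/*.sh; do . "$f"; done
echo "JAVA_HOME is now $JAVA_HOME"
```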
Install Hadoop
Extract the archive
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
Configure the Hadoop environment variables
# append to /etc/profile.d/my_env.sh
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Apply the environment variables
source /etc/profile
Directory layout of the Hadoop installation
[atiaisi@hadoop100 hadoop-3.1.3]$ tree /opt/module/hadoop-3.1.3/ -I "share|lib|libexec|include|NOTICE.txt|README.txt|LICENSE.txt"
/opt/module/hadoop-3.1.3/
├── bin
│ ├── container-executor
│ ├── hadoop
│ ├── hadoop.cmd
│ ├── hdfs # hdfs command
│ ├── hdfs.cmd
│ ├── mapred # mapreduce command
│ ├── mapred.cmd
│ ├── test-container-executor
│ ├── yarn # yarn command
│ └── yarn.cmd
├── etc
│ └── hadoop
│ ├── capacity-scheduler.xml
│ ├── configuration.xsl
│ ├── container-executor.cfg
│ ├── core-site.xml
│ ├── hadoop-env.cmd
│ ├── hadoop-env.sh
│ ├── hadoop-metrics2.properties
│ ├── hadoop-policy.xml
│ ├── hadoop-user-functions.sh.example
│ ├── hdfs-site.xml # HDFS config file
│ ├── httpfs-env.sh
│ ├── httpfs-log4j.properties
│ ├── httpfs-signature.secret
│ ├── httpfs-site.xml
│ ├── kms-acls.xml
│ ├── kms-env.sh
│ ├── kms-log4j.properties
│ ├── kms-site.xml
│ ├── log4j.properties
│ ├── mapred-env.cmd
│ ├── mapred-env.sh
│ ├── mapred-queues.xml.template
│ ├── mapred-site.xml # MapReduce config file
│ ├── shellprofile.d
│ │ └── example.sh
│ ├── ssl-client.xml.example
│ ├── ssl-server.xml.example
│ ├── user_ec_policies.xml.template
│ ├── workers
│ ├── yarn-env.cmd
│ ├── yarn-env.sh
│ ├── yarnservice-log4j.properties
│ └── yarn-site.xml # YARN config file
└── sbin
├── distribute-exclude.sh
├── FederationStateStore
│ ├── MySQL
│ │ ├── dropDatabase.sql
│ │ ├── dropStoreProcedures.sql
│ │ ├── dropTables.sql
│ │ ├── dropUser.sql
│ │ ├── FederationStateStoreDatabase.sql
│ │ ├── FederationStateStoreStoredProcs.sql
│ │ ├── FederationStateStoreTables.sql
│ │ └── FederationStateStoreUser.sql
│ └── SQLServer
│ ├── FederationStateStoreStoreProcs.sql
│ └── FederationStateStoreTables.sql
├── hadoop-daemon.sh
├── hadoop-daemons.sh
├── httpfs.sh
├── kms.sh
├── mr-jobhistory-daemon.sh
├── refresh-namenodes.sh
├── start-all.cmd
├── start-all.sh
├── start-balancer.sh
├── start-dfs.cmd
├── start-dfs.sh # starts HDFS
├── start-secure-dns.sh
├── start-yarn.cmd
├── start-yarn.sh # starts YARN
├── stop-all.cmd
├── stop-all.sh
├── stop-balancer.sh
├── stop-dfs.cmd
├── stop-dfs.sh
├── stop-secure-dns.sh
├── stop-yarn.cmd
├── stop-yarn.sh
├── workers.sh
├── yarn-daemon.sh
└── yarn-daemons.sh
8 directories, 78 files
This completes the template VM.
Cloning the VMs
Log in as root.
Clone three VMs from the template: hadoop102, hadoop103, hadoop104.
On each clone, update the IP address, hostname, and host mappings.
Reboot each clone and verify the configuration; make sure every step is correct before continuing.
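A sketch of the per-clone edits, using the hostnames and addresses from this guide. On a real clone you would edit /etc/hostname and /etc/sysconfig/network-scripts/ifcfg-ens33 as root; the sed pattern is an illustration, shown here against temp copies so it is safe to try:

```shell
# Values for one particular clone (hadoop102 in this example)
HOST=hadoop102
IP=192.168.10.102

# Stand-ins for /etc/hostname and .../ifcfg-ens33, seeded with template values
echo hadoop100 > /tmp/hostname.sample
printf 'BOOTPROTO=static\nIPADDR=192.168.10.100\n' > /tmp/ifcfg.sample

# The actual edits: replace the hostname, rewrite the IPADDR line
echo "$HOST" > /tmp/hostname.sample
sed -i "s/^IPADDR=.*/IPADDR=$IP/" /tmp/ifcfg.sample
cat /tmp/hostname.sample /tmp/ifcfg.sample
```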
Configure passwordless SSH
Log in as atiaisi.
To let machine A (hostnameA) reach machine B (hostnameB) over SSH without a password, run on A:
# generate a key pair
ssh-keygen -t rsa
# copy the public key to machine B
ssh-copy-id hostnameB
After this, A can ssh to B without entering a password.
For B to reach A, run the same steps on B.
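On a real cluster, every node that must reach the others (at minimum hadoop102 and hadoop103, which run the start scripts) repeats the ssh-copy-id step for each peer. A loop sketch; DRY_RUN defaults to 1 here so it only prints the commands — set it to 0 on an actual node after generating a key pair:

```shell
DRY_RUN=${DRY_RUN:-1}   # 1 = just print the commands; 0 = really run ssh-copy-id
copied=""
for host in hadoop102 hadoop103 hadoop104; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh-copy-id $host"
    else
        ssh-copy-id "$host"
    fi
    copied="$copied $host"
done
```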
Writing the xsync distribution script
The script pushes files to every node in the cluster in one command.
#!/bin/bash

# 1. make sure at least one argument was given
if [ $# -lt 1 ]
then
    echo "Not enough arguments!"
    exit
fi

# 2. loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo ========== $host ==========
    # 3. loop over every file/directory argument
    for file in "$@"
    do
        # 4. only sync things that actually exist
        if [ -e "$file" ]
        then
            # 5. resolve the absolute parent directory
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. get the bare file name
            fname=$(basename "$file")
            # recreate the directory remotely, then sync
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
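The key trick in the script is steps 5 and 6: each argument is resolved to an absolute, symlink-free parent directory plus a bare file name, so the identical path can be recreated on the remote side even when the argument is relative. The resolution itself can be tried locally, with no ssh involved:

```shell
file=/etc/hostname                       # any path works here
pdir=$(cd -P "$(dirname "$file")"; pwd)  # absolute, symlink-free parent dir
fname=$(basename "$file")                # bare file name
echo "would sync $pdir/$fname"
```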
Configure environment variables
Save the script as xsync under /home/atiaisi/bin and make it executable (chmod +x), so the PATH entry below picks it up.
/etc/profile.d/my_env.sh
# XSYNC_HOME
export XSYNC_HOME=/home/atiaisi
export PATH=$PATH:$XSYNC_HOME/bin
Cluster configuration
Log in as atiaisi.
Config file directory: /opt/module/hadoop-3.1.3/etc/hadoop
core-site.xml
Common settings (the core config file)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
<description>A base for other temporary directories.</description>
</property>
<!-- use atiaisi as the static user for HDFS web UI logins -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>atiaisi</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode (nn) web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
</property>
<!-- SecondaryNameNode (2nn) web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- use the shuffle auxiliary service for MapReduce -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<!-- environment variable inheritance -->
<property>
<description>Environment variables that containers may override rather than use NodeManager's default.</description>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Without log aggregation, logs of finished YARN jobs cannot be viewed from the web UI. -->
<!-- enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
<!-- keep aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
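As a sanity check on the retention value above, 604800 is exactly 7 days in seconds:

```shell
echo $((7 * 24 * 60 * 60))   # prints 604800
```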
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server IPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<!-- JobHistory server web UI address -->
<!-- Without this, jumping to job history from the YARN web UI fails. -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>
Configure workers
# /opt/module/hadoop-3.1.3/etc/hadoop/workers
hadoop102
hadoop103
hadoop104
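One known pitfall: the workers file must contain exactly one hostname per line, with no blank lines and no trailing spaces, or the start scripts misbehave. A quick check, run here against a sample copy (on the cluster, point WORKERS at the real file):

```shell
WORKERS=/tmp/workers.sample              # stand-in for etc/hadoop/workers
printf 'hadoop102\nhadoop103\nhadoop104\n' > "$WORKERS"
# count lines that are blank or end in whitespace; should be 0
bad=$(grep -c -E '^[[:space:]]*$|[[:space:]]$' "$WORKERS" || true)
echo "blank or trailing-whitespace lines: $bad"
```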
Distribute the configuration to all nodes
xsync /opt/module/hadoop-3.1.3/etc/hadoop/
Bringing up the cluster
Start the cluster
- Format the NameNode
Run only on hadoop102, and only the first time the cluster is started:
hdfs namenode -format
- Start HDFS
Run only on hadoop102:
start-dfs.sh
Check the cluster state and verify that the NameNode, DataNodes, and 2NN match the initial cluster plan:
jps
- Start YARN
Run only on hadoop103:
start-yarn.sh
Check the cluster state again:
jps
Verify that the ResourceManager and NodeManagers match the plan.
- Start the history server on hadoop102
# start the history server
mapred --daemon start historyserver
# confirm it started
jps
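Putting the configs above together, the expected daemon layout per node is as follows (derived from the core/hdfs/yarn/mapred-site settings in this guide; on a live cluster, compare each list against `ssh <host> jps`):

```shell
# Expected layout for this guide's cluster (bash associative array)
declare -A expected=(
    [hadoop102]="NameNode DataNode NodeManager JobHistoryServer"
    [hadoop103]="ResourceManager DataNode NodeManager"
    [hadoop104]="SecondaryNameNode DataNode NodeManager"
)
for host in hadoop102 hadoop103 hadoop104; do
    echo "$host: ${expected[$host]}"
    # live check would be: ssh "$host" jps
done
```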
- View the HDFS NameNode web UI
Open http://hadoop102:9870 in a browser
to inspect the data stored on HDFS.
- View the YARN ResourceManager web UI
Open http://hadoop103:8088 in a browser
to see the jobs running on YARN.
Testing the cluster
Upload files to the cluster
# create a directory on HDFS
hadoop fs -mkdir /packages
# upload files
hadoop fs -put /opt/software/jdk-8u212-linux-x64.tar.gz /packages
hadoop fs -put /tmp/words.txt /
Check the uploaded files
Run the wordcount example to trigger a MapReduce job:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /words.txt /output
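What wordcount computes can be mimicked locally on a sample file; the MR job writes the same word/count pairs to /output on HDFS (typically viewable afterwards with `hadoop fs -cat /output/part-r-00000`). A pipeline sketch of the map (split into words) and reduce (count per word) steps:

```shell
printf 'hello world\nhello hadoop\n' > /tmp/words.sample
# split into one word per line, then count occurrences of each word
result=$(tr -s ' ' '\n' < /tmp/words.sample | sort | uniq -c | awk '{print $2"\t"$1}')
echo "$result"
```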
Job record list
Job record details
Job record logs