Notes on Installing a Hadoop Distributed Cluster

Contents

Install Java

Install SSH

Extract Hadoop

Set the environment variables

Complete the Hadoop configuration

Environment variable configuration file

Global core configuration file

HDFS configuration file

YARN configuration file

MapReduce configuration file

workers file


Hadoop is a distributed system infrastructure developed under the Apache Foundation. It lets users write distributed programs without having to understand the low-level details of distribution, and harnesses the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, HDFS (Hadoop Distributed File System). HDFS is highly fault-tolerant, is designed to run on low-cost hardware, and provides high-throughput access to application data, which makes it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements and allows streaming access to data in the file system. The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it.

Create one virtual machine in VMware Workstation with ubuntu-22.04.3-desktop-amd64 as the operating system, then use linked clones (full clones also work) to create three virtual machines: master, slave1 and slave2.

Cluster node plan

HostName    IP Address
master      192.168.200.10
slave1      192.168.200.20
slave2      192.168.200.30

Download Hadoop. You can download it on the host machine (with a download manager such as Xunlei/Thunder) and then transfer it into the virtual machine with a tool like MobaXterm or Xshell. Installing VMware Tools in the virtual machines also makes things easier.

hadoop-3.2.4: https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz

When installing Ubuntu, create a user named hadoop, then set the hostname and configure the IP address on each machine.
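If you would rather download inside the virtual machine, fetching the tarball with wget from the mirror link above also works (assuming the VM has network access):

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz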

The first thing to do after entering the system is to switch the apt sources to a domestic (China) mirror; otherwise installing anything with apt can take ages.
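A rough sketch of the switch (the Aliyun mirror here is only an example; use whichever domestic mirror you prefer):

# Back up the original source list, then point it at a domestic mirror
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo sed -i 's|http://.*archive.ubuntu.com|http://mirrors.aliyun.com|g' /etc/apt/sources.list
sudo sed -i 's|http://security.ubuntu.com|http://mirrors.aliyun.com|g' /etc/apt/sources.list
sudo apt update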

Edit /etc/hosts and write in the hostnames and their corresponding IPs (do this on every node).
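With the node plan above, the entries look like this:

192.168.200.10 master
192.168.200.20 slave1
192.168.200.30 slave2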

Install Java and SSH. Java 8 (JDK 1.8) is used here.

The difference between the JRE and the JDK: the JRE (Java Runtime Environment) is the environment required to run Java programs. The JDK (Java Development Kit) includes the JRE plus the tools and class libraries needed to develop Java programs.

Install Java:

sudo apt install openjdk-8-jdk -y

If you installed it with apt, you can look up the installation location with this command:

readlink -f $(which java) | sed "s:bin/java::"

After the installation finishes, run java -version to check that the version information is printed.

Install SSH:

sudo apt install openssh-server -y

Configure passwordless SSH login

Generate the public/private key pair by running:

ssh-keygen -t rsa


Run this once on every node, then append the public keys of all three nodes to the authorized_keys file.

Then distribute the authorized_keys file to the other nodes with scp.

authorized_keys lives in the ~/.ssh directory.
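A minimal sketch of that exchange, run as the hadoop user (assuming the hostnames from /etc/hosts are reachable; the copy steps will prompt for passwords once):

# On every node: generate the key pair and append the node's own public key
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# On master: pull in the public keys of slave1 and slave2 as well
ssh slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Distribute the merged authorized_keys back to the slaves
scp ~/.ssh/authorized_keys slave1:~/.ssh/
scp ~/.ssh/authorized_keys slave2:~/.ssh/

# Verify passwordless login
ssh slave1 hostname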

Extract Hadoop

Go to the directory containing the tarball and extract it into /opt (if /opt is not yet writable by your user, either prefix the command with sudo or do the chmod step below first):

tar -zxvf hadoop-3.2.4.tar.gz -C /opt/

You can change the permissions of the /opt directory to 777 so that distributing files to the other nodes later is easier; otherwise you will get Permission denied errors:

sudo chmod 777 /opt/

chmod -R applies the change recursively to all files and directories inside.
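For example, to open up the extracted Hadoop directory recursively as well (a blunt but convenient setting for a lab cluster):

sudo chmod -R 777 /opt/hadoop-3.2.4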

Personally I like to name installation directories in the form "software-name-version", for example:

hadoop-3.2.4

Edit the .bashrc file in the hadoop user's home directory.

Set the environment variables:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$JAVA_HOME/lib
export HADOOP_HOME=/opt/hadoop-3.2.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Remember to run source ~/.bashrc so that the environment variables take effect.
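A quick sanity check that the variables are in place (hadoop version only works once $HADOOP_HOME/bin is on the PATH):

source ~/.bashrc
echo $JAVA_HOME
echo $HADOOP_HOME
hadoop version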

Complete the Hadoop configuration

Environment variable configuration file:

vim /opt/hadoop-3.2.4/etc/hadoop/hadoop-env.sh

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Global core configuration file:

vim /opt/hadoop-3.2.4/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
     <name>fs.defaultFS</name>
     <value>hdfs://master:9000</value>
 </property>
 <property>
     <name>hadoop.tmp.dir</name>
     <value>/opt/hadoopTmp/</value>
 </property>
 <!-- Set the static user for HDFS web UI login to hadoop -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
</property>
</configuration>
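Since hadoop.tmp.dir above points at /opt/hadoopTmp/, it is worth creating that directory on every node and handing it to the hadoop user; this is an optional precaution (HDFS can also create it itself as long as /opt is writable):

sudo mkdir -p /opt/hadoopTmp
sudo chown -R hadoop:hadoop /opt/hadoopTmp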

HDFS configuration file:

vim /opt/hadoop-3.2.4/etc/hadoop/hdfs-site.xml 

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
      <name>dfs.namenode.http-address</name>
      <value>master:50070</value>
 </property>
 <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>slave1:50090</value>
 </property>
 <property>
      <name>dfs.replication</name>
      <value>3</value>
 </property>
 <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/hadoopTmp/dfs/name</value>
 </property>
 <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/hadoopTmp/dfs/data</value>
 </property>
</configuration>

YARN configuration file:

vim /opt/hadoop-3.2.4/etc/hadoop/yarn-site.xml 

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
 <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
 </property>
 <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
 </property>
 <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
 </property>
 <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
 <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
 </property>
 <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
 </property>
 <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/user/container/logs</value>
 </property>
</configuration>

MapReduce configuration file:

vim /opt/hadoop-3.2.4/etc/hadoop/mapred-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
 </property>
 <property>
     <name>mapreduce.jobhistory.address</name>
     <value>slave2:10020</value>
 </property>
 <property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>slave2:19888</value>
 </property>
 <property>
     <name>mapreduce.jobhistory.intermediate-done-dir</name>
     <value>${hadoop.tmp.dir}/mr-history/tmp</value>
 </property>
 <property>
     <name>mapreduce.jobhistory.done-dir</name>
     <value>${hadoop.tmp.dir}/mr-history/done</value>
 </property>
</configuration>

workers file (in Hadoop 3.x the list of worker nodes lives in etc/hadoop/workers; it was called slaves in Hadoop 2.x):

vim /opt/hadoop-3.2.4/etc/hadoop/workers

master
slave1
slave2
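With the configuration finished on master, distribute the Hadoop directory (and the .bashrc with the environment variables, if you keep it identical everywhere) to the slave nodes; roughly:

scp -r /opt/hadoop-3.2.4 slave1:/opt/
scp -r /opt/hadoop-3.2.4 slave2:/opt/
scp ~/.bashrc slave1:~/
scp ~/.bashrc slave2:~/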

For the first start-up, format the NameNode first:

hdfs namenode -format
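After the format succeeds, the cluster can be brought up from master with the standard scripts under $HADOOP_HOME/sbin, and jps on each node shows which daemons are running (a quick sketch of the start-up, not covered further in these notes):

start-dfs.sh
start-yarn.sh
jps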
