Hadoop Environment Setup: Fully Distributed Mode

I. Provision Alibaba Cloud ECS Servers

1. Create the instances.
[Screenshot: creating ECS instances in the Alibaba Cloud console]

2. After the instances are created, the instance list looks as follows.
[Screenshot: ECS instance list]

II. Connect to the Servers

1. Open a built-in or third-party terminal tool.
Windows: PowerShell or PuTTY
Ubuntu: Terminal
2. Set the connection parameters: the connection type is SSH, the port is 22, and the server IP address is the public IP of the ECS instance created in Section I.
[Screenshot: PuTTY connection settings]

3. Click the Open button. A security alert pops up; click "Yes" to trust the server, and the prompt will not appear again on subsequent connections.
[Screenshot: PuTTY security alert]

4. In the session window, enter the username root and the password. Output like the screenshot below indicates a successful connection.
[Screenshot: successful SSH login]

III. Configure the Hosts Mapping File

Edit the hosts file:

root@hadoop001:~# vim /etc/hosts

Add the following entries to the file:

172.18.48.157   hadoop001       hadoop001
172.18.48.159   hadoop002       hadoop002
172.18.48.158   hadoop003       hadoop003

Confirm that the file was edited correctly:

root@hadoop001:~# cat /etc/hosts
127.0.0.1       localhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost       ip6-localhost   ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

172.18.48.157   hadoop001       hadoop001
172.18.48.159   hadoop002       hadoop002
172.18.48.158   hadoop003       hadoop003
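
To quickly verify that the hostnames resolve, you can ping each node (this assumes the three ECS instances are in the same VPC and the security group allows ICMP between them):

root@hadoop001:~# ping -c 1 hadoop002
root@hadoop001:~# ping -c 1 hadoop003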

IV. Write a Cluster Distribution Script

1. Requirement: copy a file in a loop to the same directory on all nodes.
2. Requirement analysis:
(a) Intended usage of the script:

xsync <path of the file to sync>

(b) Note: a script placed in /usr/local/bin can be executed by the root user from anywhere on the system, because that directory is on the PATH.
3. Script implementation
(a) Create a file named xsync in the /usr/local/bin directory:

root@hadoop001:~# cd /usr/local/bin
root@hadoop001:/usr/local/bin# touch xsync
root@hadoop001:/usr/local/bin# vim xsync
root@hadoop001:/usr/local/bin#

Write the following code in the file:

#!/bin/bash
# 1. Get the number of arguments; exit immediately if none were given
pcount=$#
if ((pcount==0)); then
    echo "no args"
    exit
fi

# 2. Get the file name
p1=$1
fname=$(basename "$p1")
echo fname=$fname

# 3. Get the absolute path of the parent directory
pdir=$(cd -P "$(dirname "$p1")"; pwd)
echo pdir=$pdir

# 4. Get the current user name
user=$(whoami)

# 5. Loop over the other nodes (hadoop002 and hadoop003) and sync the file
for ((num=2; num<4; num++)); do
    host=$(printf "%03d" "$num")
    echo -------------------hadoop$host--------------
    rsync -rvl "$pdir/$fname" "$user@hadoop$host:$pdir"
done

(b) Make the xsync script executable:

root@hadoop001:/usr/local/bin# chmod 777 xsync

(c) How to invoke the script:

xsync <file path>

(d) Distribute the distribution script itself:

root@hadoop001:~# xsync /usr/local/bin/xsync
fname=xsync
pdir=/usr/local/bin
-------------------hadoop002--------------
sending incremental file list
xsync

sent 602 bytes  received 35 bytes  1,274.00 bytes/sec
total size is 514  speedup is 0.81
-------------------hadoop003--------------
sending incremental file list
xsync

sent 602 bytes  received 35 bytes  1,274.00 bytes/sec
total size is 514  speedup is 0.81
root@hadoop001:~#

(e) Distribute the hosts file:

root@hadoop001:~# xsync /etc/hosts
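
If passwordless SSH (Section VIII) has not been set up yet, rsync will prompt for each target node's root password. Afterwards you can optionally confirm the file arrived, for example:

root@hadoop001:~# ssh hadoop002 "tail -n 3 /etc/hosts"
root@hadoop001:~# ssh hadoop003 "tail -n 3 /etc/hosts"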

V. Install the JDK

1. Run the following command to download the JDK 1.8 package.

wget https://download.java.net/openjdk/jdk8u41/ri/openjdk-8u41-b04-linux-x64-14_jan_2020.tar.gz

2. Run the following command to extract the downloaded JDK 1.8 package.

tar -xzf openjdk-8u41-b04-linux-x64-14_jan_2020.tar.gz

3. Run the following command to move and rename the JDK directory.

mv java-se-8u41-ri/ /usr/java8

4. Run the following commands to configure the Java environment variables.

echo 'export JAVA_HOME=/usr/java8' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile

5. Run the following command to check whether Java was installed successfully.

root@hadoop001:~# java -version

If the following information is returned, the installation succeeded.

root@hadoop001:~# java -version
openjdk version "1.8.0_41"
OpenJDK Runtime Environment (build 1.8.0_41-b04)
OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)

VI. Install Hadoop

1. Run the following command to download the Hadoop package.

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz

Note 1: The Apache mirror servers only keep the latest few Hadoop releases; at the time of writing, the latest Hadoop 2 release is 2.10.1.
[Screenshot: Hadoop release list on the mirror]

Note 2: If the mirror mirrors.tuna.tsinghua.edu.cn is unavailable, pick another mirror from the list at http://www.apache.org/mirrors/.
[Screenshot: Apache mirror list]

2. Run the following commands to extract the Hadoop package to /opt/hadoop.

tar -xzf hadoop-2.10.1.tar.gz -C /opt/
mv /opt/hadoop-2.10.1 /opt/hadoop

3. Run the following commands to configure the Hadoop environment variables.

echo 'export HADOOP_HOME=/opt/hadoop/' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile

4. Run the following commands to set JAVA_HOME in the configuration files yarn-env.sh, hadoop-env.sh, and mapred-env.sh.

echo "export JAVA_HOME=/usr/java8" >> /opt/hadoop/etc/hadoop/yarn-env.sh
echo "export JAVA_HOME=/usr/java8" >> /opt/hadoop/etc/hadoop/hadoop-env.sh
echo "export JAVA_HOME=/usr/java8" >> /opt/hadoop/etc/hadoop/mapred-env.sh

5. Run the following command to test whether Hadoop was installed successfully.

hadoop version

If the following information is returned, the installation succeeded.

root@hadoop001:~# hadoop version
Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.10.1.jar

VII. Cluster Configuration

1. Cluster deployment plan

        hadoop001              hadoop002                       hadoop003
HDFS    NameNode, DataNode     DataNode                        SecondaryNameNode, DataNode
YARN    NodeManager            ResourceManager, NodeManager    NodeManager

Rationale for the plan:
HDFS: The NameNode and the SecondaryNameNode use roughly the same amount of memory, so they should not be placed on the same server; otherwise that server's performance suffers (for example, on a server with 128 GB of RAM, putting both on the same machine leaves only about 64 GB for each, which seriously hurts cluster performance).

YARN: The ResourceManager is the master of the whole cluster's resources and is also memory-hungry, so it should be kept away from the NameNode and the SecondaryNameNode. The only server that satisfies this condition is hadoop002.

2. Configure the cluster

root@hadoop001:~# cd /opt/hadoop/etc/hadoop/
root@hadoop001:/opt/hadoop/etc/hadoop# ll
total 172
drwxr-xr-x 2 1000 1000  4096 Sep 14 21:39 ./
drwxr-xr-x 3 1000 1000  4096 Sep 14 21:39 ../
-rw-r--r-- 1 1000 1000  8814 Sep 14 21:39 capacity-scheduler.xml
-rw-r--r-- 1 1000 1000  1335 Sep 14 21:39 configuration.xsl
-rw-r--r-- 1 1000 1000  1211 Sep 14 21:39 container-executor.cfg
-rw-r--r-- 1 1000 1000   774 Sep 14 21:39 core-site.xml
-rw-r--r-- 1 1000 1000  4133 Sep 14 21:39 hadoop-env.cmd
-rw-r--r-- 1 1000 1000  4997 Dec 24 11:29 hadoop-env.sh
-rw-r--r-- 1 1000 1000  2598 Sep 14 21:39 hadoop-metrics2.properties
-rw-r--r-- 1 1000 1000  2490 Sep 14 21:39 hadoop-metrics.properties
-rw-r--r-- 1 1000 1000 10206 Sep 14 21:39 hadoop-policy.xml
-rw-r--r-- 1 1000 1000   775 Sep 14 21:39 hdfs-site.xml
-rw-r--r-- 1 1000 1000  2432 Sep 14 21:39 httpfs-env.sh
-rw-r--r-- 1 1000 1000  1657 Sep 14 21:39 httpfs-log4j.properties
-rw-r--r-- 1 1000 1000    21 Sep 14 21:39 httpfs-signature.secret
-rw-r--r-- 1 1000 1000   620 Sep 14 21:39 httpfs-site.xml
-rw-r--r-- 1 1000 1000  3518 Sep 14 21:39 kms-acls.xml
-rw-r--r-- 1 1000 1000  3139 Sep 14 21:39 kms-env.sh
-rw-r--r-- 1 1000 1000  1788 Sep 14 21:39 kms-log4j.properties
-rw-r--r-- 1 1000 1000  5939 Sep 14 21:39 kms-site.xml
-rw-r--r-- 1 1000 1000 14016 Sep 14 21:39 log4j.properties
-rw-r--r-- 1 1000 1000  1076 Sep 14 21:39 mapred-env.cmd
-rw-r--r-- 1 1000 1000  1535 Dec 24 11:47 mapred-env.sh
-rw-r--r-- 1 1000 1000  4113 Sep 14 21:39 mapred-queues.xml.template
-rw-r--r-- 1 1000 1000   758 Sep 14 21:39 mapred-site.xml.template
-rw-r--r-- 1 1000 1000    10 Sep 14 21:39 slaves
-rw-r--r-- 1 1000 1000  2316 Sep 14 21:39 ssl-client.xml.example
-rw-r--r-- 1 1000 1000  2697 Sep 14 21:39 ssl-server.xml.example
-rw-r--r-- 1 1000 1000  2250 Sep 14 21:39 yarn-env.cmd
-rw-r--r-- 1 1000 1000  4904 Dec 24 11:28 yarn-env.sh
-rw-r--r-- 1 1000 1000   690 Sep 14 21:39 yarn-site.xml

(1) Core configuration file
Configure core-site.xml:

root@hadoop001:/opt/hadoop/etc/hadoop# vi core-site.xml

Add the following properties inside the <configuration> element of the file:

    <!-- Address of the NameNode (HDFS entry point) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>
    <!-- Directory where Hadoop stores its runtime files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>

(2) HDFS configuration file
Configure hdfs-site.xml:

root@hadoop001:/opt/hadoop/etc/hadoop# vi hdfs-site.xml

Add the following properties inside the <configuration> element:

    <!-- Number of replicas for each HDFS block -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Host that runs the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop003:50090</value>
    </property>

(3) YARN configuration file
Configure yarn-site.xml:

root@hadoop001:/opt/hadoop/etc/hadoop# vi yarn-site.xml

Add the following properties inside the <configuration> element:

    <!-- How reducers fetch data (shuffle service) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Host that runs the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop002</value>
    </property>

(4) MapReduce configuration file
Configure mapred-site.xml:

root@hadoop001:/opt/hadoop/etc/hadoop# cp mapred-site.xml.template mapred-site.xml
root@hadoop001:/opt/hadoop/etc/hadoop# vi mapred-site.xml

Add the following property inside the <configuration> element:

    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

VIII. Configure Passwordless SSH Login

1. Run the following command to generate a public/private key pair.

ssh-keygen -t rsa

Press Enter three times; two files are generated: id_rsa (private key) and id_rsa.pub (public key).
[Screenshot: ssh-keygen output]

2. Copy the public key to the target machines you want to log in to without a password.

Copy the public key to the local machine (hadoop001) itself:

root@hadoop001:~# ssh-copy-id hadoop001
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'hadoop001 (172.18.48.157)' can't be established.
ECDSA key fingerprint is SHA256:Jc3LsU4S1uU33jfoIuQeUSWFKlCeK3K+nEGM0uDkdKw.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop001's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop001'"
and check to make sure that only the key(s) you wanted were added.

root@hadoop001:~#

Copy the public key to hadoop002:

root@hadoop001:~# ssh-copy-id hadoop002
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'hadoop002 (172.18.48.159)' can't be established.
ECDSA key fingerprint is SHA256:GBx48Z7+IQMKUmsghmAvnV6Bd0NwP5GW2r/UNenYetM.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop002's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop002'"
and check to make sure that only the key(s) you wanted were added.

root@hadoop001:~# ssh hadoop002
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-124-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
New release '20.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


Welcome to Alibaba Cloud Elastic Compute Service !

root@hadoop002:~#

Now hadoop001 can log in to hadoop002 without a password.
Similarly, copy the public key to hadoop003:

root@hadoop002:~# logout
Connection to hadoop002 closed.
root@hadoop001:~# ssh-copy-id hadoop003

Note:
You also need to configure passwordless login from hadoop002 to hadoop001, hadoop002, and hadoop003, because YARN should be started on the machine that runs the ResourceManager, and hadoop002 is planned as the ResourceManager. A sketch of the required commands follows.
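
A minimal sketch of what to run on hadoop002 (the same steps as above, just starting from the other node):

root@hadoop002:~# ssh-keygen -t rsa
root@hadoop002:~# ssh-copy-id hadoop001
root@hadoop002:~# ssh-copy-id hadoop002
root@hadoop002:~# ssh-copy-id hadoop003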

IX. Distribute the Runtime Environment

1. Distribute the JDK and Hadoop installation directories and the system profile (/etc/profile) to the cluster.

root@hadoop001:~# xsync /usr/java8
root@hadoop001:~# xsync /etc/profile
root@hadoop001:~# xsync /opt/hadoop

2. Verify the distribution result.

root@hadoop001:~# ssh hadoop002
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-124-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
New release '20.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


Welcome to Alibaba Cloud Elastic Compute Service !

Last login: Thu Dec 24 11:23:58 2020 from 172.18.48.157
root@hadoop002:~# tail /etc/profile
      . $i
    fi
  done
  unset i
fi
export JAVA_HOME=/usr/java8
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
root@hadoop002:~# tail /opt/hadoop/etc/hadoop/core-site.xml
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>
    <!--指定Hadoop运行时产生文件的存储目录-->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>
</configuration>
root@hadoop002:~#
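
Note that the variables in the newly distributed /etc/profile only take effect on hadoop002 and hadoop003 after logging in again or running source /etc/profile there; a quick check on each node could look like this:

root@hadoop002:~# source /etc/profile
root@hadoop002:~# hadoop version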

X. Start the Cluster

1. Configure the slaves file

root@hadoop001:~# vi /opt/hadoop/etc/hadoop/slaves

Set the contents of the file to the following three hostnames (remove the default localhost entry):

hadoop001
hadoop002
hadoop003

Note: lines in this file must not have trailing spaces, and the file must not contain blank lines.
Sync the configuration file to all nodes:

root@hadoop001:~# xsync /opt/hadoop/etc/hadoop/slaves
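
Since passwordless SSH is now configured, you can optionally confirm the file arrived on the other nodes:

root@hadoop001:~# ssh hadoop002 cat /opt/hadoop/etc/hadoop/slaves
root@hadoop001:~# ssh hadoop003 cat /opt/hadoop/etc/hadoop/slaves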

2. Start the cluster
(1) If the cluster is being started for the first time, the NameNode must be formatted. (Note: before re-formatting, be sure to stop all previously started NameNode and DataNode processes and then delete the data and log directories; a sketch of those cleanup commands follows the format command below.)

root@hadoop001:~# hdfs namenode -format
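
A minimal sketch of that cleanup, assuming the data directory /opt/hadoop/tmp configured in core-site.xml and the default log directory /opt/hadoop/logs; it is only needed when re-formatting a cluster that has already been used:

stop-dfs.sh                                # run on hadoop001 to stop HDFS daemons on all nodes
stop-yarn.sh                               # run on hadoop002 to stop YARN daemons on all nodes
rm -rf /opt/hadoop/tmp /opt/hadoop/logs    # run on every node to remove old data and logs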

(2) Start HDFS

root@hadoop001:~# start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-hadoop001.out
hadoop002: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hadoop002.out
hadoop001: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hadoop001.out
hadoop003: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hadoop003.out
Starting secondary namenodes [hadoop003]
hadoop003: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-root-secondarynamenode-hadoop003.out
root@hadoop001:~# jps
2578 NameNode
2725 DataNode
2939 Jps


root@hadoop001:~# ssh hadoop002
Welcome to Alibaba Cloud Elastic Compute Service !

Last login: Thu Dec 24 12:01:48 2020 from 172.18.48.157
root@hadoop002:~# jps
2185 DataNode
2302 Jps
root@hadoop002:~# logout
Connection to hadoop002 closed.


root@hadoop001:~# ssh hadoop003
Welcome to Alibaba Cloud Elastic Compute Service !

Last login: Thu Dec 24 11:25:26 2020 from 172.18.48.157
root@hadoop003:~# jps
1944 DataNode
2140 Jps
2060 SecondaryNameNode

(3) Start YARN

root@hadoop001:~# ssh hadoop002
Welcome to Alibaba Cloud Elastic Compute Service !

Last login: Thu Dec 24 14:29:19 2020 from 172.18.48.157

root@hadoop002:~# source /etc/profile
root@hadoop002:~# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-root-resourcemanager-hadoop002.out
hadoop003: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-hadoop003.out
hadoop001: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-hadoop001.out
hadoop002: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-hadoop002.out

root@hadoop002:~# jps
2825 Jps
2185 DataNode
2490 NodeManager
2365 ResourceManager


root@hadoop002:~# logout
Connection to hadoop002 closed.
root@hadoop001:~# jps
2578 NameNode
2725 DataNode
3013 NodeManager
3130 Jps


root@hadoop001:~# ssh hadoop003
Welcome to Alibaba Cloud Elastic Compute Service !

Last login: Thu Dec 24 14:29:28 2020 from 172.18.48.157
root@hadoop003:~# jps
2356 Jps
1944 DataNode
2201 NodeManager
2060 SecondaryNameNode

Note: If the NameNode and the ResourceManager are not on the same machine, do not start YARN on the NameNode; start it on the machine where the ResourceManager runs.

3. Basic cluster tests
(1) Upload files to the cluster
Create a test file:

root@hadoop001:~# mkdir wcinput
root@hadoop001:~# vim wcinput/wc.input

Enter the following content in the file:

hadoop yarn
hadoop mapreduce
hdfs
hdfs

Upload a small file:

root@hadoop001:~# hdfs dfs -mkdir -p /user/root/input
root@hadoop001:~# hdfs dfs -put wcinput/wc.input /user/root/input

Upload a large file:

root@hadoop001:~# hadoop fs -put hadoop-2.10.1.tar.gz  /user/root/input
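
To also exercise YARN and MapReduce, you could optionally run the WordCount example that ships with Hadoop on the uploaded input (a sketch; the jar path below assumes the standard layout of the Hadoop 2.10.1 binary distribution, and /user/root/output must not exist yet):

root@hadoop001:~# hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /user/root/input/wc.input /user/root/output
root@hadoop001:~# hdfs dfs -cat /user/root/output/part-r-00000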

(2) Check where the uploaded files are stored
(a) View the HDFS block storage path:

root@hadoop001:~# cd /opt/hadoop/tmp/dfs/data/current/BP-1886745917-172.18.48.157-1608791238127/current/finalized/subdir0/subdir0/
root@hadoop001:/opt/hadoop/tmp/dfs/data/current/BP-1886745917-172.18.48.157-1608791238127/current/finalized/subdir0/subdir0# ll
total 402164
drwxr-xr-x 2 root root      4096 Dec 24 15:13 ./
drwxr-xr-x 3 root root      4096 Dec 24 15:12 ../
-rw-r--r-- 1 root root        39 Dec 24 15:12 blk_1073741825
-rw-r--r-- 1 root root        11 Dec 24 15:12 blk_1073741825_1001.meta
-rw-r--r-- 1 root root 134217728 Dec 24 15:13 blk_1073741826
-rw-r--r-- 1 root root   1048583 Dec 24 15:13 blk_1073741826_1002.meta
-rw-r--r-- 1 root root 134217728 Dec 24 15:13 blk_1073741827
-rw-r--r-- 1 root root   1048583 Dec 24 15:13 blk_1073741827_1003.meta
-rw-r--r-- 1 root root 134217728 Dec 24 15:13 blk_1073741828
-rw-r--r-- 1 root root   1048583 Dec 24 15:13 blk_1073741828_1004.meta
-rw-r--r-- 1 root root   5933927 Dec 24 15:13 blk_1073741829
-rw-r--r-- 1 root root     46367 Dec 24 15:13 blk_1073741829_1005.meta

(b) View the content HDFS stores on disk:

root@hadoop001:/opt/hadoop/tmp/dfs/data/current/BP-1886745917-172.18.48.157-1608791238127/current/finalized/subdir0/subdir0# cat blk_1073741825
hadoop yarn
hadoop mapreduce
hdfs
hdfs

(c) The other four large files, blk_1073741826, blk_1073741827, blk_1073741828, and blk_1073741829, are the block files produced by splitting hadoop-2.10.1.tar.gz (three 128 MB blocks plus the remainder). A sketch of how to check this by reassembling them follows.
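
As a quick sanity check, the block files can be concatenated in order and inspected to confirm they add up to the original archive (a sketch; the block IDs are the ones listed above and /tmp is just a scratch location):

cat blk_1073741826 >> /tmp/hadoop.tar.gz
cat blk_1073741827 >> /tmp/hadoop.tar.gz
cat blk_1073741828 >> /tmp/hadoop.tar.gz
cat blk_1073741829 >> /tmp/hadoop.tar.gz
tar -tzf /tmp/hadoop.tar.gz | head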

XI. Summary of Cluster Start/Stop Methods

1. Start/stop individual service components one by one
(1) Start/stop HDFS components individually:

hadoop-daemon.sh start/stop namenode/datanode/secondarynamenode

(2) Start/stop YARN components individually:

yarn-daemon.sh start/stop resourcemanager/nodemanager

2. Start/stop each module as a whole (passwordless SSH is a prerequisite); this is the common approach
(1) Start/stop all of HDFS (run on the NameNode machine):

start-dfs.sh/stop-dfs.sh

(2) Start/stop all of YARN (run on the ResourceManager machine):

start-yarn.sh/stop-yarn.sh
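
To see at a glance which daemons are running on each node after a start or stop, a small helper loop can be handy (a sketch; it assumes the hostnames used in this guide and passwordless SSH, and invokes jps through a login shell so that the PATH from /etc/profile is picked up):

for host in hadoop001 hadoop002 hadoop003; do
    echo "--------- $host ---------"
    ssh "$host" 'bash -lc jps'
done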