Hadoop Big Data Platform

This article walks through installing and configuring Hadoop in standalone, pseudo-distributed, and fully distributed modes, covering the setup of HDFS, MapReduce, and YARN as well as building a Zookeeper-based high-availability cluster. Worked examples demonstrate uploading data, processing it, and failing over, illustrating how Hadoop handles large-scale data storage and computation.


1. Hadoop Overview

- Mainstream Hadoop distributions:
- Apache Hadoop, the distributed-system base framework developed by the Apache Foundation;
- the Cloudera distribution (Cloudera's Distribution Including Apache Hadoop, "CDH"), an enterprise-oriented release;
- the Hortonworks distribution (Hortonworks Data Platform, "HDP"), a commonly used release.

  • The core design of the Hadoop framework is distributed storage (HDFS) at the bottom and distributed computation (MapReduce) on top:

    • HDFS provides storage for massive data sets.
    • MapReduce provides computation over massive data sets.
  • The Hadoop framework includes the following four modules:

    • Hadoop Common: the Java libraries and utilities required by the other Hadoop
      modules. They provide filesystem and OS-level abstractions and contain the
      Java files and scripts needed to start Hadoop.
    • Hadoop YARN: a framework for job scheduling and cluster resource management.
    • Hadoop Distributed File System (HDFS): the distributed filesystem, the lowest layer of the stack, providing high-throughput access to application data.
    • Hadoop MapReduce: a YARN-based system for offline (batch) parallel processing of large data sets.
  • Typical Hadoop application areas: online travel, mobile data, e-commerce, energy extraction and conservation, infrastructure management, image processing, fraud detection, IT security, and healthcare.

Users can develop distributed programs without understanding the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, one component of which is HDFS (Hadoop Distributed File System). HDFS is highly fault-tolerant, is designed to run on low-cost hardware, and provides high-throughput access to application data, making it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements to allow streaming access to filesystem data. The two core designs of the Hadoop framework are HDFS and MapReduce: HDFS stores massive data sets, and MapReduce computes over them [1].

2. HDFS Standalone Mode

## create five snapshot VMs: server1 through server5
Official guide: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

## create a dedicated hadoop user, then work as that unprivileged user
[root@server1 ~]# useradd hadoop
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ pwd
/home/hadoop
[hadoop@server1 ~]$ ls   ## hadoop and JDK tarballs already downloaded
hadoop-3.2.1.tar.gz  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz 
[hadoop@server1 ~]$ ls
hadoop-3.2.1.tar.gz  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java   ## symlink for a stable JDK path
[hadoop@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz
[hadoop@server1 ~]$ ln -s hadoop-3.2.1/ hadoop   ## symlink, so the remaining steps can use ~/hadoop

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim hadoop-env.sh 
## set the paths
export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop


[hadoop@server1 hadoop]$ cd ..
[hadoop@server1 etc]$ cd ..
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
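A quick sanity check that hadoop-env.sh points at a working JDK; the version banner below is what the 3.2.1 tarball should print:

[hadoop@server1 hadoop]$ bin/hadoop version   ## aborts with a JAVA_HOME error if the path above is wrong
Hadoop 3.2.1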
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ ls input/
[hadoop@server1 hadoop]$ bin/hadoop jar   ## with no arguments, prints usage
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar   ## lists the bundled example programs
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'   ## the output directory must not exist beforehand, or the job aborts
[hadoop@server1 hadoop]$ ls
bin  include  lib      LICENSE.txt  output      sbin
etc  input    libexec  NOTICE.txt   README.txt  share
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1	dfsadmin
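The examples jar bundles other demo jobs besides grep; as a further smoke test one can, for instance, run the Monte-Carlo pi estimator (arguments: number of map tasks, samples per map):

[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 10   ## prints an estimate of pi when the job finishes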

3. Pseudo-Distributed Mode

[hadoop@server1 hadoop]$ ls
bin  include  lib      LICENSE.txt  output      sbin
etc  input    libexec  NOTICE.txt   README.txt  share
[hadoop@server1 hadoop]$ vim etc/hadoop/core-site.xml 
# append the following
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml 
# append the following
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
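To double-check that the XML edits are being picked up, hdfs getconf can read a single key back (expected output shown, assuming the core-site.xml above):

[hadoop@server1 hadoop]$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000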
## passwordless SSH setup
[hadoop@server1 hadoop]$ ssh-keygen 
[hadoop@server1 hadoop]$ logout
[root@server1 ~]# passwd hadoop
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ssh-copy-id localhost
[hadoop@server1 ~]$ ssh localhost
[hadoop@server1 ~]$ logout
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ ll workers 
-rw-r--r-- 1 hadoop hadoop 10 Sep 10  2019 workers
[hadoop@server1 hadoop]$ cd ../..
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format   ## format (initialize) the NameNode
[hadoop@server1 hadoop]$ sbin/start-dfs.sh

Browse to http://172.25.3.1:9870 (the NameNode web UI).


[hadoop@server1 ~]$ vim .bash_profile
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/java/bin
[hadoop@server1 ~]$ source .bash_profile
[hadoop@server1 ~]$ jps   ## the running Java processes
24850 Jps
24679 SecondaryNameNode
24377 NameNode
24490 DataNode
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls   ## errors out until the user's home directory exists in HDFS
ls: `.': No such file or directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
[hadoop@server1 hadoop]$ ls
bin  include  lib      LICENSE.txt  NOTICE.txt  README.txt  share
etc  input    libexec  logs         output      sbin
[hadoop@server1 hadoop]$ rm -fr output/
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input   ## upload into the distributed filesystem

Refresh http://172.25.3.1:9870; the uploaded files are now listed.


[hadoop@server1 hadoop]$ rm -fr input   ## the local copy is no longer needed
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output   ## input and output now refer to HDFS paths
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls   ## list what was uploaded and produced
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2021-04-24 10:52 input
drwxr-xr-x   - hadoop supergroup          0 2021-04-24 10:57 output
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*

[hadoop@server1 hadoop]$ bin/hdfs dfs -get output   ## results can also be downloaded locally
[hadoop@server1 hadoop]$ rm -fr output/   ## deleting the local copy does not affect HDFS

[hadoop@server1 hadoop]$ bin/hdfs dfs -rm -r output   ## delete the copy stored in HDFS
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh 

4. Fully Distributed Mode

## bring up snapshots server2 and server3
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh 
[root@server1 ~]# yum install nfs-utils -y
[root@server2 ~]# useradd hadoop
[root@server2 ~]# echo westos | passwd --stdin hadoop
[root@server2 ~]# yum install nfs-utils -y
[root@server3 ~]# useradd hadoop
[root@server3 ~]# echo westos | passwd --stdin hadoop
[root@server3 ~]# yum install nfs-utils -y
## export the hadoop home over NFS so that server1/2/3 share the same files and SSH keys
[root@server1 ~]# id hadoop
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[root@server1 ~]# vim /etc/exports
[root@server1 ~]# cat /etc/exports
/home/hadoop	*(rw,anonuid=1001,anongid=1001)
[root@server1 ~]# systemctl start nfs
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *
[root@server2 ~]# showmount -e 172.25.3.1
Export list for 172.25.3.1:
/home/hadoop *
[root@server2 ~]# mount 172.25.3.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df | grep hadoop   ## confirm the mount
172.25.3.1:/home/hadoop  17811456 3007744  14803712  17% /home/hadoop
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server2 ~]$ cd hadoop
[hadoop@server2 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share

[root@server3 ~]# showmount -e 172.25.3.1
Export list for 172.25.3.1:
/home/hadoop *
[root@server3 ~]# mount 172.25.3.1:/home/hadoop/ /home/hadoop/
[root@server3 ~]# df | grep hadoop   ## confirm the mount
172.25.3.1:/home/hadoop  17811456 3007744  14803712  17% /home/hadoop
[root@server3 ~]# su - hadoop
[hadoop@server3 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server3 ~]$ cd hadoop
[hadoop@server3 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
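These NFS mounts do not survive a reboot; a minimal /etc/fstab entry to make them persistent could look like the sketch below (an assumption, adjust to your environment):

[root@server2 ~]# echo '172.25.3.1:/home/hadoop /home/hadoop nfs defaults 0 0' >> /etc/fstab
[root@server2 ~]# mount -a   ## verify the entry parses without errors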
[hadoop@server2 hadoop]$ ssh server2   ## passwordless login works from any node, since they all share /home/hadoop/.ssh


[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ vim etc/hadoop/core-site.xml 
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:9000</value>
    </property>
</configuration>
[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml   ## two datanodes, so two replicas can be kept
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ vim etc/hadoop/workers 
[hadoop@server1 hadoop]$ cat etc/hadoop/workers
server2
server3
[hadoop@server1 hadoop]$ rm -fr /tmp/*   ## clear the old single-node HDFS data
[hadoop@server1 hadoop]$ bin/hdfs namenode -format   ## re-format, then start
[hadoop@server1 hadoop]$ sbin/start-dfs.sh 
Starting namenodes on [server1]
Starting datanodes
Starting secondary namenodes [server1]
## the worker nodes run DataNodes, which store the data
[hadoop@server1 hadoop]$ jps
29253 Jps
28908 NameNode
29133 SecondaryNameNode
[hadoop@server2 hadoop]$ jps
24130 DataNode
24191 Jps
[hadoop@server3 hadoop]$ jps
24124 Jps
24061 DataNode
[hadoop@server1 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=200   ## create a 200MB test file
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report   ## shows both datanodes and their capacity
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put bigfile 
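With dfs.replication=2 and the default 128MB block size, the 200MB bigfile should be split into two blocks, each stored on two datanodes; fsck shows the actual placement:

[hadoop@server1 hadoop]$ bin/hdfs fsck /user/hadoop/bigfile -files -blocks -locations   ## lists every block and the datanodes holding its replicas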

Refresh http://172.25.3.1:9870; the uploaded bigfile is visible.

Hot-adding a node (bringing server4 online)

## bring up snapshot server4
[root@server4 ~]# useradd hadoop
[root@server4 ~]# yum install nfs-utils -y
[root@server4 ~]# mount 172.25.3.1:/home/hadoop/ /home/hadoop/
[root@server4 ~]# df | grep hadoop   ## confirm the mount
172.25.3.1:/home/hadoop  17811456 3007744  14803712  17% /home/hadoop
[root@server4 ~]# su - hadoop
[hadoop@server4 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server4 ~]$ cat hadoop/etc/hadoop/workers   ## workers has already been extended with server4
server2
server3
server4
[hadoop@server4 ~]$ cd hadoop
[hadoop@server4 hadoop]$ bin/hdfs --daemon start datanode
[hadoop@server4 hadoop]$ jps
13909 Jps
13846 DataNode
[hadoop@server4 hadoop]$ mv bigfile demo
[hadoop@server4 hadoop]$ bin/hdfs dfs -put demo

Refresh http://172.25.3.1:9870; the uploaded file demo appears. Because the upload ran on server4, HDFS places the first replica on the local datanode (server4) and the remaining ones on other nodes, which spreads the load; the balancer sketch below can even things out further.
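After a hot add the new node starts out nearly empty; the HDFS balancer moves blocks until utilization across datanodes falls within the given threshold (in percentage points):

[hadoop@server4 hadoop]$ bin/hdfs balancer -threshold 10   ## rebalance until every node is within 10% of the cluster average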

5. Distributed Storage and Computation (MapReduce on YARN)

An illustrated (comic-strip) explanation of HDFS read/write and fault-tolerance mechanics: https://www.dazhuanlan.com/2019/10/23/5db06032e0463/
Official documentation: hadoop + zookeeper

(figures: HDFS read and write flow)

[hadoop@server1 hadoop]$ vim etc/hadoop/mapred-site.xml 
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ vim etc/hadoop/hadoop-env.sh 
# append
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
[hadoop@server1 hadoop]$ vim etc/hadoop/yarn-site.xml 
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name> <!-- whitelist of environment variables passed through to containers -->
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ sbin/start-yarn.sh 
Starting resourcemanager
Starting nodemanagers
server4: Warning: Permanently added 'server4,172.25.3.4' (ECDSA) to the list of known hosts.
[hadoop@server1 hadoop]$ jps   ## ResourceManager and NodeManager: the resource and job managers
30312 ResourceManager
28908 NameNode
30620 Jps
29133 SecondaryNameNode
[hadoop@server2 hadoop]$ jps
24130 DataNode
24727 Jps
24633 NodeManager
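With YARN running, MapReduce jobs now execute on the cluster rather than locally. A quick end-to-end check (input_test and output_test are just unused names chosen for this sketch):

[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir input_test
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input_test
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input_test output_test
[hadoop@server1 hadoop]$ bin/yarn application -list -appStates ALL   ## the job should appear with final status SUCCEEDED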

Browse to http://172.25.3.1:8088/ (the ResourceManager web UI).



6. Zookeeper High-Availability Cluster (NameNode HA built into Hadoop)

## five snapshots: the two masters (server1 and server5) get 2GB of RAM, the three worker nodes 1GB each. A Zookeeper ensemble needs at least three members, and the total node count should be odd.
## bring up server5 with 2GB

1. Cluster setup

[root@server5 ~]# useradd hadoop
[root@server5 ~]# yum install nfs-utils -y
[root@server5 ~]# mount 172.25.3.1:/home/hadoop/ /home/hadoop/
[root@server5 ~]# df | grep hadoop   ## confirm the mount
172.25.3.1:/home/hadoop  17811456 3007744  14803712  17% /home/hadoop
[root@server5 ~]# su - hadoop


[hadoop@server1 hadoop]$ sbin/stop-yarn.sh 
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh 
[hadoop@server1 hadoop]$ jps
31263 Jps
[hadoop@server1 hadoop]$ rm -fr /tmp/*   ## wipe stale HDFS data on every node (server2/3/4 likewise below)
[hadoop@server2 hadoop]$ rm -fr /tmp/*
[hadoop@server3 hadoop]$ rm -fr /tmp/*
[hadoop@server4 hadoop]$ rm -fr /tmp/*


[hadoop@server1 ~]$ ls   ## zookeeper-3.4.9.tar.gz downloaded
zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ tar zxf zookeeper-3.4.9.tar.gz 
[hadoop@server1 ~]$ cd zookeeper-3.4.9/conf/
[hadoop@server1 conf]$ ls
configuration.xsl  log4j.properties  zoo_sample.cfg
[hadoop@server1 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@server1 conf]$ vim zoo.cfg 
## append at the end
server.1=172.25.3.2:2888:3888
server.2=172.25.3.3:2888:3888
server.3=172.25.3.4:2888:3888

Parameter format:
server.x=[hostname]:nnnnn[:nnnnn]
Here x is a number matching the id in that host's myid file. Two ports can be configured on the right: the first is used for data synchronization and other communication between followers (F) and the leader (L); the second is used for voting during leader election.

[hadoop@server2 ~]$ mkdir /tmp/zookeeper
[hadoop@server2 ~]$ echo 1 > /tmp/zookeeper/myid   ## matches server.1 in zoo.cfg
[hadoop@server3 ~]$ mkdir /tmp/zookeeper
[hadoop@server3 ~]$ echo 2 > /tmp/zookeeper/myid   ## matches server.2
[hadoop@server4 ~]$ mkdir /tmp/zookeeper
[hadoop@server4 ~]$ echo 3 > /tmp/zookeeper/myid   ## matches server.3
### start the service on each node
[hadoop@server2 ~]$ cd zookeeper-3.4.9/
[hadoop@server2 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server3 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh start

[hadoop@server3 zookeeper-3.4.9]$ bin/zkServer.sh status
Mode: leader
[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh status
Mode: follower
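The ensemble can also be exercised from any node with the bundled CLI; a fresh ensemble should contain only the /zookeeper system znode:

[hadoop@server2 zookeeper-3.4.9]$ bin/zkCli.sh -server 172.25.3.3:2181
[zk: 172.25.3.3:2181(CONNECTED) 0] ls /
[zookeeper]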

2. Hadoop configuration

[hadoop@server1 ~]$ vim hadoop/etc/hadoop/core-site.xml 
<configuration>
    <!-- point fs.defaultFS at the masters nameservice (the name is arbitrary) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://masters</value>
    </property>

    <!-- addresses of the zookeeper ensemble -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>172.25.3.2:2181,172.25.3.3:2181,172.25.3.4:2181</value>
    </property>
</configuration>

[hadoop@server1 ~]$ vim hadoop/etc/hadoop/hdfs-site.xml 
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <!-- set the hdfs nameservice to masters, matching core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>masters</value>
    </property>

    <!-- the masters nameservice has two namenodes, h1 and h2 (names are arbitrary) -->
    <property>
        <name>dfs.ha.namenodes.masters</name>
        <value>h1,h2</value>
    </property>

    <!-- RPC address of node h1 -->
    <property>
        <name>dfs.namenode.rpc-address.masters.h1</name>
        <value>172.25.3.1:9000</value>
    </property>

    <!-- HTTP address of node h1 -->
    <property>
        <name>dfs.namenode.http-address.masters.h1</name>
        <value>172.25.3.1:9870</value>
    </property>

    <!-- RPC address of node h2 -->
    <property>
        <name>dfs.namenode.rpc-address.masters.h2</name>
        <value>172.25.3.5:9000</value>
    </property>

    <!-- HTTP address of node h2 -->
    <property>
        <name>dfs.namenode.http-address.masters.h2</name>
        <value>172.25.3.5:9870</value>
    </property>

    <!-- where NameNode metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://172.25.3.2:8485;172.25.3.3:8485;172.25.3.4:8485/masters</value>
    </property>

    <!-- where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/tmp/journaldata</value>
    </property>

    <!-- enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- how clients locate the active NameNode after a failover -->
    <property>
        <name>dfs.client.failover.proxy.provider.masters</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- fencing methods, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
sshfence
shell(/bin/true)
        </value>
    </property>

    <!-- sshfence requires passwordless ssh -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- sshfence connection timeout -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

3. Starting the HDFS cluster (in order)

### on first start, launch the JournalNodes on server2/3/4, then initialize on server1
[hadoop@server2 ~]$ cd hadoop
[hadoop@server2 hadoop]$ bin/hdfs --daemon start journalnode
[hadoop@server2 hadoop]$ jps
25078 Jps
24907 QuorumPeerMain
25038 JournalNode
[hadoop@server3 hadoop]$ bin/hdfs --daemon start journalnode
[hadoop@server4 hadoop]$ bin/hdfs --daemon start journalnode

[hadoop@server1 hadoop]$ bin/hdfs namenode -format   ## format the HDFS cluster
[hadoop@server1 hadoop]$ ls /tmp/
hadoop-hadoop  hadoop-hadoop-namenode.pid  hsperfdata_hadoop
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop server5:/tmp   ## copy the NameNode metadata to the standby (server5)
[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK   ## format zookeeper (creates the HA znodes)
## start the hdfs cluster (run on h1 only)
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
[hadoop@server1 hadoop]$ jps
2885 DFSZKFailoverController   ## ZKFC, the failover controller
2938 Jps
2524 NameNode

Check the state on each node:

[hadoop@server3 zookeeper-3.4.9]$ bin/zkCli.sh 
[zk: localhost:2181(CONNECTED) 21] ls /hadoop-ha/masters
[zk: localhost:2181(CONNECTED) 23] get /hadoop-ha/masters/ActiveBreadCrumb
mastersh1server1 ...   ## binary znode payload: the active NameNode is h1 (server1)
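The same information is available, more readably, through hdfs haadmin (h1 and h2 are the namenode ids configured above); it should report h1 as active and h2 as standby:

[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h1
active
[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h2
standby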


## browse to 172.25.3.1:9870 and 172.25.3.5:9870: server1 is the active NameNode, server5 is standby


4. Uploading files

[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
ls: `.': No such file or directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ ls
bin  demo  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ bin/hdfs dfs -put demo


5. Testing automatic failover

## once server1's NameNode dies, server5 takes over and the files remain readable

[hadoop@server1 hadoop]$ jps
2885 DFSZKFailoverController
2524 NameNode
3215 Jps
[hadoop@server1 hadoop]$ kill 2524   ## kill the active NameNode
[hadoop@server1 hadoop]$ bin/hdfs --daemon start namenode   ## bring it back; it rejoins as standby
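haadmin should now report that the roles have swapped, and data stays reachable through the masters nameservice the whole time:

[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h2
active
[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h1
standby
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls   ## demo is still listed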

## browse to 172.25.3.1:9870 and 172.25.3.5:9870: server5 is now active, server1 is standby

6. High availability for the YARN ResourceManager

[hadoop@server1 hadoop]$ vim etc/hadoop/yarn-site.xml 

<configuration>
<!-- allow mapreduce programs to run on the nodemanagers -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <!-- enable RM high availability -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- the RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>RM_CLUSTER</value>
    </property>

    <!-- the RM node ids -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- address of RM1 -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>172.25.3.1</value>
    </property>

    <!-- address of RM2 -->
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>172.25.3.5</value>
    </property>

    <!-- enable automatic RM state recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- how RM state is stored; options include an in-memory store and the zookeeper store -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <!-- when using the zookeeper store, the ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>172.25.3.2:2181,172.25.3.3:2181,172.25.3.4:2181</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ sbin/start-yarn.sh
Starting resourcemanagers on [ 172.25.3.1 172.25.3.5]
Starting nodemanagers
[hadoop@server1 hadoop]$ jps
4272 Jps
3313 NameNode
2885 DFSZKFailoverController
3961 ResourceManager

[hadoop@server3 zookeeper-3.4.9]$ bin/zkCli.sh 
[zk: localhost:2181(CONNECTED) 26] get /yarn-leader-election/RM_CLUSTER/ActiveBreadCrumb
RM_CLUSTERrm1   ## rm1 is the active ResourceManager
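yarn rmadmin reports the same thing per ResourceManager id (rm1 and rm2 as configured above):

[hadoop@server1 hadoop]$ bin/yarn rmadmin -getServiceState rm1
active
[hadoop@server1 hadoop]$ bin/yarn rmadmin -getServiceState rm2
standby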


Browse to http://172.25.3.1:8088


7. Testing YARN failover

[hadoop@server1 hadoop]$ jps
4352 Jps
3313 NameNode
2885 DFSZKFailoverController
3961 ResourceManager
[hadoop@server1 hadoop]$ kill 3961   ## kill the active ResourceManager
## browse to http://172.25.3.5:8088; server5 has taken over
[hadoop@server3 zookeeper-3.4.9]$ bin/zkCli.sh 
[zk: localhost:2181(CONNECTED) 26] get /yarn-leader-election/RM_CLUSTER/ActiveBreadCrumb
RM_CLUSTERrm2   ## rm2 is now active
[hadoop@server1 hadoop]$ bin/yarn --daemon start resourcemanager   ## bring server1's RM back; it rejoins as standby

## browse to http://172.25.3.5:8088


8. HBase distributed deployment

[hadoop@server1 ~]$ ls   ## hbase-1.2.4-bin.tar.gz downloaded
hbase-1.2.4-bin.tar.gz
[hadoop@server1 ~]$ tar zxf hbase-1.2.4-bin.tar.gz
[hadoop@server1 ~]$ cd hbase-1.2.4
[hadoop@server1 hbase-1.2.4]$ vim conf/hbase-env.sh 
# The java implementation to use.  Java 1.7+ required.
export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_MANAGES_ZK=false   ## use the external zookeeper ensemble instead of HBase's embedded one

[hadoop@server1 hbase-1.2.4]$ vim conf/hbase-site.xml 
<configuration>
    <!-- store HBase data in HDFS, via the HA nameservice -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://masters/hbase</value>
    </property>

    <!-- run fully distributed -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>

    <!-- the zookeeper ensemble -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>172.25.3.2,172.25.3.3,172.25.3.4</value>
    </property>

    <property>
        <name>hbase.master</name>
        <value>h1</value>
    </property>
</configuration>

[hadoop@server1 hbase-1.2.4]$ vim conf/regionservers 
[hadoop@server1 hbase-1.2.4]$ cat conf/regionservers 
172.25.3.3
172.25.3.4
172.25.3.2

[hadoop@server1 hbase-1.2.4]$ bin/hbase-daemon.sh start master
[hadoop@server1 hbase-1.2.4]$ jps
3313 NameNode
4434 ResourceManager
2885 DFSZKFailoverController
4950 Jps
4775 HMaster


[hadoop@server5 ~]$ cd hbase-1.2.4/
[hadoop@server5 hbase-1.2.4]$  bin/hbase-daemon.sh start master
[hadoop@server5 hbase-1.2.4]$ jps
26032 Jps
15094 DFSZKFailoverController
25927 HMaster
14971 NameNode
25565 ResourceManager
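
A quick smoke test from the HBase shell (the table and column-family names here are arbitrary examples):

[hadoop@server1 hbase-1.2.4]$ bin/hbase shell
hbase(main):001:0> create 'test', 'cf'        ## create a table with one column family
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'                ## row1 should come back with cf:a=value1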

## browse to http://172.25.3.1:16010, the HBase master web UI
