Hadoop Big Data

Overview

  • Data volumes keep growing, data analysis is expected to be ever more real-time, and analysis results are applied ever more widely; big data technology emerged to meet these needs
  • Big data: the umbrella term for techniques that collect, organize, and process very large data sets and extract results from them

Big Data Processing Frameworks

  • Processing framework: the set of components that actually carry out data-processing operations
  • Common frameworks
    • Batch frameworks: process large data sets in batches and can operate over the entire data set, e.g. Apache Hadoop
    • Stream frameworks: compute in real time over data as it enters the system, an "unbounded data" style of operation, e.g. Apache Storm, Apache Samza
    • Hybrid frameworks: some big data frameworks can handle both batch and stream workloads, e.g. Apache Spark, Apache Flink

Introduction to Hadoop

Overview

  • Hadoop is a distributed-systems infrastructure developed by the Apache Foundation. It mainly solves the storage, analysis, and computation of massive data sets.
  • In the broader sense, Hadoop usually refers to a wider concept: the Hadoop ecosystem
  • Hadoop is reliable, scalable, open-source distributed-computing software that can scale from a single server to thousands of machines, with every machine in the cluster providing local computation and storage
  • Hadoop treats hardware failure as the norm and handles faults in software, achieving high availability at the software level
  • Hadoop is a big data processing framework that allows distributed processing of large data sets across clusters of computers using simple programming models

Projects

Core Projects

  • Hadoop HDFS: a distributed file system providing high-throughput access to application data
  • Hadoop YARN: a framework for job scheduling and cluster resource management
  • Hadoop MapReduce: a YARN-based system for parallel processing of large data sets
  • Hadoop Common: the common utilities that support the other Hadoop modules
  • Hadoop Ozone: an object store for Hadoop clusters

Other Projects

  • Ambari: a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, plus the ability to view MapReduce, Pig, and Hive applications visually and diagnose their performance characteristics in a user-friendly way
  • Avro: a data serialization system
  • HBase: a scalable distributed database that supports structured data storage for large tables
  • Mahout: a scalable machine learning and data mining library
  • Spark: a fast, general-purpose compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation
  • ZooKeeper: a high-performance coordination service for distributed applications

HDFS

Introduction

  • HDFS: Hadoop Distributed File System
  • HDFS is a highly fault-tolerant system that can be deployed on inexpensive machines; it provides high-throughput data access and suits applications with very large data sets (massive-scale data analysis, machine learning, etc.)
  • Characteristics:
    • Supports very large files: suited to TB-scale data
    • Files are stored in blocks
    • Write once, read many; optimized for sequential reads
    • Runs on inexpensive hardware
    • Tolerates hardware failure

Read/Write Flow

Terminology
  • Block: the basic storage unit. Files are split into blocks, typically 128 MB each (a quick way to check this on a live system is sketched after this list)

  • Hadoop cluster: a one-master, many-workers architecture

  • NameNode: the master node

    Holds the directory tree, file metadata, and block information of the entire file system

    Functions:

    • Receives user operation requests
    • Maintains the directory structure of the file system
    • Manages the mapping between files and Blocks
    • Manages the mapping between Blocks and DataNodes
  • DataNode: worker node

    Deployed on inexpensive machines; stores the Block files

    Storage layout: a file is split into blocks stored on the DataNodes' disks, and each Block can have multiple replicas
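
A quick way to confirm the block size and to see how a real file was split is sketched below. The commands are standard HDFS CLI; the path /test/1.txt is just a placeholder for any file already in HDFS.

# print the configured block size in bytes (134217728 = 128 MB by default)
hdfs getconf -confKey dfs.blocksize

# show how a file is split into blocks and where each replica lives
hdfs fsck /test/1.txt -files -blocks -locations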

Write Flow
  • The client sends a write request to the NameNode, which returns the DataNodes to write to
  • The client establishes a connection (pipeline) to those DataNodes
  • The client streams the data to the DataNodes for storage

(figure: HDFS write flow, Hadoop.assets/image-20211218165009195.png)

Read Flow

(figure: HDFS read flow, Hadoop.assets/image-20211218170621998.png)

MapReduce

Introduction

  • MapReduce: a method for extracting and analyzing elements from massive source data and returning a result set
  • The MapReduce framework has two core phases: Map and Reduce
  • Map splits big data into small pieces and computes over them; the results are handed to Reduce via a shuffle. Reduce aggregates the Map results.
  • Example:
    • Summation: 1+5+7+3+4+9+3+5+6 = ? Split the numbers into groups, e.g. (1+5+7)=13, (3+4+9)=16, (3+5+6)=14, computed by parallel Map tasks; Reduce then sums the partial results: 13+16+14=43
      (figure: MapReduce summation example, Hadoop.assets/image-20211218202712197.png)
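
As a rough single-machine analogy only (not Hadoop itself), the three phases of a word count can be mimicked with standard Unix tools; each stage of the pipeline below stands in for one MapReduce step:

# map: split each line into words, one word per line
# shuffle: sort so identical keys end up adjacent (grouping by key)
# reduce: count each group
tr -s ' ' '\n' < /tmp/input/test.txt | sort | uniq -c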

Workflow

  • When a compute job is submitted to the MapReduce framework, the job is first split into several Map tasks, which are assigned to different nodes (DataNodes) for execution. Each Map task processes a portion of the input data. When a Map task finishes, it produces intermediate files, which become the input of the Reduce tasks. The main goal of a Reduce task is to aggregate the outputs of the preceding Map tasks and emit the final result.

    (figure: MapReduce workflow, Hadoop.assets/image-20211218203455327.png)

YARN

  • YARN: Yet Another Resource Negotiator

  • Main functions: job scheduling and cluster resource management, letting the Hadoop platform's performance and scalability come into full play

  • YARN is a subproject added in Hadoop 2.0. It makes up for the shortcomings of Hadoop 1.0 (MRv1): poor scalability, poor reliability, low resource utilization, and no support for other compute frameworks. Hadoop's next-generation compute framework, MRv2, abstracts resource management into a general-purpose system: YARN. In other words, YARN was split out of MapReduce

  • Advantages: less idle cluster capacity and better resource utilization; low maintenance cost; data sharing, avoiding moving data between clusters

  • Master/worker architecture

    • ResourceManager: resource management (master)

      Performs unified resource management and task scheduling across all NodeManagers

    • NodeManager: node management (worker)

      Runs on every compute node; receives compute tasks from the RM's ApplicationsManager, starts/stops tasks, reports to and negotiates resources with the RM's Scheduler, and monitors and reports its own node's state

(figure: YARN architecture, Hadoop.assets/image-20211219180834368.png)
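
Assuming a running cluster such as the ones deployed below, the ResourceManager's view of its NodeManagers and applications can be checked from the CLI; a minimal sketch:

# list the NodeManagers registered with the ResourceManager, with state and container counts
yarn node -list -all

# list applications currently known to the ResourceManager
yarn application -list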

Hadoop Deployment

Common Deployment Modes

Standalone
  • Standalone (local) mode is Hadoop's default deployment mode
  • When the configuration files are empty, Hadoop runs entirely locally
  • It does not interact with other nodes, so standalone (local) mode does not use HDFS and does not load any Hadoop daemons
  • This mode is mainly used to develop and debug the application logic of MapReduce programs
Pseudo-distributed
  • The Hadoop daemons run on the local machine, simulating a small cluster
  • On top of standalone mode, this mode adds code debugging, letting you inspect memory usage, HDFS input/output, and interaction with the other daemons
Fully distributed

Introduction

  • Standalone and pseudo-distributed deployments are only for test environments; production requires a fully distributed deployment
  • A fully distributed deployment really uses multiple Linux hosts to run Hadoop: the cluster is planned so that the Hadoop modules are deployed across different machines
  • Because NameNode and ResourceManager follow a one-master, many-workers architecture, they need high availability

NameNode HA Failover

  • One NameService contains two NameNodes, one Active and one Standby. ZooKeeper coordinates the election, ensuring only one NameNode is active at a time. Once the Active node goes down, the Standby switches to Active

    (figure: NameNode HA failover, Hadoop.assets/image-20211221143451770.png)

  • ZKFailoverController (ZKFC) acts as a client of the ZK (ZooKeeper) cluster and monitors the state of the NameNode. Every node that runs a NameNode must also run a ZKFC

  • ZKFC functions

    • Health monitoring: the ZKFC periodically sends a health-check command to its local NameNode. If the NameNode responds correctly it is considered healthy; otherwise it is treated as a failed node
    • ZooKeeper session management: while the local NameNode is healthy, the ZKFC holds a session in ZK. If the local NameNode is also the active one, the ZKFC additionally holds an "ephemeral" znode as a lock; once the local NameNode fails, that znode is deleted automatically
    • ZooKeeper-based election: if the local NameNode is healthy and the ZKFC sees that no other NameNode currently holds the exclusive lock, it tries to acquire it; on success, it performs a failover and its local NameNode becomes the active one.
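
Assuming the nameservice ns1 with NameNodes nn1 and nn2 that is configured in the fully distributed deployment below, the outcome of the election can be inspected from the CLI; a minimal sketch:

# ask each NameNode for its current HA role
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# with automatic failover enabled, killing the active NameNode process and
# re-running the commands above should show the standby promoted to active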

NameNode HA Data Sharing

  • The NameNode maintains two main files: the fsimage and the editlog
  • The fsimage holds the latest metadata checkpoint, covering every directory and file in the HDFS file system. For files it includes the block descriptions, modification time, access time, etc.; for directories, the modification time and access-control information (owning user and group), etc.
  • The editlog records every update applied to HDFS while the NameNode is running; all write operations executed by HDFS clients are recorded in the editlog. The editlog is kept on the JournalNodes
  • The Standby NameNode reads the editlog from the JournalNodes and replays it to stay in sync
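
Both files can be inspected offline with the viewers shipped in the hdfs CLI; a sketch, where the input file names are placeholders taken from a NameNode/JournalNode data directory:

# dump an fsimage checkpoint to XML (offline image viewer)
hdfs oiv -p XML -i fsimage_0000000000000000000 -o fsimage.xml

# dump an edit log segment to XML (offline edits viewer; xml is the default processor)
hdfs oev -i edits_0000000000000000001-0000000000000000042 -o edits.xml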

Standalone Deployment

[root@server5 ~]# java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

Replace the JDK (downloaded in advance)
[root@server5 ~]# tar xf jdk-8u191-linux-x64.tar.gz 
[root@server5 ~]# mv jdk1.8.0_191/ /usr/local/jdk

Export the environment variables (order matters)
[root@server5 ~]# vim /etc/profile
[root@server5 ~]# tail -2 /etc/profile
export JAVA_HOME=/usr/local/jdk
export PATH=${JAVA_HOME}/bin:$PATH
[root@server5 ~]# source /etc/profile
[root@server5 ~]# java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
  • Install Hadoop
[root@server5 ~]# rz
[root@server5 ~]# tar xf hadoop-2.8.5.tar.gz 
[root@server5 ~]# mv hadoop-2.8.5 /opt/

Export the environment variables
[root@server5 ~]# echo 'PATH=$PATH:/opt/hadoop-2.8.5/bin' >> /etc/profile
[root@server5 ~]# source /etc/profile
  • Word count to verify the installation
Prepare an input file
[root@server5 ~]# mkdir /tmp/input
[root@server5 ~]# vim /tmp/input/test.txt
[root@server5 ~]# cat /tmp/input/test.txt 
zhangsan
lisi
zhangsan 192.168.139.10
lisi 192.168.139.20
zhangsan 192.168.139.10
jack 192.168.139.30

Run the word count
[root@server5 ~]# hadoop jar /opt/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /tmp/input /tmp/output
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0

Check the results
[root@server5 ~]# ls /tmp/output/
part-r-00000  _SUCCESS
# _SUCCESS only indicates the job succeeded; part-r-00000 holds the output
[root@server5 ~]# cat /tmp/output/part-r-00000 
192.168.139.10	2
192.168.139.20	1
192.168.139.30	1
jack	1
lisi	2
zhangsan	3

Pseudo-Distributed Deployment

  • Pseudo-distributed deployment builds on the standalone deployment; see the standalone steps above
  • Set the JDK environment
[root@server5 ~]# cd /opt/hadoop-2.8.5/etc/
[root@server5 etc]# cd hadoop/
[root@server5 hadoop]# ls
capacity-scheduler.xml      hadoop-policy.xml        kms-log4j.properties        ssl-client.xml.example
configuration.xsl           hdfs-site.xml            kms-site.xml                ssl-server.xml.example
container-executor.cfg      httpfs-env.sh            log4j.properties            yarn-env.cmd
core-site.xml               httpfs-log4j.properties  mapred-env.cmd              yarn-env.sh
hadoop-env.cmd              httpfs-signature.secret  mapred-env.sh               yarn-site.xml
hadoop-env.sh               httpfs-site.xml          mapred-queues.xml.template
hadoop-metrics2.properties  kms-acls.xml             mapred-site.xml.template
hadoop-metrics.properties   kms-env.sh               slaves

Set JAVA_HOME in each env script
[root@server5 hadoop]# vim hadoop-env.sh 
[root@server5 hadoop]# grep -Ev '^#|^$' hadoop-env.sh |head -1
export JAVA_HOME=/usr/local/jdk

[root@server5 hadoop]# vim mapred-env.sh 
[root@server5 hadoop]# grep -Ev '^#|^$' mapred-env.sh |head -1
export JAVA_HOME=/usr/local/jdk

[root@server5 hadoop]# vim yarn-env.sh 
[root@server5 hadoop]# grep -Ev '^#|^$' yarn-env.sh | grep 'export JAVA_HOME'
export JAVA_HOME=/usr/local/jdk
  • Configure the default filesystem and the NameNode data directory
[root@server5 hadoop]# echo '192.168.139.50 hd1' >> /etc/hosts
[root@server5 hadoop]# vim core-site.xml 
[root@server5 hadoop]# tail -12 core-site.xml 
<configuration>
	<!-- configure the default filesystem -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://hd1:8020</value>
	</property>
	<!-- NameNode temporary data directory -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/data/tmp</value>
	</property>
</configuration>
  • Configure the replication factor (a read-back sketch follows the config block)
[root@server5 hadoop]# vim hdfs-site.xml 
[root@server5 hadoop]# tail -7 hdfs-site.xml 
<configuration>
	<!-- replication factor -->
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>
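
Once a file has been uploaded (see the HDFS tests below), the replication recorded for a path can be read back or changed; a sketch, with /test/1.txt as a placeholder:

# print the replication factor stored for a file
hdfs dfs -stat "%r" /test/1.txt

# override the replication of an existing path
hdfs dfs -setrep 1 /test/1.txt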
  • Format HDFS
[root@server5 hadoop]# hdfs namenode -format
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd1/192.168.139.50
************************************************************/
[root@server5 hadoop]# ls /opt/data/tmp/
dfs
[root@server5 hadoop]# ls /opt/data/tmp/dfs/name/current/
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION
  • Start the daemons
Add /opt/hadoop-2.8.5/sbin to PATH
[root@server5 ~]# vim /etc/profile
[root@server5 ~]# tail -1 /etc/profile
PATH=$PATH:/opt/hadoop-2.8.5/sbin
[root@server5 ~]# . /etc/profile

Start the daemons
[root@server5 ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /opt/hadoop-2.8.5/logs/hadoop-root-namenode-server5.out
[root@server5 ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop-2.8.5/logs/hadoop-root-datanode-server5.out

Check the Java processes to verify startup
[root@server5 ~]# jps
5072 DataNode
5157 Jps
4941 NameNode
  • HDFS file tests
[root@server5 ~]# hdfs dfs --help
--help: Unknown command
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Create a directory
[root@server5 ~]# hdfs dfs -mkdir /test

List it
[root@server5 ~]# hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2021-12-19 16:30 /test

Upload a file
[root@server5 ~]# echo 'this is a test file' > 1.txt
[root@server5 ~]# hdfs dfs -put 1.txt /test
[root@server5 ~]# hdfs dfs -ls /test
Found 1 items
-rw-r--r--   1 root supergroup         20 2021-12-19 16:32 /test/1.txt

Read the file
[root@server5 ~]# hdfs dfs -cat /test/1.txt
this is a test file

Download the file
[root@server5 ~]# rm -rf 1.txt 
[root@server5 ~]# hdfs dfs -get /test/1.txt
[root@server5 ~]# cat 1.txt 
this is a test file
  • Configure YARN
[root@server5 ~]# cd /opt/hadoop-2.8.5/etc/hadoop/
[root@server5 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@server5 hadoop]# vim mapred-site.xml
[root@server5 hadoop]# tail -7 mapred-site.xml
<configuration>
	<!-- run the MapReduce framework on yarn -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

[root@server5 hadoop]# vim yarn-site.xml 
[root@server5 hadoop]# tail -12 yarn-site.xml 
<configuration>
	<!-- run the resourcemanager on node hd1 -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>hd1</value>
	</property>
	<!-- default shuffle service for yarn -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>
  • Start YARN
[root@server5 hadoop]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/hadoop-2.8.5/logs/yarn-root-resourcemanager-server5.out
[root@server5 hadoop]# yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/hadoop-2.8.5/logs/yarn-root-nodemanager-server5.out
[root@server5 hadoop]# jps
5072 DataNode
8565 ResourceManager
8392 NodeManager
4941 NameNode
8781 Jps
  • Word count to verify
[root@server5 hadoop]# hadoop jar /opt/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /test/2.txt /output/00
21/12/19 20:42:49 INFO mapreduce.Job: Counters: 13
	Job Counters 
		Failed map tasks=4
		Killed reduce tasks=1
		Launched map tasks=4
		Other local map tasks=3
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=13
		Total time spent by all reduce tasks (ms)=0
		Total vcore-milliseconds taken by all map tasks=13
		Total vcore-milliseconds taken by all reduce tasks=0
		Total megabyte-milliseconds taken by all map tasks=13312
		Total megabyte-milliseconds taken by all reduce tasks=0
# error output
Container launch failed for container_1639916924136_0003_01_000002 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist

Error: Container launch failed for container_1639916924136_0003_01_000002 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist

Cause: the yarn-site.xml configuration was not written correctly (the mapreduce_shuffle aux-service had not taken effect on the NodeManager)

Fix: as follows

  • Rewrite yarn-site.xml
[root@server5 hadoop]# vim yarn-site.xml 
[root@server5 hadoop]# tail -12 yarn-site.xml 
<configuration>
	<!-- run the resourcemanager on node hd1 -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>hd1</value>
	</property>
	<!-- default shuffle service for yarn -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>
  • Restart Hadoop
yarn-daemon.sh stop resourcemanager
yarn-daemon.sh stop nodemanager
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
Remove the empty output left by the failed run
[root@server5 hadoop]# hdfs dfs -rmdir /output/00
[root@server5 hadoop]# hdfs dfs -rmdir /output

Run it again
[root@server5 hadoop]# hadoop jar /opt/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /test/2.txt /output/00
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0

Check the results
[root@server5 hadoop]# hdfs dfs -ls /output/00
Found 2 items
-rw-r--r--   1 root supergroup          0 2021-12-20 14:18 /output/00/_SUCCESS
-rw-r--r--   1 root supergroup         47 2021-12-20 14:18 /output/00/part-r-00000
[root@server5 hadoop]# hdfs dfs -cat /output/00/part-r-00000
168	1
186	2
192.168.139.10	1
lisi	2
zhangsan	1
  • Add name resolution on Windows: C:\Windows\System32\drivers\etc\hosts
  • Browse to 192.168.139.50:8088

(figure: YARN web UI, Hadoop.assets/image-20211220143754834.png)

  • Start the history server
[root@server5 hadoop]# mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/hadoop-2.8.5/logs/mapred-root-historyserver-server5.out
[root@server5 hadoop]# jps
8177 Jps
6665 NodeManager
8137 JobHistoryServer
6363 ResourceManager
6157 NameNode
6255 DataNode
  • View the history server on the web: hd1:19888

(figure: JobHistory web UI, Hadoop.assets/image-20211220145243342.png)

Fully Distributed Deployment

Hostname | IP address | Role | Notes
hd1 | 192.168.139.10 | NameNode | runs ZKFC
hd2 | 192.168.139.20 | NameNode | runs ZKFC
hd3 | 192.168.139.30 | ResourceManager |
hd4 | 192.168.139.40 | DataNode / NodeManager / JournalNode | ZooKeeper installed
hd5 | 192.168.139.50 | DataNode / NodeManager / JournalNode | ZooKeeper installed
hd6 | 192.168.139.60 | DataNode / NodeManager / JournalNode | ZooKeeper installed
Environment preparation
Set the hostname
hostnamectl set-hostname hd1
su

Configure a static IP
vim /etc/sysconfig/network-scripts/ifcfg-ens33
# only UUID and IPADDR need to be changed
-----------------------------------------------------
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.139.10
GATEWAY=192.168.139.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
-----------------------------------------------------
systemctl restart network

Name resolution
cat >> /etc/hosts <<EOF
192.168.139.10 hd1
192.168.139.20 hd2
192.168.139.30 hd3
192.168.139.40 hd4
192.168.139.50 hd5
192.168.139.60 hd6
EOF

Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
iptables -F
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

Time synchronization
ntpdate cn.ntp.org.cn

Configure yum repos (Aliyun mirror)
cd /etc/yum.repos.d/
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
cd
yum clean all
yum makecache

Passwordless SSH (mutual; run these on host 1)
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
cd /root/.ssh/
cp id_rsa.pub authorized_keys
for i in hd{2..6}
	do
	scp -r /root/.ssh $i:/root
	done
JDK deployment
yum install -y lrzsz
rz

tar xf jdk-8u191-linux-x64.tar.gz
mv jdk1.8.0_191/ /usr/local/jdk
cat >> /etc/profile << EOF
export JAVA_HOME=/usr/local/jdk
export PATH=\${JAVA_HOME}/bin:\$PATH
EOF
source /etc/profile
java -version
ZooKeeper deployment (nodes 4, 5, 6)
Install ZooKeeper on node 4
[root@hd4 ~]# rz
[root@hd4 ~]# tar xf zookeeper-3.4.14.tar.gz 
[root@hd4 ~]# mv zookeeper-3.4.14 /usr/local/zookeeper
[root@hd4 ~]# cd /usr/local/zookeeper/conf/

Edit the configuration file zoo.cfg (this is the expected default name; renaming it may cause startup failures)
	myid differs per server and must be set on each node; here:
		server.1 corresponds to myid 1
		server.2 corresponds to myid 2
		server.3 corresponds to myid 3
	Port 2888: the port followers use to connect to the leader
	Port 3888: the leader-election port
[root@hd4 conf]# cp zoo_sample.cfg zoo.cfg
[root@hd4 conf]# vim zoo.cfg
[root@hd4 conf]# grep -v '^#' zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/data
clientPort=2181
server.1=hd4:2888:3888
server.2=hd5:2888:3888
server.3=hd6:2888:3888

Copy the ZooKeeper directory to nodes 5 and 6
[root@hd4 conf]# scp -r /usr/local/zookeeper/ hd5:/usr/local/
[root@hd4 conf]# scp -r /usr/local/zookeeper/ hd6:/usr/local/

Create the dataDir and the myid file on each node
mkdir /opt/data
[root@hd4 ~]# echo 1 > /opt/data/myid
[root@hd5 ~]# echo 2 > /opt/data/myid
[root@hd6 ~]# echo 3 > /opt/data/myid

Add /usr/local/zookeeper/bin to PATH
echo 'PATH=$PATH:/usr/local/zookeeper/bin' >>/etc/profile
source /etc/profile

Start ZooKeeper
zkServer.sh start

Check the status (all three nodes must be started first)
[root@hd4 conf]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hd5 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@hd6 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
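
To confirm the ensemble also answers client requests rather than just reporting a role, connect with the bundled CLI; a minimal check (the second probe assumes nc is installed):

# connect to one member and list the root znode
zkCli.sh -server hd4:2181 ls /

# four-letter-word health probe; a healthy server answers "imok"
echo ruok | nc hd4 2181
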
Hadoop deployment
  • Set JAVA_HOME in the relevant configuration files
[root@hd1 ~]# cd /opt/hadoop/etc/hadoop/

[root@hd1 hadoop]# vim hadoop-env.sh
 25 export JAVA_HOME=/usr/local/jdk
[root@hd1 hadoop]# vim mapred-env.sh 
 16 export JAVA_HOME=/usr/local/jdk
[root@hd1 hadoop]# vim yarn-env.sh 
 23 export JAVA_HOME=/usr/local/jdk
  • Edit the configuration files
[root@hd1 hadoop]# vim core-site.xml
<configuration>
    <!-- set the hdfs nameservice to ns1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>

    <!-- hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/data/tmp</value>
    </property>

    <!-- zookeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hd4:2181,hd5:2181,hd6:2181</value>
    </property>
</configuration>

[root@hd1 hadoop]# vim hdfs-site.xml 
<configuration>
	<!-- the hdfs nameservice is ns1; must match core-site.xml -->
	<property>
		<name>dfs.nameservices</name>
		<value>ns1</value>
	</property>
	
	<!-- ns1 has two NameNodes: nn1 and nn2 -->
	<property>
		<name>dfs.ha.namenodes.ns1</name>
		<value>nn1,nn2</value>
	</property>
	
	<!-- RPC address of nn1 -->
	<property>
		<name>dfs.namenode.rpc-address.ns1.nn1</name>
		<value>hd1:9000</value>
	</property>

	<!-- HTTP address of nn1 -->
	<property>
		<name>dfs.namenode.http-address.ns1.nn1</name> 
		<value>hd1:50070</value>
	</property>

	<!-- RPC address of nn2 -->
	<property>
		<name>dfs.namenode.rpc-address.ns1.nn2</name>
		<value>hd2:9000</value>
	</property>

	<!-- HTTP address of nn2 -->
	<property>
		<name>dfs.namenode.http-address.ns1.nn2</name>
		<value>hd2:50070</value>
	</property>

	<!-- where the NameNode metadata (edits) is stored on the JournalNodes -->
	<property>
		<name>dfs.namenode.shared.edits.dir</name> 
		<value>qjournal://hd4:8485;hd5:8485;hd6:8485/ns1</value>
	</property>

	<!-- where the JournalNodes store data on local disk -->
	<property>
		<name>dfs.journalnode.edits.dir</name> 
		<value>/opt/data/journal</value>
	</property>

	<!-- enable automatic NameNode failover -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>

	<!-- failover proxy implementation -->
	<property>
		<name>dfs.client.failover.proxy.provider.ns1</name>
 		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>

	<!-- fencing method -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>

	<!-- sshfence requires passwordless ssh -->
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
	</property>
</configuration>

[root@hd1 hadoop]# vim slaves 
[root@hd1 hadoop]# cat slaves 
hd4
hd5
hd6

[root@hd1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hd1 hadoop]# vim mapred-site.xml
<configuration>
    <!-- run the mr framework on yarn -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

[root@hd1 hadoop]# vim yarn-site.xml 
<configuration>
    <!-- resourcemanager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hd3</value>
    </property>

    <!-- load the shuffle service when the nodemanager starts -->
    <property>
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

  • Copy the Hadoop package and configuration to the other nodes
[root@hd1 ~]# for i in hd{2..6};do scp -r /opt/hadoop/ $i:/opt/;done
  • Add /opt/hadoop/bin and /opt/hadoop/sbin to PATH
echo 'PATH=$PATH:/opt/hadoop/bin' >> /etc/profile
echo 'PATH=$PATH:/opt/hadoop/sbin' >> /etc/profile
source /etc/profile
  • Start the cluster
Start ZooKeeper on nodes 4, 5, 6
[root@hd4 conf]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hd5 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@hd6 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower

Start the JournalNodes
[root@hd1 ~]# hadoop-daemons.sh start journalnode
hd4: starting journalnode, logging to /opt/hadoop/logs/hadoop-root-journalnode-hd4.out
hd5: starting journalnode, logging to /opt/hadoop/logs/hadoop-root-journalnode-hd5.out
hd6: starting journalnode, logging to /opt/hadoop/logs/hadoop-root-journalnode-hd6.out
[root@hd4 ~]# ls /opt/data/journal
[root@hd4 ~]# jps
2083 QuorumPeerMain
5444 Jps
5383 JournalNode

Format the NameNode (then copy the metadata to hd2)
[root@hd1 ~]# hdfs namenode -format
[root@hd1 ~]# scp -r /opt/data/ hd2:/opt/

Format ZKFC
[root@hd1 ~]# hdfs zkfc -formatZK

Start HDFS
[root@hd1 ~]# start-dfs.sh 
Starting namenodes on [hd1 hd2]
hd1: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-hd1.out
hd4: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hd4.out
hd6: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hd6.out
hd5: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hd5.out
Starting journal nodes [hd4 hd5 hd6]
hd5: journalnode running as process 50021. Stop it first.
hd6: journalnode running as process 5477. Stop it first.
hd4: journalnode running as process 5383. Stop it first.
Starting ZK Failover Controllers on NN hosts [hd1 hd2]
hd1: starting zkfc, logging to /opt/hadoop/logs/hadoop-root-zkfc-hd1.out
hd2: starting zkfc, logging to /opt/hadoop/logs/hadoop-root-zkfc-hd2.out

Start YARN
[root@hd3 ~]# start-yarn.sh

Verify
[root@hd1 ~]# jps
5374 NameNode
5678 DFSZKFailoverController
5774 Jps
[root@hd2 ~]# jps
5638 NameNode
5737 DFSZKFailoverController
5850 Jps
[root@hd3 ~]# jps
5587 Jps
5309 ResourceManager
[root@hd4 ~]# jps
2083 QuorumPeerMain
5619 NodeManager
5749 Jps
5383 JournalNode
5479 DataNode
[root@hd5 ~]# jps
50162 DataNode
50339 NodeManager
47301 QuorumPeerMain
50021 JournalNode
50476 Jps
[root@hd6 ~]# jps
5552 DataNode
5682 NodeManager
5477 JournalNode
5799 Jps
5179 QuorumPeerMain
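
Beyond jps, the HDFS side can be verified in one shot; a short sketch assuming the cluster above:

# report cluster capacity and the live DataNodes (should list hd4, hd5 and hd6)
hdfs dfsadmin -report

# list the NameNode hosts known to this configuration (should print hd1 hd2)
hdfs getconf -namenodes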
  • Verify the cluster works: word count
[root@hd1 ~]# vim 1.txt
[root@hd1 ~]# hdfs dfs -mkdir /input
[root@hd1 ~]# hdfs dfs -put 1.txt /input
[root@hd1 ~]# hdfs dfs -ls /input
Found 1 items
-rw-r--r--   3 root supergroup        219 2021-12-21 23:02 /input/1.txt
[root@hd1 ~]# yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output/00
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
[root@hd1 ~]# hdfs dfs -ls /output/00
Found 2 items
-rw-r--r--   3 root supergroup          0 2021-12-21 23:05 /output/00/_SUCCESS
-rw-r--r--   3 root supergroup        237 2021-12-21 23:05 /output/00/part-r-00000
[root@hd1 ~]# hdfs dfs -cat /output/00/part-r-00000
"ens33"	2
"no"	1
"stable-privacy"	1
"yes"	3
114.114.114.114	1
192.168.139.10	1
192.168.139.2	1
255.255.255.0	1
CONF	1
DEVICE	1
DNS1	1
GATEWAY	1
IPADDR	1
IPV6_ADDR_GEN_MODE	1
IPV6_DEFROUTE	1
IPV6_FAILURE_FATAL	1
NAME	1
NETMASK	1
ONBOOT	1
  • Access from Windows
Name resolution: C:\Windows\System32\drivers\etc\hosts
192.168.139.10 hd1
192.168.139.20 hd2
192.168.139.30 hd3
192.168.139.40 hd4
192.168.139.50 hd5
192.168.139.60 hd6

Check the NameNode status

  • hd1:50070

(figure: NameNode web UI on hd1, Hadoop.assets/image-20211221231311437.png)

  • hd2:50070

(figure: NameNode web UI on hd2, Hadoop.assets/image-20211221231418380.png)

Check the YARN status

  • hd3:8088

(figure: YARN web UI on hd3, Hadoop.assets/image-20211221231704453.png)

Deploying Hadoop Automatically with Ambari

Ambari Introduction

  • The Apache Ambari project aims to simplify Hadoop management by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management Web UI backed by a RESTful API

  • Capabilities

    • Provision a Hadoop cluster
      • Ambari provides a step-by-step wizard for installing Hadoop services across any number of hosts
      • Ambari handles the Hadoop service configuration for the cluster
    • Manage a Hadoop cluster
    • Monitor a Hadoop cluster
      • Ambari provides a dashboard for monitoring the health and status of the Hadoop cluster
      • Ambari uses the Ambari Metrics System for metrics collection
      • Ambari uses the Ambari Alert Framework for system alerting, notifying you when attention is needed (e.g. a node goes down, remaining disk space runs low)
    • Integrate Hadoop provisioning, management, and monitoring into your own applications with the Ambari REST API (a minimal sketch follows this list)
  • Ambari itself is also a piece of distributed software, made up of two parts: the Ambari Server and the Ambari Agents

  • The user, through the Ambari Server, tells the Ambari Agents which software to install; the Agents periodically send the status of each software module on every machine to the Server, and these statuses end up in Ambari's GUI, letting users see the state of each component in the cluster and choose maintenance actions accordingly

  • The Ambari Agents actually deploy the Hadoop cluster, while the Ambari Server manages and monitors it
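
The same information the Web UI shows is also reachable over the REST API; a minimal sketch against the server set up below, assuming the default admin/admin credentials:

# list the clusters managed by this Ambari Server
curl -u admin:admin http://ambari-server.a.com:8080/api/v1/clusters

# list the hosts registered by the agents
curl -u admin:admin http://ambari-server.a.com:8080/api/v1/hosts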

Deployment

Hostname | IP address | Role | Notes
hd1 | 192.168.139.10 | agent | runs ZKFC
hd2 | 192.168.139.20 | agent | runs ZKFC
hd3 | 192.168.139.30 | agent |
hd4 | 192.168.139.40 | agent | ZooKeeper installed
hd5 | 192.168.139.50 | agent | ZooKeeper installed
hd6 | 192.168.139.60 | agent | ZooKeeper installed
ambari_server | 192.168.139.70 | server |
Environment preparation
Set the hostname
hostnamectl set-hostname ambari_server
su

Configure a static IP
vim /etc/sysconfig/network-scripts/ifcfg-ens33
# only UUID and IPADDR need to be changed
-----------------------------------------------------
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.139.70
GATEWAY=192.168.139.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
-----------------------------------------------------
systemctl restart network

Name resolution
cat >> /etc/hosts <<EOF
192.168.139.10 hd1 hd1.a.com
192.168.139.20 hd2 hd2.a.com
192.168.139.30 hd3 hd3.a.com
192.168.139.40 hd4 hd4.a.com
192.168.139.50 hd5 hd5.a.com
192.168.139.60 hd6 hd6.a.com
192.168.139.70 ambari_server ambari-server.a.com
EOF

Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
iptables -F
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

Time synchronization
ntpdate cn.ntp.org.cn

Configure yum repos (Aliyun mirror)
cd /etc/yum.repos.d/
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
cd
yum clean all
yum makecache

Passwordless SSH (mutual; run these on ambari_server)
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
cd /root/.ssh/
cp id_rsa.pub authorized_keys
for i in hd{1..6}
	do
	scp -r /root/.ssh $i:/root
	done
JDK deployment
[root@ambari_server ~]# yum install -y lrzsz
[root@ambari_server ~]# rz

[root@ambari_server ~]# tar xf jdk-8u191-linux-x64.tar.gz
[root@ambari_server ~]# mv jdk1.8.0_191/ /usr/local/jdk
[root@ambari_server ~]# for i in hd{1..6}; do scp -r /usr/local/jdk $i:/usr/local/;done
[root@ambari_server ~]# cat >> /etc/profile << EOF
export JAVA_HOME=/usr/local/jdk
export PATH=\${JAVA_HOME}/bin:\$PATH
EOF
[root@ambari_server ~]# source /etc/profile
[root@ambari_server ~]# java -version
[root@ambari_server ~]# for i in hd{1..6}; do scp -r /etc/profile $i:/etc/profile;done

On all 6 agent nodes:
source /etc/profile
java -version
Database deployment (ambari_server node)
Install MariaDB
[root@ambari_server ~]# yum install -y mariadb mariadb-server.x86_64
[root@ambari_server ~]# systemctl start mariadb.service 
[root@ambari_server ~]# systemctl enable mariadb.service 
[root@ambari_server ~]# mysqladmin -uroot password 123456

Create the database and grant privileges
[root@ambari_server ~]# mysql -uroot -p123456
MariaDB [(none)]> create database ambari character set utf8;
MariaDB [(none)]> use ambari
MariaDB [ambari]> grant all on ambari.* to 'ambari'@'ambari_server' identified by '123456';
MariaDB [ambari]> grant all on ambari.* to 'ambari'@'%' identified by '123456';
# '%' matches connections from any host other than localhost
MariaDB [(none)]> flush privileges;

Verify the grant
[root@ambari_server ~]# mysql -h ambari_server -uambari -p123456
ERROR 1045 (28000): Access denied for user 'ambari'@'ambari_server' (using password: YES)

Error: ERROR 1045 (28000): Access denied for user 'ambari'@'ambari_server' (using password: YES)

Fix:

Log in to MySQL
[root@ambari_server ~]# mysql -uroot -p123456
MariaDB [(none)]> use mysql
MariaDB [mysql]> select host,user,password from user;
+----------------+--------+-------------------------------------------+
| host           | user   | password                                  |
+----------------+--------+-------------------------------------------+
| localhost      | root   | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| ambari\_server | root   |                                           |
| 127.0.0.1      | root   |                                           |
| ::1            | root   |                                           |
| localhost      |        |                                           |
| ambari\_server |        |                                           |
| ambari_server  | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| %              | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+----------------+--------+-------------------------------------------+


Delete the rows in the user table whose user column is empty:
MariaDB [mysql]> delete from user where user='';
MariaDB [mysql]> select host,user,password from user;
+----------------+--------+-------------------------------------------+
| host           | user   | password                                  |
+----------------+--------+-------------------------------------------+
| localhost      | root   | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| ambari\_server | root   |                                           |
| 127.0.0.1      | root   |                                           |
| ::1            | root   |                                           |
| ambari_server  | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| %              | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+----------------+--------+-------------------------------------------+

Flush the grant tables
MariaDB [(none)]> flush privileges;
MariaDB [(none)]> exit

Log in successfully
[root@ambari_server ~]# mysql -h ambari_server -uambari -p123456
MariaDB [(none)]> 
Java/MySQL connector
[root@ambari_server ~]# yum install -y mysql-connector-java.noarch
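
Depending on the Ambari version, the connector may also need to be registered explicitly so that ambari-server can find the driver; a hedged extra step, with the jar path being where the CentOS package installs it:

# register the JDBC driver with Ambari before running setup
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
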
Local yum repository
Install httpd
[root@ambari_server ~]# yum install -y httpd

Upload the Ambari packages
[root@ambari_server ~]# cd /var/www/html/
[root@ambari_server ~]# rz
[root@ambari_server ~]# ls
ambari.zip	HDP.zip	HDP-UTIL.zip
[root@ambari_server html]# unzip ambari.zip 
[root@ambari_server html]# unzip HDP.zip
[root@ambari_server html]# unzip HDP-UTIL.zip 

Create the repo files
[root@ambari_server yum.repos.d]# vim ambari.repo
[root@ambari_server yum.repos.d]# cat ambari.repo
[ambari-2.6.1.5]
name=ambari Version - ambari-2.6.1.5
baseurl=http://ambari_server/ambari
gpgcheck=0
enabled=1
priority=1

[root@ambari_server yum.repos.d]# vim hdp.repo 
[root@ambari_server yum.repos.d]# cat hdp.repo 
#VERSION_NUMBER=2.6.1.0-129
[HDP-2.6.1.0]
name=HDP Version - HDP-2.6.1.0
baseurl=http://ambari_server/HDP
gpgcheck=1
gpgkey=http://ambari_server/HDP/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

# The HDP-UTILS repo is still missing here; the packages could not be sourced for now

Start httpd
[root@ambari_server ~]# systemctl start httpd
[root@ambari_server ~]# systemctl enable httpd

Verify
[root@ambari_server ~]# yum clean all
[root@ambari_server ~]# yum makecache
[root@ambari_server ~]# yum repolist
Install Ambari
[root@ambari_server ~]# cd /var/www/html
[root@ambari_server html]# rz
# upload the ambari-server and ambari-agent packages
[root@ambari_server html]# yum install -y /var/www/html/ambari-server-2.5.1.0-159.x86_64.rpm 

[root@ambari_server html]# ls
ambari-agent-2.5.1.0-159.x86_64.rpm  ambari-server-2.5.1.0-159.x86_64.rpm
[root@ambari_server html]# for i in hd{1..6}
	do
	scp -r ambari-agent-2.5.1.0-159.x86_64.rpm $i:/root
	done

Install ambari-agent on all agent nodes
yum install -y /root/ambari-agent-2.5.1.0-159.x86_64.rpm
Initialize Ambari Server
The Ambari schema file
[root@ambari_server ~]# ls /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql 
/var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql

Import the Ambari schema into the database
[root@ambari_server ~]# mysql -h ambari_server -uambari -p123456
MariaDB [(none)]> use ambari
MariaDB [ambari]> source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
MariaDB [ambari]> show tables;
+-------------------------------+
| Tables_in_ambari              |
+-------------------------------+
| ClusterHostMapping            |
| QRTZ_BLOB_TRIGGERS            |
| QRTZ_CALENDARS                |
| QRTZ_CRON_TRIGGERS            |
...
| widget                        |
| widget_layout                 |
| widget_layout_user_widget     |
+-------------------------------+
105 rows in set (0.00 sec)

Run setup
[root@ambari_server ~]# ambari-server setup
Using python  /usr/bin/python
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? y
Enter user account for ambari-server daemon (root):root
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
[1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
[2] Oracle JDK 1.7 + Java Cryptography Extension (JCE) Policy Files 7
[3] Custom JDK
==============================================================================
Enter choice (1): 3	# choose Custom JDK
WARNING: JDK must be installed on all hosts and JAVA_HOME must be valid on all hosts.
WARNING: JCE Policy files are required for configuring Kerberos security. If you plan to use Kerberos,please make sure JCE Unlimited Strength Jurisdiction Policy Files are valid on all hosts.
Path to JAVA_HOME: /usr/local/jdk
Validating JDK on Ambari Server...done.
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? y	# configure the database
Configuring database...
==============================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
[3] - MySQL / MariaDB
[4] - PostgreSQL
[5] - Microsoft SQL Server (Tech Preview)
[6] - SQL Anywhere
[7] - BDB
==============================================================================
Enter choice (1): 3	# choose MySQL / MariaDB
Hostname (localhost): ambari_server
Invalid hostname.
Hostname (localhost): ambari-server.a.com	# the hostname must follow DNS naming rules (no underscore)
Port (3306): 3306	# port
Database name (ambari): ambari	# database
Username (ambari): ambari	# user
Enter Database Password (bigdata): 123456
Re-enter password: 123456
Configuring ambari database...
Configuring remote database connection properties...
WARNING: Before starting Ambari Server, you must run the following DDL against the database to create the schema: /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
Proceed with configuring remote database connection properties [y/n] (y)? y	# configure the remote connection
Extracting system views...
ambari-admin-2.5.1.0.159.jar
...........
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.

Start Ambari Server
[root@ambari_server ~]# ambari-server start
Using python  /usr/bin/python
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start........................
Server started listening on 8080

DB configs consistency check: no errors and warnings were found.
Ambari Server 'start' completed successfully.

Enable start at boot
[root@ambari_server ~]# chkconfig --list |grep ambari
ambari-server  	0:off	1:off	2:on	3:on	4:on	5:on	6:off
  • Browse to 192.168.139.70:8080
  • The username and password are both admin

(figure: Ambari web UI, Hadoop.assets/image-20211222210812534.png)

(figure: Ambari web UI, Hadoop.assets/image-20211222210914986.png)
