Hadoop
Big Data
Overview
- Data volumes keep growing, analysis demands ever lower latency, and results are applied ever more widely; big data technology emerged to meet these needs
- Big data: the umbrella term for the techniques used to collect, organize, and process very large data sets and derive results from them
Big Data Processing Frameworks
- Processing framework: the set of components that actually carry out operations on data
- Common frameworks
- Batch frameworks: process large data sets in batches and can operate over the entire data set, e.g. Apache Hadoop
- Stream frameworks: compute in real time over data as it enters the system, an "unbounded data" style of operation, e.g. Apache Storm, Apache Samza
- Hybrid frameworks: some frameworks handle both batch and stream workloads, e.g. Apache Spark, Apache Flink
Hadoop Introduction
Overview
- Hadoop is a distributed system infrastructure developed under the Apache Foundation. It primarily solves the storage and analytical computation of massive data sets
- In the broad sense, Hadoop usually refers to a wider concept: the Hadoop ecosystem
- Hadoop is reliable, scalable open-source distributed computing software that scales from a single server to thousands of machines, each offering local computation and storage
- Hadoop treats hardware failure as the norm and handles faults in software, achieving high availability at the software level
- Hadoop is a big data processing framework that allows distributed processing of large data sets across clusters of computers using simple programming models
Projects
Core projects
- Hadoop HDFS: a distributed file system providing high-throughput access to application data
- Hadoop YARN: a framework for job scheduling and cluster resource management
- Hadoop MapReduce: a YARN-based system for parallel processing of large data sets
- Hadoop Common: common utilities that support the other Hadoop modules
- Hadoop Ozone: an object store for Hadoop clusters
Related projects
- Ambari: a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, plus the ability to view MapReduce, Pig, and Hive applications visually and to diagnose their performance characteristics in a user-friendly way
- Avro: a data serialization system
- HBase: a scalable distributed database supporting structured data storage for large tables
- Mahout: a scalable machine learning and data mining library
- Spark: a fast, general-purpose compute engine for Hadoop data. Spark provides a simple, expressive programming model supporting a wide range of applications, including ETL, machine learning, stream processing, and graph computation
- ZooKeeper: a high-performance coordination service for distributed applications
HDFS
Introduction
- HDFS: Hadoop Distributed File System
- HDFS is a highly fault-tolerant system that can be deployed on inexpensive machines and provides high-throughput data access, making it suitable for applications with very large data sets (massive-scale data analysis, machine learning, etc.)
- Characteristics:
- Supports very large files: suitable for TB-scale data
- Files are stored in blocks
- Write once, read many; optimized for sequential reads
- Runs on low-cost hardware
- Tolerates hardware failure
Read/Write Flow
Related terms
- Block: the basic storage unit. Files are split into blocks, usually 128 MB each
- Hadoop cluster: a master/slave architecture
- NameNode: the master node
Holds the directory information, file information, and block information for the entire file system
Functions:
- Receives user operation requests
- Maintains the file system's directory structure
- Manages the mapping between files and Blocks
- Manages the mapping between Blocks and DataNodes
- DataNode: the slave node
Runs on inexpensive machines and stores the Block files
Storage details: files are split into blocks stored on the DataNodes' disks, and each Block can have multiple replicas
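The block/replica model above can be sketched in a few lines of Python (a conceptual illustration, not Hadoop code; the 128 MB constant matches the note above, while the round-robin placement is a simplifying assumption rather than HDFS's real placement policy):

```python
# Conceptual sketch: cut a file into fixed-size blocks and spread each
# block's replicas over distinct DataNodes. Names like BLOCK_SIZE and
# the round-robin placement are illustrative assumptions.

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the usual HDFS default

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_id, length) pairs covering file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

def place_replicas(blocks, datanodes, replication=3):
    """Round-robin each block's replicas onto distinct DataNodes."""
    placement = {}
    for block_id, _ in blocks:
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)]
            for r in range(min(replication, len(datanodes)))
        ]
    return placement

blocks = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file
print(blocks)   # 3 blocks: 128 MB + 128 MB + 44 MB
print(place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"]))
```

Losing one DataNode then costs at most one replica of any block, which is why block-level replication gives the fault tolerance described above.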
Write flow
- The client sends a request to the NameNode
- The client requests connections to the DataNodes
- The client streams the data to the DataNodes
Read flow
- The client asks the NameNode for the file's block locations
- The client reads each block from the nearest DataNode holding a replica
MapReduce
Introduction
- MapReduce: a method for extracting and analyzing elements from massive source data and returning a result set
- The core of the MapReduce framework is two phases: Map and Reduce
- Map splits the big data into small pieces, computes over them, and hands the results to Reduce via a shuffle; Reduce aggregates the Map results
- Example:
- Summation: 1+5+7+3+4+9+3+5+6 = ? Split the addition into chunks, sum each chunk in Map, then add the partial sums in Reduce
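The summation example can be sketched as a miniature map/reduce in plain Python, only to illustrate the idea of partial results being aggregated (the chunk size of 3 is an arbitrary choice):

```python
# Map computes partial sums over chunks; Reduce aggregates the partials.
from functools import reduce

data = [1, 5, 7, 3, 4, 9, 3, 5, 6]

# Split the input into chunks of 3 (stand-ins for Map task inputs)
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]

partials = [sum(chunk) for chunk in chunks]   # Map phase: [13, 16, 14]
total = reduce(lambda a, b: a + b, partials)  # Reduce phase: 43
print(partials, total)
```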
Workflow
- When a compute job is submitted to the MapReduce framework, the job is first split into several Map tasks, which are assigned to different nodes (DataNodes) to execute. Each Map task processes a portion of the input data. When a Map task finishes, it produces intermediate files, which become the input data for the Reduce tasks. The Reduce tasks' main goal is to merge the outputs of the preceding Map tasks and write the result.
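A minimal simulation of this workflow, with made-up input splits standing in for the per-task inputs and intermediate files (illustrative Python, not the Hadoop API):

```python
# Several "map tasks" each process one input split and emit (word, 1)
# pairs; a shuffle groups the pairs by key; "reduce" sums each group.
from collections import defaultdict
from itertools import chain

splits = ["zhangsan lisi zhangsan", "jack lisi"]  # two input splits

def map_task(text):
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    return key, sum(values)

intermediate = chain.from_iterable(map_task(s) for s in splits)
result = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'zhangsan': 2, 'lisi': 2, 'jack': 1}
```

This is the same shape as the `wordcount` example job used later in this document to verify the deployments.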
YARN
- YARN: Yet Another Resource Negotiator
- Main functions: job scheduling and cluster resource management, letting the Hadoop platform make better use of its performance and scalability
- YARN is a subproject added in Hadoop 2.0 to address the shortcomings of Hadoop 1.0 (MRv1): poor scalability, weak reliability, low resource utilization, and no support for other compute frameworks. Hadoop's next-generation compute framework MRv2 abstracts resource management into a general-purpose system, YARN. In other words, YARN was split out of MapReduce
- Advantages: less idle cluster capacity and higher resource utilization; lower maintenance cost; data sharing, avoiding data movement between clusters
- Master/slave architecture
- ResourceManager: resource management (master)
Performs unified management and task scheduling over the resources on the NodeManagers
- NodeManager: node management (slave)
Runs on each compute node; receives compute tasks from the RM's ApplicationsManager, starts/stops tasks, reports to and negotiates resources with the RM's Scheduler, and monitors and reports the state of its own node
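A toy sketch of this division of labor, under the simplifying assumption that resources are just megabytes of memory and the scheduler is first-fit (not YARN's real scheduler; node names and sizes are made up):

```python
# The ResourceManager tracks the free capacity each NodeManager reports
# and grants containers on whichever node can fit the request.

class NodeManager:
    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb  # capacity this node reports to the RM

class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, mem_mb):
        """Grant a container on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                return node.name
        return None  # no node has room; the request must wait

rm = ResourceManager([NodeManager("hd4", 2048), NodeManager("hd5", 1024)])
print(rm.allocate(1536))  # hd4
print(rm.allocate(1024))  # hd5
print(rm.allocate(2048))  # None
```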
Hadoop Deployment
Common deployment modes
Standalone
- Standalone (local) mode is Hadoop's default deployment mode
- With empty configuration files, Hadoop runs entirely locally
- Since it does not interact with other nodes, standalone (local) mode uses neither HDFS nor any Hadoop daemons
- This mode is mainly used to develop and debug the application logic of MapReduce programs
Pseudo-distributed
- The Hadoop daemons all run on the local machine, simulating a small cluster
- On top of standalone mode, this mode adds code-debugging capability, letting you inspect memory usage, HDFS input/output, and interaction with the other daemons
Fully distributed
Introduction
- Standalone and pseudo-distributed deployments are only for test environments; production requires a fully distributed deployment
- A fully distributed deployment uses multiple Linux hosts, planning the cluster so that the Hadoop modules are spread across different machines
- Because NameNode and ResourceManager follow a one-master, many-slaves pattern, they require high availability
NameNode HA Failover
- Under one NameService there are two NameNodes, one Active and one Standby. ZooKeeper coordinates the election to ensure only one NameNode is active; once the Active goes down, the Standby switches to Active
- ZKFailoverController (ZKFC) runs as a client of the ZK (ZooKeeper) cluster and monitors the NN's state. Every node running a NN must also run a ZKFC
- ZKFC functions
- Health monitoring: the ZKFC periodically issues health-check commands to its local NN. If the NN responds correctly, it is considered healthy; otherwise it is marked as failed
- ZooKeeper session management: while the local NN is healthy, the ZKFC holds a session in ZK. If the local NN is also active, the ZKFC additionally holds an "ephemeral" znode as a lock; if the local NN fails, that znode is deleted automatically
- ZooKeeper-based election: if the local NN is healthy and the ZKFC sees that no other NN holds the exclusive lock, it tries to acquire the lock. Once it succeeds, it performs the failover and its NN becomes the active NameNode
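The lock-based election described above can be sketched as follows; the dict stands in for the ZooKeeper namespace, and the znode path is illustrative:

```python
# Each healthy ZKFC tries to create the same ephemeral lock node; the
# one that succeeds becomes Active, the other stays Standby.

zk = {}  # path -> owner, a stand-in for the ZooKeeper namespace
LOCK = "/hadoop-ha/ns1/lock"  # illustrative path

def try_become_active(node):
    if LOCK not in zk:          # no one holds the exclusive lock
        zk[LOCK] = node         # "ephemeral" znode: gone if owner dies
        return "active"
    return "standby"

def owner_died(node):
    """Session ends -> the ephemeral lock node is deleted automatically."""
    if zk.get(LOCK) == node:
        del zk[LOCK]

print(try_become_active("nn1"))  # active
print(try_become_active("nn2"))  # standby
owner_died("nn1")                # nn1 fails, lock released
print(try_become_active("nn2"))  # active: failover complete
```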
NameNode HA Data Sharing
- The NameNode maintains two main files: the fsimage and the editlog
- The fsimage holds the latest metadata checkpoint, covering every directory and file in the HDFS file system. For files it includes block descriptions, modification time, access time, etc.; for directories it includes modification time and access-control information (owning user and group)
- The editlog records updates made to HDFS while the NameNode is running; every write operation performed by an HDFS client is recorded in the editlog, which is stored on the JournalNodes
- The Standby reads the editlog from the JournalNodes to stay in sync
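The fsimage/editlog relationship can be sketched as a checkpoint plus a replayable log (the operation names and paths here are illustrative, not the real editlog opcodes):

```python
# fsimage is a checkpoint of the namespace; the editlog records every
# mutation after it. The Standby (or a restarting NameNode) replays the
# editlog over the fsimage to reconstruct the current state.

fsimage = {"/": "dir", "/test": "dir"}   # last checkpoint

editlog = [                               # writes since the checkpoint
    ("mkdir", "/input"),
    ("create", "/input/1.txt"),
    ("delete", "/test"),
]

def replay(image, log):
    state = dict(image)
    for op, path in log:
        if op == "mkdir":
            state[path] = "dir"
        elif op == "create":
            state[path] = "file"
        elif op == "delete":
            state.pop(path, None)
    return state

print(replay(fsimage, editlog))
# {'/': 'dir', '/input': 'dir', '/input/1.txt': 'file'}
```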
Standalone Deployment
- Obtain the software packages
- Prepare the Java environment
[root@server5 ~]# java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
Replace the JDK (download it in advance)
[root@server5 ~]# tar xf jdk-8u191-linux-x64.tar.gz
[root@server5 ~]# mv jdk1.8.0_191/ /usr/local/jdk
Add it to the environment variables (note the order)
[root@server5 ~]# vim /etc/profile
[root@server5 ~]# tail -2 /etc/profile
export JAVA_HOME=/usr/local/jdk
export PATH=${JAVA_HOME}/bin:$PATH
[root@server5 ~]# source /etc/profile
[root@server5 ~]# java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
- Install Hadoop
[root@server5 ~]# rz
[root@server5 ~]# tar xf hadoop-2.8.5.tar.gz
[root@server5 ~]# mv hadoop-2.8.5 /opt/
Add to the environment variables
[root@server5 ~]# echo 'PATH=$PATH:/opt/hadoop-2.8.5/bin' >> /etc/profile
[root@server5 ~]# source /etc/profile
- Word count to verify the installation
Prepare a file
[root@server5 ~]# mkdir /tmp/input
[root@server5 ~]# vim /tmp/input/test.txt
[root@server5 ~]# cat /tmp/input/test.txt
zhangsan
lisi
zhangsan 192.168.139.10
lisi 192.168.139.20
zhangsan 192.168.139.10
jack 192.168.139.30
Run the word count
[root@server5 ~]# hadoop jar /opt/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /tmp/input /tmp/output
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
View the results
[root@server5 ~]# ls /tmp/output/
part-r-00000 _SUCCESS
# _SUCCESS merely indicates the run succeeded; part-r-00000 holds the output
[root@server5 ~]# cat /tmp/output/part-r-00000
192.168.139.10 2
192.168.139.20 1
192.168.139.30 1
jack 1
lisi 2
zhangsan 3
Pseudo-distributed Deployment
- Builds on the standalone deployment; see the standalone steps for the prerequisites
- Set the Java JDK path
[root@server5 ~]# cd /opt/hadoop-2.8.5/etc/
[root@server5 etc]# cd hadoop/
[root@server5 hadoop]# ls
capacity-scheduler.xml hadoop-policy.xml kms-log4j.properties ssl-client.xml.example
configuration.xsl hdfs-site.xml kms-site.xml ssl-server.xml.example
container-executor.cfg httpfs-env.sh log4j.properties yarn-env.cmd
core-site.xml httpfs-log4j.properties mapred-env.cmd yarn-env.sh
hadoop-env.cmd httpfs-signature.secret mapred-env.sh yarn-site.xml
hadoop-env.sh httpfs-site.xml mapred-queues.xml.template
hadoop-metrics2.properties kms-acls.xml mapred-site.xml.template
hadoop-metrics.properties kms-env.sh slaves
Set the Java JDK path
[root@server5 hadoop]# vim hadoop-env.sh
[root@server5 hadoop]# grep -Ev '^#|^$' hadoop-env.sh |head -1
export JAVA_HOME=/usr/local/jdk
[root@server5 hadoop]# vim mapred-env.sh
[root@server5 hadoop]# grep -Ev '^#|^$' mapred-env.sh |head -1
export JAVA_HOME=/usr/local/jdk
[root@server5 hadoop]# vim yarn-env.sh
[root@server5 hadoop]# grep -Ev '#|^$' yarn-env.sh |grep 'export JAVA_HOME'
export JAVA_HOME=/usr/local/jdk
- Configure fs and where the NameNode stores its data
[root@server5 hadoop]# echo '192.168.139.50 hd1' >> /etc/hosts
[root@server5 hadoop]# vim core-site.xml
[root@server5 hadoop]# tail -12 core-site.xml
<configuration>
<!-- Configure fs -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hd1:8020</value>
</property>
<!-- Temporary directory for NameNode data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/tmp</value>
</property>
</configuration>
- Configure the replica count
[root@server5 hadoop]# vim hdfs-site.xml
[root@server5 hadoop]# tail -7 hdfs-site.xml
<configuration>
<!-- Replica count -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
- Format HDFS
[root@server5 hadoop]# hdfs namenode -format
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd1/192.168.139.50
************************************************************/
[root@server5 hadoop]# ls /opt/data/tmp/
dfs
[root@server5 hadoop]# ls /opt/data/tmp/dfs/name/current/
fsimage_0000000000000000000 fsimage_0000000000000000000.md5 seen_txid VERSION
- Start the daemons
Add /opt/hadoop-2.8.5/sbin to the environment variables
[root@server5 ~]# vim /etc/profile
[root@server5 ~]# tail -1 /etc/profile
PATH=$PATH:/opt/hadoop-2.8.5/sbin
[root@server5 ~]# . /etc/profile
Start the daemons
[root@server5 ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /opt/hadoop-2.8.5/logs/hadoop-root-namenode-server5.out
[root@server5 ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop-2.8.5/logs/hadoop-root-datanode-server5.out
Check the Java processes to verify startup
[root@server5 ~]# jps
5072 DataNode
5157 Jps
4941 NameNode
- HDFS file tests
[root@server5 ~]# hdfs dfs --help
--help: Unknown command
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Create a directory
[root@server5 ~]# hdfs dfs -mkdir /test
List it
[root@server5 ~]# hdfs dfs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2021-12-19 16:30 /test
Upload a file
[root@server5 ~]# echo 'this is a test file' > 1.txt
[root@server5 ~]# hdfs dfs -put 1.txt /test
[root@server5 ~]# hdfs dfs -ls /test
Found 1 items
-rw-r--r-- 1 root supergroup 20 2021-12-19 16:32 /test/1.txt
Read the file's contents
[root@server5 ~]# hdfs dfs -cat /test/1.txt
this is a test file
Download the file
[root@server5 ~]# rm -rf 1.txt
[root@server5 ~]# hdfs dfs -get /test/1.txt
[root@server5 ~]# cat 1.txt
this is a test file
- Configure YARN
[root@server5 ~]# cd /opt/hadoop-2.8.5/etc/hadoop/
[root@server5 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@server5 hadoop]# vim mapred-site.xml
[root@server5 hadoop]# tail -7 mapred-site.xml
<configuration>
<!-- Use yarn as the execution framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@server5 hadoop]# vim yarn-site.xml
[root@server5 hadoop]# tail -12 yarn-site.xml
<configuration>
<!-- Run the resourcemanager on node hd1 -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hd1</value>
</property>
<!-- Default shuffle service for yarn -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Start YARN
[root@server5 hadoop]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/hadoop-2.8.5/logs/yarn-root-resourcemanager-server5.out
[root@server5 hadoop]# yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/hadoop-2.8.5/logs/yarn-root-nodemanager-server5.out
[root@server5 hadoop]# jps
5072 DataNode
8565 ResourceManager
8392 NodeManager
4941 NameNode
8781 Jps
- Word count to verify
[root@server5 hadoop]# hadoop jar /opt/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /test/2.txt /output/00
21/12/19 20:42:49 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=13
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=13
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=13312
Total megabyte-milliseconds taken by all reduce tasks=0
# Error message
Container launch failed for container_1639916924136_0003_01_000002 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
Cause: yarn-site.xml was not written correctly
Fix: see below
- Rewrite yarn-site.xml
[root@server5 hadoop]# vim yarn-site.xml
[root@server5 hadoop]# tail -12 yarn-site.xml
<configuration>
<!-- Run the resourcemanager on node hd1 -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hd1</value>
</property>
<!-- Default shuffle service for yarn -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Restart Hadoop
yarn-daemon.sh stop resourcemanager
yarn-daemon.sh stop nodemanager
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
Delete the empty files produced by the failed run
[root@server5 hadoop]# hdfs dfs -rmdir /output/00
[root@server5 hadoop]# hdfs dfs -rmdir /output
Run again
[root@server5 hadoop]# hadoop jar /opt/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /test/2.txt /output/00
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
View the results
[root@server5 hadoop]# hdfs dfs -ls /output/00
Found 2 items
-rw-r--r-- 1 root supergroup 0 2021-12-20 14:18 /output/00/_SUCCESS
-rw-r--r-- 1 root supergroup 47 2021-12-20 14:18 /output/00/part-r-00000
[root@server5 hadoop]# hdfs dfs -cat /output/00/part-r-00000
168 1
186 2
192.168.139.10 1
lisi 2
zhangsan 1
- Add name resolution on Windows: C:\Windows\System32\drivers\etc\hosts
- Browse to 192.168.139.50:8088
- Enable the history server
[root@server5 hadoop]# mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/hadoop-2.8.5/logs/mapred-root-historyserver-server5.out
[root@server5 hadoop]# jps
8177 Jps
6665 NodeManager
8137 JobHistoryServer
6363 ResourceManager
6157 NameNode
6255 DataNode
- View the job history on the web at hd1:19888
Fully Distributed Deployment
Hostname | IP address | Role(s) | Notes |
---|---|---|---|
hd1 | 192.168.139.10 | NameNode | runs ZKFC |
hd2 | 192.168.139.20 | NameNode | runs ZKFC |
hd3 | 192.168.139.30 | ResourceManager | |
hd4 | 192.168.139.40 | DataNode, NodeManager, JournalNode | ZooKeeper installed |
hd5 | 192.168.139.50 | DataNode, NodeManager, JournalNode | ZooKeeper installed |
hd6 | 192.168.139.60 | DataNode, NodeManager, JournalNode | ZooKeeper installed |
Environment Preparation
Set the hostname
hostnamectl set-hostname hd1
su
Configure a static IP
vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Only the UUID and IPADDR need changing
-----------------------------------------------------
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.139.10
GATEWAY=192.168.139.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
-----------------------------------------------------
systemctl restart network
Name resolution
cat >> /etc/hosts <<EOF
192.168.139.10 hd1
192.168.139.20 hd2
192.168.139.30 hd3
192.168.139.40 hd4
192.168.139.50 hd5
192.168.139.60 hd6
EOF
Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
iptables -F
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
Time synchronization
ntpdate cn.ntp.org.cn
Configure the yum repo (Aliyun mirror)
cd /etc/yum.repos.d/
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
cd
yum clean all
yum makecache
Passwordless SSH (mutual; just paste these commands on host 1)
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
cd /root/.ssh/
cp id_rsa.pub authorized_keys
for i in hd{2..6}
do
scp -r /root/.ssh $i:/root
done
Deploy the JDK
yum install -y lrzsz
rz
tar xf jdk-8u191-linux-x64.tar.gz
mv jdk1.8.0_191/ /usr/local/jdk
cat >> /etc/profile << EOF
export JAVA_HOME=/usr/local/jdk
export PATH=\${JAVA_HOME}/bin:\$PATH
EOF
source /etc/profile
java -version
Deploy ZooKeeper (nodes 4, 5, 6)
Install ZooKeeper on node 4
[root@hd4 ~]# rz
[root@hd4 ~]# tar xf zookeeper-3.4.14.tar.gz
[root@hd4 ~]# mv zookeeper-3.4.14 /usr/local/zookeeper
[root@hd4 ~]# cd /usr/local/zookeeper/conf/
Edit the config file zoo.cfg (this is the expected name; with a different name, startup may fail)
Each server's myid differs and must be set separately; here:
server.1 has myid 1
server.2 has myid 2
server.3 has myid 3
Port 2888: the port followers use to connect to the leader
Port 3888: the leader-election port
[root@hd4 conf]# cp zoo_sample.cfg zoo.cfg
[root@hd4 conf]# vim zoo.cfg
[root@hd4 conf]# grep -v "#" zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/data
clientPort=2181
server.1=hd4:2888:3888
server.2=hd5:2888:3888
server.3=hd6:2888:3888
Copy the ZooKeeper configuration to nodes 5 and 6
[root@hd4 conf]# scp -r /usr/local/zookeeper/ hd5:/usr/local/
[root@hd4 conf]# scp -r /usr/local/zookeeper/ hd6:/usr/local/
Create the dataDir and myid files
mkdir /opt/data
[root@hd4 ~]# echo 1 > /opt/data/myid
[root@hd5 ~]# echo 2 > /opt/data/myid
[root@hd6 ~]# echo 3 > /opt/data/myid
Add /usr/local/zookeeper/bin to the environment variables
echo 'PATH=$PATH:/usr/local/zookeeper/bin' >>/etc/profile
source /etc/profile
Start ZooKeeper
zkServer.sh start
Check the status (all three nodes must be started first)
[root@hd4 conf]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hd5 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@hd6 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Deploy Hadoop
- Set JAVA_HOME in the relevant config files
[root@hd1 ~]# cd /opt/hadoop/etc/hadoop/
[root@hd1 hadoop]# vim hadoop-env.sh
25 export JAVA_HOME=/usr/local/jdk
[root@hd1 hadoop]# vim mapred-env.sh
16 export JAVA_HOME=/usr/local/jdk
[root@hd1 hadoop]# vim yarn-env.sh
23 export JAVA_HOME=/usr/local/jdk
- Edit the configuration files
[root@hd1 hadoop]# vim core-site.xml
<configuration>
<!-- Set the HDFS nameservice to ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hd4:2181,hd5:2181,hd6:2181</value>
</property>
</configuration>
[root@hd1 hadoop]# vim hdfs-site.xml
<configuration>
<!-- HDFS nameservice ns1; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>hd1:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>hd1:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>hd2:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>hd2:50070</value>
</property>
<!-- Where the NameNode metadata is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hd4:8485;hd5:8485;hd6:8485/ns1</value>
</property>
<!-- Where the JournalNodes store data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover implementation -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- Fencing over ssh requires passwordless login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
[root@hd1 hadoop]# vim slaves
[root@hd1 hadoop]# cat slaves
hd4
hd5
hd6
[root@hd1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hd1 hadoop]# vim mapred-site.xml
<configuration>
<!-- Use yarn as the MR framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@hd1 hadoop]# vim yarn-site.xml
<configuration>
<!-- ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hd3</value>
</property>
<!-- Have the nodemanager load the shuffle server -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Copy the Hadoop package and configuration to the other nodes
[root@hd1 ~]# for i in hd{2..6};do scp -r /opt/hadoop/ $i:/opt/;done
- Add /opt/hadoop/bin and /opt/hadoop/sbin to the environment variables
echo 'PATH=$PATH:/opt/hadoop/bin' >> /etc/profile
echo 'PATH=$PATH:/opt/hadoop/sbin' >> /etc/profile
source /etc/profile
- Start the cluster
Start ZooKeeper on nodes 4, 5, and 6
[root@hd4 conf]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hd5 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@hd6 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Start the JournalNodes
[root@hd1 ~]# hadoop-daemons.sh start journalnode
hd4: starting journalnode, logging to /opt/hadoop/logs/hadoop-root-journalnode-hd4.out
hd5: starting journalnode, logging to /opt/hadoop/logs/hadoop-root-journalnode-hd5.out
hd6: starting journalnode, logging to /opt/hadoop/logs/hadoop-root-journalnode-hd6.out
[root@hd4 ~]# ls /opt/data/journal
[root@hd4 ~]# jps
2083 QuorumPeerMain
5444 Jps
5383 JournalNode
Format the NameNode
[root@hd1 ~]# hdfs namenode -format
[root@hd1 ~]# scp -r /opt/data/ hd2:/opt/
Format ZKFC
[root@hd1 ~]# hdfs zkfc -formatZK
Start HDFS
[root@hd1 ~]# start-dfs.sh
Starting namenodes on [hd1 hd2]
hd1: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-hd1.out
hd4: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hd4.out
hd6: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hd6.out
hd5: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-hd5.out
Starting journal nodes [hd4 hd5 hd6]
hd5: journalnode running as process 50021. Stop it first.
hd6: journalnode running as process 5477. Stop it first.
hd4: journalnode running as process 5383. Stop it first.
Starting ZK Failover Controllers on NN hosts [hd1 hd2]
hd1: starting zkfc, logging to /opt/hadoop/logs/hadoop-root-zkfc-hd1.out
hd2: starting zkfc, logging to /opt/hadoop/logs/hadoop-root-zkfc-hd2.out
Start YARN
[root@hd3 ~]# start-yarn.sh
Check the processes
[root@hd1 ~]# jps
5374 NameNode
5678 DFSZKFailoverController
5774 Jps
[root@hd2 ~]# jps
5638 NameNode
5737 DFSZKFailoverController
5850 Jps
[root@hd3 ~]# jps
5587 Jps
5309 ResourceManager
[root@hd4 ~]# jps
2083 QuorumPeerMain
5619 NodeManager
5749 Jps
5383 JournalNode
5479 DataNode
[root@hd5 ~]# jps
50162 DataNode
50339 NodeManager
47301 QuorumPeerMain
50021 JournalNode
50476 Jps
[root@hd6 ~]# jps
5552 DataNode
5682 NodeManager
5477 JournalNode
5799 Jps
5179 QuorumPeerMain
- Verify the cluster works: word count
[root@hd1 ~]# vim 1.txt
[root@hd1 ~]# hdfs dfs -mkdir /input
[root@hd1 ~]# hdfs dfs -put 1.txt /input
[root@hd1 ~]# hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 3 root supergroup 219 2021-12-21 23:02 /input/1.txt
[root@hd1 ~]# yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output/00
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
[root@hd1 ~]# hdfs dfs -ls /output/00
Found 2 items
-rw-r--r-- 3 root supergroup 0 2021-12-21 23:05 /output/00/_SUCCESS
-rw-r--r-- 3 root supergroup 237 2021-12-21 23:05 /output/00/part-r-00000
[root@hd1 ~]# hdfs dfs -cat /output/00/part-r-00000
"ens33" 2
"no" 1
"stable-privacy" 1
"yes" 3
114.114.114.114 1
192.168.139.10 1
192.168.139.2 1
255.255.255.0 1
CONF 1
DEVICE 1
DNS1 1
GATEWAY 1
IPADDR 1
IPV6_ADDR_GEN_MODE 1
IPV6_DEFROUTE 1
IPV6_FAILURE_FATAL 1
NAME 1
NETMASK 1
ONBOOT 1
- Access from Windows
Name resolution: C:\Windows\System32\drivers\etc\hosts
192.168.139.10 hd1
192.168.139.20 hd2
192.168.139.30 hd3
192.168.139.40 hd4
192.168.139.50 hd5
192.168.139.60 hd6
View NameNode status
- hd1:50070
- hd2:50070
View YARN status
- hd3:8088
Deploying Hadoop Automatically with Ambari
Ambari Introduction
- The Apache Ambari project aims to simplify Hadoop management by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by RESTful APIs
- Specific features
- Provision a Hadoop cluster
- Ambari provides a step-by-step wizard for installing Hadoop services across any number of hosts
- Ambari handles the configuration of the cluster's Hadoop services
- Manage a Hadoop cluster
- Monitor a Hadoop cluster
- Ambari provides a dashboard for monitoring the health and status of the Hadoop cluster
- Ambari uses the Ambari Metrics System for metrics collection
- Ambari uses the Ambari Alert Framework for system alerting and notifies you when your attention is needed (e.g. a node goes down, remaining disk space is low)
- The Ambari REST APIs make it easy to integrate Hadoop provisioning, management, and monitoring into your own applications
- Ambari itself is also distributed software, made of two parts: the Ambari Server and the Ambari Agents
- Through the Ambari Server, the user tells the Ambari Agents which software to install; the Agents periodically send the state of each software module on each machine to the Server, and those states are shown in the Ambari GUI so users can see the status of every component in the cluster and act accordingly
- The Ambari Agents actually deploy the Hadoop cluster; the Ambari Server manages and monitors it
Deployment
Hostname | IP address | Role | Notes |
---|---|---|---|
hd1 | 192.168.139.10 | agent | runs ZKFC |
hd2 | 192.168.139.20 | agent | runs ZKFC |
hd3 | 192.168.139.30 | agent | |
hd4 | 192.168.139.40 | agent | ZooKeeper installed |
hd5 | 192.168.139.50 | agent | ZooKeeper installed |
hd6 | 192.168.139.60 | agent | ZooKeeper installed |
ambari_server | 192.168.139.70 | server | |
Environment Preparation
Set the hostname
hostnamectl set-hostname ambari_server
su
Configure a static IP
vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Only the UUID and IPADDR need changing
-----------------------------------------------------
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.139.70
GATEWAY=192.168.139.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
-----------------------------------------------------
systemctl restart network
Name resolution
cat >> /etc/hosts <<EOF
192.168.139.10 hd1 hd1.a.com
192.168.139.20 hd2 hd2.a.com
192.168.139.30 hd3 hd3.a.com
192.168.139.40 hd4 hd4.a.com
192.168.139.50 hd5 hd5.a.com
192.168.139.60 hd6 hd6.a.com
192.168.139.70 ambari_server ambari-server.a.com
EOF
Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
iptables -F
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
Time synchronization
ntpdate cn.ntp.org.cn
Configure the yum repo (Aliyun mirror)
cd /etc/yum.repos.d/
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
cd
yum clean all
yum makecache
Passwordless SSH (mutual; done on the ambari_server)
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
cd /root/.ssh/
cp id_rsa.pub authorized_keys
for i in hd{1..6}
do
scp -r /root/.ssh $i:/root
done
Deploy the JDK
[root@ambari_server ~]# yum install -y lrzsz
[root@ambari_server ~]# rz
[root@ambari_server ~]# tar xf jdk-8u191-linux-x64.tar.gz
[root@ambari_server ~]# mv jdk1.8.0_191/ /usr/local/jdk
[root@ambari_server ~]# for i in hd{1..6}; do scp -r /usr/local/jdk $i:/usr/local/;done
[root@ambari_server ~]# cat >> /etc/profile << EOF
export JAVA_HOME=/usr/local/jdk
export PATH=\${JAVA_HOME}/bin:\$PATH
EOF
[root@ambari_server ~]# source /etc/profile
[root@ambari_server ~]# java -version
[root@ambari_server ~]# for i in hd{1..6}; do scp -r /etc/profile $i:/etc/profile;done
Run on all 6 agent nodes
source /etc/profile
java -version
Deploy the database (ambari_server node)
Install MariaDB
[root@ambari_server ~]# yum install -y mariadb mariadb-server.x86_64
[root@ambari_server ~]# systemctl start mariadb.service
[root@ambari_server ~]# systemctl enable mariadb.service
[root@ambari_server ~]# mysqladmin -uroot password 123456
Create the database and grant privileges
[root@ambari_server ~]# mysql -uroot -p123456
MariaDB [(none)]> create database ambari character set utf8;
MariaDB [(none)]> use ambari
MariaDB [ambari]> grant all on ambari.* to 'ambari'@'ambari_server' identified by '123456';
MariaDB [ambari]> grant all on ambari.* to 'ambari'@'%' identified by '123456';
# '%' matches all hosts other than the local machine
MariaDB [(none)]> flush privileges;
Verify the grant
[root@ambari_server ~]# mysql -h ambari_server -uambari -p123456
ERROR 1045 (28000): Access denied for user 'ambari'@'ambari_server' (using password: YES)
Error: ERROR 1045 (28000): Access denied for user 'ambari'@'ambari_server' (using password: YES)
Fix:
Log in to the mysql database
[root@ambari_server ~]# mysql -uroot -p123456
MariaDB [(none)]> use mysql
MariaDB [mysql]> select host,user,password from user;
+----------------+--------+-------------------------------------------+
| host           | user   | password                                  |
+----------------+--------+-------------------------------------------+
| localhost      | root   | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| ambari_server  | root   |                                           |
| 127.0.0.1      | root   |                                           |
| ::1            | root   |                                           |
| localhost      |        |                                           |
| ambari_server  |        |                                           |
| ambari_server  | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| %              | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+----------------+--------+-------------------------------------------+
Delete the rows in the user table whose user column is empty:
MariaDB [mysql]> delete from user where user='';
MariaDB [mysql]> select host,user,password from user;
+----------------+--------+-------------------------------------------+
| host           | user   | password                                  |
+----------------+--------+-------------------------------------------+
| localhost      | root   | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| ambari_server  | root   |                                           |
| 127.0.0.1      | root   |                                           |
| ::1            | root   |                                           |
| ambari_server  | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| %              | ambari | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+----------------+--------+-------------------------------------------+
Flush the grant tables
MariaDB [(none)]> flush privileges;
MariaDB [(none)]> exit
Log in successfully
[root@ambari_server ~]# mysql -h ambari_server -uambari -p123456
MariaDB [(none)]>
Java–MySQL connector
[root@ambari_server ~]# yum install -y mysql-connector-java.noarch
Deploy a local yum repo
Install httpd
[root@ambari_server ~]# yum install -y httpd
Upload the Ambari packages
[root@ambari_server ~]# cd /var/www/html/
[root@ambari_server ~]# rz
[root@ambari_server ~]# ls
ambari.zip HDP.zip HDP-UTIL.zip
[root@ambari_server html]# unzip ambari.zip
[root@ambari_server html]# unzip HDP.zip
[root@ambari_server html]# unzip HDP-UTIL.zip
Create the repo files
[root@ambari_server yum.repos.d]# vim ambari.repo
[root@ambari_server yum.repos.d]# cat ambari.repo
[ambari-2.6.1.5]
name=ambari Version - ambari-2.6.1.5
baseurl=http://ambari_server/ambari
gpgcheck=0
enabled=1
priority=1
[root@ambari_server yum.repos.d]# vim hdp.repo
[root@ambari_server yum.repos.d]# cat hdp.repo
#VERSION_NUMBER=2.6.1.0-129
[HDP-2.6.1.0]
name=HDP Version - HDP-2.6.1.0
baseurl=http://ambari_server/HDP
gpgcheck=1
gpgkey=http://ambari_server/HDP/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
# The HDP-UTIL repo is still missing here; no source has been found yet
Start and enable httpd
[root@ambari_server ~]# systemctl start httpd
[root@ambari_server ~]# systemctl enable httpd
Verify
[root@ambari_server ~]# yum clean all
[root@ambari_server ~]# yum makecache
[root@ambari_server ~]# yum repolist
Install Ambari
[root@ambari_server ~]# cd /var/www/html
[root@ambari_server html]# rz
# Upload the ambari-server and ambari-agent packages
[root@ambari_server html]# yum install -y /var/www/html/ambari-server-2.5.1.0-159.x86_64.rpm
[root@ambari_server html]# ls
ambari-agent-2.5.1.0-159.x86_64.rpm ambari-server-2.5.1.0-159.x86_64.rpm
[root@ambari_server html]# for i in hd{1..6}
do
scp -r ambari-agent-2.5.1.0-159.x86_64.rpm $i:/root
done
Install ambari-agent on all agent nodes
yum install -y /root/ambari-agent-2.5.1.0-159.x86_64.rpm
Initialize the Ambari Server
Ambari schema file
[root@ambari_server ~]# ls /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
/var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
Import the Ambari schema into the database
[root@ambari_server ~]# mysql -h ambari_server -uambari -p123456
MariaDB [(none)]> use ambari
MariaDB [ambari]> source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
MariaDB [ambari]> show tables;
+-------------------------------+
| Tables_in_ambari |
+-------------------------------+
| ClusterHostMapping |
| QRTZ_BLOB_TRIGGERS |
| QRTZ_CALENDARS |
| QRTZ_CRON_TRIGGERS |
...
| widget |
| widget_layout |
| widget_layout_user_widget |
+-------------------------------+
105 rows in set (0.00 sec)
Initialize
[root@ambari_server ~]# ambari-server setup
Using python /usr/bin/python
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? y
Enter user account for ambari-server daemon (root):root
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
[1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
[2] Oracle JDK 1.7 + Java Cryptography Extension (JCE) Policy Files 7
[3] Custom JDK
==============================================================================
Enter choice (1): 3    # choose Custom JDK
WARNING: JDK must be installed on all hosts and JAVA_HOME must be valid on all hosts.
WARNING: JCE Policy files are required for configuring Kerberos security. If you plan to use Kerberos,please make sure JCE Unlimited Strength Jurisdiction Policy Files are valid on all hosts.
Path to JAVA_HOME: /usr/local/jdk
Validating JDK on Ambari Server...done.
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? y    # configure the database
Configuring database...
==============================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
[3] - MySQL / MariaDB
[4] - PostgreSQL
[5] - Microsoft SQL Server (Tech Preview)
[6] - SQL Anywhere
[7] - BDB
==============================================================================
Enter choice (1): 3    # choose MySQL / MariaDB
Hostname (localhost): ambari_server
Invalid hostname.
Hostname (localhost): ambari-server.a.com    # the hostname must follow the naming rules (no underscores)
Port (3306): 3306
Database name (ambari): ambari
Username (ambari): ambari
Enter Database Password (bigdata): 123456
Re-enter password: 123456
Configuring ambari database...
Configuring remote database connection properties...
WARNING: Before starting Ambari Server, you must run the following DDL against the database to create the schema: /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
Proceed with configuring remote database connection properties [y/n] (y)? y    # configure the remote connection
Extracting system views...
ambari-admin-2.5.1.0.159.jar
...........
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.
Start the Ambari Server
[root@ambari_server ~]# ambari-server start
Using python /usr/bin/python
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start........................
Server started listening on 8080
DB configs consistency check: no errors and warnings were found.
Ambari Server 'start' completed successfully.
Enable at boot
[root@ambari_server ~]# chkconfig --list |grep ambari
ambari-server  0:off  1:off  2:on  3:on  4:on  5:on  6:off
- Browse to 192.168.139.70:8080
- The username and password are both admin