Hive Distributed Cluster Deployment

仗剑江湖.红尘笑

Tags: hadoop, hive

Article link: https://blog.csdn.net/lmocm/article/details/107956148

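Hive programming lets developers concentrate on writing SQL statements without having to think about how the underlying MapReduce jobs are implemented.

Apache Hive is a data warehouse built on Hadoop. It provides a set of tools for querying and analyzing data, and it exposes a SQL interface for working with data stored in the Hadoop Distributed File System (HDFS).

Hive can map structured data files to database tables and offers convenient SQL querying. The SQL statements that implement a business function are converted into MapReduce jobs and run on the cluster.

The Hive data warehouse is built on top of HDFS, and by design Hive executes user-submitted tasks through the MapReduce framework. Hive is best suited to processing offline (batch) data.

Before deploying distributed Hive, deploy the distributed Hadoop cluster first. The Hive cluster is then built from Hive nodes fronted by HAProxy, with MySQL holding the metadata.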

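I. Hive architecture:

The Hive architecture consists of several components: the metastore, the driver (which includes the compiler, the optimizer, and the executor), and the user interfaces (client, UI, Thrift Server). The Hive architecture is shown in the figure: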

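1. Metastore

Metadata is usually stored in a relational database (RDBMS) such as MySQL. It includes table names, table columns, partitions, the table type (whether it is an external table), and the path where the data is stored.

2. Driver

After receiving a Hive SQL statement, the driver creates a session to start executing the statement and monitors the lifecycle and progress of the execution. It also stores the metadata that is generated while the Hive SQL statement runs.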

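3. Compiler

The compiler compiles a Hive SQL query into an executable plan that contains the tasks and steps Hadoop MapReduce has to run. It first converts the query into an abstract syntax tree (AST). After checking for compatibility and compile-time errors, it converts the AST into a directed acyclic graph (DAG). Based on the input query and data, the DAG assigns the operators to the MapReduce stages and tasks.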
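As a rough illustration of that last step, the sketch below models a query plan as a small DAG of stages and derives an execution order with Python's standard `graphlib` module (Python 3.9 or later). The stage names and dependencies are invented for this example and are not output from Hive's planner; a real plan can be inspected with Hive's `EXPLAIN` statement.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each stage maps to the set of stages it depends on.
# The names are illustrative only and do not come from Hive.
plan = {
    "Stage-1 (map: scan + filter)": set(),
    "Stage-2 (reduce: group by)": {"Stage-1 (map: scan + filter)"},
    "Stage-3 (map/reduce: order by)": {"Stage-2 (reduce: group by)"},
    "Stage-0 (fetch result)": {"Stage-3 (map/reduce: order by)"},
}

def execution_order(dag):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())

for stage in execution_order(plan):
    print(stage)
```

Because every stage in this toy plan depends on the previous one, the stages can only run one after another; independent branches of a DAG could run in any relative order.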

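4. Optimizer

The optimizer applies various transformations to the execution plan to obtain an optimized DAG, for example merging a pipeline of joins into a single join for better performance.

The optimizer can also split tasks, for example by applying a transformation to the data before the reduce operation, to improve performance and scalability.

5. Executor

After compilation and optimization, the executor runs the tasks. It tracks and interacts with the Hadoop jobs and schedules the tasks that need to run.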

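6. User interfaces

The client is the interface used most often in day-to-day development; starting a Hive terminal also starts a Hive instance. Users can connect to Hive Server with a JDBC (or ODBC) client.

Note: when connecting to Hive Server, specify the node on which Hive Server runs and make sure the Hive Server process on that node is running normally. Hive's Thrift Server supports multiple languages, such as C++, Java, and Python.

Tip: Hive's data storage relies on HDFS. A plain SELECT * FROM TBL statement does not launch a MapReduce job, whereas queries with conditions or aggregations generally do (how simple filters are handled depends on the hive.fetch.task.conversion setting).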

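II. Hive versus relational databases (RDBMS):

Although a data warehouse (Hive) and a relational database (RDBMS) both store data in structured form, they differ in how they are used and in the scenarios they suit. Their similarities and differences are listed in the table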

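III. Installing and configuring Hive:

In big-data scenarios a single Hive node can hardly meet business needs, so a highly available, distributed Hive cluster is installed to serve the tasks that users submit.

Hive is installed on the Hadoop cluster nodes here, which avoids installing some software again, such as the JDK and Hadoop.

3.1 Hive cluster base architecture

A highly available, distributed Hive cluster consists of (at least) 3 Hive nodes and 2 proxies (HAProxy). The 3 Hive nodes submit tasks to the cluster. The 2 proxies (HAProxy, which can fail over automatically with keepalived) serve the clients and handle load balancing.

The actual test environment is as follows: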

[hadoop@big-master2 ~]$ cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

## bigdata cluster ##

192.168.41.20 big-master1 #bigdata1 namenode1,zookeeper,resourcemanager Haproxy-master

192.168.41.21 big-master2 #bigdata2 namenode2,zookeeper,slave,resourcemanager haproxy-standby

192.168.41.22 big-slave01 #bigdata3 datanode1,zookeeper,slave hive1

192.168.41.25 big-slave02 #bigdata4 datanode2,zookeeper,slave hive2

192.168.41.27 big-slave03 #bigdata5 datanode3,zookeeper,slave hive3

192.168.41.17 tidb05.500.com #hive mysql

----------------------------------------------------

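3.2 Hive cluster architecture diagram:

Note: all Hive nodes must point to the same Hadoop distributed file system (HDFS). Otherwise the Hive nodes read from different data sources and the query results will be wrong.

3.3 Load balancing Hive Server with HAProxy

HAProxy is proxy software that provides high availability and load balancing for TCP (layer 4) and HTTP (layer 7) applications. It is completely free and open source, and it offers a quick way to set up a proxy. Here HAProxy is deployed in master/standby mode on the big-master1 and big-master2 nodes. Hive runs on the three Hadoop data nodes big-slave01 to big-slave03. The metadata is kept in a separate MySQL database.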

https://www.haproxy.org/ -- HAProxy official site

https://www.haproxy.org/download/ -- HAProxy downloads

http://www.apache.org/dyn/closer.cgi/hive/ -- Hive official site (Apache download page)

https://mirrors.tuna.tsinghua.edu.cn/apache/hive/ -- Hive downloads (mirror)

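3.4 Installing and compiling HAProxy:

Install the gcc toolchain packages and openssl.

tar -xvf haproxy-1.8.24.tar.gz -C /usr/local/ && cd /usr/local/ && cp -r haproxy-1.8.24 haproxy

Same on the master and the standby: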

[hadoop@big-master1 ~]$ haproxy -vv

HA-Proxy version 1.8.24 2020/02/15

Build options :

TARGET = linux2628

CPU = generic

CC = gcc

CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label

OPTIONS = USE_LIBCRYPT=1 USE_CRYPT_H=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :

maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017

Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017

OpenSSL library supports TLS extensions : yes

OpenSSL library supports SNI : yes

OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Encrypted password support via crypt(3): yes

Built with multi-threading support.

Built with PCRE version : 8.32 2012-11-30

Running on PCRE version : 8.32 2012-11-30

PCRE library supports JIT : no (USE_PCRE_JIT not set)

Built with zlib version : 1.2.7

Running on zlib version : 1.2.7

Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with network namespace support.

Available polling systems :

epoll : pref=300, test result OK

poll : pref=200, test result OK

select : pref=150, test result OK

Total: 3 (3 usable), will use epoll.

Available filters :

[SPOE] spoe

[COMP] compression

[TRACE] trace

3.4.1 The corresponding configuration file:

[hadoop@big-master1 ~]$ cat /usr/local/haproxy/

.build_opts contrib/ ebtree/ haproxy LICENSE README scripts/ tests/

CHANGELOG CONTRIBUTING examples/ haproxy-1.8.24/ MAINTAINERS reg-tests/ src/ VERDATE

config.cfg doc/ .gitignore include/ Makefile ROADMAP SUBVERS VERSION

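[hadoop@big-master1 ~]$ cat /usr/local/haproxy/config.cfg # master node

global
    daemon
    nbproc 1

defaults
    #mode {tcp|http|health}: tcp = layer 4, http = layer 7, health = health checks only
    mode tcp
    retries 2 # retry a failed connection up to 2 times before giving up on the server
    option redispatch # on failure, redispatch the session to another server
    option abortonclose # drop queued requests whose client has already closed the connection
    maxconn 1024 # maximum number of connections
    timeout connect 1d # connect timeout, set long so that Hive queries can return their results
    timeout client 1d # client timeout, set long so that Hive queries can return their results
    timeout server 1d # same as above, server side
    timeout check 2000 # health-check timeout (ms)
    log 127.0.0.1 local0 err # log level: [err|warning|info|debug]

listen admin_stats # statistics/admin page
    bind 0.0.0.0:1090 # IP and port of the admin page
    mode http # protocol used by the admin page
    maxconn 10 # max connections
    stats refresh 30s # auto-refresh every 30 s
    stats uri / # URL path
    stats realm Hive\ Haproxy # prompt shown in the authentication dialog
    stats auth admin:admin # user:password for HTTP 401 authentication

listen hive # Hive backend definition
    bind 0.0.0.0:10001 # IP and port HAProxy binds to as the proxy
    mode tcp # proxy at layer 4
    balance leastconn # scheduling algorithm: leastconn (fewest connections) or roundrobin
    maxconn 1024 # max connections
    server hive_1 big-slave01:10000 check inter 180000 rise 1 fall 2
    server hive_2 big-slave02:10000 check inter 180000 rise 1 fall 2
    server hive_3 big-slave03:10000 check inter 180000 rise 1 fall 2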
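To make the `balance leastconn` choice concrete, here is a simplified Python sketch of the selection rule: a new connection goes to the backend that currently has the fewest active connections. It ignores server weights and health state, which HAProxy also takes into account, and the connection counts are made-up numbers.

```python
def pick_leastconn(active):
    """Return the server name with the fewest active connections.

    `active` maps server name -> current connection count.
    Ties are broken by name so the result is deterministic in this sketch.
    """
    return min(sorted(active), key=lambda name: active[name])

# Made-up snapshot of long-running HiveServer2 sessions per backend.
active = {"hive_1": 7, "hive_2": 3, "hive_3": 5}

target = pick_leastconn(active)
active[target] += 1   # the new session now counts against that server
print(target, active)
```

Hive sessions tend to be long-lived, which is the kind of workload where balancing on the number of open connections is usually a better fit than plain round robin.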

[hadoop@big-master2 ~]$ cat /usr/local/haproxy/config.cfg

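global
    daemon
    nbproc 1

defaults
    #mode {tcp|http|health}: tcp = layer 4, http = layer 7, health = health checks only
    mode tcp
    retries 2 # retry a failed connection up to 2 times before giving up on the server
    option redispatch # on failure, redispatch the session to another server
    option abortonclose # drop queued requests whose client has already closed the connection
    maxconn 1024 # maximum number of connections
    timeout connect 1d # connect timeout, set long so that Hive queries can return their results
    timeout client 1d # client timeout, set long so that Hive queries can return their results
    timeout server 1d # same as above, server side
    timeout check 2000 # health-check timeout (ms)
    log 127.0.0.1 local0 err # log level: [err|warning|info|debug]

listen admin_stats # statistics/admin page
    bind 0.0.0.0:1090 # IP and port of the admin page
    mode http # protocol used by the admin page
    maxconn 10 # max connections
    stats refresh 30s # auto-refresh every 30 s
    stats uri / # URL path
    stats realm Hive\ Haproxy # prompt shown in the authentication dialog
    stats auth admin:admin # user:password for HTTP 401 authentication

listen hive # Hive backend definition
    bind 0.0.0.0:10001 # IP and port HAProxy binds to as the proxy
    mode tcp # proxy at layer 4
    balance leastconn # scheduling algorithm: leastconn (fewest connections) or roundrobin
    maxconn 1024 # max connections
    server hive_1 big-slave01:10000 check inter 180000 rise 1 fall 2
    server hive_2 big-slave02:10000 check inter 180000 rise 1 fall 2
    server hive_3 big-slave03:10000 check inter 180000 rise 1 fall 2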

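Note: in the server lines, the host aliases or domain names must be resolvable. The IP and port are checked every 3 minutes (180000 ms). Every time port 10000 is requested, a log file is created, so an interval that is too short causes log files to be created too frequently.

Test:

master1:

master2: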
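The interval arithmetic in this note can be checked with a short script. The following Python sketch pulls the `server` lines out of a config snippet and estimates how long a dead HiveServer2 node may go unnoticed; the formula (`fall` consecutive failed checks, `inter` milliseconds apart) is a rough upper bound that ignores the check timeout, and the parser only understands the few options used in this post.

```python
def parse_servers(config_text):
    """Extract (name, host, port, options) from HAProxy `server` lines."""
    servers = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].split()   # drop trailing comments
        if len(line) >= 3 and line[0] == "server":
            host, port = line[2].rsplit(":", 1)
            # "inter", "rise" and "fall" take a numeric argument;
            # bare flags such as "check" are stored with the value True.
            opts, rest, i = {}, line[3:], 0
            while i < len(rest):
                if rest[i] in ("inter", "rise", "fall"):
                    opts[rest[i]] = int(rest[i + 1])
                    i += 2
                else:
                    opts[rest[i]] = True
                    i += 1
            servers.append((line[1], host, int(port), opts))
    return servers

sample = """
server hive_1 big-slave01:10000 check inter 180000 rise 1 fall 2
server hive_2 big-slave02:10000 check inter 180000 rise 1 fall 2
"""

for name, host, port, opts in parse_servers(sample):
    worst_case_s = opts["inter"] * opts["fall"] / 1000
    print(f"{name} -> {host}:{port}, failure noticed within roughly {worst_case_s:.0f}s")
```

With `inter 180000` and `fall 2`, a failed node may keep receiving connections for up to about six minutes, which is the price paid for the low check frequency.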

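3.5 Installing the distributed Hive cluster

The core of Hive is implemented in Java. The $HIVE_HOME/lib directory holds many JAR files, such as hive-jdbc*.jar and hive-exec*.jar. Each JAR implements a piece of Hive functionality; developers only need to know how to use them.

Installing Hive (the commands show the 2.1.1 tarball, while the logs later in this post come from a 2.3.7 installation, so substitute the version you downloaded):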

tar -zxvf apache-hive-2.1.1-bin.tar.gz

mv apache-hive-2.1.1-bin hive

Add the global environment variables to /etc/profile:

## Hive ##

export HIVE_HOME=/usr/local/hive

export PATH=$HIVE_HOME/bin:$PATH

3.5.1 Creating directories:

Create the path of the Hive data warehouse on the Hadoop distributed file system.

# Create the Hive directory in HDFS
[hadoop@big-master1 ~]$ hdfs dfs -mkdir -p /user/hive/warehouse

# Create the Hive temporary directory in HDFS
[hadoop@big-master1 ~]$ hdfs dfs -mkdir -p /tmp/hive/

# Grant permissions on the Hive warehouse path
[hadoop@big-master1 ~]$ hdfs dfs -chmod 777 /user/hive/warehouse

# Grant permissions on the Hive temporary directory
[hadoop@big-master1 ~]$ hdfs dfs -chmod 777 /tmp/hive

[hadoop@big-master1 ~]$ hdfs dfs -ls /

Found 10 items

drwxr-xr-x - hadoop supergroup 0 2020-05-26 16:17 /data

drwxr-xr-x - hadoop supergroup 0 2020-06-04 23:41 /hbase

drwxr-xr-x - hadoop supergroup 0 2020-05-24 02:53 /sqoop-mysql

drwxr-xr-x - hadoop supergroup 0 2020-05-24 03:04 /sqoop-mysql11

drwxr-xr-x - hadoop supergroup 0 2020-05-24 02:59 /sqoop-mysql22

drwxr-xr-x - hadoop supergroup 0 2020-05-15 14:59 /test

drwxr-xr-x - hadoop supergroup 0 2020-05-18 17:15 /test01

drwx------ - hadoop supergroup 0 2020-06-11 12:22 /tmp

drwxr-xr-x - hadoop supergroup 0 2020-06-11 12:21 /user

drwxr-xr-x - root supergroup 0 2020-05-26 23:08 /var

[hadoop@big-master1 ~]$ hdfs dfs -ls /user/hive

Found 2 items

drwx-wx-wx - hadoop supergroup 0 2020-06-12 12:06 /user/hive/tmp

drwxrwxrwx - hadoop supergroup 0 2020-06-12 17:27 /user/hive/warehouse

[hadoop@big-master1 ~]$ hdfs dfs -ls /user/hive/warehouse/

Found 2 items

drwxrwxrwx - hadoop supergroup 0 2020-06-15 11:44 /user/hive/warehouse/hive_test01.db

drwxrwxrwx - hadoop supergroup 0 2020-06-12 17:22 /user/hive/warehouse/test
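When the same directories have to be prepared on more than one cluster, the `hdfs dfs` calls can be generated from a list of paths. The Python sketch below only builds and prints the commands; `run_all` is a hypothetical helper that would execute them on a node where the `hdfs` client is installed. The paths and the 777 mode are the ones used in this section.

```python
import subprocess

def hdfs_setup_commands(paths, mode="777"):
    """Build the `hdfs dfs` argv lists that create each path and set its mode."""
    commands = []
    for path in paths:
        commands.append(["hdfs", "dfs", "-mkdir", "-p", path])
        commands.append(["hdfs", "dfs", "-chmod", mode, path])
    return commands

def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)

commands = hdfs_setup_commands(["/user/hive/warehouse", "/tmp/hive"])
for argv in commands:
    print(" ".join(argv))
```

Calling `run_all(commands)` is left out on purpose so that the sketch has no side effects on a machine without Hadoop.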

3.5.2 Configuration files: # Hive runs on the three Hadoop data nodes big-slave01 to big-slave03.

[hadoop@big-slave01 ~]$ cd /usr/local/hive/

[hadoop@big-slave01 hive]$ ls

bin binary-package-licenses conf examples hcatalog jdbc lib LICENSE NOTICE RELEASE_NOTES.txt scripts

[hadoop@big-slave01 hive]$ cd conf/

[hadoop@big-slave01 conf]$ ls

\ hive-env.sh hive-log4j2.properties.template llap-daemon-log4j2.properties.template

beeline-log4j2.properties.template hive-env.sh.template hive-site.xml metastore_db

derby.log hive-exec-log4j2.properties.template ivysettings.xml nohup.out

hive-default.xml.template hive-log4j2.properties llap-cli-log4j2.properties.template parquet-logging.properties

[hadoop@big-slave01 conf]$ cat hive-env.sh

## Added or modified entries; everything else is left at the default ##

export HADOOP_HOME=/usr/local/hadoop

export JAVA_HOME=/usr/local/jdk1.8.0_251

export HIVE_HOME=/usr/local/hive

export HIVE_CONF_DIR=$HIVE_HOME/conf

[hadoop@big-slave01 conf]$ cat hive-site.xml

## The following entries were added or modified ##

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value> <!-- create this path on HDFS beforehand -->
    <description>HDFS path that stores the execution plans of the map/reduce stages and their intermediate output.</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <!-- value not shown in the original listing -->
    <description>Local directory for Hive job information</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/data/hive/software/${hive.session.id}_resources</value>
    <description>Local temporary directory for resources added to the remote file system</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>Storage path of the data warehouse</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <!-- value not shown in the original listing -->
    <description>MySQL login account</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- value not shown in the original listing -->
    <description>MySQL login password</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.41.17:3309/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    <description>JDBC connection URL of the MySQL database</description>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <!-- value not shown in the original listing -->
  </property>
</configuration>
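A misspelled property name in hive-site.xml is typically not reported as an error; the setting is simply not applied. The Python sketch below flags names that are close to, but not identical with, the names used in this section. The expected list and the 0.9 similarity cutoff are choices made for this example, and the sample document is deliberately misspelled.

```python
import difflib
import xml.etree.ElementTree as ET

# Property names used in this section's hive-site.xml.
EXPECTED = [
    "hive.exec.scratchdir",
    "hive.exec.local.scratchdir",
    "hive.downloaded.resources.dir",
    "hive.metastore.warehouse.dir",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
    "javax.jdo.option.ConnectionURL",
    "datanucleus.schema.autoCreateAll",
]

def suspicious_names(xml_text, expected=EXPECTED):
    """Return (name, suggestion) pairs for names that look like typos."""
    findings = []
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="")
        if name in expected:
            continue
        close = difflib.get_close_matches(name, expected, n=1, cutoff=0.9)
        if close:
            findings.append((name, close[0]))
    return findings

# Sample with a transposed "Dirver" to show what the check catches.
sample = """<configuration>
  <property><name>javax.jdo.option.ConnectionDirverName</name>
    <value>com.mysql.jdbc.Driver</value></property>
  <property><name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value></property>
</configuration>"""

print(suspicious_names(sample))
```

Names that are not close to anything in the list are ignored, so unrelated custom properties do not produce noise.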

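Note: by default Hive stores its metadata in Derby, which allows only one session connection and is suitable only for simple tests. Real workloads usually involve several concurrent sessions. To support multiple users and sessions, a standalone metadata database is needed; MySQL is used here. The Hive distribution does not ship with the MySQL driver, so before starting Hive Server make sure the MySQL driver JAR is present in the $HIVE_HOME/lib directory. See the earlier chapters for details on the MySQL driver JAR.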
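Whether the driver is in place can be checked before the first start. This Python sketch looks for a MySQL connector JAR under `$HIVE_HOME/lib`; the file-name pattern is an assumption based on the usual Connector/J naming (`mysql-connector-java-<version>.jar`), and `/usr/local/hive` is used as a fallback because it is the install path in this post.

```python
import glob
import os

def find_mysql_driver(hive_home):
    """Return the MySQL connector JARs found under <hive_home>/lib."""
    # Assumed naming pattern; adjust it if the JAR is named differently.
    pattern = os.path.join(hive_home, "lib", "mysql-connector*.jar")
    return sorted(glob.glob(pattern))

hive_home = os.environ.get("HIVE_HOME", "/usr/local/hive")
jars = find_mysql_driver(hive_home)
if jars:
    print("MySQL driver found:", ", ".join(jars))
else:
    print("No MySQL driver JAR under", os.path.join(hive_home, "lib"))
```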

[hadoop@big-slave01 conf]$ cat hive-log4j2.properties

# Modified entries below; everything else is left at the default #

property.hive.log.dir = /data/hive/log

property.hive.log.file = hive.log

property.hive.perflogger.log.level = INFO

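3.5.3 Startup

The $HIVE_HOME/bin directory holds the scripts that run the various Hive services, including Hive's command-line interface (the hive script).

Tip: Hive has two command-line interfaces. The hive script needs no login account or password; the beeline script requires a login account and password.

-- Before running the Hive command line or Hive's Thrift Server, initialize the metadata first.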

[hadoop@big-slave01 conf]$ schematool -initSchema -dbType mysql

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true

Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver

Metastore connection User: hive_rw

Starting metastore schema initialization to 2.3.0

Initialization script hive-schema-2.3.0.mysql.sql

Error: Syntax error: Encountered "<EOF>" at line 1, column 64. (state=42X01,code=30000)

org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

Underlying cause: java.io.IOException : Schema script failed, errorcode 2

Use --verbose for detailed stacktrace.

*** schemaTool failed ***

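### The output shows that the MySQL URL was not picked up (schematool fell back to the embedded Derby URL)

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.41.17:3309/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>JDBC connection URL of the MySQL database</description>
</property>

This entry had been written incorrectly, so its format must be exactly right. Note that inside hive-site.xml every "&" in the URL has to be written as "&amp;".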
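Format problems in this entry can be caught before running schematool by parsing the file as XML. The Python sketch below uses only the standard library; `read_property` and the shortened URL are illustrative, and the check covers well-formedness only, not whether the database is reachable.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def read_property(xml_text, name):
    """Return the value of one property, or raise ValueError on malformed XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"hive-site.xml is not well-formed: {exc}") from exc
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

TEMPLATE = """<configuration><property>
<name>javax.jdo.option.ConnectionURL</name>
<value>{value}</value>
</property></configuration>"""

raw_url = "jdbc:mysql://192.168.41.17:3309/hive?createDatabaseIfNotExist=true&useSSL=false"
bad = TEMPLATE.format(value=raw_url)            # raw ampersand: not well-formed
good = TEMPLATE.format(value=escape(raw_url))   # ampersand escaped for XML

print(read_property(good, "javax.jdo.option.ConnectionURL"))
try:
    read_property(bad, "javax.jdo.option.ConnectionURL")
except ValueError as err:
    print("rejected:", err)
```

The parser hands back the URL with plain ampersands, which is the form the JDBC driver receives.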

Log of a successful initialization:

# Initialize the metadata into the MySQL database

[hadoop@big-slave01 conf]$ schematool -initSchema -dbType mysql

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL: jdbc:mysql://192.168.41.17:3309/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false

Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver

Metastore connection User: hive_rw

Starting metastore schema initialization to 2.3.0

Initialization script hive-schema-2.3.0.mysql.sql

Initialization script completed

schemaTool completed

The corresponding connections are now visible on the MySQL server (192.168.41.17, the MySQL database service):

(root@localhost@(none))>> show processlist;

+----+---------+---------------------+------+---------+------+-----------+------------------------------------------------------------------------------------------------------+

+----+---------+---------------------+------+---------+------+-----------+------------------------------------------------------------------------------------------------------+

+----+---------+---------------------+------+---------+------+-----------+------------------------------------------------------------------------------------------------------+

2 rows in set (0.00 sec)

Once the Hive metadata has been initialized, run the Hive command-line interface for a simple test:

[hadoop@big-slave01 conf]$ hive -e "show tables;"

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Time taken: 10.9 seconds

----------------------------------------

[hadoop@big-slave02 conf]$ beeline

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Beeline version 2.3.7 by Apache Hive

beeline>

-- Hive's Thrift Server lets other processes access Hive remotely and also provides JDBC (or ODBC) access to Hive. All Hive clients need the metadata service (MetaStore), which Hive uses to store table schemas and other metadata.
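For JDBC access through the proxy, clients point at the HAProxy listener rather than at an individual Hive node. The Python sketch below assembles the JDBC URL and a matching beeline command line; port 10001 is the HAProxy `bind` port from the configuration in this post, while the `default` database and the `hadoop` user name are assumptions for illustration.

```python
def hive_jdbc_url(host, port=10001, database="default"):
    """Build a HiveServer2 JDBC URL; 10001 is the HAProxy listener in this setup."""
    return f"jdbc:hive2://{host}:{port}/{database}"

def beeline_command(host, user, port=10001, database="default"):
    """Return the beeline argv for connecting through the proxy."""
    return ["beeline", "-u", hive_jdbc_url(host, port, database), "-n", user]

print(" ".join(beeline_command("big-master1", "hadoop")))
```

Pointing the same helper at port 10000 on one of the big-slave nodes bypasses the proxy, which is handy when checking a single HiveServer2 instance.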

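3.5.4 Starting the high-availability setup:

Start the HAProxy processes and the Hive service processes on each node to form a distributed, highly available Hive cluster. The commands are as follows:

# Start the hiveserver2 service on data node 1 (big-slave01)

[hadoop@big-slave01 ~]$ nohup hive --service hiveserver2 &

# Start the hiveserver2 service on data node 2 (big-slave02)

[hadoop@big-slave02 ~]$ nohup hive --service hiveserver2 &

# Start the hiveserver2 service on data node 3 (big-slave03)

[hadoop@big-slave03 ~]$ nohup hive --service hiveserver2 &

# Start the haproxy service on big-master1

[hadoop@big-master1 ~]$ haproxy -f /usr/local/haproxy/config.cfg

# Start the haproxy service on big-master2

[hadoop@big-master2 ~]$ haproxy -f /usr/local/haproxy/config.cfg

At this point, if nothing failed along the way, the distributed, highly available Hive cluster is installed.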

[hadoop@big-master1 ~]$ jps

30037 JournalNode

10181 HMaster

4023 ResourceManager

29642 DFSZKFailoverController

29804 NameNode

28141 QuorumPeerMain

26558 Jps

[hadoop@big-master2 ~]$ jps

20032 NameNode

20116 JournalNode

20324 DFSZKFailoverController

31540 HMaster

18830 QuorumPeerMain

2462 ResourceManager

1982 Jps

[hadoop@big-slave01 ~]$ jps

10161 NodeManager

28338 HRegionServer

10546 RunJar

7702 QuorumPeerMain

6070 Jps

8583 DataNode

8686 JournalNode

10638 RunJar

[root@big-slave02 ~]# jps

5187 DataNode

8581 RunJar

6697 NodeManager

4153 Jps

4362 QuorumPeerMain

5290 JournalNode

25869 HRegionServer

[root@big-slave03 ~]# jps

4562 QuorumPeerMain

5442 DataNode

26004 HRegionServer

6389 RunJar

29622 Jps

6903 NodeManager

5545 JournalNode

[hadoop@big-slave01 ~]$ ps -ef |grep hive

hadoop 6091 5882 0 18:09 pts/0 00:00:00 grep --color=auto hive

hadoop 10546 1 0 Jun12 ? 03:47:02 /usr/local/jdk1.8.0_251/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dproc_metastore -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/local/hive/conf/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/local/hive/lib/hive-metastore-2.3.7.jar org.apache.hadoop.hive.metastore.HiveMetaStore

hadoop 10638 1 2 Jun12 ? 18:30:51 /usr/local/jdk1.8.0_251/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dproc_hiveserver2 -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/local/hive/conf/parquet-logging.properties -Djline.terminal=jline.UnsupportedTerminal -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/local/hive/lib/hive-service-2.3.7.jar org.apache.hive.service.server.HiveServer2

[hadoop@big-master1 ~]$ ps -ef |grep haproxy

hadoop 26604 25313 0 18:09 pts/0 00:00:00 grep --color=auto haproxy

hadoop 27175 1 0 Jun12 ? 00:11:16 haproxy -f /usr/local/haproxy/config.cfg
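The process listings show that the services are running; a port check additionally confirms that they accept connections. The Python sketch below tries a TCP connection to each HiveServer2 port and each HAProxy listener. The host names and ports are taken from the configuration in this post; `port_open` and `report` are illustrative helpers, and an open port does not prove that queries succeed.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # refused, timed out, or host name not resolvable
        return False

# Endpoints from this post: HiveServer2 on 10000, HAProxy listener on 10001.
ENDPOINTS = [
    ("big-slave01", 10000), ("big-slave02", 10000), ("big-slave03", 10000),
    ("big-master1", 10001), ("big-master2", 10001),
]

def report(endpoints=ENDPOINTS):
    """Print one status line per endpoint."""
    for host, port in endpoints:
        state = "open" if port_open(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```

Calling `report()` from a machine that can resolve the cluster host names prints one line per endpoint.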

---------------------------------- The End --------------------------------------
