Hadoop Components --- Data Warehouse (5) --- Connecting to Hive's Thrift Server or HiveServer2 via JDBC

张小凡vip

Category: Data Warehouse. Tags: hive, Thrift, hiveserver2, JDBC

Article link: https://blog.csdn.net/zzq900503/article/details/79034354

The previous article covered Hive's common commands. But how does a program written in another language interact with Hive?

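Introduction to Thrift

Hive includes the HiveServer (Thrift) or HiveServer2 component, which provides a service that JDBC drivers can connect to. This lets Java or Python code connect to Hive and run SQL queries much as it would against a relational database.

Both HiveServer and HiveServer2 are built on Thrift, but only HiveServer is sometimes called the Thrift server. Why is HiveServer2 needed when HiveServer already exists? HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by changing the HiveServer code. HiveServer was therefore rewritten as HiveServer2 in Hive 0.11.0, which solved the problem. HiveServer2 supports multi-client concurrency and authentication, and gives better support to open API clients such as JDBC and ODBC.

Because HiveServer2 is the more capable of the two, this article concentrates on it and only briefly covers how to use HiveServer. Enter the following at the command line: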

hive --service help

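The output includes a Service List entry naming the services Hive supports: beeline cli help hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat. The most useful of these are described below.

1. cli: short for Command Line Interface, Hive's command-line interface. It is the most commonly used service and the default one, so it can be used directly from the command line.

2. hiveserver: runs Hive as a server that exposes a Thrift service, so that clients written in many different languages can communicate with it. The HiveServer service must be running before clients can connect. The listening port can be set through the HIVE_PORT environment variable and defaults to 10000. HiveServer can be started as follows: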

bin/hive --service hiveserver -p 10002

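The -p option also sets the listening port.

3. hwi: short for Hive Web Interface. It is Hive's web interface and a web-based alternative to the Hive CLI.

4. jar: the Hive counterpart of hadoop jar. It is a convenient way to run Java applications whose classpath contains both Hadoop and Hive classes.

5. metastore: by default the metastore runs in the same process as the Hive service. This service runs the metastore as a standalone process, and the METASTORE_PORT environment variable sets the port it listens on.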

As the output shows, the command

hive <parameters> --service serviceName <serviceparameters>

starts a specific service such as cli, hiveserver, or hiveserver2.

To view the help for hiveserver and hiveserver2, enter the following at the command line:

hive --service hiveserver --help
hive --service hiveserver2 --help

The help output shows that some Hive versions no longer support hiveserver, in which case only hiveserver2 can be used.

Starting hiveserver2

Start it with the command:

hive --service hiveserver2
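
Once the service has been started, it can be useful to confirm that the port is actually accepting connections before trying any JDBC client. The following is a minimal sketch, not part of Hive itself: the class name HivePortCheck and the method isListening are hypothetical, and the host, port, and timeout in main are assumptions (10000 being the default hiveserver2 port described in this article).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class HivePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: treat as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: hiveserver2 runs locally on its default port 10000.
        System.out.println("hiveserver2 reachable: " + isListening("localhost", 10000, 2000));
    }
}
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the listener is hiveserver2 or that authentication will succeed.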

Configuring hiveserver2

The Hive installation, and with it the configuration files, can be located with the following command:

whereis hive

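HiveServer2 is configured through hive-site.xml. Commonly used parameters:

hive.server2.thrift.min.worker.threads – minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads – maximum number of worker threads, default 500.
hive.server2.thrift.port – TCP listening port, default 10000.
hive.server2.thrift.bind.host – host that the TCP interface binds to, default localhost.

The environment variables HIVE_SERVER2_THRIFT_BIND_HOST and HIVE_SERVER2_THRIFT_PORT can also be set to override the host and port configured in hive-site.xml.

A hive-site.xml entry for the port and host looks like this:

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>192.168.48.130</value>
</property>

Starting with Hive 0.13.0, HiveServer2 can also carry its messages over HTTP, which is particularly useful when a proxy sits between the client and the server. The HTTP-related parameters are:

hive.server2.transport.mode – default binary (TCP); set to http for HTTP transport.
hive.server2.thrift.http.port – HTTP listening port, default 10001.
hive.server2.thrift.http.path – name of the service endpoint, default cliservice.
hive.server2.thrift.http.min.worker.threads – minimum number of worker threads in the server pool, default 5.
hive.server2.thrift.http.max.worker.threads – maximum number of worker threads in the server pool, default 500.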
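
The transport mode also affects the JDBC URL a client uses. The sketch below builds the URL for both modes. The class HiveServer2Urls and its methods are hypothetical names introduced for illustration. The transportMode and httpPath URL parameters are the ones documented for newer Hive JDBC drivers; drivers from the 0.13 era expected hive.server2.transport.mode and hive.server2.thrift.http.path in the URL instead, so check the documentation for your driver version.

```java
/** Hypothetical helper that builds HiveServer2 JDBC URLs for both transport modes. */
final class HiveServer2Urls {

    /** Binary (TCP) mode, e.g. jdbc:hive2://host253:10000/default */
    static String binary(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** HTTP mode; httpPath must match hive.server2.thrift.http.path on the server. */
    static String http(String host, int port, String database, String httpPath) {
        return binary(host, port, database) + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // Ports are the defaults listed above: 10000 for binary, 10001 for HTTP.
        System.out.println(binary("host253", 10000, "default"));
        System.out.println(http("host253", 10001, "default", "cliservice"));
    }
}
```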

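By default (hive.server2.enable.doAs set to true), HiveServer2 runs each query as the user who submitted it. If hive.server2.enable.doAs is set to false, queries run as the user that runs the hiveserver2 process.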

<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

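To prevent memory leaks in non-secure mode, the file system caches can be disabled by setting the following parameters to true:

fs.hdfs.impl.disable.cache – disables the HDFS file system cache, default false.
fs.file.impl.disable.cache – disables the local file system cache, default false.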

Testing hiveserver2

Hive ships with its own Thrift client, beeline.
Open beeline with the command:

beeline

Connect to hiveserver2 with the command:

!connect jdbc:hive2://host253:10000

(host253 is the hostname of the machine on which hiveserver2 was started; the default port is 10000.)
When working on the machine that runs hiveserver2, you can also use:

!connect jdbc:hive2://localhost:10000

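You may be prompted for the current Linux username and password.
Once connected, the prompt changes to
0: jdbc:hive2://host253:10000>
You can now try some database operations, for example: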

show databases;

HiveServer2 Thrift API

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API

Connecting to Hiveserver2 from Java via JDBC

Adding the dependencies

The following jars are required:

hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
$HIVE_HOME/lib/hive-exec-0.11.0.jar 
$HIVE_HOME/lib/hive-jdbc-0.11.0.jar 
$HIVE_HOME/lib/hive-metastore-0.11.0.jar  
$HIVE_HOME/lib/hive-service-0.11.0.jar   
$HIVE_HOME/lib/libfb303-0.9.0.jar   
$HIVE_HOME/lib/commons-logging-1.0.4.jar  
$HIVE_HOME/lib/slf4j-api-1.6.1.jar

With Maven, add the following dependencies:

<dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>0.11.0</version>
</dependency>

<dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.2.0</version>
</dependency>
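
A missing jar usually only shows up at run time as a ClassNotFoundException. As a quick sanity check after adding the dependencies, a small probe like the one below can report whether a driver class is loadable. This is a sketch under stated assumptions: DriverCheck and isOnClasspath are hypothetical names, and the two class names are the HiveServer2 and HiveServer driver classes used in this article.

```java
/** Hypothetical helper: reports whether a JDBC driver class can be loaded. */
final class DriverCheck {

    /** Returns true if the named class is present on the classpath and loads cleanly. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or present but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] drivers = {
            "org.apache.hive.jdbc.HiveDriver",        // HiveServer2 driver
            "org.apache.hadoop.hive.jdbc.HiveDriver"  // HiveServer (HiveServer1) driver
        };
        for (String driver : drivers) {
            System.out.println(driver + " -> " + (isOnClasspath(driver) ? "found" : "missing"));
        }
    }
}
```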

The Java code is as follows:

package com.test;

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveJdbcTest {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the HiveServer2 JDBC driver; exit if it is missing from the classpath.
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // try-with-resources closes the statement and connection even if a query fails.
        try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "zzq", "12345");
             Statement stmt = con.createStatement()) {
            String tableName = "students";
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");
            System.out.println("Create table success!");

            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // describe table
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // select all rows (the table was just created, so nothing is printed)
            sql = "select * from " + tableName;
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
            }

            // count rows
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}
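
The example above builds its statements by concatenating tableName into the SQL string. That is harmless for a hard-coded name, but if the name ever comes from user input it is worth validating first. The following is a minimal sketch of such a guard. HiveIdentifiers and requireSimpleName are hypothetical names, and the allowed pattern (letters, digits, and underscores, not starting with a digit) is a conservative assumption rather than Hive's full identifier grammar.

```java
import java.util.regex.Pattern;

/** Hypothetical guard for table names that get concatenated into HiveQL. */
final class HiveIdentifiers {

    // Assumption: letters, digits and underscores only, not starting with a digit.
    private static final Pattern SIMPLE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns name unchanged if it is a simple identifier, otherwise throws. */
    static String requireSimpleName(String name) {
        if (name == null || !SIMPLE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsafe table name: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        String tableName = requireSimpleName("students");
        System.out.println("drop table if exists " + tableName);
    }
}
```

Calling requireSimpleName once where tableName is assigned would cover every statement in the example that reuses the variable.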

JDBC differences between HiveServer1 and HiveServer2

HiveServer version | HiveServer2 | HiveServer1
Connection URL | jdbc:hive2://<host>:<port> | jdbc:hive://<host>:<port>
Driver Class | org.apache.hive.jdbc.HiveDriver | org.apache.hadoop.hive.jdbc.HiveDriver
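
When one program has to talk to either server, the two differences listed above can be kept in one place. The following sketch does that with an enum. HiveServerVersion and its methods are hypothetical names introduced here, while the URL prefixes and driver class names are the ones given in this article.

```java
/** Hypothetical enum pairing each server version with its URL prefix and driver class. */
enum HiveServerVersion {
    HIVESERVER1("jdbc:hive://", "org.apache.hadoop.hive.jdbc.HiveDriver"),
    HIVESERVER2("jdbc:hive2://", "org.apache.hive.jdbc.HiveDriver");

    private final String urlPrefix;
    private final String driverClass;

    HiveServerVersion(String urlPrefix, String driverClass) {
        this.urlPrefix = urlPrefix;
        this.driverClass = driverClass;
    }

    /** Fully qualified name of the JDBC driver class for this server version. */
    String driverClass() {
        return driverClass;
    }

    /** Builds a connection URL such as jdbc:hive2://localhost:10000/default. */
    String url(String host, int port, String database) {
        return urlPrefix + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        for (HiveServerVersion v : values()) {
            System.out.println(v + ": " + v.driverClass() + " " + v.url("localhost", 10000, "default"));
        }
    }
}
```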

Connecting to Hiveserver from Java via JDBC

Having seen how to connect to Hiveserver2 from Java via JDBC, and how the JDBC settings differ between HiveServer1 and HiveServer2, connecting to Hiveserver only requires changing the driver and the URL.
That is,

private static String driverName = "org.apache.hive.jdbc.HiveDriver";
becomes
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "zzq", "12345");
becomes
Connection con = DriverManager.getConnection("jdbc:hive://localhost:10002/default", "zzq", "12345");

The port is 10002 here because HiveServer was started earlier with -p 10002.

The Java code is as follows:

package com.test;

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveJdbcTest {

    private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the HiveServer (HiveServer1) JDBC driver; exit if it is missing from the classpath.
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // try-with-resources closes the statement and connection even if a query fails.
        try (Connection con = DriverManager.getConnection("jdbc:hive://localhost:10002/default", "zzq", "12345");
             Statement stmt = con.createStatement()) {
            String tableName = "students";
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");
            System.out.println("Create table success!");

            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // describe table
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // select all rows (the table was just created, so nothing is printed)
            sql = "select * from " + tableName;
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
            }

            // count rows
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}