Hive APIs

I. Java

1. JDBC

  Connecting to HiveServer2 over JDBC is straightforward and looks like querying any other database: open a connection, execute the query, and read the results:

package com.huiq.bigdata.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ApiQueryTest {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args){
        try {
            // Load the Hive JDBC driver
            Class.forName(driverName);
            // Open a connection to HiveServer2 (database ods_test_schema1)
            Connection con = DriverManager.getConnection("jdbc:hive2://node01:10000/ods_test_schema1", "root", "123456");
            Statement stmt = con.createStatement();
            String sql = "select * from t1";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            System.out.println("ok");
            // Print the first two columns of each row
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }
            // Release resources
            res.close();
            stmt.close();
            con.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Result when run on Linux:

[root@node01 huiq]# java -cp hivejdbc-jar-with-dependencies.jar com.huiq.bigdata.test.ApiQueryTest
Running: select * from t1
ok
-103829163_116	116
-1406838352_113	113
-1511909649_130	130
-1789481461_141	141

Running it from IDEA on Windows prints the same rows (screenshot omitted).
Required dependency:

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>3.1.0</version>
        </dependency>
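
Incidentally, the same query can be written with try-with-resources so that the connection, statement, and result set are always closed, even when an exception is thrown. This is only a minimal sketch using the same placeholder URL, user, and password as above (the class name is just for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ApiQueryTryWithResources {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Connection, Statement and ResultSet are AutoCloseable,
        // so try-with-resources closes them automatically in reverse order.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://node01:10000/ods_test_schema1", "root", "123456");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("select * from t1")) {
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }
        }
    }
}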
2. Spark SQL
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.hive.HiveContext;

/**
 * @author: huiq
 * @createTime: 2021/8/27 13:55
 * @description: Hive connection test
 */
public class SparkSqlHiveTest
{
    public static void main( String[] args )
    {
        SparkConf sparkConf = new SparkConf().setAppName("SparkHive").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        HiveContext hiveContext = new HiveContext(sc);
        // Do not use SQLContext here: when the job is deployed it cannot find the Hive databases and tables
//        SQLContext sqlContext = new SQLContext(sc);
        // Query the first 10 rows of the table
//        hiveContext.sql("select * from datawarehouse_ods_db.ods_ct_tag limit 10").show();

        hiveContext.sql("use datawarehouse_dws_db");
        hiveContext.sql("create table if not exists member_reg_num_day_tmp as \nselect \n    to_date(reg_time) date_time,\n    count(1) num \nfrom \n    datawarehouse_dwd_db.dwd_us_user \nwhere \n    reg_time!='' \ngroup by \n    to_date(reg_time)");

        sc.stop();
    }
}

Required dependency:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>

Command to run:

spark-submit --master local[2] --class com.huiq.bigdata.online.SparkSqlHiveTest hbase-jar-with-dependencies.jar

Note: differences and relationship between SQLContext and HiveContext
  The main entry point of a Spark SQL program is the SQLContext class or one of its subclasses. To create a basic SQLContext, all you need is a SparkContext.
  Besides the basic SQLContext, you can also create a HiveContext. The differences and the relationship between the two are:
  SQLContext supports only the plain SQL parser (SQL-92 syntax).
  HiveContext supports both the SQL parser and the HiveQL parser. The default is the HiveQL parser, and users can switch to the SQL parser via configuration in order to run syntax that HiveQL does not support.
  With HiveContext you can use Hive UDFs, read and write Hive tables, and perform other Hive operations; SQLContext cannot operate on Hive.
  Future Spark SQL releases will keep enriching SQLContext until its functionality merges with HiveContext's, and the two may eventually be unified into a single context.
  HiveContext packages the Hive dependencies. Keeping HiveContext separate means a basic Spark deployment does not need the Hive dependencies; they only have to be added when HiveContext is actually used.
  The SQL parser is selected with the spark.sql.dialect configuration. SQLContext can only use the "sql" parser provided by Spark SQL; in HiveContext the default parser is "hiveql", and "sql" is also supported.
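
To make the two entry points concrete, here is a minimal sketch along the lines of the Java example above (class and app names are placeholders only):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.hive.HiveContext;

public class ContextComparisonTest {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("ContextComparison").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Basic SQLContext: only needs a SparkContext (per the note above, it cannot operate on Hive)
        SQLContext sqlContext = new SQLContext(sc);

        // HiveContext adds HiveQL parsing, Hive UDFs, and access to Hive databases and tables
        HiveContext hiveContext = new HiveContext(sc);
        hiveContext.sql("show databases").show();

        sc.stop();
    }
}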
 

II. Scala

1. Spark SQL
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * @Auther: huiq
 * @Date: 2021/7/23
 * @Description: Hive connection test
 */
object SparkSqlHiveTest {

  def main(args: Array[String]): Unit = {

    // Initialize Spark
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName(this.getClass.getSimpleName)
    val spark: SparkSession = SparkSession.builder().enableHiveSupport().config(sparkConf).getOrCreate()

    // Switch to the target Hive database
    spark.sql("use datawarehouse_dws_db")
    spark.sql("create table if not exists member_reg_num_day_tmp as \nselect \n    to_date(reg_time) date_time,\n    count(1) num \nfrom \n    datawarehouse_dwd_db.dwd_us_user \nwhere \n    reg_time!='' \ngroup by \n    to_date(reg_time)")

    spark.stop()
  }
}

Required dependency:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>

Command to run:

spark-submit --master local[2] --class com.huiq.bigdata.online.SparkSqlHiveTest hbase-jar-with-dependencies.jar

Note: to run this from a plain Maven project in IDEA, the hive-site.xml, hdfs-site.xml, and core-site.xml files from the server must be placed in the resources directory. When the same code is integrated into a Spring Boot project, however, it fails with: Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT. This is probably a dependency conflict caused by something Spring Boot does internally, and it may be necessary to rebuild the spark-hive jar; for one approach see "spark hive java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT".
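
As a side note, instead of copying hive-site.xml into resources, the metastore address can often be supplied programmatically when the session is built. This is only a sketch (shown in Java, like the Spark SQL example in section I): the class name and the metastore URI thrift://node01:9083 are placeholders, and whether this alone is sufficient depends on the Spark version and cluster configuration:

import org.apache.spark.sql.SparkSession;

public class SparkSqlHiveNoXmlTest {
    public static void main(String[] args) {
        // Point Spark at the Hive metastore directly instead of relying on hive-site.xml
        SparkSession spark = SparkSession.builder()
                .appName("SparkHive")
                .master("local[2]")
                .config("hive.metastore.uris", "thrift://node01:9083") // placeholder URI
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("show databases").show();
        spark.stop();
    }
}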
Problem encountered: org.apache.thrift.TApplicationException: Required field 'filesAdded' is unset! Struct:InsertEventRequestData(filesAdded:null)

Fix: rewriting the statement as follows succeeds (note that SaveMode requires import org.apache.spark.sql.SaveMode):

spark.sql("select \n to_date(reg_time) date_time,\n count(1) num \nfrom \n datawarehouse_dwd_db.dwd_us_user \nwhere \n reg_time!='' \ngroup by \n to_date(reg_time)").write.mode(SaveMode.Overwrite).saveAsTable("datawarehouse_dws_db.member_reg_num_day_tmp")

The Java equivalent is:

hiveContext.sql("select \n to_date(reg_time) date_time,\n count(1) num \nfrom \n rongrong_datawarehouse_dwd_db.dwd_rongrong_us_user \nwhere \n reg_time!='' \ngroup by \n to_date(reg_time)").write().mode(SaveMode.Overwrite).saveAsTable("rongrong_datawarehouse_dws_db.member_reg_num_day_tmp");

Note: the problems described above apply equally to the Java version of the Spark SQL code.

III. Pitfalls Encountered

1. Required field 'client_protocol' is unset!
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000/heheda_ods_db: Could not establish connection to jdbc:hive2://node01:10000/heheda_ods_db: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=heheda_ods_db})
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:252)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:247)
Caused by: java.sql.SQLException: Could not establish connection to jdbc:hive2://node01:10000/heheda_ods_db: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=heheda_ods_db})
        at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:734)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:228)
        ... 4 more
Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=heheda_ods_db})
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:176)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:163)
        at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:715)
        ... 5 more

Cause: the hive-jdbc version declared in the project's POM must match the version of the Hive server (the server-side version can be checked with hive --version).

Original configuration:
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.1.0</version>
        </dependency>
Corrected configuration:
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>3.1.0</version>
        </dependency>

References:
Resolving the "Required field 'client_protocol' is unset!" error when a Hive client connects to the server
"Required field 'client_protocol' is unset": investigating the cause

