Connecting to HiveServer from Java via JDBC

Latest recommended article published 2024-05-05 16:40:24

nickname_oo

I have needed this recently, so I am parking it here to have it ready the next time.

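1. Hive depends on Hadoop and uses HDFS as its storage medium. Does that mean Hive needs to know the NameNode address?

Not directly. Hive's hive-env.sh sets HADOOP_HOME=/home/install/hadoop-2.5.1, and Hive picks up the Hadoop configuration, including the NameNode address, from that installation.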

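2. What is the difference between Hive's local mode and remote mode?

Hive essentially translates SQL into MapReduce jobs, so it has to submit those jobs to the ResourceManager. How does it submit them? Through the hadoop jar command that Hadoop provides.

Local mode: put simply, the Hive client is used only on the local machine. You run the hive command directly and do not specify an IP or port.

Remote mode: put simply, Hive is published as a service process with hive --service hiveserver (hive --service hiveserver2 for HiveServer2), and other Hive clients can then connect to that process.

Those clients can use JDBC, the beeline command that ships with Hive, and so on. Because they connect to a remote Hive service process, they must specify an IP and port. Both belong to the machine running the Hive service process and have nothing to do with Hadoop, so do not confuse the two.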

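HiveServer2 provides a new command-line tool, Beeline, a JDBC client based on the SQLLine CLI.

Beeline has two working modes: local embedded mode and remote mode. In embedded mode it runs an embedded Hive (similar to the Hive CLI). In remote mode it connects to a separate HiveServer2 process over the Thrift protocol.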

Three ways to connect to Hive

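1. Hive command-line mode: run the /hive/bin/hive executable directly, or enter hive --service cli
       Used for command-line queries on Linux; the query syntax is largely similar to MySQL's
2. Hive web interface (port 9999), started with
       hive --service hwi &
       Lets you reach Hive from a browser; I have not found much use for it
3. Hive remote service (port 10000), started with
       hive --service hiveserver &
       or
       hive --service hiveserver 10000 >/dev/null 2>/dev/null &
    Connect with beeline: beeline -u jdbc:hive2://localhost:10000/default -n root -p 123456
    (Beeline talks to HiveServer2, so for this the service must be started with hive --service hiveserver2)
    or
    connect from a Java client
Note:
       Hive JDBC URL: jdbc:hive://192.168.6.116:10000/default     (default Hive port: 10000; default database name: default)
       For HiveServer2 the URL prefix is jdbc:hive2:// rather than jdbc:hive://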
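
The URL in the note has the shape prefix://host:port/database. As a rough illustration, the sketch below assembles a HiveServer2-style URL and falls back to the defaults mentioned above (port 10000, database default). The class and method names (HiveJdbcUrl, build) are invented for this example and are not part of Hive; only the URL layout comes from Hive itself.

```java
/** Hypothetical helper (not part of Hive) that assembles a HiveServer2 JDBC URL. */
final class HiveJdbcUrl {
    static final int DEFAULT_PORT = 10000;            // Hive's default port
    static final String DEFAULT_DATABASE = "default"; // Hive's default database name

    private HiveJdbcUrl() {}

    /** Builds jdbc:hive2://host:port/database after basic sanity checks. */
    static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** Same as above, using the default port and database. */
    static String build(String host) {
        return build(host, DEFAULT_PORT, DEFAULT_DATABASE);
    }

    public static void main(String[] args) {
        System.out.println(build("192.168.6.116"));
        System.out.println(build("localhost", 10002, "default"));
    }
}
```

The host and port are always those of the Hive service process, never those of a Hadoop daemon.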

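Step 1: start the Hive remote service

  bin/hive --service hiveserver -p 10002
  Starting Hive Thrift Server

The hiveserver service (HiveServer1) was removed in Hive 1.0. On such versions, including the 1.2.1 artifacts used below, start HiveServer2 on the same port instead:

  bin/hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10002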

Step 2: add the dependencies

    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>1.2.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-metastore -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-metastore</artifactId>
        <version>1.2.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
 

Step 3: write the Java client

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcTest {

    // HiveServer2 driver from the hive-jdbc artifact; it is the one that
    // accepts jdbc:hive2:// URLs. The older HiveServer1 driver,
    // org.apache.hadoop.hive.jdbc.HiveDriver, only accepts jdbc:hive://.
    private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        String tableName = "wyphao";

        // try-with-resources closes the statement and the connection on exit
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10002/default", "wyp", "");
             Statement stmt = con.createStatement()) {

            // recreate the test table
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");
            System.out.println("Create table success!");

            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // describe table: column name and column type
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // select every row (the table is still empty at this point)
            sql = "select * from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }

            // count rows
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}
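
The client above builds its statements by concatenating tableName into the SQL text. That is fine for a hard-coded name, but if the name ever came from outside the program, a check along the following lines could be run first. This is only a sketch: the class and method names are invented here, and the accepted pattern (letters, digits, underscores) is a deliberately conservative assumption, not Hive's full identifier grammar.

```java
import java.util.regex.Pattern;

/** Hypothetical guard for identifiers that get concatenated into HiveQL. */
final class HiveIdentifiers {
    // Assumption: allow only letters, digits and underscores, not starting with a digit.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private HiveIdentifiers() {}

    /** Returns the name unchanged if it matches the pattern, otherwise throws. */
    static String requireSafe(String name) {
        if (name == null || !SAFE.matcher(name).matches()) {
            throw new IllegalArgumentException("unsafe identifier: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        String tableName = requireSafe("wyphao");
        System.out.println("describe " + tableName);
    }
}
```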
Now a few words on Hive SQL DDL, starting with the CREATE TABLE statement:

create table page_view
(
page_id bigint comment 'Page ID',
page_name string comment 'Page name',
page_url string comment 'Page URL'
)
comment 'Page views'
partitioned by (ds string comment 'Current date, used as the partition column')
row format delimited
stored as rcfile
location '/user/hive/test';
 
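A note on the stored as keyword. Hive supports the following three formats (newer releases add others, such as ORC and Parquet):

1. textfile: the plain default. Data is not compressed, so both disk usage and parsing cost are high.
2. SequenceFile: a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.
3. rcfile: a hybrid of row and column storage. Data is first split into blocks so that a whole record stays inside one block and reading a record never touches several blocks. Within each block the data is stored column by column, which helps both storage and fast column access. Because of the columnar layout, RCFile costs more at load time, but it gives good query response and a good compression ratio.

If the table needs partitions, the statement looks like the following. The partitioned by clause names the column to split on, which is usually a time column.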
 
create table test_ds
(
  id int comment 'User ID',
  name string comment 'User name'
)
comment 'Test partitioned table'
partitioned by(ds string comment 'Time partition column')
row format delimited
fields terminated by '\t'
stored as rcfile;
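
To make the effect of partitioned by concrete: Hive keeps each partition in its own subdirectory, named column=value, under the table's location. The sketch below derives that directory name. The class and method names are invented for illustration, the table location /user/hive/warehouse/test_ds assumes the default warehouse directory, and the sketch ignores the escaping Hive applies to special characters in partition values.

```java
/** Hypothetical helper showing where a partition's files would live. */
final class PartitionPaths {
    private PartitionPaths() {}

    /** Returns <tableLocation>/<column>=<value>, tolerating a trailing slash. */
    static String partitionDir(String tableLocation, String column, String value) {
        String base = tableLocation.endsWith("/")
                ? tableLocation.substring(0, tableLocation.length() - 1)
                : tableLocation;
        return base + "/" + column + "=" + value;
    }

    public static void main(String[] args) {
        // Assumed location: default warehouse directory plus the table name.
        System.out.println(partitionDir("/user/hive/warehouse/test_ds", "ds", "2024-05-05"));
    }
}
```

A query that filters on ds only has to read the matching directories, which is why a time column is the usual choice.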
 
   
 
If some columns need clustered (bucketed) storage, which makes it easier to sample on the clustered columns, write the SQL like this:
 
     
create table test_ds
(
  id int comment 'User ID',
  name string comment 'User name'
)
comment 'Test partitioned table'
partitioned by(ds string comment 'Time partition column')
clustered by(id) sorted by(name) into 32 buckets
row format delimited
fields terminated by '\t'
stored as rcfile;
Here the rows of each partition are hashed on id into 32 buckets, and the rows inside each bucket are sorted by name.
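
As a rough sketch of how a row lands in one of the 32 buckets, assume Hive's classic bucketing rule: take the hash of the clustering column, clear the sign bit, and take the remainder modulo the bucket count. For an int column the hash is assumed to be the value itself. Newer Hive versions may use a different hash function, so treat this as an illustration. The names BucketSketch and bucketFor are invented here.

```java
/** Hypothetical illustration of classic Hive bucket assignment for an int column. */
final class BucketSketch {
    private BucketSketch() {}

    static int bucketFor(int id, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        int hash = id; // assumption: an int column hashes to its own value
        // Clear the sign bit so the remainder is never negative.
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        for (int id : new int[] {1, 33, 64}) {
            System.out.println("id " + id + " -> bucket " + bucketFor(id, 32));
        }
    }
}
```

Under this rule ids 1 and 33 share a bucket, which is what makes sampling a single bucket a cheap way to read roughly 1/32 of the ids.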
To change where the table lives in HDFS, specify the path explicitly with the location clause:
 
     
create table test_another_location
(
   id int,
   name string,
   url string
)
comment 'Test a different location'
row format delimited
fields terminated by '\t'
stored as textfile
location '/tmp/test_location';

The directory /tmp/test_location does not have to exist beforehand.

https://www.iteblog.com/archives/846.html
