[21]-Presto Hive view Query

最新推荐文章于 2024-05-30 10:28:24 发布

hjw199089

最新推荐文章于 2024-05-30 10:28:24 发布

阅读量2.9k

点赞数

分类专栏： [23]Presto

本文链接：https://blog.csdn.net/hjw199089/article/details/81415444

版权

[23]Presto 专栏收录该内容

36 篇文章 3 订阅

订阅专栏

##修改源码支持Hive view
参考Add Support for Compatible Hive Views #9031

修改以下类，支持简单hive view的query与create

HiveMetadata.java
HiveUtil.java
ThriftHiveMetastore.java
MetadataManager.java
LocalQueryRunner.java
ConnectorViewDefinition.java

修改文件后

mvn clean install DskipTests

重启服务，即可通过CLI命令行查询简单hive view。
./presto --server localhost:8080 --catalog hive --schema default
备注
（1）直接在IDEA通过run

 presto/presto-cli/target/presto-cli-0.192-SNAPSHOT-executable.jar

查询会报.
（2）通过CLI命令行启动一次catalog hive 摄入一次元数据
./presto --server localhost:8080 --catalog hive --schema default

SELECT * FROM hive.hivedb.tabe_view;
Query 20180804_100950_00004_r42jc failed: Table 'hivedb.tabe_view' not found

##从presto向Hive建立View，然后查询

(1)presto向Hive建立View

presto:default> create view hive.hive_test.student_view_createdby_presto as select id, count(1) as cnt_num from hive.hive_test.student group by id;
CREATE VIEW

(2)Hive侧检查建立成功

hive> show tables;
OK
student_view_createdby_presto
...

(3)由presto查询View

presto:default> select
             -> * from hive.hive_test.student_view_createdby_presto;
  id  | cnt_num
------+---------
 1003 |       1
...

(4)执行计划.

presto:default> explain select * from hive.hive_test.student_view_createdby_presto;

 - Output[id, cnt_num] => [id:integer, count:bigint]
         cnt_num := count
     - RemoteExchange[GATHER] => id:integer, count:bigint
         - Project[] => [id:integer, count:bigint]
             - Aggregate(FINAL)[id][$hashvalue] => [id:integer, $hashvalue:bigint, count:bigint]
                     count := "count"("count_19")
                 - LocalExchange[HASH][$hashvalue] ("id") => id:integer, count_19:bigint, $hashvalue:bigint
                     - RemoteExchange[REPARTITION][$hashvalue_20] => id:integer, count_19:bigint, $hashvalue_20:bigint
                         - Aggregate(PARTIAL)[id][$hashvalue_21] => [id:integer, $hashvalue_21:bigint, count_19:bigint]
                                 count_19 := "count"(*)
                             - ScanProject[table = hive:hive_test:student, originalConstraint = true] => [id:integer, $hashvalue_21:bigint]
                                     Cost: {rows: 0, bytes: ?}/{rows: 0, bytes: ?}
                                     $hashvalue_21 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("id"), 0))
                                     LAYOUT: com.facebook.presto.hive.HiveTableLayoutHandle@515d54e8
                                     id := HiveColumnHandle{name=id, hiveType=int, hiveColumnIndex=0, columnType=REGULAR}.

(5) 查询本身是select * ，看执行计划怎么是 * 中每个列是按照view中定义的方式来查询（即在presto做内存查询，如cnt_num—>count(1)，不是在hive侧计算），具体原因在于，查询本身类似于用建立view的sql替换了，从hive中实际拉取的依然是行列数据

Analyzer::analyze(Statement statement, boolean isDescribe)
analyzer.analyze(rewrittenStatement, Optional.empty()) 
--->....--->visitQuerySpecification::analyzeFrom(node, scope)---> ....
--->visitTable

Optional<ViewDefinition> optionalView = metadata.getView(session, name)

ViewDefinition中建立view时的sql
SELECT
  id
, "count"(1) cnt_num
FROM
  hive.hive_test.student
GROUP BY id

(6)问题
（1）后面基于这个viewsql处理。。。，相当于基于 viewsql这个查询，因此最后在connector中是先抽取出 viewsql 拿出来所有列到对应的数据源拉取行列数据。。。
（2）同时这里也暴露出一个问题，基于这个viewsql处理，也就是和presto sql 不一致的HSQL，是表达不出来的，会在后面的分析过程就报错。

hjw199089

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
[21]-Presto Hive view Query

参考Add Support for Compatible Hive Views #9031修改以下类，支持简单hive view的query与createHiveMetadata.javaHiveUtil.javaThriftHiveMetastore.javaMetadataManager.javaLocalQueryRunner.javaConnectorViewDefini...
复制链接

扫一扫

专栏目录