##修改源码支持Hive view
参考Add Support for Compatible Hive Views #9031
修改以下类,支持简单hive view的query与create
HiveMetadata.java
HiveUtil.java
ThriftHiveMetastore.java
MetadataManager.java
LocalQueryRunner.java
ConnectorViewDefinition.java
修改文件后
mvn clean install DskipTests
重启服务,即可通过CLI命令行查询简单hive view。
./presto --server localhost:8080 --catalog hive --schema default
备注
(1)直接在IDEA通过run
presto/presto-cli/target/presto-cli-0.192-SNAPSHOT-executable.jar
查询会报.
(2)通过CLI命令行启动一次catalog hive 摄入一次元数据
./presto --server localhost:8080 --catalog hive --schema default
SELECT * FROM hive.hivedb.tabe_view;
Query 20180804_100950_00004_r42jc failed: Table 'hivedb.tabe_view' not found
##从presto向Hive建立View,然后查询
(1)presto向Hive建立View
presto:default> create view hive.hive_test.student_view_createdby_presto as select id, count(1) as cnt_num from hive.hive_test.student group by id;
CREATE VIEW
(2)Hive侧检查建立成功
hive> show tables;
OK
student_view_createdby_presto
...
(3)由presto查询View
presto:default> select
-> * from hive.hive_test.student_view_createdby_presto;
id | cnt_num
------+---------
1003 | 1
...
(4)执行计划.
presto:default> explain select * from hive.hive_test.student_view_createdby_presto;
- Output[id, cnt_num] => [id:integer, count:bigint]
cnt_num := count
- RemoteExchange[GATHER] => id:integer, count:bigint
- Project[] => [id:integer, count:bigint]
- Aggregate(FINAL)[id][$hashvalue] => [id:integer, $hashvalue:bigint, count:bigint]
count := "count"("count_19")
- LocalExchange[HASH][$hashvalue] ("id") => id:integer, count_19:bigint, $hashvalue:bigint
- RemoteExchange[REPARTITION][$hashvalue_20] => id:integer, count_19:bigint, $hashvalue_20:bigint
- Aggregate(PARTIAL)[id][$hashvalue_21] => [id:integer, $hashvalue_21:bigint, count_19:bigint]
count_19 := "count"(*)
- ScanProject[table = hive:hive_test:student, originalConstraint = true] => [id:integer, $hashvalue_21:bigint]
Cost: {rows: 0, bytes: ?}/{rows: 0, bytes: ?}
$hashvalue_21 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("id"), 0))
LAYOUT: com.facebook.presto.hive.HiveTableLayoutHandle@515d54e8
id := HiveColumnHandle{name=id, hiveType=int, hiveColumnIndex=0, columnType=REGULAR}.
(5) 查询本身是select * ,看执行计划怎么是 * 中每个列是按照view中定义的方式来查询(即在presto做内存查询,如cnt_num—>count(1),不是在hive侧计算),具体原因在于,查询本身类似于用建立view的sql替换了,从hive中实际拉取的依然是行列数据
Analyzer::analyze(Statement statement, boolean isDescribe)
analyzer.analyze(rewrittenStatement, Optional.empty())
--->....--->visitQuerySpecification::analyzeFrom(node, scope)---> ....
--->visitTable
Optional<ViewDefinition> optionalView = metadata.getView(session, name)
ViewDefinition中建立view时的sql
SELECT
id
, "count"(1) cnt_num
FROM
hive.hive_test.student
GROUP BY id
(6)问题
(1)后面基于这个viewsql处理。。。,相当于基于 viewsql这个查询,因此最后在connector中是先抽取出 viewsql 拿出来所有列到对应的数据源拉取行列数据。。。
(2)同时这里也暴露出一个问题,基于这个viewsql处理,也就是和presto sql 不一致的HSQL,是表达不出来的,会在后面的分析过程就报错。