1.概述
Presto 只使用:hive中的 data 和 metadata(hive Metastore)。不使用HiveQL或Hive执行环境的任何部分。
2.支持的文件类型
Hive connector 支持以下文件类型:
- ORC
- Parquet
- Avro
- RCText (RCFile using ColumnarSerDe)
- RCBinary (RCFile using LazyBinaryColumnarSerDe)
- SequenceFile
- JSON (using org.apache.hive.hcatalog.data.JsonSerDe)
- CSV (using org.apache.hadoop.hive.serde2.OpenCSVSerde)
- TextFile
3.Avro类型Metastore配置
为了在使用 Hive3.x 时对 Avro 表提供一流的支持。需要将以下属性定义添加到 Hive metastore 配置文件(hive-site.xml)中(并重新启动metastore服务):
<property>
<!-- https://community.hortonworks.com/content/supportkb/247055/errorjavalangunsupportedoperationexception-storage.html -->
<name>metastore.storage.schema.reader.impl</name>
<value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>
登录Ambari平台进行设置:
添加完成之后进行重启Hive。
4.支持表类型
事务表和ACID表。
连接到 Hive metastore 3.x 版本, Hive connector 支持只读和写入插入表和ACID表,完全支持分区和分桶操作。ACID表支持行级删除。
不支持使用 Hive Streaming Ingest 创建的ACID表。
5.配置
cd /opt/presto-server-345
vim etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://example.net:9083
bin/launcher restart
6.操作
6.1 hive
su - hive
cat user.txt
1,abc,25,13188888888888
2,test,30,13888888888888
3,adsfsadf,34,899314121
hadoop fs -put /hive/user.txt /tmp
hive
create database hivetest;
use hivetest;
create table tgm_test
(id int,
name string,
age string,
tel string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
load data inpath 'user.txt' into table tgm_test;
select * from tgm_test;
+--------------+----------------+---------------+-----------------+
| tgm_test.id | tgm_test.name | tgm_test.age | tgm_test.tel |
+--------------+----------------+---------------+-----------------+
| 1 | abc | 25 | 13188888888888 |
| 2 | test | 30 | 13888888888888 |
| 3 | adsfsadf | 34 | 899314121 |
+--------------+----------------+---------------+-----------------+
6.2 Presto
./presto-cli --server manager.presto:8080 --catalog hive
presto> show schemas from hive;
Schema
--------------------
default
hivetest
information_schema
(3 rows)
Query 20201117_011102_00000_b3d8m, FINISHED, 3 nodes
Splits: 53 total, 53 done (100.00%)
1.22 [3 rows, 48B] [2 rows/s, 39B/s]
presto> show tables from hive.hivetest;
Table
----------
tgm_test
(1 row)
Query 20201117_011151_00002_b3d8m, FINISHED, 3 nodes
Splits: 53 total, 53 done (100.00%)
0.49 [1 rows, 26B] [2 rows/s, 53B/s]
presto> desc hive.hivetest.tgm_test;
Column | Type | Extra | Comment
--------+---------+-------+---------
id | integer | |
name | varchar | |
age | varchar | |
tel | varchar | |
(4 rows)
Query 20201117_011709_00004_b3d8m, FINISHED, 3 nodes
Splits: 53 total, 53 done (100.00%)
0.24 [4 rows, 260B] [16 rows/s, 1.05KB/s]
presto> select * from hive.hivetest.tgm_test;
Query 20201117_011207_00003_b3d8m failed: Failed to list directory: hdfs://manager93.bigdata:8020/warehouse/tablespace/managed/hive/hivetest.db/tgm_test
# 注意:该问题是由于远程Ambari集群,HDFS路径文件没有访问权限,设置为:777权限,可以执行查询。
# 设想远程Ambari集群,开启Ranger权限控制也可以授权presto用户进行操作。
presto> select * from hive.hivetest.tgm_test;
id | name | age | tel
----+----------+-----+----------------
1 | abc | 25 | 13188888888888
2 | test | 30 | 13888888888888
3 | adsfsadf | 34 | 899314121
(3 rows)
Query 20201117_012244_00005_b3d8m, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0.36 [3 rows, 73B] [8 rows/s, 203B/s]