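1. Connector
Presto can connect to many types of data sources. This post uses data served by an HTTP server as an example and briefly shows how to plug it into Presto.
2. Setting up an HTTP data source
2.1 Schema of the HTTP data source
On the HTTP server, provide a file that describes the layout of the data source. The file is JSON: each top-level key is a schema name (a schema is analogous to a database), and its value is a list of tables. Each table must give the names and types of its columns, plus the addresses of its data, i.e. HTTP URLs. An example: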
{
  "schema": [
    {
      "name": "table1",
      "columns": [
        {
          "name": "key1",
          "type": "bigint"
        },
        {
          "name": "key2",
          "type": "varchar"
        }
      ],
      "sources": [
        "http://localhost:9080/data.csv"
      ]
    }
  ]
}
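Writing this file by hand gets error-prone once there are several tables, so it can help to generate it. The following is a minimal sketch, not part of Presto: the helper name build_metadata and its argument layout are assumptions made for illustration, and it only reproduces the structure shown in the sample above.

```python
import json


def build_metadata(schema_name, tables):
    """Return the metadata document: {schema_name: [table, ...]}.

    `tables` is a list of dicts with "name", "columns" (a list of
    (column_name, column_type) pairs) and "sources" (a list of URLs).
    """
    return {
        schema_name: [
            {
                "name": table["name"],
                "columns": [
                    {"name": name, "type": col_type}
                    for name, col_type in table["columns"]
                ],
                "sources": list(table["sources"]),
            }
            for table in tables
        ]
    }


metadata = build_metadata(
    "schema",
    [
        {
            "name": "table1",
            "columns": [("key1", "bigint"), ("key2", "varchar")],
            "sources": ["http://localhost:9080/data.csv"],
        }
    ],
)

# Serialize to the text that would be saved as schema.json on the HTTP server.
schema_json = json.dumps(metadata, indent=2)
print(schema_json)
```

The printed text can be saved as schema.json in the directory that the HTTP server publishes.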
2.2 Providing the data
The data served over HTTP is in CSV format. For example, the data.csv mentioned above contains:
10,b
1,d
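Any HTTP server that can return static files will do. As one possible way to try this locally, here is a sketch using Python's standard http.server module. The helper name start_file_server is an assumption, and the demo uses a temporary directory and an ephemeral port (port 0) rather than 9080, so that it can run without clashing with a real server.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


def start_file_server(directory, port=9080):
    """Serve `directory` over HTTP in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Demo: publish data.csv from a temporary directory and fetch it back.
with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "data.csv").write_bytes(b"10,b\n1,d\n")
    server = start_file_server(tmp, port=0)  # port 0: let the OS pick a free port
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("GET", "/data.csv")
    response = conn.getresponse()
    status, body = response.status, response.read().decode()
    conn.close()
    server.shutdown()
    server.server_close()

print(status)
print(body, end="")
```

For the setup in this post, the same helper would be called with the directory that holds schema.json and data.csv and with port=9080, and the process would have to stay alive while Presto runs queries.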
2.3 Configuring Presto
Next, configure Presto so that it knows about the HTTP data source. Create the file etc/catalog/http.properties and specify the address of the schema file in it:
connector.name=example-http
metadata-uri=http://localhost:9080/schema.json
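Presto takes the catalog name from the file name, which is why http.properties shows up as the catalog http in the queries of section 2.4. The sketch below writes such a file from parameters. The helper name write_catalog_file is an assumption, and the demo writes into a temporary directory instead of a real Presto installation.

```python
import tempfile
from pathlib import Path


def write_catalog_file(presto_dir, catalog_name, metadata_uri):
    """Write etc/catalog/<catalog_name>.properties under presto_dir; return its path."""
    path = Path(presto_dir) / "etc" / "catalog" / f"{catalog_name}.properties"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "connector.name=example-http\n"
        f"metadata-uri={metadata_uri}\n"
    )
    return path


# Demo: a temporary directory stands in for the Presto installation directory.
with tempfile.TemporaryDirectory() as presto_dir:
    catalog_file = write_catalog_file(
        presto_dir, "http", "http://localhost:9080/schema.json"
    )
    catalog_name = catalog_file.stem  # file name without ".properties"
    catalog_text = catalog_file.read_text()

print(catalog_name)
print(catalog_text, end="")
```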
2.4 Checking the query results
2.4.1 Show the schemas in the http catalog
presto> show schemas from http;
Schema
--------------------
information_schema
schema
(2 rows)
Query 20180510_030439_00002_58j4x, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 34B] [15 rows/s, 263B/s]
2.4.2 Show the tables in the schema named schema of the http catalog
presto> show tables from http.schema;
Table
--------
table1
(1 row)
Query 20180510_030453_00003_58j4x, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 22B] [4 rows/s, 108B/s]
2.4.3 Show the table's structure
presto> describe http.schema.table1;
Column | Type | Extra | Comment
--------+---------+-------+---------
key1 | bigint | |
key2 | varchar | |
(2 rows)
Query 20180510_030507_00004_58j4x, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 123B] [9 rows/s, 603B/s]
2.4.4 Fetch the table's data
The first attempt below fails with For input string: "a". This suggests that data.csv at that point still held the non-numeric value a in a column declared as bigint. The second attempt, presumably run after the file was corrected, returns the rows.
presto> select * from http.schema.table1;
Query 20180510_031258_00005_58j4x, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20180510_031258_00005_58j4x failed: For input string: "a"
presto> select * from http.schema.table1;
key1 | key2
------+------
10 | b
1 | d
(2 rows)
Query 20180510_031315_00006_58j4x, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [2 rows, 0B] [41 rows/s, 0B/s]
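A failure like For input string: "a" can be caught before querying by checking every CSV field against the column type declared in the schema file. The sketch below is an assumption-laden illustration, not the connector's own logic: the helper name find_type_errors is made up, only bigint is checked, and Python's int() is slightly more lenient than Java's number parsing (it accepts surrounding whitespace, for instance). The second input, "a,b", is a hypothetical bad file, not the actual content that caused the failure above.

```python
import csv
import io


def find_type_errors(columns, csv_text):
    """Return (line_number, column_name, value) for fields that fail to parse as bigint."""
    errors = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for column, value in zip(columns, row):
            if column["type"] == "bigint":
                try:
                    int(value)
                except ValueError:
                    errors.append((line_no, column["name"], value))
    return errors


columns = [
    {"name": "key1", "type": "bigint"},
    {"name": "key2", "type": "varchar"},
]

good_errors = find_type_errors(columns, "10,b\n1,d\n")
bad_errors = find_type_errors(columns, "a,b\n1,d\n")  # hypothetical bad file

print(good_errors)  # []
print(bad_errors)   # [(1, 'key1', 'a')]
```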