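Preface
Most of this content comes from the official documentation. I have combined it with scenarios from my own day-to-day work and written down the operations I consider important and commonly used. Presto has split into two projects, prestodb and prestosql; take a look at both if you are interested. These notes use prestodb.
If anything here is wrong, corrections are very welcome!
Let's get started!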
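#01 Overview
Presto is not a database!
Presto is a distributed, efficient ad hoc query tool for large volumes of data. Put simply, it can return query results fairly quickly even on TB- or PB-scale data.
The data can live in a traditional relational database or in Hive tables.
A single query can also join data from several different sources.
It was open-sourced by Facebook and is developed and maintained together with the community.
At my company it is currently used for ad hoc data requests, where it returns query results quickly and conveniently.
This is different from Kylin: Kylin precomputes the results for every combination of dimensions ahead of time, so a query only reads the precomputed result, which is why it is fast.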
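To make the contrast concrete, here is a minimal Python sketch of the two strategies. It is only a toy model: the rows, the dimension names and both function names are made up for illustration, and neither Presto nor Kylin is involved.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, product, amount)
ROWS = [
    ("Beijing", "book", 10),
    ("Beijing", "pen", 5),
    ("Shanghai", "book", 7),
]
DIMENSIONS = ("city", "product")


def ad_hoc_sum(rows, group_by):
    """Ad hoc style: scan the raw rows at query time and aggregate."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def build_cube(rows):
    """Precomputed style: store the totals for every combination of dimensions."""
    cube = {}
    for n in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, n):
            cube[dims] = ad_hoc_sum(rows, dims)
    return cube


cube = build_cube(ROWS)                 # cost paid once, ahead of time
print(cube[("city",)])                  # lookup only: {('Beijing',): 15, ('Shanghai',): 7}
print(ad_hoc_sum(ROWS, ("product",)))   # computed on demand: {('book',): 17, ('pen',): 5}
```

The cube answers only the groupings that were prepared in advance, while the ad hoc function can answer any grouping at the price of scanning the rows every time.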
#02 Core Concepts
There are two types of Presto servers: coordinators and workers.
1.Coordinator: the master node
Coordinators communicate with workers and clients using a REST API.
Purpose:
0.receiving statements submitted by clients
1.parsing statements
2.planning queries
3.managing Presto worker nodes
Details:
1.creates a logical model of a query involving a series of stages
2.translates that model into a series of connected tasks running on a cluster of Presto workers
3.fetches results from the workers and returns the final results to the client
2.Worker: the worker (follower) node
Workers communicate with other workers and Presto coordinators using a REST API.
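Purpose:
executing tasks and processing data.
When a worker starts up, it advertises itself so that the coordinator can discover it; only then can the coordinator manage that worker.
Details:
- fetches data from connectors and exchanges intermediate data with other workers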
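The division of labour between the two server types can be sketched with a toy model in Python. Everything here is hypothetical (the `Coordinator` and `Worker` classes, `register`, `run`, the round-robin scheduling); it is not Presto's API and skips the REST layer entirely. It only mirrors the idea of workers announcing themselves, the coordinator handing out tasks, and the partial results being collected.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    tasks: list = field(default_factory=list)

    def execute(self):
        # A real worker would read from connectors; here a task is a list of numbers to sum.
        return [sum(chunk) for chunk in self.tasks]


@dataclass
class Coordinator:
    workers: list = field(default_factory=list)

    def register(self, worker):
        """Called when a worker starts up and announces itself."""
        self.workers.append(worker)

    def run(self, data, chunk_size):
        # "Plan": cut the input into chunks, one task per chunk.
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        # Schedule the tasks across the known workers in round-robin order.
        for i, chunk in enumerate(chunks):
            self.workers[i % len(self.workers)].tasks.append(chunk)
        # Fetch the partial results from the workers and combine the final answer.
        partials = [p for w in self.workers for p in w.execute()]
        return sum(partials)


coordinator = Coordinator()
for name in ("worker-1", "worker-2"):
    coordinator.register(Worker(name))

total = coordinator.run(list(range(1, 11)), chunk_size=3)
print(total)  # 55
```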
3.Data Sources
Presto's data source model is built on the following concepts:
connector, catalog, schema, and table.
4.Connector
1.You can think of a connector the same way you think of a driver for a database (for example, the driver for a MySQL database).
2.Presto contains several built-in connectors.
3.Many third-party developers have contributed connectors so that Presto can access data in a variety of data sources.
4.Every catalog is associated with a specific connector.
Every catalog configuration file contains the connector.name property, which names the connector that catalog uses.
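As a rough illustration of what such a catalog file looks like, the Python sketch below parses a small properties text and reads connector.name from it. The file name `etc/catalog/pro.properties`, the host `example.net` and the credentials are placeholders assumed for the example, and `parse_properties` is a hypothetical helper, not something Presto ships.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Assumed contents of a catalog file such as etc/catalog/pro.properties
CATALOG_FILE = """
# which connector backs this catalog
connector.name=mysql
connection-url=jdbc:mysql://example.net:3306
connection-user=root
connection-password=secret
"""

props = parse_properties(CATALOG_FILE)
print(props["connector.name"])  # mysql
```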
Example of a fully qualified table name (catalog.schema.table): pro.college_pro.test_table
5.Catalog
A catalog contains schemas and references a data source via a connector.
Schema/Table
When a statement is executed, Presto creates a query along with a query plan that is then distributed across a series of Presto workers.
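The name pro.college_pro.test_table that appears earlier in this section follows the catalog.schema.table pattern. A small Python sketch of splitting such a name is shown below; `parse_qualified_name` is a hypothetical helper written for these notes, and it deliberately ignores quoted identifiers that contain dots.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_qualified_name(name):
    """Split 'catalog.schema.table' into its three parts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return QualifiedName(*parts)


qn = parse_qualified_name("pro.college_pro.test_table")
print(qn.catalog, qn.schema, qn.table)  # pro college_pro test_table
```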
#03 Data Types
1.Boolean
values true and false.
2.Integer
TINYINT ==> minimum value of -2^7 and a maximum value of 2^7 - 1
SMALLINT ==> minimum value of -2^15 and a maximum value of 2^15 - 1
INTEGER/INT ==> minimum value of -2^31 and a maximum value of 2^31 - 1
BIGINT ==> minimum value of -2^63 and a maximum value of 2^63 - 1
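The four ranges above all follow the same two's-complement formula, which the Python sketch below spells out. The bit widths come from the exponents listed above; `int_range` and `smallest_type` are helper names invented for this example.

```python
# Bit widths implied by the ranges listed above
INTEGER_BITS = {"TINYINT": 8, "SMALLINT": 16, "INTEGER": 32, "BIGINT": 64}


def int_range(bits):
    """Return (min, max) of a signed integer with the given number of bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type(value):
    """Return the narrowest integer type whose range contains value."""
    for name, bits in INTEGER_BITS.items():
        low, high = int_range(bits)
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in BIGINT")


print(int_range(8))          # (-128, 127)
print(smallest_type(128))    # SMALLINT
print(smallest_type(2**31))  # BIGINT
```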
3.Floating-Point
REAL (rarely used)
DOUBLE
4.Fixed-Precision
DECIMAL -> A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.
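DECIMAL(10,3), DECIMAL(20) -> the first number is the precision (total number of digits); the optional second number is the scale (digits after the decimal point, 0 when omitted).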
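To see how precision and scale interact, the Python sketch below checks whether a value fits a given DECIMAL(precision, scale) without losing digits. `fits_decimal` is a hypothetical helper for these notes; it only illustrates the digit counting and makes no claim about how Presto itself rounds or rejects values on a cast.

```python
from decimal import Decimal


def fits_decimal(value, precision, scale=0):
    """Check whether a finite decimal string fits DECIMAL(precision, scale) exactly."""
    _, digits, exponent = Decimal(value).as_tuple()
    digits = list(digits)
    if not any(digits):
        return True  # zero fits any DECIMAL type
    # Drop trailing fractional zeros: 1.500 needs only one fractional digit.
    while exponent < 0 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    frac_digits = max(-exponent, 0)
    int_digits = max(len(digits) + exponent, 0)
    return frac_digits <= scale and int_digits <= precision - scale


print(fits_decimal("1234567.891", 10, 3))  # True: 7 integer digits + 3 fractional digits
print(fits_decimal("12345678.9", 10, 3))   # False: 8 integer digits, only 7 allowed
print(fits_decimal("1.5", 20))             # False: DECIMAL(20) has scale 0
```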
5.String
VARCHAR -> Variable length character data
CHAR (rarely used)
VARBINARY -> Variable length binary data.
6.JSON
JSON value type, which can be a JSON object, a JSON array, a JSON number, a JSON string, true, false or null.
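The seven kinds of JSON value listed above can be told apart with Python's standard json module, as in the sketch below. `json_kind` and the sample strings are made up for illustration; this shows the JSON value model in general, not Presto's JSON functions.

```python
import json


def json_kind(text):
    """Return which kind of JSON value the text holds."""
    value = json.loads(text)
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before numbers: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


SAMPLES = ['{"a": 1}', "[1, 2, 3]", "3.14", '"presto"', "true", "false", "null"]
for sample in SAMPLES:
    print(sample, "->", json_kind(sample))
```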
7.Date and Time
DATE -> DATE '2001-08-22'
TIME -> TIME '01:02:03.456'
TIME WITH TIME ZONE -> TIME '01:02:03.456 America/Los_Angeles'
TIMESTAMP -> TIMESTAMP '2001-08-22 03:04:05.321'
TIMESTAMP WITH TIME ZONE -> TIMESTAMP '2001-08-22 03:04:05.321 America/Los_Angeles'
INTERVAL '3' MONTH
INTERVAL '2' DAY
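For readers more used to Python, the literals above map roughly onto the standard datetime module, as sketched below. The time zone variants are left out on purpose, and `add_months` is a hypothetical helper; clamping to the last day of a shorter month is an assumption of this sketch, not a statement about Presto's behaviour.

```python
import calendar
from datetime import date, datetime, time, timedelta

d = date.fromisoformat("2001-08-22")                      # DATE '2001-08-22'
t = time.fromisoformat("01:02:03.456")                    # TIME '01:02:03.456'
ts = datetime.fromisoformat("2001-08-22 03:04:05.321")    # TIMESTAMP '2001-08-22 03:04:05.321'


def add_months(day, months):
    """Add a month interval, clamping to the last day of the target month."""
    month_index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return day.replace(year=year, month=month0 + 1, day=min(day.day, last_day))


print(d + timedelta(days=2))   # like DATE '2001-08-22' + INTERVAL '2' DAY   -> 2001-08-24
print(add_months(d, 3))        # like DATE '2001-08-22' + INTERVAL '3' MONTH -> 2001-11-22
```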
8.Structural
ARRAY -> ARRAY[1, 2,