第一:Presto基本介绍
- Facebook开源的
- 完全基于内存的分布式SQL查询引擎
- 不支持存储,只包含计算引擎
- 支持任意数据源
- MPP架构
- 多线程处理优
- 优化的ORC和Parquet Reader
- 运行环境需要Hadoop(hdfs)和Hive(metastore)
- …
第二:Presto基本架构
- 数据处理流程
Client–Presto–MetaStore,HDFS - 三种服务
- coordinate
- 解析SQL语句
- 生成执行计划
- 分发执行任务给Worker节点执行
- Worker
- 负责实际执行查询任务
- Discovery service
- Worker节点启动后向Discovery Service服务注册
- Coordinator从Discovery Service获得Worder节点
- coordinate
- 通过connector plugin获取数据和元信息
- Presto不是一个数据
- Presto为其它数据存储系统提供了SQL能力
- Presto支持的数据源和存储格式
- Hadoop/Hive connector与存储格式
- HDFS
- ORC,RCFILE,Parquet,SequenceFile,TextFile
- 开源数据存储系统
- Mysql&PostgreSQL
- Cassandra
- Kafka
- Redis
- 其它
- MongoDB
- ElasticSearch
- HBase
- Hadoop/Hive connector与存储格式
第三:Presto部署
1.环境准备
- Linux
- HDFS
- Java8,64-bit
- Python 2.4+
- Hive Connector
2.解压安装包
[hadoop@hadoopa ~]$ tar -zxvf presto-server-0.157.tar.gz
3.修改配置文档
- hive.properties
[hadoop@hadoopa catalog]$ pwd
/home/hadoop/presto-server-0.157/etc/catalog
[hadoop@hadoopa catalog]$ ls
hive.properties
[hadoop@hadoopa catalog]$ cat hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoopA:9083
hive.config.resources=/home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml,=/home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
- config.properties
[hadoop@hadoopa etc]$ pwd
/home/hadoop/presto-server-0.157/etc
[hadoop@hadoopa etc]$ cat config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=512MB
query.max-memory-per-node=512MB
discovery-server.enabled=true
discovery.uri=http://hadoopA:8080
- jvm.config
[hadoop@hadoopa etc]$ pwd
/home/hadoop/presto-server-0.157/etc
[hadoop@hadoopa etc]$ cat jvm.config
-server
-Xmx1G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
- node.properties
[hadoop@hadoopa etc]$ pwd
/home/hadoop/presto-server-0.157/etc
[hadoop@hadoopa etc]$ cat node.properties
node.environment=production
node.id=hadoopA
node.data-dir=/home/hadoop/presto-server-0.157/presto_data
4.拷贝presto客户端jar包(presto-cli-0.157-executable.jar)至presto目录。并重命名为presto
[hadoop@hadoopa bin]$ pwd
/home/hadoop/presto-server-0.157/bin
[hadoop@hadoopa bin]$ ls
launcher launcher.properties launcher.py lejfowl presto procname
5.增加presto其执行权限
[hadoop@hadoopa bin]$ chmod a+x presto
6.启动Presto
[hadoop@hadoopa bin]$ ./launcher start
7.连接Presto
[hadoop@hadoopa bin]$ presto --server localhost:8080 --catalog hive --schema default
第四:项目实践
一、项目介绍
有三张表(brand_dimension_orc,user_dimension_orc,record_orc)以ORC格式存储在Hive中。现在使用Presto进行一些常规查询
presto:default> show tables;
Table
---------------------
brand_dimension
brand_dimension_orc
demo2
employees
employees_part
record
record_orc
user_dimension
user_dimension_orc
(9 rows)
Query 20170315_044955_00012_ewrag, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:01 [9 rows, 263B] [9 rows/s, 285B/s]
二、查询一些数据
- 统计各年龄段用户消费总额
presto:default> select cast((year(CURRENT_DATE)-year(birth)) as integer) as age,sum(price) as totalPrice
-> from record join user_dimension on record.uid=user_dimension.uid
-> group by cast((year(CURRENT_DATE)-year(birth)) as integer)
-> order by totalPrice desc;
age | totalPrice
-----+------------
47 | 11499
44 | 9390
10 | 8658
15 | 8168
38 | 7517
16 | 7475
40 | 7112
11 | 6884
45 | 6875
29 | 6235
3 | 6094
23 | 6072
19 | 6048
22 | 5929
2 | 5728
31 | 5639
12 | 5576
30 | 5391
7 | 5353
21 | 5196
24 | 5050
13 | 4860
36 | 4816
41 | 4815
14 | 4755
42 | 4597
34 | 4528
8 | 4228
4 | 4195
20 | 4187
37 | 4017
27 | 3988
17 | 3830
43 | 3812
26 | 3297
32 | 3247
33 | 3208
9 | 3192
18 | 3060
39 | 3057
35 | 2467
6 | 2161
25 | 2072
46 | 1906
28 | 1477
5 | 1463
0 | 826
1 | 641
(48 rows)
- 查询各品牌销售总额
presto:default> select brand,sum(price) as totalPrice
-> from record join brand_dimension on record.bid=brand_dimension.bid
-> group by brand_dimension.brand
-> order by totalPrice desc;
brand | totalPrice
------------+------------
SKYWORTH | 11992
SAMSUNG | 10240
YILI | 9872
TCL | 6741
OLAY | 6442
MEIZU | 6345
ASUS | 5705
PEAK | 5431
APPLE | 5213
MOUTAI | 4772
SHARP | 4721
PEACEBIRD | 4680
MIZUNO | 4599
DHC | 4585
NIULANSHAN | 4582
CAMEL | 4569
NIKE | 4358
SEPTWOLVES | 4345
OPPO | 4306
NB | 4237
KAPPA | 4092
ZARA | 4068
GUANGMING | 4054
HP | 4043
HISENSE | 3995
HLA | 3963
HUAWEI | 3927
KANS | 3884
LANGJIU | 3857
NIVEA | 3579
LINING | 3559
CLINIQUE | 3552
LENOVO | 3534
PUMA | 3531
HTC | 3405
GXG | 3322
UNIQLO | 3271
HAIER | 3106
LOREAL | 2948
WULIANGYE | 2912
ADIDAS | 2847
MOTOROLA | 2819
VIVO | 2809
DELL | 2785
SANYUAN | 2776
LANCOME | 2714
SELECTED | 2633
INNISFREE | 2452
SONY | 2353
ACER | 2339
XIAOMI | 2260
HONGXING | 2113
ANTA | 1990
MENGNIU | 1776
IPHONE | 1628
SEMIR | 1589
PHILIPS | 1205
361 | 718
MEIFUBAO | 448