1 Cluster Deployment
1.1 Cluster Environment
1.1.1 System Requirements
Mac OS X or Linux (CentOS 7.2 was used for testing)
Java 8 Update 92 or higher (8u92+), 64-bit (1.8.0_121, 64-bit, was used for testing)
1.1.2 Component Versions
Presto version: 0.172 (download link).
Hadoop version: Apache Hadoop 2.6.4
Hive version: Apache Hive 1.2.1
MongoDB version: mongodb-linux-x86_64-rhel70-3.4.2
1.2 Cluster Configuration
1.2.1 Software Deployment
Download the installation package to /opt/beh/core, extract it, and create a symlink:
cd /opt/beh/core
tar zxf presto-server-0.172.tar.gz
ln -s presto-server-0.172 presto
cd presto
1.2.2 Cluster Configuration
Create the configuration directory and the related configuration files:
cd /opt/beh/core/presto
mkdir data
mkdir etc
cd etc
touch config.properties
touch jvm.config
touch node.properties
touch log.properties
Note:
Config Properties: configuration for the Presto server
JVM Config: command line options for the Java Virtual Machine
Node Properties: environmental configuration specific to each node
Catalog Properties: configuration for Connectors (data sources)
The data directory created above corresponds to the node.data-dir parameter in Node Properties.
- config.properties
coordinator=true
discovery-server.enabled=true
discovery.uri=http://master:8080
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=60GB
query.max-memory-per-node=20GB
Note:
These properties require some explanation:
coordinator: Allow this Presto instance to function as a coordinator (accept queries from clients and manage query execution).
node-scheduler.include-coordinator: Allow scheduling work on the coordinator. For larger clusters, processing work on the coordinator can impact query performance because the machine’s resources are not available for the critical task of scheduling, managing and monitoring query execution.
http-server.http.port: Specifies the port for the HTTP server. Presto uses HTTP for all communication, internal and external.
query.max-memory: The maximum amount of distributed memory that a query may use.
query.max-memory-per-node: The maximum amount of memory that a query may use on any one machine.
discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup. In order to simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded version of the Discovery service. It shares the HTTP server with Presto and thus uses the same port.
discovery.uri: The URI to the Discovery server. Because we have enabled the embedded version of Discovery in the Presto coordinator, this should be the URI of the Presto coordinator. Replace master:8080 with the host and port of your Presto coordinator. This URI must not end in a slash.
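A quick arithmetic check of the memory settings above: query.max-memory should not exceed query.max-memory-per-node multiplied by the number of worker nodes, or large queries can never get the memory they are promised. A minimal sketch; the 3-worker count is an assumption for illustration, while the GB values are the ones configured in this document:

```shell
# Sanity check: cluster-wide query memory vs. per-node limits.
# WORKERS=3 is an assumed cluster size, not from this document.
WORKERS=3
PER_NODE_GB=20      # query.max-memory-per-node
CLUSTER_MAX_GB=60   # query.max-memory
AVAILABLE_GB=$(( WORKERS * PER_NODE_GB ))
echo "$AVAILABLE_GB"                                  # prints 60
[ "$AVAILABLE_GB" -ge "$CLUSTER_MAX_GB" ] && echo "OK"
```

Note also that query.max-memory-per-node (20GB) stays well below the JVM heap (-Xmx40G) set in jvm.config below, leaving headroom for other allocations.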
- jvm.config
-server
-Xmx40G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
- node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffff1
node.data-dir=/opt/beh/core/presto/data
Note:
The above properties are described below:
node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.
node.id: The unique identifier for this installation of Presto. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of Presto. If running multiple installations of Presto on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier.
node.data-dir: The location (filesystem path) of the data directory. Presto will store logs and other data here. After startup it also contains two symlinks, etc and plugin, pointing back into the installation; var/run holds the server pid file and var/log holds the logs.
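Because node.id must be unique on every host, generating node.properties per host is less error-prone than copying one file around and editing it. A minimal sketch, assuming a Linux kernel that exposes /proc/sys/kernel/random/uuid (true on CentOS 7); OUT and DATA_DIR are overridable variables introduced here so the snippet can be tried outside /opt:

```shell
# Generate a per-host node.properties with a fresh, unique node.id.
# DATA_DIR default follows this document; OUT defaults to the current
# directory so the sketch can be tested safely.
DATA_DIR=${DATA_DIR:-/opt/beh/core/presto/data}
OUT=${OUT:-./node.properties}
NODE_ID=$(cat /proc/sys/kernel/random/uuid)
cat > "$OUT" <<EOF
node.environment=production
node.id=${NODE_ID}
node.data-dir=${DATA_DIR}
EOF
grep -c '^node\.' "$OUT"   # prints 3
```

Run once on each host; the generated node.id then stays stable across restarts as long as the file is kept.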
- log.properties
com.facebook.presto=INFO
com.facebook.presto.server=INFO
com.facebook.presto.hive=INFO
Note:
The default minimum level is INFO (thus the above example does not actually change anything). There are four levels: DEBUG, INFO, WARN and ERROR.
1.2.3 Connector Configuration
Create the connector configuration directory and the connector configuration files:
cd /opt/beh/core/presto/etc
mkdir catalog
cd catalog
touch hive.properties
touch jmx.properties
touch mongodb.properties
touch mysql.properties
- hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.config.resources=/opt/beh/core/hadoop/etc/hadoop/core-site.xml,/opt/beh/core/hadoop/etc/hadoop/hdfs-site.xml
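If a path listed in hive.config.resources is wrong, the Hive connector only fails later at query time, so it is worth checking the files before starting the server. A hedged sketch; RESOURCES defaults to the paths used above and can be overridden:

```shell
# Check that every file listed in hive.config.resources actually exists.
RESOURCES=${RESOURCES:-/opt/beh/core/hadoop/etc/hadoop/core-site.xml,/opt/beh/core/hadoop/etc/hadoop/hdfs-site.xml}
missing=0
for f in $(printf '%s' "$RESOURCES" | tr ',' ' '); do
    [ -f "$f" ] || { echo "missing: $f"; missing=$((missing + 1)); }
done
echo "$missing missing"
```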
- jmx.properties
connector.name=jmx
jmx.dump-tables=java.lang:type=Runtime,com.facebook.presto.execution.scheduler:name=NodeScheduler
jmx.dump-period=10s
jmx.max-entries=86400
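The two numbers above imply a retention window: one history row is written every jmx.dump-period, and rows are kept until jmx.max-entries exist. A quick check of what that works out to, using the values from the config above:

```shell
# History retention implied by the jmx connector settings above:
# 86400 entries, one every 10 seconds.
MAX_ENTRIES=86400
DUMP_PERIOD_S=10
RETENTION_S=$(( MAX_ENTRIES * DUMP_PERIOD_S ))
echo "$(( RETENTION_S / 86400 )) days"   # prints "10 days"
```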
- mongodb.properties
connector.name=mongodb
mongodb.seeds=hadoop001:37025,hadoop002:37025,hadoop003:37025
- mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://mysqlhost:3306
connection-user=mysqluser
connection-password=mysqlpassword
1.2.4 Starting the Service
- Environment variable configuration
export PRESTO_HOME=/opt/beh/core/presto
export PATH=$PATH:$PRESTO_HOME/bin
Run these on the command line, or add them to /opt/beh/conf/beh_env.
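When scripting the setup, it helps to make the beh_env edit idempotent so repeated runs do not duplicate the exports. A sketch; ENV_FILE defaults to a local file here so it can be tried safely, while the real target from this document is /opt/beh/conf/beh_env:

```shell
# Append the PRESTO_HOME exports only if they are not already present.
# ENV_FILE defaults to ./beh_env for safe testing; the real file is
# /opt/beh/conf/beh_env.
ENV_FILE=${ENV_FILE:-./beh_env}
grep -q 'PRESTO_HOME=' "$ENV_FILE" 2>/dev/null || cat >> "$ENV_FILE" <<'EOF'
export PRESTO_HOME=/opt/beh/core/presto
export PATH=$PATH:$PRESTO_HOME/bin
EOF
```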
- Startup command
cd /opt/beh/core/presto
./bin/launcher start
Note:
The installation directory contains the launcher script in bin/launcher. Presto can be started as a daemon by running the following:
bin/launcher start
Alternatively, it can be run in the foreground, with the logs and other output being written to stdout/stderr (both streams should be captured if using a supervision system like daemontools):
bin/launcher run
Run the launcher with --help to see the supported commands and command line options. In particular, the --verbose option is very useful for debugging the installation.
Logs:
After launching, you can find the log files in var/log:
launcher.log: This log is created by the launcher and is connected to the stdout and stderr streams of the server. It will contain a few log messages that occur while the server logging is being initialized and any errors or diagnostics produced by the JVM.
server.log: This is the main log file used by Presto. It will typically contain the relevant information if the server fails during initialization. It is automatically rotated and compressed.
http-request.log: This is the HTTP request log which contains every HTTP request received by the server. It is automatically rotated and compressed.
2 Command-Line Interface
Download the command-line interface program and copy it to /opt/beh/core/presto/bin (download link):
cd /opt/beh/core/presto/bin
chmod +x presto-cli-0.172-executable.jar
ln -s presto-cli-0.172-executable.jar presto
Test the connection:
./presto --server localhost:8080 --catalog hive --schema default
[hadoop@sparktest bin]$ ./presto --server localhost:8080 --catalog hive --schema default
presto:default> HELP
Supported commands:
QUIT
EXPLAIN [ ( option [, ...] ) ] <query>
options: FORMAT { TEXT | GRAPHVIZ }
TYPE { LOGICAL | DISTRIBUTED }
DESCRIBE <table>
SHOW COLUMNS FROM <table>
SHOW FUNCTIONS
SHOW CATALOGS [LIKE <pattern>]
SHOW SCHEMAS [FROM <catalog>] [LIKE <pattern>]
SHOW TABLES [FROM <schema>] [LIKE <pattern>]
SHOW PARTITIONS FROM <table> [WHERE ...] [ORDER BY ...] [LIMIT n]
USE [<catalog>.]<schema>
presto:default> SHOW CATALOGS;
Catalog
---------
hive
jmx
mongodb
mysql
system
(5 rows)
Query 20170418_121353_00035_yr3tu, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> SHOW SCHEMAS FROM HIVE;
Schema
--------------------
default
information_schema
tmp
tpc100g
(4 rows)
Query 20170418_121409_00036_yr3tu, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [4 rows, 55B] [43 rows/s, 601B/s]
presto:default> USE hive.tmp;
presto:tmp> show tables;
Table
-------------
date_dim
item
store_sales
(3 rows)
Query 20170418_121459_00040_yr3tu, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [3 rows, 62B] [40 rows/s, 830B/s]
presto:tmp> select count(*) from item;
_col0
--------
204000
(1 row)
Query 20170418_121540_00041_yr3tu, FINISHED, 3 nodes
Splits: 20 total, 20 done (100.00%)
0:02 [204K rows, 11.8MB] [81.8K rows/s, 4.74MB/s]
presto:tmp> quit