Presto Single-Node and Cluster Deployment Tutorial

闲人编程

Last modified: 2024-08-11 10:04:28

First published: 2024-08-11 09:42:56

Article link: https://blog.csdn.net/qq_42568323/article/details/141101572

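This tutorial covers deploying Presto on a single node and as a cluster, the points to watch during deployment, and usage examples in Java and Python.

I. Presto Single-Node Deployment

1. Environment Preparation

Operating system: Linux (Ubuntu 20.04 or CentOS 7 recommended)
Java: Presto runs on the JVM. OpenJDK 8 or 11 is recommended; confirm that the version you choose is supported by the Presto release you install.

2. Install Java

On Ubuntu: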

sudo apt update
sudo apt install openjdk-11-jdk

On CentOS:

sudo yum install java-11-openjdk

Verify the Java installation:

java -version

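3. Download and Extract Presto

Download a Presto release from the official Presto website or from Maven Central. This tutorial uses version 0.276: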

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.276/presto-server-0.276.tar.gz
tar -xzvf presto-server-0.276.tar.gz
mv presto-server-0.276 /usr/local/presto

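4. Configure Presto

The commands below assume the current user can write to /usr/local/presto.

Create the configuration directory: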
```
mkdir -p /usr/local/presto/etc
```

Create the node.properties file:

cat <<EOF > /usr/local/presto/etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/usr/local/presto/data
EOF
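
The node.id shown above is a placeholder. Presto expects every node to have its own identifier, and the identifier should stay the same across restarts, so it is generated once and then kept. A minimal Python sketch of producing the file contents with a random UUID follows; the function name render_node_properties is a hypothetical helper for illustration, not part of Presto.

```python
import uuid


def render_node_properties(environment: str, data_dir: str) -> str:
    """Return node.properties text with a freshly generated node.id.

    Hypothetical helper: call it once per node and save the result,
    so the id does not change on later restarts.
    """
    node_id = uuid.uuid4()  # random UUID, different on every call
    lines = [
        f"node.environment={environment}",
        f"node.id={node_id}",
        f"node.data-dir={data_dir}",
    ]
    return "\n".join(lines) + "\n"


print(render_node_properties("production", "/usr/local/presto/data"))
```

The printed text can be redirected into /usr/local/presto/etc/node.properties in place of the heredoc above.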

Create the jvm.config file:

cat <<EOF > /usr/local/presto/etc/jvm.config
-server
-Xmx16G
-Xms16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
EOF

Create the config.properties file:

cat <<EOF > /usr/local/presto/etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

Create the catalog/hive.properties file:

mkdir -p /usr/local/presto/etc/catalog
cat <<EOF > /usr/local/presto/etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
EOF

5. Start Presto

Start the Presto service:
```
/usr/local/presto/bin/launcher start
```
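Verify that Presto started successfully:

Open the Presto web UI at http://localhost:8080. If the page does not load, check var/log/server.log under the directory set in node.data-dir.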
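
The same check can be scripted. The sketch below assumes the server exposes the /v1/info REST endpoint and that its JSON response carries a boolean "starting" field, as Presto 0.2xx releases do; verify this against your version. The helper names fetch_info and is_ready are illustrative.

```python
import json
import urllib.request


def is_ready(info: dict) -> bool:
    """Treat the server as ready once it reports that startup has finished."""
    return info.get("starting") is False


def fetch_info(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/v1/info and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/info", timeout=timeout) as resp:
        return json.load(resp)
```

Calling is_ready(fetch_info("http://localhost:8080")) returns True once startup has completed; a connection error raised by fetch_info means the HTTP port is not accepting requests yet.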
Stop the Presto service:
```
/usr/local/presto/bin/launcher stop
```

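6. Notes on Single-Node Deployment

Java version: make sure the installed Java version is compatible with your Presto release.
Memory settings: adjust the heap parameters in jvm.config (-Xmx and -Xms, 16G in the example) to the memory actually available on the machine, and keep query.max-memory-per-node below the heap size.
Configuration paths: make sure every path in the configuration files is correct, especially the Hadoop configuration files referenced in hive.properties.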
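
As a rough illustration of the memory note, the Python sketch below derives a heap size from the machine's RAM. The 70% fraction is an assumed rule of thumb, not a value from the Presto documentation, and both function names are hypothetical.

```python
import os


def suggested_heap_gb(total_ram_gb: float, fraction: float = 0.7) -> int:
    """Return a whole-GB heap size: a fraction of RAM, never below 1 GB."""
    return max(1, int(total_ram_gb * fraction))


def total_ram_gb() -> float:
    """Physical memory in GB, read through POSIX sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


heap = suggested_heap_gb(16)  # example input: a machine with 16 GB of RAM
print(f"-Xmx{heap}G")
print(f"-Xms{heap}G")
```

On the target machine, suggested_heap_gb(total_ram_gb()) gives a starting point for the -Xmx and -Xms lines in jvm.config; leave more headroom if other services share the host.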

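II. Presto Cluster Deployment

1. Environment Preparation

Servers: at least 2 (3 or more recommended)
Operating system: Linux (Ubuntu 20.04 or CentOS 7 recommended)
Java: install Java on every node

2. Configure the Presto Cluster

2.1 Install Presto

Install Presto on every server by following the single-node steps above. Give each node its own node.id in node.properties, and keep node.environment identical on all nodes.

2.2 Configure the Coordinator and Worker Nodes

Coordinator node configuration:

In config.properties (on larger clusters, node-scheduler.include-coordinator is usually set to false so the coordinator does not also execute query work):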

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://coordinator:8080

Worker node configuration:

In config.properties:

coordinator=false
http-server.http.port=8080
query.max-memory-per-node=1GB
query.max-memory=4GB
discovery.uri=http://coordinator:8080
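
The two files differ only in a few keys, so they can be produced from one template. The Python sketch below reproduces the coordinator and worker settings shown above; render_config is a hypothetical helper, and the memory limits are simply the values used in this tutorial.

```python
def render_config(is_coordinator: bool, coordinator_host: str, port: int = 8080) -> str:
    """Build config.properties text for a coordinator or a worker node."""
    props = {"coordinator": str(is_coordinator).lower()}
    if is_coordinator:
        # Only the coordinator runs the embedded discovery service.
        props["node-scheduler.include-coordinator"] = "true"
        props["discovery-server.enabled"] = "true"
    props["http-server.http.port"] = str(port)
    props["query.max-memory"] = "4GB"
    props["query.max-memory-per-node"] = "1GB"
    # Every node, coordinator included, points at the coordinator's address.
    props["discovery.uri"] = f"http://{coordinator_host}:{port}"
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


print(render_config(False, "coordinator"))
```

Here "coordinator" is the hostname used in this tutorial; replace it with the address that the workers can actually resolve.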

Start the nodes:

Run the following on every node:
```
/usr/local/presto/bin/launcher start
```

3. Verify the Cluster Status

Open the coordinator's web UI:

Go to http://coordinator:8080 to see the status of all worker nodes in the cluster.
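
Cluster membership can also be read with SQL from the system.runtime.nodes table. The Python sketch below works with any DB-API cursor; the helper names are illustrative, and the lowercase 'active' state value is an assumption worth checking against your Presto version.

```python
NODES_SQL = "SELECT node_id, http_uri, coordinator, state FROM system.runtime.nodes"


def list_nodes(cursor):
    """Run the node query on a DB-API cursor and return one dict per node."""
    cursor.execute(NODES_SQL)
    columns = ("node_id", "http_uri", "coordinator", "state")
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def count_active_workers(nodes):
    """Count non-coordinator nodes whose state is 'active' (assumed value)."""
    return sum(1 for n in nodes if not n["coordinator"] and n["state"] == "active")
```

With the presto-python-client package, a suitable cursor comes from prestodb.dbapi.connect(host='coordinator', port=8080, user='user', catalog='system', schema='runtime').cursor().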

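4. Notes on Cluster Deployment

Configuration consistency: make sure the configuration on every node is correct. In particular, discovery.uri must point to the coordinator, node.environment must be the same on all nodes, and each node needs its own node.id.
Network and ports: make sure all nodes can reach each other and that port 8080 is open between them.
Resource management: allocate memory and CPU according to the hardware of each node.
Monitoring and logs: use monitoring tools (such as Prometheus and Grafana) and log analysis tools to monitor the cluster and troubleshoot failures.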
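
For the network check, a plain TCP connection attempt from each worker to the coordinator is often enough to catch firewall problems. This is a minimal Python sketch using only the standard library; port_open is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running port_open("coordinator", 8080) on a worker shows whether the coordinator's HTTP port is reachable from that machine. It only tests TCP reachability, not whether Presto itself is healthy.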

III. Presto Usage Examples

1. Java Example: Querying Presto over JDBC

1.1 Add the Presto JDBC Dependency

Add the Presto JDBC dependency to pom.xml:

<dependency>
    <groupId>com.facebook.presto</groupId>
    <artifactId>presto-jdbc</artifactId>
    <version>0.276</version>
</dependency>

1.2 Connect to Presto from Java and Run a Query

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoJDBCExample {
    public static void main(String[] args) {
        String url = "jdbc:presto://localhost:8080/hive/default";
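        String user = "user";
        // Pass null when the server has no authentication configured;
        // the Presto JDBC driver only allows a password over SSL.
        String password = null;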

        try (Connection connection = DriverManager.getConnection(url, user, password);
             Statement statement = connection.createStatement()) {

            String sql = "SELECT * FROM sample_table LIMIT 10";
            ResultSet resultSet = statement.executeQuery(sql);

            while (resultSet.next()) {
                System.out.println(resultSet.getString(1) + "\t" + resultSet.getString(2));
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

2. Python Example: Querying Presto with `presto-python-client`

2.1 Install the `presto-python-client` Library

pip install presto-python-client

2.2 Connect to Presto from Python and Run a Query

import prestodb

# Create the Presto connection
conn = prestodb.dbapi.connect(
    host='localhost',
    port=8080,
    user='user',
    catalog='hive',
    schema='default',
)

# Create a cursor
cursor = conn.cursor()

# Run the query
cursor.execute('SELECT * FROM sample_table LIMIT 10')

# Fetch all result rows
rows = cursor.fetchall()

# Print the results
for row in rows:
    print(row)
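
Rows come back as plain lists, so column names have to be looked up separately. The sketch below pairs each row with the names from the cursor's DB-API description attribute. fetch_dicts is a hypothetical helper, and reading description after fetchall() is a cautious assumption about when the client has received the column metadata.

```python
def fetch_dicts(cursor, sql):
    """Execute sql and return each row as a dict keyed by column name."""
    cursor.execute(sql)
    rows = cursor.fetchall()
    # Read description after fetching so the column metadata has arrived.
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in rows]
```

With the connection from the example above, fetch_dicts(conn.cursor(), 'SELECT * FROM sample_table LIMIT 10') returns a list of dictionaries that can be printed or passed to other code by column name.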