Setting up a Presto + MongoDB + Jupyter environment

Latest recommended article published 2023-01-03 17:09:23

ghostyusheng

Category: Operations. Tags: devops, operations

Article link: https://blog.csdn.net/ghostyusheng/article/details/104824241


Presto configuration

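1. Download the Presto installation package:
https://prestosql.io/download.html

2. cd presto-server-xxx/etc
3. mkdir catalog
4. Make sure the following files exist, and create any that are missing
(replace 192.168.201.31 with the IP of your own master node)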

FILE: jvm.config

-server
-Xmx20G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=150M
-XX:CMSInitiatingOccupancyFraction=70

FILE: log.properties

com.facebook.presto=INFO

FILE: config.properties

Master (coordinator) node:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=30GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://192.168.201.31:8080

Worker nodes:
coordinator=false
http-server.http.port=8080
query.max-memory=30GB
query.max-memory-per-node=2GB
discovery.uri=http://192.168.201.31:8080

FILE: node.properties

node.environment=production
# node.id must be unique for each node in the cluster
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
# Make sure this directory exists
node.data-dir=/root/soft/presto

FILE: catalog/jmx.properties

connector.name=jmx

FILE: catalog/mongodb.properties

connector.name=mongodb
mongodb.seeds=192.168.201.40:27017
mongodb.schema-collection=ods
# Replace user and password with your own; admin is the authentication database
mongodb.credentials=user:password@admin

5. Start the server: ~/soft/presto-server-0.149/bin/launcher start

For the finer-grained parameters, consult the documentation and tune them to your needs.
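Before running the launcher, it can help to confirm that every file from step 4 is in place. The following is a minimal sketch, not part of the original setup: the function name `missing_config_files` and the example path are hypothetical, while the file names are the ones listed above.

```python
from pathlib import Path

# Files that step 4 expects under presto-server-xxx/etc
EXPECTED_FILES = [
    "jvm.config",
    "log.properties",
    "config.properties",
    "node.properties",
    "catalog/jmx.properties",
    "catalog/mongodb.properties",
]


def missing_config_files(etc_dir):
    """Return the expected config files that are absent from etc_dir."""
    root = Path(etc_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location; point this at your own etc directory.
    etc = Path.home() / "soft" / "presto-server-0.149" / "etc"
    for name in missing_config_files(etc):
        print("missing:", name)
```

An empty result means all six files are present; it says nothing about whether their contents are valid.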

Installing Jupyter

pip3 install --upgrade jupyter matplotlib numpy pandas scipy scikit-learn jupyter_contrib_nbextensions

jupyter notebook
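If the notebook fails to start or an import fails later, a quick check of which packages are actually importable can narrow things down. This is a small sketch under assumptions: the helper name `missing_modules` is hypothetical, and the list uses import names rather than pip names (scikit-learn is imported as `sklearn`, and the `jupyter` package is assumed to provide the `notebook` module).

```python
import importlib.util

# Import names for the packages installed above
# (scikit-learn is imported as "sklearn").
MODULES = ["notebook", "matplotlib", "numpy", "pandas", "scipy", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("missing:", missing_modules(MODULES))
```

Run it with the same python3 interpreter that pip3 installed into, otherwise the result may not reflect the environment Jupyter uses.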

Presto client component (Jupyter runs on the master node)

1. pip3 install presto-python-client
2. Use the class below for debugging; for details, see the documentation at https://github.com/prestodb/presto-python-client

import prestodb


class Presto:
    # Main connection; schema corresponds to your MongoDB database
    conn = prestodb.dbapi.connect(
        host='127.0.0.1',
        port=8080,
        user='root',
        catalog='mongodb',
        schema='ods',
    )

    # Second connection, pointing at the 'app' database
    statistics_conn = prestodb.dbapi.connect(
        host='127.0.0.1',
        port=8080,
        user='root',
        catalog='mongodb',
        schema='app',
    )
    eng = None

    @classmethod
    def query(cls, sql):
        """Print the SQL, run it on the main connection, and return all rows."""
        print(sql)
        cur = cls.conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
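In a notebook, rows keyed by column name are usually easier to work with than bare lists. The sketch below is one possible helper, not part of the original class: `rows_to_dicts` is a hypothetical name, and it assumes the cursor exposes the standard DB-API `description` attribute (which the prestodb cursor is expected to provide). The demo uses an in-memory sqlite3 cursor only as a stand-in so that it runs without a Presto server.

```python
import sqlite3


def rows_to_dicts(cursor, sql):
    """Run sql on a DB-API cursor and return rows as dicts keyed by column name."""
    cursor.execute(sql)
    rows = cursor.fetchall()
    # Read column names after fetching, once the result metadata is available.
    columns = [col[0] for col in (cursor.description or [])]
    return [dict(zip(columns, row)) for row in rows]


if __name__ == "__main__":
    # sqlite3 stands in for Presto here; any DB-API cursor should behave alike.
    cur = sqlite3.connect(":memory:").cursor()
    print(rows_to_dicts(cur, "SELECT 1 AS id, 'a@example.com' AS email"))
```

With the class above, the call would look like `rows_to_dicts(Presto.conn.cursor(), "SELECT ...")`, and the resulting list can be passed straight to `pandas.DataFrame(...)` for analysis.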

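Addendum: what to do when Presto is slow pulling large amounts of data from MongoDB

It is slow because you are pulling a large amount of data and transfer over the internal network is slow, so you can compress the MongoDB collection to reduce the data volume.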

db.createCollection( "email", { storageEngine: {
wiredTiger: { configString: 'block_compressor=zlib' }}})
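The same collection can be created from Python, which is convenient when the setup is scripted from the notebook. This is a hedged sketch: `create_compressed_collection` is a hypothetical helper, `db` is assumed to be a pymongo `Database` object, and the list of accepted compressor names is an assumption about what the WiredTiger build on your server supports.

```python
# Compressor names assumed to be accepted by WiredTiger's block_compressor.
VALID_COMPRESSORS = {"none", "snappy", "zlib", "zstd"}


def create_compressed_collection(db, name, compressor="zlib"):
    """Create a collection whose WiredTiger blocks use the given compressor.

    db is assumed to be a pymongo Database object.
    """
    if compressor not in VALID_COMPRESSORS:
        raise ValueError("unsupported block compressor: %s" % compressor)
    storage_engine = {
        "wiredTiger": {"configString": "block_compressor=" + compressor}
    }
    # pymongo forwards extra keyword arguments as options of the create command.
    return db.create_collection(name, storageEngine=storage_engine)


# Example usage (requires pymongo and a reachable server):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://192.168.201.40:27017")["ods"]
#   create_compressed_collection(db, "email")
```

The option is set when the collection is created, so data already stored in an existing collection would need to be copied into a collection created this way.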

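In testing, Presto + MongoDB handled hundreds of millions of records comfortably and was even faster than Hive; for a much larger database you would probably still need Hive.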
