centos7安装使用presto

最新推荐文章于 2022-07-18 14:22:04 发布

星辰丶晟妍

最新推荐文章于 2022-07-18 14:22:04 发布

阅读量476

点赞数

分类专栏： Linux presto 文章标签： linux presto

本文链接：https://blog.csdn.net/qq_34648165/article/details/113365476

版权

Linux 同时被 2 个专栏收录

25 篇文章 0 订阅

订阅专栏

presto

1 篇文章 0 订阅

订阅专栏

参考链接

https://blog.csdn.net/wangpei1949/article/details/79952539

参考链接是18年的文章，可以用用
基于ambari平台安装
恢复ambari平台服务

做一下节点规划

master --ambari server --Presto coordinator
node01 --ambari node01 --Presto worker与Client
node02 --ambari node02 --Presto worker与Client

[all]安装presto的结点需要禁用requiretty,编辑/etc/sudoers文件，注释掉Defaults requiretty

[master]下载ambari-presto-service

wget https://github.com/prestodb/ambari-presto-service/releases/download/v1.2/ambari-presto-1.2.tar.gz
mkdir /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO
tar xvf ambari-presto-1.2.tar.gz --strip-components 1 -C /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO
chmod -R +x /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO/*

[master]下载presto离线安装包

wget http://search.maven.org/remotecontent?filepath=com/facebook/presto/presto-server-rpm/0.198/presto-server-rpm-0.198.rpm
wget http://search.maven.org/remotecontent?filepath=com/facebook/presto/presto-cli/0.198/presto-cli-0.198-executable.jar

[master]配置仓库(需要到rpm和jar的目录下执行)

yum install createrepo -y
createrepo ${PWD}
python -m SimpleHTTPServer 8081

[master]增加repo文件

vim /etc/yum.repos.d/presto.repo
[presto_repo]
name=presto_repo
baseurl=http://172.28.128.11:8082
enable=1
gpgcheck=0

[master]修改presto版本和安装包下载路径

vim /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO/metainfo.xml
        <version>0.198</version>

vim /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO/package/scripts/download.ini       
        [download]
        presto_rpm_url = http://172.28.128.11:8082/presto-server-rpm-0.198.rpm
        presto_cli_url = http://172.28.128.11:8082/presto-cli-0.198-executable.jar

[master]Ambari安装Presto

ambari-server restart
按Ambari正常安装服务即可完成安装。discovery.uri:http://172.28.128.11:8285

安装报错如下

Error occured during stack advisor command invocation: Cannot create /var/run/ambari-server/stack-recommendations

执行如下名解决
```
chmod 777 /var/run/ambari-server/
```
剩下的安装步骤参考链接里的即可
安装完成
如何使用呢？

Ambari=>Presto=>Configs=>Connectors添加如下Connectors

{
  'hive':[
    'connector.name=hive-fake-table',
    'hive.metastore.uri=thrift://172.28.128.12:9083'
  ]
}

造假数据

# 建hive表
create database fake_db;
use fake_db;
CREATE TABLE fake_table(
   uuid string,
   name string,  
   phone_number string,
   address string,
   email string,
   cardno string,
   company string
)
comment 'the table which store test data.'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar" = "\"",
    "escapeChar" = "\\"
)
STORED AS TEXTFILE;
load data local inpath '/opt/tmp-dir/2020-10-15/output_step_1.csv' into table fake_db.fake_table;

【master】看看presto执行命令

# 打开客户端
/usr/lib/presto/bin/presto-cli --server 172.28.128.11:8285 --catalog hive --schema default

注意，presto的worker和coordinator不能安装在同一个节点上
配置可视化界面
```
https://www.jianshu.com/p/28fa8f867380
```

界面重启presto报错如下

Error connecting to presto server at: localhost:8285

解决办法

cd /var/lib/ambari-agent/cache/stacks/HDP/2.6/services/PRESTO/package/scripts
vim presto_coordinator.py 
# 注释掉如下代码
        smoketest_presto(
            PrestoClient('localhost','root',
                         config_properties['http-server.http.port']),
            all_hosts)

和mysql联查一下

对应的properties怎么配置

mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://172.28.128.11:3306
connection-user=root
connection-password=123456

即：

{
  'hive':[
    'connector.name=hive-hadoop2',
    'hive.metastore.uri=thrift://172.28.128.12:9083'
  ],
  'mysql':[
    'connector.name=mysql',
    'connection-url=jdbc:mysql://172.28.128.11:3306',
    'connection-user=test_db',
    'connection-password=123456'
  ]
}

创建测试库表

create database test_db character set utf8 ;  
CREATE USER 'test_db'@'%'IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON *.* TO 'test_db'@'%';
FLUSH PRIVILEGES;
create table test_table(
 _id VARCHAR(100) NOT NULL,
 field_1 longtext NOT NULL,
 field_2 longtext NOT NULL,
 field_3 longtext NOT NULL,
 PRIMARY KEY ( _id )
);

导入测试数据

INSERT INTO test_db.test_table (_id, field_1, field_2, field_3) VALUES (uuid(), "451323199505170811", "陈勇_mysql", "18896586067_mysql");

客户端查询比较一下效率

测试数据量  100W条
执行sql：
select count(1) from fake_db.fake_table;
hive mapreduce查询时间： 19.455 seconds
presto查询时间：         2 seconds
执行sql：
select count(distinct company) from fake_db.fake_table where cardno like '%19950517%';
hive mapreduce查询时间： 19.571 seconds
presto查询时间：         3 seconds

测试数据量： 1000W条
执行sql：
select count(1) from fake_db.fake_table;
hive mapreduce查询时间： 35.986 seconds
presto查询时间：         19 seconds
执行sql：
select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%';
hive mapreduce查询时间： 45.522 seconds
presto查询时间：         23 seconds

代码连接查询数据测试

测试结果

测试数据量： 1000W条
执行sql：select * from fake_db.fake_table where cardno like '%32319950517%';
python presto查询时间：      15.718753576278687
python sparksql查询时间:     32.25914740562439
执行sql：select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%';
python presto查询时间：      15.594711780548096
python sparksql查询时间:     30.081300973892212

造一个大表，里面存储数据大于内存，看看处理能力

测试数据量： 9000W条(占用14G空间，内存空余13G)
客户端查询：
    执行sql：
    	select count(1) from fake_db.fake_table;
    hive mapreduce查询时间： 
    	253.912 seconds
    presto查询时间：         
    	180 seconds
    执行sql：
    	select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%';
    hive mapreduce查询时间： 
    	207.097 seconds
    presto查询时间：         
    	200 seconds
代码查询：
    执行sql：
    	select count(1) from fake_db.fake_table;
    hive mapreduce查询时间： 
    	199.2008821964264
    presto查询时间：         
    	176.18870639801025
    执行sql：
    	select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%';
    hive mapreduce查询时间： 
    	216.40228271484375
    presto查询时间：         
    	196.51006293296814

两表联查测试比较

测试数据量： fake_db.fake_table：9000W条 fake_db.fake_table_little: 100W条
客户端查询：
    执行sql：
    	select distinct A.name hive_name,B.field_2 mysql_name from fake_db.fake_table as A,fake_db.fake_table_little as B where A.cardno=B.field_1 and A.cardno like "%32319950517%";
    hive mapreduce查询时间： 
    	432.075 seconds
    presto查询时间：         
    	197 seconds
代码查询：
    执行sql：
    	select distinct A.name hive_name,B.field_2 mysql_name from fake_db.fake_table as A,fake_db.fake_table_little as B where A.cardno=B.field_1 and A.cardno like "%32319950517%";
    hive mapreduce查询时间： 
    	203.07808113098145
    presto查询时间：         
    	197.08233761787415

用大表和mysql联查试一下

hive表数据量：  9000W
mysql表数据量： 100W
异构数据源两表查询语句：
select distinct A.name hive_name,B.field_2 mysql_name from hive.fake_db.fake_table as A,mysql.test_db.test_table as B where A.cardno=B.field_1 and A.cardno like '%32319950517%';
客户端： 195秒
代码:    195.43368768692017

星辰丶晟妍

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
centos7安装使用presto

参考链接https://blog.csdn.net/wangpei1949/article/details/79952539参考链接是18年的文章，可以用用基于ambari平台安装恢复ambari平台服务做一下节点规划master --ambari server --Presto coordinatornode01 --ambari node01 --Presto worker与Clientnode02 --ambari node02 --Presto worker与C..
复制链接

扫一扫