-
参考链接
https://blog.csdn.net/wangpei1949/article/details/79952539
-
参考链接是18年的文章,可以用用
-
基于ambari平台安装
-
恢复ambari平台服务
-
做一下节点规划
master --ambari server --Presto coordinator node01 --ambari node01 --Presto worker与Client node02 --ambari node02 --Presto worker与Client
-
[all]安装presto的结点需要禁用requiretty,编辑/etc/sudoers文件,注释掉Defaults requiretty
-
[master]下载ambari-presto-service
wget https://github.com/prestodb/ambari-presto-service/releases/download/v1.2/ambari-presto-1.2.tar.gz mkdir /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO tar xvf ambari-presto-1.2.tar.gz --strip-components 1 -C /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO chmod -R +x /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO/*
-
[master]下载presto离线安装包
wget http://search.maven.org/remotecontent?filepath=com/facebook/presto/presto-server-rpm/0.198/presto-server-rpm-0.198.rpm wget http://search.maven.org/remotecontent?filepath=com/facebook/presto/presto-cli/0.198/presto-cli-0.198-executable.jar
-
[master]配置仓库(需要到rpm和jar的目录下执行)
yum install createrepo -y createrepo ${PWD} python -m SimpleHTTPServer 8081
-
[master]增加repo文件
vim /etc/yum.repos.d/presto.repo [presto_repo] name=presto_repo baseurl=http://172.28.128.11:8082 enable=1 gpgcheck=0
-
[master]修改presto版本和安装包下载路径
vim /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO/metainfo.xml <version>0.198</version> vim /var/lib/ambari-server/resources/stacks/HDP/2.6/services/PRESTO/package/scripts/download.ini [download] presto_rpm_url = http://172.28.128.11:8082/presto-server-rpm-0.198.rpm presto_cli_url = http://172.28.128.11:8082/presto-cli-0.198-executable.jar
-
[master]Ambari安装Presto
ambari-server restart 按Ambari正常安装服务即可完成安装。discovery.uri:http://172.28.128.11:8285
-
安装报错如下
Error occured during stack advisor command invocation: Cannot create /var/run/ambari-server/stack-recommendations
-
执行如下名解决
chmod 777 /var/run/ambari-server/
-
剩下的安装步骤参考链接里的即可
-
安装完成
-
如何使用呢?
-
Ambari=>Presto=>Configs=>Connectors添加如下Connectors
{ 'hive':[ 'connector.name=hive-fake-table', 'hive.metastore.uri=thrift://172.28.128.12:9083' ] }
-
造假数据
# 建hive表 create database fake_db; use fake_db; CREATE TABLE fake_table( uuid string, name string, phone_number string, address string, email string, cardno string, company string ) comment 'the table which store test data.' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"", "escapeChar" = "\\" ) STORED AS TEXTFILE; load data local inpath '/opt/tmp-dir/2020-10-15/output_step_1.csv' into table fake_db.fake_table;
-
【master】看看presto执行命令
# 打开客户端 /usr/lib/presto/bin/presto-cli --server 172.28.128.11:8285 --catalog hive --schema default
-
注意,presto的worker和coordinator不能安装在同一个节点上
-
配置可视化界面
https://www.jianshu.com/p/28fa8f867380
-
界面重启presto报错如下
Error connecting to presto server at: localhost:8285
-
解决办法
cd /var/lib/ambari-agent/cache/stacks/HDP/2.6/services/PRESTO/package/scripts vim presto_coordinator.py # 注释掉如下代码 smoketest_presto( PrestoClient('localhost','root', config_properties['http-server.http.port']), all_hosts)
-
和mysql联查一下
-
对应的properties怎么配置
mysql.properties connector.name=mysql connection-url=jdbc:mysql://172.28.128.11:3306 connection-user=root connection-password=123456
-
即:
{ 'hive':[ 'connector.name=hive-hadoop2', 'hive.metastore.uri=thrift://172.28.128.12:9083' ], 'mysql':[ 'connector.name=mysql', 'connection-url=jdbc:mysql://172.28.128.11:3306', 'connection-user=test_db', 'connection-password=123456' ] }
-
创建测试库表
create database test_db character set utf8 ; CREATE USER 'test_db'@'%'IDENTIFIED BY '123456'; GRANT ALL PRIVILEGES ON *.* TO 'test_db'@'%'; FLUSH PRIVILEGES; create table test_table( _id VARCHAR(100) NOT NULL, field_1 longtext NOT NULL, field_2 longtext NOT NULL, field_3 longtext NOT NULL, PRIMARY KEY ( _id ) );
-
导入测试数据
INSERT INTO test_db.test_table (_id, field_1, field_2, field_3) VALUES (uuid(), "451323199505170811", "陈勇_mysql", "18896586067_mysql");
-
-
客户端查询比较一下效率
测试数据量 100W条 执行sql: select count(1) from fake_db.fake_table; hive mapreduce查询时间: 19.455 seconds presto查询时间: 2 seconds 执行sql: select count(distinct company) from fake_db.fake_table where cardno like '%19950517%'; hive mapreduce查询时间: 19.571 seconds presto查询时间: 3 seconds 测试数据量: 1000W条 执行sql: select count(1) from fake_db.fake_table; hive mapreduce查询时间: 35.986 seconds presto查询时间: 19 seconds 执行sql: select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%'; hive mapreduce查询时间: 45.522 seconds presto查询时间: 23 seconds
-
代码连接查询数据测试
-
测试结果
测试数据量: 1000W条 执行sql:select * from fake_db.fake_table where cardno like '%32319950517%'; python presto查询时间: 15.718753576278687 python sparksql查询时间: 32.25914740562439 执行sql:select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%'; python presto查询时间: 15.594711780548096 python sparksql查询时间: 30.081300973892212
-
造一个大表,里面存储数据大于内存,看看处理能力
测试数据量: 9000W条(占用14G空间,内存空余13G) 客户端查询: 执行sql: select count(1) from fake_db.fake_table; hive mapreduce查询时间: 253.912 seconds presto查询时间: 180 seconds 执行sql: select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%'; hive mapreduce查询时间: 207.097 seconds presto查询时间: 200 seconds 代码查询: 执行sql: select count(1) from fake_db.fake_table; hive mapreduce查询时间: 199.2008821964264 presto查询时间: 176.18870639801025 执行sql: select count(distinct company) from fake_db.fake_table where cardno like '%32319950517%'; hive mapreduce查询时间: 216.40228271484375 presto查询时间: 196.51006293296814
-
两表联查测试比较
测试数据量: fake_db.fake_table:9000W条 fake_db.fake_table_little: 100W条 客户端查询: 执行sql: select distinct A.name hive_name,B.field_2 mysql_name from fake_db.fake_table as A,fake_db.fake_table_little as B where A.cardno=B.field_1 and A.cardno like "%32319950517%"; hive mapreduce查询时间: 432.075 seconds presto查询时间: 197 seconds 代码查询: 执行sql: select distinct A.name hive_name,B.field_2 mysql_name from fake_db.fake_table as A,fake_db.fake_table_little as B where A.cardno=B.field_1 and A.cardno like "%32319950517%"; hive mapreduce查询时间: 203.07808113098145 presto查询时间: 197.08233761787415
-
用大表和mysql联查试一下
hive表数据量: 9000W mysql表数据量: 100W 异构数据源两表查询语句: select distinct A.name hive_name,B.field_2 mysql_name from hive.fake_db.fake_table as A,mysql.test_db.test_table as B where A.cardno=B.field_1 and A.cardno like '%32319950517%'; 客户端: 195秒 代码: 195.43368768692017
centos7安装使用presto
最新推荐文章于 2022-07-18 14:22:04 发布