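Preface:
Based on our experience so far, Greenplum (hereafter GP) delivers notably fast, near-real-time query response: from the storage layer up to the compute layer, its data throughput is considerably better than that of the SQL tools in the Hadoop-style ecosystem.
That performance gain comes with correspondingly higher hardware requirements.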
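The current GP cluster hardware is as follows:
5 machines, each with 22 threads, 64 GB of memory, a 2 TB disk, and a gigabit NIC (in total: 110 threads, 320 GB of memory, disk I/O 150 MB/s, network I/O 150 MB/s)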
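
As a quick sanity check on the totals above, the per-machine figures can be multiplied out. This is only a minimal sketch: the `Cluster` class and its field names are hypothetical, and the Spark figures (quoted in the next paragraph) are assumed to be per machine, as the GP figures are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical helper: per-machine specs multiplied out to cluster totals."""
    name: str
    nodes: int
    threads_per_node: int
    mem_gb_per_node: int

    @property
    def total_threads(self) -> int:
        return self.nodes * self.threads_per_node

    @property
    def total_mem_gb(self) -> int:
        return self.nodes * self.mem_gb_per_node


# GP figures are from the text; Spark figures are assumed to be per machine.
gp = Cluster("GP", nodes=5, threads_per_node=22, mem_gb_per_node=64)
spark = Cluster("Spark", nodes=10, threads_per_node=22, mem_gb_per_node=128)

for c in (gp, spark):
    print(f"{c.name}: {c.total_threads} threads, {c.total_mem_gb} GB memory")
```

Under that assumption, the GP cluster has half the threads and a quarter of the memory of the Spark cluster.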
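Compared with the current Spark cluster (10 machines with 22 threads, 128 GB of memory, 30 TB of disk, and gigabit NICs), SQL query performance improves by 50%-300%. The following table compares Mercury production jobs run on GP and on Spark: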
-------------------------------------------------------------------------------------------
Sql1: select count(*) from mercury.url_keyword where (keyword rlike '汽车' or keyword rlike '宝马') ;
-------------------------------------------------------------------------------------------
Sql2: select count(1) from mercury.mds_mercury_gid_dsp_c where dt='work' and Cbehe=1 and Cbiddingx=1;
--------------------------------------------------------------------------------------------
Sql3: select count(1) from mercury.url_tag_raw where dt='work' and tiyu=1 and keji=1;
-------------------------------------------------------------------------------------------
Sql4: select D.view_cnt, count(*) as gid_cnt
from (
    select if(C.cnt<30, C.cnt, 20) as view_cnt
    from (
        select B.gid, count(*) as cnt
        from
            (select url, keyword from url_keyword where (keyword rlike '汽车' or keyword rlike '宝马')) A
            join mercury.sds_mercury_gid_cid_url B
            on A.url = B.url
        group by B.gid
    ) C
) D
group by D.view_cnt;
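
To make clear what Sql4 computes, the following is a rough in-memory sketch of the same logic in Python: filter URLs whose keyword matches, join to the gid/url table, count matches per gid, bucket the counts, then count gids per bucket. The sample rows and the function name `view_count_histogram` are hypothetical and serve only as illustration; they are not data from the actual tables.

```python
import re
from collections import Counter

# Hypothetical sample rows, not real data.
url_keyword = [("u1", "宝马新款"), ("u2", "汽车保养"), ("u3", "体育新闻")]
gid_cid_url = [("g1", "u1"), ("g1", "u2"), ("g2", "u1"), ("g3", "u3")]  # (gid, url)

# rlike is a partial regex match, so re.search is the closest equivalent.
PATTERN = re.compile("汽车|宝马")


def view_count_histogram(keyword_rows, gid_url_rows, cap=30, overflow=20):
    # Subquery A: matching rows per url (a url may match more than once).
    matches_per_url = Counter(url for url, kw in keyword_rows if PATTERN.search(kw))

    # Subquery C: inner join on url, then count joined rows per gid.
    cnt_per_gid = Counter()
    for gid, url in gid_url_rows:
        joined = matches_per_url.get(url, 0)
        if joined:  # an inner join drops rows without a match
            cnt_per_gid[gid] += joined

    # Subquery D and outer query: if(cnt<30, cnt, 20), then count gids per bucket.
    return dict(Counter(c if c < cap else overflow for c in cnt_per_gid.values()))


print(view_count_histogram(url_keyword, gid_cid_url))
```

In this sample, g1 joins two matching URLs and g2 joins one, while g3 is dropped by the inner join, so each of the buckets 1 and 2 holds a single gid. Note that, as written in the query, any gid with 30 or more matches falls into the bucket 20.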
Sql | Spark time | GP time |
---|