YDB on Spark 性能测试

YDB on Spark 性能测试

关于ydb on spark的介绍,请阅读该文:http://ycloud.net.cn/newsitem/277227270

本文测试的目的是用来比对 使用原生Spark与 YDB on Spark上的性能差异。

如果您感兴趣想要亲自测试,请访问http://ycloud.net.cn获取延云YDB,自行测试。

统计项

小范围扫描:命中18万
       ydb_day='20151202' and ydb_province='辽宁'

中等范围扫描:命中1800万
       (ydb_province='辽宁')

全表扫描:总数据量2亿

分类统计项原生Spark查询耗时Ydb on Spark 查询耗时比较相差倍数原生Spark查询耗时Ydb on Spark 查询耗时比较相差倍数原生Spark查询耗时Ydb on Spark 查询耗时比较相差倍数
count(*)count(*)1320.43930014411441560.062600
Top    n排序低维值列ydb_age排序2763922885576001154
高纬值列phonenum排序2853952764695942523
max,min统计高纬值列phonenum的max,min统计1500.6892171442721861611
group    by低维值列ydb_sex的单列group    by1641.838891441.83872159722
低维值列ydb_province的单列group by1461.22119150275144818
高维值列usernick的单列group by1441.0341391621.0345.41442822
高维值列phonenum的单列group by1502.043731447223847221.8
低维值列ydb_sex,ydb_province的多列group by与排序1501.31151381.346180266.9
高维值与低维值列usernick,ydb_zhiye的多列group by与排序177288168722.33367202.14
count(distinct)低维值列ydb_sex的count(distinct)1680.4283921561156132433
低维值列ydb_province的count(distinct15011501381138144272
高维值列usernick的count(distinct)150350162295.51562761.76
高维值列phonenum的count(distinct)15627816821.755169961.93

一、数据以及硬件条件

1.数据条数:2亿

数据生成程序源码放在了ydb发布包的data_example目录下,文件名称为:MrMakeShuData.java

 

 

 

 

 

 

 

 

 

 

 

2.硬件条件

机器为阿里云的虚拟机,非物理机器,购买的机器配置如下图

磁盘如下

 

 

 

 

3.sparkSql启动

              ./spark-sql --master spark://10.44.20.175:7077 --executor-memory 512m 2>spark.log

 

二、数据表结构

YDB数据表结构:

 create table ydb_example_shu(

        phonenum long,        usernick string,        ydb_sex string,        ydb_province string,        ydb_grade string,        ydb_age string,        ydb_blood string,        ydb_zhiye string,        ydb_earn string,        ydb_prefer string,        ydb_consume string,        ydb_day string,        content textcjk

)

 

spark的表结构

CREATE external  table ydb_example_shu_spark( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string, content string) row format delimited fields terminated by ',' location '/data/ydb/shu';

 

 

 

 

 

 

 

 

 

 

三、原生spark与spark on ydb占用存储空间对比【22vs10 单位g】

类型

耗时

备注

原生spark

22g

Spark on ydb

 

 

 

 

 

四、原生spark与spark on ydb性能对比-小范围扫描(ydb_day='20151202' and ydb_province='辽宁',命中18万

1.count(*) 结果 【132 vs 0.439快300倍】

类型

耗时

备注

原生spark

132秒

select count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10

Spark on ydb

439毫秒

select sum(cnt) from spark_on_ydb_small__count limit 10

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__count (

cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select count(*) from  ydb_example_shu where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

2.Top n排序

低维值列ydb_age排序【276 vs 3快92倍】

类型

时间

备注

原生spark

276秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' order by ydb_age desc limit 10;

Spark on ydb

3秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_small__orderby_ydb_age  order by ydb_age  desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__orderby_ydb_age (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by ydb_age  desc limit 0,10"

);

 

 

 

 

高纬值列phonenum排序【285 vs 3 快95倍】

类型

时间

备注

原生spark

285秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' order by phonenum desc limit 10;

Spark on ydb

3秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_small__orderby_phonenum  order by phonenum desc limit 10;

 

ydb与spark的映射配置

 

CREATE external  TABLE spark_on_ydb_small__orderby_phonenum (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by phonenum desc limit 0,10"

);

 

 

 

3.高纬值列phonenum的max,min统计【150 vs 0.689 快217倍】

按照ydb_age排序(维值低)

类型

时间

备注

原生spark

150秒

select max(phonenum),min(phonenum) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

0.689秒

select max(phonenum1),min(phonenum2) from spark_on_ydb_small__stat limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__stat (

phonenum1 bigint, phonenum2 bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"

);

 

 

 

 

4.分类汇总统计group by

低维值列ydb_sex的单列group by【164 vs 1.838 快89倍】

类型

时间

备注

原生spark

164秒

select ydb_sex,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by ydb_sex limit 10;

Spark on ydb

1.838秒

select ydb_sex,sum(cnt) from spark_on_ydb_small__groupby_ydb_age  group by ydb_sex limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_age (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and  ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

低维值列ydb_province的单列group by【146 vs 1.22 快119倍】

类型

时间

备注

原生spark

146秒

select ydb_province,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by ydb_province limit 10;

Spark on ydb

1.22秒

select ydb_province,sum(cnt) from spark_on_ydb_small__groupby_ydb_ppp  group by ydb_province limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_ppp (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

高维值列usernick的单列group by【144 vs 1.034 快139倍】

类型

时间

备注

原生spark

144秒

select usernick,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by usernick limit 10;

Spark on ydb

1.034秒

select usernick,sum(cnt) from spark_on_ydb_small__groupby_usernick  group by usernick limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_usernick (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

高维值列phonenum的单列group by【150 vs 2.043快73倍】

类型

时间

备注

原生spark

150秒

select phonenum,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by phonenum limit 10;

Spark on ydb

2.043秒

select phonenum,sum(cnt) from spark_on_ydb_small__groupby_phonenum  group by phonenum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_phonenum (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

低维值列ydb_sex,ydb_province的多列group by与排序【150 vs 1.3 快115倍】

类型

时间

备注

原生spark

150秒

select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by ydb_sex, ydb_province order by cnt desc limit 10;

Spark on ydb

1.3秒

select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_small__groupby_ydb_age_sort_m  group by ydb_sex, ydb_province order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_age_sort_m (

ydb_sex string, ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex,ydb_province  limit 0,10"

);

 

高维值与低维值列usernick,ydb_zhiye的多列group by与排序【177 vs 2 快88倍】

类型

时间

备注

原生spark

177秒

select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by usernick, ydb_zhiye order by cnt desc limit 10;

Spark on ydb

2秒

select usernick, ydb_zhiye,sum(cnt) as cntsuma from spark_on_ydb_small__groupby_zhiye_sort_m  group by usernick, ydb_zhiye order by cntsuma desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_zhiye_sort_m (

usernick string, ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick,ydb_zhiye  limit 0,10"

);

 

 

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【168 vs 0.428 快392倍】

类型

时间

备注

原生spark

168秒

select count(distinct ydb_sex),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

0.428秒

select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_small__groupby_ydb_age_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_age_dd (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

低维值列ydb_province的count(distinct)【150 vs 1 快150倍】

类型

时间

备注

原生spark

150秒

select count(distinct ydb_province),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

1秒

select count(distinct ydb_province),sum(cnt) from spark_on_ydb_small__groupby_ydb_ppp_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_ppp_dd (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

高维值列usernick的count(distinct)【150 vs 3 快50倍】

类型

时间

备注

原生spark

150秒

select count(distinct usernick),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

3s

select count(distinct usernick),sum(cnt) from spark_on_ydb_small__groupby_usernick_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_usernick_dd (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

 

高维值列phonenum的count(distinct)【156 vs 2 快78倍】

类型

时间

备注

原生spark

156秒

select count(distinct phonenum),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

2秒

select count(distinct phonenum),sum(cnt) from spark_on_ydb_small__groupby_phonenum_dd  limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_phonenum_dd (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

五、原生spark与spark on ydb性能对比-中等范围扫描(ydb_province='辽宁'命中1800万

1.count(*) 结果 【144 vs 1快144倍】

类型

耗时

备注

原生spark

144秒

select count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10

Spark on ydb

1秒

select sum(cnt) from spark_on_ydb_middle__count limit 10

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__count (

cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select count(*) from  ydb_example_shu where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

 

 

 

 

 

2.Top n排序

低维值列ydb_age排序【288 vs 5快57倍】

类型

时间

备注

原生spark

288秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_province='辽宁' order by ydb_age desc limit 10;

Spark on ydb

5秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_middle__orderby_ydb_age  order by ydb_age  desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__orderby_ydb_age (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by ydb_age  desc limit 0,10"

);

 

 

 

 

高纬值列phonenum排序【276 vs 4 快69倍】

类型

时间

备注

原生spark

276秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_province='辽宁' order by phonenum desc limit 10;

Spark on ydb

4秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_middle__orderby_phonenum  order by phonenum desc limit 10;

 

ydb与spark的映射配置

 

CREATE external  TABLE spark_on_ydb_middle__orderby_phonenum (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by phonenum desc limit 0,10"

);

 

 

 

3.高纬值列phonenum的max,min统计【144 vs 2 快72倍】

类型

时间

备注

原生spark

144秒

select max(phonenum),min(phonenum) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

2秒

select max(phonenum1),min(phonenum2) from spark_on_ydb_middle__stat limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__stat (

phonenum1 bigint, phonenum2 bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"

);

 

 

 

 

4.分类汇总统计group by

低维值列ydb_sex的单列group by【144 vs 1.838 快72倍】

类型

时间

备注

原生spark

144秒

select ydb_sex,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by ydb_sex limit 10;

Spark on ydb

2秒

select ydb_sex,sum(cnt) from spark_on_ydb_middle__groupby_ydb_age  group by ydb_sex limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_age (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and  ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

 

 

 

 

 

低维值列ydb_province的单列group by【150 vs 2 快75倍】

类型

时间

备注

原生spark

150秒

select ydb_province,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by ydb_province limit 10;

Spark on ydb

2秒

select ydb_province,sum(cnt) from spark_on_ydb_middle__groupby_ydb_ppp  group by ydb_province limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_ppp (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列usernick的单列group by【162 vs 1.034 快5.4倍】

类型

时间

备注

原生spark

162秒

select usernick,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by usernick limit 10;

Spark on ydb

30秒

select usernick,sum(cnt) from spark_on_ydb_middle__groupby_usernick  group by usernick limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_usernick (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列phonenum的单列group by【144 vs 72快2倍】

类型

时间

备注

原生spark

144秒

select phonenum,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by phonenum limit 10;

Spark on ydb

72秒

select phonenum,sum(cnt) from spark_on_ydb_middle__groupby_phonenum  group by phonenum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_phonenum (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

低维值列ydb_sex,ydb_province的多列group by与排序【138 vs 1.3 快46倍】

类型

时间

备注

原生spark

138秒

select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark where ydb_province='辽宁' group by ydb_sex, ydb_province order by cnt desc limit 10;

Spark on ydb

3秒

select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_middle__groupby_ydb_age_sort_m  group by ydb_sex, ydb_province order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_age_sort_m (

ydb_sex string, ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex,ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

高维值与低维值列usernick,ydb_zhiye的多列group by与排序【168 vs 72 快2.3倍】

类型

时间

备注

原生spark

168秒

select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark where ydb_province='辽宁' group by usernick, ydb_zhiye order by cnt desc limit 10;

Spark on ydb

72秒

select usernick, ydb_zhiye,sum(cnt) as cntsuma from spark_on_ydb_middle__groupby_zhiye_sort_m  group by usernick, ydb_zhiye order by cntsuma desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_zhiye_sort_m (

usernick string, ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick,ydb_zhiye  limit 0,10"

);

 

 

 

 

 

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【156 vs 1 快156倍】

类型

时间

备注

原生spark

156秒

select count(distinct ydb_sex),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

1秒

select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_middle__groupby_ydb_age_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_age_dd (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

 

 

 

 

 

低维值列ydb_zhiye的count(distinct)【138 vs 1 快138倍】

类型

时间

备注

原生spark

138秒

select count(distinct ydb_zhiye),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

1秒

select count(distinct ydb_zhiye),sum(cnt) from spark_on_ydb_middle__groupby_ydb_ppp_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_ppp_dd (

ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_zhiye  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列usernick的count(distinct)【162 vs 29 快5.5倍】

类型

时间

备注

原生spark

162秒

select count(distinct usernick),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

29秒

select count(distinct usernick),sum(cnt) from spark_on_ydb_middle__groupby_usernick_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_usernick_dd (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列phonenum的count(distinct)【168 vs 2 快1.75倍】

类型

时间

备注

原生spark

168秒

select count(distinct phonenum),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

96秒

select count(distinct phonenum),sum(cnt) from spark_on_ydb_middle__groupby_phonenum_dd  limit 10;

 

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_phonenum_dd (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

 

六、原生spark与spark on ydb性能对比-全表扫描

1.count(*) 结果 【156 vs 0.06快2600倍】

类型

耗时

备注

原生spark

156秒

select count(*) from ydb_example_shu_spark limit 10

Spark on ydb

60毫秒

select sum(cnt) from spark_on_ydb_count limit 10

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_count (

cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select count(*) from  ydb_example_shu where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

 

 

 

 

2.Top n排序

低维值列ydb_age排序【600 vs 11 快54倍】

类型

时间

备注

原生spark

600秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark order by ydb_age desc limit 10;

Spark on ydb

11秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_orderby_ydb_age  order by ydb_age  desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_orderby_ydb_age (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by ydb_age  desc limit 0,10"

);

 

 

 

 

高纬值列phonenum排序【594 vs 25 快23倍】

类型

时间

备注

原生spark

594秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark order by phonenum desc limit 10;

Spark on ydb

25秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_orderby_phonenum  order by phonenum desc limit 10;

 

ydb与spark的映射配置

 

CREATE external  TABLE spark_on_ydb_orderby_phonenum (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by phonenum desc limit 0,10"

);

 

 

 

3.高纬值列phonenum的max,min统计【186 vs 16 快11倍】

按照ydb_age排序(维值低)

类型

时间

备注

原生spark

186秒

select max(phonenum),min(phonenum) from ydb_example_shu_spark limit 10;

Spark on ydb

16秒

select max(phonenum1),min(phonenum2) from spark_on_ydb_stat limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_stat (

phonenum1 bigint, phonenum2 bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"

);

 

 

 

 

 

 

 

4.分类汇总统计group by

低维值列ydb_sex的单列group by【159 vs 7 快22倍】

类型

时间

备注

原生spark

159秒

select ydb_sex,count(*) from ydb_example_shu_spark group by ydb_sex limit 10;

Spark on ydb

7秒

select ydb_sex,sum(cnt) from spark_on_ydb_groupby_ydb_age  group by ydb_sex limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_age (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

低维值列ydb_province的单列group by【144 vs 8 快18倍】

类型

时间

备注

原生spark

144秒

select ydb_province,count(*) from ydb_example_shu_spark group by ydb_province limit 10;

Spark on ydb

8秒

select ydb_province,sum(cnt) from spark_on_ydb_groupby_ydb_ppp  group by ydb_province limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_ppp (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

高维值列usernick的单列group by【144 vs 282 慢2倍】

类型

时间

备注

原生spark

144秒

select usernick,count(*) from ydb_example_shu_spark group by usernick limit 10;

Spark on ydb

282s

select usernick,sum(cnt) from spark_on_ydb_groupby_usernick  group by usernick limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_usernick (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

高维值列phonenum的单列group by【384 vs 722慢1.8倍】

类型

时间

备注

原生spark

384秒

select phonenum,count(*) from ydb_example_shu_spark group by phonenum limit 10;

Spark on ydb

722秒

select phonenum,sum(cnt) from spark_on_ydb_groupby_phonenum  group by phonenum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_phonenum (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

 

低维值列ydb_sex,ydb_province的多列group by与排序【180 vs 26 快6.9倍】

类型

时间

备注

原生spark

180秒

select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark group by ydb_sex, ydb_province order by cnt desc limit 10;

Spark on ydb

26秒

select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_groupby_ydb_age_sort_m  group by ydb_sex, ydb_province order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_age_sort_m (

ydb_sex string, ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex,ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

高维值与低维值列usernick,ydb_zhiye的多列group by与排序【336 vs 720慢2.14倍】

类型

时间

备注

原生spark

336秒

select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark group by usernick, ydb_zhiye order by cnt desc limit 10;

Spark on ydb

720秒

select usernick, ydb_zhiye,sum(cnt) as cntsum from spark_on_ydb_groupby_zhiye_sort_m  group by usernick, ydb_zhiye order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_zhiye_sort_m (

usernick string, ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick,ydb_zhiye  limit 0,10"

);

 

 

 

 

 

 

 

 

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【132 vs 4 快33倍】

类型

时间

备注

原生spark

132秒

select count(distinct ydb_sex),count(*) from ydb_example_shu_spark limit 10;

Spark on ydb

4秒

select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_groupby_ydb_age_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_age_dd (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

 

 

 

低维值列ydb_province的count(distinct)【144 vs 2 快72倍】

类型

时间

备注

原生spark

144秒

select count(distinct ydb_province),count(*) from ydb_example_shu_spark limit 10;

Spark on ydb

2秒

select count(distinct ydb_province),sum(cnt) from spark_on_ydb_groupby_ydb_ppp_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_ppp_dd (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

 

 

高维值列usernick的count(distinct)【156 vs 276 慢1.76倍】

类型

时间

备注

原生spark

156秒

select count(distinct usernick),count(*) from ydb_example_shu_spark limit 10;

Spark on ydb

276s

select count(distinct usernick),sum(cnt) from spark_on_ydb_groupby_usernick_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_usernick_dd (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

高维值列phonenum的count(distinct)【516 vs 996 慢1.93倍】

类型

时间

备注

原生spark

516秒

select count(distinct phonenum),count(*) from ydb_example_shu_spark limit 10;

 

Spark on ydb

996秒

select count(distinct phonenum),sum(cnt) from spark_on_ydb_groupby_phonenum_dd  limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_phonenum_dd (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值