YDB on Spark 性能测试

YDB on Spark 性能测试

关于ydb on spark的介绍,请阅读该文:http://ycloud.net.cn/newsitem/277227270

本文测试的目的是用来比对 使用原生Spark与 YDB on Spark上的性能差异。

如果您感兴趣想要亲自测试,请访问http://ycloud.net.cn获取延云YDB,自行测试。

统计项

小范围扫描:命中18万
       ydb_day='20151202' and ydb_province='辽宁'

中等范围扫描:命中1800万
       (ydb_province='辽宁')

全表扫描:总数据量2亿

分类统计项原生Spark查询耗时Ydb on Spark 查询耗时比较相差倍数原生Spark查询耗时Ydb on Spark 查询耗时比较相差倍数原生Spark查询耗时Ydb on Spark 查询耗时比较相差倍数
count(*)count(*)1320.43930014411441560.062600
Top    n排序低维值列ydb_age排序2763922885576001154
高纬值列phonenum排序2853952764695942523
max,min统计高纬值列phonenum的max,min统计1500.6892171442721861611
group    by低维值列ydb_sex的单列group    by1641.838891441.83872159722
低维值列ydb_province的单列group by1461.22119150275144818
高维值列usernick的单列group by1441.0341391621.0345.41442822
高维值列phonenum的单列group by1502.043731447223847221.8
低维值列ydb_sex,ydb_province的多列group by与排序1501.31151381.346180266.9
高维值与低维值列usernick,ydb_zhiye的多列group by与排序177288168722.33367202.14
count(distinct)低维值列ydb_sex的count(distinct)1680.4283921561156132433
低维值列ydb_province的count(distinct15011501381138144272
高维值列usernick的count(distinct)150350162295.51562761.76
高维值列phonenum的count(distinct)15627816821.755169961.93

一、数据以及硬件条件

1.数据条数:2亿

数据生成程序源码放在了ydb发布包的data_example目录下,文件名称为:MrMakeShuData.java

 

 

 

 

 

 

 

 

 

 

 

2.硬件条件

机器为阿里云的虚拟机,非物理机器,购买的机器配置如下图

磁盘如下

 

 

 

 

3.sparkSql启动

              ./spark-sql --master spark://10.44.20.175:7077 --executor-memory 512m 2>spark.log

 

二、数据表结构

YDB数据表结构:

 create table ydb_example_shu(

        phonenum long,        usernick string,        ydb_sex string,        ydb_province string,        ydb_grade string,        ydb_age string,        ydb_blood string,        ydb_zhiye string,        ydb_earn string,        ydb_prefer string,        ydb_consume string,        ydb_day string,        content textcjk

)

 

spark的表结构

CREATE external  table ydb_example_shu_spark( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string, content string) row format delimited fields terminated by ',' location '/data/ydb/shu';

 

 

 

 

 

 

 

 

 

 

三、原生spark与spark on ydb占用存储空间对比【22vs10 单位g】

类型

耗时

备注

原生spark

22g

Spark on ydb

 

 

 

 

 

四、原生spark与spark on ydb性能对比-小范围扫描(ydb_day='20151202' and ydb_province='辽宁',命中18万

1.count(*) 结果 【132 vs 0.439快300倍】

类型

耗时

备注

原生spark

132秒

select count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10

Spark on ydb

439毫秒

select sum(cnt) from spark_on_ydb_small__count limit 10

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__count (

cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select count(*) from  ydb_example_shu where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

2.Top n排序

低维值列ydb_age排序【276 vs 3快92倍】

类型

时间

备注

原生spark

276秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' order by ydb_age desc limit 10;

Spark on ydb

3秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_small__orderby_ydb_age  order by ydb_age  desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__orderby_ydb_age (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by ydb_age  desc limit 0,10"

);

 

 

 

 

高纬值列phonenum排序【285 vs 3 快95倍】

类型

时间

备注

原生spark

285秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' order by phonenum desc limit 10;

Spark on ydb

3秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_small__orderby_phonenum  order by phonenum desc limit 10;

 

ydb与spark的映射配置

 

CREATE external  TABLE spark_on_ydb_small__orderby_phonenum (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by phonenum desc limit 0,10"

);

 

 

 

3.高纬值列phonenum的max,min统计【150 vs 0.689 快217倍】

按照ydb_age排序(维值低)

类型

时间

备注

原生spark

150秒

select max(phonenum),min(phonenum) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

0.689秒

select max(phonenum1),min(phonenum2) from spark_on_ydb_small__stat limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__stat (

phonenum1 bigint, phonenum2 bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"

);

 

 

 

 

4.分类汇总统计group by

低维值列ydb_sex的单列group by【164 vs 1.838 快89倍】

类型

时间

备注

原生spark

164秒

select ydb_sex,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by ydb_sex limit 10;

Spark on ydb

1.838秒

select ydb_sex,sum(cnt) from spark_on_ydb_small__groupby_ydb_age  group by ydb_sex limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_age (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and  ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

低维值列ydb_province的单列group by【146 vs 1.22 快119倍】

类型

时间

备注

原生spark

146秒

select ydb_province,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by ydb_province limit 10;

Spark on ydb

1.22秒

select ydb_province,sum(cnt) from spark_on_ydb_small__groupby_ydb_ppp  group by ydb_province limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_ppp (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

高维值列usernick的单列group by【144 vs 1.034 快139倍】

类型

时间

备注

原生spark

144秒

select usernick,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by usernick limit 10;

Spark on ydb

1.034秒

select usernick,sum(cnt) from spark_on_ydb_small__groupby_usernick  group by usernick limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_usernick (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

高维值列phonenum的单列group by【150 vs 2.043快73倍】

类型

时间

备注

原生spark

150秒

select phonenum,count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by phonenum limit 10;

Spark on ydb

2.043秒

select phonenum,sum(cnt) from spark_on_ydb_small__groupby_phonenum  group by phonenum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_phonenum (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

低维值列ydb_sex,ydb_province的多列group by与排序【150 vs 1.3 快115倍】

类型

时间

备注

原生spark

150秒

select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by ydb_sex, ydb_province order by cnt desc limit 10;

Spark on ydb

1.3秒

select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_small__groupby_ydb_age_sort_m  group by ydb_sex, ydb_province order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_age_sort_m (

ydb_sex string, ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex,ydb_province  limit 0,10"

);

 

高维值与低维值列usernick,ydb_zhiye的多列group by与排序【177 vs 2 快88倍】

类型

时间

备注

原生spark

177秒

select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' group by usernick, ydb_zhiye order by cnt desc limit 10;

Spark on ydb

2秒

select usernick, ydb_zhiye,sum(cnt) as cntsuma from spark_on_ydb_small__groupby_zhiye_sort_m  group by usernick, ydb_zhiye order by cntsuma desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_zhiye_sort_m (

usernick string, ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick,ydb_zhiye  limit 0,10"

);

 

 

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【168 vs 0.428 快392倍】

类型

时间

备注

原生spark

168秒

select count(distinct ydb_sex),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

0.428秒

select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_small__groupby_ydb_age_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_age_dd (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

低维值列ydb_province的count(distinct)【150 vs 1 快150倍】

类型

时间

备注

原生spark

150秒

select count(distinct ydb_province),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

1秒

select count(distinct ydb_province),sum(cnt) from spark_on_ydb_small__groupby_ydb_ppp_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_ydb_ppp_dd (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

高维值列usernick的count(distinct)【150 vs 3 快50倍】

类型

时间

备注

原生spark

150秒

select count(distinct usernick),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

3s

select count(distinct usernick),sum(cnt) from spark_on_ydb_small__groupby_usernick_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_usernick_dd (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_day='20151202' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

 

高维值列phonenum的count(distinct)【156 vs 2 快78倍】

类型

时间

备注

原生spark

156秒

select count(distinct phonenum),count(*) from ydb_example_shu_spark where ydb_day='20151202' and ydb_province='辽宁' limit 10;

Spark on ydb

2秒

select count(distinct phonenum),sum(cnt) from spark_on_ydb_small__groupby_phonenum_dd  limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_small__groupby_phonenum_dd (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_day='20151202' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

五、原生spark与spark on ydb性能对比-中等范围扫描(ydb_province='辽宁'命中1800万

1.count(*) 结果 【144 vs 1快144倍】

类型

耗时

备注

原生spark

144秒

select count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10

Spark on ydb

1秒

select sum(cnt) from spark_on_ydb_middle__count limit 10

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__count (

cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select count(*) from  ydb_example_shu where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

 

 

 

 

 

2.Top n排序

低维值列ydb_age排序【288 vs 5快57倍】

类型

时间

备注

原生spark

288秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_province='辽宁' order by ydb_age desc limit 10;

Spark on ydb

5秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_middle__orderby_ydb_age  order by ydb_age  desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__orderby_ydb_age (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by ydb_age  desc limit 0,10"

);

 

 

 

 

高纬值列phonenum排序【276 vs 4 快69倍】

类型

时间

备注

原生spark

276秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where ydb_province='辽宁' order by phonenum desc limit 10;

Spark on ydb

4秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_middle__orderby_phonenum  order by phonenum desc limit 10;

 

ydb与spark的映射配置

 

CREATE external  TABLE spark_on_ydb_middle__orderby_phonenum (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by phonenum desc limit 0,10"

);

 

 

 

3.高纬值列phonenum的max,min统计【144 vs 2 快72倍】

类型

时间

备注

原生spark

144秒

select max(phonenum),min(phonenum) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

2秒

select max(phonenum1),min(phonenum2) from spark_on_ydb_middle__stat limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__stat (

phonenum1 bigint, phonenum2 bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"

);

 

 

 

 

4.分类汇总统计group by

低维值列ydb_sex的单列group by【144 vs 1.838 快72倍】

类型

时间

备注

原生spark

144秒

select ydb_sex,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by ydb_sex limit 10;

Spark on ydb

2秒

select ydb_sex,sum(cnt) from spark_on_ydb_middle__groupby_ydb_age  group by ydb_sex limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_age (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and  ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

 

 

 

 

 

低维值列ydb_province的单列group by【150 vs 2 快75倍】

类型

时间

备注

原生spark

150秒

select ydb_province,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by ydb_province limit 10;

Spark on ydb

2秒

select ydb_province,sum(cnt) from spark_on_ydb_middle__groupby_ydb_ppp  group by ydb_province limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_ppp (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列usernick的单列group by【162 vs 1.034 快5.4倍】

类型

时间

备注

原生spark

162秒

select usernick,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by usernick limit 10;

Spark on ydb

30秒

select usernick,sum(cnt) from spark_on_ydb_middle__groupby_usernick  group by usernick limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_usernick (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列phonenum的单列group by【144 vs 72快2倍】

类型

时间

备注

原生spark

144秒

select phonenum,count(*) from ydb_example_shu_spark where ydb_province='辽宁' group by phonenum limit 10;

Spark on ydb

72秒

select phonenum,sum(cnt) from spark_on_ydb_middle__groupby_phonenum  group by phonenum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_phonenum (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

低维值列ydb_sex,ydb_province的多列group by与排序【138 vs 1.3 快46倍】

类型

时间

备注

原生spark

138秒

select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark where ydb_province='辽宁' group by ydb_sex, ydb_province order by cnt desc limit 10;

Spark on ydb

3秒

select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_middle__groupby_ydb_age_sort_m  group by ydb_sex, ydb_province order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_age_sort_m (

ydb_sex string, ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex,ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

高维值与低维值列usernick,ydb_zhiye的多列group by与排序【168 vs 72 快2.3倍】

类型

时间

备注

原生spark

168秒

select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark where ydb_province='辽宁' group by usernick, ydb_zhiye order by cnt desc limit 10;

Spark on ydb

72秒

select usernick, ydb_zhiye,sum(cnt) as cntsuma from spark_on_ydb_middle__groupby_zhiye_sort_m  group by usernick, ydb_zhiye order by cntsuma desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_zhiye_sort_m (

usernick string, ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick,ydb_zhiye  limit 0,10"

);

 

 

 

 

 

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【156 vs 1 快156倍】

类型

时间

备注

原生spark

156秒

select count(distinct ydb_sex),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

1秒

select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_middle__groupby_ydb_age_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_age_dd (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

 

 

 

 

 

低维值列ydb_zhiye的count(distinct)【138 vs 1 快138倍】

类型

时间

备注

原生spark

138秒

select count(distinct ydb_zhiye),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

1秒

select count(distinct ydb_zhiye),sum(cnt) from spark_on_ydb_middle__groupby_ydb_ppp_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_ydb_ppp_dd (

ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_zhiye  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列usernick的count(distinct)【162 vs 29 快5.5倍】

类型

时间

备注

原生spark

162秒

select count(distinct usernick),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

29秒

select count(distinct usernick),sum(cnt) from spark_on_ydb_middle__groupby_usernick_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_usernick_dd (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231' and ydb_province='辽宁'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

高维值列phonenum的count(distinct)【168 vs 2 快1.75倍】

类型

时间

备注

原生spark

168秒

select count(distinct phonenum),count(*) from ydb_example_shu_spark where ydb_province='辽宁' limit 10;

Spark on ydb

96秒

select count(distinct phonenum),sum(cnt) from spark_on_ydb_middle__groupby_phonenum_dd  limit 10;

 

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_middle__groupby_phonenum_dd (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

 

六、原生spark与spark on ydb性能对比-全表扫描

1.count(*) 结果 【156 vs 0.06快2600倍】

类型

耗时

备注

原生spark

156秒

select count(*) from ydb_example_shu_spark limit 10

Spark on ydb

60毫秒

select sum(cnt) from spark_on_ydb_count limit 10

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_count (

cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select count(*) from  ydb_example_shu where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

 

 

 

 

2.Top n排序

低维值列ydb_age排序【600 vs 11 快54倍】

类型

时间

备注

原生spark

600秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark order by ydb_age desc limit 10;

Spark on ydb

11秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_orderby_ydb_age  order by ydb_age  desc limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_orderby_ydb_age (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by ydb_age  desc limit 0,10"

);

 

 

 

 

高纬值列phonenum排序【594 vs 25 快23倍】

类型

时间

备注

原生spark

594秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark order by phonenum desc limit 10;

Spark on ydb

25秒

select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_orderby_phonenum  order by phonenum desc limit 10;

 

ydb与spark的映射配置

 

CREATE external  TABLE spark_on_ydb_orderby_phonenum (

phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  order by phonenum desc limit 0,10"

);

 

 

 

3.高纬值列phonenum的max,min统计【186 vs 16 快11倍】

按照ydb_age排序(维值低)

类型

时间

备注

原生spark

186秒

select max(phonenum),min(phonenum) from ydb_example_shu_spark limit 10;

Spark on ydb

16秒

select max(phonenum1),min(phonenum2) from spark_on_ydb_stat limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_stat (

phonenum1 bigint, phonenum2 bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"

);

 

 

 

 

 

 

 

4.分类汇总统计group by

低维值列ydb_sex的单列group by【159 vs 7 快22倍】

类型

时间

备注

原生spark

159秒

select ydb_sex,count(*) from ydb_example_shu_spark group by ydb_sex limit 10;

Spark on ydb

7秒

select ydb_sex,sum(cnt) from spark_on_ydb_groupby_ydb_age  group by ydb_sex limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_age (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

低维值列ydb_province的单列group by【144 vs 8 快18倍】

类型

时间

备注

原生spark

144秒

select ydb_province,count(*) from ydb_example_shu_spark group by ydb_province limit 10;

Spark on ydb

8秒

select ydb_province,sum(cnt) from spark_on_ydb_groupby_ydb_ppp  group by ydb_province limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_ppp (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

高维值列usernick的单列group by【144 vs 282 慢2倍】

类型

时间

备注

原生spark

144秒

select usernick,count(*) from ydb_example_shu_spark group by usernick limit 10;

Spark on ydb

282s

select usernick,sum(cnt) from spark_on_ydb_groupby_usernick  group by usernick limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_usernick (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

高维值列phonenum的单列group by【384 vs 722慢1.8倍】

类型

时间

备注

原生spark

384秒

select phonenum,count(*) from ydb_example_shu_spark group by phonenum limit 10;

Spark on ydb

722秒

select phonenum,sum(cnt) from spark_on_ydb_groupby_phonenum  group by phonenum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_phonenum (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

 

 

低维值列ydb_sex,ydb_province的多列group by与排序【180 vs 26 快6.9倍】

类型

时间

备注

原生spark

180秒

select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark group by ydb_sex, ydb_province order by cnt desc limit 10;

Spark on ydb

26秒

select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_groupby_ydb_age_sort_m  group by ydb_sex, ydb_province order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_age_sort_m (

ydb_sex string, ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex,ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

 

 

 

高维值与低维值列usernick,ydb_zhiye的多列group by与排序【336 vs 720慢2.14倍】

类型

时间

备注

原生spark

336秒

select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark group by usernick, ydb_zhiye order by cnt desc limit 10;

Spark on ydb

720秒

select usernick, ydb_zhiye,sum(cnt) as cntsum from spark_on_ydb_groupby_zhiye_sort_m  group by usernick, ydb_zhiye order by cntsum limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_zhiye_sort_m (

usernick string, ydb_zhiye string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick,ydb_zhiye  limit 0,10"

);

 

 

 

 

 

 

 

 

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【132 vs 4 快33倍】

类型

时间

备注

原生spark

132秒

select count(distinct ydb_sex),count(*) from ydb_example_shu_spark limit 10;

Spark on ydb

4秒

select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_groupby_ydb_age_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_age_dd (

ydb_sex string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_sex,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_sex  limit 0,10"

);

 

 

 

 

 

 

低维值列ydb_province的count(distinct)【144 vs 2 快72倍】

类型

时间

备注

原生spark

144秒

select count(distinct ydb_province),count(*) from ydb_example_shu_spark limit 10;

Spark on ydb

2秒

select count(distinct ydb_province),sum(cnt) from spark_on_ydb_groupby_ydb_ppp_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_ydb_ppp_dd (

ydb_province string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select ydb_province,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by ydb_province  limit 0,10"

);

 

 

 

 

 

 

 

 

 

高维值列usernick的count(distinct)【156 vs 276 慢1.76倍】

类型

时间

备注

原生spark

156秒

select count(distinct usernick),count(*) from ydb_example_shu_spark limit 10;

Spark on ydb

276s

select count(distinct usernick),sum(cnt) from spark_on_ydb_groupby_usernick_dd limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_usernick_dd (

usernick string, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select usernick,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by usernick limit 0,10"

);

 

 

 

 

 

 

 

 

 

高维值列phonenum的count(distinct)【516 vs 996 慢1.93倍】

类型

时间

备注

原生spark

516秒

select count(distinct phonenum),count(*) from ydb_example_shu_spark limit 10;

 

Spark on ydb

996秒

select count(distinct phonenum),sum(cnt) from spark_on_ydb_groupby_phonenum_dd  limit 10;

ydb与spark的映射配置

CREATE external  TABLE spark_on_ydb_groupby_phonenum_dd (

phonenum bigint, cnt bigint

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,count(*)  from ydb_example_shu  where ydbpartion='20151231'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000'  group by phonenum  limit 0,10"

);

 

 

 

 


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
先解释一下题目中的数据定义: - X DW10,20.30,40,50 表示定义了一个名为 X 的数据段,其中包含了 5 个双字(即 10 个字节)的数据,分别是 10、20.30(实数)、40、50,按照从左到右的顺序存放。 - Y DB4 表示定义了一个名为 Y 的数据段,其中包含了 4 个字节的数据,每个字节都是一个无符号整数。 - Z DB? 表示定义了一个名为 Z 的数据段,其中包含了 1 个字节的数据,初始值未知。 - W DB? 表示定义了一个名为 W 的数据段,其中包含了 1 个字节的数据,初始值未知。 接下来是表达式计算的过程: 1. 将 X 中的所有数据相加得到总和 S,即 S = 10 + 20.30 + 40 + 50 = 120.30。 2. 将 Y 中的所有数据相加得到总和 T,即 T = Y[0] + Y[1] + Y[2] + Y[3]。 3. 计算商 Q 和余数 R,其中 Q = INT(S / T) = INT(120.30 / T),R = S - Q * T。 4. 将 Q 和 R 分别存入 Z 和 W。 根据以上步骤,可以编写如下代码: ``` ; 数据段 姓名DW DW 0 达瓦DW DW 0 格桑普布DW DW 0 曲吉热巴DW DW 0 扎西次仁DB DB 0, 0, 0, 0 朗杰DB DB ?, 0 ; 代码段 assume cs:代码段, ds:数据段 start: ; 将 X 中的所有数据相加得到总和 S mov eax, [姓名DW] add eax, [达瓦DW] fld [格桑普布DW] fadd dword ptr [曲吉热巴DW] fadd SBYTE ptr [扎西次仁DB] fstp qword ptr [esp] ; 将浮点数结果保存到栈顶 ; 将 Y 中的所有数据相加得到总和 T movzx eax, BYTE ptr [朗杰DB] add al, BYTE ptr [朗杰DB+1] add al, BYTE ptr [朗杰DB+2] add al, BYTE ptr [朗杰DB+3] ; 计算商 Q 和余数 R mov ecx, eax ; ecx = T fild qword ptr [esp] ; 将 S 转为浮点数 fidiv dword ptr [朗杰DB] ; ST(0) = S / T fistp dword ptr [朗杰DB] ; Q = ST(0),保存到 Z 中 mov ebx, eax ; ebx = Q imul ecx ; edx:eax = T * Q mov eax, [姓名DW] add eax, [达瓦DW] fld [格桑普布DW] fadd dword ptr [曲吉热巴DW] fadd SBYTE ptr [扎西次仁DB] fsub st(0), st(1) ; ST(0) = S - T * Q fistp dword ptr [esp] ; R = ST(0),保存到 W 中 mov BYTE ptr [朗杰DB+1], al ; 将 R 存入 W 中 ; 程序结束 mov ah, 4ch int 21h end start ``` 需要注意几点: - 浮点数运算需要使用 FPU 指令,而且 FPU 寄存器的操作是栈式的。 - 在进行浮点数运算之前,需要先将整数转为浮点数,可以使用 FILD 指令。 - 在进行整数除法之前,需要先将浮点数转为整数,可以使用 FIST 指令。由于 FIST 指令会将结果存入内存中,因此需要指定存储地址。 - 由于浮点数运算和整数运算使用的寄存器不同,因此需要注意保存和恢复现场。在上面的代码中,将浮点数结果保存到栈顶,然后再从栈顶取出进行整数运算。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值