YDB on Spark 性能测试
关于ydb on spark的介绍,请阅读该文:http://ycloud.net.cn/newsitem/277227270
本文测试的目的是用来比对 使用原生Spark与 YDB on Spark上的性能差异。
如果您感兴趣想要亲自测试,请访问http://ycloud.net.cn获取延云YDB,自行测试。
统计项 | 小范围扫描:命中18万 | 中等范围扫描:命中1800万 | 全表扫描:总数据量2亿 | ||||||||||
分类 | 统计项 | 原生Spark查询耗时 | Ydb on Spark 查询耗时 | 比较 | 相差倍数 | 原生Spark查询耗时 | Ydb on Spark 查询耗时 | 比较 | 相差倍数 | 原生Spark查询耗时 | Ydb on Spark 查询耗时 | 比较 | 相差倍数 |
count(*) | count(*) | 132 | 0.439 | 快 | 300 | 144 | 1 | 快 | 144 | 156 | 0.06 | 快 | 2600 |
Top n排序 | 低维值列ydb_age排序 | 276 | 3 | 快 | 92 | 288 | 5 | 快 | 57 | 600 | 11 | 快 | 54 |
高纬值列phonenum排序 | 285 | 3 | 快 | 95 | 276 | 4 | 快 | 69 | 594 | 25 | 快 | 23 | |
max,min统计 | 高纬值列phonenum的max,min统计 | 150 | 0.689 | 快 | 217 | 144 | 2 | 快 | 72 | 186 | 16 | 快 | 11 |
group by | 低维值列ydb_sex的单列group by | 164 | 1.838 | 快 | 89 | 144 | 1.838 | 快 | 72 | 159 | 7 | 快 | 22 |
低维值列ydb_province的单列group by | 146 | 1.22 | 快 | 119 | 150 | 2 | 快 | 75 | 144 | 8 | 快 | 18 | |
高维值列usernick的单列group by | 144 | 1.034 | 快 | 139 | 162 | 1.034 | 快 | 5.4 | 144 | 282 | 慢 | 2 | |
高维值列phonenum的单列group by | 150 | 2.043 | 快 | 73 | 144 | 72 | 快 | 2 | 384 | 722 | 慢 | 1.8 | |
低维值列ydb_sex,ydb_province的多列group by与排序 | 150 | 1.3 | 快 | 115 | 138 | 1.3 | 快 | 46 | 180 | 26 | 快 | 6.9 | |
高维值与低维值列usernick,ydb_zhiye的多列group by与排序 | 177 | 2 | 快 | 88 | 168 | 72 | 快 | 2.3 | 336 | 720 | 慢 | 2.14 | |
count(distinct) | 低维值列ydb_sex的count(distinct) | 168 | 0.428 | 快 | 392 | 156 | 1 | 快 | 156 | 132 | 4 | 快 | 33 |
低维值列ydb_province的count(distinct | 150 | 1 | 快 | 150 | 138 | 1 | 快 | 138 | 144 | 2 | 快 | 72 | |
高维值列usernick的count(distinct) | 150 | 3 | 快 | 50 | 162 | 29 | 快 | 5.5 | 156 | 276 | 慢 | 1.76 | |
高维值列phonenum的count(distinct) | 156 | 2 | 快 | 78 | 168 | 2 | 快 | 1.75 | 516 | 996 | 慢 | 1.93 |
一、数据以及硬件条件
1.数据条数:2亿
数据生成程序源码放在了ydb发布包的data_example目录下,文件名称为:MrMakeShuData.java
2.硬件条件
机器为阿里云的虚拟机,非物理机器,购买的机器配置如下图
磁盘如下
3.sparkSql启动
./spark-sql --master spark://10.44.20.175:7077 --executor-memory 512m 2>spark.log
二、数据表结构
YDB数据表结构:
create table ydb_example_shu(
phonenum long, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string, content textcjk
)
spark的表结构
CREATE external table ydb_example_shu_spark( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string, content string) row format delimited fields terminated by ',' location '/data/ydb/shu';
三、原生spark与spark on ydb占用存储空间对比【22vs10 单位g】
类型 | 耗时 | 备注 |
原生spark | 22g |
|
Spark on ydb |
|
|
四、原生spark与spark on ydb性能对比-小范围扫描(ydb_day
='
20151202'
and
ydb_province='
辽宁
'
,命中18万
)
1.count(*) 结果 【132 vs 0.439快300倍】
类型 | 耗时 | 备注 |
原生spark | 132秒 | select count(*) from ydb_example_shu_spark where
|
Spark on ydb | 439毫秒 | select sum(cnt) from spark_on_ydb_small__count limit 10 |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__count ( cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select count(*) from ydb_example_shu where ydbpartion='20151231' and );
|
2.Top n排序
低维值列ydb_age排序【276 vs 3快92倍】
类型 | 时间 | 备注 |
原生spark | 276秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where
|
Spark on ydb | 3秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_small__orderby_ydb_age order by ydb_age desc limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__orderby_ydb_age ( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu where ydbpartion='20151231' and ); |
高纬值列phonenum排序【285 vs 3 快95倍】
类型 | 时间 | 备注 |
原生spark | 285秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where
|
Spark on ydb | 3秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_small__orderby_phonenum order by phonenum desc limit 10;
|
ydb与spark的映射配置 |
CREATE external TABLE spark_on_ydb_small__orderby_phonenum ( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu where ydbpartion='20151231' and ); |
3.高纬值列phonenum的max,min统计【150 vs 0.689 快217倍】
按照ydb_age排序(维值低)
类型 | 时间 | 备注 |
原生spark | 150秒 | select max(phonenum),min(phonenum) from ydb_example_shu_spark where
|
Spark on ydb | 0.689秒 | select max(phonenum1),min(phonenum2) from spark_on_ydb_small__stat limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__stat ( phonenum1 bigint, phonenum2 bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu where ydbpartion='20151231' and ); |
4.分类汇总统计group by
低维值列ydb_sex的单列group by【164 vs 1.838 快89倍】
类型 | 时间 | 备注 |
原生spark | 164秒 | select ydb_sex,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1.838秒 | select ydb_sex,sum(cnt) from spark_on_ydb_small__groupby_ydb_age group by ydb_sex limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_ydb_age ( ydb_sex string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
低维值列ydb_province的单列group by【146 vs 1.22 快119倍】
类型 | 时间 | 备注 |
原生spark | 146秒 | select ydb_province,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1.22秒 | select ydb_province,sum(cnt) from spark_on_ydb_small__groupby_ydb_ppp group by ydb_province limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_ydb_ppp ( ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列usernick的单列group by【144 vs 1.034 快139倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select usernick,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1.034秒 | select usernick,sum(cnt) from spark_on_ydb_small__groupby_usernick group by usernick limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_usernick ( usernick string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列phonenum的单列group by【150 vs 2.043快73倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | select phonenum,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 2.043秒 | select phonenum,sum(cnt) from spark_on_ydb_small__groupby_phonenum group by phonenum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_phonenum ( phonenum bigint, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
低维值列ydb_sex,ydb_province的多列group by与排序【150 vs 1.3 快115倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark where
|
Spark on ydb | 1.3秒 | select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_small__groupby_ydb_age_sort_m group by ydb_sex, ydb_province order by cntsum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_ydb_age_sort_m ( ydb_sex string, ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值与低维值列usernick,ydb_zhiye的多列group by与排序【177 vs 2 快88倍】
类型 | 时间 | 备注 |
原生spark | 177秒 | select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark where
|
Spark on ydb | 2秒 | select usernick, ydb_zhiye,sum(cnt) as cntsuma from spark_on_ydb_small__groupby_zhiye_sort_m group by usernick, ydb_zhiye order by cntsuma desc limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_zhiye_sort_m ( usernick string, ydb_zhiye string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
5.count(distinct)排重统计
低维值列ydb_sex的count(distinct)【168 vs 0.428 快392倍】
类型 | 时间 | 备注 |
原生spark | 168秒 | select count(distinct ydb_sex),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 0.428秒 | select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_small__groupby_ydb_age_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_ydb_age_dd ( ydb_sex string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
低维值列ydb_province的count(distinct)【150 vs 1 快150倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | select count(distinct ydb_province),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1秒 | select count(distinct ydb_province),sum(cnt) from spark_on_ydb_small__groupby_ydb_ppp_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_ydb_ppp_dd ( ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列usernick的count(distinct)【150 vs 3 快50倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | select count(distinct usernick),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 3s | select count(distinct usernick),sum(cnt) from spark_on_ydb_small__groupby_usernick_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_usernick_dd ( usernick string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列phonenum的count(distinct)【156 vs 2 快78倍】
类型 | 时间 | 备注 |
原生spark | 156秒 | select count(distinct phonenum),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 2秒 | select count(distinct phonenum),sum(cnt) from spark_on_ydb_small__groupby_phonenum_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_small__groupby_phonenum_dd ( phonenum bigint, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
五、原生spark与spark on ydb性能对比-中等范围扫描(ydb_province='辽宁'
命中1800万
)
1.count(*) 结果 【144 vs 1快144倍】
类型 | 耗时 | 备注 |
原生spark | 144秒 | select count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1秒 | select sum(cnt) from spark_on_ydb_middle__count limit 10 |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__count ( cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select count(*) from ydb_example_shu where ydbpartion='20151231' and );
|
2.Top n排序
低维值列ydb_age排序【288 vs 5快57倍】
类型 | 时间 | 备注 |
原生spark | 288秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where
|
Spark on ydb | 5秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_middle__orderby_ydb_age order by ydb_age desc limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__orderby_ydb_age ( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu where ydbpartion='20151231' and ); |
高纬值列phonenum排序【276 vs 4 快69倍】
类型 | 时间 | 备注 |
原生spark | 276秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark where
|
Spark on ydb | 4秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_middle__orderby_phonenum order by phonenum desc limit 10;
|
ydb与spark的映射配置 |
CREATE external TABLE spark_on_ydb_middle__orderby_phonenum ( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu where ydbpartion='20151231' and ); |
3.高纬值列phonenum的max,min统计【144 vs 2 快72倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select max(phonenum),min(phonenum) from ydb_example_shu_spark where
|
Spark on ydb | 2秒 | select max(phonenum1),min(phonenum2) from spark_on_ydb_middle__stat limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__stat ( phonenum1 bigint, phonenum2 bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu where ydbpartion='20151231' and ); |
4.分类汇总统计group by
低维值列ydb_sex的单列group by【144 vs 1.838 快72倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select ydb_sex,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 2秒 | select ydb_sex,sum(cnt) from spark_on_ydb_middle__groupby_ydb_age group by ydb_sex limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_ydb_age ( ydb_sex string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
低维值列ydb_province的单列group by【150 vs 2 快75倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | select ydb_province,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 2秒 | select ydb_province,sum(cnt) from spark_on_ydb_middle__groupby_ydb_ppp group by ydb_province limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_ydb_ppp ( ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列usernick的单列group by【162 vs 1.034 快5.4倍】
类型 | 时间 | 备注 |
原生spark | 162秒 | select usernick,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 30秒 | select usernick,sum(cnt) from spark_on_ydb_middle__groupby_usernick group by usernick limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_usernick ( usernick string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列phonenum的单列group by【144 vs 72快2倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select phonenum,count(*) from ydb_example_shu_spark where
|
Spark on ydb | 72秒 | select phonenum,sum(cnt) from spark_on_ydb_middle__groupby_phonenum group by phonenum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_phonenum ( phonenum bigint, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
低维值列ydb_sex,ydb_province的多列group by与排序【138 vs 1.3 快46倍】
类型 | 时间 | 备注 |
原生spark | 138秒 | select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark where
|
Spark on ydb | 3秒 | select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_middle__groupby_ydb_age_sort_m group by ydb_sex, ydb_province order by cntsum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_ydb_age_sort_m ( ydb_sex string, ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值与低维值列usernick,ydb_zhiye的多列group by与排序【168 vs 72 快2.3倍】
类型 | 时间 | 备注 |
原生spark | 168秒 | select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark where
|
Spark on ydb | 72秒 | select usernick, ydb_zhiye,sum(cnt) as cntsuma from spark_on_ydb_middle__groupby_zhiye_sort_m group by usernick, ydb_zhiye order by cntsuma desc limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_zhiye_sort_m ( usernick string, ydb_zhiye string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
5.count(distinct)排重统计
低维值列ydb_sex的count(distinct)【156 vs 1 快156倍】
类型 | 时间 | 备注 |
原生spark | 156秒 | select count(distinct ydb_sex),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1秒 | select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_middle__groupby_ydb_age_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_ydb_age_dd ( ydb_sex string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
低维值列ydb_zhiye的count(distinct)【138 vs 1 快138倍】
类型 | 时间 | 备注 |
原生spark | 138秒 | select count(distinct ydb_zhiye),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 1秒 | select count(distinct ydb_zhiye),sum(cnt) from spark_on_ydb_middle__groupby_ydb_ppp_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_ydb_ppp_dd ( ydb_zhiye string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_zhiye,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列usernick的count(distinct)【162 vs 29 快5.5倍】
类型 | 时间 | 备注 |
原生spark | 162秒 | select count(distinct usernick),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 29秒 | select count(distinct usernick),sum(cnt) from spark_on_ydb_middle__groupby_usernick_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_usernick_dd ( usernick string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
高维值列phonenum的count(distinct)【168 vs 2 快1.75倍】
类型 | 时间 | 备注 |
原生spark | 168秒 | select count(distinct phonenum),count(*) from ydb_example_shu_spark where
|
Spark on ydb | 96秒 | select count(distinct phonenum),sum(cnt) from spark_on_ydb_middle__groupby_phonenum_dd limit 10;
|
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_middle__groupby_phonenum_dd ( phonenum bigint, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,count(*) from ydb_example_shu where ydbpartion='20151231' and ); |
六、原生spark与spark on ydb性能对比-全表扫描
1.count(*) 结果 【156 vs 0.06快2600倍】
类型 | 耗时 | 备注 |
原生spark | 156秒 | select count(*) from ydb_example_shu_spark limit 10
|
Spark on ydb | 60毫秒 | select sum(cnt) from spark_on_ydb_count limit 10 |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_count ( cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10" );
|
2.Top n排序
低维值列ydb_age排序【600 vs 11 快54倍】
类型 | 时间 | 备注 |
原生spark | 600秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark order by ydb_age desc limit 10;
|
Spark on ydb | 11秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_orderby_ydb_age order by ydb_age desc limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_orderby_ydb_age ( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' order by ydb_age desc limit 0,10" ); |
高纬值列phonenum排序【594 vs 25 快23倍】
类型 | 时间 | 备注 |
原生spark | 594秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu_spark order by phonenum desc limit 10;
|
Spark on ydb | 25秒 | select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from spark_on_ydb_orderby_phonenum order by phonenum desc limit 10;
|
ydb与spark的映射配置 |
CREATE external TABLE spark_on_ydb_orderby_phonenum ( phonenum bigint, usernick string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string, ydb_blood string, ydb_zhiye string, ydb_earn string, ydb_prefer string, ydb_consume string, ydb_day string ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' order by phonenum desc limit 0,10" ); |
3.高纬值列phonenum的max,min统计【186 vs 16 快11倍】
按照ydb_age排序(维值低)
类型 | 时间 | 备注 |
原生spark | 186秒 | select max(phonenum),min(phonenum) from ydb_example_shu_spark limit 10;
|
Spark on ydb | 16秒 | select max(phonenum1),min(phonenum2) from spark_on_ydb_stat limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_stat ( phonenum1 bigint, phonenum2 bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select max(phonenum),min(phonenum) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' limit 0,10" ); |
4.分类汇总统计group by
低维值列ydb_sex的单列group by【159 vs 7 快22倍】
类型 | 时间 | 备注 |
原生spark | 159秒 | select ydb_sex,count(*) from ydb_example_shu_spark group by ydb_sex limit 10;
|
Spark on ydb | 7秒 | select ydb_sex,sum(cnt) from spark_on_ydb_groupby_ydb_age group by ydb_sex limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_ydb_age ( ydb_sex string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by ydb_sex limit 0,10" ); |
低维值列ydb_province的单列group by【144 vs 8 快18倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select ydb_province,count(*) from ydb_example_shu_spark group by ydb_province limit 10;
|
Spark on ydb | 8秒 | select ydb_province,sum(cnt) from spark_on_ydb_groupby_ydb_ppp group by ydb_province limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_ydb_ppp ( ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by ydb_province limit 0,10" ); |
高维值列usernick的单列group by【144 vs 282 慢2倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select usernick,count(*) from ydb_example_shu_spark group by usernick limit 10;
|
Spark on ydb | 282s | select usernick,sum(cnt) from spark_on_ydb_groupby_usernick group by usernick limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_usernick ( usernick string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by usernick limit 0,10" ); |
高维值列phonenum的单列group by【384 vs 722慢1.8倍】
类型 | 时间 | 备注 |
原生spark | 384秒 | select phonenum,count(*) from ydb_example_shu_spark group by phonenum limit 10;
|
Spark on ydb | 722秒 | select phonenum,sum(cnt) from spark_on_ydb_groupby_phonenum group by phonenum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_phonenum ( phonenum bigint, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by phonenum limit 0,10" ); |
低维值列ydb_sex,ydb_province的多列group by与排序【180 vs 26 快6.9倍】
类型 | 时间 | 备注 |
原生spark | 180秒 | select ydb_sex, ydb_province,count(*) as cnt from ydb_example_shu_spark group by ydb_sex, ydb_province order by cnt desc limit 10;
|
Spark on ydb | 26秒 | select ydb_sex, ydb_province,sum(cnt) as cntsum from spark_on_ydb_groupby_ydb_age_sort_m group by ydb_sex, ydb_province order by cntsum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_ydb_age_sort_m ( ydb_sex string, ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by ydb_sex,ydb_province limit 0,10" ); |
高维值与低维值列usernick,ydb_zhiye的多列group by与排序【336 vs 720慢2.14倍】
类型 | 时间 | 备注 |
原生spark | 336秒 | select usernick, ydb_zhiye,count(*) as cnt from ydb_example_shu_spark group by usernick, ydb_zhiye order by cnt desc limit 10;
|
Spark on ydb | 720秒 | select usernick, ydb_zhiye,sum(cnt) as cntsum from spark_on_ydb_groupby_zhiye_sort_m group by usernick, ydb_zhiye order by cntsum limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_zhiye_sort_m ( usernick string, ydb_zhiye string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick, ydb_zhiye,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by usernick,ydb_zhiye limit 0,10" ); |
5.count(distinct)排重统计
低维值列ydb_sex的count(distinct)【132 vs 4 快33倍】
类型 | 时间 | 备注 |
原生spark | 132秒 | select count(distinct ydb_sex),count(*) from ydb_example_shu_spark limit 10;
|
Spark on ydb | 4秒 | select count(distinct ydb_sex),sum(cnt) from spark_on_ydb_groupby_ydb_age_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_ydb_age_dd ( ydb_sex string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_sex,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by ydb_sex limit 0,10" ); |
低维值列ydb_province的count(distinct)【144 vs 2 快72倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | select count(distinct ydb_province),count(*) from ydb_example_shu_spark limit 10;
|
Spark on ydb | 2秒 | select count(distinct ydb_province),sum(cnt) from spark_on_ydb_groupby_ydb_ppp_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_ydb_ppp_dd ( ydb_province string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select ydb_province,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by ydb_province limit 0,10" ); |
高维值列usernick的count(distinct)【156 vs 276 慢1.76倍】
类型 | 时间 | 备注 |
原生spark | 156秒 | select count(distinct usernick),count(*) from ydb_example_shu_spark limit 10;
|
Spark on ydb | 276s | select count(distinct usernick),sum(cnt) from spark_on_ydb_groupby_usernick_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_usernick_dd ( usernick string, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select usernick,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by usernick limit 0,10" ); |
高维值列phonenum的count(distinct)【516 vs 996 慢1.93倍】
类型 | 时间 | 备注 |
原生spark | 516秒 | select count(distinct phonenum),count(*) from ydb_example_shu_spark limit 10;
|
Spark on ydb | 996秒 | select count(distinct phonenum),sum(cnt) from spark_on_ydb_groupby_phonenum_dd limit 10; |
ydb与spark的映射配置 | CREATE external TABLE spark_on_ydb_groupby_phonenum_dd ( phonenum bigint, cnt bigint ) STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ( "ydb.handler.hostport"="101.200.130.48:8080", "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"=" select phonenum,count(*) from ydb_example_shu where ydbpartion='20151231' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100000000' and ydbkv='max.return.docset.size:100000000' group by phonenum limit 0,10" ); |