DorisDB vs ClickHouse SSB对比测试
TL;DR
- 进行本次测试时对DorisDB了解甚微
- 本次测试由于服务器资源有限, 没有严格遵循单一变量原则进行测试
- 本次测试有一定参考意义
数据导入速度
- ClickHouse: 3500s
- DorisDB: 5160s
数据压缩情况(通过磁盘占用空间比较)
- ClickHouse: 85.2G
- DorisDB: 132G
查询速度
单表查询
DorisDB1 | DorisDB2 | ClickHouse1 | ClickHouse2 | |
---|---|---|---|---|
Q1.1 | 350 | 290 | 226 | 195 |
Q1.2 | 270 | 190 | 34 | 63 |
Q1.3 | 310 | 240 | 43 | 22 |
Q2.1 | 410 | 370 | 1,723 | 1,791 |
Q2.2 | 780 | 720 | 1,463 | 1,470 |
Q2.3 | 340 | 280 | 659 | 1,337 |
Q3.1 | 1,560 | 860 | 3,488 | 1,254 |
Q3.2 | 1,080 | 790 | 1,272 | 966 |
Q3.3 | 250 | 290 | 979 | 889 |
Q3.4 | 230 | 260 | 36 | 20 |
Q4.1 | 870 | 720 | 5,067 | 2,791 |
Q4.2 | 720 | 490 | 804 | 752 |
Q4.3 | 510 | 380 | 561 | 482 |
多表查询
DorisDB1 | DorisDB2 | ClickHouse1 | ClickHouse2 | |
---|---|---|---|---|
Q1.1 | 450 | 490 | 1,496 | 1,424 |
Q1.2 | 410 | 450 | 1,366 | 659 |
Q1.3 | 510 | 340 | 678 | 1,377 |
Q2.1 | 1,560 | 1,600 | 4,360 | 2,667 |
Q2.2 | 1,690 | 1,060 | 4,498 | 1,554 |
Q2.3 | 780 | 1,150 | 2,569 | 2,577 |
Q3.1 | 3,480 | 3,700 | 10,190 | 12,960 |
Q3.2 | 1,320 | 1,850 | 5,926 | 5,743 |
Q3.3 | 1,030 | 1,040 | 3,445 | 3,300 |
Q3.4 | 1,330 | 1,170 | 3,455 | 3,330 |
Q4.1 | 3,480 | 3,750 | 15,560 | 9,494 |
Q4.2 | 2,830 | 3,170 | 16,109 | 18,048 |
Q4.3 | 1,560 | 2,140 | 15,685 | 14,838 |
环境信息
ClickHouse: 3台 华为云ECS 高性能计算型 | h3.xlarge.2 | 4vCPUs | 8GB | 超高IO SSD
DorisDB: 3台 华为云ECS 高性能计算型 | h3.2xlarge.4 | 8vCPUs | 32GB | 超高IO SSD
由于资源紧张, DorisDB所在服务器上还部署了rc mysql, 但过年期间无人使用, 实际可用内存16G.
DorisDB: DorisDB-SE-1.12.1 3Fe 3Be, fe和be部署在一起
ClickHouse: 20.10.3.30, 三分片两副本混合部署, 部署方法详见ClickHouse集群多实例部署
注意
实际数据导入后DorsiDB和ClickHouse除lineorder_flat外数据无任何差异
lineorder_flat:
- DorsiDB: 546669614
- ClickHouse: 622259902
DorisDB
部署略
构建数据
首先下载ssb-poc工具包并编译
wget http://dorisdb-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.zip
unzip ssb-poc-0.9.zip
cd ssb-poc
make && make install
所有相关工具安装到output目录。
进入output目录,生成数据
cd output
bin/gen-ssb.sh 100 data_dir
建表
修改配置文件conf/doris.conf,指定脚本操作的Doris集群地址
# for mysql cmd
mysql_host: 192.168.1.1
mysql_port: 9030
mysql_user: root
mysql_password:
doris_db: ssb
# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000
...
执行脚本建表
bin/create_db_table.sh ddl_100
我这里建表跑建表脚本报错了, 改为手动建表, 参考http://doc.dorisdb.com/2146807
CREATE TABLE IF NOT EXISTS `lineorder` ( `lo_orderkey` int(11) NOT NULL COMMENT "", `lo_linenumber` int(11) NOT NULL COMMENT "", `lo_custkey` int(11) NOT NULL COMMENT "", `lo_partkey` int(11) NOT NULL COMMENT "", `lo_suppkey` int(11) NOT NULL COMMENT "", `lo_orderdate` int(11) NOT NULL COMMENT "", `lo_orderpriority` varchar(16) NOT NULL COMMENT "", `lo_shippriority` int(11) NOT NULL COMMENT "", `lo_quantity` int(11) NOT NULL COMMENT "", `lo_extendedprice` int(11) NOT NULL COMMENT "", `lo_ordtotalprice` int(11) NOT NULL COMMENT "", `lo_discount` int(11) NOT NULL COMMENT "", `lo_revenue` int(11) NOT NULL COMMENT "", `lo_supplycost` int(11) NOT NULL COMMENT "", `lo_tax` int(11) NOT NULL COMMENT "", `lo_commitdate` int(11) NOT NULL COMMENT "", `lo_shipmode` varchar(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`lo_orderkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 96 PROPERTIES ( "replication_num" = "1", "colocate_with" = "group1", "in_memory" = "false", "storage_format" = "DEFAULT" ); CREATE TABLE IF NOT EXISTS `customer` ( `c_custkey` int(11) NOT NULL COMMENT "", `c_name` varchar(26) NOT NULL COMMENT "", `c_address` varchar(41) NOT NULL COMMENT "", `c_city` varchar(11) NOT NULL COMMENT "", `c_nation` varchar(16) NOT NULL COMMENT "", `c_region` varchar(13) NOT NULL COMMENT "", `c_phone` varchar(16) NOT NULL COMMENT "", `c_mktsegment` varchar(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`c_custkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`c_custkey`) BUCKETS 12 PROPERTIES ( "replication_num" = "1", "colocate_with" = "groupa2", "in_memory" = "false", "storage_format" = "DEFAULT" ); CREATE TABLE IF NOT EXISTS `dates` ( `d_datekey` int(11) NOT NULL COMMENT "", `d_date` varchar(20) NOT NULL COMMENT "", `d_dayofweek` varchar(10) NOT NULL COMMENT "", `d_month` varchar(11) NOT NULL COMMENT "", `d_year` int(11) NOT NULL COMMENT "", `d_yearmonthnum` int(11) NOT NULL COMMENT "", `d_yearmonth` varchar(9) NOT NULL COMMENT "", `d_daynuminweek` int(11) NOT NULL COMMENT "", `d_daynuminmonth` int(11) NOT NULL COMMENT "", `d_daynuminyear` int(11) NOT NULL COMMENT "", `d_monthnuminyear` int(11) NOT NULL COMMENT "", `d_weeknuminyear` int(11) NOT NULL COMMENT "", `d_sellingseason` varchar(14) NOT NULL COMMENT "", `d_lastdayinweekfl` int(11) NOT NULL COMMENT "", `d_lastdayinmonthfl` int(11) NOT NULL COMMENT "", `d_holidayfl` int(11) NOT NULL COMMENT "", `d_weekdayfl` int(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`d_datekey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`d_datekey`) BUCKETS 1 PROPERTIES ( "replication_num" = "1", "in_memory" = "false", "colocate_with" = "groupa3", "storage_format" = "DEFAULT" ); CREATE TABLE IF NOT EXISTS `supplier` ( `s_suppkey` int(11) NOT NULL COMMENT "", `s_name` varchar(26) NOT NULL COMMENT "", `s_address` varchar(26) NOT NULL COMMENT "", `s_city` varchar(11) NOT NULL COMMENT "", `s_nation` varchar(16) NOT NULL COMMENT "", `s_region` varchar(13) NOT NULL COMMENT "", `s_phone` varchar(16) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`s_suppkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`s_suppkey`) BUCKETS 12 PROPERTIES ( "replication_num" = "1", "colocate_with" = "groupa4", "in_memory" = "false", "storage_format" = "DEFAULT" ); CREATE TABLE IF NOT EXISTS `part` ( `p_partkey` int(11) NOT NULL COMMENT "", `p_name` varchar(23) NOT NULL COMMENT "", `p_mfgr` varchar(7) NOT NULL COMMENT "", `p_category` varchar(8) NOT NULL COMMENT "", `p_brand` varchar(10) NOT NULL COMMENT "", `p_color` varchar(12) NOT NULL COMMENT "", `p_type` varchar(26) NOT NULL COMMENT "", `p_size` int(11) NOT NULL COMMENT "", `p_container` varchar(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`p_partkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`p_partkey`) BUCKETS 12 PROPERTIES ( "replication_num" = "1", "colocate_with" = "groupa5", "in_memory" = "false", "storage_format" = "DEFAULT" ); CREATE TABLE IF NOT EXISTS `lineorder_flat` ( `LO_ORDERKEY` int(11) NOT NULL COMMENT "", `LO_ORDERDATE` date NOT NULL COMMENT "", `LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "", `LO_CUSTKEY` int(11) NOT NULL COMMENT "", `LO_PARTKEY` int(11) NOT NULL COMMENT "", `LO_SUPPKEY` int(11) NOT NULL COMMENT "", `LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "", `LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "", `LO_QUANTITY` tinyint(4) NOT NULL COMMENT "", `LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "", `LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "", `LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "", `LO_REVENUE` int(11) NOT NULL COMMENT "", `LO_SUPPLYCOST` int(11) NOT NULL COMMENT "", `LO_TAX` tinyint(4) NOT NULL COMMENT "", `LO_COMMITDATE` date NOT NULL COMMENT "", `LO_SHIPMODE` varchar(100) NOT NULL COMMENT "", `C_NAME` varchar(100) NOT NULL COMMENT "", `C_ADDRESS` varchar(100) NOT NULL COMMENT "", `C_CITY` varchar(100) NOT NULL COMMENT "", `C_NATION` varchar(100) NOT NULL COMMENT "", `C_REGION` varchar(100) NOT NULL COMMENT "", `C_PHONE` varchar(100) NOT NULL COMMENT "", `C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "", `S_NAME` varchar(100) NOT NULL COMMENT "", `S_ADDRESS` varchar(100) NOT NULL COMMENT "", `S_CITY` varchar(100) NOT NULL COMMENT "", `S_NATION` varchar(100) NOT NULL COMMENT "", `S_REGION` varchar(100) NOT NULL COMMENT "", `S_PHONE` varchar(100) NOT NULL COMMENT "", `P_NAME` varchar(100) NOT NULL COMMENT "", `P_MFGR` varchar(100) NOT NULL COMMENT "", `P_CATEGORY` varchar(100) NOT NULL COMMENT "", `P_BRAND` varchar(100) NOT NULL COMMENT "", `P_COLOR` varchar(100) NOT NULL COMMENT "", `P_TYPE` varchar(100) NOT NULL COMMENT "", `P_SIZE` tinyint(4) NOT NULL COMMENT "", `P_CONTAINER` varchar(100) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`LO_ORDERKEY`) COMMENT "OLAP" DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 192 PROPERTIES ( "replication_num" = "1", "colocate_with" = "groupxx1", "in_memory" = "false", "storage_format" = "DEFAULT" );
连接一个fe执行以上sql即可
导入数据
DorisDB测试中使用python脚本导入数据, 需要安装pymysql库
使用Stream load导入单表数据
(myrecover) [root@bj2-mysql-rc-drill-02 output]# time bin/stream_load.sh data_dir
stream load start. table: lineorder, path: data_dir/lineorder.tbl.1
stream load start. table: lineorder, path: data_dir/lineorder.tbl.2
stream load start. table: lineorder, path: data_dir/lineorder.tbl.3
stream load start. table: lineorder, path: data_dir/lineorder.tbl.4
stream load start. table: lineorder, path: data_dir/lineorder.tbl.5
stream load start. table: lineorder, path: data_dir/lineorder.tbl.6
stream load start. table: lineorder, path: data_dir/lineorder.tbl.7
stream load start. table: lineorder, path: data_dir/lineorder.tbl.8
stream load start. table: lineorder, path: data_dir/lineorder.tbl.9
stream load start. table: lineorder, path: data_dir/lineorder.tbl.10
...
stream load success. table: lineorder, path: data_dir/lineorder.tbl.100
stream load success. table: lineorder, path: data_dir/lineorder.tbl.97
stream load success. table: lineorder, path: data_dir/lineorder.tbl.99
stream load success. table: lineorder, path: data_dir/lineorder.tbl.93
stream load success. table: lineorder, path: data_dir/lineorder.tbl.98
stream load success. table: lineorder, path: data_dir/lineorder.tbl.94
stream load success. table: lineorder, path: data_dir/lineorder.tbl.92
stream load success. table: lineorder, path: data_dir/lineorder.tbl.95
stream load success. table: lineorder, path: data_dir/lineorder.tbl.91
stream load success. table: lineorder, path: data_dir/lineorder.tbl.96
stream load start. table: customer, path: data_dir/customer.tbl
stream load success. table: customer, path: data_dir/customer.tbl
stream load start. table: dates, path: data_dir/dates.tbl
stream load success. table: dates, path: data_dir/dates.tbl
stream load start. table: part, path: data_dir/part.tbl
stream load success. table: part, path: data_dir/part.tbl
stream load start. table: supplier, path: data_dir/supplier.tbl
stream load success. table: supplier, path: data_dir/supplier.tbl
real 82m31.323s
user 0m6.385s
sys 0m36.761s
(myrecover) [root@bj2-mysql-rc-drill-02 output]# time bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert success
real 1m36.697s
user 0m0.078s
sys 0m0.022s
插入数据到宽表lineorder_flat
(myrecover) [root@bj2-mysql-rc-drill-02 output]# time bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert success
real 2m37.530s
user 0m0.063s
sys 0m0.023s
有一个不理解的现象是, flat_insert.sh已经执行完毕, 但是查看lineorder_flat表行数时, 发现其值是在不断增大
mysql> select count(*) from lineorder_flat; +-----------+ | count(*) | +-----------+ | 182256332 | +-----------+ 1 row in set (0.10 sec) mysql> select count(*) from lineorder_flat; +-----------+ | count(*) | +-----------+ | 364316982 | +-----------+ 1 row in set (0.16 sec) mysql> select count(*) from lineorder_flat; +-----------+ | count(*) | +-----------+ | 546669614 | +-----------+ 1 row in set (0.41 sec)
可以看到DorisDB数据导入耗时约86分钟
最终数据
mysql> select count(*) from customer ;
+----------+
| count(*) |
+----------+
| 3000000 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from dates ;
+----------+
| count(*) |
+----------+
| 2556 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from lineorder ;
+-----------+
| count(*) |
+-----------+
| 600037902 |
+-----------+
1 row in set (0.53 sec)
mysql> select count(*) from lineorder_flat ;
+-----------+
| count(*) |
+-----------+
| 546669614 |
+-----------+
1 row in set (0.40 sec)
mysql> select count(*) from part ;
+----------+
| count(*) |
+----------+
| 1400000 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from supplier ;
+----------+
| count(*) |
+----------+
| 200000 |
+----------+
1 row in set (0.01 sec)
数据占用空间
du -sh /data/DorisDB-SE-1.12.1/be/storage/data/
44G /data/DorisDB-SE-1.12.1/be/storage/data/
44*3 = 132G
单表查询测试
set global parallel_fragment_exec_instance_num = 4;
Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
+----------------+
| revenue |
+----------------+
| 44652567249651 |
+----------------+
1 row in set (0.35 sec)
1 row in set (0.29 sec)
Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
+---------------+
| revenue |
+---------------+
| 9624332170119 |
+---------------+
1 row in set (0.27 sec)
1 row in set (0.19 sec)
Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
+---------------+
| revenue |
+---------------+
| 2611093671163 |
+---------------+
1 row in set (0.31 sec)
1 row in set (0.24 sec)
Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
...
240 rows in set (0.41 sec)
240 rows in set (0.37 sec)
Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
48 rows in set (0.78 sec)
48 rows in set (0.72 sec)
Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
6 rows in set (0.34 sec)
6 rows in set (0.28 sec)
Q3.1
SELECT c_nation, s_nation, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
150 rows in set (1.56 sec)
150 rows in set (0.86 sec)
Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
600 rows in set (1.08 sec)
600 rows in set (0.79 sec)
Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
24 rows in set (0.25 sec)
24 rows in set (0.29 sec)
Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
4 rows in set (0.23 sec)
4 rows in set (0.26 sec)
Q4.1
在DorisDB测试文档中对该SQL执行前执行了
set vectorized_engine_enable = FALSE;
但实际发现执行上面语句后反而会慢很多
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
30 rows in set (1 min 24.46 sec)
30 rows in set (1 min 25.10 sec)
set vectorized_engine_enable = TRUE;
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
30 rows in set (0.87 sec)
30 rows in set (0.72 sec)
我比对了开启和关闭vectorized_engine_enable
后的查询结果, 发现是一样的, 这里我就不明白为啥要设置vectorized_engine_enable
为False
了
Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
30 rows in set (0.72 sec)
50 rows in set (0.49 sec)
Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;
400 rows in set (0.51 sec)
400 rows in set (0.38 sec)
多表关联测试
执行
set global parallel_fragment_exec_instance_num = 8;
Q1.1
select sum(lo_revenue) as revenue
from lineorder join dates on lo_orderdate = d_datekey
where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
1 row in set (0.45 sec)
1 row in set (0.49 sec)
Q1.2
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;
1 row in set (0.41 sec)
1 row in set (0.45 sec)
Q1.3
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_weeknuminyear = 6 and d_year = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;
1 row in set (0.51 sec)
1 row in set (0.34 sec)
Q2.1
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
inner join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;
280 rows in set (1.56 sec)
280 rows in set (1.60 sec)
Q2.2
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'
group by d_year, p_brand
order by d_year, p_brand;
56 rows in set (1.69 sec)
56 rows in set (1.06 sec)
Q2.3
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand = 'MFGR#2239' and s_region = 'EUROPE'
group by d_year, p_brand
order by d_year, p_brand;
7 rows in set (0.78 sec)
7 rows in set (1.15 sec)
Q3.1
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, lo_revenue desc;
150 rows in set (3.48 sec)
150 rows in set (3.70 sec)
Q3.2
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
600 rows in set (1.32 sec)
600 rows in set (1.85 sec)
Q3.3
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
24 rows in set (1.03 sec)
24 rows in set (1.04 sec)
Q3.4
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth
= 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
4 rows in set (1.33 sec)
4 rows in set (1.17 sec)
Q4.1
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
35 rows in set (3.48 sec)
35 rows in set (3.75 sec)
Q4.2
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_region = 'AMERICA'
and (d_year = 1997 or d_year = 1998)
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category;
100 rows in set (2.83 sec)
100 rows in set (3.17 sec)
Q4.3
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_nation = 'UNITED STATES'
and (d_year = 1997 or d_year = 1998)
and p_category = 'MFGR#14'
group by d_year, s_city, p_brand
order by d_year, s_city, p_brand;
800 rows in set (1.56 sec)
800 rows in set (2.14 sec)
ClickHouse
构建数据
$ git clone https://github.com/vadimtk/ssb-dbgen.git
$ cd ssb-dbgen
$ make
$ ./dbgen -s 100 -T c
$ ./dbgen -s 100 -T l
$ ./dbgen -s 100 -T p
$ ./dbgen -s 100 -T s
$ ./dbgen -s 100 -T d
建表
CREATE DATABASE ssb on cluster ck_cluster;
CREATE TABLE ssb.customer_local on cluster ck_cluster
(
C_CUSTKEY UInt32,
C_NAME String,
C_ADDRESS String,
C_CITY LowCardinality(String),
C_NATION LowCardinality(String),
C_REGION LowCardinality(String),
C_PHONE String,
C_MKTSEGMENT LowCardinality(String)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/customer',
'{replica}'
) ORDER BY (C_CUSTKEY) SETTINGS index_granularity = 8192;
CREATE TABLE ssb.customer on cluster ck_cluster AS ssb.customer_local ENGINE = Distributed(
ck_cluster,
ssb,
customer_local,
rand()
);
CREATE TABLE ssb.lineorder_local on cluster ck_cluster
(
LO_ORDERKEY UInt32,
LO_LINENUMBER UInt8,
LO_CUSTKEY UInt32,
LO_PARTKEY UInt32,
LO_SUPPKEY UInt32,
LO_ORDERDATE Date,
LO_ORDERPRIORITY LowCardinality(String),
LO_SHIPPRIORITY UInt8,
LO_QUANTITY UInt8,
LO_EXTENDEDPRICE UInt32,
LO_ORDTOTALPRICE UInt32,
LO_DISCOUNT UInt8,
LO_REVENUE UInt32,
LO_SUPPLYCOST UInt32,
LO_TAX UInt8,
LO_COMMITDATE Date,
LO_SHIPMODE LowCardinality(String)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/lineorder',
'{replica}'
) PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY)
SETTINGS index_granularity = 8192;
CREATE TABLE ssb.lineorder on cluster ck_cluster AS ssb.lineorder_local ENGINE = Distributed(
ck_cluster,
ssb,
lineorder_local,
rand()
);
CREATE TABLE ssb.part_local on cluster ck_cluster
(
P_PARTKEY UInt32,
P_NAME String,
P_MFGR LowCardinality(String),
P_CATEGORY LowCardinality(String),
P_BRAND LowCardinality(String),
P_COLOR LowCardinality(String),
P_TYPE LowCardinality(String),
P_SIZE UInt8,
P_CONTAINER LowCardinality(String)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/part',
'{replica}'
) ORDER BY P_PARTKEY SETTINGS index_granularity = 8192;
CREATE TABLE ssb.part on cluster ck_cluster AS ssb.part_local ENGINE = Distributed(
ck_cluster,
ssb,
part_local,
rand()
);
CREATE TABLE ssb.supplier_local on cluster ck_cluster
(
S_SUPPKEY UInt32,
S_NAME String,
S_ADDRESS String,
S_CITY LowCardinality(String),
S_NATION LowCardinality(String),
S_REGION LowCardinality(String),
S_PHONE String
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/supplier',
'{replica}'
) ORDER BY S_SUPPKEY SETTINGS index_granularity = 8192;
CREATE TABLE ssb.supplier on cluster ck_cluster AS ssb.supplier_local ENGINE = Distributed(
ck_cluster,
ssb,
supplier_local,
rand()
);
CREATE TABLE ssb.dates_local on cluster ck_cluster
(
D_DATEKEY UInt32,
D_DATE String,
D_DAYOFWEEK String,
D_MONTH String,
D_YEAR UInt32,
D_YEARMONTHNUM UInt32,
D_YEARMONTH String,
D_DAYNUMINWEEK UInt32,
D_DAYNUMINMONTH UInt32,
D_DAYNUMINYEAR UInt32,
D_MONTHNUMINYEAR UInt32,
D_WEEKNUMINYEAR UInt32,
D_SELLINGSEASON String,
D_LASTDAYINWEEKFL UInt32,
D_LASTDAYINMONTHFL UInt32,
D_HOLIDAYFL UInt32,
D_WEEKDAYFL UInt32
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/dates',
'{replica}'
) ORDER BY D_DATEKEY SETTINGS index_granularity = 8192;
CREATE TABLE ssb.dates on cluster ck_cluster AS ssb.dates_local ENGINE = Distributed(
ck_cluster,
ssb,
dates_local,
rand()
);
导入数据
cd ssb-dbgen
clickhouse-client --database=ssb --query "INSERT INTO customer FORMAT CSV" < customer.tbl
clickhouse-client --database=ssb --query "INSERT INTO part FORMAT CSV" < part.tbl
clickhouse-client --database=ssb --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl
clickhouse-client --database=ssb --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl
clickhouse-client --database=ssb --query "INSERT INTO dates FORMAT CSV" < date.tbl
以上用时不到600s
Converting “star schema” to denormalized “flat schema”:
在一个节点执行
set max_bytes_before_external_group_by=2000000000;
set max_memory_usage=4000000000;
CREATE TABLE lineorder_flat_tmp
ENGINE = MergeTree
PARTITION BY toYear(LO_ORDERDATE)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS
SELECT
l.LO_ORDERKEY AS LO_ORDERKEY,
l.LO_LINENUMBER AS LO_LINENUMBER,
l.LO_CUSTKEY AS LO_CUSTKEY,
l.LO_PARTKEY AS LO_PARTKEY,
l.LO_SUPPKEY AS LO_SUPPKEY,
l.LO_ORDERDATE AS LO_ORDERDATE,
l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
l.LO_QUANTITY AS LO_QUANTITY,
l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
l.LO_DISCOUNT AS LO_DISCOUNT,
l.LO_REVENUE AS LO_REVENUE,
l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
l.LO_TAX AS LO_TAX,
l.LO_COMMITDATE AS LO_COMMITDATE,
l.LO_SHIPMODE AS LO_SHIPMODE,
c.C_NAME AS C_NAME,
c.C_ADDRESS AS C_ADDRESS,
c.C_CITY AS C_CITY,
c.C_NATION AS C_NATION,
c.C_REGION AS C_REGION,
c.C_PHONE AS C_PHONE,
c.C_MKTSEGMENT AS C_MKTSEGMENT,
s.S_NAME AS S_NAME,
s.S_ADDRESS AS S_ADDRESS,
s.S_CITY AS S_CITY,
s.S_NATION AS S_NATION,
s.S_REGION AS S_REGION,
s.S_PHONE AS S_PHONE,
p.P_NAME AS P_NAME,
p.P_MFGR AS P_MFGR,
p.P_CATEGORY AS P_CATEGORY,
p.P_BRAND AS P_BRAND,
p.P_COLOR AS P_COLOR,
p.P_TYPE AS P_TYPE,
p.P_SIZE AS P_SIZE,
p.P_CONTAINER AS P_CONTAINER
FROM lineorder AS l
INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;
0 rows in set. Elapsed: 2086.025 sec. Processed 604.64 million rows, 26.12 GB (289.85 thousand rows/s., 12.52 MB/s.)
bj2-all-clickhouse-test-01 :) select count(*) from lineorder_flat;
SELECT count(*)
FROM lineorder_flat
┌───count()─┐
│ 600037902 │
└───────────┘
1 rows in set. Elapsed: 0.006 sec.
CREATE TABLE ssb.lineorder_flat_local on cluster ck_cluster
AS ssb.lineorder_flat_tmp
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/lineorder_flat',
'{replica}'
)
PARTITION BY toYear(LO_ORDERDATE)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY)
CREATE TABLE ssb.lineorder_flat on cluster ck_cluster AS ssb.lineorder_flat_local ENGINE = Distributed(
ck_cluster,
ssb,
lineorder_flat_local,
rand()
);
在三个分片执行:
INSERT INTO ssb.lineorder_flat SELECT * from ssb_local.lineorder_flat_tmp;
Ok.
0 rows in set. Elapsed: 747.548 sec. Processed 600.04 million rows, 140.40 GB (802.67 thousand rows/s., 187.82 MB/s.)
clickhouse数据导入用时总计约3500秒
最终数据
bj2-all-clickhouse-test-01 :) select count(*) from customer ;
SELECT count(*)
FROM customer
┌─count()─┐
│ 3000000 │
└─────────┘
1 rows in set. Elapsed: 0.004 sec.
bj2-all-clickhouse-test-01 :) select count(*) from dates ;
SELECT count(*)
FROM dates
┌─count()─┐
│ 2556 │
└─────────┘
1 rows in set. Elapsed: 0.004 sec.
bj2-all-clickhouse-test-01 :) select count(*) from lineorder ;
SELECT count(*)
FROM lineorder
┌───count()─┐
│ 600037902 │
└───────────┘
1 rows in set. Elapsed: 0.003 sec.
bj2-all-clickhouse-test-01 :) select count(*) from lineorder_flat ;
SELECT count(*)
FROM lineorder_flat
┌───count()─┐
│ 622259902 │
└───────────┘
1 rows in set. Elapsed: 0.003 sec.
bj2-all-clickhouse-test-01 :) select count(*) from part ;
SELECT count(*)
FROM part
┌─count()─┐
│ 1400000 │
└─────────┘
1 rows in set. Elapsed: 0.003 sec.
bj2-all-clickhouse-test-01 :) select count(*) from supplier ;
SELECT count(*)
FROM supplier
┌─count()─┐
│ 200000 │
└─────────┘
1 rows in set. Elapsed: 0.004 sec.
数据占用空间
[root@bj2-all-clickhouse-test-01 ssb]# ll
total 48
lrwxrwxrwx 1 root root 69 Feb 9 10:51 customer -> /data/clickhouse/node1/store/a61/a61a9f88-8bbb-4864-bd0a-1001f5ffcc1c
lrwxrwxrwx 1 root root 69 Feb 9 10:51 customer_local -> /data/clickhouse/node1/store/c86/c86c8026-f804-4f93-9a2b-051d0fbc1cc9
lrwxrwxrwx 1 root root 69 Feb 9 12:55 dates -> /data/clickhouse/node1/store/95d/95db2800-91b5-44f3-b378-727bc80d25bc
lrwxrwxrwx 1 root root 69 Feb 9 12:55 dates_local -> /data/clickhouse/node1/store/137/1371f45a-88bc-4d14-9f04-07895fd561ba
lrwxrwxrwx 1 root root 69 Feb 9 11:33 lineorder -> /data/clickhouse/node1/store/8f4/8f40504f-4e94-4d70-a1de-dbcb6d25d616
lrwxrwxrwx 1 root root 69 Feb 9 11:54 lineorder_flat -> /data/clickhouse/node1/store/9fd/9fdb7d2c-d2c2-49a0-ad7e-380fc76dff73
lrwxrwxrwx 1 root root 69 Feb 9 11:49 lineorder_flat_local -> /data/clickhouse/node1/store/883/883a4a97-f34b-491b-a305-398bd717cfb9
lrwxrwxrwx 1 root root 69 Feb 9 10:52 lineorder_local -> /data/clickhouse/node1/store/2d6/2d67810e-dfbc-438d-b7aa-6cbfee4c391f
lrwxrwxrwx 1 root root 69 Feb 9 11:36 part -> /data/clickhouse/node1/store/45b/45b4a779-a2df-4048-a266-efc1481fce68
lrwxrwxrwx 1 root root 69 Feb 9 10:53 part_local -> /data/clickhouse/node1/store/83f/83f3ac5d-296b-424f-900e-f13d67d044ae
lrwxrwxrwx 1 root root 69 Feb 9 11:36 supplier -> /data/clickhouse/node1/store/6a0/6a0b538a-8f79-4d24-bae0-82792fe8af19
lrwxrwxrwx 1 root root 69 Feb 9 10:54 supplier_local -> /data/clickhouse/node1/store/5e2/5e29e02d-e35b-4660-a5b9-1bfa1d9fb97d
[root@bj2-all-clickhouse-test-01 ssb]# ll |awk -F'->' '{print $2}'|xargs du -sh
20K /data/clickhouse/node1/store/a61/a61a9f88-8bbb-4864-bd0a-1001f5ffcc1c
39M /data/clickhouse/node1/store/c86/c86c8026-f804-4f93-9a2b-051d0fbc1cc9
20K /data/clickhouse/node1/store/95d/95db2800-91b5-44f3-b378-727bc80d25bc
64K /data/clickhouse/node1/store/137/1371f45a-88bc-4d14-9f04-07895fd561ba
20K /data/clickhouse/node1/store/8f4/8f40504f-4e94-4d70-a1de-dbcb6d25d616
20K /data/clickhouse/node1/store/9fd/9fdb7d2c-d2c2-49a0-ad7e-380fc76dff73
22G /data/clickhouse/node1/store/883/883a4a97-f34b-491b-a305-398bd717cfb9
6.4G /data/clickhouse/node1/store/2d6/2d67810e-dfbc-438d-b7aa-6cbfee4c391f
20K /data/clickhouse/node1/store/45b/45b4a779-a2df-4048-a266-efc1481fce68
8.3M /data/clickhouse/node1/store/83f/83f3ac5d-296b-424f-900e-f13d67d044ae
20K /data/clickhouse/node1/store/6a0/6a0b538a-8f79-4d24-bae0-82792fe8af19
2.6M /data/clickhouse/node1/store/5e2/5e29e02d-e35b-4660-a5b9-1bfa1d9fb97d
约 28.4G*3=85.2G
单表查询测试
执行
set max_threads=4;
SELECT *
FROM system.settings
WHERE name = 'max_threads'
┌─name────────┬─value─┬─changed─┬─description───────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type───────┐
│ max_threads │ 4 │ 1 │ The maximum number of threads to execute the request. By default, it is determined automatically. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ MaxThreads │
└─────────────┴───────┴─────────┴───────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴────────────┘
Q1.1
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
┌────────revenue─┐
│ 46310268722641 │
└────────────────┘
1 rows in set. Elapsed: 0.226 sec. Processed 94.38 million rows, 755.01 MB (417.85 million rows/s., 3.34 GB/s.)
1 rows in set. Elapsed: 0.195 sec. Processed 94.38 million rows, 755.01 MB (484.98 million rows/s., 3.88 GB/s.)
Q1.2
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toYYYYMM(LO_ORDERDATE) = 199401) AND ((LO_DISCOUNT >= 4) AND (LO_DISCOUNT <= 6)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
┌───────revenue─┐
│ 9979491062883 │
└───────────────┘
1 rows in set. Elapsed: 0.034 sec. Processed 8.07 million rows, 64.55 MB (239.67 million rows/s., 1.92 GB/s.)
1 rows in set. Elapsed: 0.063 sec. Processed 8.07 million rows, 64.55 MB (127.89 million rows/s., 1.02 GB/s.)
Q1.3
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toISOWeek(LO_ORDERDATE) = 6) AND (toYear(LO_ORDERDATE) = 1994) AND ((LO_DISCOUNT >= 5) AND (LO_DISCOUNT <= 7)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
┌───────revenue─┐
│ 2709633167278 │
└───────────────┘
1 rows in set. Elapsed: 0.043 sec. Processed 2.03 million rows, 16.23 MB (46.78 million rows/s., 374.26 MB/s.)
1 rows in set. Elapsed: 0.022 sec. Processed 2.03 million rows, 16.23 MB (91.09 million rows/s., 728.68 MB/s.)
Q2.1
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;
...
280 rows in set. Elapsed: 1.723 sec. Processed 622.26 million rows, 6.42 GB (361.17 million rows/s., 3.73 GB/s.)
280 rows in set. Elapsed: 1.791 sec. Processed 622.26 million rows, 6.42 GB (347.43 million rows/s., 3.59 GB/s.)
Q2.2
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;
...
56 rows in set. Elapsed: 1.463 sec. Processed 622.26 million rows, 5.80 GB (425.29 million rows/s., 3.96 GB/s.)
56 rows in set. Elapsed: 1.470 sec. Processed 622.26 million rows, 5.80 GB (423.20 million rows/s., 3.94 GB/s.)
Q2.3
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE (P_BRAND = 'MFGR#2239') AND (S_REGION = 'EUROPE')
GROUP BY
year,
P_BRAND
ORDER BY
year ASC,
P_BRAND ASC
┌─sum(LO_REVENUE)─┬─year─┬─P_BRAND───┐
│ 68153063027 │ 1992 │ MFGR#2239 │
│ 67009399598 │ 1993 │ MFGR#2239 │
│ 67185675813 │ 1994 │ MFGR#2239 │
│ 67988181065 │ 1995 │ MFGR#2239 │
│ 67265572303 │ 1996 │ MFGR#2239 │
│ 66922128523 │ 1997 │ MFGR#2239 │
│ 38630664245 │ 1998 │ MFGR#2239 │
└─────────────────┴──────┴───────────┘
7 rows in set. Elapsed: 0.659 sec. Processed 622.26 million rows, 5.80 GB (943.74 million rows/s., 8.79 GB/s.)
7 rows in set. Elapsed: 1.337 sec. Processed 622.26 million rows, 5.80 GB (465.51 million rows/s., 4.34 GB/s.)
Q3.1
SELECT
C_NATION,
S_NATION,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997
GROUP BY
C_NATION,
S_NATION,
year
ORDER BY
year ASC,
revenue DESC;
...
150 rows in set. Elapsed: 2.488 sec. Processed 566.91 million rows, 5.68 GB (227.84 million rows/s., 2.28 GB/s.)
150 rows in set. Elapsed: 1.254 sec. Processed 566.91 million rows, 5.68 GB (452.10 million rows/s., 4.53 GB/s.)
Q3.2
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC;
...
600 rows in set. Elapsed: 1.272 sec. Processed 566.91 million rows, 5.77 GB (445.80 million rows/s., 4.54 GB/s.)
600 rows in set. Elapsed: 0.966 sec. Processed 566.91 million rows, 5.77 GB (587.07 million rows/s., 5.98 GB/s.)
Q3.3
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (year >= 1992) AND (year <= 1997)
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC
...
24 rows in set. Elapsed: 0.979 sec. Processed 566.91 million rows, 4.63 GB (579.03 million rows/s., 4.73 GB/s.)
24 rows in set. Elapsed: 0.889 sec. Processed 566.91 million rows, 4.63 GB (637.83 million rows/s., 5.21 GB/s.)
Q3.4
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (toYYYYMM(LO_ORDERDATE) = 199712)
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC
┌─C_CITY─────┬─S_CITY─────┬─year─┬───revenue─┐
│ UNITED KI1 │ UNITED KI1 │ 1997 │ 495955110 │
│ UNITED KI5 │ UNITED KI5 │ 1997 │ 421714882 │
│ UNITED KI5 │ UNITED KI1 │ 1997 │ 387387637 │
│ UNITED KI1 │ UNITED KI5 │ 1997 │ 380708672 │
└────────────┴────────────┴──────┴───────────┘
4 rows in set. Elapsed: 0.036 sec. Processed 8.09 million rows, 66.07 MB (221.65 million rows/s., 1.81 GB/s.)
4 rows in set. Elapsed: 0.020 sec. Processed 8.09 million rows, 66.07 MB (402.92 million rows/s., 3.29 GB/s.)
Q4.1
SELECT
toYear(LO_ORDERDATE) AS year,
C_NATION,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
GROUP BY
year,
C_NATION
ORDER BY
year ASC,
C_NATION ASC
...
35 rows in set. Elapsed: 5.067 sec. Processed 622.26 million rows, 8.72 GB (122.80 million rows/s., 1.72 GB/s.)
35 rows in set. Elapsed: 2.791 sec. Processed 622.26 million rows, 8.72 GB (222.97 million rows/s., 3.12 GB/s.)
Q4.2
SELECT
toYear(LO_ORDERDATE) AS year,
S_NATION,
P_CATEGORY,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY
year,
S_NATION,
P_CATEGORY
ORDER BY
year ASC,
S_NATION ASC,
P_CATEGORY ASC;
...
100 rows in set. Elapsed: 0.804 sec. Processed 149.77 million rows, 2.25 GB (186.21 million rows/s., 2.80 GB/s.)
100 rows in set. Elapsed: 0.752 sec. Processed 149.77 million rows, 2.25 GB (199.13 million rows/s., 2.99 GB/s.)
Q4.3
SELECT
toYear(LO_ORDERDATE) AS year,
S_CITY,
P_BRAND,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'
GROUP BY
year,
S_CITY,
P_BRAND
ORDER BY
year ASC,
S_CITY ASC,
P_BRAND ASC;
...
800 rows in set. Elapsed: 0.561 sec. Processed 149.77 million rows, 2.32 GB (266.84 million rows/s., 4.13 GB/s.)
800 rows in set. Elapsed: 0.483 sec. Processed 149.77 million rows, 2.32 GB (309.80 million rows/s., 4.80 GB/s.)
多表关联测试
在DorisDB vs Clickhouse SSB性能测试对比报告, 不知道是不是疏漏了, Doris在测试时设置了
set global parallel_fragment_exec_instance_num = 8;
但没有写有没有在clickhouse测试时设置SETTINGS max_threads=8;
由于我的环境只有四个4 cpu, 所以我这里设置为4.
通过以下命令可知, lineorder每个分片数据文件大概6.4G(压缩后), 我的环境内存只有8G, 不确定能否都缓存了
这里还需要注意两个问题:
- clickhouse分布式表相比单表关联更加不友好, JOIN要改为GLOBAL JOIN
- JOIN操作时一定要把数据量小的表放在右边(这会导致SQL改写更加麻烦)
所以我对DorisDB vs Clickhouse SSB性能测试对比报告中的多表关联测试SQL进行了微调
Q1.1
SELECT SUM(LO_REVENUE) AS REVENUE
FROM lineorder
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (D_YEAR = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
SETTINGS max_threads = 4
┌────────REVENUE─┐
│ 21881848590256 │
└────────────────┘
1 rows in set. Elapsed: 1.496 sec. Processed 600.04 million rows, 4.80 GB (401.10 million rows/s., 3.21 GB/s.)
1 rows in set. Elapsed: 1.424 sec. Processed 600.04 million rows, 4.80 GB (421.33 million rows/s., 3.37 GB/s.)
Q1.2
SELECT SUM(LO_REVENUE) AS REVENUE
FROM lineorder
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (D_YEARMONTHNUM = 199401) AND ((LO_DISCOUNT >= 4) AND (LO_DISCOUNT <= 6)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
SETTINGS max_threads = 4
┌───────REVENUE─┐
│ 1829705342413 │
└───────────────┘
1 rows in set. Elapsed: 1.366 sec. Processed 600.04 million rows, 4.80 GB (439.39 million rows/s., 3.52 GB/s.)
1 rows in set. Elapsed: 0.659 sec. Processed 600.04 million rows, 4.80 GB (910.87 million rows/s., 7.29 GB/s.)
Q1.3
SELECT SUM(LO_REVENUE) AS REVENUE
FROM lineorder
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (D_WEEKNUMINYEAR = 6) AND (D_YEAR = 1994) AND ((LO_DISCOUNT >= 5) AND (LO_DISCOUNT <= 7)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
SETTINGS max_threads = 4
┌──────REVENUE─┐
│ 407995993835 │
└──────────────┘
1 rows in set. Elapsed: 0.678 sec. Processed 600.04 million rows, 4.80 GB (885.40 million rows/s., 7.08 GB/s.)
1 rows in set. Elapsed: 1.377 sec. Processed 600.04 million rows, 4.80 GB (435.84 million rows/s., 3.49 GB/s.)
Q2.1
SELECT SUM(LO_REVENUE) AS LO_REVENUE, D_YEAR, P_BRAND
FROM lineorder
GLOBAL JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY D_YEAR, P_BRAND
ORDER BY D_YEAR, P_BRAND
SETTINGS max_threads = 4;
...
280 rows in set. Elapsed: 4.367 sec. Processed 601.81 million rows, 8.45 GB (137.81 million rows/s., 1.94 GB/s.)
280 rows in set. Elapsed: 2.667 sec. Processed 601.81 million rows, 8.45 GB (225.69 million rows/s., 3.17 GB/s.)
Q2.2
SELECT SUM(LO_REVENUE) AS LO_REVENUE, D_YEAR, P_BRAND
FROM lineorder
GLOBAL JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY D_YEAR, P_BRAND
ORDER BY D_YEAR, P_BRAND
SETTINGS max_threads = 4;
...
56 rows in set. Elapsed: 4.498 sec. Processed 601.67 million rows, 8.45 GB (133.78 million rows/s., 1.88 GB/s.)
56 rows in set. Elapsed: 1.554 sec. Processed 601.67 million rows, 8.45 GB (387.23 million rows/s., 5.44 GB/s.)
Q2.3
SELECT
SUM(LO_REVENUE) AS LO_REVENUE,
D_YEAR,
P_BRAND
FROM lineorder
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (P_BRAND = 'MFGR#2239') AND (S_REGION = 'EUROPE')
GROUP BY
D_YEAR,
P_BRAND
ORDER BY
D_YEAR ASC,
P_BRAND ASC
SETTINGS max_threads = 4
┌──LO_REVENUE─┬─D_YEAR─┬─P_BRAND───┐
│ 65751589723 │ 1992 │ MFGR#2239 │
│ 64532844801 │ 1993 │ MFGR#2239 │
│ 64722599002 │ 1994 │ MFGR#2239 │
│ 65616432683 │ 1995 │ MFGR#2239 │
│ 64802884686 │ 1996 │ MFGR#2239 │
│ 64485541165 │ 1997 │ MFGR#2239 │
│ 37276536361 │ 1998 │ MFGR#2239 │
└─────────────┴────────┴───────────┘
7 rows in set. Elapsed: 2.569 sec. Processed 601.64 million rows, 8.45 GB (234.18 million rows/s., 3.29 GB/s.)
7 rows in set. Elapsed: 2.577 sec. Processed 601.64 million rows, 8.45 GB (233.47 million rows/s., 3.28 GB/s.)
Q3.1
SELECT C_NATION, S_NATION, D_YEAR, SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA'AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_NATION, S_NATION, D_YEAR
ORDER BY D_YEAR ASC, LO_REVENUE DESC
SETTINGS max_threads = 4;
...
150 rows in set. Elapsed: 10.190 sec. Processed 605.04 million rows, 8.66 GB (59.38 million rows/s., 850.20 MB/s.)
150 rows in set. Elapsed: 12.960 sec. Processed 605.04 million rows, 8.66 GB (46.69 million rows/s., 668.51 MB/s.)
Q3.2
SELECT C_CITY, S_CITY, D_YEAR, SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES'
AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_CITY, S_CITY, D_YEAR
ORDER BY D_YEAR ASC, LO_REVENUE DESC
SETTINGS max_threads = 4;
...
600 rows in set. Elapsed: 5.926 sec. Processed 603.60 million rows, 8.66 GB (101.85 million rows/s., 1.46 GB/s.)
600 rows in set. Elapsed: 5.743 sec. Processed 603.60 million rows, 8.66 GB (105.10 million rows/s., 1.51 GB/s.)
Q3.3
SELECT
C_CITY,
S_CITY,
D_YEAR,
SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (D_YEAR >= 1992) AND (D_YEAR <= 1997)
GROUP BY
C_CITY,
S_CITY,
D_YEAR
ORDER BY
D_YEAR ASC,
LO_REVENUE DESC
SETTINGS max_threads = 4
...
24 rows in set. Elapsed: 3.445 sec. Processed 603.31 million rows, 8.65 GB (175.11 million rows/s., 2.51 GB/s.)
24 rows in set. Elapsed: 3.300 sec. Processed 603.31 million rows, 8.65 GB (182.79 million rows/s., 2.62 GB/s.)
Q3.4
SELECT
C_CITY,
S_CITY,
D_YEAR,
SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (D_YEARMONTH = 'Dec1997')
GROUP BY
C_CITY,
S_CITY,
D_YEAR
ORDER BY
D_YEAR ASC,
LO_REVENUE DESC
SETTINGS max_threads = 4
┌─C_CITY─────┬─S_CITY─────┬─D_YEAR─┬─LO_REVENUE─┐
│ UNITED KI1 │ UNITED KI1 │ 1997 │ 481119563 │
│ UNITED KI5 │ UNITED KI5 │ 1997 │ 386477033 │
│ UNITED KI5 │ UNITED KI1 │ 1997 │ 378048353 │
│ UNITED KI1 │ UNITED KI5 │ 1997 │ 366630529 │
└────────────┴────────────┴────────┴────────────┘
4 rows in set. Elapsed: 3.455 sec. Processed 603.31 million rows, 8.65 GB (174.63 million rows/s., 2.50 GB/s.)
4 rows in set. Elapsed: 3.330 sec. Processed 603.31 million rows, 8.65 GB (181.19 million rows/s., 2.60 GB/s.
Q4.1
SELECT
D_YEAR,
C_NATION,
SUM(LO_REVENUE) - SUM(LO_SUPPLYCOST) AS PROFIT
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
GROUP BY
D_YEAR,
C_NATION
ORDER BY
D_YEAR ASC,
C_NATION ASC
SETTINGS max_threads = 4
...
35 rows in set. Elapsed: 15.560 sec. Processed 606.44 million rows, 13.47 GB (38.97 million rows/s., 865.71 MB/s.)
35 rows in set. Elapsed: 9.494 sec. Processed 606.44 million rows, 13.47 GB (63.88 million rows/s., 1.42 GB/s.)
Q4.2
SELECT
D_YEAR,
S_NATION,
P_CATEGORY,
SUM(LO_REVENUE) - SUM(LO_SUPPLYCOST) AS PROFIT
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((D_YEAR = 1997) OR (D_YEAR = 1998)) AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
GROUP BY
D_YEAR,
S_NATION,
P_CATEGORY
ORDER BY
D_YEAR ASC,
S_NATION ASC,
P_CATEGORY ASC
SETTINGS max_threads = 4
...
100 rows in set. Elapsed: 16.109 sec. Processed 606.44 million rows, 13.47 GB (37.64 million rows/s., 836.01 MB/s.)
100 rows in set. Elapsed: 18.048 sec. Processed 606.44 million rows, 13.47 GB (33.60 million rows/s., 746.35 MB/s.)
Q4.3
SELECT
D_YEAR,
S_CITY,
P_BRAND,
SUM(LO_REVENUE) - SUM(LO_SUPPLYCOST) AS PROFIT
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (C_REGION = 'AMERICA') AND (S_NATION = 'UNITED STATES') AND ((D_YEAR = 1997) OR (D_YEAR = 1998)) AND (P_CATEGORY = 'MFGR#14')
GROUP BY
D_YEAR,
S_CITY,
P_BRAND
ORDER BY
D_YEAR ASC,
S_CITY ASC,
P_BRAND ASC
SETTINGS max_threads = 4
...
800 rows in set. Elapsed: 15.685 sec. Processed 606.44 million rows, 13.47 GB (38.66 million rows/s., 858.94 MB/s.)
800 rows in set. Elapsed: 14.838 sec. Processed 606.44 million rows, 13.47 GB (40.87 million rows/s., 907.94 MB/s.)