ClickHouse (20.4.2.9) SSB Performance Test

ClickHouse Performance Test


Introduction to ClickHouse

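ClickHouse is an open-source OLAP database developed by the Russian company Yandex. It is usually abbreviated CH (some people write CK) and is regarded as one of the fastest OLAP databases currently available, with performance reported to be well ahead of Vertica, Sybase IQ and similar systems. It is probably best suited to time-series data that is ingested as a stream or in batches.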

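CH has the following characteristics:

  1. Columnar storage, which gives a high compression ratio.
  2. Vectorized execution with multi-core parallelism; every SQL statement tries to use as much CPU as it can get.
  3. A shared-nothing architecture with support for distributed deployments.
  4. Support for data replication (replicas are multi-master rather than master-slave).
  5. Compatibility with most SQL syntax; the dialect is particularly close to MySQL.
  6. Inserted data becomes available for queries in near real time.
  7. No transaction support, and a poor fit for frequent updates of existing rows.
  8. Wide tables are encouraged, but queries should avoid always selecting every column of a row.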

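In short, CH is worth considering if your workload looks like this:

  1. Very large data volumes, where per-node storage consumption should still stay low.
  2. Wide tables, where many related columns are merged into one table for business convenience.
  3. SQL-based querying, which improves the applicability and portability of the application.

Performance test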

The test uses a benchmark described in the official CH documentation: SSBM (Star Schema Benchmark).

Server configuration

[root@p2hadoop075 ssb-dbgen-master]# uname -a
Linux p2hadoop075 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@p2hadoop075 ssb-dbgen-master]# cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
     64  Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
[root@p2hadoop075 ssb-dbgen-master]# grep MemTotal /proc/meminfo
MemTotal:       527782880 kB

Introduction to the SSB model

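SSB (Star Schema Benchmark) is a data model based on real-world business applications, defined by researchers at the University of Massachusetts Boston. It is widely accepted in the industry as a fair and neutral way to simulate decision-support workloads.

Both academia and industry commonly use it to evaluate the performance of decision-support technology.

It exercises a system's overall analytical capability across the board, which places higher demands on vendors.

It is widely used in bank credit and credit-card analysis, telecom operations analysis, tax analysis, and decision analysis in the tobacco industry.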

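The SSB benchmark consists of:

  • 1 fact table: lineorder
  • 4 dimension tables: customer, part, dwdate, supplier

There are 13 standard SQL test queries, which combine aggregate queries, multi-table joins, sum, complex conditions, group by, order by and so on. In this test the dwdate table is not generated; the queries derive date attributes from LOORDERDATE with functions such as toYear.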
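
To make the fact/dimension split concrete, the sketch below models a tiny star schema in plain Python: fact rows hold foreign keys and measures, dimension rows hold descriptive attributes, and a query joins them and aggregates. The rows and values are invented for illustration and are not SSB data; only the table and column names come from the DDL later in this post.

```python
from collections import defaultdict

# Dimension tables: key -> descriptive attributes (values are invented).
customer = {1: {"CREGION": "ASIA"}, 2: {"CREGION": "AMERICA"}}
supplier = {10: {"SREGION": "ASIA"}, 11: {"SREGION": "EUROPE"}}

# Fact table: each row carries foreign keys plus a numeric measure.
lineorder = [
    {"LOCUSTKEY": 1, "LOSUPPKEY": 10, "LOREVENUE": 100},
    {"LOCUSTKEY": 1, "LOSUPPKEY": 11, "LOREVENUE": 250},
    {"LOCUSTKEY": 2, "LOSUPPKEY": 10, "LOREVENUE": 400},
]

def revenue_by_regions(facts, customers, suppliers):
    """Join each fact row to its dimensions, then sum revenue per region pair."""
    totals = defaultdict(int)
    for row in facts:
        c = customers[row["LOCUSTKEY"]]
        s = suppliers[row["LOSUPPKEY"]]
        totals[(c["CREGION"], s["SREGION"])] += row["LOREVENUE"]
    return dict(totals)

result = revenue_by_regions(lineorder, customer, supplier)
print(result)
```

The wide table built later in this post performs the same join once, up front, so that the benchmark queries can run against a single table.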

Generate data

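# Download the SSBM tool. Note: git clone creates a directory named ssb-dbgen; the ssb-dbgen-master directory used below is the name a ZIP download of the repository unpacks to.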
[root@p2hadoop075 data03]# git clone https://github.com/vadimtk/ssb-dbgen.git
[root@p2hadoop075 data03]# cd ssb-dbgen-master
[root@p2hadoop075 ssb-dbgen-master]# make

# Generate the test data; machine performance and disk space are limited, so scale factor -s 100 is used
[root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T c
[root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T p
[root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T s
[root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T l

# List the files
[root@sdw1 ssb-dbgen-master]# ll *.tbl
-rw-r--r-- 1 root root   289529327 Apr 26 17:21 customer.tbl
-rw-r--r-- 1 root root 63289191180 Apr 26 17:38 lineorder.tbl
-rw-r--r-- 1 root root   121042413 Apr 26 17:21 part.tbl
-rw-r--r-- 1 root root    17062852 Apr 26 17:21 supplier.tbl
[root@sdw1 ssb-dbgen-master]#

# Check the row counts
[root@sdw1 ssb-dbgen-master]# wc -l *.tbl
    3000000 customer.tbl
  600037902 lineorder.tbl
    1400000 part.tbl
     200000 supplier.tbl
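
As a quick sanity check on the generated files, the sketch below compares the row counts above with the table cardinalities commonly quoted for SSB (customer = 30,000 × SF, supplier = 2,000 × SF, part = 200,000 × floor(1 + log2 SF), lineorder ≈ 6,000,000 × SF) and derives the average bytes per row from the file sizes. The formulas are an assumption that should be checked against the SSB specification; the sizes and counts are copied from the listing above.

```python
import math

SCALE_FACTOR = 100  # the -s value passed to dbgen

# (bytes, rows) copied from the ll and wc -l output above.
generated = {
    "customer": (289529327, 3000000),
    "lineorder": (63289191180, 600037902),
    "part": (121042413, 1400000),
    "supplier": (17062852, 200000),
}

def expected_rows(sf):
    """Assumed SSB cardinalities; the lineorder figure is only approximate."""
    return {
        "customer": 30000 * sf,
        "supplier": 2000 * sf,
        "part": 200000 * math.floor(1 + math.log2(sf)),
        "lineorder": 6000000 * sf,
    }

expected = expected_rows(SCALE_FACTOR)
for table, (size, rows) in generated.items():
    deviation = abs(rows - expected[table]) / expected[table]
    print(f"{table:10s} rows={rows:>11,d} expected~{expected[table]:>11,d} "
          f"deviation={deviation:.4%} avg_bytes_per_row={size / rows:.1f}")
```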
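
Creating the tables on the cluster

Note: do not use the ReplicatedReplacingMergeTree engine for the lineorder table. It deduplicates rows that share the same sorting key, so the resulting row count would be wrong.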
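
The reason is that a ReplacingMergeTree keeps a single row per sorting-key value once parts are merged, and with the ORDER BY (LOORDERDATE, LOORDERKEY) used for lineorder_local below, every line item of the same order shares one key. The sketch imitates that behaviour in plain Python on invented rows; it is a simplified model of the merge step, not ClickHouse code.

```python
# Invented line items: (LOORDERDATE, LOORDERKEY, LOLINENUMBER, LOREVENUE).
rows = [
    ("1993-01-01", 1, 1, 100),
    ("1993-01-01", 1, 2, 200),  # same order, second line item
    ("1993-01-01", 1, 3, 300),  # same order, third line item
    ("1993-01-02", 2, 1, 400),
]

def sorting_key(row):
    """Sorting key of lineorder_local: (LOORDERDATE, LOORDERKEY)."""
    return (row[0], row[1])

def merge_tree(rows):
    """Plain MergeTree: rows are sorted but none are dropped."""
    return sorted(rows, key=sorting_key)

def replacing_merge_tree(rows):
    """Simplified ReplacingMergeTree merge: the last inserted row per sorting key survives."""
    survivors = {}
    for row in rows:
        survivors[sorting_key(row)] = row
    return sorted(survivors.values(), key=sorting_key)

print(len(merge_tree(rows)))            # every line item is kept
print(len(replacing_merge_tree(rows)))  # one row per (date, order key)
```

With the plain ReplicatedMergeTree engine used in this post, all line items are retained and count(*) matches the generated file.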

create database ssb ON CLUSTER center_cluster;

show databases;
-- customer local table
CREATE  TABLE ssb.customer_local ON CLUSTER center_cluster
(
        CCUSTKEY       UInt32,
        CNAME          String,
        CADDRESS       String,
        CCITY          LowCardinality(String),
        CNATION        LowCardinality(String),
        CREGION        LowCardinality(String),
        CPHONE         String,
        CMKTSEGMENT    LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/customer_local/{shard}/replicate', '{replica}')
ORDER BY (CCUSTKEY)
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;
-- customer distributed table
CREATE TABLE ssb.customer ON CLUSTER center_cluster AS ssb.customer_local engine = Distributed(center_cluster, ssb, customer_local, rand());

-- lineorder local table
CREATE  TABLE ssb.lineorder_local ON CLUSTER center_cluster
(
    LOORDERKEY             UInt32,
    LOLINENUMBER           UInt8,
    LOCUSTKEY              UInt32,
    LOPARTKEY              UInt32,
    LOSUPPKEY              UInt32,
    LOORDERDATE            Date,
    LOORDERPRIORITY        LowCardinality(String),
    LOSHIPPRIORITY         UInt8,
    LOQUANTITY             UInt8,
    LOEXTENDEDPRICE        UInt32,
    LOORDTOTALPRICE        UInt32,
    LODISCOUNT             UInt8,
    LOREVENUE              UInt32,
    LOSUPPLYCOST           UInt32,
    LOTAX                  UInt8,
    LOCOMMITDATE           Date,
    LOSHIPMODE             LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/lineorder_local/{shard}/replicate', '{replica}')
PARTITION BY toYear(LOORDERDATE) ORDER BY (LOORDERDATE, LOORDERKEY)
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- lineorder distributed table
CREATE TABLE ssb.lineorder ON CLUSTER center_cluster AS ssb.lineorder_local engine = Distributed(center_cluster, ssb, lineorder_local, rand());

-- part local table
CREATE  TABLE ssb.part_local ON CLUSTER center_cluster
(
	PPARTKEY       UInt32,
	PNAME          String,
	PMFGR          LowCardinality(String),
	PCATEGORY      LowCardinality(String),
	PBRAND         LowCardinality(String),
	PCOLOR         LowCardinality(String),
	PTYPE          LowCardinality(String),
	PSIZE          UInt8,
	PCONTAINER     LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/part_local/{shard}/replicate', '{replica}')
ORDER BY PPARTKEY
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- part distributed table
CREATE TABLE ssb.part ON CLUSTER center_cluster AS ssb.part_local engine = Distributed(center_cluster, ssb, part_local, rand());

-- supplier local table
CREATE  TABLE ssb.supplier_local ON CLUSTER center_cluster
(
        SSUPPKEY       UInt32,
        SNAME          String,
        SADDRESS       String,
        SCITY          LowCardinality(String),
        SNATION        LowCardinality(String),
        SREGION        LowCardinality(String),
        SPHONE         String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/supplier_local/{shard}/replicate', '{replica}')
ORDER BY SSUPPKEY
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

-- supplier distributed table
CREATE TABLE ssb.supplier ON CLUSTER center_cluster AS ssb.supplier_local engine = Distributed(center_cluster, ssb, supplier_local, rand());

-- Import the data (run from the shell)
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.customer FORMAT CSV" < customer.tbl
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.part FORMAT CSV" < part.tbl
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.supplier FORMAT CSV" < supplier.tbl
clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.lineorder FORMAT CSV" < lineorder.tbl
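
The four imports differ only in the table name, so they can be generated in a loop. The sketch below builds the same clickhouse-client invocations as argument lists and only prints them; the helper name build_import_command is made up for this example. Actually running them (for instance with subprocess.run and the .tbl file opened as stdin) is left out.

```python
import shlex

HOST = "p2hadoop074"  # host used in the commands above
TABLES = ["customer", "part", "supplier", "lineorder"]

def build_import_command(table, host=HOST, database="ssb"):
    """Return (argv, input_file) for one CSV import (hypothetical helper)."""
    argv = [
        "clickhouse-client",
        "-h", host,
        "--format_csv_delimiter=,",
        "--query", f"INSERT INTO {database}.{table} FORMAT CSV",
    ]
    return argv, f"{table}.tbl"

for table in TABLES:
    argv, input_file = build_import_command(table)
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(f"{shlex.join(argv)} < {input_file}")
```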

-- Check the row counts
SELECT COUNT(*) FROM ssb.lineorder; -- 600037902
SELECT COUNT(*) FROM ssb.customer;  -- 3000000
SELECT COUNT(*) FROM ssb.part;      -- 1400000
SELECT COUNT(*) FROM ssb.supplier;  -- 200000

-- lineorderflat local table
CREATE TABLE ssb.lineorderflat_local  ON CLUSTER center_cluster(
LOORDERKEY UInt32,
LOLINENUMBER UInt8,
LOCUSTKEY UInt32 ,
LOPARTKEY UInt32 ,
LOSUPPKEY UInt32 ,
LOORDERDATE Date ,
LOORDERPRIORITY String ,
LOSHIPPRIORITY UInt8,
LOQUANTITY UInt8,
LOEXTENDEDPRICE UInt32 ,
LOORDTOTALPRICE UInt32 ,
LODISCOUNT UInt8,
LOREVENUE UInt32 ,
LOSUPPLYCOST UInt32 ,
LOTAX UInt32 ,
LOCOMMITDATE Date ,
LOSHIPMODE String ,
CNAME String ,
CADDRESS String ,
CCITY String ,
CNATION String ,
CREGION String ,
CPHONE String ,
CMKTSEGMENT String ,
SNAME String ,
SADDRESS String ,
SCITY String ,
SNATION String ,
SREGION String ,
SPHONE String ,
PNAME String ,
PMFGR String ,
PCATEGORY String ,
PBRAND String ,
PCOLOR String ,
PTYPE String ,
PSIZE UInt8,
PCONTAINER String 
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/lineorderflat_local/{shard}/replicate', '{replica}')
PARTITION BY toYear(LOORDERDATE)
ORDER BY (LOORDERDATE, LOORDERKEY) 
SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192; 

-- lineorderflat distributed table
CREATE TABLE ssb.lineorderflat ON CLUSTER center_cluster AS ssb.lineorderflat_local engine = Distributed(center_cluster, ssb, lineorderflat_local, rand());


-- Populate the wide table
INSERT INTO ssb.lineorderflat SELECT
    l.LOORDERKEY AS LOORDERKEY,
    l.LOLINENUMBER AS LOLINENUMBER,
    l.LOCUSTKEY AS LOCUSTKEY,
    l.LOPARTKEY AS LOPARTKEY,
    l.LOSUPPKEY AS LOSUPPKEY,
    l.LOORDERDATE AS LOORDERDATE,
    l.LOORDERPRIORITY AS LOORDERPRIORITY,
    l.LOSHIPPRIORITY AS LOSHIPPRIORITY,
    l.LOQUANTITY AS LOQUANTITY,
    l.LOEXTENDEDPRICE AS LOEXTENDEDPRICE,
    l.LOORDTOTALPRICE AS LOORDTOTALPRICE,
    l.LODISCOUNT AS LODISCOUNT,
    l.LOREVENUE AS LOREVENUE,
    l.LOSUPPLYCOST AS LOSUPPLYCOST,
    l.LOTAX AS LOTAX,
    l.LOCOMMITDATE AS LOCOMMITDATE,
    l.LOSHIPMODE AS LOSHIPMODE,
    c.CNAME AS CNAME,
    c.CADDRESS AS CADDRESS,
    c.CCITY AS CCITY,
    c.CNATION AS CNATION,
    c.CREGION AS CREGION,
    c.CPHONE AS CPHONE,
    c.CMKTSEGMENT AS CMKTSEGMENT,
    s.SNAME AS SNAME,
    s.SADDRESS AS SADDRESS,
    s.SCITY AS SCITY,
    s.SNATION AS SNATION,
    s.SREGION AS SREGION,
    s.SPHONE AS SPHONE,
    p.PNAME AS PNAME,
    p.PMFGR AS PMFGR,
    p.PCATEGORY AS PCATEGORY,
    p.PBRAND AS PBRAND,
    p.PCOLOR AS PCOLOR,
    p.PTYPE AS PTYPE,
    p.PSIZE AS PSIZE,
    p.PCONTAINER AS PCONTAINER
FROM ssb.lineorder AS l
INNER JOIN ssb.customer AS c ON c.CCUSTKEY = l.LOCUSTKEY
INNER JOIN ssb.supplier AS s ON s.SSUPPKEY = l.LOSUPPKEY
INNER JOIN ssb.part AS p ON p.PPARTKEY = l.LOPARTKEY


Progress: 23.84 million rows, 1.39 GB (687.66 thousand rows/s., 40.02 MB/s.) 
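
The progress line gives enough to estimate the total load time. The sketch below assumes that the rate stays constant and that the insert has to read all 600,037,902 rows of lineorder; both are assumptions, not measurements from this test.

```python
LINEORDER_ROWS = 600037902   # count(*) of ssb.lineorder reported above
ROWS_PER_SECOND = 687.66e3   # rate from the progress line
ROWS_DONE = 23.84e6          # rows processed when the snapshot was taken
BYTES_DONE = 1.39e9          # bytes processed when the snapshot was taken

def estimate_total_seconds(total_rows, rows_per_second):
    """Total duration if the observed rate held for the whole insert."""
    return total_rows / rows_per_second

total_seconds = estimate_total_seconds(LINEORDER_ROWS, ROWS_PER_SECOND)
bytes_per_row = BYTES_DONE / ROWS_DONE

print(f"estimated total: {total_seconds:.0f} s ({total_seconds / 60:.1f} min)")
print(f"average bytes processed per row so far: {bytes_per_row:.1f}")
```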


Test queries

-- Single-table queries
Q1.1
SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue FROM lineorderflat WHERE (toYear(LOORDERDATE) = 1993) AND ((LODISCOUNT >= 1) AND (LODISCOUNT <= 3)) AND (LOQUANTITY < 25)
Q1.2 
SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue FROM lineorderflat WHERE (toYYYYMM(LOORDERDATE) = 199401) AND ((LODISCOUNT >= 4) AND (LODISCOUNT <= 6)) AND ((LOQUANTITY >= 26) AND (LOQUANTITY <= 35))
Q1.3
SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue FROM lineorderflat WHERE (toISOWeek(LOORDERDATE) = 6) AND (toYear(LOORDERDATE) = 1994) AND ((LODISCOUNT >= 5) AND (LODISCOUNT <= 7)) AND ((LOQUANTITY >= 26) AND (LOQUANTITY <= 35))
Q2.1
SELECT     sum(LOREVENUE),     toYear(LOORDERDATE) AS year,     PBRAND FROM lineorderflat WHERE (PCATEGORY = 'MFGR#12') AND (SREGION = 'AMERICA') GROUP BY     year,     PBRAND ORDER BY     year ASC,     PBRAND ASC
Q2.2
SELECT     sum(LOREVENUE),     toYear(LOORDERDATE) AS year,     PBRAND FROM lineorderflat WHERE (PBRAND >= 'MFGR#2221') AND (PBRAND <= 'MFGR#2228') AND (SREGION = 'ASIA') GROUP BY     year,     PBRAND ORDER BY     year ASC,     PBRAND ASC
Q2.3
SELECT sum(LOREVENUE),toYear(LOORDERDATE) AS year,PBRAND FROM lineorderflat WHERE (PBRAND = 'MFGR#2239') AND (SREGION = 'EUROPE')GROUP BY year,     PBRAND ORDER BY     year ASC,     PBRAND ASC 
Q3.1  
SELECT     CNATION,     SNATION,     toYear(LOORDERDATE) AS year,     sum(LOREVENUE) AS revenue FROM lineorderflat WHERE (CREGION = 'ASIA') AND (SREGION = 'ASIA') AND (year >= 1992) AND (year <= 1997) GROUP BY     CNATION,     SNATION,     year ORDER BY     year ASC,     revenue DESC

Q3.2 
 SELECT     CCITY,     SCITY,     toYear(LOORDERDATE) AS year,     sum(LOREVENUE) AS revenue FROM lineorderflat WHERE (CNATION = 'UNITED STATES') AND (SNATION = 'UNITED STATES') AND (year >= 1992) AND (year <= 1997) GROUP BY     CCITY,     SCITY,     year ORDER BY     year ASC,     revenue DESC
Q3.3 
 SELECT     CCITY,     SCITY,     toYear(LOORDERDATE) AS year,     sum(LOREVENUE) AS revenue FROM lineorderflat WHERE ((CCITY = 'UNITED KI1') OR (CCITY = 'UNITED KI5')) AND ((SCITY = 'UNITED KI1') OR (SCITY = 'UNITED KI5')) AND (year >= 1992) AND (year <= 1997) GROUP BY     CCITY,     SCITY,     year ORDER BY     year ASC,     revenue DESC
 
Q3.4 
SELECT     CCITY,     SCITY,     toYear(LOORDERDATE) AS year,     sum(LOREVENUE) AS revenue FROM lineorderflat WHERE ((CCITY = 'UNITED KI1') OR (CCITY = 'UNITED KI5')) AND ((SCITY = 'UNITED KI1') OR (SCITY = 'UNITED KI5')) AND (toYYYYMM(LOORDERDATE) = 199712) GROUP BY     CCITY,     SCITY,     year ORDER BY     year ASC,     revenue DESC
 
 Q4.1
SELECT     toYear(LOORDERDATE) AS year,     CNATION,     sum(LOREVENUE - LOSUPPLYCOST) AS profit FROM lineorderflat WHERE (CREGION = 'AMERICA') AND (SREGION = 'AMERICA') AND ((PMFGR = 'MFGR#1') OR (PMFGR = 'MFGR#2')) GROUP BY     year,     CNATION ORDER BY     year ASC,     CNATION ASC;
 
 Q4.2
SELECT     toYear(LOORDERDATE) AS year,     SNATION,     PCATEGORY,     sum(LOREVENUE - LOSUPPLYCOST) AS profit FROM lineorderflat WHERE (CREGION = 'AMERICA') AND (SREGION = 'AMERICA') AND ((year = 1997) OR (year = 1998)) AND ((PMFGR = 'MFGR#1') OR (PMFGR = 'MFGR#2')) GROUP BY     year,     SNATION,     PCATEGORY ORDER BY     year ASC,     SNATION ASC,     PCATEGORY ASC ;

Q4.3
SELECT     toYear(LOORDERDATE) AS year,     SCITY,     PBRAND,     sum(LOREVENUE - LOSUPPLYCOST) AS profit FROM lineorderflat WHERE (SNATION = 'UNITED STATES') AND ((year = 1997) OR (year = 1998)) AND (PCATEGORY = 'MFGR#14') GROUP BY     year,     SCITY,     PBRAND ORDER BY     year ASC,     SCITY ASC,     PBRAND ASC;

-- Low-cardinality queries
--Q1
SELECT     count(*),     LOSHIPMODE FROM lineorderflat GROUP BY LOSHIPMODE;
--Q2
SELECT   count(distinct LOSHIPMODE) FROM lineorderflat;
--Q3
SELECT COUNT(*),LOSHIPMODE,LOORDERPRIORITY FROM ssb.lineorderflat GROUP BY LOSHIPMODE,LOORDERPRIORITY;
--Q4
SELECT COUNT(*),LOSHIPMODE,LOORDERPRIORITY FROM ssb.lineorderflat GROUP BY LOSHIPMODE,LOORDERPRIORITY,LOSHIPPRIORITY;
--Q5  
SELECT COUNT(*),LOSHIPMODE,SCITY FROM ssb.lineorderflat GROUP BY LOSHIPMODE,SCITY;
--Q6  
SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY CCITY,SCITY;
--Q7 
SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOSHIPMODE,LOORDERDATE;
--Q8 
SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOORDERDATE,SNATION,SREGION;
--Q9 
SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY CCITY,SCITY,CNATION,SNATION;
--Q10 
SELECT COUNT(*) FROM (SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOSHIPMODE,LOORDERPRIORITY,PCATEGORY,SNATION,CNATION) T;
--Q11 
SELECT COUNT(*) FROM (SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOSHIPMODE,LOORDERPRIORITY,PCATEGORY,SNATION,CNATION,PMFGR) T;
--Q12 
SELECT COUNT(*) FROM (SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY SUBSTR(LOSHIPMODE,2),LOWER(LOORDERPRIORITY),PCATEGORY,SNATION,CNATION,SREGION,PMFGR) T;
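
The post does not say how the timings were collected. One common approach is to run each statement several times and keep the median, as sketched below. run_query is a placeholder for whatever executes SQL against the cluster (a driver call or a clickhouse-client subprocess); here it is replaced by a stub so that the example runs on its own.

```python
import statistics
import time

def time_query(run_query, sql, repeats=5):
    """Run sql `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def fake_run_query(sql):
    """Stand-in for a real client call; sleeps briefly instead of querying."""
    time.sleep(0.001)

queries = {
    "Q1": "SELECT count(*), LOSHIPMODE FROM ssb.lineorderflat GROUP BY LOSHIPMODE",
    "Q2": "SELECT count(DISTINCT LOSHIPMODE) FROM ssb.lineorderflat",
}

timings = {name: time_query(fake_run_query, sql) for name, sql in queries.items()}
for name, ms in timings.items():
    print(f"{name}: {ms:.2f} ms")
```

Using the median rather than the mean keeps a single slow run (for example a cold cache on the first execution) from dominating the reported figure.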

Test results

Single-table query results

Query   ClickHouse 20.4.2.9 (ms)
Q1.1    27
Q1.2    21
Q1.3    18
Q2.1    376
Q2.2    309
Q2.3    481
Q3.1    792
Q3.2    848
Q3.3    622
Q3.4    33
Q4.1    919
Q4.2    441
Q4.3    295
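
Grouping the thirteen timings by query flight makes the pattern easier to see. The sketch below only re-aggregates the numbers reported above; no new measurements are introduced.

```python
from collections import defaultdict

# Timings in milliseconds, copied from the results above.
timings_ms = {
    "Q1.1": 27, "Q1.2": 21, "Q1.3": 18,
    "Q2.1": 376, "Q2.2": 309, "Q2.3": 481,
    "Q3.1": 792, "Q3.2": 848, "Q3.3": 622, "Q3.4": 33,
    "Q4.1": 919, "Q4.2": 441, "Q4.3": 295,
}

def totals_by_flight(timings):
    """Sum the timings per query flight (the part before the dot)."""
    totals = defaultdict(int)
    for query, ms in timings.items():
        flight = query.split(".")[0]
        totals[flight] += ms
    return dict(totals)

flight_totals = totals_by_flight(timings_ms)
for flight, total in sorted(flight_totals.items()):
    print(f"{flight}: {total} ms")
print(f"all 13 queries: {sum(timings_ms.values())} ms")
```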

Low-cardinality query performance

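Query   Query type                                                        ClickHouse 20.4.2.9 (s)
Q1      group by 1 low-cardinality column (<50)                           0.199
Q2      count distinct on 1 low-cardinality column (<50)                  0.365
Q3      group by 2 low-cardinality columns                                2.732
Q4      group by 2 low-cardinality columns and 1 int column               3.465
Q5      group by 2 low-cardinality columns (7*250)                        0.996
Q6      group by 2 low-cardinality columns (250*250)                      1.947
Q7      group by 1 low-cardinality column (<50) and 1 date column         0.656
Q8      group by 2 low-cardinality columns (<50) and 1 date column        0.978
Q9      group by 4 low-cardinality columns                                3.308
Q10     group by 5 low-cardinality columns (<50)                          4.46
Q11     group by 6 low-cardinality columns (<50)                          5.254
Q12     group by 7 low-cardinality columns (<50), with function calls     5.868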
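
Dividing the table size by each timing gives a rough processing throughput. The sketch assumes that lineorderflat holds the same 600,037,902 rows as lineorder, which the post does not state explicitly.

```python
ASSUMED_ROWS = 600037902  # assumption: lineorderflat has as many rows as lineorder

# Timings in seconds, copied from the results above.
timings_s = {
    "Q1": 0.199, "Q2": 0.365, "Q3": 2.732, "Q4": 3.465,
    "Q5": 0.996, "Q6": 1.947, "Q7": 0.656, "Q8": 0.978,
    "Q9": 3.308, "Q10": 4.46, "Q11": 5.254, "Q12": 5.868,
}

def rows_per_second(rows, seconds):
    """Average number of rows processed per second of wall-clock time."""
    return rows / seconds

throughput = {q: rows_per_second(ASSUMED_ROWS, s) for q, s in timings_s.items()}
for query, rate in throughput.items():
    print(f"{query:>3s}: {rate / 1e6:8.1f} million rows/s")
```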