clickhouse SSB 性能测试

SSB(Star Schema Benchmark)的介绍论文地址:
https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
在这里插入图片描述

官网链接 https://clickhouse.com/docs/en/getting-started/example-datasets/star-schema/

如果安装系统时,时最小化mini安装,经常会提示很多命令不存在

  • 提示 git 不存在,使用 yum install git 安装即可
  • 提示 make: command not found,使用以下命令安装 make yum install -y gcc gcc-c++ automake autoconf libtool make

ssb-dbgen 测试工具 GitHub 地址 https://github.com/vadimtk/ssb-dbgen

下载并编译测试工具

git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make

之后就会在当前目录生成 dbgenqgen 这两个可执行文件
在这里插入图片描述
|

命令结果
./dbgen -s 1 -T ppart.tbl
./dbgen -s 1 -T ssuppliers.tbl
./dbgen -s 1 -T ccustomers.tbl
./dbgen -s 1 -T ddate.tbl
./dbgen -s 1 -T llineorder.tbl
./dbgen -s 1 -T a一次性生成以上所有表

在这里插入图片描述

添加 -h 符号,会将文件大小进行格式化,并显示单位
在这里插入图片描述
使用 head -n 10 customer.tbl 命令打印前10行可以看到,tbl 文件是用逗号分隔列,然后使用换行符分割行的
在这里插入图片描述

这里数据量业界有一个统称叫做SF 1SF == 1G
装载数据 1G -s 1 == 1G #:但这样有一个弊端 那就是如果装载的数据量特别大的时候例如 1T 这样基本需要花费一天的时间 所以我来用多线程的dbgen方法

方法如下:

./dbgen -vfF -s 1000 -S 1 -C 6 &
./dbgen -vfF -s 1000 -S 2 -C 6 &
./dbgen -vfF -s 1000 -S 3 -C 6 &
./dbgen -vfF -s 1000 -S 4 -C 6 &
./dbgen -vfF -s 1000 -S 5 -C 6 &
./dbgen -vfF -s 1000 -S 6 -C 6

参数详解

  • -v 详细信息
  • -s 表示生成数据的规模
  • -S 切分数据
  • -f 覆盖之前的文件

dss.ddl 这个文件存储的是建表的语句
cat dss.ddl 逐一执行里面的建表语句

建表语句可以在官网拿到,使用外部工具,比如 DBeaver 建表

CREATE DATABASE IF NOT EXISTS ssb
USE ssb
CREATE TABLE IF NOT EXISTS customer
(
        C_CUSTKEY       UInt32,
        C_NAME          String,
        C_ADDRESS       String,
        C_CITY          LowCardinality(String),
        C_NATION        LowCardinality(String),
        C_REGION        LowCardinality(String),
        C_PHONE         String,
        C_MKTSEGMENT    LowCardinality(String)
)
ENGINE = MergeTree ORDER BY (C_CUSTKEY);

CREATE TABLE IF NOT EXISTS lineorder
(
    LO_ORDERKEY             UInt32,
    LO_LINENUMBER           UInt8,
    LO_CUSTKEY              UInt32,
    LO_PARTKEY              UInt32,
    LO_SUPPKEY              UInt32,
    LO_ORDERDATE            Date,
    LO_ORDERPRIORITY        LowCardinality(String),
    LO_SHIPPRIORITY         UInt8,
    LO_QUANTITY             UInt8,
    LO_EXTENDEDPRICE        UInt32,
    LO_ORDTOTALPRICE        UInt32,
    LO_DISCOUNT             UInt8,
    LO_REVENUE              UInt32,
    LO_SUPPLYCOST           UInt32,
    LO_TAX                  UInt8,
    LO_COMMITDATE           Date,
    LO_SHIPMODE             LowCardinality(String)
)
ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);

CREATE TABLE IF NOT EXISTS part
(
        P_PARTKEY       UInt32,
        P_NAME          String,
        P_MFGR          LowCardinality(String),
        P_CATEGORY      LowCardinality(String),
        P_BRAND         LowCardinality(String),
        P_COLOR         LowCardinality(String),
        P_TYPE          LowCardinality(String),
        P_SIZE          UInt8,
        P_CONTAINER     LowCardinality(String)
)
ENGINE = MergeTree ORDER BY P_PARTKEY;

CREATE TABLE IF NOT EXISTS supplier
(
        S_SUPPKEY       UInt32,
        S_NAME          String,
        S_ADDRESS       String,
        S_CITY          LowCardinality(String),
        S_NATION        LowCardinality(String),
        S_REGION        LowCardinality(String),
        S_PHONE         String
)
ENGINE = MergeTree ORDER BY S_SUPPKEY;

在这里插入图片描述

之后定位到 dbgen 生成的目录,使用 pwd 命令,可以打印出当前目录
在这里插入图片描述
接下来,使用 clickhouse-client 工具,将 tbl 中的数据导入数据库

clickhouse-client --query "INSERT INTO ssb.customer FORMAT CSV" < customer.tbl
clickhouse-client --query "INSERT INTO ssb.part FORMAT CSV" < part.tbl
clickhouse-client --query "INSERT INTO ssb.supplier FORMAT CSV" < supplier.tbl
clickhouse-client --query "INSERT INTO ssb.lineorder FORMAT CSV" < lineorder.tbl

注意:如果你使用的不是 default 数据库(默认数据库),请在表前面加上数据库前缀,否则会报如下错误
在这里插入图片描述

这是维度为1,即 -s 1 情况下的数据量

SELECT 
	(SELECT COUNT(1) FROM customer) AS customer, 
	(SELECT COUNT(1) FROM lineorder) AS lineorder, 
	(SELECT COUNT(1) FROM part) AS part,
	(SELECT COUNT(1) FROM supplier) AS supplier

在这里插入图片描述
下面这段SQL,会将星型模式(star schema)转化为 非标准化的(denormalized)平面模型(flat schema)。也就是说,将原本相关联的表结构,通过某种关系,整合到一张表里去。

DROP TABLE IF EXISTS ssb.lineorder_flat;

CREATE TABLE ssb.lineorder_flat
ENGINE = MergeTree
PARTITION BY toYear(LO_ORDERDATE)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS
SELECT
    l.LO_ORDERKEY AS LO_ORDERKEY,
    l.LO_LINENUMBER AS LO_LINENUMBER,
    l.LO_CUSTKEY AS LO_CUSTKEY,
    l.LO_PARTKEY AS LO_PARTKEY,
    l.LO_SUPPKEY AS LO_SUPPKEY,
    l.LO_ORDERDATE AS LO_ORDERDATE,
    l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
    l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
    l.LO_QUANTITY AS LO_QUANTITY,
    l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
    l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
    l.LO_DISCOUNT AS LO_DISCOUNT,
    l.LO_REVENUE AS LO_REVENUE,
    l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
    l.LO_TAX AS LO_TAX,
    l.LO_COMMITDATE AS LO_COMMITDATE,
    l.LO_SHIPMODE AS LO_SHIPMODE,
    c.C_NAME AS C_NAME,
    c.C_ADDRESS AS C_ADDRESS,
    c.C_CITY AS C_CITY,
    c.C_NATION AS C_NATION,
    c.C_REGION AS C_REGION,
    c.C_PHONE AS C_PHONE,
    c.C_MKTSEGMENT AS C_MKTSEGMENT,
    s.S_NAME AS S_NAME,
    s.S_ADDRESS AS S_ADDRESS,
    s.S_CITY AS S_CITY,
    s.S_NATION AS S_NATION,
    s.S_REGION AS S_REGION,
    s.S_PHONE AS S_PHONE,
    p.P_NAME AS P_NAME,
    p.P_MFGR AS P_MFGR,
    p.P_CATEGORY AS P_CATEGORY,
    p.P_BRAND AS P_BRAND,
    p.P_COLOR AS P_COLOR,
    p.P_TYPE AS P_TYPE,
    p.P_SIZE AS P_SIZE,
    p.P_CONTAINER AS P_CONTAINER
FROM ssb.lineorder AS l
INNER JOIN ssb.customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN ssb.supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN ssb.part AS p ON p.P_PARTKEY = l.LO_PARTKEY;

clickhouse 默认内存大小是1G。在做 SSB 的星型转平面模型的时,如果没有增加最大内存限制,就会报如下错误(DB::Exception: Memory limit (total) exceeded)
在这里插入图片描述
在这里插入图片描述
注意:set_memory_usage = 20000000000 这个函数在外部的数据库管理工具,比如DBeaver中是无法设置的。但在 clickhouse-client 中设置却有效(本地或远程均可)。这是因为,该函数仅支持在 TCP 模式下被调用,即 port=9000
在这里插入图片描述
另外,set 函数是针对当前会话的,只要一退出,立马又会还原成之前的样子。
查看变量值的命令为:SELECT name,value FROM system.settings WHERE name = 'max_memory_usage'
在这里插入图片描述
由于0太多,不好数。我们可以使用 formatReadableSize 函数将其格式化。但由于 system.settings 这个表中的 value 是字符串类型的,因此我们必须将起转为 Int 类型,这就要用到 toInt64 函数了(Int32长度不够,会导致数值溢出)

完整的命令如下:SELECT name,formatReadableSize(toInt64(value)) FROM system.settings WHERE name = 'max_memory_usage';

在这里插入图片描述
接着先将可用内存增大后,再删除之前创建后转化失败的表,然后进行测试

SET max_memory_usage = 20000000000
DROP TABLE IF EXISTS ssb.lineorder_flat

如果机子内存没有这么大,上述语句无效。但可以通过 SET min_insert_block_size_rows=8192; 减小单次插入块的行大小(默认值为 1048545)这样就能安全通过了。

可以看到在虚拟机3G内存下的速度为:每秒71万行,30MB/S
在这里插入图片描述
在执行时,打开另一个终端,在clickhouse-client 中执行 SHOW PROCESSLIST 会打印当前执行的进度
在这里插入图片描述
将字符串复制出来,把单引号替换成双引号。放在在线JSON格式化页面进行转义
在这里插入图片描述

运行下面查询语句(使用 USE ssb; 设置默认数据库)
在这里插入图片描述

Q1.1

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

在这里插入图片描述

Q1.2

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

在这里插入图片描述

Q1.3

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994
  AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

在这里插入图片描述

Q2.1

SELECT
    sum(LO_REVENUE),
    toYear(LO_ORDERDATE) AS year,
    P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY
    year,
    P_BRAND
ORDER BY
    year,
    P_BRAND;

在这里插入图片描述
在这里插入图片描述

Q2.2

SELECT
    sum(LO_REVENUE),
    toYear(LO_ORDERDATE) AS year,
    P_BRAND
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY
    year,
    P_BRAND
ORDER BY
    year,
    P_BRAND;

在这里插入图片描述
在这里插入图片描述

Q2.3

SELECT
    sum(LO_REVENUE),
    toYear(LO_ORDERDATE) AS year,
    P_BRAND
FROM lineorder_flat
WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE'
GROUP BY
    year,
    P_BRAND
ORDER BY
    year,
    P_BRAND;

在这里插入图片描述

Q3.1

SELECT
    C_NATION,
    S_NATION,
    toYear(LO_ORDERDATE) AS year,
    sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997
GROUP BY
    C_NATION,
    S_NATION,
    year
ORDER BY
    year ASC,
    revenue DESC;

在这里插入图片描述
在这里插入图片描述

Q3.2

SELECT
    C_CITY,
    S_CITY,
    toYear(LO_ORDERDATE) AS year,
    sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997
GROUP BY
    C_CITY,
    S_CITY,
    year
ORDER BY
    year ASC,
    revenue DESC;

在这里插入图片描述

在这里插入图片描述

Q3.3

SELECT
    C_CITY,
    S_CITY,
    toYear(LO_ORDERDATE) AS year,
    sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997
GROUP BY
    C_CITY,
    S_CITY,
    year
ORDER BY
    year ASC,
    revenue DESC;

在这里插入图片描述

Q3.4

SELECT
    C_CITY,
    S_CITY,
    toYear(LO_ORDERDATE) AS year,
    sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712
GROUP BY
    C_CITY,
    S_CITY,
    year
ORDER BY
    year ASC,
    revenue DESC;

在这里插入图片描述

Q4.1

SELECT
    toYear(LO_ORDERDATE) AS year,
    C_NATION,
    sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY
    year,
    C_NATION
ORDER BY
    year ASC,
    C_NATION ASC;

在这里插入图片描述

Q4.2

SELECT
    toYear(LO_ORDERDATE) AS year,
    S_NATION,
    P_CATEGORY,
    sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY
    year,
    S_NATION,
    P_CATEGORY
ORDER BY
    year ASC,
    S_NATION ASC,
    P_CATEGORY ASC;

在这里插入图片描述
在这里插入图片描述

Q4.3

SELECT
    toYear(LO_ORDERDATE) AS year,
    S_CITY,
    P_BRAND,
    sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'
GROUP BY
    year,
    S_CITY,
    P_BRAND
ORDER BY
    year ASC,
    S_CITY ASC,
    P_BRAND ASC;

在这里插入图片描述

在这里插入图片描述

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值