Clickhouse 简单SQL优化以及特殊数据类型使用场景

最新推荐文章于 2024-06-09 17:22:13 发布

Lara1111

最新推荐文章于 2024-06-09 17:22:13 发布

阅读量941

点赞数

分类专栏： clickhouse

本文链接：https://blog.csdn.net/weixin_38424594/article/details/110560885

版权

clickhouse 专栏收录该内容

17 篇文章 5 订阅

订阅专栏

一. SQL 查询上的优化点

1. 案例1利用好引擎表可以 PREWHER的优势

select big.id, big.time, small.code, small.value
from smalltable small INNER JOIN bigtable big on small.id=big.id
where big.time>‘2020-05-23’ and small.code in(‘a’,‘b’,‘c’);

smalltable为TinyLog引擎表
bigtable为MergeTree引擎表

遍历右表的每一条记录，然后和左表关联，NESTED JOIN
右表是小表。

改写：
select big.id, big.time, small.code, small.value
from bigtable big INNER JOIN smalltable small on small.id=big.id
where big.time>‘2020-05-23’ and small.code in(‘a’,‘b’,‘c’);

PREWHERE，自动where转移为PREWHER，条件，只适用于MERGETREE引擎

select big.id, big.time, small.code, small.value
from bigtable big INNER JOIN (select * from smalltable where small.code in(‘a’,‘b’,‘c’)) small on small.id=big.id
where big.time>‘2020-05-23’ ;

2. 案例2 ,函数在join前操作 /将小表存字典操作

select formatDateTime(ns.date, ‘%Y%m%d’) as dayid, ns.prd_id, m.name,count() as cnt
from crm_prd_inst ns left join dim_code_map m on(ns.code = m.code)
group by dayid, ns.prd_id, m.name;

date格式: 2020-03-31，通过formatDateTime函数转换为：20200331
dim_code_map表是维度表，主键为code。

改写1：
select t1.dayid, t1.prd_id, t2.name, t1.cnt from
(
select formatDateTime(ns.date, ‘%Y%m%d’) as dayid, ns.prd_id, ns.code,count() as cnt
from crm_prd_inst ns
group by dayid, ns.prd_id, ns.code
) t1 left join dim_code_map t2 on(t1code = t2.code)

改写2：

select formatDateTime(t1.date, ‘%Y%m%d’) as dayid, t1.prd_id, t2.name, t1.cnt from
(
select ns.date, ns.prd_id, ns.code,count() as cnt
from crm_prd_inst ns
group by ns.date, ns.prd_id, ns.code
) t1 left join dim_code_map t2 on(t1code = t2.code)

改写3：使用字典
将表数据作为字典的数据源，字典名称dic_dim_code

select formatDateTime(t1.date, ‘%Y%m%d’) as dayid, t1.prd_id, dictGet(‘default.dic_dim_code’, ‘name’, t1.code) name, t1.cnt from
from
(
select ns.date, ns.prd_id, ns.code,count() as cnt
from crm_prd_inst ns
group by ns.date, ns.prd_id, ns.code
) t1

二. 相关特殊数据类型. Tuple ,数组 ,负数

1. tunple


drop table t_tuple;
create table t_tuple(id String, addr Tuple(UInt16,String,String)) ENGINE=TinyLog;
insert into t_tuple values('id001', tuple(556, 'anhui', 'anqing'));
select * from t_tuple;

select id, addr.1, addr.2 from t_tuple;

┌─id────┬─tupleElement(addr, 1)─┬─tupleElement(addr, 2)─┐
│ id001 │                   556 │ anhui                 │
└───────┴───────────────────────┴───────────────────────┘


select id, tupleElement(addr, 1) as area_code from t_tuple;
┌─id────┬─area_code─┐
│ id001 │       556 │
└───────┴───────────┘

2. 数组相关操作

drop table t_arr;
create table t_arr(id String, hobby Array(String)) ENGINE=TinyLog;
insert into t_arr values('id001', ['football', 'basketball', 'movie']);
insert into t_arr values('id001', ['eat', 'sleep', 'play', 'ride']);
select * from t_arr;
数组和元组的元素的 索引都是从1开始的。a[N]和函数arrayElement(a, N)的使用。
xiaochen :) select id, hobby[1] as first_hobby from t_arr;

┌─id────┬─first_hobby─┐
│ id001 │ football    │
│ id001 │ eat         │
└───────┴─────────────┘

xiaochen :) select id, arrayElement(hobby,1) as first_hobby from t_arr;

┌─id────┬─first_hobby─┐
│ id001 │ football    │
│ id001 │ eat         │
└───────┴─────────────┘

3. 负数

select -2 as a, negate (2) as b;
┌──a─┬──b─┐
│ -2 │ -2 │
└────┴────┘

三. 特殊的操作符

1. 比较大小

（1）、相等
a = b
a == b
等价函数：equals(a, b)

（2）不相等
a != b
a <> b
等价函数：notEquals(a, b)

（3）、不等比较
a <= b，等价函数：lessOrEquals(a, b)
a >= b ，等价函数：greaterOrEquals(a, b)
a < b ，等价函数：less(a, b)
a > b ，等价函数：greater(a, b)

（4）、like匹配

a LIKE s，等价函数：like(a, b)
a NOT LIKE s，等价函数：notLike(a, b)

（5）、区间比较
a BETWEEN b AND c，等价于： a >= b AND a <= c
a NOT BETWEEN b AND c ，等价于：a < b OR a > c 。

数据集操作符
a IN … ：等价函数 in(a, b)。
a NOT IN … ：等价函数 notIn(a, b)。
a GLOBAL IN … ：等价函数 globalIn(a, b)。
a GLOBAL NOT IN … ：等价函数 globalNotIn(a, b)。

2. 时间操作之EXTRACT

从给定的日期提取一个时间片段，支持Date和DateTime数据类型。例如可以从给定的日期检索月或者秒等。
使用toYear、toMonth等也可实现类似功能。

EXTRACT(part FROM date);
part参数指定要解析的时间片段名称，可选的片段名称如下：
DAY — 月份的日，Possible valaues: 1–31.
MONTH — 月份， Possible values: 1–12.
YEAR — 年.
SECOND —秒. Possible values: 0–59.
MINUTE — 分钟. Possible values: 0–59.
HOUR — 小时. Possible values: 0–23.

SELECT EXTRACT(DAY FROM toDate('2017-06-15'));
SELECT EXTRACT(MONTH FROM toDate('2017-06-15'));
SELECT EXTRACT(YEAR FROM toDate('2017-06-15'));

CREATE TABLE test.Orders
(
    OrderId UInt64,
    OrderName String,
    OrderDate DateTime
)
ENGINE = Log;

INSERT INTO test.Orders VALUES (1, 'Jarlsberg Cheese', toDateTime('2008-10-11 13:23:44'));

SELECT
    toYear(OrderDate) AS OrderYear,
    toMonth(OrderDate) AS OrderMonth,
    toDayOfMonth(OrderDate) AS OrderDay,
    toHour(OrderDate) AS OrderHour,
    toMinute(OrderDate) AS OrderMinute,
    toSecond(OrderDate) AS OrderSecond
FROM test.Orders;

┌─OrderYear─┬─OrderMonth─┬─OrderDay─┬─OrderHour─┬─OrderMinute─┬─OrderSecond─┐
│      2008 │         10 │       11 │        13 │          23 │          44 │
└───────────┴────────────┴──────────┴───────────┴─────────────┴─────────────┘

3. 时间操作之INTERVAL

在Date和DateTime数据类型的值使用Interval类型的值做算术运算操作。
Interval类型： - SECOND - MINUTE - HOUR - DAY - WEEK - MONTH - QUARTER - YEAR 。
不同类型的interval不能合并。例如不能使用表达式：“INTERVAL 4 DAY 1 HOUR”。
示例：

SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR
┌───current_date_time─┬─plus(plus(now(), toIntervalDay(4)), toIntervalHour(3))─┐
│ 2020-03-21 12:08:50 │                                    2020-03-25 15:08:50 │
└─────────────────────┴────────────────────────────────────────────────────────┘

4.运算符

  逻辑运算
逻辑非：NOT a ，等价于not(a) 。
逻辑与：a AND b， 等价于and(a, b)。
逻辑或： a OR b， 等价于or(a, b) 。

 三目运算符
元算法：
a ? b : c
等价函数：  if(a, b, c)
 如果b或C是arrayJoin（）函数，则无论“ a”条件如何，都将复制每一行。
select 1 as a, a>1?'a>1':'a<=1' as rs ;
┌─a─┬─rs───┐
│ 1 │ a<=1 │
└───┴──────┘

select 2 as a, a>1?'a>1':'a<=1' as rs ;
┌─a─┬─rs──┐
│ 2 │ a>1 │
└───┴─────┘

case when 表达式
CASE [x]
    WHEN a THEN b
    [WHEN ... THEN ...]
    [ELSE c]
END


SELECT case number when 3 then 111 when 5 then 222 when 7 then 333 else number end as new_number FROM system.numbers LIMIT 10;
┌─new_number─┐
│          0 │
│          1 │
│          2 │
│        111 │
│          4 │
│        222 │
│          6 │
│        333 │
│          8 │
│          9 │
└────────────┘


创建数组和元组
数组 [x1, ...]  等价函数： array(x1, ...)
元组 (x1, x2, ...)  等价函数： tuple(x2, x2, ...)

select (1,1,2);
select [1,2];

NULL判断
支持两种操作符： IS NULL 或 IS NOT NULL 。
（1）、IS NULL
对于 Nullable类型的值， 返回0或1。

（2）、IS NOT NULL
对于 Nullable类型的值， 返回0或1。

SELECT * FROM t_null WHERE y IS NOT NULL

13. 连接运算符
s1 || s2 等价函数： concat(s1, s2)
select 'id001'||':'||'football';
┌─concat('id001', ':', 'football')─┐
│ id001:football                   │
└──────────────────────────────────┘

Lara1111

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Clickhouse 简单SQL优化以及特殊数据类型使用场景

一. SQL 查询上的优化点1. 案例1利用好引擎表可以 PREWHER的优势select big.id, big.time, small.code, small.valuefrom smalltable small INNER JOIN bigtable big on small.id=big.idwhere big.time>‘2020-05-23’ and small.code in(‘a’,‘b’,‘c’);smalltable为TinyLog引擎表bigtable为MergeT
复制链接

扫一扫

专栏目录