1.limit by 子句
Clickhouse的LIMIT子句运行于order by 之后和limit之前,能够按照指定分组最多返回齐前 N 行,常应用于TOP N的查询场景。
Limit by的语法:
LIMIT n BY express
示例:
查询每个数据库分区数的最多的前两名:
Clickhouse> select database,table,count(1) cnt from system.parts where database in ('datasets','system') group by database,table order by database,cnt desc limit 2 by database;
SELECT
database,
table,
count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY
database,
table
ORDER BY
database ASC,
cnt DESC
LIMIT 2 BY database
┌─database─┬─table──────┬─cnt─┐
│ datasets │ hits_v1 │ 1 │
│ datasets │ visits_v1 │ 1 │
│ system │ metric_log │ 134 │
│ system │ text_log │ 86 │
└──────────┴────────────┴─────┘
4 rows in set. Elapsed: 0.017 sec.
LIMIT BY 支持多个字段:
SELECT
database,
table,
count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY
database,
table
ORDER BY
database ASC,
cnt DESC
LIMIT 2 BY database,table
LIMIT BY 子句支持偏移量获取数据:
SELECT
database,
table,
count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY
database,
table
ORDER BY
database ASC,
cnt DESC
LIMIT 2 offset 1 BY database
简写如下:
SELECT
database,
table,
count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY
database,
table
ORDER BY
database ASC,
cnt DESC
LIMIT 1,2 BY database
2. limit子句用于返回指定的前n行数据,常用于分页数场景。
LIMIT子句的语法:
limit n
limit n offset m
limit m,n
取Top 10 行数据:
Clickhouse> select number from system.numbers limit 10;
SELECT number
FROM system.numbers
LIMIT 10
┌─number─┐
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└────────┘
10 rows in set. Elapsed: 0.005 sec.
取第五行到第15行数据:
Clickhouse> select number from system.numbers limit 10 offset 5;;
SELECT number
FROM system.numbers
LIMIT 5, 10
┌─number─┐
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
└────────┘
10 rows in set. Elapsed: 0.003 sec.
Clickhouse> select number from system.numbers limit 5,10;
limit子句可以和Limit by 一并使用:
SELECT
database,
table,
count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY
database,
table
ORDER BY
database ASC,
cnt DESC
LIMIT 2 BY database
LIMIT 3
┌─database─┬─table──────┬─cnt─┐
│ datasets │ hits_v1 │ 1 │
│ datasets │ visits_v1 │ 1 │
│ system │ metric_log │ 133 │
└──────────┴────────────┴─────┘
3 rows in set. Elapsed: 0.013 sec.
limit 3 表示总的取3行数据
LIMIT 2 BY database 表示每个数据库取top 2 的数据。
注意:若数据跨越了多个分区,在没有使用order by 指定全局顺序的情况下,每次limit查询所返回的数据可能不同。
若对数据的返回顺序敏感则应该搭配 order by 一同使用。
3.select 正则表达式
基于正则表达式的查询:
示例:查询会返回n字母开头和包含字母p 的列字段:
Clickhouse> select columns('^n'),columns('p') from system.databases;
SELECT
COLUMNS('^n'),
COLUMNS('p')
FROM system.databases
┌─name───────────────────────────┬─data_path──────────────────────────┬─metadata_path──────────────────────────┐
│ _temporary_and_external_tables │ /var/lib/clickhouse/ │ │
│ datasets │ /var/lib/clickhouse/data/datasets/ │ /var/lib/clickhouse/metadata/datasets/ │
│ default │ /var/lib/clickhouse/data/default/ │ /var/lib/clickhouse/metadata/default/ │
│ scott │ /var/lib/clickhouse/data/scott/ │ /var/lib/clickhouse/metadata/scott/ │
│ system │ /var/lib/clickhouse/data/system/ │ /var/lib/clickhouse/metadata/system/ │
└────────────────────────────────┴────────────────────────────────────┴────────────────────────────────────────┘
5 rows in set. Elapsed: 0.005 sec.
4.DISTINCT
DISTINCT子句用于去除重
Clickhouse> SELECT distinct arrayJoin([10,10, 20,20, 30,null,null]) AS a;
SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
┌────a─┐
│ 10 │
│ 20 │
│ 30 │
│ ᴺᵁᴸᴸ │
└──────┘
4 rows in set. Elapsed: 0.002 sec.
Clickhouse> SELECT arrayJoin([10,10, 20,20, 30,null,null]) AS a group by a;
SELECT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
GROUP BY a
┌────a─┐
│ 10 │
│ 20 │
│ ᴺᵁᴸᴸ │
│ 30 │
└──────┘
4 rows in set. Elapsed: 0.005 sec.
看下二者的执行计划:
Clickhouse> explain PLAN SELECT distinct arrayJoin([10,10, 20,20, 30,null,null]) AS a;
EXPLAIN
SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
FORMAT TSV
Union
Expression (Projection)
Distinct
Distinct (Preliminary DISTINCT)
Expression (Before ORDER BY and SELECT)
ReadFromStorage (Read from SystemOne)
6 rows in set. Elapsed: 0.004 sec.
Clickhouse> explain plan SELECT arrayJoin([10,10, 20,20, 30,null,null]) AS a group by a;
EXPLAIN
SELECT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
GROUP BY a
FORMAT TSV
Union
Expression (Projection)
Expression (Before ORDER BY and SELECT)
Aggregating
Expression (Before GROUP BY)
ReadFromStorage (Read from SystemOne)
6 rows in set. Elapsed: 0.002 sec.
DISTINCT 子句的执行计划更加简单,查询时间也更快一些。
DISTINCT子句可以和GROUP BY子句同时使用,能够互补。
若使用limit语句且没有order by子句,则distinct 在满足条件的时候能够迅速结束查询,能避免多余的处理逻辑;
若distinct和 order by 子句同时使用的时候其执行的优先级为先distinct则order by。
示例:
正向排序:
Clickhouse> SELECT distinct arrayJoin([10,10, 20,20, 30,null,null]) AS a order by a ASC;
SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
ORDER BY a ASC
┌────a─┐
│ 10 │
│ 20 │
│ 30 │
│ ᴺᵁᴸᴸ │
└──────┘
4 rows in set. Elapsed: 0.006 sec.
反向排序:
Clickhouse> SELECT distinct arrayJoin([10,10, 20,20, 30,null,null]) AS a order by a desc;
SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
ORDER BY a DESC
┌────a─┐
│ 30 │
│ 20 │
│ 10 │
│ ᴺᵁᴸᴸ │
└──────┘
4 rows in set. Elapsed: 0.008 sec.
可以观察到执行逻辑是先distinct后order by,对于NULL的值则按照null=null 这里的NULL不区分大小写。
复数据,group by子句也可以用于数据去除,二者使用广泛。
5.UNION ALL子句
目前clickhouse 只支持union ALL
Clickhouse> select 1,'wuhan' UNION ALL select 2.0,'shenzhen' FORMAT PrettyCompactMonoBlock ;
SELECT
1,
'wuhan'
UNION ALL
SELECT
2.,
'shenzhen'
FORMAT PrettyCompactMonoBlock
┌─1─┬─'wuhan'──┐
│ 2 │ shenzhen │
│ 1 │ wuhan │
└───┴──────────┘
2 rows in set. Elapsed: 0.012 sec.
其他的union distinct
except all
except distinct
intersect all
intersect distinct
暂时不支持语句,需要用嵌套或者连接语句查询。
union distinct 示例:
Clickhouse> select distinct id,name from (select 1 as id,'wuhan' as name union all select 1,'wuhan');
SELECT DISTINCT
id,
name
FROM
(
SELECT
1 AS id,
'wuhan' AS name
UNION ALL
SELECT
1,
'wuhan'
)
┌─id─┬─name──┐
│ 1 │ wuhan │
└────┴───────┘
1 rows in set. Elapsed: 0.013 sec.