Clickhouse 查询相关的其他子句

1.limit by 子句

Clickhouse的LIMIT子句运行于order by 之后和limit之前,能够按照指定分组最多返回齐前 N 行,常应用于TOP N的查询场景。
Limit by的语法:
LIMIT n BY express

示例:
查询每个数据库分区数的最多的前两名:

Clickhouse> select database,table,count(1) cnt from system.parts where database in ('datasets','system') group by database,table order by database,cnt desc limit 2 by database; 

SELECT 
    database,
    table,
    count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY 
    database,
    table
ORDER BY 
    database ASC,
    cnt DESC
LIMIT 2 BY database

┌─database─┬─table──────┬─cnt─┐
│ datasets │ hits_v1    │   1 │
│ datasets │ visits_v1  │   1 │
│ system   │ metric_log │ 134 │
│ system   │ text_log   │  86 │
└──────────┴────────────┴─────┘

4 rows in set. Elapsed: 0.017 sec. 


LIMIT BY 支持多个字段:
SELECT 
    database,
    table,
    count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY 
    database,
    table
ORDER BY 
    database ASC,
    cnt DESC
LIMIT 2 BY database,table

LIMIT BY 子句支持偏移量获取数据:

SELECT 
    database,
    table,
    count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY 
    database,
    table
ORDER BY 
    database ASC,
    cnt DESC
LIMIT 2 offset 1 BY database

简写如下:

SELECT 
    database,
    table,
    count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY 
    database,
    table
ORDER BY 
    database ASC,
    cnt DESC
LIMIT 1,2 BY database

2. limit子句用于返回指定的前n行数据,常用于分页数场景。

LIMIT子句的语法:
limit n
limit n offset m
limit m,n

取Top 10 行数据:

Clickhouse> select number from system.numbers limit 10;

SELECT number
FROM system.numbers
LIMIT 10

┌─number─┐
│      0 │
│      1 │
│      2 │
│      3 │
│      4 │
│      5 │
│      6 │
│      7 │
│      8 │
│      9 │
└────────┘

10 rows in set. Elapsed: 0.005 sec. 

取第五行到第15行数据:

Clickhouse> select number from system.numbers limit 10 offset 5;;

SELECT number
FROM system.numbers
LIMIT 5, 10

┌─number─┐
│      5 │
│      6 │
│      7 │
│      8 │
│      9 │
│     10 │
│     11 │
│     12 │
│     13 │
│     14 │
└────────┘

10 rows in set. Elapsed: 0.003 sec. 

Clickhouse> select number from system.numbers limit 5,10;


limit子句可以和Limit by 一并使用:
SELECT 
    database,
    table,
    count(1) AS cnt
FROM system.parts
WHERE database IN ('datasets', 'system')
GROUP BY 
    database,
    table
ORDER BY 
    database ASC,
    cnt DESC
LIMIT 2 BY database
LIMIT 3

┌─database─┬─table──────┬─cnt─┐
│ datasets │ hits_v1    │   1 │
│ datasets │ visits_v1  │   1 │
│ system   │ metric_log │ 133 │
└──────────┴────────────┴─────┘

3 rows in set. Elapsed: 0.013 sec. 

limit 3 表示总的取3行数据
LIMIT 2 BY database 表示每个数据库取top 2 的数据。

注意:若数据跨越了多个分区,在没有使用order by 指定全局顺序的情况下,每次limit查询所返回的数据可能不同。
若对数据的返回顺序敏感则应该搭配 order by 一同使用。

 

3.select 正则表达式

基于正则表达式的查询:
示例:查询会返回n字母开头和包含字母p 的列字段:

Clickhouse> select columns('^n'),columns('p') from system.databases;

SELECT 
    COLUMNS('^n'),
    COLUMNS('p')
FROM system.databases

┌─name───────────────────────────┬─data_path──────────────────────────┬─metadata_path──────────────────────────┐
│ _temporary_and_external_tables │ /var/lib/clickhouse/               │                                        │
│ datasets                       │ /var/lib/clickhouse/data/datasets/ │ /var/lib/clickhouse/metadata/datasets/ │
│ default                        │ /var/lib/clickhouse/data/default/  │ /var/lib/clickhouse/metadata/default/  │
│ scott                          │ /var/lib/clickhouse/data/scott/    │ /var/lib/clickhouse/metadata/scott/    │
│ system                         │ /var/lib/clickhouse/data/system/   │ /var/lib/clickhouse/metadata/system/   │
└────────────────────────────────┴────────────────────────────────────┴────────────────────────────────────────┘

5 rows in set. Elapsed: 0.005 sec. 

4.DISTINCT

DISTINCT子句用于去除重

Clickhouse> SELECT   distinct   arrayJoin([10,10, 20,20, 30,null,null]) AS a;

SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a

┌────a─┐
│   10 │
│   20 │
│   30 │
│ ᴺᵁᴸᴸ │
└──────┘

4 rows in set. Elapsed: 0.002 sec. 

Clickhouse> SELECT     arrayJoin([10,10, 20,20, 30,null,null]) AS a group by a;

SELECT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
GROUP BY a

┌────a─┐
│   10 │
│   20 │
│ ᴺᵁᴸᴸ │
│   30 │
└──────┘

4 rows in set. Elapsed: 0.005 sec. 


看下二者的执行计划:

Clickhouse> explain PLAN SELECT   distinct   arrayJoin([10,10, 20,20, 30,null,null]) AS a;

EXPLAIN
SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
FORMAT TSV

Union
  Expression (Projection)
    Distinct
      Distinct (Preliminary DISTINCT)
        Expression (Before ORDER BY and SELECT)
          ReadFromStorage (Read from SystemOne)

6 rows in set. Elapsed: 0.004 sec. 

Clickhouse> explain plan SELECT     arrayJoin([10,10, 20,20, 30,null,null]) AS a group by a;

EXPLAIN
SELECT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
GROUP BY a
FORMAT TSV

Union
  Expression (Projection)
    Expression (Before ORDER BY and SELECT)
      Aggregating
        Expression (Before GROUP BY)
          ReadFromStorage (Read from SystemOne)

6 rows in set. Elapsed: 0.002 sec. 


DISTINCT 子句的执行计划更加简单,查询时间也更快一些。

DISTINCT子句可以和GROUP BY子句同时使用,能够互补。

若使用limit语句且没有order by子句,则distinct 在满足条件的时候能够迅速结束查询,能避免多余的处理逻辑;
若distinct和 order by 子句同时使用的时候其执行的优先级为先distinct则order by。

示例:
 正向排序:
Clickhouse> SELECT   distinct   arrayJoin([10,10, 20,20, 30,null,null]) AS a order by a ASC;

SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
ORDER BY a ASC

┌────a─┐
│   10 │
│   20 │
│   30 │
│ ᴺᵁᴸᴸ │
└──────┘

4 rows in set. Elapsed: 0.006 sec. 
反向排序:
Clickhouse> SELECT   distinct   arrayJoin([10,10, 20,20, 30,null,null]) AS a order by a desc;

SELECT DISTINCT arrayJoin([10, 10, 20, 20, 30, NULL, NULL]) AS a
ORDER BY a DESC

┌────a─┐
│   30 │
│   20 │
│   10 │
│ ᴺᵁᴸᴸ │
└──────┘

4 rows in set. Elapsed: 0.008 sec. 

可以观察到执行逻辑是先distinct后order by,对于NULL的值则按照null=null 这里的NULL不区分大小写。

复数据,group by子句也可以用于数据去除,二者使用广泛。

 

5.UNION ALL子句

目前clickhouse 只支持union ALL
Clickhouse> select 1,'wuhan' UNION ALL select 2.0,'shenzhen' FORMAT PrettyCompactMonoBlock ;

SELECT 
    1,
    'wuhan'
UNION ALL
SELECT 
    2.,
    'shenzhen'
FORMAT PrettyCompactMonoBlock

┌─1─┬─'wuhan'──┐
│ 2 │ shenzhen │
│ 1 │ wuhan    │
└───┴──────────┘

2 rows in set. Elapsed: 0.012 sec. 

其他的union distinct
except all
except distinct
intersect all
intersect distinct
暂时不支持语句,需要用嵌套或者连接语句查询。

union distinct 示例:

Clickhouse> select distinct id,name from (select 1 as id,'wuhan' as name union all select 1,'wuhan');

SELECT DISTINCT 
    id,
    name
FROM 
(
    SELECT 
        1 AS id,
        'wuhan' AS name
    UNION ALL
    SELECT 
        1,
        'wuhan'
)

┌─id─┬─name──┐
│  1 │ wuhan │
└────┴───────┘

1 rows in set. Elapsed: 0.013 sec. 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值