clickhouse常用查询汇总(持续更新)

请直接看目录,查看自己感兴趣的条目

查看某个表的字段大小

SELECT
name,
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'table_name'
GROUP BY name
ORDER BY sum(data_compressed_bytes) DESC

其中:

name表示列名。
formatReadableSize函数用于将字节数转换为可读的文件大小格式。
sum(data_compressed_bytes)和sum(data_uncompressed_bytes)分别计算每列的压缩后与未压缩数据总大小。
ratio显示平均压缩比率(未压缩/压缩)并保留两位小数。
查询针对表table_name。
结果按压缩数据量降序排列。

查看某个表的总大小

SELECT
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'log'

在这里插入图片描述

查看某个库的总大小

SELECT
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE database = 'dbname'
compressed_size|uncompressed_size|ratio|
---------------+-----------------+-----+
55.07 GiB      |493.46 GiB       | 8.96|

查看每天的数据分区的总大小

select
partition,
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableSize(sum(data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) as data_size,
sum(data_uncompressed_bytes)/sum(data_compressed_bytes) asratio
from
`system`.parts
WHERE
`database` = 'dbname'
group by
`partition`
WITH CUBE
order by
`partition` ;

该函数的功能是从系统表system.parts中查询database为dbname的所有分区数据,并计算每个分区的磁盘大小、压缩后数据大小、未压缩数据大小以及压缩比。最后按照分区排序并输出结果。

partition	disk_size	compressed_size	data_size	asratio
202405	16.21 MiB	16.20 MiB	58.61 MiB	3.6177735124
20240516	22.10 MiB	22.09 MiB	76.78 MiB	3.4757203662
20240517	164.04 MiB	163.99 MiB	537.19 MiB	3.2758006692
20240518	159.22 MiB	159.17 MiB	511.80 MiB	3.215426161

查看每天平均增长速度

select formatReadableSize(avg(disk_size)) as disk_size, formatReadableSize(avg(compressed_size)) as compressed_size, formatReadableSize(avg(data_size)) as data_size, avg(ratio) as ratio  from (
select
    partition,
    sum(bytes_on_disk) as disk_size,
    sum(data_compressed_bytes) as compressed_size,
    sum(data_uncompressed_bytes) as data_size,
    sum(data_uncompressed_bytes)/ sum(data_compressed_bytes) as ratio
from
    `system`.parts
WHERE
    `database` = 'dbname'

group by
    `partition`
)

此查询多步完成:

  1. 从system.parts表筛选出database为dbname的数据。
  2. 按partition分组计算各分区的bytes_on_disk、data_compressed_bytes与data_uncompressed_bytes总和及压缩比。
  3. 使用formatReadableSize将平均磁盘大小、压缩后大小和数据大小转换为可读格式并展示。
disk_size|compressed_size|data_size|ratio             |
---------+---------------+---------+------------------+
7.62 GiB |7.62 GiB       |78.72 GiB|10.330222339146841|

查看所有分区的总大小

指定库:

select
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableSize(sum(data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) as data_size
from
`system`.parts
WHERE
`database` = 'db'
disk_size|compressed_size|data_size |
---------+---------------+----------+
50.02 GiB|50.01 GiB      |467.57 GiB|

全部:

select
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableSize(sum(data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) as data_size
from
`system`.parts

查看某一个分区的列的压缩比

select				
column,
formatReadableSize(sum(column_bytes_on_disk)) as disk_size,
formatReadableSize(sum(column_data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(column_data_uncompressed_bytes)) as data_size,
sum(column_data_uncompressed_bytes)/sum(column_data_compressed_bytes) as ratio
from
`system`.parts_columns
WHERE table = 'tablename' and `partition` = '20240610'
group by
`column` ORDER BY sum(column_bytes_on_disk) DESC
column	disk_size	compressed_size	data_size	ratio
url	989.29 MiB	989.23 MiB	4.73 GiB	4.8934305444
client_ip	122.92 MiB	122.87 MiB	234.32 MiB	1.9070819244
client_port	110.10 MiB	110.05 MiB	117.16 MiB	1.0646291515
...

慢查询检索和诊断

SELECT 
query_start_time, 
query_duration_ms / 1000 AS query_duration_seconds, 
query, 
user, 
client_hostname, 
client_name
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 DAY
AND type = 2  
AND query_duration_ms > 10000
ORDER BY query_duration_seconds DESC
LIMIT 50;

该SQL查询从system.query_log表中选择过去一天内(从现在起)查询时间超过10秒的最慢的50个查询。它显示查询开始时间、查询持续时间(以秒为单位)、查询语句、用户、客户端主机名和客户端名称。查询持续时间以毫秒为单位从源数据中提取并转换为秒。

query_start_time   |query_duration_seconds|query                                                                                       |user   |client_hostname      |client_name      |
-------------------+----------------------+--------------------------------------------------------------------------------------------+-------+---------------------+-----------------+
2024-05-28 16:33:03|               481.812|ALTER TABLE xxx MODIFY TTL time+ INTERVAL 6 MONTH DELETE;     |default|C20240518713622.local|ClickHouse client|
...

修改表的TTL

ALTER TABLE tablename MODIFY TTL TimeRangeStart  + INTERVAL 6 MONTH DELETE;  

查看分区TTL删除日志记录

SELECT *
FROM system.part_log
WHERE merge_reason = 'TTLDeleteMerge'

优化表(主动合并所有的part)

OPTIMIZE TABLE table_name;

使用With 将map转为key、value列

with arrayJoin(sumMap(c_Map)) as n
select  domain,hour, n.1 as ip, n.2 as count from aaha WHERE  `hour` >= toStartOfDay(now())  group by  domain, hour  

URL 处理

分割

SELECT
protocol('https://example.com:8080/path?query=1#fragment') AS protocol,
domain('https://example.com:8080/path?query=1#fragment') AS domain,
port('https://example.com:8080/path?query=1#fragment') AS port,
path('https://example.com:8080/path?query=1#fragment') AS path,
queryString('https://example.com:8080/path?query=1#fragment') AS queryString,
fragment('https://example.com:8080/path?query=1#fragment') AS fragment

输出:
protocol domain	port	path	queryString	fragment
https	example.com	8,080	/path	query=1	fragment
参数获取:
SELECT extractURLParameter('https://example.com/path?query=1&foo=bar', 'foo') AS foo_value

|foo_value|
|---------|
|bar      |

所有参数:
SELECT extractURLParameters('https://example.com/path?query=1&foo=bar') AS all_parameters

all_parameters       |
---------------------+
['query=1','foo=bar']|

移除某个参数:
SELECT cutURLParameter('https://example.com/path?query=1&foo=bar', 'foo') AS url_without_foo

url_without_foo                 |
--------------------------------+
https://example.com/path?query=1|

利用窗口查询,从累加数据计算出差异量

SELECT
	tablename.nid AS nid,
	tablename.record_time AS record_time,
	greatest(packets_in - lagInFrame(packets_in, 1,	packets_in) OVER wnd,	0) AS packets_in_diff,
	greatest(packets_out - lagInFrame(packets_out, 1,	packets_out) OVER wnd,	0) AS packets_out_diff,
FROM
	tablename
WHERE tablename.record_time >= 1718600000
	AND tablename.record_time < 1718678346
	AND 1 = 1 WINDOW wnd AS (PARTITION BY nid ORDER BY	record_time)
ORDER BY
	tablename.nid,
	tablename.record_time

此查询从tablename选择nid和record_time,并计算packets_in及packets_out相对于前一行的变化量,具体如下:

  • nid: 节点ID。 record_time: 记录时间。
  • packets_in_diff:当前行与前一行的packets_in差值,若差值小于0则取0
  • packets_out_diff:当前行与前一行的packets_out差值,若差值小于0则取0。
  • 窗口wnd按nid分组并按record_time排序。查询过滤record_time范围在1718600000至1718678346之间,并按nid和record_time排序。
  • lagInFrame是在窗口中的偏移量。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值