请直接看目录,查看自己感兴趣的条目
查看某个表的字段大小
SELECT
name,
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'table_name'
GROUP BY name
ORDER BY sum(data_compressed_bytes) DESC
其中:
name表示列名。
formatReadableSize函数用于将字节数转换为可读的文件大小格式。
sum(data_compressed_bytes)和sum(data_uncompressed_bytes)分别计算每列的压缩后与未压缩数据总大小。
ratio显示平均压缩比率(未压缩/压缩)并保留两位小数。
查询针对表table_name。
结果按压缩数据量降序排列。
查看某个表的总大小
SELECT
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'log'
查看某个库的总大小
SELECT
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE database = 'dbname'
compressed_size|uncompressed_size|ratio|
---------------+-----------------+-----+
55.07 GiB |493.46 GiB | 8.96|
查看每天的数据分区的总大小
select
partition,
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableSize(sum(data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) as data_size,
sum(data_uncompressed_bytes)/sum(data_compressed_bytes) asratio
from
`system`.parts
WHERE
`database` = 'dbname'
group by
`partition`
WITH CUBE
order by
`partition` ;
该函数的功能是从系统表system.parts中查询database为dbname的所有分区数据,并计算每个分区的磁盘大小、压缩后数据大小、未压缩数据大小以及压缩比。最后按照分区排序并输出结果。
partition disk_size compressed_size data_size asratio
202405 16.21 MiB 16.20 MiB 58.61 MiB 3.6177735124
20240516 22.10 MiB 22.09 MiB 76.78 MiB 3.4757203662
20240517 164.04 MiB 163.99 MiB 537.19 MiB 3.2758006692
20240518 159.22 MiB 159.17 MiB 511.80 MiB 3.215426161
查看每天平均增长速度
select formatReadableSize(avg(disk_size)) as disk_size, formatReadableSize(avg(compressed_size)) as compressed_size, formatReadableSize(avg(data_size)) as data_size, avg(ratio) as ratio from (
select
partition,
sum(bytes_on_disk) as disk_size,
sum(data_compressed_bytes) as compressed_size,
sum(data_uncompressed_bytes) as data_size,
sum(data_uncompressed_bytes)/ sum(data_compressed_bytes) as ratio
from
`system`.parts
WHERE
`database` = 'dbname'
group by
`partition`
)
此查询多步完成:
- 从system.parts表筛选出database为dbname的数据。
- 按partition分组计算各分区的bytes_on_disk、data_compressed_bytes与data_uncompressed_bytes总和及压缩比。
- 使用formatReadableSize将平均磁盘大小、压缩后大小和数据大小转换为可读格式并展示。
disk_size|compressed_size|data_size|ratio |
---------+---------------+---------+------------------+
7.62 GiB |7.62 GiB |78.72 GiB|10.330222339146841|
查看所有分区的总大小
指定库:
select
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableSize(sum(data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) as data_size
from
`system`.parts
WHERE
`database` = 'db'
disk_size|compressed_size|data_size |
---------+---------------+----------+
50.02 GiB|50.01 GiB |467.57 GiB|
全部:
select
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableSize(sum(data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) as data_size
from
`system`.parts
查看某一个分区的列的压缩比
select
column,
formatReadableSize(sum(column_bytes_on_disk)) as disk_size,
formatReadableSize(sum(column_data_compressed_bytes)) as compressed_size,
formatReadableSize(sum(column_data_uncompressed_bytes)) as data_size,
sum(column_data_uncompressed_bytes)/sum(column_data_compressed_bytes) as ratio
from
`system`.parts_columns
WHERE table = 'tablename' and `partition` = '20240610'
group by
`column` ORDER BY sum(column_bytes_on_disk) DESC
column disk_size compressed_size data_size ratio
url 989.29 MiB 989.23 MiB 4.73 GiB 4.8934305444
client_ip 122.92 MiB 122.87 MiB 234.32 MiB 1.9070819244
client_port 110.10 MiB 110.05 MiB 117.16 MiB 1.0646291515
...
慢查询检索和诊断
SELECT
query_start_time,
query_duration_ms / 1000 AS query_duration_seconds,
query,
user,
client_hostname,
client_name
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 DAY
AND type = 2
AND query_duration_ms > 10000
ORDER BY query_duration_seconds DESC
LIMIT 50;
该SQL查询从system.query_log表中选择过去一天内(从现在起)查询时间超过10秒的最慢的50个查询。它显示查询开始时间、查询持续时间(以秒为单位)、查询语句、用户、客户端主机名和客户端名称。查询持续时间以毫秒为单位从源数据中提取并转换为秒。
query_start_time |query_duration_seconds|query |user |client_hostname |client_name |
-------------------+----------------------+--------------------------------------------------------------------------------------------+-------+---------------------+-----------------+
2024-05-28 16:33:03| 481.812|ALTER TABLE xxx MODIFY TTL time+ INTERVAL 6 MONTH DELETE; |default|C20240518713622.local|ClickHouse client|
...
修改表的TTL
ALTER TABLE tablename MODIFY TTL TimeRangeStart + INTERVAL 6 MONTH DELETE;
查看分区TTL删除日志记录
SELECT *
FROM system.part_log
WHERE merge_reason = 'TTLDeleteMerge'
优化表(主动合并所有的part)
OPTIMIZE TABLE table_name;
使用With 将map转为key、value列
with arrayJoin(sumMap(c_Map)) as n
select domain,hour, n.1 as ip, n.2 as count from aaha WHERE `hour` >= toStartOfDay(now()) group by domain, hour
URL 处理
分割
SELECT
protocol('https://example.com:8080/path?query=1#fragment') AS protocol,
domain('https://example.com:8080/path?query=1#fragment') AS domain,
port('https://example.com:8080/path?query=1#fragment') AS port,
path('https://example.com:8080/path?query=1#fragment') AS path,
queryString('https://example.com:8080/path?query=1#fragment') AS queryString,
fragment('https://example.com:8080/path?query=1#fragment') AS fragment
输出:
protocol domain port path queryString fragment
https example.com 8,080 /path query=1 fragment
参数获取:
SELECT extractURLParameter('https://example.com/path?query=1&foo=bar', 'foo') AS foo_value
|foo_value|
|---------|
|bar |
所有参数:
SELECT extractURLParameters('https://example.com/path?query=1&foo=bar') AS all_parameters
all_parameters |
---------------------+
['query=1','foo=bar']|
移除某个参数:
SELECT cutURLParameter('https://example.com/path?query=1&foo=bar', 'foo') AS url_without_foo
url_without_foo |
--------------------------------+
https://example.com/path?query=1|
利用窗口查询,从累加数据计算出差异量
SELECT
tablename.nid AS nid,
tablename.record_time AS record_time,
greatest(packets_in - lagInFrame(packets_in, 1, packets_in) OVER wnd, 0) AS packets_in_diff,
greatest(packets_out - lagInFrame(packets_out, 1, packets_out) OVER wnd, 0) AS packets_out_diff,
FROM
tablename
WHERE tablename.record_time >= 1718600000
AND tablename.record_time < 1718678346
AND 1 = 1 WINDOW wnd AS (PARTITION BY nid ORDER BY record_time)
ORDER BY
tablename.nid,
tablename.record_time
此查询从tablename选择nid和record_time,并计算packets_in及packets_out相对于前一行的变化量,具体如下:
- nid: 节点ID。 record_time: 记录时间。
- packets_in_diff:当前行与前一行的packets_in差值,若差值小于0则取0
- packets_out_diff:当前行与前一行的packets_out差值,若差值小于0则取0。
- 窗口wnd按nid分组并按record_time排序。查询过滤record_time范围在1718600000至1718678346之间,并按nid和record_time排序。
- lagInFrame是在窗口中的偏移量。