ClickHouse SQL Execution Plans


ClickHouse (before version 20.6) does not provide an EXPLAIN statement, but you can approximate one through the server logs: set the log level to debug or trace, and the execution plan details appear in the query's log output.

clickhouse-client -h localhost --send_logs_level=trace <<<"SQL statement" >/dev/null
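The plan lines are server log records forwarded to the client on stderr, while query results go to stdout; redirecting stdout to /dev/null therefore discards the result set but keeps the plan visible. To save the plan for later study, redirect stderr to a file as well (plan.log below is an arbitrary name):

# clickhouse-client -h localhost --send_logs_level=trace <<<"SQL statement" >/dev/null 2>plan.log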

The examples below use the hits_v1 table from the official ClickHouse sample datasets. The table is defined as follows:

 
Clickhouse> show create table datasets.hits_v1\G

SHOW CREATE TABLE datasets.hits_v1

Row 1:
──────
statement: CREATE TABLE datasets.hits_v1
(
    `WatchID` UInt64,
    `JavaEnable` UInt8,
    `Title` String,
    `GoodEvent` Int16,
    `EventTime` DateTime,
    `EventDate` Date,
    `CounterID` UInt32,
    `ClientIP` UInt32,
    `ClientIP6` FixedString(16),
    `RegionID` UInt32,
    ......
    `RequestNum` UInt32,
    `RequestTry` UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192

1 rows in set. Elapsed: 0.009 sec.

Current part layout:

Clickhouse> select partition,name from system.parts where table='hits_v1' and active;

SELECT
    partition,
    name
FROM system.parts
WHERE (table = 'hits_v1') AND active

┌─partition─┬─name───────────┐
│ 201403    │ 201403_10_18_2 │
└───────────┴────────────────┘

1 rows in set. Elapsed: 0.008 sec.

All of the data has already been merged into a single part.
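This single large part is the product of background merges. If you ever need to force such a merge yourself, an OPTIMIZE ... FINAL statement asks the server to merge all active parts of the table (note that on a large table this is expensive):

Clickhouse> optimize table datasets.hits_v1 final;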

To recreate multiple parts for the demonstration, truncate the table first:

Clickhouse> truncate table hits_v1;

TRUNCATE TABLE hits_v1

Ok.

0 rows in set. Elapsed: 0.088 sec.

Then import the data:

# cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000
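Once the import finishes, a simple count is a quick sanity check; judging by the traces later in this post, it should report roughly 8.87 million rows:

Clickhouse> select count() from datasets.hits_v1;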

Check the table's parts again:

Clickhouse> select partition,name from system.parts where table='hits_v1' and active=1;

SELECT
    partition,
    name
FROM system.parts
WHERE (table = 'hits_v1') AND (active = 1)

┌─partition─┬─name───────────┐
│ 201403    │ 201403_19_24_1 │
│ 201403    │ 201403_25_30_1 │
│ 201403    │ 201403_31_31_0 │
│ 201403    │ 201403_32_32_0 │
│ 201403    │ 201403_33_33_0 │
│ 201403    │ 201403_34_34_0 │
│ 201403    │ 201403_35_35_0 │
│ 201403    │ 201403_36_36_0 │
│ 201403    │ 201403_37_37_0 │
│ 201403    │ 201403_38_38_0 │
│ 201403    │ 201403_39_39_0 │
│ 201403    │ 201403_40_40_0 │
└───────────┴────────────────┘

12 rows in set. Elapsed: 0.040 sec.

The data is now spread across 12 parts, all in partition 201403. (Background merges keep running, which is presumably why the traces below report only 6 active parts.)

1. Full table scan, all columns:

 
# clickhouse-client -h localhost --send_logs_level=trace <<<"select * from datasets.hits_v1" >/dev/null

[hadoop.example.com] 2020.07.14 14:31:26.747392 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
[hadoop.example.com] 2020.07.14 14:31:26.747460 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
[hadoop.example.com] 2020.07.14 14:31:26.747521 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Debug> datasets.hits_v1 (SelectExecutor): Selected 6 parts by date, 6 parts by key, 1086 marks to read from 6 ranges
[hadoop.example.com] 2020.07.14 14:31:26.751436 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8896512 rows with 4 streams
[hadoop.example.com] 2020.07.14 14:31:26.753085 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[hadoop.example.com] 2020.07.14 14:32:40.191365 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Information> executeQuery: Read 8873898 rows, 7.88 GiB in 73.449650215 sec., 120816 rows/sec., 109.85 MiB/sec.
[hadoop.example.com] 2020.07.14 14:32:40.191461 [ 15000 ] {ef471d40-0b3f-43f1-936e-0176907c0310} <Debug> MemoryTracker: Peak memory usage (for query): 528.56 MiB.

In this output, focus on the text after the severity tag:

<Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
<Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
<Debug> datasets.hits_v1 (SelectExecutor): Selected 6 parts by date, 6 parts by key, 1086 marks to read from 6 ranges
<Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8896512 rows with 4 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Information> executeQuery: Read 8873898 rows, 7.88 GiB in 73.449650215 sec., 120816 rows/sec., 109.85 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 528.56 MiB.

With no WHERE clause, neither the partition (MinMax) index nor the primary key can prune anything: all 6 parts are scanned, 8.87 million rows and 7.88 GiB are read, and peak memory reaches 528.56 MiB.
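The same statistics are also persisted in system.query_log, so the read volume and peak memory of past queries can be recovered after the fact. A minimal sketch, assuming query logging is enabled (it is by default):

Clickhouse> SELECT query, read_rows, formatReadableSize(read_bytes) AS read, formatReadableSize(memory_usage) AS mem FROM system.query_log WHERE type = 'QueryFinish' ORDER BY event_time DESC LIMIT 5;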

2. Full table scan, single column:

 

 
# clickhouse-client -h localhost --send_logs_level=trace <<<"select WatchID from datasets.hits_v1" >/dev/null

<Debug> executeQuery: (from [::1]:58630) select WatchID from datasets.hits_v1
<Trace> ContextAccess (default): Access granted: SELECT(WatchID) ON datasets.hits_v1
<Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
<Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
<Debug> datasets.hits_v1 (SelectExecutor): Selected 6 parts by date, 6 parts by key, 1086 marks to read from 6 ranges
<Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8896512 rows with 4 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Information> executeQuery: Read 8873898 rows, 67.70 MiB in 0.568247033 sec., 15616268 rows/sec., 119.14 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 18.90 MiB.

Note the last two lines: the same 8,873,898 rows are read, but the data volume drops from 7.88 GiB to 67.70 MiB, and peak memory falls from 528.56 MiB to 18.90 MiB.
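This is the column-oriented storage model at work: the query touches only the WatchID column on disk. You can check how little space a single column occupies through system.columns (a quick sketch; the sizes depend on your dataset):

Clickhouse> select name, formatReadableSize(data_compressed_bytes) as compressed from system.columns where database='datasets' and table='hits_v1' and name='WatchID';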

3. Using the partition index:

 
# clickhouse-client -h localhost --send_logs_level=trace <<<"select WatchID from datasets.hits_v1 where EventDate='2014-03-18'" >/dev/null

<Debug> executeQuery: (from [::1]:58632) select WatchID from datasets.hits_v1 where EventDate='2014-03-18'
<Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "EventDate = '2014-03-18'" moved to PREWHERE
<Trace> ContextAccess (default): Access granted: SELECT(WatchID, EventDate) ON datasets.hits_v1
<Debug> datasets.hits_v1 (SelectExecutor): Key condition: (column 1 in [16147, 16147])
<Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: (column 0 in [16147, 16147])
<Debug> datasets.hits_v1 (SelectExecutor): Selected 6 parts by date, 6 parts by key, 853 marks to read from 70 ranges
<Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 6987776 rows with 4 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Information> executeQuery: Read 6965162 rows, 65.35 MiB in 0.202466468 sec., 34401558 rows/sec., 322.75 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 27.02 MiB.

Three things to note:

1. The WHERE clause is optimized into a PREWHERE clause.
2. The partition (MinMax) index is engaged: MinMax index condition: (column 0 in [16147, 16147]). All 6 parts still match, because every part belongs to partition 201403.
3. Since the first primary key column (CounterID) is unconstrained, the primary index prunes only weakly: the query still reads 853 marks across 70 ranges within those 6 parts ("853 marks to read from 70 ranges").
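As an aside, the 16147 in the key condition is simply EventDate in its internal representation, the number of days since 1970-01-01; toDate can confirm the mapping, since it interprets small integers as day numbers:

Clickhouse> select toDate(16147);

2014-03-18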

4. Using the primary key index:

 
# clickhouse-client -h localhost --send_logs_level=trace <<<"select WatchID from datasets.hits_v1 where EventDate='2014-03-18' and CounterID=67141" >/dev/null

<Debug> executeQuery: (from [::1]:58634) select WatchID from datasets.hits_v1 where EventDate='2014-03-18' and CounterID=67141
<Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "EventDate = '2014-03-18'" moved to PREWHERE
<Trace> ContextAccess (default): Access granted: SELECT(WatchID, EventDate, CounterID) ON datasets.hits_v1
<Debug> datasets.hits_v1 (SelectExecutor): Key condition: (column 1 in [16147, 16147]), (column 0 in [67141, 67141]), and, (column 1 in [16147, 16147]), and
<Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: (column 0 in [16147, 16147]), unknown, and, (column 0 in [16147, 16147]), and
<Debug> datasets.hits_v1 (SelectExecutor): Selected 6 parts by date, 6 parts by key, 6 marks to read from 6 ranges
<Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 49152 rows with 4 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Information> executeQuery: Read 49152 rows, 598.09 KiB in 0.033530217 sec., 1465901 rows/sec., 17.42 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

The primary key index is engaged:
Key condition: (column 1 in [16147, 16147]), (column 0 in [67141, 67141]), and, (column 1 in [16147, 16147]), and

MarkRanges scanned:
Selected 6 parts by date, 6 parts by key, 6 marks to read from 6 ranges

Estimated rows read into memory:
Reading approx. 49152 rows with 4 streams
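The estimate follows directly from the mark count: with index_granularity = 8192, each mark covers 8192 rows, so 6 marks amount to at most 6 × 8192 = 49152 rows, exactly the approximation reported in the log.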

Summary:

 
  1. 1.将Clickhouse的服务日志设置为debug或者trace级别 可以实现查看执行计划

  2. 2.需要真正执行了SQL查询,clickhouse才能打印执行计划的日志,若表的数据很大需要借助limit减少返回的数据量

  3. 3.禁用select * 全字段查询语句

  4. 4.尽可能利用各种索引(分区索引,一级索引,二级索引)可以避免全表扫描。
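For completeness: ClickHouse 20.6 introduced a native EXPLAIN statement, so on newer servers the plan can be inspected directly, without the log trick and without executing the full query:

Clickhouse> explain select WatchID from datasets.hits_v1 where EventDate='2014-03-18' and CounterID=67141;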
