ClickHouse研究

目录

 

一、介绍

二、特点

三、表引擎介绍

3.1. Log系列

3.2. Integration系列

3.3. Special系列

3.4. MergeTree系列

3.4.1. MergeTree

3.4.2. ReplacingMergeTree

3.4.3. CollapsingMergeTree

3.4.4. VersionedCollapsingMergeTree

3.4.5. SummingMergeTree

3.4.6. AggregatingMergeTree

四、安装

4.1. 系统要求

4.2. RPM安装包

4.3. 配置文件

4.3.1. 覆写

4.3.2. 替代

4.3.3. 用户设置

4.3.4. 服务器设置

4.3.5. 简化版配置文件


 

一、介绍

ClickHouse 是俄罗斯的 Yandex 于 2016 年开源的用于在线分析处理查询(OLAP :Online Analytical Processing)MPP架构列式存储数据库(DBMS:Database Management System),能够使用 SQL 查询实时生成分析数据报告。ClickHouse的全称是Click Stream,Data WareHouse。

二、特点

优势·:

  • 基于shard+replica实现的线性扩展和高可靠
  • 采用列式存储,数据类型一致,压缩性能更高
  • 硬件利用率高,连续IO,提高了磁盘驱动器的效率
  • 向量化引擎与SIMD提高了CPU利用率,多核多节点并行化大查询

不足:

  • 不支持事务、异步删除与更新
  • 不适用高并发场景

三、表引擎介绍

一共分为四个系列,分别是Log、MergeTree、Integration、Special;下图是ClickHouse提供的所有表引擎汇总。

format,png

3.1. Log系列

Log系列表引擎功能相对简单,主要用于快速写入小表(1百万行左右的表),然后全部读出的场景。

几种Log表引擎的共性是:

  • 数据被顺序append写到磁盘上;
  • 不支持delete、update;
  • 不支持index;
  • 不支持原子性写;
  • insert会阻塞select操作。

它们彼此之间的区别是:

  • TinyLog:不支持并发读取数据文件,查询性能较差;格式简单,适合用来暂存中间数据;
  • StripLog:支持并发读取数据文件,查询性能比TinyLog好;将所有列存储在同一个大文件中,减少了文件个数;
  • Log:支持并发读取数据文件,查询性能比TinyLog好;每个列会单独存储在一个独立文件中。

3.2. Integration系列

该系统表引擎主要用于将外部数据导入到ClickHouse中,或者在ClickHouse中直接操作外部数据源。

  • Kafka:将Kafka Topic中的数据直接导入到ClickHouse;
  • MySQL:将Mysql作为存储引擎,直接在ClickHouse中对MySQL表进行select等操作;
  • JDBC/ODBC:通过指定jdbc、odbc连接串读取数据源;
  • HDFS:直接读取HDFS上的特定格式的数据文件;

3.3. Special系列

Special系列的表引擎,大多是为了特定场景而定制的。这里也挑选几个简单介绍,不做详述。

  • Memory:将数据存储在内存中,重启后会导致数据丢失。查询性能极好,适合于对于数据持久性没有要求的1亿一下的小表。在ClickHouse中,通常用来做临时表。
  • Buffer:为目标表设置一个内存buffer,当buffer达到了一定条件之后会flush到磁盘。
  • File:直接将本地文件作为数据存储;
  • Null:写入数据被丢弃、读取数据为空;

3.4. MergeTree系列

Log、Special、Integration主要用于特殊用途,场景相对有限。MergeTree系列才是官方主推的存储引擎,支持几乎所有ClickHouse核心功能。

以下重点介绍MergeTree、ReplacingMergeTree、CollapsingMergeTree、VersionedCollapsingMergeTree、SummingMergeTree、AggregatingMergeTree引擎。

3.4.1. MergeTree

MergeTree表引擎主要用于海量数据分析,支持数据分区、存储有序、主键索引、稀疏索引、数据TTL等。MergeTree支持所有ClickHouse SQL语法,但是有些功能与MySQL并不一致,比如在MergeTree中主键并不用于去重,以下通过示例说明。

如下建表DDL所示,test_tbl的主键为(id, create_time),并且按照主键进行存储排序,按照create_time进行数据分区,数据保留最近一个月。

CREATE TABLE test_tbl (
  id UInt16,
  create_time Date,
  comment Nullable(String)
) ENGINE = MergeTree()
PARTITION BY create_time
ORDER BY  (id, create_time)
PRIMARY KEY (id, create_time)
TTL create_time + INTERVAL 1 MONTH
SETTINGS index_granularity=8192;

写入数据:值得注意的是这里我们写入了几条primary key相同的数据。

insert into test_tbl values(0, '2019-12-12', null);
insert into test_tbl values(0, '2019-12-12', null);
insert into test_tbl values(1, '2019-12-13', null);
insert into test_tbl values(1, '2019-12-13', null);
insert into test_tbl values(2, '2019-12-14', null);

查询数据: 可以看到虽然主键id、create_time相同的数据只有3条数据,但是结果却有5行。

select count(*) from test_tbl;
┌─count()─┐
│       5 │
└─────────┘

select * from test_tbl;
┌─id─┬─create_time─┬─comment─┐
│  2 │  2019-12-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

由于MergeTree采用类似LSM tree的结构,很多存储层处理逻辑直到Compaction期间才会发生。因此强制后台compaction执行完毕,再次查询,发现仍旧有5条数据。

optimize table test_tbl final;


select count(*) from test_tbl;
┌─count()─┐
│       5 │
└─────────┘

select * from test_tbl;
┌─id─┬─create_time─┬─comment─┐
│  2 │  2019-12-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

结合以上示例可以看到,MergeTree虽然有主键索引,但是其主要作用是加速查询,而不是类似MySQL等数据库用来保持记录唯一。即便在Compaction完成后,主键相同的数据行也仍旧共同存在。

3.4.2. ReplacingMergeTree

为了解决MergeTree相同主键无法去重的问题,ClickHouse提供了ReplacingMergeTree引擎,用来做去重。

示例如下:

CREATE TABLE test_tbl_replacing (
  id UInt16,
  create_time Date,
  comment Nullable(String)
) ENGINE = ReplacingMergeTree()
   PARTITION BY create_time
     ORDER BY  (id, create_time)
     PRIMARY KEY (id, create_time)
     TTL create_time + INTERVAL 1 MONTH
     SETTINGS index_granularity=8192;

insert into test_tbl_replacing values(0, '2019-12-12', null);
insert into test_tbl_replacing values(0, '2019-12-12', null);
insert into test_tbl_replacing values(1, '2019-12-13', null);
insert into test_tbl_replacing values(1, '2019-12-13', null);
insert into test_tbl_replacing values(2, '2019-12-14', null);

#未执行去重,就去查询
select count(*) from test_tbl_replacing;
┌─count()─┐
│       5 │
└─────────┘

select * from test_tbl_replacing;
┌─id─┬─create_time─┬─comment─┐
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  2 │  2019-12-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

#执行去重
optimize table test_tbl_replacing final;

select count(*) from test_tbl_replacing;
┌─count()─┐
│       3 │
└─────────┘

select * from test_tbl_replacing;
┌─id─┬─create_time─┬─comment─┐
│  2 │  2019-12-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2019-12-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

虽然ReplacingMergeTree提供了主键去重的能力,但是仍旧有以下限制:

  • 在没有彻底optimize之前,可能无法达到主键去重的效果,比如部分数据已经被去重,而另外一部分数据仍旧有主键重复;
  • 在分布式场景下,相同primary key的数据可能被sharding到不同节点上,不同shard间可能无法去重;
  • optimize是后台动作,无法预测具体执行时间点;
  • 手动执行optimize在海量数据场景下要消耗大量时间,无法满足业务即时查询的需求;

因此ReplacingMergeTree更多被用于确保数据最终被去重,而无法保证查询过程中主键不重复。

3.4.3. CollapsingMergeTree

ClickHouse实现了CollapsingMergeTree来消除ReplacingMergeTree的限制。该引擎要求在建表语句中指定一个标记列Sign,后台Compaction时会将主键相同、Sign相反的行进行折叠,也即删除。

CollapsingMergeTree将行按照Sign的值分为两类:Sign=1的行称之为状态行,Sign=-1的行称之为取消行。

每次需要新增状态时,写入一行状态行;需要删除状态时,则写入一行取消行。

在后台Compaction时,状态行与取消行会自动做折叠(删除)处理。而尚未进行Compaction的数据,状态行与取消行同时存在。

因此为了能够达到主键折叠(删除)的目的,需要业务层进行适当改造:

1) 执行删除操作需要写入取消行,而取消行中需要包含与原始状态行一样的数据(Sign列除外)。所以在应用层需要记录原始状态行的值,或者在执行删除操作前先查询数据库获取原始状态行;

2)由于后台Compaction时机无法预测,在发起查询时,状态行和取消行可能尚未被折叠;另外,ClickHouse无法保证primary key相同的行落在同一个节点上,不在同一节点上的数据无法折叠。因此在进行count(*)、sum(col)等聚合计算时,可能会存在数据冗余的情况。为了获得正确结果,业务层需要改写SQL,将count()、sum(col)分别改写为sum(Sign)、sum(col * Sign)。

以下用示例说明:

CREATE TABLE UAct
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

INSERT INTO UAct VALUES (4324182021466249494, 5, 146, 1);

INSERT INTO UAct VALUES (4324182021466249494, 5, 146, -1), (4324182021466249494, 6, 185, 1);

#未Compaction,就进行查询
SELECT * FROM UAct;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │   -1 │
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘

SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration
FROM UAct
GROUP BY UserID
HAVING sum(Sign) > 0;
┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘

#Compaction后,进行查询
optimize table UAct final;

select * from UAct;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘

CollapsingMergeTree虽然解决了主键相同的数据即时删除的问题,但是状态持续变化且多线程并行写入情况下,状态行与取消行位置可能乱序,导致无法正常折叠。

如下面例子所示:

CREATE TABLE UAct_order
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

#插入顺序乱的
INSERT INTO UAct_order VALUES (4324182021466249495, 5, 146, -1);
INSERT INTO UAct_order VALUES (4324182021466249495, 5, 146, 1);


optimize table UAct_order final;

select * from UAct_order;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249495 │         5 │      146 │   -1 │
│ 4324182021466249495 │         5 │      146 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘

3.4.4. VersionedCollapsingMergeTree

为了解决CollapsingMergeTree乱序写入情况下无法正常折叠问题,VersionedCollapsingMergeTree表引擎在建表语句中新增了一列Version,用于在乱序情况下记录状态行与取消行的对应关系。主键相同,且Version相同、Sign相反的行,在Compaction时会被删除。

与CollapsingMergeTree类似, 为了获得正确结果,业务层需要改写SQL,将count()、sum(col)分别改写为sum(Sign)、sum(col * Sign)。

示例如下:

CREATE TABLE UAct_version
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8,
    Version UInt8
)
ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY UserID;

#插入顺序乱的
INSERT INTO UAct_version VALUES (4324182021466249494, 5, 146, -1, 1);

INSERT INTO UAct_version VALUES (4324182021466249494, 5, 146, 1, 1),(4324182021466249494, 6, 185, 1, 2);


SELECT * FROM UAct_version;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │   -1 │
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘


SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration
FROM UAct_version
GROUP BY UserID
HAVING sum(Sign) > 0;
┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘



optimize table UAct_version final;

select * from UAct_version;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┬─Version─┐
│ 4324182021466249494 │         6 │      185 │    1 │       2 │
└─────────────────────┴───────────┴──────────┴──────┴─────────┘

3.4.5. SummingMergeTree

ClickHouse通过SummingMergeTree来支持对主键列进行预先聚合。在后台Compaction时,会将主键相同的多行进行sum求和,然后使用一行数据取而代之,从而大幅度降低存储空间占用,提升聚合计算性能。

值得注意的是:

  • ClickHouse只在后台Compaction时才会进行数据的预先聚合,而compaction的执行时机无法预测,所以可能存在部分数据已经被预先聚合、部分数据尚未被聚合的情况。因此,在执行聚合计算时,SQL中仍需要使用GROUP BY子句。
  • 在预先聚合时,ClickHouse会对主键列之外的其他所有列进行预聚合。如果这些列是可聚合的(比如数值类型),则直接sum;如果不可聚合(比如String类型),则随机选择一个值。
  • 通常建议将SummingMergeTree与MergeTree配合使用,使用MergeTree来存储具体明细,使用SummingMergeTree来存储预先聚合的结果加速查询。

示例如下:

CREATE TABLE summtt
(
    key UInt32,
    value UInt32
)
ENGINE = SummingMergeTree()
ORDER BY key


INSERT INTO summtt Values(1,1),(1,2),(2,1)


select * from summtt;
┌─key─┬─value─┐
│   1 │     1 │
│   1 │     2 │
│   2 │     1 │
└─────┴───────┘


SELECT key, sum(value) FROM summtt GROUP BY key
┌─key─┬─sum(value)─┐
│   2 │          1 │
│   1 │          3 │
└─────┴────────────┘


optimize table summtt final;


select * from summtt;
┌─key─┬─value─┐
│   1 │     3 │
│   2 │     1 │
└─────┴───────┘



SELECT key, sum(value) FROM summtt GROUP BY key
┌─key─┬─sum(value)─┐
│   2 │          1 │
│   1 │          3 │
└─────┴────────────┘

3.4.6. AggregatingMergeTree

AggregatingMergeTree也是预先聚合引擎的一种,用于提升聚合计算的性能。与SummingMergeTree的区别在于:SummingMergeTree对非主键列进行sum聚合,而AggregatingMergeTree则可以指定各种聚合函数。

AggregatingMergeTree的语法比较复杂,需要结合物化视图或ClickHouse的特殊数据类型AggregateFunction一起使用。在insert和select时,也有独特的写法和要求:写入时需要使用-State语法,查询时使用-Merge语法。

以下通过示例进行介绍。

示例一:配合物化视图使用。

CREATE TABLE visits
(
    UserID UInt64,
    CounterID UInt8,
    StartDate Date,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;


CREATE MATERIALIZED VIEW visits_agg_view
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate)
AS SELECT
    CounterID,
    StartDate,
    sumState(Sign)    AS Visits,
    uniqState(UserID) AS Users
FROM visits
GROUP BY CounterID, StartDate;


INSERT INTO visits VALUES(0, 0, '2019-11-11', 1);
INSERT INTO visits VALUES(1, 1, '2019-11-12', 1);


SELECT
    StartDate,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM visits_agg_view
GROUP BY StartDate
ORDER BY StartDate;



SELECT
    StartDate,
    sum(Visits),
    uniq(Users)
FROM visits_agg_view
GROUP BY StartDate
ORDER BY StartDate;

示例二:配合特殊数据类型AggregateFunction使用。

 

CREATE TABLE detail_table
(   CounterID UInt8,
    StartDate Date,
    UserID UInt64
) ENGINE = MergeTree() 
PARTITION BY toYYYYMM(StartDate) 
ORDER BY (CounterID, StartDate);


INSERT INTO detail_table VALUES(0, '2019-11-11', 1);
INSERT INTO detail_table VALUES(1, '2019-11-12', 1);



CREATE TABLE agg_table
(   CounterID UInt8,
    StartDate Date,
    UserID AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() 
PARTITION BY toYYYYMM(StartDate) 
ORDER BY (CounterID, StartDate);


INSERT INTO agg_table
select CounterID, StartDate, uniqState(UserID)
from detail_table
group by CounterID, StartDate


INSERT INTO agg_table VALUES(1, '2019-11-12', 1);


SELECT uniqMerge(UserID) AS state 
FROM agg_table 
GROUP BY CounterID, StartDate;

 

四、安装

4.1. 系统要求

ClickHouse可以在任何具有x86_64,AArch64或PowerPC64LE CPU架构的Linux,FreeBSD或Mac OS X上运行。

官方预构建的二进制文件通常针对x86_64进行编译,并利用SSE 4.2指令集,因此,除非另有说明,支持它的CPU使用将成为额外的系统需求。下面是检查当前CPU是否支持SSE 4.2的命令:

$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

要在不支持SSE 4.2AArch64PowerPC64LE架构的处理器上运行ClickHouse,您应该通过适当的配置调整从源代码构建ClickHouse

4.2. RPM安装包

推荐使用CentOS、RedHat和所有其他基于rpm的Linux发行版的官方预编译rpm包。

需要执行的指令:

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
sudo yum install -y clickhouse-server clickhouse-client

sudo /etc/init.d/clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you set up a password.

如果您想使用最新的版本,请用testing替代stable(我们只推荐您用于测试环境)。prestable有时也可用。

4.3. 配置文件

ClickHouse支持多文件配置管理。主服务器配置文件是/etc/clickhouse-server/config.xml。其他文件必须在/etc/clickhouse-server/config.d目录中。

4.3.1. 覆写

主配置文件中指定的某些设置可以在其他配置文件中覆盖:

  • replace或remove属性,用于这些配置文件的元素指定属性。
  • 如果未指定,则它将递归合并元素的内容,替换重复子元素的值。
  • 如果replace指定,则将整个元素替换为指定的元素。
  • 如果remove指定,则删除该元素。

4.3.2. 替代

该配置还可以定义“substitutions”(替代)。

如果元素具有incl属性,将使用文件中属性对应的值。默认情况下,替代文件的文件目录是/etc/metrika.xml,这个可以在服务器设置的include_from属性中被更改。

在此文件中,替换值被保存在/yandex/substitution_name元素中。如果incl中指定的替代不存在,则将其记录在日志中。为防止ClickHouse记录缺少的替代项,请指定:optional= true属性。(例如,宏设置)。

可以从ZooKeeper中进行替换,指定属性from_zk =“ /path/to/node”。元素值将替换为ZooKeeper中/path/to/node上节点的内容。还可以将整个XML子树放在ZooKeeper节点上,并将其完全插入到source元素中。

4.3.3. 用户设置

config.xml文件可以使用users设置,为profiles和quotas指定单独的配置。此配置的相对路径在'users_config'元素中设置。默认情况下,它是users.xml。如果省略users_config,则直接在config.xml中指定。

用户配置可以指定在config.xml或者指定为保存在config.d/中的和config.xml类似的单独文件。

用户设置的目录名称被定义users_config中(不带.xml后缀)。

默认使用目录users.d,users_config默认使用users.xml。

举例

可以为每个用户使用单独的配置文件,如下所示:

$ cat /etc/clickhouse-server/users.d/sznb.xml

<yandex>
    <users>
      <sznb>
          <profile>analytics</profile>
            <networks>
                  <ip>::/0</ip>
            </networks>
        	<!-- sha256加密后的密码 -->
          <password_sha256_hex>...</password_sha256_hex>
          <quota>analytics</quota>
      </sznb>
    </users>
</yandex>

注意

对于每个配置文件,服务器启动时会生成file-preprocessed.xml 。这些文件包含所有已完成的替换和替代,仅供参考。如果在配置文件中使用了ZooKeeper替换项,但是ZooKeeper在服务器启动时不可用,则服务器将从预处理文件中加载配置。

4.3.4. 服务器设置

builtin_dictionaries_reload_interval

重新加载内置词典的时间间隔(以秒为单位),默认3600。可以在不重新启动服务器的情况下“即时”修改词典。

<builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>

compression

如果对clickhouse不是很熟悉,不要进行改动。

MergeTree引擎表的数据压缩设置。配置模板如:

<compression>
    <case>
      <min_part_size>...</min_part_size>
      <min_part_size_ratio>...</min_part_size_ratio>
      <method>...</method>
    </case>
    ...
</compression>

<case>部分说明:

min_part_size –数据部分的最小大小。

min_part_size_ratio –数据部分大小与表大小的比率。

method–压缩方法。可接受的值:lz4或zstd。

可以配置多个<case>部分。

满足条件时的操作:

如果数据部分与条件集匹配,则ClickHouse使用指定的压缩方法。

如果数据部分匹配多个条件集,则ClickHouse将使用第一个匹配的条件集。

<compression incl="clickhouse_compression">
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>zstd</method>
    </case>
</compression>

custom_settings_prefixes

自定义设置的前缀列表。(在users.xml中有提及)前缀必须用逗号分隔。

<custom_settings_prefixes>custom_</custom_settings_prefixes>


SET custom_a = 123;
SELECT getSetting('custom_a');  

core_dump

核心转储文件大小的软限制,默认情况下为1 GB。

<core_dump>
    <size_limit>1073741824</size_limit>
</core_dump>

default_database(重点)

默认数据库。

要获取数据库列表,请使用SHOW DATABASES查询。

<default_database>default</default_database>

default_profile

默认设置配置文件。

设置配置文件位于参数中指定的文件中user_config。

<default_profile>default</default_profile>

dictionaries_config

外部词典的配置文件的路径。

路径:

  • 指定绝对路径或相对于服务器配置文件的路径。
  • 该路径可以包含通配符*和?
<dictionaries_config>*_dictionary.xml</dictionaries_config>

dictionaries_lazy_load

延迟加载字典。

如果为true,则每个字典都在首次使用时创建。如果字典创建失败,则正在使用字典的函数将引发异常。

如果为false,则在服务器启动时将创建所有词典,如果有错误,则服务器将关闭。

默认值为true。

<dictionaries_lazy_load>true</dictionaries_lazy_load>

format_schema_path

包含输入数据方案(例如CapnProto格式的方案)的目录路径。

<!-- Directory containing schema files for various input formats. -->
<format_schema_path>format_schemas/</format_schema_path>

graphite

发送数据到Graphite。它是一款企业级监控。

设定:

<graphite>
    <host>localhost</host>  -- Graphite服务器
    <port>42000</port>   -- Graphite服务器上的端口
    <timeout>0.1</timeout> -- 发送超时时间,以秒为单位
    <interval>60</interval>  -- 发送间隔,以秒为单位
    <root_path>one_min</root_path> -- 密钥的前缀
    <metrics>true</metrics>  -- 从system.metrics表发送数据
    <events>true</events>  -- 从system.events表发送在该时间段内累积的增量数据
    <events_cumulative>false</events_cumulative> -- 从system.events表发送累积数据
    <asynchronous_metrics>true</asynchronous_metrics> -- 从system.asynchronous_metrics表发送数据
</graphite>

可以配置多个<graphite>子句。例如,您可以使用它以不同的时间间隔发送不同的数据。

graphite_rollup

数据汇总设置

<graphite_rollup_example>
      <pattern>
          <regexp>click_cost</regexp>
          <function>any</function>
          <retention>
              <age>0</age>
              <precision>3600</precision>
          </retention>
          <retention>
              <age>86400</age>
              <precision>60</precision>
          </retention>
      </pattern>
      <default>
          <function>max</function>
          <retention>
              <age>0</age>
              <precision>60</precision>
          </retention>
          <retention>
              <age>3600</age>
              <precision>300</precision>
          </retention>
          <retention>
              <age>86400</age>
              <precision>3600</precision>
          </retention>
      </default>
</graphite_rollup_example>

http_port/https_port(重点)

通过HTTP连接到服务器的端口。

  • 如果指定了https_port,则必须配置openSSL
  • 如果指定了http_port,则即使已设置openSSL配置,也会将其忽略。
<https_port>9999</https_port>

http_server_default_response

访问ClickHouse HTTP服务器时默认显示的页面。默认值为“OK”(末尾有换行符)

例如:当访问http://localhost: http_port.时打开TABIX - Open source simple business intelligence application and sql editor tool for ClickHouse Database.

<http_server_default_response>
  <![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]>
</http_server_default_response>

include_from

替换文件的路径。

有关更多信息,请参见“配置文件”一节。

默认为/etc/metrica.xml

<include_from>/etc/metrica.xml</include_from>

interserver_http_port

在ClickHouse服务器之间交换数据的端口。

<interserver_http_port>9009</interserver_http_port>

interserver_http_host

其他服务器可以用来访问该服务器的主机名。如果省略,则其定义方法与hostname -f命令相同。

<interserver_http_host>example.yandex.ru</interserver_http_host>

interserver_http_credentials

在使用Replicated *引擎进行复制期间进行身份验证的用户名和密码。

这些凭据仅用于副本之间的通信,与ClickHouse客户端的凭据无关。

服务器会在连接副本时检查凭据,并在连接到其他副本时使用相同的凭据。 因此,对于群集中的所有副本,应将这些凭据设置为相同。默认不使用身份验证。

包含以下参数:user - 用户名。password —密码。

<interserver_http_credentials>
    <user>admin</user>
    <password>222</password>
</interserver_http_credentials>

keep_alive_timeout

ClickHouse在关闭连接之前等待传入请求的秒数。默认为3秒。

<keep_alive_timeout>3</keep_alive_timeout>

listen_host(重点)

限制来源主机的请求, 如果要服务器回答所有请求,请指定“::” :

<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>

logger(重点)

日志记录设置。选项组里的设置有:level、log、errorlog、size、count:

<logger>
    <level>trace</level>  --日志记录级别。可接受的值: trace, debug, information, warning, error
    <log>/var/log/clickhouse-server/clickhouse-server.log</log> --日志文件,根据级别包含所有条目
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog> -- 错误日志文件
    <size>1000M</size> -- 文件的大小。适用于loganderrorlog,文件达到大小后,ClickHouse将对其进行存档并重命名,并在其位置创建一个新的日志文件
    <count>10</count>  --  ClickHouse存储的已归档日志文件的数量
</logger>

还支持写入系统日志。配置示例:

<logger>
    <use_syslog>1</use_syslog>  -- 写入系统日志
    <syslog>
        <address>syslog.remote:10514</address> -- syslogd的主机[:port]。如果省略,则使用本地守护程序
        <hostname>myhost.local</hostname> -- 可选,从中发送日志的主机的名称。
        <facility>LOG_LOCAL6</facility> -- syslog关键字,其大写字母带有“ LOG_”前缀:(LOG_USER,LOG_DAEMON,LOG_LOCAL3,依此类推)
        <format>syslog</format> -- 格式。可能的值:bsd和syslog
    </syslog>
</logger>

send_crash_reports

通过Sentry(一个应用程序监视和错误跟踪软件)选择将崩溃报告发送给ClickHouse核心开发人员团队的设置。

服务器将需要通过IPv4访问公共Internet(Sentry不支持编写IPv6时),此功能才能正常运行。

enabled–false默认情况下,启用该功能的布尔标志。设置为true允许发送崩溃报告。

endpoint–您可以覆盖Sentry端点URL,以发送故障报告。它可以是单独的Sentry帐户,也可以是您自己托管的Sentry实例。使用Sentry DSN语法。

anonymize -避免将服务器主机名附加到崩溃报告中。

http_proxy -配置HTTP代理以发送崩溃报告。

debug -将Sentry客户端设置为调试模式。

tmp_path -临时崩溃报告状态的文件系统路径。

<send_crash_reports>
    <enabled>true</enabled>
</send_crash_reports>

macros

复制表的参数替换,如果不使用复制表,则可以省略。有关更多信息,请参见“创建复制表”一节。

<macros incl="macros" optional="true" />

mark_cache_size

标记缓存的大小,用于MergeTree系列的表中。 以字节为单位,共享服务器的缓存,并根据需要分配内存。缓存大小必须至少为5368709120(5G)。

<mark_cache_size>5368709120</mark_cache_size>

max_server_memory_usage

限制ClickHouse服务器的总RAM使用量。

可能的值:

  • 正整数。
  • 0(自动)。

默认值:0。

注:默认时,max_server_memory_usage值计算为memory_amount * max_server_memory_usage_to_ram_ratio

和users.xml中的max_memory_usage不同的是,这个是针对整个集群所有用户的,另一个是针对指定用户的

x_server_memory_usage_to_ram_ratio

可用于Clickhouse的,服务器的总物理RAM量的比例。如果服务器尝试利用更多资源,则会将内存减少到适当的数量。

可能值:

  • 正的double类型
  • 0

默认值:0。

在具有低RAM和交换max_server_memory_usage_to_ram_ratio容量的主机上,您可能需要设置大于1的值。

max_concurrent_queries

同时处理的最大请求数。

<max_concurrent_queries>100</max_concurrent_queries>

max_concurrent_queries_for_all_users

如果此设置的值小于或等于当前同时处理的查询数,则抛出异常。

举例:

max_concurrent_queries_for_all_users被设置成99,然后满了,这时候数据库管理员可以为了他自己调大这个值来进行查询,这样即使超载也能跑。

修改一个查询或用户的设置不会影响其他查询。

默认值:0。

<max_concurrent_queries_for_all_users>99</max_concurrent_queries_for_all_users>

(我看system.settings表里没有这个值,有另一个值max_concurrent_queries_for_user)

max_connections

最大连接数。

<max_connections>4096</max_connections>

max_open_files

打开最大的文件数,默认最大值

建议在Mac OS X中使用此选项,因为该getrlimit()函数返回的值不正确。

<max_open_files>262144</max_open_files>

max_table_size_to_drop

删除表的限制,默认50G,0表示不限制。

如果MergeTree表的大小超过max_table_size_to_drop(以字节为单位),则无法使用DROP查询将其删除。

如果仍然需要删除表而不重新启动ClickHouse服务器,请创建<clickhouse-path>/flags/force_drop_table文件并运行DROP查询。

<max_table_size_to_drop>0</max_table_size_to_drop>

max_thread_pool_size

全局线程池中的最大线程数。

预设值:10000。

<max_thread_pool_size>12000</max_thread_pool_size>

merge_tree

对MergeTree中的表进行调整,有关更多信息,请参见MergeTreeSettings.h头文件。

<merge_tree>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>

replicated_merge_tree

对ReplicatedMergeTree中的表进行微调。

此设置具有更高的优先级。

有关更多信息,请参见MergeTreeSettings.h头文件。

<replicated_merge_tree>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</replicated_merge_tree>

openSSL

SSL客户端/服务器配置。

libpoco库提供了对SSL的支持。该接口在文件SSLManager.h中进行了描述

<openSSL>
    <server>
        <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
        <certificateFile>/etc/clickhouse-server/server.crt</certificateFile> --PEM格式的客户端/服务器证书文件的路径。如果privateKeyFile包含证书,则可以忽略它。
        <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>   --具有PEM证书的秘密密钥的文件的路径。该文件可能同时包含密钥和证书。
        <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
        <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
        <verificationMode>none</verificationMode>  --检查节点证书的方法。详细信息在Context类的描述中。可能的值:none, relaxed, strict, once.
        <loadDefaultCAFile>true</loadDefaultCAFile> --指示将使用OpenSSL的内置CA证书。可接受的值:true,false
        <cacheSessions>true</cacheSessions>  --启用或禁用缓存会话。必须与sessionIdContext结合使用。可接受的值:true,false。
        <disableProtocols>sslv2,sslv3</disableProtocols> --不允许使用的协议。
        <preferServerCiphers>true</preferServerCiphers> ----首选服务器密码
    </server>
    <client>
        <loadDefaultCAFile>true</loadDefaultCAFile>  --指示将使用OpenSSL的内置CA证书。可接受的值:true,false
        <cacheSessions>true</cacheSessions>  -- --启用或禁用缓存会话。必须与sessionIdContext结合使用。可接受的值:true,false。
        <disableProtocols>sslv2,sslv3</disableProtocols>  --不允许使用的协议。
        <preferServerCiphers>true</preferServerCiphers> --首选服务器密码
        <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
        <invalidCertificateHandler>  --用于验证无效证书的类
            <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
            <name>RejectCertificateHandler</name>
        </invalidCertificateHandler>  
    </client>
</openSSL>

part_log

记录与MergeTree关联的事件。例如,添加或合并数据。您可以使用日志来模拟合并算法并比较其特征。您可以可视化合并过程。

查询记录在system.part_log表中,而不记录在单独的文件中。您可以在table参数中配置此表的名称。

<part_log>
    <database>system</database>   --库名
    <table>part_log</table>  --表名
    <partition_by>toMonday(event_date)</partition_by>  --自定义分区键
    <flush_interval_milliseconds>7500</flush_interval_milliseconds> --将数据从内存中的缓冲区刷新到表的时间间隔,单位毫秒。
</part_log>

path(重点)

数据的目录路径。

注意:斜杠是必需的。

<path>/var/lib/clickhouse/</path>

prometheus

从Prometheus公开度量数据以进行刮擦。

设定:

endpoint– HTTP端点,用于通过Prometheus服务器抓取度量标准。从...开始 '/'。

port–的端口endpoint。

metrics–设置为从system.metrics表公开指标的标志。

events–设置为从system.events表公开指标的标志。

asynchronous_metrics–设置为从system.asynchronous_metrics表中公开当前指标值的标志。

<prometheus>
    <endpoint>/metrics</endpoint>
    <port>8001</port>
    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
</prometheus>

query_log

通过log_queries = 1设置,记录接收到的查询。查询记录在system.query_log表中,而不记录在单独的文件中。可以在table参数中更改表的名称。

如果在更新ClickHouse服务器时查询日志的结构发生了更改,则具有旧结构的表将重命名,并自动创建一个新表。

partition_by—系统表的自定义分区键。如果用engine定义,则无法使用。

engine-系统表的MergeTree引擎定义。如果用partition_by定义,则无法使用。

<query_log>
    <database>system</database>   --库名
    <table>query_log</table>   --表名
    <partition_by>toMonday(event_date)</partition_by>  --自定义分区键
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>  --将数据从内存中的缓冲区刷新到表的时间间隔
</query_log>

query_thread_log

使用log_query_threads = 1设置,记录接收到查询的线程。查询记录在system.query_thread_log表中,而不记录在单独的文件中。您可以在table参数中更改表的名称。

如果该表不存在,ClickHouse将创建它。如果在更新ClickHouse服务器时查询线程日志的结构发生了更改,则具有旧结构的表将重命名,并自动创建一个新表。

  • partition_by—系统表的自定义分区键。如果用engine定义,则无法使用。
  • engine-系统表的MergeTree引擎定义。如果用partition_by定义,则无法使用。
<query_thread_log>
    <database>system</database>     --库名
    <table>query_thread_log</table>  --表名
    <partition_by>toMonday(event_date)</partition_by>  --自定义分区键
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>  --将数据从内存中的缓冲区刷新到表的时间间隔
</query_thread_log>

text_log

用于记录文本消息的text_log系统表的设置。

<yandex>
    <text_log>
        <level>notice</level>    --将存储在表中的最大消息级别(Maximum Message Level)(默认为Trace))。
        <database>system</database>    --库名
        <table>text_log</table>    --表名
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>    --将数据从内存中的缓冲区刷新到表的时间间隔。
        <!-- <partition_by>event_date</partition_by> -->    --自定义分区键
        <engine>Engine = MergeTree PARTITION BY event_date ORDER BY event_time TTL event_date + INTERVAL 30 day</engine>    --系统表的MergeTree引擎定义。
    </text_log>
</yandex>

trace_log

trace_log系统表操作的设置。

<trace_log>
    <database>system</database>  --库名
    <table>trace_log</table>   --表名
    <partition_by>toYYYYMM(event_date)</partition_by>  ----自定义分区键
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>  ----将数据从内存中的缓冲区刷新到表的时间间隔
</trace_log>

query_masking_rules

正则表达式规则。将在存储在日志文件之前应用于请求、与该请求相关的日志消息、system.query_log, system.text_log, system.processes表、发送给客户端的日志文件。

屏蔽规则适用于整个查询(以防止敏感数据因格式错误/不可解析的查询而泄漏)。

这样可以防止SQL查询中的敏感数据(例如姓名,电子邮件,个人标识符或信用卡号))泄漏记录到日志中。

<query_masking_rules>
    <rule>
        <name>hide SSN</name>   --规则名称(可选)
        <regexp>(^|\D)\d{3}-\d{2}-\d{4}($|\D)</regexp>  --正则表达式(强制性)
        <replace>000-00-0000</replace>  --替换,敏感数据的替换字符串(默认为可选-六个星号)
    </rule>
</query_masking_rules>

system.events表具有计数器QueryMaskingRulesMatch,该计数器具有与查询屏蔽规则匹配的总数。

对于分布式查询,必须分别配置每个服务器,否则,将存储传递给其他节点的子查询不会进行屏蔽。

remote_servers

远程服务器,分布式表引擎和集群表功能使用的集群的配置。

<remote_servers incl="clickhouse_remote_servers" />

也可以看看

timezone

服务器的时区,定为UTC时区或地理位置(例如,非洲/阿比让)的IANA标识符。

当DateTime字段输出为文本格式(打印在屏幕或文件中),以及从字符串获取DateTime时,时区对于在String和DateTime格式之间进行转换是必需的。

此外,如果在输入参数中没有指定timezone,则在使用时间和日期的函数中会使用这个值。

<timezone>Asia/Shanghai</timezone>

tcp_port

通过TCP协议与客户端进行通信的端口。即ClickHouse端口。

<tcp_port>9000</tcp_port>

tcp_port_secure

TCP端口,用于与客户端进行安全通信。与OpenSSL设置一起使用。

<mysql_port>9004</mysql_port>

mysql_port

通过MySQL协议与客户端通信的端口。

<mysql_port>9004</mysql_port>

tmp_path

用于处理大型查询的临时数据的路径。

<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

tmp_policy

storage_configuration的策略,用于存储临时文件。如果未设置,则使用tmp_path,否则将忽略tmp_path。

  • move_factor、keep_free_space_bytes、max_data_part_size_bytes 会被忽略
  • 必须在该策略中仅包含一个卷。

uncompressed_cache_size

表引擎从MergeTree使用的未压缩数据的缓存大小(以字节为单位,8G)。服务器有一个共享缓存,内存是按需分配的。如果启用,则使用高速缓存。在个别情况下,未压缩的缓存对于非常短的查询是有利的。

<uncompressed_cache_size>8589934592</uncompressed_cache_size>

user_files_path

包含用户文件的目录。在表函数file()中使用。

用于在指定文件中读取或写入数据。

<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

users_config

包含以下内容的文件的路径:

  • 用户配置。
  • 访问权限。
  • 设置配置文件。
  • 配额设置。
<users_config>users.xml</users_config>

zookeeper

ClickHouse与ZooKeeper群集进行交互的设置。使用复制表时,ClickHouse使用ZooKeeper来存储副本的元数据。如果不使用复制表,则可以忽略此参数。

模板:

<node index="1">
    <host>example_host</host>
    <port>2181</port>
</node>

当尝试连接到ZooKeeper集群时,`index`属性指定节点顺序。

举例:

<zookeeper>
    <node>
        <host>example1</host>
        <port>2181</port>
    </node>
    <node>
        <host>example2</host>
        <port>2181</port>
    </node>
    <session_timeout_ms>30000</session_timeout_ms>  --客户端会话的最大超时(以毫秒为单位)
    <operation_timeout_ms>10000</operation_timeout_ms>
    <!-- Optional. Chroot suffix. Should exist. -->
    <root>/path/to/zookeeper/node</root>   -- 用作ClickHouse服务器使用的znode的根的znode(可选)
    <!-- Optional. Zookeeper digest ACL string. -->
    <identity>user:password</identity>  --用户和密码,ZooKeeper可能需要这些用户和密码才能访问请求的znode(可选)
</zookeeper>

可以看复制和ZooKeeper说明。

use_minimalistic_part_header_in_zookeeper

ZooKeeper中数据部分头的存储方法。

此设置仅适用于MergeTree家族。可以指定:

  • 位于config.xml文件的merge_tree部分,对服务器上的所有表使用该设置。 可以随时更改设置。 当设置更改时,现有表将更改其行为。
  • 对于每个单独的表,创建表时,请指定相应的引擎设置。 即使全局设置发生更改,具有此设置的现有表的行为也不会更改。

可能的值

  • 0-功能已关闭。
  • 1-功能已打开。

设置开启时,复制表使用单个znode紧凑地存储数据部分的头。

如果表包含许多列,则use_minimalistic_part_header_in_zookeeper = 1时,此存储方法将大大减少Zookeeper中存储的数据量。

 

注意:

  • 应用后use_minimalistic_part_header_in_zookeeper = 1,您无法将ClickHouse服务器降级到不支持此设置的版本。
  • 在群集中的服务器上升级ClickHouse时要小心。不要一次升级所有服务器。在测试环境中或仅在群集的几台服务器上测试ClickHouse的新版本更为安全。
  • 已经使用此设置存储的数据部件标题无法恢复为其以前的(非紧凑)表示形式。

默认值:0

disable_internal_dns_cache

禁用内部DNS缓存。建议用于在基础设施频繁更改的系统(例如Kubernetes)中操作ClickHouse。

默认值:0

dns_cache_update_period

ClickHouse内部DNS缓存中存储的IP地址的更新时间(以秒为单位)。

更新是在单独的系统线程中异步执行的。

默认值:15。

  • background_schedule_pool_size

access_control_path

ClickHouse服务器存储SQL命令创建的用户和角色配置的文件夹的路径。

默认值:/var/lib/clickhouse/access/。

user_directories

配置文件中包含设置的部分:

  • 具有预定义用户的配置文件的路径。
  • 由SQL命令创建的用户存储位置的文件夹路径。

如果指定了此部分,则不会使用users_config和access_control_path的路径。

该user_directories部分可以包含任意数量的项目,项目的顺序表示它们的优先级(项目越高,优先级越高)。

<user_directories>
    <users_xml>
        <path>/etc/clickhouse-server/users.xml</path>
    </users_xml>
    <local_directory>
        <path>/var/lib/clickhouse/access/</path>
    </local_directory>
</user_directories>

您还可以指定设置

  • memory-表示仅将信息存储在内存中,而不写入磁盘
  • ldap-表示将信息存储在LDAP服务器上。

要将LDAP服务器添加为未在本地定义的用户的远程用户目录,请定义一个ldap部分:

<ldap>
    <server>my_ldap_server</server>    --此参数是必需的,不能为空。表示ldap_servers config部分中定义的LDAP服务器名称之一。
        <roles>    --包含本地定义的角色的列表,这些角色将分配给从LDAP服务器检索到的每个用户。如果未指定任何角色,则用户在身份验证后将无法执行任何操作。如果在身份验证时未在本地定义任何列出的角色,则身份验证尝试将失败,就像提供的密码不正确一样。
            <my_local_role1 />
            <my_local_role2 />
        </roles>
</ldap>

4.3.5. 简化版配置文件

config.xml

<?xml version="1.0"?>
<!--
  NOTE: User and query level settings are set up in "users.xml" file.
  If you have accidentally specified user-level settings here, server won't start.
  You can either move the settings to the right place inside "users.xml" file
   or add <skip_check_for_incorrect_settings>1</skip_check_for_incorrect_settings> here.
-->
<clickhouse>
    <logger>
        <!-- Possible levels [1]:

          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace
          - test (not for production usage)

            [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
        -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <!-- Rotation policy
             See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->

        <!-- Per level overrides (legacy):

        For example to suppress logging of the ConfigReloader you can use:
        NOTE: levels.logger is reserved, see below.
        -->
        <!--
        <levels>
          <ConfigReloader>none</ConfigReloader>
        </levels>
        -->

        <!-- Per level overrides:

        For example to suppress logging of the RBAC for default user you can use:
        (But please note that the logger name maybe changed from version to version, even after minor upgrade)
        -->
        <!--
        <levels>
          <logger>
            <name>ContextAccess (default)</name>
            <level>none</level>
          </logger>
          <logger>
            <name>DatabaseOrdinary (test)</name>
            <level>none</level>
          </logger>
        </levels>
        -->
    </logger>

    <!-- Add headers to response in options request. OPTIONS method is used in CORS preflight requests. -->
    <!-- It is off by default. Next headers are obligate for CORS.-->
    <!-- http_options_response>
        <header>
            <name>Access-Control-Allow-Origin</name>
            <value>*</value>
        </header>
        <header>
            <name>Access-Control-Allow-Headers</name>
            <value>origin, x-requested-with</value>
        </header>
        <header>
            <name>Access-Control-Allow-Methods</name>
            <value>POST, GET, OPTIONS</value>
        </header>
        <header>
            <name>Access-Control-Max-Age</name>
            <value>86400</value>
        </header>
    </http_options_response -->

    <!-- It is the name that will be shown in the clickhouse-client.
         By default, anything with "production" will be highlighted in red in query prompt.
    -->
    <!--display_name>production</display_name-->

    <!-- Port for HTTP API. See also 'https_port' for secure connections.
         This interface is also used by ODBC and JDBC drivers (DataGrip, Dbeaver, ...)
         and by most of web interfaces (embedded UI, Grafana, Redash, ...).
      -->
    <http_port>8123</http_port>

    <!-- Port for interaction by native protocol with:
         - clickhouse-client and other native ClickHouse tools (clickhouse-benchmark, clickhouse-copier);
         - clickhouse-server with other clickhouse-servers for distributed query processing;
         - ClickHouse drivers and applications supporting native protocol
         (this protocol is also informally called as "the TCP protocol");
         See also 'tcp_port_secure' for secure connections.
    -->
    <tcp_port>9000</tcp_port>

    <!-- Compatibility with MySQL protocol.
         ClickHouse will pretend to be MySQL for applications connecting to this port.
    -->
    <mysql_port>9004</mysql_port>

    <!-- Compatibility with PostgreSQL protocol.
         ClickHouse will pretend to be PostgreSQL for applications connecting to this port.
    -->
    <postgresql_port>9005</postgresql_port>

    <!-- HTTP API with TLS (HTTPS).
         You have to configure certificate to enable this interface.
         See the openSSL section below.
    -->
    <!-- <https_port>8443</https_port> -->

    <!-- Native interface with TLS.
         You have to configure certificate to enable this interface.
         See the openSSL section below.
    -->
    <!-- <tcp_port_secure>9440</tcp_port_secure> -->

    <!-- Native interface wrapped with PROXYv1 protocol
         PROXYv1 header sent for every connection.
         ClickHouse will extract information about proxy-forwarded client address from the header.
    -->
    <!-- <tcp_with_proxy_port>9011</tcp_with_proxy_port> -->

    <!-- Port for communication between replicas. Used for data exchange.
         It provides low-level data access between servers.
         This port should not be accessible from untrusted networks.
         See also 'interserver_http_credentials'.
         Data transferred over connections to this port should not go through untrusted networks.
         See also 'interserver_https_port'.
      -->
    <interserver_http_port>9009</interserver_http_port>

    <!-- Port for communication between replicas with TLS.
         You have to configure certificate to enable this interface.
         See the openSSL section below.
         See also 'interserver_http_credentials'.
      -->
    <!-- <interserver_https_port>9010</interserver_https_port> -->

    <!-- Hostname that is used by other replicas to request this server.
         If not specified, then it is determined analogous to 'hostname -f' command.
         This setting could be used to switch replication to another network interface
         (the server may be connected to multiple networks via multiple addresses)
      -->

    <!--
    <interserver_http_host>example.clickhouse.com</interserver_http_host>
    -->

    <!-- You can specify credentials for authenthication between replicas.
         This is required when interserver_https_port is accessible from untrusted networks,
         and also recommended to avoid SSRF attacks from possibly compromised services in your network.
      -->
    <!--<interserver_http_credentials>
        <user>interserver</user>
        <password></password>
    </interserver_http_credentials>-->

    <!-- Listen specified address.
         Use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere.
         Notes:
         If you open connections from wildcard address, make sure that at least one of the following measures applied:
         - server is protected by firewall and not accessible from untrusted networks;
         - all users are restricted to subset of network addresses (see users.xml);
         - all users have strong passwords, only secure (TLS) interfaces are accessible, or connections are only made via TLS interfaces.
         - users without password have readonly access.
         See also: https://www.shodan.io/search?query=clickhouse
      -->
    <!-- <listen_host>::</listen_host> -->


    <!-- Same for hosts without support for IPv6: -->
    <listen_host>0.0.0.0</listen_host>

    <!-- Default values - try listen localhost on IPv4 and IPv6. -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->

    <!-- Don't exit if IPv6 or IPv4 networks are unavailable while trying to listen. -->
    <!-- <listen_try>0</listen_try> -->

    <!-- Allow multiple servers to listen on the same address:port. This is not recommended.
      -->
    <!-- <listen_reuse_port>0</listen_reuse_port> -->

    <!-- <listen_backlog>4096</listen_backlog> -->

    <max_connections>4096</max_connections>

    <!-- For 'Connection: keep-alive' in HTTP 1.1 -->
    <keep_alive_timeout>3</keep_alive_timeout>

    <!-- gRPC protocol (see src/Server/grpc_protos/clickhouse_grpc.proto for the API) -->
    <!-- <grpc_port>9100</grpc_port> -->
    <grpc>
        <enable_ssl>false</enable_ssl>

        <!-- The following two files are used only if enable_ssl=1 -->
        <ssl_cert_file>/path/to/ssl_cert_file</ssl_cert_file>
        <ssl_key_file>/path/to/ssl_key_file</ssl_key_file>

        <!-- Whether server will request client for a certificate -->
        <ssl_require_client_auth>false</ssl_require_client_auth>

        <!-- The following file is used only if ssl_require_client_auth=1 -->
        <ssl_ca_cert_file>/path/to/ssl_ca_cert_file</ssl_ca_cert_file>

        <!-- Default transport compression type (can be overridden by client, see the transport_compression_type field in QueryInfo).
             Supported algorithms: none, deflate, gzip, stream_gzip -->
        <transport_compression_type>none</transport_compression_type>

        <!-- Default transport compression level. Supported levels: 0..3 -->
        <transport_compression_level>0</transport_compression_level>

        <!-- Send/receive message size limits in bytes. -1 means unlimited -->
        <max_send_message_size>-1</max_send_message_size>
        <max_receive_message_size>-1</max_receive_message_size>

        <!-- Enable if you want very detailed logs -->
        <verbose_logs>false</verbose_logs>
    </grpc>

    <!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <!-- <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile> -->
            <!-- dhparams are optional. You can delete the <dhParamsFile> element.
                 To generate dhparams, use the following command:
                  openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096
                 Only file format with BEGIN DH PARAMETERS is supported.
              -->
            <!-- <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>-->
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source and secured Zookeeper communication -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
    <!--
    <http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
    -->

    <!-- Maximum number of concurrent queries. -->
    <max_concurrent_queries>100</max_concurrent_queries>

    <!-- Maximum memory usage (resident set size) for server process.
         Zero value or unset means default. Default is "max_server_memory_usage_to_ram_ratio" of available physical RAM.
         If the value is larger than "max_server_memory_usage_to_ram_ratio" of available physical RAM, it will be cut down.

         The constraint is checked on query execution time.
         If a query tries to allocate memory and the current memory usage plus allocation is greater
          than specified threshold, exception will be thrown.

         It is not practical to set this constraint to small values like just a few gigabytes,
          because memory allocator will keep this amount of memory in caches and the server will deny service of queries.
      -->
    <max_server_memory_usage>0</max_server_memory_usage>

    <!-- Maximum number of threads in the Global thread pool.
    This will default to a maximum of 10000 threads if not specified.
    This setting will be useful in scenarios where there are a large number
    of distributed queries that are running concurrently but are idling most
    of the time, in which case a higher number of threads might be required.
    -->

    <max_thread_pool_size>10000</max_thread_pool_size>

    <!-- Number of workers to recycle connections in background (see also drain_timeout).
         If the pool is full, connection will be drained synchronously. -->
    <!-- <max_threads_for_connection_collector>10</max_threads_for_connection_collector> -->

    <!-- On memory constrained environments you may have to set this to value larger than 1.
      -->
    <max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>

    <!-- Simple server-wide memory profiler. Collect a stack trace at every peak allocation step (in bytes).
         Data will be stored in system.trace_log table with query_id = empty string.
         Zero means disabled.
      -->
    <total_memory_profiler_step>4194304</total_memory_profiler_step>

    <!-- Collect random allocations and deallocations and write them into system.trace_log with 'MemorySample' trace_type.
         The probability is for every alloc/free regardless to the size of the allocation.
         Note that sampling happens only when the amount of untracked memory exceeds the untracked memory limit,
          which is 4 MiB by default but can be lowered if 'total_memory_profiler_step' is lowered.
         You may want to set 'total_memory_profiler_step' to 1 for extra fine grained sampling.
      -->
    <total_memory_tracker_sample_probability>0</total_memory_tracker_sample_probability>

    <!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
         correct maximum value. -->
    <!-- <max_open_files>262144</max_open_files> -->

    <!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
         Uncompressed cache is advantageous only for very short queries and in rare cases.

         Note: uncompressed cache can be pointless for lz4, because memory bandwidth
         is slower than multi-core decompression on some server configurations.
         Enabling it can sometimes paradoxically make queries slower.
      -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <!-- Approximate size of mark cache, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         You should not lower this value.
      -->
    <mark_cache_size>5368709120</mark_cache_size>


    <!-- If you enable the `min_bytes_to_use_mmap_io` setting,
         the data in MergeTree tables can be read with mmap to avoid copying from kernel to userspace.
         It makes sense only for large files and helps only if data reside in page cache.
         To avoid frequent open/mmap/munmap/close calls (which are very expensive due to consequent page faults)
         and to reuse mappings from several threads and queries,
         the cache of mapped files is maintained. Its size is the number of mapped regions (usually equal to the number of mapped files).
         The amount of data in mapped files can be monitored
         in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics
         and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric,
         and also in system.events, system.processes, system.query_log, system.query_thread_log, system.query_views_log by the
         CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events.
         Note that the amount of data in mapped files does not consume memory directly and is not accounted
         in query or server memory usage - because this memory can be discarded similar to OS page cache.
         The cache is dropped (the files are closed) automatically on removal of old parts in MergeTree,
         also it can be dropped manually by the SYSTEM DROP MMAP CACHE query.
      -->
    <mmap_cache_size>1000</mmap_cache_size>

    <!-- Cache size in bytes for compiled expressions.-->
    <compiled_expression_cache_size>134217728</compiled_expression_cache_size>

    <!-- Cache size in elements for compiled expressions.-->
    <compiled_expression_cache_elements_size>10000</compiled_expression_cache_elements_size>

    <!-- Path to data directory, with trailing slash. -->
    <path>/var/lib/clickhouse/</path>

    <!-- Multi-disk configuration example: -->
    <!--
    <storage_configuration>
        <disks>
            <default>
                <keep_free_space_bytes>0</keep_free_space_bytes>
            </default>
            <data>
                <path>/data/</path>
                <keep_free_space_bytes>0</keep_free_space_bytes>
            </data>
            <s3>
                <type>s3</type>
                <endpoint>http://path/to/endpoint</endpoint>
                <access_key_id>your_access_key_id</access_key_id>
                <secret_access_key>your_secret_access_key</secret_access_key>
            </s3>
            <blob_storage_disk>
                <type>azure_blob_storage</type>
                <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
                <container_name>container</container_name>
                <account_name>account</account_name>
                <account_key>pass123</account_key>
                <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
                <cache_enabled>true</cache_enabled>
                <cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
                <skip_access_check>false</skip_access_check>
            </blob_storage_disk>
        </disks>

        <policies>
            <all>
                <volumes>
                    <main>
                        <disk>default</disk>
                        <disk>data</disk>
                        <disk>s3</disk>
                        <disk>blob_storage_disk</disk>

                        <max_data_part_size_bytes></max_data_part_size_bytes>
                        <max_data_part_size_ratio></max_data_part_size_ratio>
                        <perform_ttl_move_on_insert>true</perform_ttl_move_on_insert>
                        <prefer_not_to_merge>false</prefer_not_to_merge>
                        <load_balancing>round_robin</load_balancing>
                    </main>
                </volumes>
                <move_factor>0.2</move_factor>
            </all>
        </policies>
    </storage_configuration>
    -->


    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

    <!-- Disable AuthType plaintext_password and no_password for ACL. -->
    <!-- <allow_plaintext_password>0</allow_plaintext_password> -->
    <!-- <allow_no_password>0</allow_no_password> -->`

    <!-- Policy from the <storage_configuration> for the temporary files.
         If not set <tmp_path> is used, otherwise <tmp_path> is ignored.

         Notes:
         - move_factor              is ignored
         - keep_free_space_bytes    is ignored
         - max_data_part_size_bytes is ignored
         - you must have exactly one volume in that policy
    -->
    <!-- <tmp_policy>tmp</tmp_policy> -->

    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

    <!-- LDAP server definitions. -->
    <ldap_servers>
        <!-- List LDAP servers with their connection parameters here to later 1) use them as authenticators for dedicated local users,
              who have 'ldap' authentication mechanism specified instead of 'password', or to 2) use them as remote user directories.
             Parameters:
                host - LDAP server hostname or IP, this parameter is mandatory and cannot be empty.
                port - LDAP server port, default is 636 if enable_tls is set to true, 389 otherwise.
                bind_dn - template used to construct the DN to bind to.
                        The resulting DN will be constructed by replacing all '{user_name}' substrings of the template with the actual
                         user name during each authentication attempt.
                user_dn_detection - section with LDAP search parameters for detecting the actual user DN of the bound user.
                        This is mainly used in search filters for further role mapping when the server is Active Directory. The
                         resulting user DN will be used when replacing '{user_dn}' substrings wherever they are allowed. By default,
                         user DN is set equal to bind DN, but once search is performed, it will be updated with to the actual detected
                         user DN value.
                    base_dn - template used to construct the base DN for the LDAP search.
                            The resulting DN will be constructed by replacing all '{user_name}' and '{bind_dn}' substrings
                             of the template with the actual user name and bind DN during the LDAP search.
                    scope - scope of the LDAP search.
                            Accepted values are: 'base', 'one_level', 'children', 'subtree' (the default).
                    search_filter - template used to construct the search filter for the LDAP search.
                            The resulting filter will be constructed by replacing all '{user_name}', '{bind_dn}', and '{base_dn}'
                             substrings of the template with the actual user name, bind DN, and base DN during the LDAP search.
                            Note, that the special characters must be escaped properly in XML.
                verification_cooldown - a period of time, in seconds, after a successful bind attempt, during which a user will be assumed
                         to be successfully authenticated for all consecutive requests without contacting the LDAP server.
                        Specify 0 (the default) to disable caching and force contacting the LDAP server for each authentication request.
                enable_tls - flag to trigger use of secure connection to the LDAP server.
                        Specify 'no' for plain text (ldap://) protocol (not recommended).
                        Specify 'yes' for LDAP over SSL/TLS (ldaps://) protocol (recommended, the default).
                        Specify 'starttls' for legacy StartTLS protocol (plain text (ldap://) protocol, upgraded to TLS).
                tls_minimum_protocol_version - the minimum protocol version of SSL/TLS.
                        Accepted values are: 'ssl2', 'ssl3', 'tls1.0', 'tls1.1', 'tls1.2' (the default).
                tls_require_cert - SSL/TLS peer certificate verification behavior.
                        Accepted values are: 'never', 'allow', 'try', 'demand' (the default).
                tls_cert_file - path to certificate file.
                tls_key_file - path to certificate key file.
                tls_ca_cert_file - path to CA certificate file.
                tls_ca_cert_dir - path to the directory containing CA certificates.
                tls_cipher_suite - allowed cipher suite (in OpenSSL notation).
             Example:
                <my_ldap_server>
                    <host>localhost</host>
                    <port>636</port>
                    <bind_dn>uid={user_name},ou=users,dc=example,dc=com</bind_dn>
                    <verification_cooldown>300</verification_cooldown>
                    <enable_tls>yes</enable_tls>
                    <tls_minimum_protocol_version>tls1.2</tls_minimum_protocol_version>
                    <tls_require_cert>demand</tls_require_cert>
                    <tls_cert_file>/path/to/tls_cert_file</tls_cert_file>
                    <tls_key_file>/path/to/tls_key_file</tls_key_file>
                    <tls_ca_cert_file>/path/to/tls_ca_cert_file</tls_ca_cert_file>
                    <tls_ca_cert_dir>/path/to/tls_ca_cert_dir</tls_ca_cert_dir>
                    <tls_cipher_suite>ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:AES256-GCM-SHA384</tls_cipher_suite>
                </my_ldap_server>
             Example (typical Active Directory with configured user DN detection for further role mapping):
                <my_ad_server>
                    <host>localhost</host>
                    <port>389</port>
                    <bind_dn>EXAMPLE\{user_name}</bind_dn>
                    <user_dn_detection>
                        <base_dn>CN=Users,DC=example,DC=com</base_dn>
                        <search_filter>(&amp;(objectClass=user)(sAMAccountName={user_name}))</search_filter>
                    </user_dn_detection>
                    <enable_tls>no</enable_tls>
                </my_ad_server>
        -->
    </ldap_servers>

    <!-- To enable Kerberos authentication support for HTTP requests (GSS-SPNEGO), for those users who are explicitly configured
          to authenticate via Kerberos, define a single 'kerberos' section here.
         Parameters:
            principal - canonical service principal name, that will be acquired and used when accepting security contexts.
                    This parameter is optional, if omitted, the default principal will be used.
                    This parameter cannot be specified together with 'realm' parameter.
            realm - a realm, that will be used to restrict authentication to only those requests whose initiator's realm matches it.
                    This parameter is optional, if omitted, no additional filtering by realm will be applied.
                    This parameter cannot be specified together with 'principal' parameter.
         Example:
            <kerberos />
         Example:
            <kerberos>
                <principal>HTTP/clickhouse.example.com@EXAMPLE.COM</principal>
            </kerberos>
         Example:
            <kerberos>
                <realm>EXAMPLE.COM</realm>
            </kerberos>
    -->

    <!-- Sources to read users, roles, access rights, profiles of settings, quotas. -->
    <user_directories>
        <users_xml>
            <!-- Path to configuration file with predefined users. -->
            <path>users.xml</path>
        </users_xml>
        <local_directory>
            <!-- Path to folder where users created by SQL commands are stored. -->
            <path>/var/lib/clickhouse/access/</path>
        </local_directory>

        <!-- To add an LDAP server as a remote user directory of users that are not defined locally, define a single 'ldap' section
              with the following parameters:
                server - one of LDAP server names defined in 'ldap_servers' config section above.
                        This parameter is mandatory and cannot be empty.
                roles - section with a list of locally defined roles that will be assigned to each user retrieved from the LDAP server.
                        If no roles are specified here or assigned during role mapping (below), user will not be able to perform any
                         actions after authentication.
                role_mapping - section with LDAP search parameters and mapping rules.
                        When a user authenticates, while still bound to LDAP, an LDAP search is performed using search_filter and the
                         name of the logged in user. For each entry found during that search, the value of the specified attribute is
                         extracted. For each attribute value that has the specified prefix, the prefix is removed, and the rest of the
                         value becomes the name of a local role defined in ClickHouse, which is expected to be created beforehand by
                         CREATE ROLE command.
                        There can be multiple 'role_mapping' sections defined inside the same 'ldap' section. All of them will be
                         applied.
                    base_dn - template used to construct the base DN for the LDAP search.
                            The resulting DN will be constructed by replacing all '{user_name}', '{bind_dn}', and '{user_dn}'
                             substrings of the template with the actual user name, bind DN, and user DN during each LDAP search.
                    scope - scope of the LDAP search.
                            Accepted values are: 'base', 'one_level', 'children', 'subtree' (the default).
                    search_filter - template used to construct the search filter for the LDAP search.
                            The resulting filter will be constructed by replacing all '{user_name}', '{bind_dn}', '{user_dn}', and
                             '{base_dn}' substrings of the template with the actual user name, bind DN, user DN, and base DN during
                             each LDAP search.
                            Note, that the special characters must be escaped properly in XML.
                    attribute - attribute name whose values will be returned by the LDAP search. 'cn', by default.
                    prefix - prefix, that will be expected to be in front of each string in the original list of strings returned by
                             the LDAP search. Prefix will be removed from the original strings and resulting strings will be treated
                             as local role names. Empty, by default.
             Example:
                <ldap>
                    <server>my_ldap_server</server>
                    <roles>
                        <my_local_role1 />
                        <my_local_role2 />
                    </roles>
                    <role_mapping>
                        <base_dn>ou=groups,dc=example,dc=com</base_dn>
                        <scope>subtree</scope>
                        <search_filter>(&amp;(objectClass=groupOfNames)(member={bind_dn}))</search_filter>
                        <attribute>cn</attribute>
                        <prefix>clickhouse_</prefix>
                    </role_mapping>
                </ldap>
             Example (typical Active Directory with role mapping that relies on the detected user DN):
                <ldap>
                    <server>my_ad_server</server>
                    <role_mapping>
                        <base_dn>CN=Users,DC=example,DC=com</base_dn>
                        <attribute>CN</attribute>
                        <scope>subtree</scope>
                        <search_filter>(&amp;(objectClass=group)(member={user_dn}))</search_filter>
                        <prefix>clickhouse_</prefix>
                    </role_mapping>
                </ldap>
        -->
    </user_directories>

    <access_control_improvements>
        <!-- Enables logic that users without permissive row policies can still read rows using a SELECT query.
             For example, if there two users A, B and a row policy is defined only for A, then
             if this setting is true the user B will see all rows, and if this setting is false the user B will see no rows.
             By default this setting is false for compatibility with earlier access configurations. -->
        <users_without_row_policies_can_read_rows>false</users_without_row_policies_can_read_rows>
        
        <!-- By default, for backward compatibility ON CLUSTER queries ignore CLUSTER grant,
             however you can change this behaviour by setting this to true -->
        <on_cluster_queries_require_cluster_grant>false</on_cluster_queries_require_cluster_grant>

        <!-- By default, for backward compatibility "SELECT * FROM system.<table>" doesn't require any grants and can be executed
             by any user. You can change this behaviour by setting this to true.
             If it's set to true then this query requires "GRANT SELECT ON system.<table>" just like as for non-system tables.
             Exceptions: a few system tables ("tables", "columns", "databases", and some constant tables like "one", "contributors")
             are still accessible for everyone; and if there is a SHOW privilege (e.g. "SHOW USERS") granted the corresponding system
             table (i.e. "system.users") will be accessible. -->
        <select_from_system_db_requires_grant>false</select_from_system_db_requires_grant>

        <!-- By default, for backward compatibility "SELECT * FROM information_schema.<table>" doesn't require any grants and can be
             executed by any user. You can change this behaviour by setting this to true.
             If it's set to true then this query requires "GRANT SELECT ON information_schema.<table>" just like as for ordinary tables. -->
        <select_from_information_schema_requires_grant>false</select_from_information_schema_requires_grant>
    </access_control_improvements>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- Comma-separated list of prefixes for user-defined settings. -->
    <custom_settings_prefixes></custom_settings_prefixes>

    <!-- System profile of settings. This settings are used by internal processes (Distributed DDL worker and so on). -->
    <!-- <system_profile>default</system_profile> -->

    <!-- Buffer profile of settings.
         This settings are used by Buffer storage to flush data to the underlying table.
         Default: used from system_profile directive.
    -->
    <!-- <buffer_profile>default</buffer_profile> -->

    <!-- Default database. -->
    <default_database>default</default_database>

    <!-- Server time zone could be set here.

         Time zone is used when converting between String and DateTime types,
          when printing DateTime in text formats and parsing DateTime from text,
          it is used in date and time related functions, if specific time zone was not passed as an argument.

         Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
         If not specified, system time zone at server startup is used.

         Please note, that server could display time zone alias instead of specified name.
         Example: Zulu is an alias for UTC.
    -->
    <!-- <timezone>UTC</timezone> -->

    <!-- You can specify umask here (see "man umask"). Server will apply it on startup.
         Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
    -->
    <!-- <umask>022</umask> -->

    <!-- Perform mlockall after startup to lower first queries latency
          and to prevent clickhouse executable from being paged out under high IO load.
         Enabling this option is recommended but will lead to increased startup time for up to a few seconds.
    -->
    <mlock_executable>true</mlock_executable>

    <!-- Reallocate memory for machine code ("text") using huge pages. Highly experimental. -->
    <remap_executable>false</remap_executable>

    <![CDATA[
         Uncomment below in order to use JDBC table engine and function.

         To install and run JDBC bridge in background:
         * [Debian/Ubuntu]
           export MVN_URL=https://repo1.maven.org/maven2/com/clickhouse/clickhouse-jdbc-bridge/
           export PKG_VER=$(curl -sL $MVN_URL/maven-metadata.xml | grep '<release>' | sed -e 's|.*>\(.*\)<.*|\1|')
           wget https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v$PKG_VER/clickhouse-jdbc-bridge_$PKG_VER-1_all.deb
           apt install --no-install-recommends -f ./clickhouse-jdbc-bridge_$PKG_VER-1_all.deb
           clickhouse-jdbc-bridge &

         * [CentOS/RHEL]
           export MVN_URL=https://repo1.maven.org/maven2/com/clickhouse/clickhouse-jdbc-bridge/
           export PKG_VER=$(curl -sL $MVN_URL/maven-metadata.xml | grep '<release>' | sed -e 's|.*>\(.*\)<.*|\1|')
           wget https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v$PKG_VER/clickhouse-jdbc-bridge-$PKG_VER-1.noarch.rpm
           yum localinstall -y clickhouse-jdbc-bridge-$PKG_VER-1.noarch.rpm
           clickhouse-jdbc-bridge &

         Please refer to https://github.com/ClickHouse/clickhouse-jdbc-bridge#usage for more information.
    ]]>
    <!--
    <jdbc_bridge>
        <host>127.0.0.1</host>
        <port>9019</port>
    </jdbc_bridge>
    -->

    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.com/docs/en/operations/table_engines/distributed/
      -->
    <remote_servers>
        <!-- Test only shard config for testing distributed storage -->
        <test_shard_localhost>
            <!-- Inter-server per-cluster secret for Distributed queries
                 default: no secret (no authentication will be performed)

                 If set, then Distributed queries will be validated on shards, so at least:
                 - such cluster should exist on the shard,
                 - such cluster should have the same secret.

                 And also (and which is more important), the initial_user will
                 be used as current user for the query.

                 Right now the protocol is pretty simple and it only takes into account:
                 - cluster name
                 - query

                 Also it will be nice if the following will be implemented:
                 - source hostname (see interserver_http_host), but then it will depends from DNS,
                   it can use IP address instead, but then the you need to get correct on the initiator node.
                 - target hostname / ip address (same notes as for source hostname)
                 - time-based security tokens
            -->
            <!-- <secret></secret> -->

            <shard>
                <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
                <!-- <internal_replication>false</internal_replication> -->
                <!-- Optional. Shard weight when writing data. Default: 1. -->
                <!-- <weight>1</weight> -->
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                    <!-- Optional. Priority of the replica for load_balancing. Default: 1 (less value has more priority). -->
                    <!-- <priority>1</priority> -->
                </replica>
            </shard>
        </test_shard_localhost>
        <test_cluster_one_shard_three_replicas_localhost>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>127.0.0.2</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>127.0.0.3</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!--shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>127.0.0.2</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>127.0.0.3</host>
                    <port>9000</port>
                </replica>
            </shard-->
        </test_cluster_one_shard_three_replicas_localhost>
        <test_cluster_two_shards_localhost>
             <shard>
                 <replica>
                     <host>localhost</host>
                     <port>9000</port>
                 </replica>
             </shard>
             <shard>
                 <replica>
                     <host>localhost</host>
                     <port>9000</port>
                 </replica>
             </shard>
        </test_cluster_two_shards_localhost>
        <test_cluster_two_shards>
            <shard>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>127.0.0.2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_cluster_two_shards>
        <test_cluster_two_shards_internal_replication>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>127.0.0.2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_cluster_two_shards_internal_replication>
        <test_shard_localhost_secure>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9440</port>
                    <secure>1</secure>
                </replica>
            </shard>
        </test_shard_localhost_secure>
        <test_unavailable_shard>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>1</port>
                </replica>
            </shard>
        </test_unavailable_shard>
    </remote_servers>

    <!-- The list of hosts allowed to use in URL-related storage engines and table functions.
        If this section is not present in configuration, all hosts are allowed.
    -->
    <!--<remote_url_allow_hosts>-->
        <!-- Host should be specified exactly as in URL. The name is checked before DNS resolution.
            Example: "clickhouse.com", "clickhouse.com." and "www.clickhouse.com" are different hosts.
                    If port is explicitly specified in URL, the host:port is checked as a whole.
                    If host specified here without port, any port with this host allowed.
                    "clickhouse.com" -> "clickhouse.com:443", "clickhouse.com:80" etc. is allowed, but "clickhouse.com:80" -> only "clickhouse.com:80" is allowed.
            If the host is specified as IP address, it is checked as specified in URL. Example: "[2a02:6b8:a::a]".
            If there are redirects and support for redirects is enabled, every redirect (the Location field) is checked.
            Host should be specified using the host xml tag:
                    <host>clickhouse.com</host>
        -->

        <!-- Regular expression can be specified. RE2 engine is used for regexps.
            Regexps are not aligned: don't forget to add ^ and $. Also don't forget to escape dot (.) metacharacter
            (forgetting to do so is a common source of error).
        -->
    <!--</remote_url_allow_hosts>-->

    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /clickhouse/name_of_substitution elements in that file.
      -->

    <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
         Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
      -->

    <!--
    <zookeeper>
        <node>
            <host>example1</host>
            <port>2181</port>
        </node>
        <node>
            <host>example2</host>
            <port>2181</port>
        </node>
        <node>
            <host>example3</host>
            <port>2181</port>
        </node>
    </zookeeper>
    -->

    <!-- Substitutions for parameters of replicated tables.
          Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables
      -->
    <!--
    <macros>
        <shard>01</shard>
        <replica>example01-01-1</replica>
    </macros>
    -->


    <!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>


    <!-- Maximum session timeout, in seconds. Default: 3600. -->
    <max_session_timeout>3600</max_session_timeout>

    <!-- Default session timeout, in seconds. Default: 60. -->
    <default_session_timeout>60</default_session_timeout>

    <!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
    <!--
        interval - send every X second
        root_path - prefix for keys
        hostname_in_path - append hostname to root_path (default = true)
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
    -->
    <!--
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>60</interval>
        <root_path>one_min</root_path>
        <hostname_in_path>true</hostname_in_path>

        <metrics>true</metrics>
        <events>true</events>
        <events_cumulative>false</events_cumulative>
        <asynchronous_metrics>true</asynchronous_metrics>
    </graphite>
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>1</interval>
        <root_path>one_sec</root_path>

        <metrics>true</metrics>
        <events>true</events>
        <events_cumulative>false</events_cumulative>
        <asynchronous_metrics>false</asynchronous_metrics>
    </graphite>
    -->

    <!-- Serve endpoint for Prometheus monitoring. -->
    <!--
        endpoint - mertics path (relative to root, statring with "/")
        port - port to setup server. If not defined or 0 than http_port used
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
        status_info - send data from different component from CH, ex: Dictionaries status
    -->
    <!--
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
        <status_info>true</status_info>
    </prometheus>
    -->

    <!-- Query log. Used only for queries with setting log_queries = 1. -->
    <query_log>
        <!-- What table to insert data. If table is not exist, it will be created.
             When query log structure is changed after system update,
              then old table will be renamed and new table will be created automatically.
        -->
        <database>system</database>
        <table>query_log</table>
        <!--
            PARTITION BY expr: https://clickhouse.com/docs/en/table_engines/mergetree-family/custom_partitioning_key/
            Example:
                event_date
                toMonday(event_date)
                toYYYYMM(event_date)
                toStartOfHour(event_time)
        -->
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!--
            Table TTL specification: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-table-ttl
            Example:
                event_date + INTERVAL 1 WEEK
                event_date + INTERVAL 7 DAY DELETE
                event_date + INTERVAL 2 WEEK TO DISK 'bbb'

        <ttl>event_date + INTERVAL 30 DAY DELETE</ttl>
        -->

        <!-- Instead of partition_by, you can provide full engine expression (starting with ENGINE = ) with parameters,
             Example: <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, event_time) SETTINGS index_granularity = 1024</engine>
          -->

        <!-- Interval of flushing data. -->
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>

    <!-- Trace log. Stores stack traces collected by query profilers.
         See query_profiler_real_time_period_ns and query_profiler_cpu_time_period_ns settings. -->
    <trace_log>
        <database>system</database>
        <table>trace_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </trace_log>

    <!-- Query thread log. Has information about all threads participated in query execution.
         Used only for queries with setting log_query_threads = 1. -->
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>

    <!-- Query views log. Has information about all dependent views associated with a query.
         Used only for queries with setting log_query_views = 1. -->
    <query_views_log>
        <database>system</database>
        <table>query_views_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_views_log>

    <!-- Uncomment if use part log.
         Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).-->
    <part_log>
        <database>system</database>
        <table>part_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>

    <!-- Uncomment to write text log into table.
         Text log contains all information from usual server log but stores it in structured and efficient way.
         The level of the messages that goes to the table can be limited (<level>), if not specified all messages will go to the table.
    <text_log>
        <database>system</database>
        <table>text_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <level></level>
    </text_log>
    -->

    <!-- Metric log contains rows with current values of ProfileEvents, CurrentMetrics collected with "collect_interval_milliseconds" interval. -->
    <metric_log>
        <database>system</database>
        <table>metric_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <collect_interval_milliseconds>1000</collect_interval_milliseconds>
    </metric_log>

    <!--
        Asynchronous metric log contains values of metrics from
        system.asynchronous_metrics.
    -->
    <asynchronous_metric_log>
        <database>system</database>
        <table>asynchronous_metric_log</table>
        <!--
            Asynchronous metrics are updated once a minute, so there is
            no need to flush more often.
        -->
        <flush_interval_milliseconds>7000</flush_interval_milliseconds>
    </asynchronous_metric_log>

    <!--
        OpenTelemetry log contains OpenTelemetry trace spans.
    -->
    <opentelemetry_span_log>
        <!--
            The default table creation code is insufficient, this <engine> spec
            is a workaround. There is no 'event_time' for this log, but two times,
            start and finish. It is sorted by finish time, to avoid inserting
            data too far away in the past (probably we can sometimes insert a span
            that is seconds earlier than the last span in the table, due to a race
            between several spans inserted in parallel). This gives the spans a
            global order that we can use to e.g. retry insertion into some external
            system.
        -->
        <engine>
            engine MergeTree
            partition by toYYYYMM(finish_date)
            order by (finish_date, finish_time_us, trace_id)
        </engine>
        <database>system</database>
        <table>opentelemetry_span_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </opentelemetry_span_log>


    <!-- Crash log. Stores stack traces for fatal errors.
         This table is normally empty. -->
    <crash_log>
        <database>system</database>
        <table>crash_log</table>

        <partition_by />
        <flush_interval_milliseconds>1000</flush_interval_milliseconds>
    </crash_log>

    <!-- Session log. Stores user log in (successful or not) and log out events.

        Note: session log has known security issues and should not be used in production.
    -->
    <!-- <session_log>
        <database>system</database>
        <table>session_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </session_log> -->

    <!-- Profiling on Processors level. -->
    <processors_profile_log>
        <database>system</database>
        <table>processors_profile_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </processors_profile_log>

    <!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
    <!-- Custom TLD lists.
         Format: <name>/path/to/file</name>

         Changes will not be applied w/o server restart.
         Path to the list is under top_level_domains_path (see above).
    -->
    <top_level_domains_lists>
        <!--
        <public_suffix_list>/path/to/public_suffix_list.dat</public_suffix_list>
        -->
    </top_level_domains_lists>

    <!-- Configuration of external dictionaries. See:
         https://clickhouse.com/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts
    -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <!-- Configuration of user defined executable functions -->
    <user_defined_executable_functions_config>*_function.xml</user_defined_executable_functions_config>

    <!-- Uncomment if you want data to be compressed 30-100% better.
         Don't do that if you just started using ClickHouse.
      -->
    <!--
    <compression>
        <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
        <case>

            <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
            <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
            <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->

            <!- - What compression method to use. - ->
            <method>zstd</method>
        </case>
    </compression>
    -->

    <!-- Configuration of encryption. The server executes a command to
         obtain an encryption key at startup if such a command is
         defined, or encryption codecs will be disabled otherwise. The
         command is executed through /bin/sh and is expected to write
         a Base64-encoded key to the stdout. -->
    <encryption_codecs>
        <!-- aes_128_gcm_siv -->
            <!-- Example of getting hex key from env -->
            <!-- the code should use this key and throw an exception if its length is not 16 bytes -->
            <!--key_hex from_env="..."></key_hex -->

            <!-- Example of multiple hex keys. They can be imported from env or be written down in config-->
            <!-- the code should use these keys and throw an exception if their length is not 16 bytes -->
            <!-- key_hex id="0">...</key_hex -->
            <!-- key_hex id="1" from_env=".."></key_hex -->
            <!-- key_hex id="2">...</key_hex -->
            <!-- current_key_id>2</current_key_id -->

            <!-- Example of getting hex key from config -->
            <!-- the code should use this key and throw an exception if its length is not 16 bytes -->
            <!-- key>...</key -->

            <!-- example of adding nonce -->
            <!-- nonce>...</nonce -->

        <!-- /aes_128_gcm_siv -->
    </encryption_codecs>

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>

        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->

        <!-- Controls how much ON CLUSTER queries can be run simultaneously. -->
        <!-- <pool_size>1</pool_size> -->

        <!--
             Cleanup settings (active tasks will not be removed)
        -->

        <!-- Controls task TTL (default 1 week) -->
        <!-- <task_max_lifetime>604800</task_max_lifetime> -->

        <!-- Controls how often cleanup should be performed (in seconds) -->
        <!-- <cleanup_delay_period>60</cleanup_delay_period> -->

        <!-- Controls how many tasks could be in the queue -->
        <!-- <max_tasks_in_queue>1000</max_tasks_in_queue> -->
    </distributed_ddl>

    <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    <!--
    <merge_tree>
        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    </merge_tree>
    -->

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
         If you want do delete one table and don't want to change clickhouse-server config, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
    -->
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

    <!-- Example of parameters for GraphiteMergeTree table engine -->
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <!-- Directory in <clickhouse-path> containing schema files for various input formats.
         The directory will be created if it doesn't exist.
      -->
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

    <!-- Default query masking rules, matching lines would be replaced with something else in the logs
        (both text logs and system.query_log).
        name - name for the rule (optional)
        regexp - RE2 compatible regular expression (mandatory)
        replace - substitution string for sensitive data (optional, by default - six asterisks)
    -->
    <query_masking_rules>
        <rule>
            <name>hide encrypt/decrypt arguments</name>
            <regexp>((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)</regexp>
            <!-- or more secure, but also more invasive:
                (aes_\w+)\s*\(.*\)
            -->
            <replace>\1(???)</replace>
        </rule>
    </query_masking_rules>

    <!-- Uncomment to use custom http handlers.
        rules are checked from top to bottom, first match runs the handler
            url - to match request URL, you can use 'regex:' prefix to use regex match(optional)
            methods - to match request method, you can use commas to separate multiple method matches(optional)
            headers - to match request headers, match each child element(child element name is header name), you can use 'regex:' prefix to use regex match(optional)
        handler is request handler
            type - supported types: static, dynamic_query_handler, predefined_query_handler
            query - use with predefined_query_handler type, executes query when the handler is called
            query_param_name - use with dynamic_query_handler type, extracts and executes the value corresponding to the <query_param_name> value in HTTP request params
            status - use with static type, response status code
            content_type - use with static type, response content-type
            response_content - use with static type, Response content sent to client, when using the prefix 'file://' or 'config://', find the content from the file or configuration send to client.

    <http_handlers>
        <rule>
            <url>/</url>
            <methods>POST,GET</methods>
            <headers><pragma>no-cache</pragma></headers>
            <handler>
                <type>dynamic_query_handler</type>
                <query_param_name>query</query_param_name>
            </handler>
        </rule>

        <rule>
            <url>/predefined_query</url>
            <methods>POST,GET</methods>
            <handler>
                <type>predefined_query_handler</type>
                <query>SELECT * FROM system.settings</query>
            </handler>
        </rule>

        <rule>
            <handler>
                <type>static</type>
                <status>200</status>
                <content_type>text/plain; charset=UTF-8</content_type>
                <response_content>config://http_server_default_response</response_content>
            </handler>
        </rule>
    </http_handlers>
    -->

    <send_crash_reports>
        <!-- Changing <enabled> to true allows sending crash reports to -->
        <!-- the ClickHouse core developers team via Sentry https://sentry.io -->
        <!-- Doing so at least in pre-production environments is highly appreciated -->
        <enabled>false</enabled>
        <!-- Change <anonymize> to true if you don't feel comfortable attaching the server hostname to the crash report -->
        <anonymize>false</anonymize>
        <!-- Default endpoint should be changed to different Sentry DSN only if you have -->
        <!-- some in-house engineers or hired consultants who're going to debug ClickHouse issues for you -->
        <endpoint>https://6f33034cfe684dd7a3ab9875e57b1c8d@o388870.ingest.sentry.io/5226277</endpoint>
    </send_crash_reports>

    <!-- Uncomment to disable ClickHouse internal DNS caching. -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->

    <!-- You can also configure rocksdb like this: -->
    <!--
    <rocksdb>
        <options>
            <max_background_jobs>8</max_background_jobs>
        </options>
        <column_family_options>
            <num_levels>2</num_levels>
        </column_family_options>
        <tables>
            <table>
                <name>TABLE</name>
                <options>
                    <max_background_jobs>8</max_background_jobs>
                </options>
                <column_family_options>
                    <num_levels>2</num_levels>
                </column_family_options>
            </table>
        </tables>
    </rocksdb>
    -->

    <!-- Uncomment if enable merge tree metadata cache -->
    <!--merge_tree_metadata_cache>
        <lru_cache_size>268435456</lru_cache_size>
        <continue_if_corrupted>true</continue_if_corrupted>
    </merge_tree_metadata_cache-->
</clickhouse>

users.xml

<?xml version="1.0"?>
<clickhouse>
    <!-- See also the files in users.d directory where the settings can be overridden. -->

    <!-- Profiles of settings. -->
    <profiles>
        <!-- Default settings. -->
        <default>
            <!-- How to choose between replicas during distributed query processing.
                 random - choose random replica from set of replicas with minimum number of errors
                 nearest_hostname - from set of replicas with minimum number of errors, choose replica
                  with minimum number of different symbols between replica's hostname and local hostname
                  (Hamming distance).
                 in_order - first live replica is chosen in specified order.
                 first_or_random - if first replica one has higher number of errors, pick a random one from replicas with minimum number of errors.
            -->
            <load_balancing>random</load_balancing>
        </default>

        <!-- Profile that allows only read queries. -->
        <readonly>
            <readonly>1</readonly>
        </readonly>
    </profiles>

    <!-- Users and ACL. -->
    <users>
		<default>
			<password_sha256_hex>2c835ba8966d902120fb4504037fad34effa4b9461e988e4c4da073ad50dae82</password_sha256_hex>
			<networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
			<profile>default</profile>
			<quota>default</quota>
		</default>
    </users>

    <!-- Quotas. -->
    <quotas>
        <!-- Name of quota. -->
        <default>
            <!-- Limits for time interval. You could specify many intervals with different limits. -->
            <interval>
                <!-- Length of interval. -->
                <duration>3600</duration>

                <!-- No limits. Just calculate resource usage for time interval. -->
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</clickhouse>

部分来源于:clickhouse聚合之探索聚合内部机制 - 掘金

 

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值