ClickHouse analysis: data stored in ZooKeeper

Most recently recommended article published on 2024-03-05 22:51:46

追梦青春09

Article link: https://blog.csdn.net/iceyung/article/details/104060187

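Installing ClickHouse

Start from a fresh ClickHouse installation that holds no data yet.

Enabling debug-level logging

In config.xml, set <level>debug</level> so that the log contents are easier to inspect.

Creating a replicated table

The CREATE TABLE statement: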

CREATE TABLE test_test
(
    Id         Int32,
    YearMonth  Int32,
    DeviceType String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/default_test', '{replica}')
PARTITION BY YearMonth
ORDER BY (YearMonth, DeviceType)
SETTINGS index_granularity = 8192;

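After the table is created, ZooKeeper holds the following: [screenshot]

It mainly stores the table definition metadata and information about the data blocks (parts).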
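To look at this tree without a GUI, one option is a small recursive walk. The sketch below is only an illustration: `walk` and `InMemoryZk` are hypothetical names, not part of ClickHouse. `walk` relies on a client that offers `get_children(path)` and `get(path)` in the style of the kazoo library's `KazooClient`, and the in-memory stand-in exists only so the example runs without a ZooKeeper server. The paths in the stand-in are taken from this article, not from a real dump.

```python
from typing import Dict, Iterator, List, Tuple


class InMemoryZk:
    """Hypothetical stand-in exposing the two kazoo-style calls used below."""

    def __init__(self, nodes: Dict[str, bytes]) -> None:
        self._nodes = nodes

    def get_children(self, path: str) -> List[str]:
        # Collect the first path component below `path` for every known znode.
        prefix = path.rstrip("/") + "/"
        names = {p[len(prefix):].split("/")[0]
                 for p in self._nodes if p.startswith(prefix)}
        return sorted(names)

    def get(self, path: str) -> Tuple[bytes, None]:
        # kazoo returns (data, stat); the stat is not needed here.
        return self._nodes.get(path, b""), None


def walk(client, path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, data) for `path` and every znode below it, depth first."""
    data, _stat = client.get(path)
    yield path, data
    for child in client.get_children(path):
        yield from walk(client, path.rstrip("/") + "/" + child)


if __name__ == "__main__":
    zk = InMemoryZk({
        "/clickhouse/task_queue/ddl": b"",
        "/clickhouse/tables/01-01/default_test": b"",
    })
    for node_path, _data in walk(zk, "/clickhouse"):
        print(node_path)
```

Against a real ensemble, the same `walk` should work with a started `kazoo.client.KazooClient` (for example one created with `hosts="127.0.0.1:2181"`, an assumed address) passed in place of the stand-in.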

Inspecting the contents stored in ZooKeeper

Insert one row into the replicated table:

insert into test_test(Id,YearMonth,DeviceType) values (1,202001,'D1-2');

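ZooKeeper now shows: [screenshot]

New content has appeared under the parts: a folder (znode) named 202001_0_0_0. It is the data part for the partition we just inserted into, and it stores the following: [screenshot]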
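The part name itself carries information: it reads as partition ID, minimum block number, maximum block number, and merge level, joined by underscores. A minimal sketch of splitting it up follows; `PartName` and `parse_part_name` are names made up for this example, and the sketch ignores the optional mutation-version suffix that mutated parts can carry.

```python
from typing import NamedTuple


class PartName(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int


def parse_part_name(name: str) -> PartName:
    """Split a name such as '202001_0_0_0' into its four fields."""
    # Split from the right so a partition ID containing "_" stays in one piece.
    partition_id, min_block, max_block, level = name.rsplit("_", 3)
    return PartName(partition_id, int(min_block), int(max_block), int(level))


if __name__ == "__main__":
    print(parse_part_name("202001_0_0_0"))
```

For 202001_0_0_0 this gives partition 202001, a block range from 0 to 0, and level 0, meaning the part comes straight from an insert and has not been merged.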

Write another row to the same partition:

insert into test_test(Id,YearMonth,DeviceType) values (2,202001,'D1-3');

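ZooKeeper shows: [screenshot]

Local disk: [screenshot]

At this point the parts in this partition have not been merged yet. The official documentation says the merge happens in the background roughly ten-odd minutes later; we can also trigger it manually: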

OPTIMIZE TABLE test_test PARTITION 202001;

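The disk now shows: [screenshot]

ZooKeeper shows: [screenshot]

Not only have the old parts not been deleted, there is now one more. We check with a query: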

select * from system.parts where table='test_test';

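Result: [screenshot]

Only one part has active = 1, and its rows value is 2, which confirms that it is the merged part; the other two simply have not been deleted yet.

After a few minutes the old parts are deleted, and ZooKeeper likewise reflects only the latest part: [screenshot]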
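Which parts are superseded can also be read off the part names: a merged part spans the block range of its sources and has a higher level. The sketch below applies that containment rule to names that appear in this article's merge log. `Part`, `parse`, `covers`, and `uncovered` are hypothetical helpers written for illustration, mutation versions are ignored, and the `active` column of system.parts remains the authoritative answer.

```python
from typing import List, NamedTuple


class Part(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int


def parse(name: str) -> Part:
    partition_id, lo, hi, level = name.rsplit("_", 3)
    return Part(partition_id, int(lo), int(hi), int(level))


def covers(outer: Part, inner: Part) -> bool:
    """True if `outer` spans the whole block range of a different part `inner`."""
    return (
        outer != inner
        and outer.partition_id == inner.partition_id
        and outer.min_block <= inner.min_block
        and inner.max_block <= outer.max_block
        and outer.level >= inner.level
    )


def uncovered(names: List[str]) -> List[str]:
    """Return the part names that no other listed part covers."""
    parts = [parse(n) for n in names]
    return [n for n, p in zip(names, parts)
            if not any(covers(other, p) for other in parts)]


if __name__ == "__main__":
    print(uncovered(["202001_0_0_0", "202001_1_1_0",
                     "202001_2_2_0", "202001_0_2_1"]))
```

The parts that `uncovered` leaves out are the ones waiting for background cleanup, which matches the short period in which old and merged parts coexist on disk and in ZooKeeper.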

Queries stored in ZooKeeper

So far we have not seen any query text stored in ZooKeeper. However, config.xml contains this setting:

 <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>

        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>

This directory is also visible in the screenshots above.

Let us execute a DDL statement and look at what gets stored:

drop table test_test on cluster 'cluster01';

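Here cluster01 is the name of our cluster. It is configured in metrika.xml, which config.xml pulls in via <include_from>/etc/clickhouse-server/metrika.xml</include_from>.

The ddl node now shows the statement we just executed. The statement is dispatched to every node in the cluster for execution (we currently have only one node), and the data under this path grows as more statements are executed.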

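Other notes

If your cluster has multiple nodes, nodes like the following appear: [screenshot]

Here 01-01 is layer-shard. Inside each such node there are also replica entries; the servers listed there are replicas of one another, and ZooKeeper coordinates the replication between them (it holds the metadata, while the replicas exchange the actual data). The layout also depends on the CREATE TABLE statement: different statements can produce different directories.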

<macros>
<layer>01</layer>
<shard>01</shard>
<replica>cluster01-01-01</replica>
</macros>
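To see how these macro values turn the path template from the CREATE TABLE statement into a concrete ZooKeeper path, here is a small sketch. ClickHouse performs the substitution itself on the server; `expand_macros` is only a hypothetical helper that mimics the result for plain `{name}` placeholders.

```python
from typing import Dict


def expand_macros(template: str, macros: Dict[str, str]) -> str:
    """Replace {name} placeholders with the values from the <macros> section."""
    return template.format_map(macros)


# Values copied from the <macros> block above.
MACROS = {"layer": "01", "shard": "01", "replica": "cluster01-01-01"}

if __name__ == "__main__":
    print(expand_macros("/clickhouse/tables/{layer}-{shard}/default_test", MACROS))
    print(expand_macros("{replica}", MACROS))
```

With these values the table path contains 01-01, which is where the layer-shard node name comes from, and the replica name becomes cluster01-01-01. A replica with a different `<replica>` value but the same layer and shard would share the table path.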

The log folder records brief information about each insert; it also grows as more parts are added:

format version: 4
create_time: 2020-01-21 11:09:42
source replica: cluster01-01-01
block_id: 202001_18140686589791493295_2460127323285304296
get
202001_0_0_0

After the OPTIMIZE command has triggered the merge, a new log entry appears:

format version: 4
create_time: 2020-01-21 11:14:28
source replica: cluster01-01-01
block_id: 
merge
202001_0_0_0
202001_1_1_0
202001_2_2_0
into
202001_0_2_1
deduplicate: 0
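Both entries follow the same shape: a few `key: value` header lines, a line naming the action, and then the parts involved. The sketch below reads that shape into a dictionary. `parse_log_entry` is a hypothetical helper that handles only the two entry types shown in this article (format version 4, `get` and `merge`); other entry types and format versions are not covered.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, List[str]]]


def parse_log_entry(text: str) -> Entry:
    """Parse a 'get' or 'merge' replication log entry as shown above."""
    lines = [line.strip() for line in text.strip().splitlines()]
    entry: Entry = {}
    i = 0
    # Header: "key: value" lines (format version, create_time, ...).
    while i < len(lines) and ":" in lines[i]:
        key, _, value = lines[i].partition(":")
        entry[key.strip()] = value.strip()
        i += 1
    # The first line without a colon names the action.
    entry["type"] = lines[i]
    i += 1
    if entry["type"] == "get":
        entry["new_part"] = lines[i]
    elif entry["type"] == "merge":
        into = lines.index("into", i)
        entry["source_parts"] = lines[i:into]
        entry["new_part"] = lines[into + 1]
        # Trailing "key: value" lines such as "deduplicate: 0".
        for line in lines[into + 2:]:
            key, _, value = line.partition(":")
            entry[key.strip()] = value.strip()
    else:
        raise ValueError("unsupported entry type: " + str(entry["type"]))
    return entry
```

Fed the merge entry above, the function returns the three source parts under `source_parts` and 202001_0_2_1 under `new_part`; the empty `block_id` comes back as an empty string.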

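Once the merge completes, the source parts are removed from the parts node as well. The log entries, by contrast, were not removed in this test and kept accumulating. If the table is dropped, all of its records are deleted from ZooKeeper.

Appendix: a hands-on tutorial covering ClickHouse installation and related topics: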

https://blog.csdn.net/zhangpeterx/article/details/95091335
