Symptom: after the server lost power and was rebooted, ClickHouse kept failing to start, logging errors like the following:
2024.05.20 10:56:32.882584 [ 221 ] {} <Error> xxx.analys_log (f2a5df75-9220-4ee1-b2a5-df759220cee1): Detaching broken part /var/lib/clickhouse/store/f2a/f2a5df75-9220-4ee1-b2a5-df759220cee1/20240517_17474114_17474119_1. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
2024.05.20 10:56:33.071531 [ 53 ] {} <Error> Application: Caught exception while loading metadata: Code: 231, e.displayText() = DB::Exception: Suspiciously many (45) broken parts to remove.: Cannot attach table `xxx`.`analys_log` from metadata file /var/lib/clickhouse/store/7e0/7e03a687-555f-4282-be03-a687555fb282/analys_log.sql from query ATTACH TABLE XXX.analys_log UUID 'f2a5df75-9220-4ee1-b2a5-df759220cee1' (`xxx` String DEFAULT '', `xxx` String DEFAULT '', `xxx` String DEFAULT '', `xxx` String DEFAULT '', `xxx` String DEFAULT '', `xxx` String DEFAULT '', `xxx` DateTime, `xxx` Date DEFAULT toDate(now()), `xxx` String DEFAULT '') ENGINE = MergeTree PARTITION BY toYYYYMMDD(xxx) ORDER BY part_create SETTINGS index_granularity = 8192: while loading database `xxx` from path /var/lib/clickhouse/metadata/xxx, Stack trace (when copying this message, always include the lines below):
The key error is: DB::Exception: Suspiciously many (45) broken parts to remove.
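After some searching, I found that the cause is a ClickHouse setting whose value is too low. Details:
This error occurs after the machine loses power. According to what I found, data that was being written at that moment leaves the metadata and the data inconsistent. When the service restarts, ClickHouse reloads the data of its MergeTree tables, and some of that data may be corrupted.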
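max_suspicious_broken_parts is a MergeTree setting that can be set in config.xml. Its default value is 5, and it accepts any positive integer. If the number of broken parts in a single partition exceeds the value of max_suspicious_broken_parts, ClickHouse refuses to remove the broken parts automatically. The table then cannot be attached, and the server reports an error and exits during startup.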
To keep this error from preventing the service from starting, it is recommended to set this parameter to 1000 or a larger value:
<!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
<merge_tree>
    <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
</merge_tree>
After making the change, restart ClickHouse. The docker-clickhouse.yml used here is:
version: '3.3'
services:
  clickhouse:
    restart: always
    image: yandex/clickhouse-server
    container_name: clickhouse
    hostname: xxx-clickhouse
    privileged: true
    ports:
      - "8123:8123"
    volumes:
      - ./database:/var/lib/clickhouse
      - ./config:/etc/clickhouse-server
      - ./logs:/var/log/clickhouse-server
      - /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
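In this compose file, ./config is mounted as /etc/clickhouse-server, so the edit has to land in ./config/config.xml on the host. The following minimal shell sketch checks this before restarting. check_broken_parts_limit is a hypothetical helper, not a ClickHouse tool. It matches text rather than parsing XML, so a commented-out setting would also be picked up, and it uses the last match in the file.

```shell
# Hypothetical pre-restart check (not part of ClickHouse): confirm that a config
# file sets max_suspicious_broken_parts to at least WANT (default 1000).
# Usage: check_broken_parts_limit CONFIG_FILE [WANT]
check_broken_parts_limit() {
  file=$1
  want=${2:-1000}
  # Plain text match, not an XML parser: commented-out lines would also match.
  value=$(sed -n 's:.*<max_suspicious_broken_parts>[[:space:]]*\([0-9][0-9]*\)[[:space:]]*</max_suspicious_broken_parts>.*:\1:p' "$file" | tail -n 1)
  if [ -z "$value" ]; then
    echo "max_suspicious_broken_parts not found in $file" >&2
    return 1
  fi
  echo "max_suspicious_broken_parts = $value"
  # Exit status 0 only if the configured value is at least the wanted one.
  [ "$value" -ge "$want" ]
}
```

For example, check_broken_parts_limit ./config/config.xml should print the configured value and succeed once the setting above is in place.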
Restart the container with docker-compose:
docker-compose -f docker-clickhouse.yml down
docker-compose -f docker-clickhouse.yml up -d
After the restart, query the setting to confirm the new value:
SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
In the result, the value of max_suspicious_broken_parts should now be 1000.
A similar walkthrough: "Solving the ClickHouse server startup error Suspiciously many broken parts to remove"