Server power failure corrupted files and ClickHouse would not start: Detaching Broken Part, DB::Exception: Suspiciously many (12) broken parts


1. About ClickHouse

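ClickHouse is an open-source column-oriented database management system (DBMS) developed by the Russian company Yandex. It is designed mainly for online analytical processing (OLAP): it can generate analytical reports in real time with SQL queries, performs very well when analyzing and querying massive datasets, and also supports real-time data inserts and updates.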

Advantages of columnar storage:

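  • Column-level statistics such as aggregation, counting, and summation perform far better than in row-oriented storage.
  • All values in a column share the same data type, so the data compresses more easily, and each column can use the compression algorithm that suits it best, which greatly improves the compression ratio.
  • The better compression ratio both saves disk space and lets the cache hold more data.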
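The aggregation and compression points above can be illustrated with a toy Python sketch. It is not how ClickHouse lays out data on disk: the sample rows and the column names `id`, `country`, and `amount` are hypothetical, and zlib stands in for ClickHouse's own codecs. It shows that summing one column only touches that column in a columnar layout, and prints the compressed size of both layouts (the exact numbers depend on the data).

```python
import zlib

# Hypothetical sample data: (id, country, amount) rows, for illustration only.
rows = [(i, ("CN", "RU", "US")[i % 3], i % 100) for i in range(10_000)]

# Row-oriented layout: each record is stored contiguously.
row_store = rows

# Column-oriented layout: each column is stored contiguously.
col_store = {
    "id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Summing one column: the row layout walks every full record,
# while the column layout only reads the "amount" list.
row_sum = sum(r[2] for r in row_store)
col_sum = sum(col_store["amount"])

# Serialize both layouts as text and compare compressed sizes.
row_bytes = "\n".join(",".join(map(str, r)) for r in row_store).encode()
col_bytes = b"\n".join(
    ",".join(map(str, values)).encode() for values in col_store.values()
)

print("sum (row, column):", row_sum, col_sum)
print("compressed row layout:", len(zlib.compress(row_bytes)))
print("compressed column layout:", len(zlib.compress(col_bytes)))
```

Both layouts hold the same values; only the access pattern and the byte order handed to the compressor differ.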

2. Problem description

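  • For one line of system products, the company keeps dedicated servers that people in the department normally use for development and testing.
  • Yesterday afternoon the campus suddenly lost power and every server went down. Once power was restored, we rebooted the servers and tried to start all services, but the system was still unreachable. The application's runtime log showed that it could not connect to ClickHouse: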
Cause: java.lang.RuntimeException: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 127.0.0.1, port: 8123; Connect to 127.0.0.1:8123 [/127.0.0.1] failed: Connection refused
  • Several attempts with service clickhouse-server restart still failed to start the ClickHouse service;
  • ps -ef showed that ClickHouse was not running, and systemctl status clickhouse-server returned the following:
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
   Loaded: loaded (/etc/systemd/system/clickhouse-server.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2023-08-18 08:57:20 CST; 7s ago
  Process: 43336 ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/runlickhouse-server.pid (code=exited, status=70)
 Main PID: 43336 (code=exited, status=70)

Aug 18 08:57:20 localhost.localdomain systemd[1]: Unit clickhouse-server.service entered failed state.
Aug 18 08:57:20 localhost.localdomain systemd[1]: clickhouse-server.service failed.
[root@localhost ~]# systemctl start clickhouse-server
[root@localhost ~]# systemctl status clickhouse-server
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
   Loaded: loaded (/etc/systemd/system/clickhouse-server.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2023-08-18 08:57:51 CST; 1s ago
  Process: 43388 ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/runlickhouse-server.pid (code=exited, status=70)
 Main PID: 43388 (code=exited, status=70)
  • To find out why ClickHouse will not start, you have to read ClickHouse's own error log. The default log directory is /var/log/clickhouse-server/, and clickhouse-server.err.log revealed the problem:
2023.08.18 10:01:12.850478 [ 2911 ] {} <Error> bds_data.GkImsiInfo: Detaching broken part /home/web-server/db_data/data/bds_data/GkImsiInfo/20230817_20230817_2103307_2103307_0. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
2023.08.18 10:01:12.860249 [ 2886 ] {} <Error> Application: Caught exception while loading metadata: Code: 231. DB::Exception: Suspiciously many (12) broken parts to remove.: Cannot attach table `bds_data`.`GkImsiInfo` from metadata file /home/web-server/db_data/metadata/bds_data/GkImsiInfo.sql from query ATTACH TABLE bds_data.GkImsiInfo (`area` String, `createDate` DateTime, `imei` String, `imsi` String, `latitude` String, `longitude` String, `moduleId` UInt32, `opType` Int8, `positionId` UInt32, `rptTime` DateTime, `rptDate` Date, `fcn` String, `groupId` UInt32, `country` String, `mobile` String, `tmsi` String, `rssi` String, `dataFrom` String, `deviceId` UInt32) ENGINE = MergeTree(rptDate, intHash32(deviceId), (imsi, rptDate, intHash32(deviceId), rptTime), 8192): while loading database `bds_data` from path /home/web-server/db_data/metadata/bds_data. (TOO_MANY_UNEXPECTED_DATA_PARTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x945ba5a in /usr/bin/clickhouse
1. DB::MergeTreeData::loadDataParts(bool) @ 0x116a2340 in /usr/bin/clickhouse
2. DB::StorageMergeTree::StorageMergeTree(DB::StorageID const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::StorageInMemoryMetadata const&, bool, std::__1::shared_ptr<DB::Context>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::MergeTreeData::MergingParams const&, std::__1::unique_ptr<DB::MergeTreeSettings, std::__1::default_delete<DB::MergeTreeSettings> >, bool) @ 0x118e602b in /usr/bin/clickhouse
3. ? @ 0x118db137 in /usr/bin/clickhouse
4. DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, std::__1::shared_ptr<DB::Context>, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x1132eca1 in /usr/bin/clickhouse
5. DB::createTableFromAST(DB::ASTCreateQuery, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool) @ 0x107c4e65 in /usr/bin/clickhouse
6. ? @ 0x107c2fb3 in /usr/bin/clickhouse
7. ? @ 0x107c3f9f in /usr/bin/clickhouse

Key error strings:

  • Detaching broken part
  • DB::Exception: Suspiciously many (12) broken parts to remove
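To see how many tables are affected before changing any settings, the two error strings above can be pulled out of clickhouse-server.err.log. This is a minimal Python sketch that assumes the messages are worded exactly like the lines quoted above (other ClickHouse versions may differ); `scan_err_log` and `scan_file` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Patterns modeled on the two log lines quoted above; other ClickHouse
# versions may word these messages differently.
BROKEN_PART_RE = re.compile(
    r"<Error> (?P<table>[\w.]+): Detaching broken part (?P<path>\S+?)\.?(?:\s|$)"
)
SUSPICIOUS_RE = re.compile(r"Suspiciously many \((?P<count>\d+)\) broken parts")


def scan_err_log(lines):
    """Group detached part paths by table and collect 'Suspiciously many' counts."""
    broken_by_table = defaultdict(list)
    reported_counts = []
    for line in lines:
        m = BROKEN_PART_RE.search(line)
        if m:
            broken_by_table[m.group("table")].append(m.group("path"))
        m = SUSPICIOUS_RE.search(line)
        if m:
            reported_counts.append(int(m.group("count")))
    return dict(broken_by_table), reported_counts


def scan_file(path="/var/log/clickhouse-server/clickhouse-server.err.log"):
    """Scan an error log on disk (default location as mentioned in the text)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return scan_err_log(f)
```

Calling `scan_file()` on the affected server returns the broken part paths grouped by table, plus every count reported in a "Suspiciously many" line, which tells you how high max_suspicious_broken_parts would need to be.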

3. Solution

3.1 Cause

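  • After an abrupt power loss the file system can be left damaged, especially files that were being continuously written and merged. ClickHouse's MergeTree engines constantly write and merge data parts, so a power cut in the middle of a write can leave metadata and data inconsistent. When the service restarts, ClickHouse reloads the data of every MergeTree table and may find damaged parts.
  • ClickHouse has a setting, max_suspicious_broken_parts, configured in the server configuration (default path /etc/clickhouse-server/config.xml). Its default value is 10 and it accepts any positive integer. If the number of broken parts in a single partition exceeds max_suspicious_broken_parts, ClickHouse refuses to repair the table automatically or to delete the broken parts, and instead exits at startup with the error shown above.
  • To keep this error from blocking startup, the recommendation is to raise the setting to 1000 or more.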
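The startup rule described above boils down to one comparison. The sketch below is a simplified model written for illustration, not ClickHouse's source code: it ignores how parts are grouped and any other checks the server performs, and `check_broken_parts` and `StartupDecision` are hypothetical names. With the 12 broken parts from the log and the default of 10, startup fails; with the setting raised to 1000, loading continues.

```python
from dataclasses import dataclass


@dataclass
class StartupDecision:
    starts: bool
    reason: str


def check_broken_parts(broken_parts: int, max_suspicious_broken_parts: int = 10) -> StartupDecision:
    """Simplified model of the rule above: too many broken parts aborts startup."""
    if max_suspicious_broken_parts <= 0:
        raise ValueError("max_suspicious_broken_parts must be a positive integer")
    if broken_parts > max_suspicious_broken_parts:
        # Over the threshold: refuse to remove the parts and fail to start.
        return StartupDecision(False, f"Suspiciously many ({broken_parts}) broken parts to remove.")
    # At or under the threshold: the broken parts are detached and loading continues.
    return StartupDecision(True, f"Detaching {broken_parts} broken part(s)")
```

For example, `check_broken_parts(12).starts` is `False`, while `check_broken_parts(12, 1000).starts` is `True`.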

3.2 Fixing it

Per-table setting

Set max_suspicious_broken_parts explicitly when creating the MergeTree table:

CREATE TABLE foo
(
    `A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 1000;

From the SQL client

Change it with ALTER TABLE ... MODIFY SETTING:

ALTER TABLE foo
    MODIFY SETTING max_suspicious_broken_parts = 1000;

-- Reset to the default (the value from system.merge_tree_settings)
ALTER TABLE foo
    RESET SETTING max_suspicious_broken_parts;

Via the configuration file

If the server will not start, the only option is to change max_suspicious_broken_parts in a configuration file. There are two ways to do this.

  • Method 1: in /etc/clickhouse-server/config.xml, set max_suspicious_broken_parts to 1000, then restart ClickHouse.
<merge_tree>
    <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
</merge_tree>
  • Method 2: create a new file, max_suspicious_broken_parts.xml, with the content below. ClickHouse recommends placing configuration overrides in the /etc/clickhouse-server/config.d/ directory; for DEB or RPM package installations on Ubuntu or CentOS, put the file there and then restart ClickHouse with service clickhouse-server restart.
<?xml version="1.0"?>
<yandex>
     <merge_tree>
         <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
     </merge_tree>
</yandex>
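Method 2 can be scripted, which helps when several servers went down at once. This Python sketch writes the same override file; the helper names are hypothetical, and writing to /etc/clickhouse-server/config.d/ assumes root privileges. The root element defaults to `yandex` as in the example above; newer ClickHouse releases document `<clickhouse>` as the root tag, so check which one your version's config.xml uses.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_override_xml(value: int = 1000, root_tag: str = "yandex") -> str:
    """Render the override shown above; root_tag may need to be 'clickhouse' on newer releases."""
    if value <= 0:
        raise ValueError("max_suspicious_broken_parts must be a positive integer")
    root = ET.Element(root_tag)
    merge_tree = ET.SubElement(root, "merge_tree")
    ET.SubElement(merge_tree, "max_suspicious_broken_parts").text = str(value)
    ET.indent(root, space="    ")  # requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_override(config_d: str = "/etc/clickhouse-server/config.d",
                   value: int = 1000, root_tag: str = "yandex") -> Path:
    """Write max_suspicious_broken_parts.xml into the config.d directory."""
    path = Path(config_d) / "max_suspicious_broken_parts.xml"
    path.write_text(build_override_xml(value, root_tag), encoding="utf-8")
    return path
```

After `write_override()` succeeds, restart the service with service clickhouse-server restart as described above.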


Verify that the setting took effect

Connect to ClickHouse and run the query below; if max_suspicious_broken_parts shows 1000, the setting is in effect.

SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
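The check can also be scripted. This sketch assumes the HTTP interface on 127.0.0.1:8123 with the default user, and narrows the query above to the `name` and `value` columns in TabSeparated format so the response is easy to parse; `fetch_settings` and `parse_settings_tsv` are hypothetical names.

```python
import urllib.request

# Narrower variant of the verification query above (assumption: only name/value are needed).
VERIFY_SQL = (
    "SELECT name, value FROM system.merge_tree_settings "
    "WHERE name = 'max_suspicious_broken_parts' FORMAT TabSeparated"
)


def parse_settings_tsv(text: str) -> dict:
    """Turn 'name<TAB>value' lines into a dict of string values."""
    settings = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        settings[name] = value
    return settings


def fetch_settings(url: str = "http://127.0.0.1:8123/") -> dict:
    """Run VERIFY_SQL over the HTTP interface and parse the result."""
    request = urllib.request.Request(url, data=VERIFY_SQL.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_settings_tsv(response.read().decode("utf-8"))
```

Values come back as strings, so compare `fetch_settings().get("max_suspicious_broken_parts") == "1000"`.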


