Clickhouse备份与恢复
一、安装clickhouse-backup
1、下载安装包
cd /server/tools
wget https://github.com/Altinity/clickhouse-backup/releases/download/v2.4.33/clickhouse-backup-linux-amd64.tar.gz
2、创建存放clickhouse-backup的目录,并解压
mkdir -pv /data/clickhouse-backup
tar xvf clickhouse-backup-linux-amd64.tar.gz -C /data/clickhouse-backup/
3、创建软连接
ln -sv /data/clickhouse-backup/build/linux/amd64/clickhouse-backup /usr/local/bin
4、测试clickhouse-backup命令
clickhouse-backup -v
5、创建配置文件
#注意:下面的配置文件是按本地标准来的,配置文件中默认data_path:"/var/lib/clickhouse",如果clickhouse-server存储数据的路径变了,这个地方也需要加上data_path
mkdir -p /etc/clickhouse-backup/
cd /etc/clickhouse-backup/
vim config.yml
general:
remote_storage: none
backups_to_keep_local: 7 # 本地备份保留个数,默认0表示不自动做备份清理
backups_to_keep_remote: 31 # 远程备份保留个数
clickhouse:
username: default
password: "XXXXXX"
host: localhost
port: 9000
#data_path: "/var/lib/clickhouse"
二、自动备份数据
1、查看可备份的表
clickhouse-backup tables
2、全库备份
clickhouse-backup create
备份存储在中 $data_path/backup 下(默认在/var/lib/clickhouse/backup),备份名称默认为时间戳,可手动指定备份名称。例如:
clickhouse-backup create ch_bk_20240308
备份包含两个目录:
- 'metadata’目录: 包含重新创建所需的DDL SQL
- 'shadow’目录: 包含作为ALTER TABLE … FREEZE操作结果的数据。
3、单表备份
语法:
clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>
备份test库中的t_order表
clickhouse-backup create -t test.t_order ch_t_order_20240308
4、备份多表
备份test库中的t_order表和t_new表(两个表用","隔开)
clickhouse-backup create -t test.t_order,test.t_new ch_two_bak_20240308
5、备份到远程对象存储(例如s3)
(1)修改配置文件
#vim /etc/clickhouse-backup/config.yml #添加下面内容,参数根据自己情况填写
s3:
access_key: <AWS访问密钥>
secret_key: <AWS SECRET KEY>
bucket: <存储桶BUCKET名称>
region: us-east-1
path: "/some/path/in/bucket" <备份路径>
(2)上传备份(upload)
clickhouse-backup upload 2024-03-08T07-32-37
**注意:**使用clickhouse-backup upload 上传备份可以使用参数 --diff-from
将文件与以前的本地备份进行比较,仅上载新的/更改的文件。
必须保留先前的备份,以便从新备份中进行还原
(3)下载备份(download)
clickhouse-backup download 2024-03-08T07-32-37
6、查看备份文件
clickhouse-backup list
#2024-03-08T07-32-37 244.00MiB 08/03/2024 07:32:37 local
7、删除备份文件
注意:删除时要考虑一下
clickhouse-backup delete local 2024-03-08T07-32-37
8、删除shadow下的临时备份文件
注意:删除时要考虑一下
clickhouse-backup clean
三、(自动备份数据)数据恢复
1、语法:
clickhouse-backup restore 备份名
NAME:
clickhouse-backup restore - Create schema and restore data from backup
USAGE:
clickhouse-backup restore [-t, --tables=<db>.<table>] [-m, --restore-database-mapping=<originDB>:<targetDB>[,<...>]] [--partitions=<partitions_names>] [-s, --schema] [-d, --data] [--rm, --drop] [-i, --ignore-dependencies] [--rbac] [--configs] <backup_name>
OPTIONS:
--config value, -c value Config 'FILE' name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
--table value, --tables value, -t value Restore only database and objects which matched with table name patterns, separated by comma, allow ? and * as wildcard
--restore-database-mapping value, -m value Define the rule to restore data. For the database not defined in this struct, the program will not deal with it.
--partitions partition_id Restore backup only for selected partition names, separated by comma
If PARTITION BY clause returns numeric not hashed values for partition_id field in system.parts table, then use --partitions=partition_id1,partition_id2 format
If PARTITION BY clause returns hashed string values, then use --partitions=('non_numeric_field_value_for_part1'),('non_numeric_field_value_for_part2') format
If PARTITION BY clause returns tuple with multiple fields, then use --partitions=(numeric_value1,'string_value1','date_or_datetime_value'),(...) format
Values depends on field types in your table, use single quotes for String and Date/DateTime related types
Look at the system.parts partition and partition_id fields for details https://clickhouse.com/docs/en/operations/system-tables/parts/
--schema, -s Restore schema only
--data, -d Restore data only
--rm, --drop Drop exists schema objects before restore
-i, --ignore-dependencies Ignore dependencies when drop exists schema objects
--rbac, --restore-rbac, --do-restore-rbac Restore RBAC related objects
--configs, --restore-configs, --do-restore-configs Restore 'clickhouse-server' CONFIG related files
--rbac-only Restore RBAC related objects only, will skip backup data, will backup schema only if --schema added
--configs-only Restore 'clickhouse-server' configuration files only, will skip backup data, will backup schema only if --schema added
重要选项
- –table 只恢复特定表,可以用正则(如还原指定数据库:–table=dbname.*)
- –schema 只还原表结构
- –data 只还原数据
2、全库恢复
clickhouse-backup restore /var/lib/clickhouse/backup/备份名 #需要保证原来的库完全被删除,否则可能会报错,默认会去/var/lib/clickhouse/backup路径下去找备份文件夹
注意:
恢复过程可能会存在这个报错,warn can’t create table ‘test.t_new’: code: 57, message: Directory for table data store/05c/05cbcdec-fa1c-4cf9-98d6-98868b16fa38/ already exists, will try again backup=2024-03-11T02-06-53 operation=restore
解决方法:
需要删除/var/lib/clickhouse/backup/2024-03-11T02-06-53/metadata/test/t_new.json中的UUID '05cbcdec-fa1c-4cf9-98d6-98868b16fa38’删除掉,重新执行clickhouse-backup restore /var/lib/clickhouse/backup/备份名。
#vim /var/lib/clickhouse/backup/2024-03-11T02-06-53/metadata/test/t_new.json
{
"table": "t_new",
"database": "test",
"parts": {
"default": [
{
"name": "202103_1_1_0"
},
{
"name": "202203_2_2_0"
}
]
},
"query": "CREATE TABLE test.t_new (`id` Int64 COMMENT '订单id', `datetime` DateTime COMMENT '订单日期', `name` String COMMENT '商品名称', `price` Decimal(9, 2) COMMENT '商品价格', `user_id` Int64 COMMENT '用户id') ENGINE = MergeTree PARTITION BY toYYYYMM(datetime) ORDER BY id SETTINGS index_granularity = 8192",
"size": {
"default": 2248
},
"total_bytes": 1328,
"metadata_only": false
}
3、只恢复表结构
#使用 --schema 恢复表的表结构
clickhouse-backup restore 备份名 --table 库名.表名 --schema
4、只恢复数据
注意:由于是ATTACH PARTITION操作,如果执行2次的话,数据会翻倍
#用 --data 恢复表中数据(注意:由于是ATTACH PARTITION操作,如果执行2次的话,数据会翻倍)
clickhouse-backup restore 备份名 --table 库名.表名 --data
还原测试数据(可参考)
#创建库
create database if not exists test;
#创建表
create table if not exists test.t_order
(
id Int64 COMMENT '订单id',
datetime DateTime COMMENT '订单日期',
name String COMMENT '手办名称',
price Decimal32(2) COMMENT '手办价格',
user_id Int64 COMMENT '用户id'
) engine = MergeTree
partition by toYYYYMM(datetime)
order by id ;
#插入数据
insert into t_order values (1, '2022-03-08 12:40:00', '欧阳锋', 999.99, 202203080001)
insert into t_order values (2, '2023-03-08 12:44:00', '梅超风', 1111, 202303080002)
insert into t_order values (3, '2022-03-08 12:45:00', '黄蓉', 8989, 202203080001)
#用全库备份先预留一份,再进行下面操作:
#删除数据
alter table t_order delete where id = 2;
#清空表
truncate table t_order;
#删除表
drop table t_order;
#删除库
drop database test;
四、手动备份数据
ClickHouse通过其
ALTER TABLE…FREEZE
功能提供了对即时时间点备份的本地支持。
1、确保/var/lib/clickhouse/shadow/目录为空(默认目录)
ls /var/lib/clickhouse/shadow/
2、执行备份表命令
echo -n 'alter table 库名.表名 freeze' | clickhouse-client --password "XXXXXX"
3、保存备份
cd /var/lib/clickhouse/
mkdir backup
cp -r shadow/ backup/表名_20240308
4、清空shadow目录
rm -rf /var/lib/clickhouse/shadow/*
5、注意:备份表结构
手动备份数据的时候,只恢复数据,不恢复表结构,所以也需要提前将表结构备份出来。
cp -r /var/lib/clickhouse/metadata/库名/表名.sql /var/lib/clickhouse/backup
vim /var/lib/clickhouse/backup/表名.sql
#需要进行两处修改
#1、需要将ATTACH替换成CREATE
#2、需要将 _ UUID '2fbd6c71-1ccc-47eb-b767-e6580bf18e47'替换成表名
五、(手动备份数据)数据恢复
1、表被删除的话,创建表
#执行上面备份的sql文件
clickhouse-client --password="XXXXX" --database=库名 -m -n --query="$(cat /var/lib/clickhouse/backup/表名.sql)"
2、将backup/表名_20240308备份复制到表的“ detached”目录中
cp -r /var/lib/clickhouse/backup/表名_20240308/1/store/b75/b7506b44-4286-4cd8-9925-d17fa0e0fa0b/* /var/lib/clickhouse/data/库名/表名/detached/
3、重新给/var/lib/clickhouse授权
chown -R clickhouse.clickhouse /var/lib/clickhouse
4、附上分离的零件
#注意这里需要选择一下分区,分区的选择可以根据手动备份的目录名进行获取,例如:/var/lib/clickhouse/backup/表名_20240308/1/store/b75/b7506b44-4286-4cd8-9925-d17fa0e0fa0b/202103_1_1_0,就是在202103分区,手动备份是按照分区来进行恢复的,并不是一次性都恢复。
echo 'alter table 库名.表名 attach partition 202103' | clickhouse-client --password="XXXXXX"
5、确认数据已还原
echo 'select count() from 库名.表名' | clickhouse-client --password="XXXXXX"
六、编写shell脚本设置定时任务
1、自动备份脚本
#创建存放脚本的目录
mkdir -pv /var/lib/clickhouse/script
#编写自动备份脚本
#vim /var/lib/clickhouse/script/ch_full_backup.sh
#!/bin/bash
/usr/local/bin/clickhouse-backup create ch_full_backup_`date +%Y-%m-%dT%H-%M-%S`
2、设置定时任务
#每天凌晨1点进行备份
0 1 * * * sh /var/lib/clickhouse/script/ch_full_backup.sh