A ClickHouse Data Migration: Hands-On Notes

ClickHouse backup and restore, plus a script for remote off-host backups

ClickHouse can be backed up in several ways, and the official documentation already lists a number of approaches to draw on. Different companies, however, have different resources and business requirements, so there is no single universal solution that covers every ClickHouse backup and restore scenario: a scheme that works for 1 GB of data will probably not work for tens of PB. Several possible methods, each with its own strengths and weaknesses, are discussed below. It is best to combine a few of them rather than rely on just one, so that their individual shortcomings offset each other.

Background:

For a project I had to migrate a ClickHouse database: a single table of roughly 900 million rows, on ClickHouse 20. Before the migration I ran a series of backup tests locally. Below I share the backup method I ended up using in production, along with the various small surprises encountered along the way.

Test environment:

Demo data downloaded from the official site, roughly 8 GB of test data

clickhouse 20.4.6

1. Export and import via text files

# In plain terms, this means manually exporting the data in a specific format and then importing it again. This approach is NOT recommended unless your database is genuinely tiny. I downloaded a demo dataset from the official site and did the manual export/import once, and it nearly wore me out.

Export:

clickhouse-client --password helloword --query="select * from iot_dev.t_indicator_record FORMAT CSV" > iot_dev.t_indicator_record.csv

Import (note that the format name after FORMAT is written in uppercase):

cat iot_dev.t_indicator_record.csv | clickhouse-client --port 9008 --password helloword --query="INSERT INTO iot_dev.t_indicator_record FORMAT CSV"
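When the table is large, the same approach can be made a bit less painful by compressing the stream on the fly. This is just standard shell piping around the two commands above, not a different mechanism:

# export the table as gzip-compressed CSV
clickhouse-client --password helloword --query="select * from iot_dev.t_indicator_record FORMAT CSV" | gzip > iot_dev.t_indicator_record.csv.gz

# import it back from the compressed file
zcat iot_dev.t_indicator_record.csv.gz | clickhouse-client --port 9008 --password helloword --query="INSERT INTO iot_dev.t_indicator_record FORMAT CSV"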

2. Copying the data directory

# This approach is the same idea as a MySQL cold-data restore: copy ClickHouse's data directly to another machine, adjust the relevant configuration, and start it up. Take a close look at ClickHouse's directory layout on the filesystem (the <path> configured in /etc/clickhouse-server/config.xml); for readability, only the data and metadata directories are kept below.

.
├── data
│   ├── default
│   ├── system
│   │   ├── asynchronous_metric_log
│   │   ├── metric_log
│   │   ├── query_log
│   │   ├── query_thread_log
│   │   └── trace_log
├── metadata
│   ├── default
│   │   └── v_table_size.sql
│   ├── default.sql
│   ├── system
│   │   ├── asynchronous_metric_log.sql
│   │   ├── metric_log.sql
│   │   ├── query_log.sql
│   │   ├── query_thread_log.sql
│   │   └── trace_log.sql


The data directory holds the data: one directory per database, and inside each, one subdirectory per table.
The metadata directory holds the metadata, i.e. the database and table definitions. Specifically,
        <database>.sql is the DDL that creates the database (ATTACH DATABASE default ENGINE = Ordinary)
        <database>/<table>.sql is the DDL that creates the table (ATTACH TABLE ...)

Based on this, copying the data and metadata directories (excluding the system database) to the new cluster is all it takes to migrate the data.

Steps (a shell sketch of these steps follows the list):

1. Stop the source ClickHouse server and pack up the data and metadata of the database or tables to be migrated
2. Copy them to the matching directory on the target ClickHouse server, e.g. under /var/lib/clickhouse
3. Hand ownership to the clickhouse user:
        chown -Rf clickhouse:clickhouse /var/lib/clickhouse/*
        chown -Rf clickhouse:clickhouse /var/lib/clickhouse
4. Restart the target ClickHouse server
5. Verify the data:
        select count(1) from iot_dev.t_indicator_record;
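A minimal shell sketch of steps 1 through 4 for the example table above, assuming the default /var/lib/clickhouse path on both sides; TARGET_HOST is a placeholder for the destination server:

# on the source host, with ClickHouse stopped
systemctl stop clickhouse-server
cd /var/lib/clickhouse
tar czf /tmp/iot_dev_backup.tar.gz data/iot_dev metadata/iot_dev metadata/iot_dev.sql
scp /tmp/iot_dev_backup.tar.gz root@TARGET_HOST:/tmp/

# on the target host, with ClickHouse stopped
tar xzf /tmp/iot_dev_backup.tar.gz -C /var/lib/clickhouse
chown -Rf clickhouse:clickhouse /var/lib/clickhouse
systemctl start clickhouse-server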

3. Using a third-party tool: clickhouse-backup

Of everything I tried during testing, this was the most convenient backup and restore tool, so I will cover how to use it and what to watch out for in detail.

clickhouse-backup is an open-source community tool for backing up ClickHouse, and it can also be used for data migration. It works by first creating a backup and then importing the data from that backup, much like MySQL's mysqldump + SOURCE. It can serve as a regular off-site cold backup solution.

# Limitations:

  • Requires ClickHouse 1.1.54390 or newer
  • MergeTree family table engines only
  • Backing up tiered storage or storage_policy is not supported
  • Maximum backup size on cloud storage is 5 TB
  • Maximum of 10,000 parts on AWS S3

(1) Download the clickhouse-backup package

The project ships both binary releases and RPM packages; I went with the binary install here.

GitHub: https://github.com/AlexAkulov/clickhouse-backup

Download: https://github.com/AlexAkulov/clickhouse-backup/releases/download/v1.0.0/clickhouse-backup.tar.gz
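If the server has outbound internet access, the archive can be fetched directly; a quick sketch using the release URL above, run from /root so the commands that follow line up:

cd /root
wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v1.0.0/clickhouse-backup.tar.gz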

# Unpack the archive
tar -xf clickhouse-backup.tar.gz -C /root

# Copy the binary into the system path
cp /root/clickhouse-backup /usr/bin

# Create the clickhouse-backup config directory
mkdir -p /etc/clickhouse-backup

# Copy the template config file into the clickhouse-backup config directory
cp /root/config.yml /etc/clickhouse-backup/

(2) Edit the clickhouse-backup config file, config.yml

# Adjust this config file to match your own ClickHouse setup, e.g. ClickHouse's data directory, the database password, and the monitoring/API listen address and port

The upstream configuration reference:

Besides backing up to the local machine, clickhouse-backup also supports remote backups to S3 (object storage), FTP, and SFTP, and it can additionally be driven through an API.

general:
  remote_storage: none           # REMOTE_STORAGE, where remote backups go; corresponds to the sections below (azblob, s3, gcs, cos, ftp, sftp)
  max_file_size: 107374182400    # MAX_FILE_SIZE
  disable_progress_bar: false    # DISABLE_PROGRESS_BAR
  backups_to_keep_local: 0       # BACKUPS_TO_KEEP_LOCAL, number of local backups to keep; 0 = unlimited
  backups_to_keep_remote: 0      # BACKUPS_TO_KEEP_REMOTE, number of remote backups to keep; 0 = unlimited
  log_level: info                # LOG_LEVEL, logging level
  allow_empty_backups: false     # ALLOW_EMPTY_BACKUPS
clickhouse:
  username: default                # CLICKHOUSE_USERNAME, database user (default)
  password: ""                     # CLICKHOUSE_PASSWORD, database password
  host: localhost                  # CLICKHOUSE_HOST, database host
  port: 9000                       # CLICKHOUSE_PORT, database port
  disk_mapping: {}                 # CLICKHOUSE_DISK_MAPPING
  skip_tables:                     # CLICKHOUSE_SKIP_TABLES, databases/tables to skip when backing up
    - system.*
  timeout: 5m                      # CLICKHOUSE_TIMEOUT
  freeze_by_part: false            # CLICKHOUSE_FREEZE_BY_PART
  secure: false                    # CLICKHOUSE_SECURE
  skip_verify: false               # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true     # CLICKHOUSE_SYNC_REPLICATED_TABLES
  log_sql_queries: true            # CLICKHOUSE_LOG_SQL_QUERIES

  config_dir:      "/etc/clickhouse-server"              # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server" # CLICKHOUSE_RESTART_COMMAND

azblob:
  endpoint_suffix: "core.windows.net" # AZBLOB_ENDPOINT_SUFFIX
  account_name: ""             # AZBLOB_ACCOUNT_NAME
  account_key: ""              # AZBLOB_ACCOUNT_KEY
  sas: ""                      # AZBLOB_SAS
  container: ""                # AZBLOB_CONTAINER
  path: ""                     # AZBLOB_PATH
  compression_level: 1         # AZBLOB_COMPRESSION_LEVEL
  compression_format: tar      # AZBLOB_COMPRESSION_FORMAT
  sse_key: ""                  # AZBLOB_SSE_KEY
s3:
  access_key: ""                   # S3_ACCESS_KEY,<AWS访问密钥>
  secret_key: ""                   # S3_SECRET_KEY
  bucket: ""                       # S3_BUCKET,<存储桶BUCKET名称>
  endpoint: ""                     # S3_ENDPOINT
  region: us-east-1                # S3_REGION
  acl: private                     # S3_ACL
  force_path_style: false          # S3_FORCE_PATH_STYLE
  path: ""                         # S3_PATH , <备份路径>
  disable_ssl: false               # S3_DISABLE_SSL
  compression_level: 1             # S3_COMPRESSION_LEVEL
  # supports 'tar', 'gzip', 'zstd', 'brotli'
  compression_format: tar          # S3_COMPRESSION_FORMAT
  # empty (default), AES256, or aws:kms
  sse: AES256                      # S3_SSE
  disable_cert_verification: false # S3_DISABLE_CERT_VERIFICATION
  storage_class: STANDARD          # S3_STORAGE_CLASS
  debug: false                     # S3_DEBUG
gcs:
  credentials_file: ""         # GCS_CREDENTIALS_FILE
  credentials_json: ""         # GCS_CREDENTIALS_JSON
  bucket: ""                   # GCS_BUCKET
  path: ""                     # GCS_PATH
  compression_level: 1         # GCS_COMPRESSION_LEVEL
  compression_format: tar      # GCS_COMPRESSION_FORMAT
cos:
  url: ""                      # COS_URL
  timeout: 2m                  # COS_TIMEOUT
  secret_id: ""                # COS_SECRET_ID
  secret_key: ""               # COS_SECRET_KEY
  path: ""                     # COS_PATH
  compression_format: tar      # COS_COMPRESSION_FORMAT
  compression_level: 1         # COS_COMPRESSION_LEVEL
api:
  listen: "localhost:7171"     # API_LISTEN
  enable_metrics: true         # API_ENABLE_METRICS
  enable_pprof: false          # API_ENABLE_PPROF
  username: ""                 # API_USERNAME
  password: ""                 # API_PASSWORD
  secure: false                # API_SECURE
  certificate_file: ""         # API_CERTIFICATE_FILE
  private_key_file: ""         # API_PRIVATE_KEY_FILE
  create_integration_tables: false # API_CREATE_INTEGRATION_TABLES
ftp:
  address: ""                  # FTP_ADDRESS
  timeout: 2m                  # FTP_TIMEOUT
  username: ""                 # FTP_USERNAME
  password: ""                 # FTP_PASSWORD
  tls: false                   # FTP_TLS
  path: ""                     # FTP_PATH
  compression_format: tar      # FTP_COMPRESSION_FORMAT
  compression_level: 1         # FTP_COMPRESSION_LEVEL
  debug: false                 # FTP_DEBUG
sftp:
  address: ""                  # SFTP_ADDRESS
  username: ""                 # SFTP_USERNAME
  password: ""                 # SFTP_PASSWORD
  key: ""                      # SFTP_KEY
  path: ""                     # SFTP_PATH
  compression_format: tar      # SFTP_COMPRESSION_FORMAT
  compression_level: 1         # SFTP_COMPRESSION_LEVEL
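As a side note on the api section above: clickhouse-backup can also run as a small REST service (clickhouse-backup server) and be driven over HTTP. The exact routes vary by version, so treat the following as an assumed sketch and check the project README for your release:

# start the API server; it listens on localhost:7171 per the config above
clickhouse-backup server &

# assumed endpoints - verify against the README of your clickhouse-backup version
curl -s http://localhost:7171/backup/list
curl -s -X POST http://localhost:7171/backup/create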

The config file I used this time is the simplest possible setup: back up locally, then push the backup files to the backup host with a script.

[root@localhost clickhouse-backup]# cat config.yml 

general:
  max_file_size: 1099511627776
  disable_progress_bar: false
  backups_to_keep_local: 0
  backups_to_keep_remote: 0
  log_level: info
  allow_empty_backups: false
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
  - system.*
  - default.*
  timeout: 5m
  freeze_by_part: false
  secure: false
  skip_verify: false
  sync_replicated_tables: true
  skip_sync_replica_timeouts: true
  log_sql_queries: false

(3) clickhouse-backup commands

NAME:
   clickhouse-backup - Tool for easy backup of ClickHouse with cloud support

USAGE:
   clickhouse-backup <command> [-t, --tables=<db>.<table>] <backup_name>

VERSION:
   1.0.0

DESCRIPTION:
   Run as 'root' or 'clickhouse' user

COMMANDS:
   tables          Print list of tables
   create          Create new backup
   create_remote   Create and upload
   upload          Upload backup to remote storage
   list            Print list of backups
   download        Download backup from remote storage
   restore         Create schema and restore data from backup
   restore_remote  Download and restore
   delete          Delete specific backup
   default-config  Print default config
   server          Run API server
   help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config FILE, -c FILE  Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --help, -h              show help
   --version, -v           print the version

1. Print all default configuration options

 clickhouse-backup default-config

2. List the tables that can be backed up (all tables under the system and default databases are already filtered out in the config file)

 [root@localhost clickhouse-backup]# clickhouse-backup tables
 datasets.hits_v1  1.50GiB  default

3. Create a backup

# Full backup (all databases)

clickhouse-backup create
2021/09/06 10:03:23  info done                      backup=2021-09-06T14-03-23 operation=create table=datasets.hits_v1
2021/09/06 10:03:23  info done                      backup=2021-09-06T14-03-23 operation=create

Backups are stored under $data_path/backup. The backup name defaults to a timestamp, but a name can also be given explicitly:

 clickhouse-backup create fxkjnj_0906

A backup contains two directories (see the note after the listing below):

  • metadata directory: the DDL SQL needed to recreate the tables
  • shadow directory: the data produced as the result of the ALTER TABLE ... FREEZE operation
[root@localhost backup]# ll /var/lib/clickhouse/backup/2021-09-06T14-03-23/
total 4
drwxr-x--- 3 clickhouse clickhouse  22 Sep  6 10:03 metadata
-rw-r----- 1 clickhouse clickhouse 865 Sep  6 10:03 metadata.json
drwxr-x--- 3 clickhouse clickhouse  22 Sep  6 10:03 shadow
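For reference, the shadow directory comes from ClickHouse's own partition-freeze mechanism; clickhouse-backup drives it for you, but the manual equivalent for a single table is simply the following (illustration only, there is no need to run it yourself):

# creates hard links of the table's data parts under <path>/shadow/
clickhouse-client --query="ALTER TABLE datasets.hits_v1 FREEZE"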

# Single-table backup

 clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>

Back up the table datasets.hits_v1:

 clickhouse-backup create  -t datasets.hits_v1

Back up multiple tables, datasets.hits_v1 and datasets.hits_v2:

 clickhouse-backup create  -t datasets.hits_v1,datasets.hits_v2

4. List existing backups

[root@localhost datasets]# clickhouse-backup list
2021-09-06T14-03-23   1.50GiB   06/09/2021 14:03:23   local      
2021/09/06 10:10:50 error InvalidParameter: 1 validation error(s) found.
- minimum field size of 1, ListObjectsV2Input.Bucket.

(The trailing error only means that listing remote backups failed because no S3 bucket is configured; the local backup listed above is fine.)

5. Delete a backup

 [root@localhost datasets]# clickhouse-backup delete local 2021-09-06T14-03-23

(4) Restoring data

Syntax:

clickhouse-backup restore <backup_name>

[root@localhost shadow]# clickhouse-backup restore --help
NAME:
   clickhouse-backup restore - Create schema and restore data from backup

USAGE:
   clickhouse-backup restore  [-t, --tables=<db>.<table>] [-s, --schema] [-d, --data] [--rm, --drop] <backup_name>

OPTIONS:
   --config FILE, -c FILE                   Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --table value, --tables value, -t value  
   --schema, -s                             Restore schema only
   --data, -d                               Restore data only
   --rm, --drop                             Drop table before restore

Options:

  • --table: restore only specific tables; regular expressions are accepted, e.g. --table=dbname.* for everything in one database
  • --schema: restore the table schema only
  • --data: restore the data only
  • --rm: drop the tables being restored before restoring them
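A few illustrative combinations of these flags, reusing the backup and table names from earlier in this article (adjust to your own backup name; the flag order follows the USAGE line in the help output above):

# restore schema and data for everything in the backup, dropping existing tables first
clickhouse-backup restore --rm 2021-09-06T14-03-23

# restore only the schema of a single table
clickhouse-backup restore -s -t datasets.hits_v1 2021-09-06T14-03-23

# restore only the data for every table in the datasets database
clickhouse-backup restore -d -t 'datasets.*' 2021-09-06T14-03-23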

4. Backup and restore with clickhouse-backup in practice

4.1 Local backup and restore

1. Before backing up, check how much data the database holds: 17,747,796 rows in total
[root@localhost ~]# clickhouse-client
ClickHouse client version 20.4.6.53 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.4.6 revision 54434.
localhost :) 
localhost :) select count(1) from datasets.hits_v1;

SELECT count(1)
FROM datasets.hits_v1

┌─count(1)─┐
│ 17747796 │
└──────────┘

1 rows in set. Elapsed: 0.016 sec. 

localhost :) exit



2. Create the clickhouse-backup config file, keeping at most 7 local backups

[root@localhost ~]# vim /etc/clickhouse-backup/config.yml 
general:
  max_file_size: 1099511627776
  disable_progress_bar: false
  backups_to_keep_local: 7
  log_level: info
  allow_empty_backups: false
clickhouse:
  username: default
  password: "helloword"
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
  - system.*
  - default.*
  timeout: 5m
  freeze_by_part: false
  secure: false
  skip_verify: false
  sync_replicated_tables: true
  skip_sync_replica_timeouts: true
  log_sql_queries: false


3. List the tables that can be backed up
[root@localhost backup]# clickhouse-backup tables
datasets.hits_v1  1.50GiB  default  


4. Create a backup with clickhouse-backup
[root@localhost clickhouse]# clickhouse-backup create
2021/09/06 10:50:42  info done                      backup=2021-09-06T14-50-42 operation=create table=datasets.hits_v1
2021/09/06 10:50:42  info done                      backup=2021-09-06T14-50-42 operation=create



5. Simulate data loss. !!! Never do this in production !!! This is purely a test
[root@localhost clickhouse]# clickhouse-client 
ClickHouse client version 20.4.6.53 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.4.6 revision 54434.

localhost :) drop database datasets;
DROP DATABASE datasets
Ok.
0 rows in set. Elapsed: 0.002 sec. 

SHOW DATABASES
┌─name───────────────────────────┐
│ _temporary_and_external_tables │
│ default                        │
│ system                         │
│ test                           │
└────────────────────────────────┘
4 rows in set. Elapsed: 0.002 sec. 




6. Restore the database
[root@localhost backup]# clickhouse-backup restore 2021-09-06T14-50-42 -s -d --rm
2021/09/06 10:57:04  info done                      backup=2021-09-06T14-50-42 operation=restore table=datasets.hits_v1
2021/09/06 10:57:04  info done                      backup=2021-09-06T14-50-42 operation=restore


7. Verify the database: check that the data is back and the row count matches

[root@localhost backup]# clickhouse-client 
ClickHouse client version 20.4.6.53 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.4.6 revision 54434.

localhost :) show databases;

SHOW DATABASES
┌─name───────────────────────────┐
│ _temporary_and_external_tables │
│ datasets                       │
│ default                        │
│ system                         │
│ test                           │
└────────────────────────────────┘
5 rows in set. Elapsed: 0.002 sec. 

localhost :) select count(1) from datasets.hits_v1;

SELECT count(1)
FROM datasets.hits_v1

┌─count(1)─┐
│ 17747796 │
└──────────┘

1 rows in set. Elapsed: 0.020 sec. 

localhost :) 

4.2 Remote off-host backup and restore

An off-host remote backup simply means transferring the finished backup files to a dedicated remote storage server. Such storage servers usually have plenty of capacity and can comfortably retain a year or more of database backups.

Environment:

ClickHouse database: 192.168.99.102

Storage server: 192.168.99.101

1. Before backing up, check how much data the database holds: 17,747,796 rows in total
[root@localhost ~]# clickhouse-client
ClickHouse client version 20.4.6.53 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.4.6 revision 54434.
localhost :) 
localhost :) select count(1) from datasets.hits_v1;

SELECT count(1)
FROM datasets.hits_v1

┌─count(1)─┐
│ 17747796 │
└──────────┘

1 rows in set. Elapsed: 0.016 sec. 

localhost :) exit



2. Create the clickhouse-backup config file, keeping at most 7 local backups

[root@localhost ~]# vim /etc/clickhouse-backup/config.yml 
general:
  max_file_size: 1099511627776
  disable_progress_bar: false
  backups_to_keep_local: 7
  log_level: info
  allow_empty_backups: false
clickhouse:
  username: default
  password: "helloword"
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
  - system.*
  - default.*
  timeout: 5m
  freeze_by_part: false
  secure: false
  skip_verify: false
  sync_replicated_tables: true
  skip_sync_replica_timeouts: true
  log_sql_queries: false


3. List the tables that can be backed up
[root@localhost backup]# clickhouse-backup tables
datasets.hits_v1  1.50GiB  default  


4. Create a backup with clickhouse-backup
[root@localhost clickhouse]# clickhouse-backup create
2021/09/06 10:50:42  info done                      backup=2021-09-06T14-50-42 operation=create table=datasets.hits_v1
2021/09/06 10:50:42  info done                      backup=2021-09-06T14-50-42 operation=create



5. Use scp to transfer the backup files to the storage server
[root@localhost clickhouse]#  scp -rp /var/lib/clickhouse/backup/2021-09-06T14-50-42 fxkjnj@192.168.99.101:/data/clickhouse-backup/

6. Log in to the storage server and check that the backup data is there
[root@localhost clickhouse]# ll /data/clickhouse-backup/2021-09-06T14-50-42
total 4
drwxr-x--- 3 clickhouse clickhouse  22 Sep  6 10:50 metadata
-rw-r----- 1 clickhouse clickhouse 865 Sep  6 10:50 metadata.json
drwxr-x--- 3 clickhouse clickhouse  22 Sep  6 10:50 shadow
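To bring the data back from the storage server, the reverse of step 5 is enough: copy the backup directory back into ClickHouse's backup path, fix ownership, and restore locally. A minimal sketch under the same paths and names as above:

# on the ClickHouse host (192.168.99.102)
scp -rp fxkjnj@192.168.99.101:/data/clickhouse-backup/2021-09-06T14-50-42 /var/lib/clickhouse/backup/
chown -Rf clickhouse:clickhouse /var/lib/clickhouse/backup/2021-09-06T14-50-42
clickhouse-backup restore 2021-09-06T14-50-42 -s -d --rm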

5. Scheduled off-host backups with a script


Environment: ClickHouse database 192.168.99.102; storage server 192.168.99.101
Prerequisites:

  • A backup directory exists on the storage server: /data/clickhouse-back
  • The ClickHouse host can ssh to the storage server without a password, so backup files can be pushed non-interactively (a minimal key setup sketch follows this list)
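For the password-less ssh requirement, a standard key-based setup is enough; a sketch, run on the ClickHouse host as the user that will execute the backup script:

# generate a key pair (skip if one already exists) and install the public key on the storage server
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
ssh-copy-id root@192.168.99.101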
vim  /var/lib/clickhouse/clickhouse-backup.sh
#!/bin/bash
#Author        fxkjnj.com
####################################################
##
## 		clickhouse-backup script
##      back up data to a remote host
##      ssh key trust to the remote host must be configured
## 
####################################################



MSNAME=yz
BAKFILE=$MSNAME-`date +%Y%m%d%H%M%S`
LOCAL_BAKDIR=/var/lib/clickhouse/backup

REMOTE_BAKDIR=/data/clickhouse-back
REMOTE_HOST=root@192.168.99.101



# Back up locally
/usr/bin/clickhouse-backup create $BAKFILE
if [[ $? != 0 ]]; then
	echo "clickhouse-backup Create FAILED" > /var/log/clickhouse-backup.log
	exit 1
else
# scp the backup to the remote host
scp -rp $LOCAL_BAKDIR/$BAKFILE $REMOTE_HOST:$REMOTE_BAKDIR/
	if [[ $? != 0 ]]; then
		echo "clickhouse-backup FAILED" > /var/log/clickhouse-backup.log
	else
		echo "clickhouse-backup successful" > /var/log/clickhouse-backup.log
	fi
fi


# Periodically delete remote backup directories older than 30 days
ssh $REMOTE_HOST  "find $REMOTE_BAKDIR/yz* -maxdepth 0 -mtime +30 -type d | xargs rm -rf"
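To run this nightly, a crontab entry along these lines can be added on the ClickHouse host (02:00 is just an example schedule):

# crontab -e, as the same user that runs clickhouse-backup (root here)
0 2 * * * /bin/bash /var/lib/clickhouse/clickhouse-backup.sh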

6. Common issues

1. Symptom: restoring data with clickhouse-backup fails with a UUID-related error

clickhouse-backup restore 2021-08-21T06-35-10 -s -d --rm
2021/08/21 14:40:51 error can't create table `default`.`t`: code: 57, message: 
Directory for table data store/c57/c5780d8a-7d5a-47a3-8578-0d8a7d5a37a3/ already exists after 1 times, please check your schema depencncies

Fix:

Remove the UUID clause from ${backup_path}/2021-08-21T06-35-10/metadata/default/t.json inside the backup, i.e. the part reading:
UUID '80ea6411-9c37-4d47-80ea-64119c374d47'
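One way to strip that clause is a quick sed over the affected metadata file before retrying the restore; a sketch, assuming the path above (take a copy of the file first):

# remove the UUID '...' clause from the table DDL stored in the backup metadata
sed -i "s/UUID '[0-9a-f-]*' //g" ${backup_path}/2021-08-21T06-35-10/metadata/default/t.json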


Run the restore again:
clickhouse-backup restore 2021-08-21T06-35-10 -s -d --rm

SELECT count(1)
FROM datasets.hits_v1

┌─count(1)─┐
│ 17747796 │
└──────────┘

1 rows in set. Elapsed: 0.016 sec. 

localhost :) exit