postgres 使用 wal 备份
Administrator
Unable to backup data into minio
Minio test
PG Environment Variables
描述
什么是备份啊,什么是 wal 备份啊这些通通省略。 通过 wal 备份到 s3, s3 用的是 ceph 的 rgw。
自定义配置文件
custom-env.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: pod-env-overrides
data:
BACKUP_SCHEDULE: "*/10 * * * *" # Every 10 minus,测试方便嘛
BACKUP_NUM_TO_RETAIN: "14"
USE_WALG_BACKUP: "true"
AWS_ENDPOINT: "http://rook-ceph-rgw-my-store.rook-ceph.svc:80"
AWS_ACCESS_KEY_ID: "8DPG4580SPX27TDG2RL7"
AWS_SECRET_ACCESS_KEY: "TgKL0S7pgqcOo7SnF7S0h5pbT7XqCUaGMN1uKetX"
WALG_DISABLE_S3_SSE: "true"
AWS_S3_FORCE_PATH_STYLE: "true"
# USE_WALG_RESTORE 和 CLONE_USE_WALG_RESTORE 从 s3 中恢复的时候需要
USE_WALG_RESTORE: "true"
CLONE_USE_WALG_RESTORE: "true"
### 访问 PG 的用户、密码
PGPASSWORD: u14b3MrK
PGUSER: postgres
# 如果没有用户和密码的话,那么可能会出现下面的情况
# envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
## 2022-03-02 10:01:44.872 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
## psql: error: could not connect to server: FATAL: role "root" does not exist
##2022-03-02 10:01:45.220 - /scripts/postgres_backup.sh - ERROR: Recovery state unknown:
kubectl -n default apply -f custom-env.yaml
修改 operator 的 configmap
kubectl edit configmaps postgres-operator
pod_environment_configmap: default/pod-env-overrides
wal_s3_bucket: pg-backup
note: 如果集群已经运行起来的,那可能需要 delete pg 的 operator pod, 然后 pg 的 pod 会重启。
check 一下下
进入一个 pg 的 pod 里面
# envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
2022-03-02 10:08:15.281 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-03-02 10:08:15.679 - /scripts/postgres_backup.sh - Cluster is in recovery, not running backup
# envdir "/run/etc/wal-e.d/env" wal-g backup-list
INFO: 2022/03/02 10:09:12.437445 No backups found
### 先忽略输出,说明方式对了。只不过还在 recovery 嘛(说明我进入到了备库节点,可以换一个进去试试)
# psql -tXqAc "select pg_is_in_recovery()"
t
-------
# envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
2022-03-03 03:23:44.254 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-03-03 03:23:44.932 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/03/03 03:23:45.108109 Doing full backup.
INFO: 2022/03/03 03:23:45.208363 Calling pg_start_backup()
INFO: 2022/03/03 03:23:51.828511 Walking ...
INFO: 2022/03/03 03:23:51.829479 Starting part 1 ...
INFO: 2022/03/03 03:23:55.497596 Finished writing part 1.
INFO: 2022/03/03 03:23:56.387468 Starting part 2 ...
INFO: 2022/03/03 03:23:56.387568 /global/pg_control
INFO: 2022/03/03 03:23:56.394214 Finished writing part 2.
INFO: 2022/03/03 03:23:56.402528 Calling pg_stop_backup()
INFO: 2022/03/03 03:24:01.003815 Starting part 3 ...
INFO: 2022/03/03 03:24:01.030920 backup_label
INFO: 2022/03/03 03:24:01.031287 tablespace_map
INFO: 2022/03/03 03:24:01.031559 Finished writing part 3.
INFO: 2022/03/03 03:24:01.125669 Wrote backup with name base_000000100000000000000032
# envdir "/run/etc/wal-e.d/env" wal-g backup-list
name last_modified wal_segment_backup_start
base_0000000F000000000000001C 2022-03-03T01:56:33Z 0000000F000000000000001C
base_0000000F000000000000001E 2022-03-03T02:00:19Z 0000000F000000000000001E
base_0000000F0000000000000020 2022-03-03T02:10:18Z 0000000F0000000000000020
base_0000000F0000000000000022 2022-03-03T02:11:49Z 0000000F0000000000000022
base_0000000F0000000000000024 2022-03-03T02:20:17Z 0000000F0000000000000024
base_0000000F0000000000000026 2022-03-03T02:30:17Z 0000000F0000000000000026
base_0000000F0000000000000028 2022-03-03T02:40:21Z 0000000F0000000000000028
base_0000000F000000000000002A 2022-03-03T02:50:21Z 0000000F000000000000002A
base_00000010000000000000002E 2022-03-03T03:10:17Z 00000010000000000000002E
base_000000100000000000000030 2022-03-03T03:20:26Z 000000100000000000000030
base_000000100000000000000032 2022-03-03T03:24:01Z 000000100000000000000032
Clone
创建 2 个用于验证 clone 的库
### 创建 2 个库试试
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(3 rows)
postgres=# create database clone_test_1;
CREATE DATABASE
postgres=# create database clone_test_2;
CREATE DATABASE
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
--------------+----------+----------+-------------+-------------+-----------------------
clone_test_1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
clone_test_2 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(5 rows)
手动触发一次备份
# envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
2022-03-07 08:24:27.748 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-03-07 08:24:28.499 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/03/07 08:24:28.629076 Doing full backup.
INFO: 2022/03/07 08:24:28.720848 Calling pg_start_backup()
INFO: 2022/03/07 08:24:32.295808 Walking ...
INFO: 2022/03/07 08:24:32.296847 Starting part 1 ...
INFO: 2022/03/07 08:24:36.499136 Finished writing part 1.
INFO: 2022/03/07 08:24:38.118588 Starting part 2 ...
INFO: 2022/03/07 08:24:38.118700 /global/pg_control
INFO: 2022/03/07 08:24:38.121875 Finished writing part 2.
INFO: 2022/03/07 08:24:38.127394 Calling pg_stop_backup()
INFO: 2022/03/07 08:24:41.387076 Starting part 3 ...
INFO: 2022/03/07 08:24:41.418115 backup_label
INFO: 2022/03/07 08:24:41.418362 tablespace_map
INFO: 2022/03/07 08:24:41.418678 Finished writing part 3.
INFO: 2022/03/07 08:24:41.821536 Wrote backup with name base_00000006000000000000002D
### 这就是刚才的备份了 `base_00000006000000000000002D 2022-03-07T08:24:41Z 00000006000000000000002D`
# envdir "/run/etc/wal-e.d/env" wal-g backup-list
name last_modified wal_segment_backup_start
base_000000010000000000000006 2022-03-03T09:42:33Z 000000010000000000000006
base_000000020000000000000009 2022-03-03T09:52:22Z 000000020000000000000009
base_00000002000000000000000B 2022-03-03T12:00:10Z 00000002000000000000000B
base_00000002000000000000000D 2022-03-04T00:00:14Z 00000002000000000000000D
base_00000002000000000000000F 2022-03-04T12:00:08Z 00000002000000000000000F
base_000000020000000000000011 2022-03-05T00:00:21Z 000000020000000000000011
base_000000030000000000000016 2022-03-05T12:00:20Z 000000030000000000000016
base_000000030000000000000018 2022-03-06T00:00:12Z 000000030000000000000018
base_00000003000000000000001A 2022-03-06T12:00:08Z 00000003000000000000001A
base_00000004000000000000001E 2022-03-07T00:00:17Z 00000004000000000000001E
base_00000006000000000000002D 2022-03-07T08:24:41Z 00000006000000000000002D
Clone
## 定义一下新的集群
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: clone-pg
namespace: pg-test
spec:
numberOfInstances: 3
patroni:
initdb:
data-checksums: "true"
encoding: UTF8
locale: en_US.UTF-8
pg_hba:
- hostssl all all 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
postgresql:
version: "12"
resources:
limits:
cpu: 500m
memory: 1024Mi
requests:
cpu: 500m
memory: 1024Mi
teamId: clone
volume:
size: 128Gi
storageClass: rook-ceph-block
### 重点开始
clone:
# 可以不填写,填的话就填下面 s3_wal_path 中的 uuid, 其他的值没有测试过。
#uid: "ece6b770-4aef-4a17-b4a4-1b808c6f6f47"
# 需要恢复的集群的名称
cluster: "xxx-pg"
s3_endpoint: http://rook-ceph-rgw-my-store.rook-ceph.svc
s3_access_key_id: 8DPG4580SPN27TDG2RL7
s3_secret_access_key: TgKL0S7pgdcOo7SnF7S0h5pbT7XqCUaGMN1uKetX
s3_force_path_style: true
# 时间戳很关键, 是关于需要恢复到哪一个点的备份,这个就是上面测试手动触发的创建的备份点,注意一下时间戳的格式。
## https://github.com/zalando/postgres-operator/blob/master/docs/user.md#clone-from-s3
## Note, that a time zone is required for timestamp in the format of +00:00 which is UTC.
timestamp: "2022-03-07T08:24:41+00:00"
# 不清楚的话可以用 s5cmd 看看
## https://github.com/peak/s5cmd
s3_wal_path: "s3://pg-backup/spilo/xxx-pg/ece6b770-4aef-4a17-b4a4-1b808c6f6f47/wal/"
$ s5cmd --endpoint-url http://10.233.13.148 ls
2022/03/03 09:39:24 s3://pg-backup
hchen@node1:~$ s5cmd --endpoint-url http://10.233.13.148 ls s3://pg-backup/
DIR spilo/
hchen@node1:~$ s5cmd --endpoint-url http://10.233.13.148 ls s3://pg-backup/spilo/
DIR xxx-pg/
hchen@node1:~$ s5cmd --endpoint-url http://10.233.13.148 ls s3://pg-backup/spilo/xxx-pg/
DIR ece6b770-4aef-4a17-b4a4-1b808c6f6f47/
hchen@node1:~$ s5cmd --endpoint-url http://10.233.13.148 ls s3://pg-backup/spilo/xxx-pg/ece6b770-4aef-4a17-b4a4-1b808c6f6f47/
DIR wal/
hchen@node1:~$ s5cmd --endpoint-url http://10.233.13.148 ls s3://pg-backup/spilo/xxx-pg/ece6b770-4aef-4a17-b4a4-1b808c6f6f47/wal
DIR wal/
hchen@node1:~$ s5cmd --endpoint-url http://10.233.13.148 ls s3://pg-backup/spilo/xxx-pg/ece6b770-4aef-4a17-b4a4-1b808c6f6f47/wal/
DIR basebackups_005/
DIR wal_005/
验证一下
## clone_test_1, clone_test_2 都在,说明成功了吧?
# psql
psql (12.4 (Ubuntu 12.4-1.pgdg18.04+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
--------------+----------+----------+-------------+-------------+-----------------------
clone_test_1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
clone_test_2 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(5 rows)