[postgres@db1 .ssh]$ ssh-copy-id -i id_rsa.pub db2

The authenticity of host 'db2 (192.168.100.38)' can't be established.

RSA key fingerprint is 70:02:66:0a:f5:3a:62:52:55:a2:98:b1:1c:d7:6c:73.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'db2,192.168.100.38' (RSA) to the list of known hosts.

postgres@db2's password:

Now try logging into the machine, with "ssh 'db2'", and check in:

 

  .ssh/authorized_keys

 

to make sure we haven't added extra keys that you weren't expecting.

 

Verify:

[postgres@db1 .ssh]$ ssh db2 date

Thu Dec 19 01:25:48 EST 2013

 

【Alternatively, as the documentation notes, passwordless login can also be set up with the `walmgr3 --ssh-keygen` command.】

2.5.3 Configure and Start the Primary

[root@db1 ~]# cd /opt/pg93/

[root@db1 pg93]# mkdir data

[root@db1 pg93]# chown postgres:postgres data/

[root@db1 pg93]# su - postgres

[postgres@db1 ~]$ initdb -D /opt/pg93/data/

[postgres@db1 data]$ vi postgresql.conf

listen_addresses = '*'

port = 5432

logging_collector = on

log_filename = 'postgresql.log'

log_connections = on

log_disconnections = on

[postgres@db1 data]$ pg_ctl start

2.5.4 Configure walmgr3

Master-side configuration file:

[postgres@db1 pg93]$ cat wal-master.ini

[walmgr]

job_name             = wal-master

logfile              = /opt/pg93/data/log/%(job_name)s.log

pidfile              = /opt/pg93/data/pid/%(job_name)s.pid

use_skylog           = 0

 

master_db            = port=5432 host=db1 dbname=template1

master_data          = /opt/pg93/data

master_config        = /opt/pg93/data/postgresql.conf

master_bin           = /opt/pg93/bin

 

# set this only if you can afford database restarts during setup and stop.

# master_restart_cmd   = pg_ctlcluster 9.1 main restart

 

slave = db2

slave_config = /opt/pg93/wal-slave.ini

 

walmgr_data          = db2:/opt/pg93/slave_walmanager/backup

 

completed_wals       = %(walmgr_data)s/logs.complete

partial_wals         = %(walmgr_data)s/logs.partial

full_backup          = %(walmgr_data)s/data.master

config_backup        = %(walmgr_data)s/config.backup

 

# syncdaemon update frequency

loop_delay           = 10.0

# use record based shipping available since 8.2

use_xlog_functions   = 0

 

# pass -z to rsync, useful on low bandwidth links

compression          = 0

 

# keep symlinks for pg_xlog and pg_log

keep_symlinks        = 1

 

# tell walmgr to set wal_level to hot_standby during setup

hot_standby          = 1

 

# periodic sync

#command_interval     = 600

#periodic_command     = /var/lib/postgresql/walshipping/periodic.sh
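
The `%(job_name)s` and `%(walmgr_data)s` references in the file above are ConfigParser-style interpolation: each expands to the value of another key in the same `[walmgr]` section. A minimal sketch of the expansion using Python's standard configparser (walmgr3 itself is a Python tool, so this mirrors how its values resolve):

```python
from configparser import ConfigParser

# A trimmed-down copy of the [walmgr] section above, to show how
# %(key)s pulls in the value of a sibling key.
ini = """
[walmgr]
job_name       = wal-master
logfile        = /opt/pg93/data/log/%(job_name)s.log
walmgr_data    = db2:/opt/pg93/slave_walmanager/backup
completed_wals = %(walmgr_data)s/logs.complete
"""

cp = ConfigParser()
cp.read_string(ini)
print(cp.get("walmgr", "logfile"))         # /opt/pg93/data/log/wal-master.log
print(cp.get("walmgr", "completed_wals"))  # db2:/opt/pg93/slave_walmanager/backup/logs.complete
```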

 

Standby-side configuration file:

[postgres@db2  pg93]$ cat wal-slave.ini

[walmgr]

job_name             = wal-standby

logfile              = /opt/pg93/data/log/%(job_name)s.log

use_skylog           = 0

 

slave_data           = /opt/pg93/data

slave_bin            = /opt/pg93/bin

slave_stop_cmd       = /opt/pg93/bin/pg_ctl -D /opt/pg93/data stop

slave_start_cmd      = /opt/pg93/bin/pg_ctl -D /opt/pg93/data start

#slave_config_dir     = /tmp/test_slave

slave_config_dir     = /opt/pg93/data

 

walmgr_data          = /opt/pg93/slave_walmanager/backup

completed_wals       = %(walmgr_data)s/logs.complete

partial_wals         = %(walmgr_data)s/logs.partial

full_backup          = %(walmgr_data)s/data.master

config_backup        = %(walmgr_data)s/config.backup

 

backup_datadir       = no

keep_backups         = 0

# archive_command =

 

# primary database connect string for hot standby -- enabling

# this will cause the slave to be started in hot standby mode.

primary_conninfo     = host=db1 user=postgres port=5432

 

Set up replication:

[postgres@db1 pg93]$ walmgr3 wal-master.ini setup

2013-12-20 02:05:19,289 32538 INFO Configuring WAL archiving

2013-12-20 02:05:19,294 32538 INFO Sending SIGHUP to postmaster

LOG:  received SIGHUP, reloading configuration files

LOG:  parameter "archive_command" changed to "/opt/skytools/bin/walmgr3 /opt/pg93/wal-master.ini xarchive %p %f"

2013-12-20 02:05:19,491 32538 INFO Done

At this point, the following parameters in postgresql.conf have been changed:

wal_level = 'hot_standby'

archive_mode = 'on'

archive_command = '/opt/skytools/bin/walmgr3 /opt/pg93/wal-master.ini xarchive %p %f'
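
In archive_command, PostgreSQL substitutes %p with the path of the WAL segment to archive (relative to the data directory) and %f with its bare file name; the command must place the segment in the archive under that name and return nonzero on failure so the server retries. The following is only a toy sketch of that contract using local temporary directories in place of the db2 archive; it is not walmgr3's actual `xarchive` implementation:

```python
import os
import shutil
import tempfile

def xarchive(datadir, p, f, archive_dir):
    """Toy archive_command: copy WAL segment %p (relative to datadir)
    into the archive under name %f. Raises on failure, which an
    archive_command must turn into a nonzero exit status."""
    src = os.path.join(datadir, p)
    dst = os.path.join(archive_dir, f)
    shutil.copy2(src, dst)

# Simulate a data directory with one 16-byte stand-in "segment".
datadir = tempfile.mkdtemp()
archive = tempfile.mkdtemp()
os.makedirs(os.path.join(datadir, "pg_xlog"))
seg = "00000003000000000000002B"
open(os.path.join(datadir, "pg_xlog", seg), "wb").write(b"\0" * 16)

xarchive(datadir, "pg_xlog/" + seg, seg, archive)
print(os.listdir(archive))  # ['00000003000000000000002B']
```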

In addition, the backup directories have been created on db2:

[postgres@db2  pg93]$ ll slave_walmanager/backup/

total 16

drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 config.backup

drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 data.master

drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 logs.complete

drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 logs.partial

 

Set the max_wal_senders parameter in postgresql.conf:

max_wal_senders = 3

Add the following two entries to pg_hba.conf (the `replication` keyword is a pseudo-database name that matches walreceiver connections and is what lets the standby stream WAL; `trust` allows connections from 192.168.100.0/24 without a password):

[postgres@db1 data]$ vi pg_hba.conf

host    postgres      postgres        192.168.100.0/24        trust

host    replication     postgres        192.168.100.0/24        trust
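
pg_hba.conf is matched top-down: the first line whose type, database, user, and address all match determines the authentication method, and `replication` is not covered by an `all` database entry, which is why the second line is required. A toy matcher illustrating this rule (not PostgreSQL's parser; the entries mirror the two lines above):

```python
import ipaddress

# First matching line wins, as in pg_hba.conf.
hba = [
    ("host", "postgres",    "postgres", "192.168.100.0/24", "trust"),
    ("host", "replication", "postgres", "192.168.100.0/24", "trust"),
]

def auth_method(database, user, addr):
    for _conn_type, db, usr, cidr, method in hba:
        if db == database and usr == user and \
           ipaddress.ip_address(addr) in ipaddress.ip_network(cidr):
            return method
    return None  # no matching line -> connection rejected

print(auth_method("replication", "postgres", "192.168.100.38"))  # trust
print(auth_method("pgbench", "postgres", "192.168.100.38"))      # None: no entry for that db
```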

 

Take a base backup:

[postgres@db1 pg93]$ walmgr3 wal-master.ini backup

2013-12-20 02:05:58,339 28752 INFO Backup lock obtained.

2013-12-20 02:05:58,347 32547 INFO Execute SQL: select pg_start_backup('FullBackup'); [port=5432 host=db1 dbname=template1]

2013-12-20 02:05:59,015 32550 INFO {count: 1, duration: 0.518224000931}

2013-12-20 02:06:00,098 32547 INFO Checking tablespaces

2013-12-20 02:06:00,099 32547 INFO pg_log does not exist, skipping

2013-12-20 02:06:00,274 32547 INFO Backup conf files from /opt/pg93/data

2013-12-20 02:06:00,648 28826 INFO First useful WAL file is: 00000003000000000000002B

2013-12-20 02:06:00,660 32547 INFO Execute SQL: select pg_stop_backup(); [port=5432 host=db1 dbname=template1]

2013-12-20 02:06:01,313 32564 INFO {count: 1, duration: 0.562467098236}

2013-12-20 02:06:01,681 32568 INFO {count: 1, duration: 0.282044887543}

2013-12-20 02:06:02,876 28878 INFO Backup lock released.

2013-12-20 02:06:02,884 32547 INFO Full backup successful

 

The backup step generates the base backup under walmgr_data, as shown below:

[postgres@db2  pg93]$ ls slave_walmanager/backup/

config.backup  data.master  logs.complete  logs.partial

[postgres@db2  pg93]$ ls slave_walmanager/backup/data.master/

PG_VERSION    backup_label.old  global  pg_clog       pg_notify  pg_snapshots  pg_stat_tmp  pg_twophase  pid

backup_label  base              log     pg_multixact  pg_serial  pg_stat       pg_subtrans  pg_xlog      recovery.conf

 

Run the restore on the standby:

[postgres@db2  pg93]$ walmgr3 wal-slave.ini restore

2013-12-20 02:09:38,190 28909 WARNING backup_datadir is disabled, deleting old data dir

2013-12-20 02:09:38,191 28909 INFO Move /opt/pg93/slave_walmanager/backup/data.master to /opt/pg93/data

2013-12-20 02:09:38,197 28909 INFO Write /opt/pg93/data/recovery.conf

2013-12-20 02:09:38,198 28909 INFO Restoring configuration files

2013-12-20 02:09:38,200 28909 INFO Starting postmaster: /opt/pg93/bin/pg_ctl -D /opt/pg93/data start

server starting

[postgres@db2  pg93]$ LOG:  database system was interrupted; last known up at 2013-12-20 02:05:58 EST

LOG:  entering standby mode

2013-12-20 02:09:38,494 28916 INFO 00000003.history: not found (ignored)

2013-12-20 02:09:38,576 28917 INFO 00000003000000000000002B: Found

2013-12-20 02:09:38,603 28917 INFO {count: 1}

LOG:  restored log file "00000003000000000000002B" from archive

LOG:  redo starts at 0/2B000028

LOG:  consistent recovery state reached at 0/2B0000F0

2013-12-20 02:09:38,776 28921 INFO 00000003000000000000002C: not found (ignored)

LOG:  fetching timeline history file for timeline 3 from primary server

LOG:  started streaming WAL from primary at 0/2C000000 on timeline 3

 

The recovery.conf generated on the standby contains the following:

[postgres@db2  pg93]$ cat data/recovery.conf

restore_command = '/opt/skytools/bin/walmgr3 /opt/pg93/wal-slave.ini xrestore %f "%p" %r'

standby_mode = 'on'

trigger_file = '/opt/pg93/slave_walmanager/backup/logs.complete/STOP'

primary_conninfo = 'host=db1 user=postgres port=5432'

archive_cleanup_command = '/opt/pg93/bin/pg_archivecleanup /opt/pg93/slave_walmanager/backup/logs.complete %r'
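
In this file, standby_mode = 'on' keeps the server in recovery, restore_command fetches each requested segment from logs.complete, and creating the trigger_file (touching the STOP file) promotes the standby. The %r placeholder names the oldest WAL file still needed for a restartable recovery, so archive_cleanup_command can hand it to pg_archivecleanup to delete everything older. A toy sketch of that cleanup rule (not pg_archivecleanup itself):

```python
import os
import tempfile

def archive_cleanup(archive_dir, restart_seg):
    """Delete archived WAL segments whose 24-hex-character names sort
    before restart_seg (the %r argument); keep .backup files and the
    restart segment itself, as pg_archivecleanup does."""
    removed = []
    for name in sorted(os.listdir(archive_dir)):
        if len(name) == 24 and name < restart_seg:
            os.remove(os.path.join(archive_dir, name))
            removed.append(name)
    return removed

d = tempfile.mkdtemp()
for seg in ("00000003000000000000002B",
            "00000003000000000000002C",
            "00000003000000000000002D"):
    open(os.path.join(d, seg), "w").close()

print(archive_cleanup(d, "00000003000000000000002C"))  # ['00000003000000000000002B']
print(sorted(os.listdir(d)))  # the two newer segments remain
```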

 

【Streaming replication is now in place. Repeating the configuration above with the roles of db1 and db2 swapped makes it possible to switch roles between db1 and db2.】

2.5.5 Verification

On the primary:

[postgres@db1 pg93]$ ps -ef | grep post

postgres   605     1  0 02:48 pts/1    00:00:00 /opt/pg93/bin/postgres

postgres   607   605  0 02:48 ?        00:00:00 postgres: checkpointer process  

postgres   608   605  0 02:48 ?        00:00:00 postgres: writer process  

postgres   609   605  0 02:48 ?        00:00:00 postgres: wal writer process  

postgres   610   605  0 02:48 ?        00:00:00 postgres: autovacuum launcher process  

postgres   611   605  0 02:48 ?        00:00:00 postgres: archiver process   last was 000000030000000000000030.00000028.backup

postgres   612   605  0 02:48 ?        00:00:00 postgres: stats collector process  

postgres   660   605  0 02:50 ?        00:00:00 postgres: wal sender process postgres 192.168.100.38(56960) streaming 0/310000C8

 

[postgres@db1 pg93]$ createdb pgbench

[postgres@db1 pg93]$ pgbench -i -s 1 pgbench

NOTICE:  table "pgbench_history" does not exist, skipping

NOTICE:  table "pgbench_tellers" does not exist, skipping

NOTICE:  table "pgbench_accounts" does not exist, skipping

NOTICE:  table "pgbench_branches" does not exist, skipping

creating tables...

100000 of 100000 tuples (100%) done (elapsed 0.20 s, remaining 0.00 s).

vacuum...

set primary keys...

done.

[postgres@db1 pg93]$ psql

psql (9.3.2)

Type "help" for help.

 

postgres=# select pg_current_xlog_location();

 pg_current_xlog_location

--------------------------

 0/31CB94B0

(1 row)

 

 

On the standby:

[postgres@db2  pg93]$ ps -ef | grep post

root     24384 24063  0 Dec19 pts/1    00:00:00 su - postgres

postgres 24385 24384  0 Dec19 pts/1    00:00:00 -bash

postgres 29539     1  0 02:50 pts/1    00:00:00 /opt/pg93/bin/postgres -D /opt/pg93/data

postgres 29540 29539  0 02:50 ?        00:00:00 postgres: startup process   recovering 000000030000000000000031

postgres 29544 29539  0 02:50 ?        00:00:00 postgres: checkpointer process         

postgres 29545 29539  0 02:50 ?        00:00:00 postgres: writer process               

postgres 29546 29539  0 02:50 ?        00:00:00 postgres: stats collector process      

postgres 29548 29539  0 02:50 ?        00:00:00 postgres: wal receiver process   streaming 0/310000C8

 

[postgres@db2  pg93]$ psql pgbench

psql (9.3.2)

Type "help" for help.

 

pgbench=# \d+

                          List of relations

 Schema |       Name       | Type  |  Owner   |  Size   | Description

--------+------------------+-------+----------+---------+-------------

 public | pgbench_accounts | table | postgres | 13 MB   |

 public | pgbench_branches | table | postgres | 16 kB   |

 public | pgbench_history  | table | postgres | 0 bytes |

 public | pgbench_tellers  | table | postgres | 16 kB   |

(4 rows)

 

pgbench=# select pg_last_xlog_receive_location();

 pg_last_xlog_receive_location

-------------------------------

 0/31CB94B0

(1 row)

 

pgbench=# select pg_last_xlog_replay_location();

 pg_last_xlog_replay_location

------------------------------

 0/31CB94B0

(1 row)
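
pg_current_xlog_location() on the primary and the two pg_last_xlog_*_location() functions on the standby all report LSNs in 'hi/lo' hexadecimal form; equal values mean the standby is fully caught up. Byte lag can be computed by parsing that form by hand (on 9.2+ the server can also do this with pg_xlog_location_diff()):

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '0/31CB94B0' to a byte offset:
    the part before '/' is the high 32 bits, the part after is the low 32."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

primary = "0/31CB94B0"   # pg_current_xlog_location() on db1
standby = "0/31CB94B0"   # pg_last_xlog_replay_location() on db2

print(lsn_to_int(primary) - lsn_to_int(standby))  # 0 -> no replay lag
```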

3. Administration

【The following experiments continue from the environment built in Section 2.2.】

3.1 Changing the Replication Topology

3.1.1 change-provider

【Change the provider of a node.】

 

The current topology is as follows:

[postgres@londiste1 londiste3]$ londiste3 db1.ini status

Queue: replika   Local node: node1

 

node1 (root)

  |                           Tables: 1/0/0

  |                           Lag: 15s, Tick: 439

  +--: node2 (branch)

  |  |                        Tables: 1/0/0

  |  |                        Lag: 15s, Tick: 439

  |  +--: node4 (branch)

  |                           Tables: 1/0/0

  |                           Lag: 15s, Tick: 439

  +--: node3 (branch)

     |                        Tables: 1/0/0

     |                        Lag: 15s, Tick: 439

     +--: node5 (branch)

                              Tables: 1/0/0

                              Lag: 15s, Tick: 439

 

Change node4's provider to node3:

[postgres@londiste1 londiste3]$ londiste3 db4.ini change-provider --provider=node3

2013-09-21 10:14:40,353 25458 INFO [node4] Consumer londiste_db4 tagged as paused

2013-09-21 10:14:40,366 25458 INFO Waiting for worker to accept

2013-09-21 10:14:41,371 25458 INFO Consumer 'londiste_db4' on node 'node4' paused

2013-09-21 10:14:41,386 25458 INFO [node3] Location registered

2013-09-21 10:14:41,409 25458 INFO [node3] Subscriber registered: node4

2013-09-21 10:14:41,416 25458 INFO [node4] Consumer provider node set to : node3

2013-09-21 10:14:41,424 25458 INFO [node4] Consumer londiste_db4 tagged as resumed

2013-09-21 10:14:41,431 25458 INFO Waiting for worker to accept

2013-09-21 10:14:42,436 25458 INFO Consumer 'londiste_db4' on node 'node4' resumed

2013-09-21 10:14:42,448 25458 INFO [node2] Subscriber unregistered: node4

[postgres@londiste1 londiste3]$ londiste3 db1.ini status

Queue: replika   Local node: node1

 

node1 (root)

  |                           Tables: 1/0/0

  |                           Lag: 1s, Tick: 448

  +--: node2 (branch)

  |                           Tables: 1/0/0

  |                           Lag: 1s, Tick: 448

  +--: node3 (branch)

     |                        Tables: 1/0/0

     |                        Lag: 1s, Tick: 448

     +--: node4 (branch)

     |                        Tables: 1/0/0

     |                        Lag: 1s, Tick: 448

     +--: node5 (branch)

                              Tables: 1/0/0

                              Lag: 1s, Tick: 448

{node3 has successfully taken over node4}

 

Make node5 a child of node4:

[postgres@londiste1 londiste3]$ londiste3 db5.ini change-provider --provider=node4

2013-09-21 10:16:11,661 25488 INFO [node5] Consumer londiste_db5 tagged as paused

2013-09-21 10:16:11,672 25488 INFO Waiting for worker to accept

2013-09-21 10:16:12,677 25488 INFO Consumer 'londiste_db5' on node 'node5' paused

2013-09-21 10:16:12,693 25488 INFO [node4] Location registered

2013-09-21 10:16:12,713 25488 INFO [node4] Subscriber registered: node5

2013-09-21 10:16:12,720 25488 INFO [node5] Consumer provider node set to : node4

2013-09-21 10:16:12,731 25488 INFO [node5] Consumer londiste_db5 tagged as resumed

2013-09-21 10:16:12,738 25488 INFO Waiting for worker to accept

2013-09-21 10:16:13,743 25488 INFO Consumer 'londiste_db5' on node 'node5' resumed

2013-09-21 10:16:13,756 25488 INFO [node3] Subscriber unregistered: node5

[postgres@londiste1 londiste3]$ londiste3 db1.ini status

Queue: replika   Local node: node1

 

node1 (root)

  |                           Tables: 1/0/0

  |                           Lag: 3s, Tick: 451

  +--: node2 (branch)

  |                           Tables: 1/0/0

  |                           Lag: 3s, Tick: 451

  +--: node3 (branch)

     |                        Tables: 1/0/0

     |                        Lag: 3s, Tick: 451

     +--: node4 (branch)

        |                     Tables: 1/0/0

        |                     Lag: 3s, Tick: 451

        +--: node5 (branch)

                              Tables: 1/0/0

                              Lag: 3s, Tick: 451

{node5 is now a child of node4}

 

Make node4 a child of node2:

[postgres@londiste1 londiste3]$ londiste3 db4.ini change-provider --provider=node2

2013-09-21 10:17:20,633 25517 INFO [node4] Consumer londiste_db4 tagged as paused

2013-09-21 10:17:20,643 25517 INFO Waiting for worker to accept

2013-09-21 10:17:21,647 25517 INFO Consumer 'londiste_db4' on node 'node4' paused

2013-09-21 10:17:21,660 25517 INFO [node2] Location registered

2013-09-21 10:17:21,683 25517 INFO [node2] Subscriber registered: node4

2013-09-21 10:17:21,691 25517 INFO [node4] Consumer provider node set to : node2

2013-09-21 10:17:21,701 25517 INFO [node4] Consumer londiste_db4 tagged as resumed

2013-09-21 10:17:21,709 25517 INFO Waiting for worker to accept

2013-09-21 10:17:22,713 25517 INFO Consumer 'londiste_db4' on node 'node4' resumed

2013-09-21 10:17:22,725 25517 INFO [node3] Subscriber unregistered: node4

[postgres@londiste1 londiste3]$ londiste3 db1.ini status

Queue: replika   Local node: node1

 

node1 (root)

  |                           Tables: 1/0/0

  |                           Lag: 45s, Tick: 452

  +--: node3 (branch)

  |                           Tables: 1/0/0

  |                           Lag: 45s, Tick: 452

  +--: node2 (branch)

     |                        Tables: 1/0/0

     |                        Lag: 45s, Tick: 452

     +--: node4 (branch)

        |                     Tables: 1/0/0

        |                     Lag: 45s, Tick: 452

        +--: node5 (branch)

                              Tables: 1/0/0

                              Lag: 45s, Tick: 452

{Note that node4 brings its child nodes along when node2 takes it over}
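
The behavior observed here follows from each node tracking only its direct provider: re-pointing node4 at node2 says nothing about node5, so node5's subscription to node4 survives and the whole subtree moves. A toy model of this bookkeeping (hypothetical, not londiste3's internals):

```python
# Each node records only its direct provider.
provider = {"node2": "node1", "node3": "node1", "node4": "node3", "node5": "node4"}

def children(node):
    return sorted(n for n, p in provider.items() if p == node)

# What `londiste3 db4.ini change-provider --provider=node2` amounts to:
# only node4's own provider entry changes.
provider["node4"] = "node2"

print(children("node2"))  # ['node4']
print(children("node4"))  # ['node5']  -- node5 followed node4 under node2
```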

3.1.2 takeover

【Have node A take over node B directly; continues from the previous subsection.】

 

Have node3 take over node5:

[postgres@londiste1 londiste3]$ londiste3 db3.ini takeover node5

2013-09-21 10:19:58,929 25566 INFO old: node5

2013-09-21 10:19:58,998 25566 INFO [node5] Consumer londiste_db5 tagged as paused

2013-09-21 10:19:59,008 25566 INFO Waiting for worker to accept

2013-09-21 10:20:00,013 25566 INFO Consumer 'londiste_db5' on node 'node5' paused

2013-09-21 10:20:00,024 25566 INFO [node3] Location registered

2013-09-21 10:20:00,043 25566 INFO [node3] Subscriber registered: node5

2013-09-21 10:20:00,055 25566 INFO [node5] Consumer provider node set to : node3

2013-09-21 10:20:00,066 25566 INFO [node5] Consumer londiste_db5 tagged as resumed

2013-09-21 10:20:00,073 25566 INFO Waiting for worker to accept

2013-09-21 10:20:01,077 25566 INFO Consumer 'londiste_db5' on node 'node5' resumed

2013-09-21 10:20:01,091 25566 INFO [node4] Subscriber unregistered: node5

[postgres@londiste1 londiste3]$ londiste3 db1.ini status

Queue: replika   Local node: node1

 

node1 (root)

  |                           Tables: 1/0/0

  |                           Lag: 2s, Tick: 459

  +--: node2 (branch)

  |  |                        Tables: 1/0/0

  |  |                        Lag: 2s, Tick: 459

  |  +--: node4 (branch)

  |                           Tables: 1/0/0

  |                           Lag: 2s, Tick: 459

  +--: node3 (branch)

     |                        Tables: 1/0/0

     |                        Lag: 2s, Tick: 459

     +--: node5 (branch)

                              Tables: 1/0/0

                              Lag: 2s, Tick: 459

{Takeover completed successfully}