PostgreSQL HA cluster - Sync Rep + pg_rewind

This article explains how to build a PostgreSQL HA architecture that preserves data consistency by using chained cascading replication.

First of all, async replication is expected to cause 'time-delay data inconsistency', and some types of applications can accept this kind of data inconsistency. Within seconds, the data on the standby machine becomes consistent with the master machine.
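The size of this delay can be estimated by comparing WAL positions (LSNs) on the master and the standby. The following is a minimal sketch, assuming the LSNs are given in PostgreSQL's textual form of two hexadecimal numbers separated by a slash (for example, as returned by `pg_current_wal_lsn()` on the master and `pg_last_wal_replay_lsn()` on the standby in PostgreSQL 10 and later). The helper names `lsn_to_int` and `lag_bytes` and the sample LSN values are hypothetical and only serve the illustration.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(master_lsn, standby_lsn):
    """Number of WAL bytes the standby still has to replay to catch up."""
    return lsn_to_int(master_lsn) - lsn_to_int(standby_lsn)


# Made-up sample values: the standby is 0x48 bytes behind the master.
print(lag_bytes("16/B374D848", "16/B374D800"))  # 72
```

A lag of zero means the standby has replayed everything the master has written so far, so the 'time-delay data inconsistency' has disappeared for the moment.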

On the other hand, 'failure caused data inconsistency' may not be accepted by applications. After a failover, an async-replicated standby machine may contain a transaction commit WAL record that is NOT in the WAL of the promoted master machine. The async standby may already have served that transaction's data to a customer through a read-only query.

Consider the following "parallel replication" topology:

A Master machine - MA1

A hot-standby machine connected to MA1 through sync replication - SB1

Another hot-standby machine connected to MA1 through async replication - SB2

Suppose the following worst-case scenario happens:

Here, T1, T2, … represent points in time on the timeline.

T1. MA1 issues a commit for a transaction (xid: TX1).

T2. MA1 flushes the TX1 WAL records to its on-disk WAL.

T3. The WAL sender from MA1 to SB1 is broken, and MA1 CANNOT send the TX1 WAL records to SB1.

T4. MA1 sends the TX1 WAL records to SB2 through another WAL sender (async replication).

T5. TX1 is NOT yet committed in MA1's buffer memory and database storage; MA1 is still waiting for the ack from SB1.

T6. SB2 applies the TX1 WAL records and answers a read-only query with that data.

T7. MA1 fails.

T8. Somehow, SB2 fails as well.

T9. Failover executes; SB1 becomes the new master, called MA-SB1.

Because the WAL sender from MA1 to SB1 was broken at time T3, MA-SB1 has no commit record for transaction TX1. Since SB1 is now the new master MA-SB1, SB2 should sync with MA-SB1. However, at time T6, SB2 already applied the TX1 WAL records and answered a read-only query. This is a case of 'failure caused data inconsistency'! The application needs to detect and rectify this kind of data inconsistency.
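The timeline above can be replayed with a toy model. This is only a sketch: the `Node` class is hypothetical and tracks nothing but the list of commit records in each machine's on-disk WAL. It does not model real PostgreSQL behaviour such as acknowledgments or visibility.

```python
class Node:
    """A toy node that only remembers which commit records are in its WAL."""

    def __init__(self, name):
        self.name = name
        self.wal = []

    def receive(self, record):
        self.wal.append(record)


ma1, sb1, sb2 = Node("MA1"), Node("SB1"), Node("SB2")

# T1-T2: MA1 commits TX1 and flushes the record to its own WAL.
ma1.receive("TX1 commit")

# T3: the WAL sender to SB1 is broken, so SB1 receives nothing.
sync_link_up = False
if sync_link_up:
    sb1.receive("TX1 commit")

# T4, T6: the separate async WAL sender still delivers TX1 to SB2.
sb2.receive("TX1 commit")

# T7-T9: MA1 and SB2 fail, and SB1 is promoted to MA-SB1.
new_master = sb1

# Records SB2 may have shown to readers that the new master never received.
orphaned = [r for r in sb2.wal if r not in new_master.wal]
print(orphaned)  # ['TX1 commit']
```

The non-empty result is the inconsistency: SB2 holds a commit record that does not exist on the promoted master.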

Consider the following "chained cascading replication" topology:

A Master machine - MA1

A hot-standby machine connected to MA1 through sync replication - SB1

Another hot-standby machine connected to SB1 through cascading async replication - SB2

Now the following scenario happens:

Here, T1, T2, … represent points in time on the timeline.

T1. MA1 issues a commit for a transaction (xid: TX1).

T2. MA1 flushes the TX1 WAL records to its on-disk WAL.

T3. The WAL sender from MA1 to SB1 is broken, and MA1 CANNOT send the TX1 WAL records to SB1.

T4. SB2 can only receive WAL records from SB1 (async replication), so the on-disk WAL of SB2 is a SUBSET of the on-disk WAL of SB1.

T5. TX1 is NOT yet committed in MA1's buffer memory and database storage; MA1 is still waiting for the ack from SB1.

T6. The TX1 WAL records DO NOT EXIST on SB2, so a read-only query about TX1 returns NO records.

T7. MA1 fails.

T8. Failover executes; SB1 becomes the new master, called MA-SB1. The on-disk WAL of SB2 is still a SUBSET of the on-disk WAL of SB1.

T9. SB2 is upgraded to sync replication from MA-SB1.

The 'failure caused data inconsistency' that happened in the "parallel replication" topology CANNOT happen in the "chained cascading replication" topology, because SB2 never holds a WAL record that SB1 does not also hold.
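The subset property of the chained topology can be checked with a small simulation. This is a sketch under simplifying assumptions: the `replicate` function, the `upstream` map, and the earlier transaction TX0 are all hypothetical and not part of the scenario above. Each standby simply copies whatever its upstream node has whenever the link between them is up.

```python
def replicate(wal, upstream, link_up):
    """One replication round: each standby copies from its upstream if the link works."""
    for node, source in upstream.items():
        if link_up.get((source, node), True):
            for record in wal[source]:
                if record not in wal[node]:
                    wal[node].append(record)


# Chained topology: SB2 streams from SB1, never directly from MA1.
upstream = {"SB1": "MA1", "SB2": "SB1"}
wal = {"MA1": [], "SB1": [], "SB2": []}

# A hypothetical earlier transaction TX0 replicates normally down the chain.
wal["MA1"].append("TX0 commit")
replicate(wal, upstream, link_up={})

# T1-T3: MA1 flushes TX1, but the MA1 -> SB1 WAL sender is broken.
wal["MA1"].append("TX1 commit")
replicate(wal, upstream, link_up={("MA1", "SB1"): False})

# T4, T6: SB2 holds a subset of SB1's WAL, so TX1 is absent on SB2.
assert set(wal["SB2"]) <= set(wal["SB1"])
assert "TX1 commit" not in wal["SB2"]

# T7-T8: MA1 fails and SB1 is promoted; SB2 has nothing the new master lacks.
orphaned = [r for r in wal["SB2"] if r not in wal["SB1"]]
print(orphaned)  # []
```

Because SB2's only source is SB1, no sequence of link failures in this model can put a record on SB2 that is missing from the promoted SB1.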

To make use of PostgreSQL hot-standby HA, a load-balancing component is needed. For PostgreSQL, pgpool2 is a popular one. I will talk about it in the next article.

Postgresql HA cluster-Sync Rep+pg_rewind - part 1

http://my.oschina.net/u/2399919/blog/469330

Reposted from: https://my.oschina.net/u/2399919/blog/471459