Data migration Approaches:

Model1:

wKioL1MvAd7BuNS0AAENJa8Vkh4553.jpg


Advantages:

Simple


Disadvantages:

Business disrupted

Migration needs to complete in limited time


Model2:

wKiom1MvAiqgry3-AAHNBfFMhgA791.jpg


Require each table to have an additional GMT_MODIFIED column


Advantages:

Shorter down time, data can be read from original DB when performing full data replication


Disadvantages

Hard to predict whether new DB can fully withstand all read/write load. If after switching to new DB, and it is overloaded, the result can be disastrous


Timestamp based scanning creates a lot of pressure on DB. If scan interval is too large, data is out of sync.


Model3:

wKiom1MvAlKiPNS-AAF_gE0oh18083.jpg


Advantages:

Low data sync latency

No downtime


Disadvantages:

Intrusion into business logic to write to both database


Full replication and dual write occurs the same time, data in new DB may have conflict. We need to logically enforce that only records with newer timestamp are written to new DB


Date integrity may be comprosied: writing to old/new database need to be in same transaction. XA is very costly.


Incremental data replication is more widely adopted

For oracle it is harder to replicate journal, so use a two phase protocol

Write to a log table

Write to data table

Write data record id to log table

Asynchronously retrieve log based on id and replicate to new DB


Following modules are needed for data migration program

Control Module

Zookeeper

Timer

Task configuration


Full migration

Data fetch module

Customer defined functions

Data write module


Incremental migration

Data fetch module

Customer defined functions

Data write module

Incremental trigger

Data validation module



Model4:

wKiom1MvAmeT8DdQAAHFglsVs3Q646.jpg

Step 1:

Stop updated business data during full migration in local


Step 2:

Start full migration, need to consider exceptional cases where the target database   server is down unexpectedly during migration. Solution is to store id range to replicate in zookeeper, so that migration progress can be queried from zookeeper on target server recovery


Step 3:

Start incremental migration, replicate local stored update to target database server until all local updates are consumed


Step 4:

Start data validation to ensure that migrated data and original data are the same (no missing or duplicated data). We can do full validation or sample validation based on importance of data. After validation, we can tentatively switch some of the read to new DB


Step 5:

Swith to use the new DB. After switch, write will be applied to the new DB and it is very hard to rollback


Multi-threaded migration consideration

1. Mulit-threaded migration increases write speed, but consumes more DB resources. Number of reades need to be dynamically adjusted based on time (business load)


2. Multi-threaded migration may cause deadlock, need to ensure that per write thread  per table.


3. Data recovery becomes a challenge as it is more difficult to find out data failure point. The solution is to use a central dispatcher to allocate data range to migrate to different worker thread.