DataX-Migration数据库迁移,数据库同步实现解决方案

DataX-Migration

DataX-Migration is Yxt (Yunxuetang) Full Database Migration Tool based on Alibaba DataX 3.0. Support Database Migration among Mysql, Oracle, SqlServer, PostgreSql. And support where condition when migration.
DataX-Migration云学堂开源的基于阿里巴巴DataX 3.0的数据库迁移工具。支持对Mysql,Oracle,SqlServer, PostgreSql之间的相互迁移, 支持迁移时带where查询条件,并生成迁移数据报表。

DataX是什么?

DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、OTS、ODPS 等各种异构数据源之间高效的数据同步功能。
DataX 在阿里巴巴集团内被广泛使用,承担了所有大数据的离线同步业务,并已持续稳定运行了6年之久。目前每天完成同步8w多道作业,每日传输数据量超过300TB。

DataX-Framework
关于更详细的介绍请看这里:https://github.com/alibaba/DataX/wiki/DataX-Introduction

为什么还需要DataX-Migration

DataX专注于对数据的同步,它使用脚本以及可配置的方式,以一个个独立的脚本任务,非常方便地对单表的数据进行同步操作。但我们需要更加智能或自动的方式同步整个数据库,所以我们对DataX进行了包装,以更方便地进行整个数据库的迁移工作。

DataX-Migration的功能

DataX-Migration 能根据用户配置数据库表tables的查询条件,生成这些数据库表的单独的DataX json配置,然后启动DataX的脚本来开始这些表的数据迁移,并生成相应的cvs报表。当表的数量过多时,可以配置切分策略来划分出多个线程来同时做迁移已加快迁移数据。

DataX-Migration Report

Support Data Channels

DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、NOSQL、大数据计算系统都已经接入,目前支持数据如下图,详情请点击:DataX数据源参考指南

类型数据源Reader(读)Writer(写)文档
RDBMS 关系型数据库MySQL
Oracle
SQLServer
PostgreSQL
DRDS
通用RDBMS(支持所有关系型数据库)
阿里云数仓数据存储ODPS
ADS
OSS
OCS
NoSQL数据存储OTS
Hbase0.94
Hbase1.1
Phoenix4.x
MongoDB
Hive
无结构化数据存储TxtFile
FTP
HDFS
Elasticsearch

Datax-Migration目前对关系型数据库的直接使用是基本没有问题,至于非关系型数据库还需验证。

Quick Start

1. download [DataX可以运行的bin下载地址](http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz)
2. git clone https://github.com/Jawf/datax-migration.git (也可从下面的链接直接下载)
3. mvn clean install
4. copy target/datax-migration.jar and target/datax-migration_lib to datax/ home directory.
5. open the datax-migration.jar, edit the config.properties, config migration db information source/target url, dbname, user, password, etc.
6. open the datax-migration.jar, edit the job/jobtemplate.json accordingly, default it migration from mysqlreader->mysqlwriter
7. java -jar datax-migration.jar

DataX-Migration可以运行的bin下载地址
DataX可以运行的bin下载地址

Special Function

java -jar datax-migration.jar json #Only generate json files config for all tables.
java -jar datax-migration.jar report #Only generate cvs report to view migration status.
java -jar datax-migration.jar run #Only run the migration and generate the cvs reports. Before run this, need generate json config files for all tables first.

Config.properties Detail Properties

Properties NameMandatory ConfigValue SampleRemark
source.db.url*jdbc:mysql://192.168.0.188:3306/sourcedbname?useUnicode=true&characterEncoding=UTF-8迁移源数据库连接url
source.db.name*sourcedbname源数据库名
source.db.username*username源数据库连接用户名
source.db.password*password源数据库连接密码
target.db.url*jdbc:mysql://192.168.0.189:3306/targetdbname?useUnicode=true&characterEncoding=UTF-8迁移目标数据库名
target.db.name*targetdbname目标数据库名
target.db.username*username目标数据库连接用户名
target.db.password*password目标数据库连接密码
source.db.global.where.clauseorgId=‘410e7127-d969-4e0b-8326-4828182363cc’global where clause to filter the migration data, the clause also be used in get status of report, ensure it able to be run in source and target db
source.db.global.where.second.clauseuserid in (select id from CORE_USERPROFILE where orgid=‘d7f8dffb-8ae9-4a97-857b-59f395942781’)if the source table contain the column in where clause, will use the first where clause and ignore the second. if the source table does not contain the column in the where clause, but contain column in the second clause, will use the second clause and igonre the first clause. if the source table does not contain both where column, will ignore both
migration.query.target.tables.sql*select ut.table_name from information_schema.tables ut where ut.table_schema=‘targetdbname’ and ut.table_type=‘base table’target db query sql: select migration tables
migration.query.target.table.columns.sql*select column_name from information_schema.columns t where table_schema=‘targetdbname’ and table_name=’{0}’target db query sql: select migration table columns
migration.query.target.table.primarykeys.sql*select column_name from information_schema.columns t where column_key=‘pri’ and table_name=’{0}’#target db query sql: select migration table primary keys
migration.query.source.tables.status.sql*select ut.table_name,(ut.data_length+ut.index_length)/1024/1024 as size_MB, ut.table_rows from information_schema.tables ut where ut.table_schema=‘targetdbname’ and ut.table_type=‘base table’ order by size_MB desc;must contain 1:tablename,2:size,3:numOfRows. And must order by size desc.
migration.datax.channel.multipletruemutiple channel used within one job to speed the migration, 2 channel will open 2*5 thread for one job. Caution: Mutiple channels may able to cause records consistency.
migration.datax.channel.2channels.records.over1000000if migration records more than this value, will use 2 channel in DataX json config. Caution: Mutiple channels may able to cause records consistency.
migration.error.continuetrueif got error whether terminate the running thread
migration.ingore.tablesemptyconfig ingore tables to ignore migration. empty=ignore none
migration.ingore.bigtables.size.than.mb1define the table size bigger than the value and ignore to migration. 1=igonre all tables that size > 1MB
migration.jobthread.split.typesizejob thread to group a number of tables in thread by split type, available value: index:tables list index in the cvs reports, size:table size
migration.jobthread.split.maxcount40job thread max tables, if between size:20-10 got 60 tables, will be grouped to 40,20. similar for split by index numbers. Adjust this value according to the big table size to encrease the migration speed.
migration.jobthread.split.tablesize.mb“40000,30000,20000,10000,5000,1000,500,200,100,50,20,10,1,0.4375,0.25,0.1875,0.125,0.0625”size unit is MB, split the table groups by table size, enabled when type=size
migration.jobthread.split.indexes“0,1,2,5,10,50,60,90,100,200,300,310”split the table groups by index, enabled when type=index

Feedback & Bug Report

欢迎使用,或加入我们使其变得更加完善。

  • 问题可以直接 issue 我们
  • Email: homcto@gmail.com
  • Wechat: jawfneo
  • 目前此项目已经被不少同学使用过了,也积赞了一些star,希望同学们多提供宝贵都意见或提交代码,多Pull Request

FAQ

1. config.properties 里面需要,替换sourcedbname, targetdbname,要用搜索替换:)   已有同学在此踩坑了..
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值