Recently my company needed to replicate Oracle data in real time to HDFS and an OLAP database. Flink CDC was the natural fit, but every sync job would have required its own hand-written Java program, so I decided to build a tool that lets the capture sources be configured flexibly through parameters.
Parameter description
{
  "flinkConf": {
    "sourceType": "oracle",
    "parallelism": 1,
    "checkpointPath": "hdfs://xxxxx",
    "checkpointInterval": 1800000
  },
  "connect": {
    "ip": "IP address",
    "port": 1521,
    "username": "username",
    "password": "",
    "dbName": "wind11",
    "schema": "SCHEMA",
    "startupMode": "LATEST_OFFSET",
    "specificOffsetFile": "",
    "specificOffsetPos": "",
    "startupTimestampMillis": ""
  },
  "tasks": [
    {
      "tableName": "StringValue",
      "targetType": "starrocks",
      "sourceField": ["sField1", "sField2", "sField3"],
      "targetConfig": {
        "jdbcUrl": "test",
        "loadUrl": "test",
        "user": "test",
        "password": "",
        "dbName": "test",
        "tableName": "test"
      }
    },
    {
      "tableName": "StringValue",
      "targetType": "hdfs",
      "sourceField": ["sField1", "sField2", "sField3"],
      "targetConfig": {
        "targetPath": "hdfs://xxxxx",
        "partition": true
      }
    }
  ]
}
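A config in this shape can be loaded and sanity-checked before the Flink job is assembled. Below is a minimal sketch (the field names come from the example above; the `load_config` helper and its validation rules are my own illustration, not part of the tool):

```python
import json

# Abridged config matching the structure shown above.
CONFIG_JSON = """
{
  "flinkConf": {"sourceType": "oracle", "parallelism": 1,
                "checkpointPath": "hdfs://xxxxx", "checkpointInterval": 1800000},
  "connect": {"ip": "10.0.0.1", "port": 1521, "username": "user",
              "password": "", "dbName": "wind11", "schema": "SCHEMA",
              "startupMode": "LATEST_OFFSET"},
  "tasks": [{"tableName": "T1", "targetType": "hdfs",
             "sourceField": ["f1"],
             "targetConfig": {"targetPath": "hdfs://xxxxx", "partition": true}}]
}
"""

def load_config(text: str) -> dict:
    """Parse the job config and check the three required top-level sections."""
    cfg = json.loads(text)
    for section in ("flinkConf", "connect", "tasks"):
        if section not in cfg:
            raise ValueError(f"missing required section: {section}")
    if not cfg["tasks"]:
        raise ValueError("at least one task must be configured")
    return cfg

cfg = load_config(CONFIG_JSON)
```

Catching a malformed config here, instead of deep inside the Flink job, keeps failures cheap and the error messages specific.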
flinkConf
Flink-level settings: parallelism, checkpoint interval, checkpoint path, and the source type. Currently oracle, mysql and kafka are supported.
sourceType
Data source type: oracle, mysql, or kafka. Required.
parallelism
Parallelism of the Flink job. Required.
checkpointPath
Checkpoint path. Optional; defaults to the checkpoint path configured in flink-conf.yaml.
checkpointInterval
Checkpoint interval in milliseconds. Required.
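The flinkConf rules above translate directly into a small validation routine. This is a sketch under the stated rules; the function name and the `None` fallback for checkpointPath are illustrative assumptions:

```python
SUPPORTED_SOURCES = {"oracle", "mysql", "kafka"}

def validate_flink_conf(conf: dict) -> dict:
    # sourceType, parallelism and checkpointInterval are required.
    for key in ("sourceType", "parallelism", "checkpointInterval"):
        if key not in conf:
            raise ValueError(f"flinkConf.{key} is required")
    if conf["sourceType"] not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported sourceType: {conf['sourceType']}")
    # checkpointPath is optional: None here means "fall back to the
    # checkpoint path configured in flink-conf.yaml".
    conf.setdefault("checkpointPath", None)
    return conf
```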
connect
Connection settings for the data source.
password
The encrypted password ciphertext; plaintext is not accepted.
ip
Database server IP. Required when sourceType is oracle or mysql.
port
Database port. Required when sourceType is oracle or mysql.
username
Username. Required when sourceType is oracle or mysql.
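The conditional requirements on the connect section (database fields only matter for database sources) can be sketched the same way. The helper below is my own illustration of those rules, not part of the tool:

```python
def validate_connect(connect: dict, source_type: str) -> None:
    """Check connect fields that are conditionally required.

    ip, port and username are only required for database sources;
    a kafka source carries its connection settings elsewhere.
    """
    if source_type in ("oracle", "mysql"):
        for key in ("ip", "port", "username"):
            if not connect.get(key):
                raise ValueError(
                    f"connect.{key} is required when sourceType is {source_type}")
```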