该文章 介绍安装dataX,以及Oracle 到 pg 库,遇到的一些坑。
1.前提条件:
1.1 JDK 1.8以上
这个比较简单,百度一大堆
1.2 安装python2
查看是否安装python:
python -V
没有的话,执行下边:
yum install python2
建立软连接:
https://blog.csdn.net/hzp666/article/details/127368103
cd /usr/bin
ln -s /usr/local/python2/bin/python2.7 /usr/bin/python2
验证:python2
出现下图则安装成功:
ps: (之前试了 下载python包,然后 make & make install ,
总是报错 : “段错误 (核心已转储)./python -E -S -m sysconfig --generate-posix-vars”)
然后试了很多方法,都没解决.........手里握紧了一把草!!!
)
2.安装datax
下载
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
解压
tar -zxvf datax.tar.gz
验证:
cd /opt/datax/bin
python2 datax.py /opt/datax/job/job.json
报错记录:
安装 datax 提示[/usr/local/datax/plugin/reader/._drdsreader/plugin.json] 不存在. 请检查您的配置文件.
解决办法:需要删除隐藏文件 (重要)rm -rf /opt/datax/plugin/*/._*
报错信息:datax报错:org.postgresql.util.PSQLException: 不支援 10 验证类型
原因:pg 数据库驱动不对
解决办法:下载自己pg库对应版本的驱动,地址如下:
然后把下载的驱动替换对应的目录(目录位置后边会讲)下,旧版本的驱动,(一般是jdbc4 结尾的文件):
替换驱动的目录:
这里我的 dataX安装目录是 /opt/datax,
ps: 这里datax 是把读取和 写入分开的,即两个不同模块。所以要看datax报错日志,是读取报错还是写入报错:
如果是 读取有问题: /opt/datax/plugin/reader/postgresqlreader/libs
如果是写入有问题:/opt/datax/plugin/writer/postgresqlwriter/libs
dataX json Oracle到 pg库的 脚本示例1:
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [{
"reader": {
"name": "oraclereader",
"parameter": {
"username": "*****",
"password": "*******",
"column": [
"ID" ,
"name",
"age"
],
"connection": [{
"table": [
"库名.表名"
],
"jdbcUrl": [
"jdbc:oracle:thin:@10.51.7.7:1521/库名"
]
}]
}
},
"writer": {
"name": "postgresqlwriter",
"parameter": {
"username": "*****",
"password": "*****",
"column": [
"ID" ,
"name",
"age"
],
"preSql": ["truncate table 库名.表名"],
"connection": [{
"jdbcUrl": "jdbc:postgresql://10.31.7.7:5432/库名",
"table": ["库名.表名"]
}]
}
}
}]
}
dataX json Oracle到 pg库的 脚本示例2(列名直接用 * 替代):
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [{
"reader": {
"name": "oraclereader",
"parameter": {
"username": "*****",
"password": "*******",
"column": [
"*"
],
"connection": [{
"table": [
"库名.表名"
],
"jdbcUrl": [
"jdbc:oracle:thin:@10.51.7.7:1521/库名"
]
}]
}
},
"writer": {
"name": "postgresqlwriter",
"parameter": {
"username": "*****",
"password": "*****",
"column": [
"*"
],
"preSql": ["truncate table 库名.表名"],
"connection": [{
"jdbcUrl": "jdbc:postgresql://10.31.7.7:5432/库名",
"table": ["库名.表名"]
}]
}
}
}]
}
datax json Oracle到 pg库的 脚本示例3(使用querySQL):
当用户配置了 querySQL之后,DataX系统就会忽略table,column,where
这些配置型,
例如需要进行多表join后同步数据,使用select a,b from table_a
join table_b on table_a.id = table_b.id
querySql优先级大于table、column、where选项。
{
"job": {
"setting": {
"speed": {
"channel": 5
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"connection": [
{
"querySql": [
"select IDNO,COL1,COL2,COL3,DT,COL5,COL6,COL7,COL8,COL9,COL10 from otbs1 where IDNO <= 1000"
],
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/datax"
]
}
],
"username": "root",
"password": "123456"
}
},
"writer": {
"name": "oraclewriter",
"parameter": {
"column": [
"IDNO",
"COL1",
"COL2",
"COL3",
"DT",
"COL5",
"COL6",
"COL7",
"COL8",
"COL9",
"COL10"
],
"connection": [
{
"jdbcUrl": "jdbc:oracle:thin:@192.xxx.xxx.xxx:1521:orcl",
"table": [
"DB.OTBS1"
]
}
],
"username": "username",
"password": "password"
}
}
}
]
}
}