datax介绍和用法

Datax

简介

DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能。

DataX本身作为数据同步框架,将不同数据源的同步抽象为从源头数据源读取数据的Reader插件,以及向目标端写入数据的Writer插件,理论上DataX框架可以支持任意数据源类型的数据同步工作。同时DataX插件体系作为一套生态系统, 每接入一套新数据源该新加入的数据源即可实现和现有的数据源互通。

使用说明:

DataX支持数据源链接:https://github.com/alibaba/DataX/blob/master/introduction.md

DataX安装

DataX安装

1、上传依赖、
2、解压
tar -zxvf datax.tar.gz
3、配置环境变量
vim /etc/profile
添加环境变量
source /etc/profile

注意:DataX所使用的环境为Python2,Python3环境下执行会报错

在这里插入图片描述

DataX使用

测试案例

Mysql2Mysql

stream2stream.json

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
       }
    }
  }
}

channel表示 并发数

“print”: true 表示打印到控制台

“sliceRecordCount”: 10 表示每个并发生成的测试数据条数

运行

 datax.py stream2stream.json

结果显示五十条数据,因为通道channel:5和"sliceRecordCount": 10

Mysql2Mysql .json

{
    "job": {
        "setting": {
            "speed": {
                 "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "gender",
                         	"clazz",
                         	"last_mod"
                        ],
                        "where": "clazz like '文科%'",
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "student"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://master:3306/data?useSSL=false"
                                ]
                            }
                        ]
                    }
                },
               "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": [
                           	"id",
                            "name",
                            "age",
                            "gender",
                         	"clazz",
                         	"last_mod"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://master:3306/data?useSSL=false&useUnicode=true&characterEncoding=utf-8",
                                "table": [
                                    "datax_stu"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}

writeMode: insert表示数据插入 update表示根据目标表的主键进行数据的更新操作 replace 表示主键覆盖写入

Mysql2Mysql-qureySQL

{
    "job": {
        "setting": {
            "speed": {
                 "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        
                        "splitPk": "id",
                        "connection": [
                            {
                                "querySql": [
                                    "select t2.*,t1.total_score FROM( SELECT student_id ,sum(score) as total_score from score  GROUP BY student_id ) t1 join student t2 on t1.student_id = t2.id
"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://master:3306/student?useSSL=false"
                                ]
                            }
                        ]
                    }
                },
               "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "gender",
                            "clazz",
                            "last_mod",
                            "total_score"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://master:3306/data?useSSL=false&useUnicode=true&characterEncoding=utf-8",
                                "table": [
                                    "score_student"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
Mysql2HDFS

mysql2hdfs

{
    "job": {
        "setting": {
            "speed": {
                 "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        
                        "splitPk": "id",
                        "connection": [
                            {
                                "querySql": [
                                    "select t2.*,t1.total_score FROM( SELECT student_id ,sum(score) as total_score from score  GROUP BY student_id ) t1 join student t2 on t1.student_id = t2.id
"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://master:3306/student?useSSL=false"
                                ]
                            }
                        ]
                    }
                },
               "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://master:9000",
                        "fileType": "text",
                        "path": "/user/hive/warehouse/bigdata.db/students_scores",
                        "fileName": "students_scores",
                        "column": [
                            {
                                "name": "id",
                                "type": "STRING"
                            },
                            {
                                "name": "name",
                                "type": "STRING"
                            },
                            {
                                "name": "age",
                                "type": "INT"
                            },
                            {
                                "name": "gender",
                                "type": "STRING"
                            },
                            {
                                "name": "clazz",
                                "type": "STRING"
                            },
                            {
                                "name": "last_mod",
                                "type": "STRING"
                            },
                            {
                                "name": "total_score",
                                "type": "INT"
                            }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": ","
                    
                    }
                }
            }
        ]
    }
}
  • hdfs2mysql

    {
        "job": {
            "setting": {
                "speed": {
                     "channel": 3
                },
                "errorLimit": {
                    "record": 0,
                    "percentage": 0.02
                }
            },
            "content": [
                {
                    "reader":  {
                        "name": "hdfsreader",
                        "parameter": {
                            "path": "/datax/stu_score/*",
                            "defaultFS": "hdfs://master:9000",
                            "column": [
                                {
                                    "index": "0",
                                    "type": "STRING"
                                },
                                {
                                    "index": "1",
                                    "type": "STRING"
                                },
                                {
                                    "index": "2",
                                    "type": "STRING"
                                },
                                {
                                    "index": "3",
                                    "type": "STRING"
                                },
                                {
                                    "index": "4",
                                    "type": "STRING"
                                },
                                {
                                    "index": "5",
                                    "type": "STRING"
                                }
                            ],
                            "fileType": "text",
                            "encoding": "UTF-8",
                            "fieldDelimiter": ","
                        }
    
                    },
                   "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "writeMode": "insert",
                            "username": "root",
                            "password": "123456",
                            "column": [
                                "id",
                                "name",
                                "age",
                                "gender",
                                "clazz",
                                "total_score"
                            ],
                            "connection": [
                                {
                                    "jdbcUrl": "jdbc:mysql://master:3306/datax?useSSL=false&useUnicode=true&characterEncoding=utf-8",
                                    "table": [
                                        "stu_score_2"
                                    ]
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
    

    {“message”:“No enum constant com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil.Type.INT”,“record”:[{“byteSize”
    :10,“index”:0,“rawData”:“1500100001”,“type”:“STRING”},{“byteSize”:3,“index”:1,“rawData”:“施笑槐”,“type”:“STRING”}],“type”:“reader”}

    HDFS中读取文本数据时,所有列的类型都可以给定一个String类型,在插入数据到Mysql时,会做自动类型转换

Hbase2Mysql
  • Mysql2Hbase

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          },
          "errorLimit": {
            "record": 0,
            "percentage": 0.02
          }
        },
        "content": [
          {
            "reader": {
              "name": "mysqlreader",
              "parameter": {
                "username": "root",
                "password": "123456",
    
                "splitPk": "id",
                "connection": [
                  {
                    "querySql": [
                      "select t2.*,t1.total_score FROM( SELECT student_id ,sum(score) as total_score from score  GROUP BY student_id ) t1 join student t2 on t1.student_id = t2.id"
                    ],
                    "jdbcUrl": [
                      "jdbc:mysql://master:3306/student?useSSL=false"
                    ]
                  }
                ]
              }
            },
            "writer": {
              "name": "hbase11xwriter",
              "parameter": {
                "hbaseConfig": {
                  "hbase.zookeeper.quorum": "master:2181,node1:2181,node2:2181"
                },
                "table": "score_stu",
                "mode": "normal",
                "rowkeyColumn": [
                  {
                    "index":0,
                    "type":"string"
                  },
                  {
                    "index":-1,
                    "type":"string",
                    "value":"_"
                  },{
                    "index":6,
                    "type":"string"
                  }
                ],
                "column": [
                  {
                    "index":1,
                    "name": "info:name",
                    "type": "string"
                  },
                  {
                    "index":2,
                    "name": "info:age",
                    "type": "string"
                  },
                  {
                    "index":3,
                    "name": "info:gender",
                    "type": "string"
                  },
                  {
                    "index":4,
                    "name": "info:clazz",
                    "type": "string"
                  },
                  {
                    "index":6,
                    "name": "info:total_score",
                    "type": "string"
                  }
                ],
                "encoding": "utf-8"
              }
            }
          }
        ]
      }
    }
    
    
  • hbase2mysql

    {
        "job": {
            "setting": {
                "speed": {
                     "channel": 3
                },
                "errorLimit": {
                    "record": 10,
                    "percentage": 0.02
                }
            },
            "content": [
                {
                    "reader":  {
                        "name": "hbase11xreader",
                        "parameter": {
                            "hbaseConfig": {
                                "hbase.zookeeper.quorum": "master:2181"
                            },
                            "table": "api:stu",
                            "encoding": "utf-8",
                            "mode": "normal",
                            "column": [
                                {
                                    "name": "rowkey",
                                    "type": "string"
                                },
                                {
                                    "name": "info: name",
                                    "type": "string"
                                },
                                {
                                    "name": "info: age",
                                    "type": "string"
                                },
                                {
                                    "name": "info: gender",
                                    "type": "string"
                                },
                                {
                                    "name": "info: clazz",
                                    "type": "string"
                                }
                            ]
                        }
                    },
                   "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "writeMode": "insert",
                            "username": "root",
                            "password": "123456",
                            "column": [
                                "id",
                                "name",
                                "age",
                                "gender",
                                "clazz"
                            ],
                            "connection": [
                                {
                                    "jdbcUrl": "jdbc:mysql://master:3306/datax?useSSL=false&useUnicode=true&characterEncoding=utf-8",
                                    "table": [
                                        "stu"
                                    ]
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
    
  • 23
    点赞
  • 23
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值