DataX Installation and Basic Usage

1. DataX Overview

1. Overview

DataX is Alibaba's open-source offline data synchronization tool, built to move data between heterogeneous sources such as MySQL, Oracle, HDFS, Hive, and HBase.

(figure: DataX overview)

2. DataX Plugin System

Each supported data source is implemented as a Reader or Writer plugin, and the DataX framework handles the data transport between them.

(figure: DataX plugin system)

3. DataX Core Architecture

A job is split into tasks, which the scheduler assembles into task groups; each task runs a Reader → Channel → Writer pipeline concurrently.

(figure: DataX core architecture)

2. Installation

2.1 Download and Extract

Source code: https://github.com/alibaba/DataX
I downloaded the latest release, DataX 3.0, from:
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

# Download and then extract
[xiaokang@hadoop ~]$ wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
[xiaokang@hadoop ~]$ tar -zxvf datax.tar.gz -C /opt/software/
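
Before going further, it's worth confirming the runtime prerequisites: DataX needs a JDK, and the datax.py script bundled with this release is written for Python 2, so both should be on the PATH. A quick sanity check:

# Check that Java and Python are available
[xiaokang@hadoop ~]$ java -version
[xiaokang@hadoop ~]$ python --version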

2.2 Run the Self-Check Script

[xiaokang@hadoop ~]$ cd /opt/software/datax/
[xiaokang@hadoop datax]$ bin/datax.py job/job.json

The built-in job/job.json is a stream-to-stream smoke-test job. If output like the following appears, DataX is installed successfully:

(figure: self-check job summary)

3. Basic Usage

3.1 Read Data from a Stream and Print It to the Console

1. View the official JSON configuration template

[xiaokang@hadoop ~]$ python /opt/software/datax/bin/datax.py -r streamreader -w streamwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the streamreader document:
     https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [], 
                        "sliceRecordCount": ""
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

2. Write the JSON file based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [
                            {
                                "type":"string",
                                "value":"xiaokang-微信公众号:小康新鲜事儿"
                            },
                            {
                                "type":"string",
                                "value":"你好,世界-DataX"
                            }
                        ], 
                        "sliceRecordCount": "10"
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "utf-8", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "2"
            }
        }
    }
}

3. Run the job

Each channel emits sliceRecordCount records, so with channel set to 2 this job prints 2 × 10 = 20 lines in total.

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./stream2stream.json

(figure: stream2stream job output)

3.2 Import Data from MySQL into HDFS

Example: export the help_keyword table from MySQL's built-in mysql database to the /datax directory in HDFS (this directory must be created in advance; see the command below).
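The target directory can be created up front with a single HDFS command (assuming the HDFS services are running and your user can write to the root path):

# Create the destination directory for the export
[xiaokang@hadoop ~]$ hdfs dfs -mkdir /datax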

1. View the official JSON configuration template

[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r mysqlreader -w hdfswriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 

Please refer to the hdfswriter document:
     https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "hdfswriter", 
                    "parameter": {
                        "column": [], 
                        "compress": "", 
                        "defaultFS": "", 
                        "fieldDelimiter": "", 
                        "fileName": "", 
                        "fileType": "", 
                        "path": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

2. Write the JSON file based on the template


{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [
                            "help_keyword_id",
                            "name"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.1.106:3306/mysql"
                                ], 
                                "table": [
                                    "help_keyword"
                                ]
                            }
                        ], 
                        "password": "xiaokang", 
                        "username": "root"
                    }
                }, 
                "writer": {
                    "name": "hdfswriter", 
                    "parameter": {
                        "column": [
                            {
                                "name":"help_keyword_id",
                                "type":"int"
                            },
                            {
                                "name":"name",
                                "type":"string"
                            }
                        ], 
                        "defaultFS": "hdfs://hadoop:9000", 
                        "fieldDelimiter": "|", 
                        "fileName": "keyword.txt", 
                        "fileType": "text", 
                        "path": "/datax", 
                        "writeMode": "append"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}

3. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./mysql2hdfs.json
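
HdfsWriter appends a random suffix to fileName so that concurrent writers do not collide, which is why the file lands as keyword.txt__<suffix> rather than keyword.txt (the next section renames it). Listing the directory shows the actual name:

# List the export directory to find the suffixed file
[xiaokang@hadoop json]$ hdfs dfs -ls /datax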

3.3 Export Data from HDFS to MySQL

1. Rename the file imported in 3.2 and create the target table in the database

[xiaokang@hadoop ~]$ hdfs dfs -mv /datax/keyword.txt__4c0e0d04_e503_437a_a1e3_49db49cbaaed /datax/keyword.txt

The table must be created in advance; the DDL is as follows:

CREATE TABLE help_keyword_from_hdfs_datax LIKE help_keyword;

2. View the official JSON configuration template

[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r hdfsreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the hdfsreader document:
     https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md 

Please refer to the mysqlwriter document:
     https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader", 
                    "parameter": {
                        "column": [], 
                        "defaultFS": "", 
                        "encoding": "UTF-8", 
                        "fieldDelimiter": ",", 
                        "fileType": "orc", 
                        "path": ""
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "session": [], 
                        "username": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

3. Write the JSON file based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader", 
                    "parameter": {
                        "column": [
                            "*"
                        ], 
                        "defaultFS": "hdfs://hadoop:9000", 
                        "encoding": "UTF-8", 
                        "fieldDelimiter": "|", 
                        "fileType": "text", 
                        "path": "/datax/keyword.txt"
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [
                            "help_keyword_id",
                            "name"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.106:3306/mysql", 
                                "table": ["help_keyword_from_hdfs_datax"]
                            }
                        ], 
                        "password": "xiaokang",  
                        "username": "root", 
                        "writeMode": "insert"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}

4. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./hdfs2mysql.json
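
A quick way to verify the import is to count the rows in the target table (a sketch using connection details from the config above; the mysql client prompts for the password):

# Row count should match the source help_keyword table
[xiaokang@hadoop json]$ mysql -h 192.168.1.106 -u root -p mysql -e "SELECT COUNT(*) FROM help_keyword_from_hdfs_datax;"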

3.4 Sync MySQL to MySQL

This job uses querySql on the reader side, which supersedes the table and column settings there.

{
	"job": {
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"password": "gee123456",
					"username": "geespace",
					"connection": [{
						"jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"],
						"querySql": ["SELECT id, name FROM test_test"]
					}]
				}
			},
			"writer": {
				"name": "mysqlwriter",
				"parameter": {
					"column": ["id", "name"],
					"password": "gee123456",
					"username": "geespace",
					"writeMode": "insert",
					"connection": [{
						"table": ["test_test_1"],
						"jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"
					}]
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0,
				"percentage": 0.02
			}
		}
	}
}
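
mysqlwriter does not create the target table, so test_test_1 has to exist before the job runs; cloning the source table's structure (as in 3.3) is one way to do that. Save the config under any name, e.g. mysql2mysql.json, and run it as before:

# Create the target table by cloning the source structure (password prompted)
mysql -h 192.168.20.75 -P 9950 -u geespace -p geespace_bd_platform_dev -e "CREATE TABLE IF NOT EXISTS test_test_1 LIKE test_test;"

# Run the job (filename is just an example)
python /opt/software/datax/bin/datax.py ./mysql2mysql.json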

3.5 Sync MySQL to HBase

On the writer side, rowkeyColumn builds each row's HBase rowkey by concatenating the configured source columns.

{
	"job": {
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"password": "gee123456",
					"username": "geespace",
					"connection": [{
						"jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"],
						"querySql": ["SELECT id, name FROM test_test"]
					}]
				}
			},
			"writer": {
				"name": "hbase11xwriter",
				"parameter": {
					"mode": "normal",
					"table": "test_test_1",
					"column": [{
						"name": "f:id",
						"type": "string",
						"index": 0
					}, {
						"name": "f:name",
						"type": "string",
						"index": 1
					}],
					"encoding": "utf-8",
					"hbaseConfig": {
						"hbase.zookeeper.quorum": "192.168.20.91:2181",
						"zookeeper.znode.parent": "/hbase"
					},
					"rowkeyColumn": [{
						"name": "f:id",
						"type": "string",
						"index": 0
					}, {
						"name": "f:name",
						"type": "string",
						"index": 1
					}]
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0,
				"percentage": 0.02
			}
		}
	}
}
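
hbase11xwriter likewise expects the target table and column family to exist already; a minimal sketch in the HBase shell (table and column-family names taken from the config above), followed by the usual run command with an example filename:

# Create the target table with column family 'f'
echo "create 'test_test_1', 'f'" | hbase shell

# Run the job (filename is just an example)
python /opt/software/datax/bin/datax.py ./mysql2hbase.json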

3.6 Sync HBase to HBase

{
	"job": {
		"content": [{
			"reader": {
				"name": "hbase11xreader",
				"parameter": {
					"mode": "normal",
					"table": "test_test",
					"column": [{
						"name": "f:id",
						"type": "string"
					}, {
						"name": "f:name",
						"type": "string"
					}],
					"encoding": "utf-8",
					"hbaseConfig": {
						"hbase.zookeeper.quorum": "192.168.20.91:2181",
						"zookeeper.znode.parent": "/hbase"
					}
				}
			},
			"writer": {
				"name": "hbase11xwriter",
				"parameter": {
					"mode": "normal",
					"table": "test_test_1",
					"column": [{
						"name": "f:id",
						"type": "string",
						"index": 0
					}, {
						"name": "f:name",
						"type": "string",
						"index": 1
					}],
					"encoding": "utf-8",
					"hbaseConfig": {
						"hbase.zookeeper.quorum": "192.168.20.91:2181",
						"zookeeper.znode.parent": "/hbase"
					},
					"rowkeyColumn": [{
						"name": "f:id",
						"type": "string",
						"index": 0
					}, {
						"name": "f:name",
						"type": "string",
						"index": 1
					}]
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0,
				"percentage": 0.02
			}
		}
	}
}
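
Run the job the same way (example filename), then spot-check the copied rows from the HBase shell:

# Run the job (filename is just an example)
python /opt/software/datax/bin/datax.py ./hbase2hbase.json

# Scan the target table to verify the rows
echo "scan 'test_test_1'" | hbase shell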

3.7 Sync HBase to MySQL

{
	"job": {
		"content": [{
			"reader": {
				"name": "hbase11xreader",
				"parameter": {
					"mode": "normal",
					"table": "test_test_1",
					"column": [{
						"name": "f:id",
						"type": "string"
					}, {
						"name": "f:name",
						"type": "string"
					}],
					"encoding": "utf-8",
					"hbaseConfig": {
						"hbase.zookeeper.quorum": "192.168.20.91:2181",
						"zookeeper.znode.parent": "/hbase"
					}
				}
			},
			"writer": {
				"name": "mysqlwriter",
				"parameter": {
					"column": ["id", "name"],
					"password": "gee123456",
					"username": "geespace",
					"writeMode": "insert",
					"connection": [{
						"table": ["test_test"],
						"jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"
					}]
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0,
				"percentage": 0.02
			}
		}
	}
}
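
As before, save the config under an example name, run it, and verify on the MySQL side (the client prompts for the password):

# Run the job (filename is just an example)
python /opt/software/datax/bin/datax.py ./hbase2mysql.json

# Verify the rows landed in the target table
mysql -h 192.168.20.75 -P 9950 -u geespace -p geespace_bd_platform_dev -e "SELECT COUNT(*) FROM test_test;"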

4. Additional Resources

An introduction to DataX and an analysis of its strengths and weaknesses:
https://blog.csdn.net/qq_29359303/article/details/100656445

A detailed introduction to DataX and its usage:
https://blog.csdn.net/qq_39188747/article/details/102577017
