DataX Data Migration
Notes on an HDFS-to-MySQL migration:
- Job config: hdfsreaderTomysqlwriter.json
[root@slave1 mytemplate]
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": ["*"],
                        "hadoopConfig": {
                            "dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.namenodes.mycluster": "nn1,nn2",
                            "dfs.namenode.rpc-address.mycluster.nn1": "leader:8020",
                            "dfs.namenode.rpc-address.mycluster.nn2": "slave1:8020",
                            "dfs.nameservices": "mycluster"
                        },
                        "defaultFS": "hdfs://mycluster",
                        "encoding": "UTF-8",
                        "fieldDelimiter": "\t",
                        "fileType": "text",
                        "path": "/datax/datas/firstStudent.csv__527c2cb7_7889_4ef3_a22b_6401e4d15064"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["stu_id", "stu_age", "stu_name"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://slave1:3306/javaAndBigdata?useUnicode=true&characterEncoding=utf-8",
                                "table": ["student"]
                            }
                        ],
                        "password": "javaAndBigdata",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "2"
            }
        }
    }
}
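With the config saved, the job is launched through DataX's standard Python launcher, datax.py. A minimal sketch, assuming DataX is installed under /opt/datax (the /opt/datax/hook path in the log below points to that install directory):

python /opt/datax/bin/datax.py hdfsreaderTomysqlwriter.json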
- Run the script: hdfsreaderTomysqlwriter.json
[root@slave1 mytemplate]
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2022-03-24 23:40:04.272 [main] INFO VMInfo - VMInfo
2022-03-24 23:40:04.279 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.171-b11
jvmInfo: Linux amd64 3.10.0-1127.el7.x86_64
cpu num: 2
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2022-03-24 23:40:04.298 [main] INFO Engine -
{
    "content":[
        {
            "reader":{
                "name":"hdfsreader",
                "parameter":{
                    "column":[
                        "*"
                    ],
                    "defaultFS":"hdfs://mycluster",
                    "encoding":"UTF-8",
                    "fieldDelimiter":"\t",
                    "fileType":"text",
                    "hadoopConfig":{
                        "dfs.client.failover.proxy.provider.mycluster":"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                        "dfs.ha.namenodes.mycluster":"nn1,nn2",
                        "dfs.namenode.rpc-address.mycluster.nn1":"leader:8020",
                        "dfs.namenode.rpc-address.mycluster.nn2":"slave1:8020",
                        "dfs.nameservices":"mycluster"
                    },
                    "path":"/datax/datas/firstStudent.csv__527c2cb7_7889_4ef3_a22b_6401e4d15064"
                }
            },
            "writer":{
                "name":"mysqlwriter",
                "parameter":{
                    "column":[
                        "stu_id",
                        "stu_age",
                        "stu_name"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":"jdbc:mysql://slave1:3306/javaAndBigdata?useUnicode=true&characterEncoding=utf-8",
                            "table":[
                                "student"
                            ]
                        }
                    ],
                    "password":"********",
                    "username":"root",
                    "writeMode":"insert"
                }
            }
        }
    ],
    "setting":{
        "speed":{
            "channel":"2"
        }
    }
}
2022-03-24 23:40:04.315 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2022-03-24 23:40:04.317 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-03-24 23:40:04.317 [main] INFO JobContainer - DataX jobContainer starts job.
2022-03-24 23:40:04.319 [main] INFO JobContainer - Set jobId = 0
2022-03-24 23:40:04.347 [job-0] INFO HdfsReader$Job - init() begin...
2022-03-24 23:40:04.623 [job-0] INFO HdfsReader$Job - hadoopConfig details:{"finalParameters":[]}
2022-03-24 23:40:04.623 [job-0] INFO HdfsReader$Job - init() ok and end...
2022-03-24 23:40:04.932 [job-0] INFO OriginalConfPretreatmentUtil - table:[student] all columns:[
stu_id,stu_age,stu_name
].
2022-03-24 23:40:04.940 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (stu_id,stu_age,stu_name) VALUES(?,?,?)
], which jdbcUrl like:[jdbc:mysql://slave1:3306/javaAndBigdata?useUnicode=true&characterEncoding=utf-8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2022-03-24 23:40:04.941 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2022-03-24 23:40:04.941 [job-0] INFO JobContainer - DataX Reader.Job [hdfsreader] do prepare work .
2022-03-24 23:40:04.941 [job-0] INFO HdfsReader$Job - prepare(), start to getAllFiles...
2022-03-24 23:40:04.941 [job-0] INFO HdfsReader$Job - get HDFS all files in path = [/datax/datas/firstStudent.csv__527c2cb7_7889_4ef3_a22b_6401e4d15064]
Mar 24, 2022 11:40:05 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-03-24 23:40:05.760 [job-0] INFO HdfsReader$Job - [hdfs://mycluster/datax/datas/firstStudent.csv__527c2cb7_7889_4ef3_a22b_6401e4d15064] is a [text] file; adding it to the source files list
2022-03-24 23:40:05.761 [job-0] INFO HdfsReader$Job - number of files to read: [1], list: [hdfs://mycluster/datax/datas/firstStudent.csv__527c2cb7_7889_4ef3_a22b_6401e4d15064]
2022-03-24 23:40:05.762 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2022-03-24 23:40:05.762 [job-0] INFO JobContainer - jobContainer starts to do split ...
2022-03-24 23:40:05.763 [job-0] INFO JobContainer - Job set Channel-Number to 2 channels.
2022-03-24 23:40:05.763 [job-0] INFO HdfsReader$Job - split() begin...
2022-03-24 23:40:05.763 [job-0] INFO JobContainer - DataX Reader.Job [hdfsreader] splits to [1] tasks.
2022-03-24 23:40:05.764 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2022-03-24 23:40:05.774 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2022-03-24 23:40:05.778 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2022-03-24 23:40:05.780 [job-0] INFO JobContainer - Running by standalone Mode.
2022-03-24 23:40:05.792 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-03-24 23:40:05.798 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-03-24 23:40:05.798 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2022-03-24 23:40:05.816 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2022-03-24 23:40:05.858 [0-0-0-reader] INFO HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2022-03-24 23:40:05.859 [0-0-0-reader] INFO Reader$Task - read start
2022-03-24 23:40:05.860 [0-0-0-reader] INFO Reader$Task - reading file : [hdfs://mycluster/datax/datas/firstStudent.csv__527c2cb7_7889_4ef3_a22b_6401e4d15064]
2022-03-24 23:40:05.879 [0-0-0-reader] INFO UnstructuredStorageReaderUtil - CsvReader using default settings [{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":"\t","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}]; csvReaderConfig is [null]
2022-03-24 23:40:05.883 [0-0-0-reader] INFO Reader$Task - end read source files...
2022-03-24 23:40:05.918 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[109]ms
2022-03-24 23:40:05.918 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2022-03-24 23:40:15.794 [job-0] INFO StandAloneJobContainerCommunicator - Total 9 records, 38 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-03-24 23:40:15.795 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2022-03-24 23:40:15.795 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2022-03-24 23:40:15.795 [job-0] INFO JobContainer - DataX Reader.Job [hdfsreader] do post work.
2022-03-24 23:40:15.795 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2022-03-24 23:40:15.796 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/datax/hook
2022-03-24 23:40:15.798 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.032s | 0.032s | 0.032s
PS Scavenge | 1 | 1 | 1 | 0.028s | 0.028s | 0.028s
2022-03-24 23:40:15.798 [job-0] INFO JobContainer - PerfTrace not enable!
2022-03-24 23:40:15.798 [job-0] INFO StandAloneJobContainerCommunicator - Total 9 records, 38 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-03-24 23:40:15.801 [job-0] INFO JobContainer -
Job start time       : 2022-03-24 23:40:04
Job end time         : 2022-03-24 23:40:15
Total elapsed time   : 11s
Average throughput   : 3B/s
Record write speed   : 0rec/s
Total records read   : 9
Read/write failures  : 0
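For reference, the 9 records presumably sit in the HDFS file as tab-delimited lines like the sketch below, reconstructed from the query results that follow (fieldDelimiter is "\t"; note there is no row with stu_id 3):

1	1	李毅
2	2	刘二
4	4	赵四
5	5	王五
6	6	赵六
7	7	田七
8	8	王八
9	9	流九
10	10	满十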
Verify in MySQL. Before the job ran, the target table was empty:
mysql> select * from javaAndBigdata.student;
Empty set (0.00 sec)
After the first run (the jdbcUrl initially lacked the characterEncoding parameter), the rows arrived but the Chinese names came back as mojibake:
mysql> select * from javaAndBigdata.student;
+--------+---------+----------+
| stu_id | stu_age | stu_name |
+--------+---------+----------+
| 1 | 1 | ?? |
| 2 | 2 | ?? |
| 4 | 4 | ?? |
| 5 | 5 | ?? |
| 6 | 6 | ?? |
| 7 | 7 | ?? |
| 8 | 8 | ?? |
| 9 | 9 | ?? |
| 10 | 10 | ?? |
+--------+---------+----------+
9 rows in set (0.00 sec)
The table is then truncated so the job can be re-run with the encoding fixed:
mysql> truncate table student;
Query OK, 0 rows affected (0.02 sec)
mysql> select * from javaAndBigdata.student;
Empty set (0.00 sec)
After the re-run with characterEncoding=utf-8 in the jdbcUrl, the Chinese names are stored correctly:
mysql> select * from javaAndBigdata.student;
+--------+---------+----------+
| stu_id | stu_age | stu_name |
+--------+---------+----------+
| 1 | 1 | 李毅 |
| 2 | 2 | 刘二 |
| 4 | 4 | 赵四 |
| 5 | 5 | 王五 |
| 6 | 6 | 赵六 |
| 7 | 7 | 田七 |
| 8 | 8 | 王八 |
| 9 | 9 | 流九 |
| 10 | 10 | 满十 |
+--------+---------+----------+
9 rows in set (0.00 sec)
Pitfall 1: when reading from HDFS you do not need to declare column data types (column: ["*"] passes everything through). If you do configure a type, it must be paired with an index (or a value), otherwise the job fails as below; a corrected column sketch follows the stack trace.
com.alibaba.datax.common.exception.DataXException: Code:[HdfsReader-06], Description:[missing index]. - since type is configured, at least index or value must also be configured
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader$Job.validateColumns(HdfsReader.java:150)
at com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader$Job.validate(HdfsReader.java:111)
at com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader$Job.init(HdfsReader.java:50)
at com.alibaba.datax.core.job.JobContainer.initJobReader(JobContainer.java:673)
at com.alibaba.datax.core.job.JobContainer.init(JobContainer.java:303)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:113)
at com.alibaba.datax.core.Engine.start(Engine.java:92)
at com.alibaba.datax.core.Engine.entry(Engine.java:171)
at com.alibaba.datax.core.Engine.main(Engine.java:204)
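To satisfy the validator, each typed column entry carries an index (the zero-based position of the field in the source file) or a constant value. A hedged sketch of what the reader's column block could look like for this three-column file (the types are assumptions based on the target table's columns):

"column": [
    {"index": 0, "type": "long"},
    {"index": 1, "type": "long"},
    {"index": 2, "type": "string"}
]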
Pitfall 2: Chinese characters garbled in MySQL! Specifying utf-8 in the jdbcUrl solves it; see the snippet after the query output below.
mysql> select * from javaAndBigdata.student;
+--------+---------+----------+
| stu_id | stu_age | stu_name |
+--------+---------+----------+
| 1 | 1 | ?? |
| 2 | 2 | ?? |
| 4 | 4 | ?? |
| 5 | 5 | ?? |
| 6 | 6 | ?? |
| 7 | 7 | ?? |
| 8 | 8 | ?? |
| 9 | 9 | ?? |
| 10 | 10 | ?? |
+--------+---------+----------+
9 rows in set (0.00 sec)
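The fix is what the config at the top already shows: append useUnicode=true&characterEncoding=utf-8 to the writer's jdbcUrl, e.g.:

"jdbcUrl": "jdbc:mysql://slave1:3306/javaAndBigdata?useUnicode=true&characterEncoding=utf-8"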
- Screenshots: