DataX把mysql数据导入rpdsql(mysql)

首先,在MySQL中准备数据,以lineitem表为例。然后通过DataX将MySQL库中的lineitem表中的数据,迁移到RapidsDB的表lineitem中。

1、在mysql创建lineitem表

CREATE TABLE `lineitem` (
  `l_orderkey` bigint(11) NOT NULL,
  `l_partkey` int(11) NOT NULL,
  `l_suppkey` int(11) NOT NULL,
  `l_linenumber` int(11) NOT NULL,
  `l_quantity` decimal(15,2) NOT NULL,
  `l_extendedprice` decimal(15,2) NOT NULL,
  `l_discount` decimal(15,2) NOT NULL,
  `l_tax` decimal(15,2) NOT NULL,
  `l_returnflag` char(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `l_linestatus` char(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `l_shipdate` date NOT NULL,
  `l_commitdate` date NOT NULL,
  `l_receiptdate` date NOT NULL,
  `l_shipinstruct` char(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `l_shipmode` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `l_comment` varchar(44) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL);

2、导入数据

mysql> LOAD DATA LOCAL INFILE '/opt/lineitem.csv' into table tpch.lineitem FIELDS TERMINATED BY '|' lines TERMINATED by '|\n' ;
Query OK, 119994608 rows affected (27 min 17.10 sec)
Records: 119994608  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select count(1) from lineitem;
+-----------+
| count(1)  |
+-----------+
| 119994608 |
+-----------+
1 row in set (2 min 14.93 sec)

3、编辑datax配置文件

[root@mysql5 bin]# cat stream2stream_writer.json
{
    "job": {
        "setting": {
            "speed": {
                 "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "*",
                        ],
                        "connection": [
                            {
                                "table": [
                                    "lineitem"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://192.168.1.67:3306/tpch"
                                ]
                            }
                        ]
                    }
                },
        "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "replace",
                        "username": "test1",
                        "password":"123456",
                        "column": ["*"],
                        "session": [],
                        "preSql": [],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.152:13306/tpch",
                                "table": [
                                    "lineitem"
                                ]
                            }
                        ]
                    }
               }
           }
        ]
    }
}

4、在rpdsql中创建lineitem表(参考1步骤)

5、执行DataX脚本文件

[root@mysql5 bin]# ./datax.py stream2stream_writer.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2022-04-28 12:45:36.649 [main] WARN  ConfigParser - 插件[mysqlreader,mysqlwriter]加载失败,1s后重试... Exception:Code:[Common-00], Describe:[您提供的配置文件存在错                    误信息,请检查您的作业配置 .] - 配置信息错误,您提供的配置文件[/opt/datax/plugin/reader/._drdsreader/plugin.json]不存在. 请检查您的配置文件.
2022-04-28 12:45:37.657 [main] ERROR Engine -

经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Common-00], Describe:[您提供的配置文件存在错误信息,请检查您的作业配置 .] - 配置信息错误,您提供的配置文件[                    /opt/datax/plugin/reader/._drdsreader/plugin.json]不存在. 请检查您的配置文件.
        at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
        at com.alibaba.datax.common.util.Configuration.from(Configuration.java:95)
        at com.alibaba.datax.core.util.ConfigParser.parseOnePluginConfig(ConfigParser.java:153)
        at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:125)
        at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
        at com.alibaba.datax.core.Engine.entry(Engine.java:137)
        at com.alibaba.datax.core.Engine.main(Engine.java:204)

经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Common-00], Describe:[您提供的配置文件存在错误信息,请检查您的作业配置 .] - 配置信息错误,您提供的配置文件[                    /opt/datax/plugin/reader/._drdsreader/plugin.json]不存在. 请检查您的配置文件.

查看/opt/datax/plugin/reader/目录下有._*文件,删除即可。

删除步骤如下:

[root@mysql5 reader]# find /opt/datax/plugin/reader/ -type f -name "._*er" | xargs rm -rf
[root@mysql5 reader]# find /opt/datax/plugin/writer/ -type f -name "._*er" | xargs rm -rf

6、继续执行脚本

[root@mysql5 bin]# ./datax.py ./stream2stream_writer.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2022-04-28 12:50:13.418 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemI                                                                          mpl
2022-04-28 12:50:13.430 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.171-b11
        jvmInfo:        Linux amd64 3.10.0-1160.15.2.el7.x86_64
        cpu num:        8

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2022-04-28 12:50:13.454 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://192.168.30.67:3306/tpch"
                                                        ],
                                                        "table":[
                                                                "lineitem1"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"root"
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://192.168.30.152:13306/tpch",
                                                        "table":[
                                                                "lineitem"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "preSql":[],
                                        "session":[],
                                        "username":"test1",
                                        "writeMode":"replace"
                                }
                        }
                }
        ],
        "setting":{
                "errorLimit":{
                        "percentage":0.02,
                        "record":0
                },
                "speed":{
                        "channel":3
                }
        }
}

2022-04-28 12:50:13.475 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: n                                                                          ull
2022-04-28 12:50:13.478 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-04-28 12:50:13.478 [main] INFO  JobContainer - DataX jobContainer starts job.
2022-04-28 12:50:13.480 [main] INFO  JobContainer - Set jobId = 0
2022-04-28 12:50:14.270 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.30                                                                          .67:3306/tpch?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatem                                                                          ents=true.
2022-04-28 12:50:14.271 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因                                                                          为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并                                                                          作出修改.
2022-04-28 12:50:14.860 [job-0] INFO  OriginalConfPretreatmentUtil - table:[lineitem] all columns:[
l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linesta                                                                          tus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
].
2022-04-28 12:50:14.860 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为                                                                          您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置                                                                          并作出修改.
2022-04-28 12:50:14.862 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
replace INTO %s (l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_ret                                                                          urnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment) VALUES(?,?,?,?                                                                          ,?,?,?,?,?,?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.30.152:13306/tpch?yearIsDateType=false&zeroDateTimeBehavior=conver                                                                          tToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2022-04-28 12:50:14.863 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2022-04-28 12:50:14.863 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2022-04-28 12:50:14.864 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2022-04-28 12:50:14.864 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2022-04-28 12:50:14.865 [job-0] INFO  JobContainer - Job set Channel-Number to 3 channels.
2022-04-28 12:50:14.870 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2022-04-28 12:50:14.871 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2022-04-28 12:50:14.895 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2022-04-28 12:50:14.901 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-04-28 12:50:14.903 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-04-28 12:50:14.914 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] ta                                                                          sks.
2022-04-28 12:50:14.919 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-04-28 12:50:14.920 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-04-28 12:50:14.929 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is sta                                                                          rted
2022-04-28 12:50:14.934 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * f                                                                          rom lineitem1
] jdbcUrl:[jdbc:mysql://192.168.30.67:3306/tpch?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyIn                                                                          t1isBit=false&rewriteBatchedStatements=true].
2022-04-28 12:50:14.960 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * f                                                                          rom lineitem1
] jdbcUrl:[jdbc:mysql://192.168.30.67:3306/tpch?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyIn                                                                          t1isBit=false&rewriteBatchedStatements=true].
2022-04-28 12:50:15.231 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[303                                                                          ]ms
2022-04-28 12:50:15.232 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2022-04-28 12:50:24.928 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 424 bytes | Speed 42B/s, 0 records/s | Error 0 records, 0 bytes |  All T                    ask WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-04-28 12:50:24.929 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2022-04-28 12:50:24.930 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2022-04-28 12:50:24.930 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2022-04-28 12:50:24.930 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2022-04-28 12:50:24.931 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/datax/hook
2022-04-28 12:50:24.932 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2022-04-28 12:50:24.933 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-04-28 12:50:24.933 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 424 bytes | Speed 42B/s, 0 records/s | Error 0 records, 0 bytes |  All T                    ask WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-04-28 12:50:24.934 [job-0] INFO  JobContainer -
任务启动时刻                    : 2022-04-28 12:50:13
任务结束时刻                    : 2022-04-28 12:50:24
任务总计耗时                    :                 11s
任务平均流量                    :               42B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   4
读写失败总数                    :                   0

7、查询rpdsql下的lineitem表查看已经有数据同步过来。

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值