DataX Series 2 - Installing DataX

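1. System Requirements

  1. Linux
  2. JDK (1.8 or later; 1.8 recommended)
  3. Python (Python 2.6.x recommended)
  4. Apache Maven 3.x (only needed when compiling DataX)

  This post installs DataX from the binary package, so Maven is not needed. The software versions on the test machine are as follows: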

[root@10-31-1-119 ~]# java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
[root@10-31-1-119 ~]# 
[root@10-31-1-119 ~]# python -V
Python 2.7.5
[root@10-31-1-119 ~]# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)
[root@10-31-1-119 ~]# 
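
The JDK requirement above can also be checked programmatically. The following is a minimal sketch, not part of DataX: the function names `parse_java_version` and `meets_jdk_requirement` are hypothetical, and the sketch assumes the text printed by `java -version` (which the JVM writes to stderr) has already been captured into a string.

```python
import re


def parse_java_version(version_output):
    """Return (major, minor) parsed from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def meets_jdk_requirement(version_output, minimum=(1, 8)):
    # Legacy releases report "1.8.x"; JDK 9 and later report "9", "11.0.2", ...
    # Tuple comparison covers both schemes: (11, 0) >= (1, 8).
    return parse_java_version(version_output) >= minimum


sample = 'openjdk version "1.8.0_242"'
print(parse_java_version(sample), meets_jdk_requirement(sample))
# prints: (1, 8) True
```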

2. Download and Installation

  DataX is installed here by downloading the prebuilt binary package.

2.1 Download

wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz


2.2 Installation

  DataX is a portable package: extract the downloaded archive and it is ready to use.


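  1. bin directory
    Contains the datax.py launch script.

  2. conf directory
    The configuration directory; parameters are generally kept in ***.json files.

  3. job directory
    Stores the jobs to run.

  4. lib directory
    Stores the dependency packages.

  5. plugin directory
    Stores the reader and writer jar packages for the heterogeneous data sources.

  6. script directory
    Contains the readme.md file.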
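
After extraction, the six directories listed above can be verified with a short check. This is an illustrative sketch only: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the list of entries is assumed to come from something like `os.listdir` on the extracted DataX directory.

```python
EXPECTED_DIRS = ("bin", "conf", "job", "lib", "plugin", "script")


def missing_dirs(entries):
    """Return the expected DataX subdirectories absent from `entries`."""
    present = set(entries)
    return [name for name in EXPECTED_DIRS if name not in present]


# In practice `entries` would be os.listdir(<extracted DataX directory>).
print(missing_dirs(["bin", "conf", "lib", "plugin"]))
# prints: ['job', 'script']
```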

3. Starting DataX

3.1 Create the job configuration file

cd $datax_home
cd bin
vi stream2stream.json

Copy the following content into the JSON file:

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
      }
    }
  }
}
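
Instead of typing the file by hand, the same configuration could be generated from a script, which avoids bracket and comma mistakes. The sketch below assumes Python 3 (separate from the Python 2 interpreter used to launch datax.py in this post); `build_stream_job` and `write_job` are hypothetical helper names, and the keys and values are copied from the JSON above.

```python
import json


def build_stream_job(slice_record_count=10, channel=5):
    """Build the stream-to-stream job from this post as a Python dict."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": slice_record_count,
                        "column": [
                            {"type": "long", "value": "10"},
                            {"type": "string", "value": "hello,你好,世界-DataX"},
                        ],
                    },
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {"encoding": "UTF-8", "print": True},
                },
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(path, job):
    # ensure_ascii=False keeps the Chinese characters readable in the file.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job, handle, ensure_ascii=False, indent=2)


# Round-trip through JSON text to confirm the structure serializes cleanly.
text = json.dumps(build_stream_job(), ensure_ascii=False, indent=2)
assert json.loads(text) == build_stream_job()
```

Calling `write_job("stream2stream.json", build_stream_job())` would then produce a file equivalent to the one above.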

3.2 启动datax

cd $datax_home
cd bin
python datax.py ./stream2stream.json 
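
The same launch can be scripted, for example from a scheduler. This is a rough sketch under stated assumptions: `build_command` and `run_job` are hypothetical helpers, the installation path `/home/software/datax` is taken from the log output in this post, and the interpreter name `python` is assumed to resolve to the interpreter DataX should run under.

```python
import os
import subprocess


def build_command(datax_home, job_file, python="python"):
    """Assemble the argument list equivalent to `python datax.py <job>`."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python, launcher, job_file]


def run_job(datax_home, job_file):
    """Run the job and return the exit code of the launcher process."""
    return subprocess.call(build_command(datax_home, job_file))


# Only builds and prints the command; call run_job(...) to execute it.
print(build_command("/home/software/datax", "./stream2stream.json"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the job file path contains spaces.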

Test run:

[root@10-31-1-119 bin]# python datax.py ./stream2stream.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2021-11-22 17:27:56.774 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2021-11-22 17:27:56.783 [main] INFO  Engine - the machine info  => 

        osInfo: Oracle Corporation 1.8 25.242-b08
        jvmInfo:        Linux amd64 3.10.0-1127.el7.x86_64
        cpu num:        8

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size                      
        PS Eden Space                  | 256.00MB                       | 256.00MB                       
        Code Cache                     | 240.00MB                       | 2.44MB                         
        Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
        PS Survivor Space              | 42.50MB                        | 42.50MB                        
        PS Old Gen                     | 683.00MB                       | 683.00MB                       
        Metaspace                      | -0.00MB                        | 0.00MB                         


2021-11-22 17:27:56.797 [main] INFO  Engine - 
{
        "content":[
                {
                        "reader":{
                                "name":"streamreader",
                                "parameter":{
                                        "column":[
                                                {
                                                        "type":"long",
                                                        "value":"10"
                                                },
                                                {
                                                        "type":"string",
                                                        "value":"hello,你好,世界-DataX"
                                                }
                                        ],
                                        "sliceRecordCount":10
                                }
                        },
                        "writer":{
                                "name":"streamwriter",
                                "parameter":{
                                        "encoding":"UTF-8",
                                        "print":true
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":5
                }
        }
}

2021-11-22 17:27:56.812 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2021-11-22 17:27:56.814 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-11-22 17:27:56.814 [main] INFO  JobContainer - DataX jobContainer starts job.
2021-11-22 17:27:56.815 [main] INFO  JobContainer - Set jobId = 0
2021-11-22 17:27:56.827 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2021-11-22 17:27:56.827 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2021-11-22 17:27:56.827 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2021-11-22 17:27:56.827 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2021-11-22 17:27:56.828 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2021-11-22 17:27:56.828 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [5] tasks.
2021-11-22 17:27:56.829 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [5] tasks.
2021-11-22 17:27:56.850 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2021-11-22 17:27:56.859 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2021-11-22 17:27:56.860 [job-0] INFO  JobContainer - Running by standalone Mode.
2021-11-22 17:27:56.873 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [5] channels for [5] tasks.
2021-11-22 17:27:56.878 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2021-11-22 17:27:56.878 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2021-11-22 17:27:56.889 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
2021-11-22 17:27:56.891 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
2021-11-22 17:27:56.894 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
2021-11-22 17:27:56.896 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
2021-11-22 17:27:56.898 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
2021-11-22 17:27:56.999 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[101]ms
2021-11-22 17:27:57.000 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[113]ms
2021-11-22 17:27:57.000 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[107]ms
2021-11-22 17:27:57.000 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[105]ms
2021-11-22 17:27:57.000 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[109]ms
2021-11-22 17:27:57.001 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2021-11-22 17:28:06.882 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 950 bytes | Speed 95B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.001s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-11-22 17:28:06.883 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2021-11-22 17:28:06.885 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2021-11-22 17:28:06.886 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2021-11-22 17:28:06.886 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2021-11-22 17:28:06.888 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /home/software/datax/hook
2021-11-22 17:28:06.891 [job-0] INFO  JobContainer - 
         [total cpu info] => 
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
                -1.00%                         | -1.00%                         | -1.00%
                        

         [total gc info] => 
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2021-11-22 17:28:06.892 [job-0] INFO  JobContainer - PerfTrace not enable!
2021-11-22 17:28:06.893 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 950 bytes | Speed 95B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.001s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-11-22 17:28:06.893 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2021-11-22 17:27:56
任务结束时刻                    : 2021-11-22 17:28:06
任务总计耗时                    :                 10s
任务平均流量                    :               95B/s
记录写入速度                    :              5rec/s
读出记录总数                    :                  50
读写失败总数                    :                   0

[root@10-31-1-119 bin]# 
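
The closing summary in the log reports, in order: job start time, job end time, total elapsed time (10s), average throughput (95B/s), record write speed (5rec/s), total records read (50), and total read/write failures (0). These numbers can be reproduced from the configuration. The sketch below is only a sanity check of the arithmetic; it assumes that each of the 5 split tasks emits `sliceRecordCount` records, which matches this run, and the function names are hypothetical.

```python
def expected_records(slice_record_count, tasks):
    # Assumption: every split task emits sliceRecordCount records.
    return slice_record_count * tasks


def average_rates(total_records, total_bytes, elapsed_seconds):
    """Return (bytes per second, records per second) as whole numbers."""
    return total_bytes // elapsed_seconds, total_records // elapsed_seconds


records = expected_records(10, 5)  # sliceRecordCount=10, 5 tasks
print(records, average_rates(records, 950, 10))
# prints: 50 (95, 5)
```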

References:

  1. https://github.com/alibaba/DataX/blob/master/userGuid.md