数据集成工具——DataX&DataX-Web

DataX的安装及使用

1、Hive通过外部表与HBase表关联

1)、hive建表语句:

// 第一个字段通常命名为key
CREATE EXTERNAL TABLE hivehbasetable(
    key INT
    ,name STRING
    ,age INT
    ,gender STRING
    ,clazz  STRING
    ,last_mod STRING
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES (
    "hbase.columns.mapping" = ":key,cf1:name,cf1:age,cf1:gender,cf1:clazz,cf1:last_mod") 
    TBLPROPERTIES("hbase.table.name" = "student");

2)、hbase表

create 'student','cf1'
image-20210126091858197.png

3)、直接执行查询语句:

select key,name from hivehbasetable limit 10;
image-20210126091923753.png

hbase外部表 不能使用sqoop直接导入数据,必须通过例如:insert into这样的形式导入

因为sqoop导入数据,使用的原理是load data,load data只能在表的存储格式为textfile时,才能真正将数据加载到表中

2、DataX的安装

DataX不需要依赖其他服务,直接上传、解压、安装、配置环境变量即可

也可以直接在windows上解压

3、DataX的使用

1)、stream2stream

①、编写配置文件stream2stream.json
# stream2stream.json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
       }
    }
  }
}
②、执行同步任务
datax.py stream2stream.json
③、执行结果

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-YukbL4UP-1611729176897)(https://i.loli.net/2021/01/26/3exW8mAlJ5jX4z7.png)]

2)、mysql2mysql

需要新建student2数据库,并创建student表

①、编写配置文件mysql2mysql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "gender",
                            "clazz",
                            "last_mod"
                        ],
                        "splitPk": "age",
                        "connection": [
                            {
                                "table": [
                                    "student"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://master:3306/student"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "gender",
                            "clazz",
                            "last_mod"
                        ],
                        "preSql": [
                            "truncate student2"
                        ],                        
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://master:3306/student2?useUnicode=true&characterEncoding=utf8",
                                "table": [
                                    "student2"
                                ]
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 6
            }
        }
    }
}
②、执行同步任务
datax.py mysql2mysql.json

3)、mysql2hdfs

写hive跟hdfs时一样的

①、编写配置文件mysql2hdfs.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "gender",
                            "clazz",
                            "last_mod"
                        ],
                        "splitPk": "age",
                        "connection": [
                            {
                                "table": [
                                    "student"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://master:3306/student"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://master:9000",
                        "fileType": "text",
                        "path": "/user/hive/warehouse/datax.db/students",
                        "fileName": "student",
                        "column": [
                            {
                                "name": "id",
                                "type": "bigint"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "age",
                                "type": "INT"
                            },
                            {
                                "name": "gender",
                                "type": "string"
                            },
                            {
                                "name": "clazz",
                                "type": "string"
                            },
                            {
                                "name": "last_mod",
                                "type": "string"
                            }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": ","
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 6
            }
        }
    }
}

4)、hbase2mysql

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hbase11xreader",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.zookeeper.quorum": "master:2181"
                        },
                        "table": "student",
                        "encoding": "utf-8",
                        "mode": "normal",
                        "column": [
                            {
                                "name": "rowkey",
                                "type": "string"
                            },
                            {
                                "name": "cf1:name",
                                "type": "string"
                            },
                            {
                                "name": "cf1:age",
                                "type": "string"
                            },
                            {
                                "name": "cf1:gender",
                                "type": "string"
                            },
                            {
                                "name": "cf1:clazz",
                                "type": "string"
                            }
                        ],
                        "range": {
                            "startRowkey": "",
                            "endRowkey": "",
                            "isBinaryRowkey": false
                        }
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "id",
                            "name",
                            "age",
                            "gender",
                            "clazz"
                        ],
                        "preSql": [
                            "truncate student2"
                        ],                        
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://master:3306/student2?useUnicode=true&characterEncoding=utf8",
                                "table": [
                                    "student2"
                                ]
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 6
            }
        }
    }
}

5)、mysql2hbase

mysql中的score表需将cource_id改为course_id,并将student_id、course_id设为主键,并将所有字段的类型改为int

hbase需先创建score表:create ‘score’,‘cf1’

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": [
                            "student_id",
                            "course_id",
                            "score"
                        ],
                        "splitPk": "course_id",
                        "connection": [
                            {
                                "table": [
                                    "score"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://master:3306/student"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hbase11xwriter",
                    "parameter": {
                      "hbaseConfig": {
                        "hbase.zookeeper.quorum": "master:2181"
                      },
                      "table": "score",
                      "mode": "normal",
                      "rowkeyColumn": [
                          {
                            "index":0,
                            "type":"string"
                          },
                          {
                            "index":-1,
                            "type":"string",
                            "value":"_"
                          },
                          {
                            "index":1,
                            "type":"string"
                          }
                      ],
                      "column": [
                        {
                          "index":2,
                          "name": "cf1:score",
                          "type": "int"
                        }
                      ],
                      "encoding": "utf-8"
                    }
                  }
            }
        ],
        "setting": {
            "speed": {
                "channel": 6
            }
        }
    }
}

4、DataX-web的安装(Java 8、支持Python3(默认Python2.7)、Mysql5.7)

★★★注意★★★:一定要先配好java、python环境变量

1)、使用idea同步源代码

2)、在mysql5.1中新建dataxweb数据库

image-20210126151233347.png

3)、执行数据库初始化脚本

image-20210126151351144.png

4)、修改admin配置文件

  • bootstrap.properties

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-yKdmtT8M-1611729176899)(https://i.loli.net/2021/01/26/QtwGqXzximsH84C.png)]

#Database
DB_HOST=master
DB_PORT=3306
DB_USERNAME=root
DB_PASSWORD=123456
DB_DATABASE=dataxweb
  • application.yml
server:
  #port: 8080
  port: 8080
spring:
  #数据源
  datasource:
    #    username: root
    #password: root
    #url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
    password: ${DB_PASSWORD:password}
    username: ${DB_USERNAME:username}
    url: jdbc:mysql://${DB_HOST:127.0.0.1}:${DB_PORT:3306}/${DB_DATABASE:dataxweb}?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
    driver-class-name: com.mysql.jdbc.Driver


    hikari:
      ## 最小空闲连接数量
      minimum-idle: 5
      ## 空闲连接存活最大时间,默认600000(10分钟)
      idle-timeout: 180000
      ## 连接池最大连接数,默认是10
      maximum-pool-size: 10
      ## 数据库连接超时时间,默认30秒,即30000
      connection-timeout: 30000
      connection-test-query: SELECT 1
      ##此属性控制池中连接的最长生命周期,值0表示无限生命周期,默认1800000即30分钟
      max-lifetime: 1800000

  # datax-web email
  mail:
    host: smtp.qq.com
    port: 25
    #username: xxx@qq.com
    #password: xxx
    username: root
    password: root
    properties:
      mail:
        smtp:
          auth: true
          starttls:
            enable: true
            required: true
        socketFactory:
          class: javax.net.ssl.SSLSocketFactory


management:
  health:
    mail:
      enabled: false
  server:
    servlet:
      context-path: /actuator

mybatis-plus:
  # mapper.xml文件扫描
  mapper-locations: classpath*:/mybatis-mapper/*Mapper.xml
  # 实体扫描,多个package用逗号或者分号分隔
  #typeAliasesPackage: com.yibo.essyncclient.*.entity
  global-config:
    # 数据库相关配置
    db-config:
      # 主键类型  AUTO:"数据库ID自增", INPUT:"用户输入ID", ID_WORKER:"全局唯一ID (数字类型唯一ID)", UUID:"全局唯一ID UUID";
      id-type: AUTO
      # 字段策略 IGNORED:"忽略判断",NOT_NULL:"非 NULL 判断"),NOT_EMPTY:"非空判断"
      field-strategy: NOT_NULL
      # 驼峰下划线转换
      column-underline: true
      # 逻辑删除
      logic-delete-value: 0
      logic-not-delete-value: 1
      # 数据库类型
      db-type: mysql
    banner: false
  # mybatis原生配置
  configuration:
    map-underscore-to-camel-case: true
    cache-enabled: false
    call-setters-on-nulls: true
    jdbc-type-for-null: 'null'
    type-handlers-package: com.wugui.datax.admin.core.handler

# 配置mybatis-plus打印sql日志
logging:
  #level:
  #  com.wugui.datax.admin.mapper: info
#path: ./data/applogs/admin
  level:
    com.wugui.datax.admin.mapper: error
    path: ./applogs/admin



#datax-job, access token
datax:
  job:
    accessToken:
    #i18n (default empty as chinese, "en" as english)
    i18n:
    ## triggerpool max size
    triggerpool:
      fast:
        max: 200
      slow:
        max: 100
      ### log retention days
    logretentiondays: 30

datasource:
  aes:
    key: AD42F6697B035B75

5)、启动DataX-Web-Admin

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-vVgHDtu6-1611729176900)(https://i.loli.net/2021/01/26/NcUTt8AekyHWdoi.png)]

  • 访问: http://127.0.0.1:8080/index.html,用户名:admin,密码:123456

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-Vi4qvOOr-1611729176901)(https://i.loli.net/2021/01/26/jhnmH2AQ43SVsoB.png)]

6)、修改datax-web-executor配置文件

  • application.yml

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-DvATDwdW-1611729176902)(https://i.loli.net/2021/01/26/PGTN8zMR6FiSo7O.png)]

# web port
server:
  port: 8081
  #port: 8081

# log config
logging:
  config: classpath:logback.xml
  path: ./applogs/executor/jobhandler
  #path: ./data/applogs/executor/jobhandler

datax:
  job:
    admin:
      ### datax admin address list, such as "http://address" or "http://address01,http://address02"
      #addresses: http://127.0.0.1:8080
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      #port: 9999
      port: 9999
      ### job log path
      #logpath: ./data/applogs/executor/jobhandler
      logpath: ./applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
    ### job, access token
    accessToken:

  executor:
    #jsonpath: D:\\temp\\executor\\json\\
    jsonpath: ./json

  #pypath: F:\tools\datax\bin\datax.py
  pypath: C:\Users\zzk10\Documents\MacOS\DataIntegrate\datax\datax\bin\datax.py

注意pypath:这个路径需要解压datax.tar.gz到自己的路径,不要使用中文路径

解压后使用datax-web中的python3脚本替换掉 datax/bin/

7)、启动datax-web-executor

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-gJevYQSe-1611729176902)(C:/Users/xiaoyoupei/AppData/Roaming/Typora/typora-user-images/image-20210127142754626.png)]

只要进程没有自己停止,一直在后台运行即可

5、DataX-Web的使用

1)、新建数据源

  • mysql

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-C6CZzG3h-1611729176903)(https://i.loli.net/2021/01/26/nI7Kb82M35TyqjV.png)]

  • hive

    需要先启动hiveserver2服务

    下面这条命令需要在linux shell中执行

    hive --service hiveserver2

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-BIUOPnPq-1611729176904)(https://i.loli.net/2021/01/26/htrUaYOlkx9Qd2u.png)]

2)新建项目test1

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-r1apg91D-1611729176904)(https://i.loli.net/2021/01/26/nlYF5ka7Zudh3vL.png)]

3)、新建任务模板

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-XkrR5aCL-1611729176904)(https://i.loli.net/2021/01/26/3avSFbRHXUY74Tl.png)]

4)、构建任务

  • 创建reader

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-DDaB8vew-1611729176905)(https://i.loli.net/2021/01/26/9R5yMjA7pmvbkPh.png)]

  • 创建writer

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-e3K6EQZ4-1611729176905)(https://i.loli.net/2021/01/26/VA9HL2mGQTaU5kw.png)]

  • 字段映射

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-RijJ9g1D-1611729176906)(https://i.loli.net/2021/01/26/luENhq4scbCeZRY.png)]

  • 构建任务

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-hL09H3wO-1611729176906)(https://i.loli.net/2021/01/26/Lwtf7eJGgYbDK3B.png)]

  • 执行任务

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-bK3NzCOY-1611729176906)(https://i.loli.net/2021/01/26/OJ8kSN5ZDLm9VaE.png)]

  • 查看日志

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-rsov3hID-1611729176907)(https://i.loli.net/2021/01/26/2WbKpafwg5iLPmx.png)]

5)、添加hbase 数据源会报错,需要修改两个pom.xml文件(hbase1.4.6)

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-TSj0LVkT-1611729176907)(https://i.loli.net/2021/01/26/BYowyH76MRbzJDl.png)]

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-xX3iJFTA-1611729176908)(https://i.loli.net/2021/01/26/V7hOUAXDckv6aIz.png)]

改完pom文件,记得重新reimport,才会生效

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-39GE7YXN-1611729176908)(https://i.loli.net/2021/01/26/HFqoZVM1UL4ewOu.png)]

  • 重新测试hbase连通性

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-9IapSTrZ-1611729176909)(https://i.loli.net/2021/01/26/NfkYvGTtdwXx35y.png)]

  • 1
    点赞
  • 18
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 9
    评论
datax-web-2.1.2是一个版本为2.1.2的DataX Web工具。要安装datax-web-2.1.2,您可以按照以下步骤进行操作: 1. 下载datax-web-2.1.2.tar.gz压缩包,并将其放置在服务器上。 2. 解压缩datax-web-2.1.2.tar.gz文件,可以使用命令:tar -zxvf datax-web-2.1.2.tar.gz。 3. 进入解压后的目录,并执行安装脚本。如果您想使用交互模式安装,可以运行./bin/install.sh命令,并按照提示填写MySQL信息。如果您不想使用交互模式,可以运行./bin/install.sh --force命令跳过确认过程。 4. 修改控制器datax-admin的配置文件,即./datax-web-2.1.2/modules/datax-admin/conf/application.yml文件。 5. 修改执行器datax-executor的配置文件,即./datax-web-2.1.2/modules/datax-executor/conf/application.yml文件。 6. 如果您的数据库还没有创建和授权,您可以按照以下步骤进行操作: - 使用MySQL 8.0及以上版本创建数据库:create database `datax-web` character set utf8mb4; - 创建用户并授权:CREATE USER 'datax'@'%' IDENTIFIED BY '111111'; grant all privileges on *.* to datax@'%' with grant option; - 修改用户密码:ALTER USER 'datax'@'%' IDENTIFIED WITH mysql_native_password BY '111111'; 7. 启动datax-web,进入./datax-web-2.1.2/bin目录,并执行./start-all.sh命令,即可同时启动控制器和执行器。如果您想单独启动,可以使用命令./start.sh -m datax-admin或./start.sh -m datax-executor。 请注意,以上步骤仅适用于datax-web-2.1.2版本。如果您使用的是其他版本,请参考相应的安装文档。 #### 引用[.reference_title] - *1* *2* *3* [dataxdatax-web安装部署](https://blog.csdn.net/m0_71142057/article/details/124898812)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^koosearch_v1,239^v3^insert_chatgpt"}} ] [.reference_item] [ .reference_list ]

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 9
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

友培

数据皆开源!

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值