[DataX] Importing Hive Table Data into ES

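Contents

I. Environment

II. Create the Hive test table

III. ES writer plugin package

IV. Configure the JSON job file

V. Synchronize the data

1. Run the command

2. Check the result in ES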


I. Environment

        DataX: installed on Windows

        ES version: 7.9.0

II. Create the Hive test table

CREATE TABLE teacher(
 name string,
 age int
)row format delimited fields terminated by ','
stored as orc;
insert into teacher(name,age) values("zhangsn",22);
insert into teacher(name,age) values("lisi",30);
insert into teacher(name,age) values("wangwu",66);
insert into teacher(name,age) values("lihua",15);

III. ES writer plugin package

Download link

 Place the ES writer plugin package in DataX's plugin\writer folder.

IV. Configure the JSON job file

 hive_es.json:

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "path": "/user/hive/warehouse/teacher/*",
                        "defaultFS": "hdfs://192.168.xx.xx:8020",
                        "column": [
                               {"index": 0,   "type": "string" },
                               {"index": 1,   "type": "long" }
                        ],
                        "fileType": "orc",
                        "encoding": "UTF-8",
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                  "name": "elasticsearchwriter",
                  "parameter": {
                    "endpoint": "http://192.168.xx.xx:9200/",
                    "accessId": "123",
                    "accessKey": "123",
                    "cleanup": true,
                    "index":"teacher",
                    "type":"_doc",
                    "settings": {
                        "settings":{
                            "index":{
                                "mapping":{"total_fields":{"limit":2000}},
                                "number_of_replicas":2,
                                "number_of_shards":10
                            }
                        }
                    },
                    "batchSize": 1000,
                    "splitter": ",",
                    "column": [
                      {"name":"name","type":"string"},
                      {"name":"age","type":"long"}
                    ]
                  }
                }
            }
        ]
    }
}
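The reader's column list is positional (index 0, index 1), while the writer's list names the ES fields, so the two lists need to line up one-to-one. Below is a minimal sketch of a pre-flight check, assuming the job file has the same shape as hive_es.json. The helper name check_columns is hypothetical and is not part of DataX.

```python
import json


def check_columns(job):
    """Pair each reader column with the writer column at the same position.

    Hypothetical helper, not part of DataX. Raises ValueError if the
    reader and writer declare a different number of columns.
    """
    content = job["job"]["content"][0]
    reader_cols = content["reader"]["parameter"]["column"]
    writer_cols = content["writer"]["parameter"]["column"]
    if len(reader_cols) != len(writer_cols):
        raise ValueError(
            "reader has %d columns, writer has %d"
            % (len(reader_cols), len(writer_cols))
        )
    # Each tuple: (reader index, reader type, writer field name, writer type)
    return [
        (r["index"], r["type"], w["name"], w["type"])
        for r, w in zip(reader_cols, writer_cols)
    ]


# Trimmed-down copy of the job file, keeping only the column lists.
sample_job = json.loads("""
{"job": {"content": [{
  "reader": {"parameter": {"column": [
    {"index": 0, "type": "string"},
    {"index": 1, "type": "long"}]}},
  "writer": {"parameter": {"column": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"}]}}
}]}}
""")

pairs = check_columns(sample_job)
for pair in pairs:
    print(pair)
```

To check a real job, the same function could be given the result of json.load on the job file.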

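Notes:

1. Put the finished hive_es.json file in DataX's job folder.

2. Hive data is read with hdfsreader. Running show create table teacher in Hive shows where the teacher table is stored in HDFS; use that location to set hdfsreader's defaultFS and path.

3. A Hive int column must be declared as long in the hdfsreader column configuration, otherwise the synchronization fails. The ES writer column is declared as long as well.

4. The following command prints the standard reader and writer templates (this only works if the corresponding plugin folders under reader and writer contain a plugin_job_template.json template file):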

D:\workSoftWare\datax\datax\bin>datax.py -r mysqlreader -w elasticsearchwriter
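Notes 2 and 3 can be sketched in code: split the table location reported by Hive into defaultFS and path, and translate Hive column types into hdfsreader column types. This is only an illustration. The function name reader_settings, the host name namenode, and the type table are assumptions; the table covers only the two types used in this post.

```python
from urllib.parse import urlsplit

# Hive type -> hdfsreader column type. Only the two types used in this
# post are listed; extend the table for other types after checking the
# hdfsreader documentation.
HIVE_TO_HDFSREADER = {
    "string": "string",
    "int": "long",
}


def reader_settings(location, hive_columns):
    """Build the defaultFS, path and column entries for hdfsreader.

    location is the table's HDFS location as reported by Hive;
    hive_columns is a list of (column name, Hive type) pairs in table order.
    """
    parts = urlsplit(location)
    return {
        "defaultFS": "%s://%s" % (parts.scheme, parts.netloc),
        # Read every file under the table directory.
        "path": parts.path.rstrip("/") + "/*",
        "column": [
            {"index": i, "type": HIVE_TO_HDFSREADER[hive_type]}
            for i, (_name, hive_type) in enumerate(hive_columns)
        ],
    }


# Hypothetical location value; the host name is a placeholder.
settings = reader_settings(
    "hdfs://namenode:8020/user/hive/warehouse/teacher",
    [("name", "string"), ("age", "int")],
)
print(settings)
```

An unknown Hive type raises KeyError here, which is deliberate: it is safer to stop than to guess a mapping.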

V. Synchronize the data

1. Run the command

D:\workSoftWare\datax\datax\bin>python  datax.py  ../job/hive_es.json
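If the job needs to be launched from a script rather than typed by hand, the same command can be assembled in Python. This is a rough sketch: build_datax_command and run_job are hypothetical names, the install directory is a placeholder, and treating a nonzero exit code as failure is an assumption to verify against your DataX version.

```python
import os
import subprocess


def build_datax_command(datax_home, job_file, python_exe="python"):
    """Return the argument list for running one DataX job."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    job_path = os.path.join(datax_home, "job", job_file)
    return [python_exe, launcher, job_path]


def run_job(datax_home, job_file):
    """Run the job and return the exit code of the launcher process."""
    cmd = build_datax_command(datax_home, job_file)
    return subprocess.run(cmd).returncode


# Placeholder install directory; nothing is executed here.
cmd = build_datax_command("datax", "hive_es.json")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the install path contains spaces.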

2. Check the result in ES

        Query with Kibana:

GET /teacher/_search
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "pnhlI4gBUNFFJN_x_IxO",
        "_score" : 1.0,
        "_source" : {
          "name" : "wangwu",
          "age" : 66
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "o3hlI4gBUNFFJN_x9Yw7",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsn",
          "age" : 22
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "onhlI4gBUNFFJN_x74zU",
        "_score" : 1.0,
        "_source" : {
          "name" : "lisi",
          "age" : 30
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "pHhlI4gBUNFFJN_x-IyH",
        "_score" : 1.0,
        "_source" : {
          "name" : "lihua",
          "age" : 15
        }
      }
    ]
  }
}

All four rows from the Hive table are in the teacher index, so the synchronization succeeded.
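The same check can be automated by reading the hit count and the documents out of the search response. The sketch below works on an abbreviated copy of the response as a Python dict; summarize_hits is a hypothetical helper, and fetching the response from ES over HTTP is left out.

```python
def summarize_hits(response):
    """Return (total hit count, list of _source dicts) from a search response."""
    hits = response["hits"]
    total = hits["total"]["value"]
    sources = [hit["_source"] for hit in hits["hits"]]
    return total, sources


# Abbreviated copy of the search response (two of the four hits).
sample_response = {
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "hits": [
            {"_index": "teacher", "_source": {"name": "wangwu", "age": 66}},
            {"_index": "teacher", "_source": {"name": "zhangsn", "age": 22}},
        ],
    }
}

total, sources = summarize_hits(sample_response)
print(total, sorted(doc["name"] for doc in sources))
```

Comparing total with the row count of the Hive table gives a quick signal that no rows were dropped.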
