Developing a custom DataX plugin: a detailed walkthrough

Contents

1. Background
2. Requirements
3. Development steps
3.1 Download the DataX source code from GitHub
3.2 Unpack it locally and import it into IDEA
3.3 Create a module named kafkareader
3.4 Copy the two plugin descriptor files from any existing module into the resources directory
3.5 Modify plugin.json
3.6 Modify pom.xml (copy the dependencies and build plugins from one of the existing modules)
3.7 Copy the assembly folder from another module into the matching folder of our module and modify package.xml
3.8 Add the entry below to the outermost package.xml
4. Writing the code
4.1 Read the DataX plugin development guide carefully before you start; it really helps.
4.2 Write the code (which classes to extend and which methods to implement are all covered in the guide from 4.1)
5. Packaging and running
5.1 Comment out the other modules, keeping only the common modules and your own project module
5.2 Open a cmd window in the project's top-level directory (assuming Maven is configured locally)
5.3 Package with Maven
5.4 After packaging, upload the package from the target directory to the matching DataX directory on the cluster
5.5 Write the job configuration file and run it


1. Background

    Company requirement: build a unified data-ingestion platform around DataX. We need to collect data from sources such as Kafka, Elasticsearch, MySQL, and SQL Server, and we intend to use DataX only.

2. Requirements

 Develop a DataX kafkaReader plugin that reads data from Kafka and syncs it to other data stores.

 1. It must handle JSON-formatted messages, support parsing with a regular expression, and support parsing by a specified delimiter.

 2. It must be able to sync data into Hive, MySQL, and HBase.

3. Development steps

  3.1 Download the DataX source code from GitHub

    URL: https://github.com/alibaba/DataX
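
Cloning with git is the quickest way to get the code (you can also download the zip from the page above):

 git clone https://github.com/alibaba/DataX.git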

 3.2 Unpack it locally and import it into IDEA

File -> Open -> select the unpacked directory, then wait while IDEA resolves and downloads all of the dependencies (this can take a while).

3.3 Create a module named kafkareader

3.4 Copy the two files plugin.json and plugin_job_template.json from any existing module into the resources directory of kafkareader

3.5 Modify plugin.json
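
plugin.json tells the DataX framework how to load the plugin. For this module it can look roughly like the following; the class value must match the fully qualified name of the reader class written in section 4.2, and the name must match the plugin name used later in the job configuration (description and developer are free text):

{
    "name": "kafkareader",
    "class": "com.alibaba.datax.plugin.reader.kafkareader.KafkaReader",
    "description": "read data from kafka",
    "developer": "your name"
}

plugin_job_template.json simply holds a sample reader configuration, similar to the reader block in section 5.5.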

3.6 Modify pom.xml (copy the dependencies and build plugins from one of the existing modules into kafkareader's pom.xml)

Copy the contents of the two tags below. If you do not feel like removing the unused dependencies, you can leave them in. Then add your own dependencies; since this is kafkareader, add the two Kafka dependencies that follow.

<dependencies>
    ......
</dependencies>

<build>
    ......
</build>

 

 <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.0.0</version>
        </dependency>

The final pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>datax-all</artifactId>
        <groupId>com.alibaba.datax</groupId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>kafkareader</artifactId>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-rdbms-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.34</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.0.0</version>
        </dependency>


    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- assembly plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors> <!-- descriptor file path -->
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase> <!-- bind to the package lifecycle phase -->
                        <goals>
                            <goal>single</goal> <!-- run only once -->
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

 

3.7 Copy the src/main/assembly folder from another module into the matching location in our module and modify package.xml

The places that need changing are marked with comments in the file below: wherever the name of the reader you copied from appears, change it to kafkareader.

<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <!-- change to kafkareader -->
            <outputDirectory>plugin/reader/kafkareader</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <!-- change to kafkareader -->
            <includes>
                <include>kafkareader-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <!-- change to kafkareader -->
            <outputDirectory>plugin/reader/kafkareader</outputDirectory>
        </fileSet>
        <!--<fileSet>-->
            <!--<directory>src/main/cpp</directory>-->
            <!--<includes>-->
                <!--<include>libhadoop.a</include>-->
                <!--<include>libhadoop.so</include>-->
                <!--<include>libhadoop.so.1.0.0</include>-->
                <!--<include>libhadooppipes.a</include>-->
                <!--<include>libhadooputils.a</include>-->
                <!--<include>libhdfs.a</include>-->
                <!--<include>libhdfs.so</include>-->
                <!--<include>libhdfs.so.0.0.0</include>-->
            <!--</includes>-->
            <!--<outputDirectory>plugin/reader/hdfsreader/libs</outputDirectory>-->
        <!--</fileSet>-->

    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <!-- change to kafkareader -->
            <outputDirectory>plugin/reader/kafkareader/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>

 

3.8 Add the following entry to the outermost package.xml
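
In the root package.xml, add a fileSet that pulls this module's packaged output into the final DataX distribution, following the same pattern as the existing reader modules (a sketch, assuming the module is named kafkareader as above):

        <fileSet>
            <directory>kafkareader/target/datax/</directory>
            <includes>
                <include>**/*.*</include>
            </includes>
            <outputDirectory>datax</outputDirectory>
        </fileSet>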

4. Writing the code

4.1 Read the DataX plugin development guide carefully before you start; it really helps.

Guide: https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md

4.2 Write the code (which classes to extend and which methods to implement are all covered in the guide from 4.1)

The main code:

package com.alibaba.datax.plugin.reader.kafkareader;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KafkaReader extends Reader {

    public static class Job extends Reader.Job {
        private static final Logger LOG = LoggerFactory
                .getLogger(Job.class);

        private Configuration originalConfig = null;


        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            // warn: ignore case

            String topic = this.originalConfig
                    .getString(Key.TOPIC);
            Integer partitions = this.originalConfig
                    .getInt(Key.KAFKA_PARTITIONS);
            String bootstrapServers = this.originalConfig
                    .getString(Key.BOOTSTRAP_SERVERS);

            String groupId = this.originalConfig
                    .getString(Key.GROUP_ID);
            Integer columnCount = this.originalConfig
                    .getInt(Key.COLUMNCOUNT);
            String split = this.originalConfig.getString(Key.SPLIT);
            String filterContaintsStr = this.originalConfig.getString(Key.CONTAINTS_STR);
            String filterContaintsFlag = this.originalConfig.getString(Key.CONTAINTS_STR_FLAG);
            String conditionAllOrOne = this.originalConfig.getString(Key.CONDITION_ALL_OR_ONE);
            String parsingRules = this.originalConfig.getString(Key.PARSING_RULES);
            String writerOrder = this.originalConfig.getString(Key.WRITER_ORDER);
            String kafkaReaderColumnKey = this.originalConfig.getString(Key.KAFKA_READER_COLUMN_KEY);

            System.out.println(topic);
            System.out.println(partitions);
            System.out.println(bootstrapServers);
            System.out.println(groupId);
            System.out.println(columnCount);
            System.out.println(split);
            System.out.println(parsingRules);
            if (null == topic) {

                throw DataXException.asDataXException(KafkaReaderErrorCode.TOPIC_ERROR,
                        "Parameter [topic] is not set.");
            }
            if (partitions == null) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.PARTITION_ERROR,
                        "Parameter [kafka.partitions] is not set.");
            } else if (partitions < 1) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.PARTITION_ERROR,
                        "[kafka.partitions] must not be less than 1.");
            }
            if (null == bootstrapServers) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.ADDRESS_ERROR,
                        "Parameter [bootstrap.servers] is not set.");
            }
            if (null == groupId) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "Parameter [groupid] is not set.");
            }

            if (columnCount == null) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.PARTITION_ERROR,
                        "Parameter [columnCount] is not set.");
            } else if (columnCount < 1) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "[columnCount] must not be less than 1.");
            }
            if (null == split) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "[split] must not be empty.");
            }
            if (filterContaintsStr != null) {
                if (conditionAllOrOne == null || filterContaintsFlag == null) {
                    throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                            "[filterContaintsStr] is set, but [conditionAllOrOne] or [filterContaintsFlag] is missing.");
                }
            }
            if (parsingRules == null) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "Parameter [parsingRules] is not set.");
            } else if (!parsingRules.equals("regex") && !parsingRules.equals("json") && !parsingRules.equals("split")) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "[parsingRules] is invalid; it must be one of regex, json or split.");
            }
            if (writerOrder == null) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "Parameter [writerOrder] is not set.");
            }
            if (kafkaReaderColumnKey == null) {
                throw DataXException.asDataXException(KafkaReaderErrorCode.KAFKA_READER_ERROR,
                        "Parameter [kafkaReaderColumnKey] is not set.");
            }
        }

        @Override
        public void preCheck() {
            init();

        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            List<Configuration> configurations = new ArrayList<Configuration>();

            Integer partitions = this.originalConfig.getInt(Key.KAFKA_PARTITIONS);
            for (int i = 0; i < partitions; i++) {
                configurations.add(this.originalConfig.clone());
            }
            return configurations;
        }

        @Override
        public void post() {
        }

        @Override
        public void destroy() {

        }

    }

    public static class Task extends Reader.Task {

        private static final Logger LOG = LoggerFactory
                .getLogger(Task.class);
        // job configuration for this task
        private Configuration readerSliceConfig;
        // delimiter (or regex) used to split the kafka message
        private String split;
        // parsing rule: regex / json / split
        private String parsingRules;
        // whether to keep pulling data
        private boolean flag;
        // kafka bootstrap servers
        private String bootstrapServers;
        // kafka consumer group id
        private String groupId;
        // kafka topic
        private String kafkaTopic;
        // number of fields expected in each kafka message
        private int count;
        // filter out messages that contain / do not contain these strings
        private String filterContaintsStr;
        // 1 means "contains", 0 means "does not contain"
        private int filterContaintsStrFlag;
        // whether all of the filter strings must match, or just one of them
        private int conditionAllOrOne;
        // column order expected by the writer
        private String writerOrder;
        // keys used to pick fields out of a json message
        private String kafkaReaderColumnKey;
        // path for writing messages that failed to parse
        private String exceptionPath;

        @Override
        public void init() {
            flag = true;
            this.readerSliceConfig = super.getPluginJobConf();
            split = this.readerSliceConfig.getString(Key.SPLIT);
            bootstrapServers = this.readerSliceConfig.getString(Key.BOOTSTRAP_SERVERS);
            groupId = this.readerSliceConfig.getString(Key.GROUP_ID);
            kafkaTopic = this.readerSliceConfig.getString(Key.TOPIC);
            count = this.readerSliceConfig.getInt(Key.COLUMNCOUNT);
            filterContaintsStr = this.readerSliceConfig.getString(Key.CONTAINTS_STR);
            filterContaintsStrFlag = this.readerSliceConfig.getInt(Key.CONTAINTS_STR_FLAG);
            conditionAllOrOne = this.readerSliceConfig.getInt(Key.CONDITION_ALL_OR_ONE);
            parsingRules = this.readerSliceConfig.getString(Key.PARSING_RULES);
            writerOrder = this.readerSliceConfig.getString(Key.WRITER_ORDER);
            kafkaReaderColumnKey = this.readerSliceConfig.getString(Key.KAFKA_READER_COLUMN_KEY);
            exceptionPath = this.readerSliceConfig.getString(Key.EXECPTION_PATH);
            LOG.info(filterContaintsStr);
        }

        @Override
        public void startRead(RecordSender recordSender) {

            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("group.id", groupId != null ? groupId : UUID.randomUUID().toString());
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
            consumer.subscribe(Collections.singletonList(kafkaTopic));
            Record oneRecord = null;
            while (flag) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {

                    String value = record.value();
                    // filter flag: 1 means this message should be skipped
                    int ifNotContinue = filterMessage(value);
                    // if the flag is 1, filter out this message
                    if (ifNotContinue == 1) {
                        LOG.info("Filtered out message: " + record.value());
                        continue;
                    }
                    oneRecord = buildOneRecord(recordSender, value);
                    // a non-null return value means the message parsed successfully
                    if (oneRecord != null) {
                        recordSender.sendToWriter(oneRecord);
                    }
                }
                consumer.commitSync();
                // if the current hour is 00, stop the task so the daily job can finish
                Date date = new Date();
                if (DateUtil.targetFormat(date).split(" ")[1].substring(0, 2).equals("00")) {
                    destroy();
                }

            }
        }

        private int filterMessage(String value) {
            // returns 1 if the message should be filtered out, 0 otherwise
            int ifNotContinue = 0;

            if (filterContaintsStr != null) {
                String[] filterStrs = filterContaintsStr.split(",");
                // conditionAllOrOne == 1: all of the filter strings must match
                if (conditionAllOrOne == 1) {
                    if (filterContaintsStrFlag == 1) {
                        // flag == 1: drop messages that contain ALL of the filter strings
                        int i = 0;
                        for (; i < filterStrs.length; i++) {
                            if (!value.contains(filterStrs[i])) break;
                        }
                        if (i >= filterStrs.length) ifNotContinue = 1;
                    } else {
                        // flag == 0: keep only messages that contain ALL of the filter strings
                        int i = 0;
                        for (; i < filterStrs.length; i++) {
                            if (!value.contains(filterStrs[i])) break;
                        }
                        if (i < filterStrs.length) ifNotContinue = 1;
                    }

                } else {
                    if (filterContaintsStrFlag == 1) {
                        // flag == 1: drop messages that contain ANY of the filter strings
                        int i = 0;
                        for (; i < filterStrs.length; i++) {
                            if (value.contains(filterStrs[i])) break;
                        }
                        if (i < filterStrs.length) ifNotContinue = 1;
                    }
                    // flag == 0: keep only messages that contain at least one of the filter strings
                    else {
                        int i = 0;
                        for (; i < filterStrs.length; i++) {
                            if (value.contains(filterStrs[i])) break;
                        }
                        if (i >= filterStrs.length) ifNotContinue = 1;
                    }
                }
            }
            return ifNotContinue;

        }

        private Record buildOneRecord(RecordSender recordSender, String value) {
            Record record = null;
            if (parsingRules.equals("regex")) {
                record = parseRegex(value, recordSender);
            } else if (parsingRules.equals("json")) {
                record = parseJson(value, recordSender);
            } else if (parsingRules.equals("split")) {
                record = parseSplit(value, recordSender);
            }
            return record;
        }

        private Record parseSplit(String value, RecordSender recordSender) {
            Record record = recordSender.createRecord();
            String[] splits = value.split(this.split);
            if (splits.length != count) {
                writerErrorPath(value);
                return null;
            }
            parseOrders(Arrays.asList(splits), record);
            return record;
        }

        private Record parseJson(String value, RecordSender recordSender) {
            Record record = recordSender.createRecord();
            HashMap<String, Object> map = JsonUtilJava.parseJsonStrToMap(value);
            String[] columns = kafkaReaderColumnKey.split(",");
            ArrayList<String> datas = new ArrayList<String>();
            for (String column : columns) {
                datas.add(map.get(column).toString());
            }
            if (datas.size() != count) {
                writerErrorPath(value);
                return null;
            }
            parseOrders(datas, record);
            return record;
        }

        private Record parseRegex(String value, RecordSender recordSender) {
            Record record = recordSender.createRecord();
            ArrayList<String> datas = new ArrayList<String>();
            Pattern r = Pattern.compile(split);
            Matcher m = r.matcher(value);
            if (m.find()) {
                if (m.groupCount() != count) {
                    writerErrorPath(value);
                    return null;
                }
                // collect all captured groups, then build the record in writer order
                for (int i = 1; i <= count; i++) {
                    datas.add(m.group(i));
                }
                parseOrders(datas, record);
                return record;
            } else {
                writerErrorPath(value);
                return null;
            }
        }

        private void writerErrorPath(String value) {
            if (exceptionPath == null) return;
            FileOutputStream fileOutputStream = null;
            try {
                fileOutputStream = getFileOutputStream();
                fileOutputStream.write((value + "\n").getBytes());
                fileOutputStream.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private FileOutputStream getFileOutputStream() throws FileNotFoundException {
            return new FileOutputStream(exceptionPath + "/" + kafkaTopic + "errordata" + DateUtil.targetFormat(new Date(), "yyyyMMdd"), true);
        }

        private void parseOrders(List<String> datas, Record record) {
            // add columns in the order required by the writer (writerOrder)
            String[] orders = writerOrder.split(",");
            for (String order : orders) {
                if (order.equals("data_from")) {
                    record.addColumn(new StringColumn(bootstrapServers + "|" + kafkaTopic));
                } else if (order.equals("uuid")) {
                    record.addColumn(new StringColumn(UUID.randomUUID().toString()));
                } else if (order.equals("null")) {
                    record.addColumn(new StringColumn("null"));
                } else if (order.equals("datax_time")) {
                    record.addColumn(new StringColumn(DateUtil.targetFormat(new Date())));
                } else if (isNumeric(order)) {
                    record.addColumn(new StringColumn(datas.get(new Integer(order) - 1)));
                }
            }
        }

        public static boolean isNumeric(String str) {
            for (int i = 0; i < str.length(); i++) {
                if (!Character.isDigit(str.charAt(i))) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public void post() {
        }

        @Override
        public void destroy() {
            flag = false;
        }

       
    }
}
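
The code above also references a few helper classes that are not shown here (Key, KafkaReaderErrorCode, DateUtil, JsonUtilJava). As an illustration only, a minimal Key class consistent with the parameter names used in the job configuration in section 5.5 could look like this (the constant values are assumptions inferred from that configuration):

package com.alibaba.datax.plugin.reader.kafkareader;

// Hypothetical sketch: constant names follow the reader code above, and the
// values follow the parameter names used in the sample job JSON in section 5.5.
public final class Key {
    public static final String TOPIC = "topic";
    public static final String KAFKA_PARTITIONS = "kafkaPartitions";
    public static final String BOOTSTRAP_SERVERS = "bootstrapServers";
    public static final String GROUP_ID = "groupId";
    public static final String COLUMNCOUNT = "columnCount";
    public static final String SPLIT = "split";
    public static final String CONTAINTS_STR = "filterContaints";
    public static final String CONTAINTS_STR_FLAG = "filterContaintsFlag";
    public static final String CONDITION_ALL_OR_ONE = "conditionAllOrOne";
    public static final String PARSING_RULES = "parsingRules";
    public static final String WRITER_ORDER = "writerOrder";
    public static final String KAFKA_READER_COLUMN_KEY = "kafkaReaderColumnKey";
    public static final String EXECPTION_PATH = "execptionPath";

    private Key() {
    }
}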

5. Packaging and running

5.1 Comment out the other modules, keeping only the common modules and your own project module

Comment them out in the outermost pom.xml.
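
For reference, the <modules> section of the root pom.xml then ends up roughly like this (a sketch; the exact set of shared modules in your checkout may differ, and kafkareader has to be added to the list):

    <modules>
        <module>common</module>
        <module>core</module>
        <module>transformer</module>

        <!-- keep only your own plugin; comment out the other readers and writers -->
        <module>kafkareader</module>
        <!--<module>mysqlreader</module>-->
        <!--<module>hdfswriter</module>-->
    </modules>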

5.2 Open a cmd window in the project's top-level directory (this assumes Maven is configured in your local environment variables)

5.3 Package with Maven

 mvn -U clean package assembly:assembly -Dmaven.test.skip=true

5.4 After packaging, upload the package from the directory shown below to the matching DataX directory on the cluster

Reader plugins go into the plugin/reader directory under the DataX installation, and writer plugins go into plugin/writer.

 

Local path: D:\DataX-master\kafkareader\target\datax\plugin\reader

Cluster path: /opt/module/datax/plugin/reader
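
After the upload, the plugin directory on the cluster should contain the files produced by the package.xml above, roughly:

/opt/module/datax/plugin/reader/kafkareader/
    plugin.json
    plugin_job_template.json
    kafkareader-0.0.1-SNAPSHOT.jar
    libs/    (runtime dependencies collected by the assembly)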


5.5 Write the job configuration file and you can run the job

{
    "job": {
        "content": [
            {
               "reader": {
                    "name": "kafkareader",
                    "parameter": {
                        "topic": "Event",
                        "bootstrapServers": "192.168.7.128:9092",
                        "kafkaPartitions": "1",
                        "columnCount":11,
                        "groupId":"ast",
                        "filterContaints":"5^1,6^5",
                        "filterContaintsFlag":1,
                        "conditionAllOrOne":0,
                        "parsingRules":"regex",
                        "writerOrder":"uuid,1,3,6,4,8,9,10,11,5,7,2,null,datax_time,data_from",
                        "kafkaReaderColumnKey":"a",
                        "execptionPath":"/opt/module/datax/log/errorlog"
                         }
                },
                 "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://master:8020",
                        "fileType": "orc",
                        "path": "${path}",
                        "fileName": "t_rsd_amber_agent_event_log",
                        "column": [
                             {
                                "name": "id",
                                "type": "string"
                            },
                            {
                                "name": "risk_level",
                                "type": "string"
                            },
                            {
                                "name": "device_uuid",
                                "type": "string"
                            },
                             {
                                "name": "event_device_id",
                                "type": "string"
                            },
                            {
                                "name": "device_type",
                                "type": "string"
                            },
                            {
                                "name": "event_type",
                                "type": "string"
                            },
                            {
                                "name": "event_sub_type",
                                "type": "string"
                            },
                             {
                                "name": "repeats",
                                "type": "string"
                            },
                            {
                                "name": "description",
                                "type": "string"
                            },
                            {
                                "name": "event_time",
                                "type": "string"
                            },
                            {
                                "name": "report_device_type",
                                "type": "string"
                            },
                            {
                                "name": "event_report_time",
                                "type": "string"
                            },
                             {
                                "name": "last_update_time",
                                "type": "string"
                            },
                            {
                                "name": "datax_time",
                                "type": "string"
                            },
                            {
                                "name": "data_from",
                                "type": "string"
                            }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                        "compress":"NONE",
                        "scrollFileTime":300000
                    }
                }

            }
        ],
         "setting": {
            "speed": {
                "channel": 3,
                "record": 20000,
                 "byte":5000 ,
                 "batchSize":2048

            }
        }
    }
}

Run command:

 python /opt/module/datax/bin/datax.py -p "-Dpath=/data/warehouse/rsd/t_rsd_amber_agent_event_log/2019/06/05" /opt/module/datax/job/kafkatohdfs.json

 
