DataX KafkaWriter Plugin Documentation
While recently learning the DataX tool, I found that the official Alibaba distribution does not include a kafkawriter plugin, so I wrote one myself.
The plugin borrows heavily from the DataX plugin development guide ("datax插件开发宝典") and builds on that foundation.
Source code: https://gitee.com/mjlfto/dataX/tree/master/kafkawriter
1 Quick Introduction
KafkaWriter writes data to a specified Kafka topic.
2 Features and Limitations
KafkaWriter currently supports writing text or JSON formatted data to a single topic.
3 Function Description
3.1 Sample Configuration
```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "zkcj",
                        "password": "zkcj2018",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@10.1.20.169:1521:GYJG"
                                ],
                                "querySql": [
                                    "select * from VM_DRV_PREASIGN_A"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "kafkawriter",
                    "parameter": {
                        "topic": "test-topic",
                        "bootstrapServers": "10.1.20.150:9092",
                        "fieldDelimiter": "\t",
                        "batchSize": 10,
                        "writeType": "json",
                        "noTopicCreate": true,
                        "topicNumPartition": 1,
                        "topicReplicationFactor": 1
                    }
                }
            }
        ]
    }
}
```
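Once the job file above is saved (for example as oracle2kafka.json; the file name here is only illustrative), the job can be launched with DataX's standard entry script:

```shell
python ${DATAX_HOME}/bin/datax.py ./oracle2kafka.json
```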
3.2 Parameter Description

- bootstrapServers

  - Description: Kafka broker address list, in the format host1:port,host2:port. Example: 10.1.20.111:9092,10.1.20.121:9092

  - Required: yes

  - Default: none

- topic

  - Description: name of the Kafka topic to write to; currently only a single topic can be written per job

  - Required: yes

  - Default: none

- ack

  - Description: acknowledgment mechanism for produced messages (a sketch of how this and the other producer options map onto the Kafka client API follows this parameter list):
    - acks=0: the producer does not wait for any response from Kafka.
    - acks=1: the leader writes the message to its local log, but does not wait for the other replicas in the cluster to acknowledge it.
    - acks=all: the leader waits until all in-sync replicas have acknowledged the message. The message is not lost unless every machine in the Kafka cluster fails; this is the strongest guarantee available.

  - Required: no

  - Default: 0
- batchSize

  - Description: when multiple messages are to be sent to the same partition, the producer batches them together to reduce the number of network requests, which improves efficiency for both the client and the server

  - Required: no

  - Default: 16384

- retries

  - Description: when set to a value greater than 0, the client resends any message whose delivery failed

  - Required: no

  - Default: 0
- fieldDelimiter

  - Description: field delimiter used to join field values when writeType is text

  - Required: no

  - Default: , (comma)

- keySerializer

  - Description: serializer class for message keys

  - Required: no

  - Default: org.apache.kafka.common.serialization.StringSerializer

- valueSerializer

  - Description: serializer class for message values

  - Required: no

  - Default: org.apache.kafka.common.serialization.StringSerializer
- noTopicCreate

  - Description: whether to create the topic when it does not exist

  - Required: no

  - Default: false

- topicNumPartition

  - Description: number of partitions for the topic (applies when the plugin creates the topic)

  - Required: no

  - Default: 1

- topicReplicationFactor

  - Description: replication factor for the topic (applies when the plugin creates the topic)

  - Required: no

  - Default: 1
- writeType

  - Description: format of the data written to Kafka; one of text, json
    - text: all field values are joined with fieldDelimiter, and the resulting string is written to Kafka as both the message key and the message value
    - json: the key is built exactly as in text mode; the value is the record serialized in DataX's internal Column format, as in the example below. rawData holds the field value; if an object has no rawData field, that value is null.

    ```json
    {
        "data": [
            { "byteSize": 13, "rawData": "xxxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "1", "type": "STRING" },
            { "byteSize": 12, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "A", "type": "STRING" },
            { "byteSize": 18, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 3, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "A", "type": "STRING" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 8, "rawData": 1426740491000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 8, "rawData": 1426780800000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 1, "rawData": "E", "type": "STRING" },
            { "byteSize": 7, "rawData": "5201009", "type": "STRING" },
            { "byteSize": 6, "rawData": "520101", "type": "DOUBLE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 3, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 12, "rawData": "520181000400", "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 0, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 78, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "0", "type": "STRING" },
            { "byteSize": 8, "rawData": 1426694400000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "DOUBLE" },
            { "byteSize": 1, "rawData": "0", "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 12, "rawData": "520181000400", "type": "STRING" },
            { "byteSize": 1, "rawData": "1", "type": "DOUBLE" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 8, "rawData": 1426740491000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 2, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 28, "rawData": "YxIC7zeM6xG+eBdzxV4oRDxHses=", "type": "STRING" }
        ],
        "size": 40
    }
    ```

  - Required: no

  - Default: text
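To make the producer-related parameters above concrete, the sketch below shows roughly how they correspond to the Kafka client API, including topic creation for the noTopicCreate / topicNumPartition / topicReplicationFactor options. This is a minimal illustration assuming the standard kafka-clients library, not the plugin's actual source; the topic and broker address are taken from the sample job above.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaWriterSketch {
    public static void main(String[] args) throws Exception {
        String bootstrapServers = "10.1.20.150:9092"; // "bootstrapServers" parameter
        String topic = "test-topic";                  // "topic" parameter

        // Create the topic if it does not exist yet, using "topicNumPartition"
        // and "topicReplicationFactor" (the "noTopicCreate" behavior).
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        try (AdminClient admin = AdminClient.create(adminProps)) {
            if (!admin.listTopics().names().get().contains(topic)) {
                admin.createTopics(Collections.singleton(new NewTopic(topic, 1, (short) 1)))
                     .all().get();
            }
        }

        // Producer settings corresponding to the writer parameters.
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.ACKS_CONFIG, "0");         // "ack"
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // "batchSize"
        props.put(ProducerConfig.RETRIES_CONFIG, 0);        // "retries"
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,   // "keySerializer"
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, // "valueSerializer"
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // In text mode, key and value are both the delimiter-joined field values.
            String line = String.join("\t", "field1", "field2", "field3");
            producer.send(new ProducerRecord<>(topic, line, line));
        }
    }
}
```

In json mode, the value would instead be the serialized Column array shown under writeType.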
3.3 Type Conversion
KafkaWriter handles records using DataX internal types and serializes them as described under writeType; please check that your source types are covered. For reference, the mapping between DataX internal types and Hive data types (relevant when reading from Hive-like sources) is listed below:

DataX Internal Type | Hive Data Type |
---|---|
Long | TINYINT, SMALLINT, INT, BIGINT |
Double | FLOAT, DOUBLE |
String | STRING, VARCHAR, CHAR |
Boolean | BOOLEAN |
Date | DATE, TIMESTAMP |
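Inside a DataX writer task, each record arrives as a Record composed of Column objects of the internal types listed above. Below is a minimal sketch of how the text format can be assembled from them, using DataX's common element API; it is illustrative and not necessarily the plugin's exact code.

```java
import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;

public class RecordFormatter {
    // Join all column values of one record with the configured fieldDelimiter;
    // columns whose raw data is absent (null values) are rendered as empty strings.
    public static String recordToText(Record record, String fieldDelimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < record.getColumnNumber(); i++) {
            if (i > 0) {
                sb.append(fieldDelimiter);
            }
            Column column = record.getColumn(i);
            if (column.getRawData() != null) {
                sb.append(column.asString());
            }
        }
        return sb.toString();
    }
}
```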
4 Configuration Steps
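A typical way to build and deploy a home-built plugin such as this one is sketched below; the Maven module name and directory layout are assumptions based on the standard DataX installation, so adjust them to your environment.

```shell
# Build the plugin module with Maven (run from the repository root; module path assumed)
mvn -pl kafkawriter clean package -DskipTests

# Copy the packaged plugin into DataX's writer plugin directory (layout assumed)
cp -r kafkawriter/target/kafkawriter ${DATAX_HOME}/plugin/writer/
```

After the copy, jobs can reference the writer by the name "kafkawriter", as in the sample configuration above.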
5 Constraints and Limitations
Omitted.
6 FAQ
Omitted.