1. Under Flume's conf directory, create a new configuration file named flume-conf.properties.kafkaUdp:
a1.sources = r1 r2
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = syslogudp
a1.sources.r1.channels = c1
a1.sources.r1.host = 192.168.1.219
a1.sources.r1.port = 10001
a1.sources.r2.type = syslogtcp
a1.sources.r2.port = 5140
a1.sources.r2.host = 192.168.1.219
a1.sources.r2.channels = c1
a1.sources.r2.batchSize = 1
a1.sources.r2.eventSize = 1
a1.sources.r1.interceptors = il
a1.sources.r1.interceptors.il.type = com.ken.SimpleInterceptor$Builder
a1.sources.r1.interceptors.il.forwardHeads = all,all
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = kentest
a1.sinks.k1.kafka.bootstrap.servers = 192.168.1.211:9092,192.168.1.212:9092,192.168.1.213:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 0
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.sinks.k1.kafka.producer.zk.connect = 192.168.1.211:2181,192.168.1.212:2181,192.168.1.213:2181
a1.sinks.k1.kafka.producer.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.kafka.producer.partitioner.class = com.ken.SimplePartitioner
The first three lines are the basic declarations: a source defines where the data comes from, a channel is the pipe events travel through, and a sink is where the data is finally delivered.
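The interceptor configured above (com.ken.SimpleInterceptor) is custom code that is not shown here. For reference, Flume's built-in static interceptor can fill the same slot to attach a fixed header to every event; the Kafka sink treats the event header named "key" as the Kafka record key, which is what the partitioner later receives. The header value below is a made-up example:

```
# Hypothetical alternative: use Flume's built-in static interceptor to set
# a fixed "key" header, which the Kafka sink passes on as the record key.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = key
a1.sources.r1.interceptors.i1.value = host-219
```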
2. To route events to different Kafka partitions, the core task is to override the partitioner class the producer invokes. When no custom partitioner is configured, Kafka falls back to its DefaultPartitioner, so we can create a Java project, extend DefaultPartitioner, and override partition():
import org.apache.kafka.clients.producer.internals.DefaultPartitioner;
import org.apache.kafka.common.Cluster;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SimplePartitioner extends DefaultPartitioner {

    private static final Logger logger = LoggerFactory.getLogger(SimplePartitioner.class);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // Logged at ERROR level only to make the output easy to spot while
        // verifying that this class is actually being invoked.
        if (keyBytes != null)
            logger.error(new String(keyBytes));
        if (key != null)
            logger.error(key.toString());
        if (valueBytes != null)
            logger.error(new String(valueBytes));
        if (value != null)
            logger.error(value.toString());
        logger.error("Entered the custom partitioner");
        return super.partition(topic, key, keyBytes, value, valueBytes, cluster);
    }
}
We can then put our own partitioning rule inside partition(): either partition on a key (and its value) injected by an interceptor, or work directly on the raw value of the event.
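As an illustration of the key-based approach, here is a self-contained sketch (not the author's actual partitioner; the key strings and partition count are made up) of how an overridden partition() might map keyBytes to a partition index:

```java
import java.nio.charset.StandardCharsets;

public class PartitionDemo {

    /** Map a key to a partition index, mirroring what a custom
     *  partition() override might do with keyBytes. */
    static int choosePartition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            return 0; // no key: fall back to a fixed partition
        }
        String key = new String(keyBytes, StandardCharsets.UTF_8);
        // Mask the sign bit instead of Math.abs, which is wrong for
        // Integer.MIN_VALUE.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3; // hypothetical partition count for the topic
        System.out.println(choosePartition(
                "host-a".getBytes(StandardCharsets.UTF_8), partitions));
        System.out.println(choosePartition(null, partitions));
    }
}
```

The same computation dropped into SimplePartitioner.partition() would replace the call to super.partition(), so that all events carrying the same interceptor-assigned key land in the same partition.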
One point worth stressing: in the configuration file the override must be written as
a1.sinks.k1.kafka.producer.partitioner.class=com.ken.SimplePartitioner
and not, as much material online suggests,
a1.sinks.k1.kafka.partitioner.class=com.ken.SimplePartitioner
With the latter you will find that the custom partitioner never takes effect, because only properties under the kafka.producer. prefix are passed through to the Kafka producer.
The jar versions your Java project builds against must match the jars under the installed Flume's lib directory, so the simplest approach is to copy the jars from lib and develop against those.