Flume - Is there a way to store Avro events (header & body) into HDFS?

New to flume...

I'm receiving avro events and storing them into HDFS.

I understand that by default only the body of the event is stored in HDFS. I also know there is an avro_event serializer, but I don't understand what that serializer actually does. How does it affect the final output of the sink?

Also, I can't figure out how to just dump the event into HDFS preserving its header information. Do I need to write my own serializer?

Solution

As it turns out, the avro_event serializer does store both headers and body in the file.

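If I'm reading the Flume source correctly, avro_event maps to FlumeEventAvroEventSerializer, which writes each event as an Avro record with a headers map and a body bytes field, i.e. a schema along these lines (the record name and exact layout come from my reading of the code, so treat this as a sketch):

{
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "headers", "type": { "type": "map", "values": "string" } },
    { "name": "body", "type": "bytes" }
  ]
}

Which answers the question above about what the serializer is doing: it wraps the whole event, not just the body, into an Avro container file.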
Here is how I set up my sink:

a1.sinks.i1.type=hdfs
a1.sinks.i1.hdfs.path=hdfs://localhost:8020/user/my-name
a1.sinks.i1.hdfs.rollInterval=0
a1.sinks.i1.hdfs.rollSize=1024
a1.sinks.i1.hdfs.rollCount=0
a1.sinks.i1.serializer=avro_event
a1.sinks.i1.hdfs.fileType=DataStream

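For completeness, the sink above doesn't run on its own; it needs a source and a channel wired to it. A full agent config would look roughly like this (the avro source, memory channel and their names are my own placeholders, not copied from my original setup):

# source/channel names and values below are placeholders for illustration
a1.sources=r1
a1.channels=c1
a1.sinks=i1

a1.sources.r1.type=avro
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=41414
a1.sources.r1.channels=c1

a1.channels.c1.type=memory

a1.sinks.i1.channel=c1
# ...plus the a1.sinks.i1.* lines shown above
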
I sent the events using the avro-client tool that ships with Flume, injecting headers via the -R headerFile option (see the example invocation after the header file contents below).

content of headerFile:

machine=localhost
user=myName

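For reference, the invocation looks roughly like this (host, port and the data file here are placeholders for my local test, not values you need to copy):

# -R attaches the headers from headerFile to every event sent
bin/flume-ng avro-client --conf conf -H localhost -p 41414 -R headerFile -F events.txt
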
Finally, I tested the results using a simple Java app I stole from this posting:

// getConf() and printWriter come from the surrounding app (a Hadoop Tool in my case)
final FileSystem fs = FileSystem.get(getConf());
final Path path = new Path(fs.getHomeDirectory(), "FlumeData.1446072877536");
printWriter.write(path + "-exists: " + fs.exists(path));

// open the Avro container file that the HDFS sink rolled out and dump each record
final SeekableInput input = new FsInput(path, getConf());
final DatumReader<GenericRecord> reader = new GenericDatumReader<>();
final FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);
for (final GenericRecord datum : fileReader) {
    printWriter.write("value = " + datum);
}
fileReader.close();

And sure enough I see my headers for each record; here is one line:

value = {"headers": {"machine": "localhost", "user": "myName"}, "body": {"bytes": "set -x"}}

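If you want the body back as plain text rather than the JSON-ish toString() output above, something like this inside the loop should do it (my own addition, assuming the body is UTF-8; needs java.nio.ByteBuffer and java.nio.charset.StandardCharsets):

// inside the for-loop: Avro's generic API returns "bytes" fields as a ByteBuffer
final ByteBuffer body = (ByteBuffer) datum.get("body");
final String text = StandardCharsets.UTF_8.decode(body).toString();
printWriter.write("body = " + text);
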
There is one other serializer that also emits the headers, and that is the header_and_text serializer. The resulting file is human-readable text. Here is a sample line:

{machine=localhost, user=userName} set -x

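Switching to it should only be a matter of changing the serializer line in the sink config above, keeping the DataStream file type (the exact lines here are my reconstruction, not copied from a saved config):

a1.sinks.i1.serializer=header_and_text
a1.sinks.i1.hdfs.fileType=DataStream
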
Finally, the book Apache Flume - Distributed Log Collection for Hadoop also mentions a serializer that emits headers, but I couldn't get the one described there to work.
