Getting MySQL logs into HDFS with Flume in real time: using Flume to ingest log files into HDFS while they are being written

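What is the best way to ingest log files into HDFS while they are still being written? I am trying to configure Apache Flume and am looking for a source that gives me data reliability. I tried configuring "exec" and later also looked at "spooldir", but the following documentation on flume.apache.org made me doubt whether either one fits my purpose: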

Exec source:

One of the most commonly requested features is the use case like- “tail -F file_name” where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there’s an obvious problem; what happens if the channel fills up and Flume can’t send an event? Flume has no way of indicating to the application writing the log file, that it needs to retain the log or that the event hasn’t been sent for some reason. Your application can never guarantee data has been received when using a unidirectional asynchronous interface such as ExecSource!

Spooling Directory source:

Unlike the Exec source, “spooldir” source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable files must be dropped into the spooling directory. If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.

Is there anything better I can use to make sure Flume does not miss any events and still reads in real time?

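Best answer: I would recommend the Spooling Directory source because of its reliability. One workaround for the immutability requirement is to write the files in a second directory and, once they reach a certain size (in bytes or in number of log entries), move them into the spooling directory.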
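The staging-directory workaround can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of Flume: the class name `StagedLogWriter`, the `app-<start time>-<counter>.log` naming scheme, and the byte threshold are all hypothetical. It also assumes that the staging and spooling directories sit on the same filesystem, so that the move is a single rename and Flume never sees a half-written file.

```python
import os
import tempfile
import time
from pathlib import Path


class StagedLogWriter:
    """Hypothetical helper: write log lines into a staging directory and move
    each finished file into the directory watched by Flume's spooldir source."""

    def __init__(self, staging_dir, spool_dir, max_bytes=1024 * 1024):
        self.staging_dir = Path(staging_dir)
        self.spool_dir = Path(spool_dir)
        self.max_bytes = max_bytes
        self.staging_dir.mkdir(parents=True, exist_ok=True)
        self.spool_dir.mkdir(parents=True, exist_ok=True)
        # Assumption: a start timestamp plus a counter is unique enough that
        # the spooling directory never sees the same file name twice.
        self._run_id = int(time.time() * 1000)
        self._seq = 0
        self._open_next()

    def _open_next(self):
        self._seq += 1
        name = f"app-{self._run_id}-{self._seq:06d}.log"
        self._path = self.staging_dir / name
        self._file = open(self._path, "wb")
        self._written = 0

    def write(self, line):
        data = line.rstrip("\n").encode("utf-8") + b"\n"
        self._file.write(data)
        self._file.flush()
        self._written += len(data)
        if self._written >= self.max_bytes:
            self.roll()

    def roll(self):
        """Hand the active file to the spooling directory and start a new one."""
        self._finish()
        self._open_next()

    def close(self):
        """Hand over whatever has been written so far; safe to call twice."""
        self._finish()

    def _finish(self):
        if self._file.closed:
            return
        self._file.close()
        if self._written > 0:
            # The file is closed before the move, so it is immutable from
            # Flume's point of view. os.replace is an atomic rename only when
            # both directories are on the same filesystem.
            os.replace(self._path, self.spool_dir / self._path.name)
        else:
            self._path.unlink()  # do not hand empty files to Flume


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        writer = StagedLogWriter(Path(base) / "staging", Path(base) / "spool", max_bytes=64)
        for i in range(10):
            writer.write(f"event {i}: something happened")
        writer.close()
        print(sorted(p.name for p in (Path(base) / "spool").iterdir()))
```

The trade-off is latency: Flume only sees events after a file has been rolled over, so a smaller `max_bytes` (or an additional time-based call to `roll()`) brings delivery closer to real time at the cost of more, smaller files.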
