What is the best way to ingest log files into HDFS as they are being written? I am trying to configure Apache Flume, and I am looking for a source that gives me data reliability. I first tried to configure "exec" and later also looked at "spooldir", but the following documentation on flume.apache.org made me doubt my approach:
Exec Source:
One of the most commonly requested features is the use case like-
“tail -F file_name” where an application writes to a log file on disk and
Flume tails the file, sending each line as an event. While this is
possible, there’s an obvious problem; what happens if the channel
fills up and Flume can’t send an event? Flume has no way of indicating
to the application writing the log file, that it needs to retain the
log or that the event hasn’t been sent for some reason. Your
application can never guarantee data has been received when using a
unidirectional asynchronous interface such as ExecSource!
Spooling Directory Source:
Unlike the Exec source, “spooldir” source is reliable and will not
miss data, even if Flume is restarted or killed. In exchange for this
reliability, only immutable files must be dropped into the spooling
directory. If a file is written to after being placed into the
spooling directory, Flume will print an error to its log file and stop
processing.
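For reference, a minimal agent definition wiring a Spooling Directory Source through a file channel to an HDFS sink might look like the sketch below. The agent name, directory paths, and NameNode address are assumptions; adjust them to your cluster.

```properties
# Hypothetical agent "agent1" -- names and paths are examples only.
agent1.sources = spool-src
agent1.channels = file-ch
agent1.sinks = hdfs-sink

# Spooling Directory Source: watches a directory for immutable files.
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /var/log/app/spool
agent1.sources.spool-src.channels = file-ch

# Durable file channel, so events survive an agent restart.
agent1.channels.file-ch.type = file

# HDFS sink: assumed NameNode address and target path.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/logs
agent1.sinks.hdfs-sink.channel = file-ch
```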
Is there something better I can use that guarantees Flume will not miss any events while still reading in near real time?
Best answer: I would suggest using the Spooling Directory Source because of its reliability. One workaround for the immutability requirement is to compose the files in a second directory and, once they reach a certain size (in bytes or in number of log lines), move them into the spooling directory.
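The workaround above can be sketched as a small script run periodically (e.g. from cron): the application writes into a staging directory, and files are moved into Flume's spool directory only once they reach a size threshold. The directory paths, threshold, and function name here are illustrative assumptions, not part of Flume itself; the move is a rename when both directories are on the same filesystem, so Flume never sees a half-written file.

```python
import os
import shutil

# Hypothetical paths and threshold -- adjust to your deployment.
STAGING_DIR = "/var/log/app/staging"   # the application writes here
SPOOL_DIR = "/var/log/app/spool"       # Flume's spoolDir watches here
MAX_BYTES = 10 * 1024 * 1024           # move a file once it reaches 10 MB

def rotate_ready_files(staging_dir=STAGING_DIR, spool_dir=SPOOL_DIR,
                       max_bytes=MAX_BYTES):
    """Move files that have reached max_bytes from staging into the spool dir.

    Returns the list of file names that were moved. Assumes both directories
    are on the same filesystem, so the move is an atomic rename and the file
    is immutable from Flume's point of view as soon as it appears.
    """
    moved = []
    for name in os.listdir(staging_dir):
        src = os.path.join(staging_dir, name)
        if os.path.isfile(src) and os.path.getsize(src) >= max_bytes:
            shutil.move(src, os.path.join(spool_dir, name))
            moved.append(name)
    return moved
```

A variant of the same idea is to rotate by line count or by time window instead of bytes; the key point is only that a file stops changing before it enters the spooling directory.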