Fluentd vs Logstash

Fluentd vs Logstash

NOV 19TH, 2013

Fluentd and Logstash are two open-source projects that focus on the problem of centralized logging. Both projects address the collection and transport aspect of centralized logging using different approaches.

This post will walk through a sample deployment to see how each differs from the other. We’ll look at the dependencies, features, deployment architecture and potential issues. The point is not to figure out which one is the best, but rather to see which one would be a better fit for your environment.

The example setup we’ll walk through is collecting web server logs on multiple hosts and archiving them to S3:

Centralized Logs With S3

This type of architecture would be suitable for archival or processing with Hive or Pig.

Another common architecture is storing logs in ElasticSearch to make them searchable with Kibanaor Graylog2. Setting that up is somewhat independent of using Logstash or Fluentd so I’ve left that out to keep things simple.

Installation Requirements

Logstash

Logstash is a JRuby based application which requires the JVM to run. Since it runs on the JVM, it can run anywhere the JVM does, which is usually means Linux, Mac OSX, and Windows. The package is shipped as single executable jar file which makes it very easy to install.

One of the downsides of depending on the JVM is that it’s memory footprint can be higher than you would want for transporting logs. Fortunately, Lumberjack can be run on individual hosts to collect and ship logs and Logstash can be run on the centralized log hosts.

Lumberjack is a Go based project with a much smaller memory and CPU footprint. Deployment is still straightforward as Logstash and is basically installing a single binary. The project provides deband rpm packages to make it easier to deploy. An SSL certificates is required to setup authentication between Lumberjack and Logstash which is a little more complicated, but a nice benefit that encrypted transport is the default.

Fluentd

Fluentd is a CRuby application which requires Ruby 1.9.2 or later. There is an open-source version, fluentd, as well as a commercial version, td-agent. Fluentd runs on Linux and Mac OSX, but does not run on Windows currently.

For larger installs, they recommend using jemalloc to avoid memory fragmentation. This is included in the deb and rpm packages but needs to be installed manually if using the open-source version.

If you use the open-source version, you’ll need to install Fluentd from source or via gem install. Since Fluentd is primarily developed by a commercial company, their deb and rpm packages are configured to send data to their hosted centralized logging platform.

Apart from Ruby, they also recommend running ntpd which you should be running anyways.

Feature Comparison

Logstash supports a number of inputs, codecs, filters and outputs. Inputs are sources of data. Codecs essentially convert an incoming format into an internal Logstash representation as well as convert back out to an output format. These are usually used if the incoming message is not just a single line of text. Filters are processing actions on events and allow you to modify events or drop events as they are processed. Finally, outputs are destinations where events can be routed.

Fluentd is similar in that it has inputs and outputs and a matching mechanism to route log data between destinations. Internally, log messages are converted to JSON which provides structure to an unstructered log message. Messages can be tagged and then routed to different outputs.

Both projects have very similar capabilities and highlighting the difference between them from a feature standpoint is difficult. They both have plugin models that allow you to extend their functionality if needed. They also have rich repository of plugins already available.

Probably the most significant difference between Fluentd and Logstash is their design focus.Logstash emphasizes flexibility and interoperability whereas Fluentd prioritizes simplicity and robustness. This does not mean that Logstash is not robust or Fluentd is not flexible, rather each has prioritized features differently.

Fluentd has fewer inputs than Logstash, out of the box, but many of the inputs and outputs have built-in support for buffering, load-balancing, timeouts and retries. These types of features are necessary for ensuring data is reliably delivered.

For example, the out_forward plugin used to transfer logs from one fluentd instance to another has many robustness options that can be configured to ensure messages are delivered reliably.

Architecture comparison

From a deployment architecture standpoint, both frameworks are very similar. With Logstash, each web server would be configured to run Lumberjack and tail the web server logs. Lumberjack would forward the logs to a server running Logstash with a Lumberjack input. The Logstash server would also have an output configured using the S3 output. Since Lumberjack requires SSL certs, the log transfers would be encrypted from the web server to the log server.

With fluentd, each web server would run fluentd and tail the web server logs and forward them to another server running fluentd as well. This server would be configured to receive logs and write them to S3 using the S3 plugin. Fluentd does not support encryption so logs would be transferred unencrypted.

Update: @repeatedly pointed me to the fluent-plugin-secure-forward that some companies are using for encrypted transport.

Improving Availability

One central log server is a single point of failure. What happens if we wanted to have more than one central log server?

Lumberjack can be configured to use multiple servers but will only send logs to one until that one fails. If that happens, previously collected log data won’t be accessible until that host is resurrected. Essentially, it supports a master with hot-standby servers.

Fluentd on the other hand can forward two copies of the logs to each server if needed, load-balance between multiple hosts or have a master with a hot-standy in case of failure. There are lot of options for not only improving availabilty but also scalability if your log volume increases substantially. Keep in mind, that if you forward multiple copies, this could create duplicate logs in S3 which might need to be handled when you analyze them.

Potential Issues

The Logstash docs suggests using Redis as the receiving output if you run Logstash (not Lumberjack) on each host. This setup is based on Redis Lists and/or Pub/Sub which can lose messages if the receiver dies after removing the message from Redis and before it has had a chance to forward it along. Additionally, Redis would need to be configured with AOF to minimize the chance of lost messages if Redis were to fail.

There is a document describing the life of an event that discusses some of the failure modes and how Logstash addresses them. One important point is that outputs are responsible for retrying events in the case of errors. There are also internal, ephemeral queues within Logstash that can hold up to 20 events. Depending on the failure, there is a window for messages to be lost.

If you absolutely cannot risk losing messages, make sure you investigate all the failure modes and whether the plugins you are using are implemented correctly to handle them.

Update: LOGSTASH-1631 is a bug that demonstrates one way messages can be lost. It appears the internal messaging is going to be replaced with a more reliable implementation in the future.

Conclusion

Both Logstash and Fluentd are viable centralized logging frameworks that can transfer logs from multiple hosts to a central location. Logstash is incredibly flexible with many input and output plugins whereas fluentd provides fewer input and output sources but provides multiple options for reliably and robust transport.

« Realtime Web Server Log MetricsOpen-Source Service Discovery »

 Nov 19th, 2013  architecturefluentdlogginglogstashrealtime

Comments

id="dsq-2" data-disqus-uid="2" allowtransparency="true" frameborder="0" scrolling="no" tabindex="0" title="Disqus" width="100%" src="http://disqus.com/embed/comments/?base=default&disqus_version=ccfdfad2&f=jasonwilder-com&t_i=http%3A%2F%2Fjasonwilder.com%2Fblog%2F2013%2F11%2F19%2Ffluentd-vs-logstash%2F&t_u=http%3A%2F%2Fjasonwilder.com%2Fblog%2F2013%2F11%2F19%2Ffluentd-vs-logstash%2F&t_d=Fluentd%20vs%20Logstash%20-%20Jason%20Wilder%27s%20Blog&t_t=Fluentd%20vs%20Logstash%20-%20Jason%20Wilder%27s%20Blog&s_o=default#2" horizontalscrolling="no" verticalscrolling="no" style="margin: 0px; padding: 0px; font-family: inherit;font-size:undefined; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; vertical-align: baseline; width: 789px; border-style: none !important; overflow: hidden !important; height: 4420px !important;">
  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
LogstashFluentd都是流数据处理工具,但它们在设计理念和特点上有一些区别。 Fluentd的设计简洁,通过使用轻量级的消息传递机制,例如JSON,来传递数据。它的数据传递可靠性高,适用于分布式环境。然而,相对于LogstashFluentd的插件支持较少,其中一个常用的插件是logtail。 Logstash则具有更高的灵活性,可以用于验证原型或处理复杂数据的解析。它也有丰富的网络资料可供参考。但是,Logstash的性能和资源消耗是一些人所关注的问题。默认情况下,Logstash的堆大小为1GB,而且在处理大数据量时可能会比它的替代品慢很多。 综上所述,Fluentd适用于需要高可靠性和简洁设计的场景,而Logstash适用于灵活性要求较高、原型验证阶段或处理复杂数据的场景。选择哪个工具取决于您的具体需求和优先级。<span class="em">1</span><span class="em">2</span><span class="em">3</span> #### 引用[.reference_title] - *1* [日志客户端(Logstash,Fluentd, Logtail)横评](https://blog.csdn.net/weixin_33788244/article/details/90682043)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_2"}}] [.reference_item style="max-width: 50%"] - *2* *3* [Fluentd、Filebeat、Logstash 对比分析](https://blog.csdn.net/weixin_43273856/article/details/124845122)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_2"}}] [.reference_item style="max-width: 50%"] [ .reference_list ]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值