Centralized Logging

JAN 3RD, 2012

Logs are a critical part of any system: they give you insight into what a system is doing as well as what has happened. Virtually every process running on a system generates logs in some form or another. Usually, these logs are written to files on local disks. When your system grows to multiple hosts, managing and accessing the logs can get complicated. Searching for a particular error across hundreds of log files on hundreds of servers is difficult without good tools. A common approach to this problem is to set up a centralized logging solution so that logs from multiple hosts can be aggregated in a central location.

So what are your options?

File Replication

A simple approach is to set up file replication of your logs to a central server on a cron schedule. rsync and cron are commonly used since they are simple and straightforward to set up. This solution can work for a while, but it doesn't provide timely access to log data. It also doesn't truly aggregate the logs; it only co-locates them.
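
For illustration, a minimal sketch of this approach might look like the following cron entry (the host name loghost and the paths are assumptions, not part of any particular setup):

    # /etc/cron.d/log-sync -- hypothetical example
    # Every 15 minutes, mirror local logs into a per-host directory on "loghost"
    */15 * * * * root rsync -az /var/log/ loghost:/srv/logs/$(hostname)/

Note the trade-off described above: the files arrive on a 15-minute delay and remain separate per-host files rather than a single aggregated stream.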

Syslog

Another option that you probably already have installed is syslog. Most people use rsyslog or syslog-ng, two common syslog implementations. These daemons allow processes to send log messages to them, and the syslog configuration determines how the messages are stored. In a centralized logging setup, a central syslog daemon is set up on your network and the client logging daemons are configured to forward messages to the central daemon. A good write-up of this kind of setup can be found at: Centralized Logging Use Rsyslog
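
As a concrete sketch, forwarding with rsyslog can be as simple as the following (the server name and the choice of TCP on port 514 are assumptions for illustration):

    # On each client, e.g. /etc/rsyslog.d/forward.conf:
    # forward all facilities/priorities to the central server over TCP
    # (@@ = TCP, a single @ = UDP)
    *.* @@logserver.example.com:514

    # On the central server, load the TCP input module and listen on port 514:
    $ModLoad imtcp
    $InputTCPServerRun 514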

Syslog is great because just about everything uses it and you likely already have it installed. With a central syslog server, though, you will need to figure out how to scale the server and make it highly available.

Distributed Log Collectors

A new class of solutions has emerged, designed for high-volume, high-throughput log and event collection. Most of these are general-purpose event streaming and processing systems, and logging is just one use case they can solve. Each has its own specific features and differences, but their architectures are fairly similar. They generally consist of logging clients and/or agents on each host. The agents forward logs to a cluster of collectors, which in turn forward the messages to a scalable storage tier. The idea is that the collection tier is horizontally scalable, growing with the increasing number of logging hosts and messages. Similarly, the storage tier is intended to scale horizontally with increased volume. This is a gross simplification of these tools, but they are a step beyond traditional syslog options.
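
To make the agent/collector split concrete, here is a minimal, hypothetical sketch of an agent that tails a file and forwards new lines to a collector over TCP (the host, port, and newline framing are illustrative assumptions, not any particular tool's protocol):

    # Hypothetical log-shipping agent: tail a file, forward lines to a collector.
    import socket
    import time

    COLLECTOR = ("collector.example.com", 5140)  # assumed collector address

    def follow(path):
        """Yield lines as they are appended to the file, like `tail -f`."""
        with open(path) as f:
            f.seek(0, 2)  # start at the end of the file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)
                    continue
                yield line

    def ship(path):
        sock = socket.create_connection(COLLECTOR)
        for line in follow(path):
            sock.sendall(line.encode("utf-8"))

    if __name__ == "__main__":
        ship("/var/log/messages")

A real agent adds batching, reconnection, and back-pressure handling; the point here is only the shape of the data flow.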

  • Scribe - Scribe is a scalable and reliable log aggregation server developed and released by Facebook as open source. Scribe is written in C++ and uses Thrift for the protocol encoding, so virtually any language can work with it.

  • Flume - Flume is an Apache project for collecting, aggregating, and moving large amounts of log data. It stores all this data on HDFS.

  • logstash - logstash lets you ship, parse, and index logs from any source. It works by defining inputs (files, syslog, etc.), filters (grep, split, multiline, etc.), and outputs (elasticsearch, mongodb, etc.); a minimal sample configuration appears after this list. It also provides a UI for accessing and searching your logs. See Getting Started

  • Chukwa - Chukwa is another Apache project that collects logs onto HDFS.

  • fluentd - Fluentd is similar to logstash in that it has inputs and outputs for a large variety of sources and destinations. Some of its design tenets are easy installation and a small footprint. It doesn't provide any storage tier itself but lets you easily configure where your logs should be collected (see the sample configuration below).

  • kafka - Kafka was developed at LinkedIn for their activity stream processing and is now an Apache incubator project. Although Kafka can be used for log collection, that is not its primary use case. Setup requires ZooKeeper to manage the cluster state.

  • Graylog2 - Graylog2 provides a UI for searching and analyzing logs. Logs are stored in MongoDB and/or elasticsearch. Graylog2 also provides the GELF logging format to overcome some shortcomings of syslog messages: the 1024-byte limit and unstructured log messages. If you are logging long stack traces, you may want to look into GELF.

  • splunk - Splunk is a commercial product that has been around for several years. It provides a whole host of features for not only collecting logs but also analyzing and viewing them.
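
As referenced above, a minimal logstash configuration wiring an input to an output might look like this (the path is an assumption, and exact option names have varied across logstash versions):

    input {
      file {
        path => "/var/log/messages"
        type => "syslog"
      }
    }
    # parsing/transformation stages (grep, split, multiline, ...) would go in a filter { } block
    output {
      elasticsearch { }
      stdout { }
    }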

Update: I wrote a post comparing Fluentd vs Logstash.
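
For contrast with the logstash sample above, a similarly minimal fluentd configuration might look like this (the tag, paths, and file output are illustrative assumptions):

    <source>
      type tail
      path /var/log/messages
      format syslog
      tag system.messages
    </source>

    # route everything tagged system.* to local files; swap in another
    # output plugin to send the logs elsewhere instead
    <match system.**>
      type file
      path /var/log/fluent/system
    </match>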

Hosted Logging Services

There are also several hosted "logging as a service" providers. The benefit is that you only need to configure your syslog forwarders or agents, and they manage the collection, storage, and access to the logs. All of the infrastructure that you would otherwise have to set up and maintain is handled for you, freeing you up to focus on your application. Each service provides a simple setup (usually based on syslog forwarding), an API, and a UI to support search and analysis.

I go into more detail on how all of these fit together in Centralized Logging Architecture.
