Filebeat: Technical Survey and Usage Summary

Article link: https://blog.csdn.net/wangchaodee/article/details/131617106

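Filebeat is a lightweight log collector that monitors log files and forwards them to Elasticsearch, Logstash, and similar systems for processing. This article describes how Filebeat works, including the roles of the harvester, the prospector, and libbeat, and how file state is maintained. It also covers Filebeat configuration options such as scan frequency, buffer sizes, and multiline log handling. In Kubernetes environments, Filebeat can be deployed as a DaemonSet to collect pod logs. Finally, it covers the monitoring endpoint, which exposes Filebeat's status and statistics.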


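Technical Survey

1. Introduction

Filebeat is a lightweight shipper for forwarding and centralizing log data. It monitors the log files or directories you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.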

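Collector comparison

Collector	Strengths	Weaknesses
Logstash	Flexible, many plugins	Performance and resource consumption; no on-disk buffering by default
Filebeat	Lightweight and reliable; supports JSON log input	Limited output options
Fluentd	JSON log output; many plugins	Performance is limited on large nodes; Fluent Bit targets small devices
Logagent	Supports many log formats; has a local buffer	
Logtail	Installed on Alibaba Cloud	
syslog	Fast	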

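2. How It Works

[Figure: filebeat-structure]
Filebeat works as follows:

When Filebeat starts, it starts one or more inputs, which look in the locations you specified for log data.
harvester:
	Reads the content of a single file.
	If the file is deleted or renamed while it is being read, Filebeat keeps reading it, because the harvester holds the file handle open.
prospector (called input in newer versions):
	Manages the harvesters and finds all the sources to read from.
	If the input type is log, the prospector finds all files matching the configured paths and starts one harvester per file.
libbeat:
	Aggregates the events and sends the aggregated data to the output configured for Filebeat.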

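How does Filebeat keep track of file state?
	Filebeat stores the state of each file and frequently flushes that state to the registry file on disk.
	The state records the last offset a harvester read from and ensures that every log line is sent.
	If the output (for example Elasticsearch or Logstash) is unreachable, Filebeat keeps track of the last line sent and resumes reading the file once the output is available again.
	While Filebeat is running, each prospector also keeps the file state in memory. When Filebeat restarts, it rebuilds the state from the registry file, and each harvester resumes from the last saved offset.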
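
The offset bookkeeping described above can be illustrated with a small Python sketch. This is not Filebeat's actual registry format or code: the JSON layout, the `read_new_lines` name, and the file names are assumptions made purely for illustration. It shows only the core idea of persisting the last read offset so that a restart resumes where reading stopped.

```python
import json
import os
import tempfile


def read_new_lines(log_path, registry_path):
    """Read lines appended since the last call, then persist the new offset."""
    # Load the saved state; a missing registry means "start from the beginning".
    state = {}
    if os.path.exists(registry_path):
        with open(registry_path, "r", encoding="utf-8") as f:
            state = json.load(f)

    offset = state.get(log_path, 0)
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            # Only complete lines are emitted; a partial line is re-read later.
            if not raw.endswith(b"\n"):
                break
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)

    # Flush the state to disk so that a restart resumes from this offset.
    state[log_path] = offset
    with open(registry_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    return lines


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log_file = os.path.join(workdir, "app.log")
    registry = os.path.join(workdir, "registry.json")

    with open(log_file, "w", encoding="utf-8") as f:
        f.write("first\nsecond\n")
    print(read_new_lines(log_file, registry))  # ['first', 'second']

    with open(log_file, "a", encoding="utf-8") as f:
        f.write("third\n")
    print(read_new_lines(log_file, registry))  # ['third']
```

Because the state lives on disk, the second call behaves the same whether it runs in the same process or after a restart.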

3. Supported Input and Output Types

Filebeat processing flow:
Input
Processing (Filter)
Output

Input types supported by the Filebeat input configuration:
Log
Stdin
Redis
UDP
Docker
TCP
Syslog

Output types supported by Filebeat:
ElasticSearch
LogStash
Kafka
Redis
File
Console
Cloud

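4. Configuration

input:
	scan_frequency: how often Filebeat scans for new files; default 10s.
	harvester_buffer_size: size of the buffer each harvester uses per read of a file; default 16KB (16384 bytes). A tuning knob for throughput.
	tags: tags added to every event, useful for filtering.
	
	ignore_older: Filebeat ignores files last modified before the given time span, for example 2h (two hours) or 5m (five minutes).
	close_older: closes the file handle if the file has not been updated within the given period. Default 1h. (Replaced by close_inactive, default 5m, since Filebeat 5.0.)
	force_close_files: Filebeat keeps the file handle open until close_older is reached, which causes problems if the file is deleted within that window. Setting force_close_files to true makes Filebeat close the handle as soon as it detects that the file name has changed. (Replaced by close_renamed and close_removed since Filebeat 5.0.)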

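	multiline: for logs in which a single entry spans several lines, such as the stack traces in error messages of various languages. It has the following sub-options:
	pattern: the regular expression to match. Commonly this is the pattern of the first line of a multi-line entry, used together with negate: true and match: after.
	negate: whether to negate the pattern. false (the default) applies the pattern as is; true inverts it, so that lines that do NOT match the pattern are the ones merged.
	match: how the lines selected by pattern and negate are combined: after appends them to the preceding line, before prepends them to the following line.
	max_lines: maximum number of lines merged into one event (including the line that matched the pattern); default 500.
	timeout: once the timeout expires, Filebeat sends the event collected so far even if no new pattern match has started a new event; default 5s.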
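
To make the interplay of pattern, negate, and max_lines concrete, here is a minimal Python sketch that mimics the merging for match: after only. It is an illustration, not Filebeat's implementation: the `merge_multiline` function is hypothetical, the timeout option is not modeled, and match: before is left out.

```python
import re


def merge_multiline(lines, pattern, negate=False, max_lines=500):
    """Group lines into events, mimicking multiline with match: after."""
    regex = re.compile(pattern)
    events, current = [], []
    for line in lines:
        matched = regex.search(line) is not None
        # negate=False: matching lines are continuations of the previous line.
        # negate=True: non-matching lines are continuations.
        is_continuation = matched != negate
        if is_continuation and current:
            # Lines beyond max_lines are dropped from the event.
            if len(current) < max_lines:
                current.append(line)
        else:
            if current:
                events.append("\n".join(current))
            current = [line]
    if current:
        events.append("\n".join(current))
    return events


if __name__ == "__main__":
    log = [
        "2023-07-08 ERROR boom",
        "java.lang.RuntimeException: boom",
        "    at Foo.bar(Foo.java:1)",
        "2023-07-08 INFO ok",
    ]
    # Every line that does not start with a date belongs to the previous event.
    for event in merge_multiline(log, r"^\d{4}-\d{2}-\d{2}", negate=True):
        print(repr(event))
```

With negate=True the date pattern marks the first line of each entry, so the sample yields two events: the three-line stack trace and the single INFO line.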


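tail_files: if true, Filebeat starts reading at the end of each file and sends each newly appended line as an event, rather than resending everything from the beginning of the file.
backoff: how long Filebeat waits before checking a file again for updates after reaching EOF; default 1s.
max_backoff: the maximum time Filebeat waits before checking a file again after reaching EOF; default 10s.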
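
The relation between backoff and max_backoff is easier to see as a schedule of waits. The sketch below assumes the wait is multiplied by a factor of 2 after each unsuccessful check (the default of Filebeat's backoff_factor option) and capped at max_backoff; the `backoff_schedule` function itself is hypothetical.

```python
def backoff_schedule(backoff=1.0, max_backoff=10.0, factor=2.0, checks=6):
    """Return the wait in seconds before each successive EOF re-check."""
    waits = []
    wait = backoff
    for _ in range(checks):
        waits.append(wait)
        # Grow the wait after every check that finds no new data,
        # but never beyond max_backoff.
        wait = min(wait * factor, max_backoff)
    return waits


if __name__ == "__main__":
    print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

As soon as new data appears in the file, the wait starts again from backoff, which this sketch does not model.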

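spool_size: size of the spooler, as a number of events. When the number of events in the spooler reaches this threshold, the spooler is flushed to the output regardless of the timeout. Default 2048.
idle_timeout: spooler timeout. When it expires, the spooler is flushed regardless of whether the size threshold has been reached. Default 5s.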
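
The two flush triggers can be sketched as a small class. This is a simplified model, not Filebeat code: the `Spooler` class is hypothetical, spool_size is treated as an event count, and time is passed in explicitly so the behavior is deterministic.

```python
class Spooler:
    """Buffer events and flush on a size threshold or an idle timeout."""

    def __init__(self, spool_size, idle_timeout):
        self.spool_size = spool_size
        self.idle_timeout = idle_timeout
        self.events = []
        self.last_flush = 0.0
        self.flushed = []  # batches handed to the (pretend) output

    def _flush(self, now):
        if self.events:
            self.flushed.append(self.events)
            self.events = []
        self.last_flush = now

    def add(self, event, now):
        """Queue one event; flush immediately once spool_size is reached."""
        self.events.append(event)
        if len(self.events) >= self.spool_size:
            self._flush(now)

    def tick(self, now):
        """Called periodically; flush if idle_timeout has elapsed."""
        if now - self.last_flush >= self.idle_timeout:
            self._flush(now)


if __name__ == "__main__":
    spooler = Spooler(spool_size=3, idle_timeout=5.0)
    for t, event in enumerate(["a", "b", "c", "d"]):
        spooler.add(event, now=float(t))
    spooler.tick(now=8.0)
    print(spooler.flushed)  # [['a', 'b', 'c'], ['d']]
```

The first batch is flushed by size, the second by the timeout.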



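General
queue: the internal queue that buffers events. When the queue is full, inputs cannot write to it until the output has consumed events from it.
	mem.events: maximum number of events the internal queue can hold; default 4096.
	flush.min_events: minimum number of queued events required before they are forwarded to the output; default 2048.
	flush.timeout: maximum time to wait before flushing queued events to the output; default 1s.
	
	Note: raising mem.events, flush.min_events, and flush.timeout uses more memory and gives up some latency in exchange for higher throughput.
	queue.mem.events = 2 * workers * batch size
	queue.mem.flush.min_events = batch size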
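
As a worked example of the two sizing formulas, the following sketch computes the queue settings from the output's worker count and batch size. The `queue_settings` helper is hypothetical, and "batch size" is taken here to mean the output's bulk_max_size.

```python
def queue_settings(workers, bulk_max_size):
    """Apply the sizing rules: events = 2 * workers * batch size."""
    return {
        "queue.mem.events": 2 * workers * bulk_max_size,
        "queue.mem.flush.min_events": bulk_max_size,
    }


if __name__ == "__main__":
    # Two output workers, each sending batches of 2048 events.
    for key, value in queue_settings(workers=2, bulk_max_size=2048).items():
        print(f"{key}: {value}")
```

For two workers and a batch size of 2048 this gives queue.mem.events = 8192 and queue.mem.flush.min_events = 2048.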





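output.kafka:
	bulk_max_size: maximum number of events batched into a single Kafka request; default 2048.
	bulk_flush_frequency: how long to wait before sending a bulk Kafka request; default 0 (no delay). Comparable to the Kafka producer's linger.ms.

output.logstash:
  worker: 2
  pipelining: 2   ## number of batches sent asynchronously while waiting for ACKs
  bulk_max_size: 2048
  timeout: 30s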

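Processors

processors:
  # Rename field "a" to "b".
  - rename:
      fields:
        - from: "a"
          to: "b"
  # Drop events whose tags field equals "log".
  - drop_event:
      when:
        equals:
          tags: "log"
  # Drop events that have no "sid" field.
  - drop_event:
      when:
        not:
          has_fields: ['sid']
  # Parse start_time into @timestamp using Go time layouts.
  - timestamp:
      field: start_time
      timezone: Asia/Shanghai
      layouts:
        - '2006-01-02T15:04:05Z'
        - '2006-01-02T15:04:05.999Z'
        - '2006-01-02T15:04:05.999-07:00'
      test:
        - '2019-06-22T16:33:51Z'
        - '2019-11-18T04:59:51.123Z'
        - '2020-08-03T07:10:20.123456+02:00'
  # Remove the original field once it has been parsed.
  - drop_fields:
      fields: [start_time]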
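
The effect of such a processor chain on a single event can be sketched in Python. This only models the intended semantics of rename, the two drop_event conditions, and drop_fields; the timestamp step is omitted, the `apply_processors` function is hypothetical, and events are assumed to be flat dictionaries.

```python
def apply_processors(event):
    """Return the transformed event, or None if the event is dropped."""
    event = dict(event)  # work on a copy, leave the caller's dict untouched

    # rename: a -> b
    if "a" in event:
        event["b"] = event.pop("a")

    # drop_event when tags equals "log"
    if event.get("tags") == "log":
        return None

    # drop_event when the event has no "sid" field
    if "sid" not in event:
        return None

    # drop_fields: start_time (the timestamp parsing step is not modeled)
    event.pop("start_time", None)
    return event


if __name__ == "__main__":
    print(apply_processors({"a": 1, "sid": "s-1", "start_time": "2019-06-22T16:33:51Z"}))
    print(apply_processors({"a": 1}))  # None: no sid field
```

Processors run in the order listed, so an event dropped by an early drop_event never reaches the later steps.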

References

Basic usage of Filebeat

Filebeat collection principles and monitoring metrics

Usage Summary

Collecting ordinary logs

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/messages
    fields:
      tags: sysLog
    fields_under_root: false       # true places the custom fields at the top level of the event
    ignore_older: 1h
    tail_files: true
    include_lines: ['sometext']
    exclude_lines: ['^DBG']
    harvester_buffer_size: 16384   # 16 KB
    scan_frequency: 10s
    max_bytes: 10485760            # 10 MB
    json:
      keys_under_root: true
      add_error_key: true
      message_key: log
    multiline: xxx                 # placeholder for the multiline sub-options
    backoff: 1s                    # after reaching EOF, how long to wait before checking the file again
    max_backoff: 10s               # the wait between checks grows up to this maximum
    harvester_limit: 0             # maximum number of harvesters for this input; 0 means no limit
    tags: ["json"]
    processors: xxx
    pipeline: xxx
    publisher_pipeline.disable_host: false   # whether to disable the host.name field

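Collecting Kubernetes pod logs

Kubernetes log targets are not fixed

https://www.elastic.co/cn/blog/kubernetes-observability-tutorial-k8s-log-monitoring-and-analysis-elastic-stack

Kubernetes orchestrates workloads by deploying containers onto available hosts. As a result, application components end up spread across different hosts, and there is no way to know in advance where a component will land.

Containers running in Kubernetes pods write their logs to stdout or stderr. These logs are stored in a location known to the kubelet, in files named after the pod id. To associate logs with the component or pod that produced them, you need to find out which component pods are running on the current host and what their ids are.

Filebeat is well suited to capturing these moving targets

To collect pod logs, you only need to run Filebeat as a DaemonSet in the Kubernetes cluster. Filebeat can be configured to talk to the local kubelet API, get the list of pods running on the current host, and collect the logs those pods generate. The logs are annotated with all relevant Kubernetes metadata, such as pod id, container name, container labels, and annotations.

Filebeat uses these annotations to discover which components run in which pods, and can then decide which logging module to apply to the logs it is processing. No manual work is required, which makes collecting Kubernetes logs with Filebeat straightforward.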

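Log directories

Kubernetes uses the following three log directories:

/var/lib/docker/containers/
/var/log/containers/
/var/log/pods/

If a directory is remapped through a mount, you need to know the mounted path, for example /data/docker/containers.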
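
Files under /var/log/containers/ carry pod information in their names, which is what allows path-based matching of logs to pods. The sketch below assumes the common kubelet naming scheme `<pod>_<namespace>_<container>-<container id>.log` with a 64-character hexadecimal container id; the `parse_container_log_name` helper and the sample names are hypothetical.

```python
import os
import re

# Assumed layout: <pod>_<namespace>_<container>-<64 hex chars>.log
_LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_container_log_name(path):
    """Split a container log file name into its parts, or return None."""
    match = _LOG_NAME.match(os.path.basename(path))
    return match.groupdict() if match else None


if __name__ == "__main__":
    container_id = "0123456789abcdef" * 4  # made-up 64-character id
    path = f"/var/log/containers/test-job-abc12_elk_worker-{container_id}.log"
    print(parse_container_log_name(path))
```

Pod and namespace names cannot contain underscores, so splitting on the first two underscores is unambiguous under this assumption.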

Resolving Kubernetes read-permission issues

Filebeat provides a manifest that creates the ServiceAccount and binds the required roles:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elk
  labels:
    k8s-app: filebeat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
    - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources:
    - jobs
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  namespace: elk
  labels:
    k8s-app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: elk
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: elk
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: elk
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elk
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: elk
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elk
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config-job
  namespace: elk
  labels:
    app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      paths:
        - /var/log/containers/test-*.log
      fields:
        tags: jobLog
    processors:
      - add_kubernetes_metadata:
          default_indexers.enabled: false
          default_matchers.enabled: true
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - rename:
          fields:
            - from: "kubernetes.labels.sid"
              to: "sid"
            - from: "kubernetes.pod.name"
              to: "pod"
          ignore_missing: true
      - drop_fields:
          fields: ["kubernetes",  "container", "log", "input", "beat", "offset"]
          ignore_missing: true
    output.logstash:
      hosts: ["xxxx:5044"]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: filebeat-job
  name: filebeat-job
  namespace: elk
spec:
  selector:
    matchLabels:
      k8s-app: filebeat-job
  template:
    metadata:
      labels:
        k8s-app: filebeat-job
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      imagePullSecrets:
        - name: registry-pull-secret
      containers:
        - image: xxxx/docker.io/elastic/filebeat:7.17.9
          name: filebeat
          volumeMounts:
            - name: filebeat-config-job
              mountPath: /etc/filebeat.yml
              subPath: filebeat.yml
              readOnly: true
            - name: job-logs
              mountPath: /var/log/containers
              readOnly: true
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: dockercontainers
              mountPath: /data/docker/containers
              readOnly: true
          args: [
            "-c", "/etc/filebeat.yml",
            "-e",
          ]
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 500Mi
          securityContext:
            runAsUser: 0
          env:
            - name: TZ
              value: "CST-8"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      volumes:
        - name: filebeat-config-job
          configMap:
            name: filebeat-config-job
        - name: job-logs
          hostPath:
            path: /var/log/containers
            type: Directory
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainers
          hostPath:
            path: /data/docker/containers
        - name: data
          hostPath:
            path: /var/log/filebeat-data
            type: DirectoryOrCreate
      nodeSelector:
        kubernetes.io/hostname: jobserver-101-128
---

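Filebeat monitoring endpoint

Uncomment http.enabled and http.port in filebeat.yml to enable the endpoint (the default port is 5066; this example uses 5067):

#http.enabled: true  
#http.port: 5067  
#monitoring.enabled: false

View Filebeat status information:
curl 'localhost:5067/?pretty'

View Filebeat monitoring statistics:
curl 'localhost:5067/stats?pretty'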
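
The same endpoint can be polled from a script. The sketch below uses only the Python standard library. Port 5067 is the one used in this article, and the JSON keys read by `summarize` (filebeat.harvester.running, filebeat.harvester.open_files, libbeat.output.events.acked) are assumptions about the layout of the /stats response, so missing keys are returned as None rather than raising.

```python
import json
from urllib.request import urlopen


def stats_url(host="localhost", port=5067):
    """Build the URL of the stats endpoint."""
    return f"http://{host}:{port}/stats"


def summarize(stats):
    """Pick a few counters out of the /stats JSON; missing keys become None."""
    def dig(*keys):
        node = stats
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node

    return {
        "harvesters_running": dig("filebeat", "harvester", "running"),
        "open_files": dig("filebeat", "harvester", "open_files"),
        "events_acked": dig("libbeat", "output", "events", "acked"),
    }


def fetch_stats(host="localhost", port=5067, timeout=5):
    """Fetch and decode the stats document from a running Filebeat."""
    with urlopen(stats_url(host, port), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a Filebeat instance with http.enabled: true on this port.
    print(summarize(fetch_stats()))
```

Comparing events_acked between two polls gives a rough delivery rate without any external monitoring stack.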