Logstash internals and configuration file explained: time, date, time zone, IP, backslash, online grok tester, type conversion

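  1. Basic configuration

    Logstash itself cannot form a cluster. Once Filebeat is connected to Logstash, it automatically polls the Logstash servers for availability and sends its data to one that is reachable.

    The Logstash configuration below listens on port 5044, receives the logs sent by Filebeat, filters them with grok, sets a different type for each kind of log, and stores the logs in the Elasticsearch cluster.

    Project logs and nginx logs are configured together. The Elasticsearch index name must be all lowercase: Elasticsearch rejects index names that contain uppercase letters, and the resulting failures can look like strange bugs.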

    input {
      beats {
        port => "5044"
      }
    }
     
    filter {
      date {
        match => ["@timestamp", "yyyy-MM-dd HH:mm:ss"]
      }
      grok {
        match => {
          "source" => "(?<type>([A-Za-z]*-[A-Za-z]*-[A-Za-z]*)|([A-Za-z]*-[A-Za-z]*)|access|error)"
        }
      }
      mutate {
        convert => [ "upstream_response_time", "float" ]
      }
    }
     
    output {
      # Each project log needs its own condition
      if [type] == "MS-System-OTA"{
        elasticsearch {
          hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
          index => "logstash-ms-system-ota-%{+YYYY.MM.dd}"
        }
      }else if [type] == "access" or [type] == "error"{
        elasticsearch {
          hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
          index => "logstash-nginx-%{+YYYY.MM.dd}"
        }
      }else{
        elasticsearch {
          hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
        }
      }
      stdout {
        codec => rubydebug
      }
    }
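
    To see what the grok expression on the source field captures, and why the index names in the output are written in lowercase, the logic can be replayed outside Logstash. The following is a minimal Ruby sketch, not part of the original configuration. The sample paths are hypothetical, the helper names extract_type and index_name are made up for illustration, and it assumes Ruby's regex engine treats this expression the same way grok's does.

```ruby
# The grok expression from the filter, written as a plain Ruby regex.
# Returns the value that would land in the "type" field, or nil if nothing matches.
def extract_type(source)
  match = /(?<type>([A-Za-z]*-[A-Za-z]*-[A-Za-z]*)|([A-Za-z]*-[A-Za-z]*)|access|error)/.match(source)
  match && match[:type]
end

# Builds a daily index name; the type is lowercased because Elasticsearch
# does not accept uppercase letters in index names.
def index_name(type, time)
  "logstash-#{type.downcase}-#{time.utc.strftime('%Y.%m.%d')}"
end

# Hypothetical Filebeat source paths, for illustration only.
[
  "/data/logs/MS-System-OTA/app.log",
  "/var/log/nginx/access.log",
  "/var/log/syslog"
].each do |source|
  puts "#{source} -> #{extract_type(source).inspect}"
end

puts index_name("MS-System-OTA", Time.utc(2018, 9, 18))
```

    Note that the expression matches the first hyphenated run of letters anywhere in the path, so a hyphen in a parent directory name would be picked up as the type.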
    
  2. Logstash grok-patterns

    Grok is one of the most important Logstash plugins. We use it to parse log files and extract the data we need. The predefined patterns are:

    USERNAME [a-zA-Z0-9._-]+
    USER %{USERNAME}
    INT (?:[+-]?(?:[0-9]+))
    BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
    NUMBER (?:%{BASE10NUM})
    BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
    BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
    
    POSINT \b(?:[1-9][0-9]*)\b
    NONNEGINT \b(?:[0-9]+)\b
    WORD \b\w+\b
    NOTSPACE \S+
    SPACE \s*
    DATA .*?
    GREEDYDATA .*
    QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
    UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
    
    # Networking
    MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
    CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
    WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
    COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
    IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
    IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
    IP (?:%{IPV6}|%{IPV4})
    HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
    HOST %{HOSTNAME}
    IPORHOST (?:%{HOSTNAME}|%{IP})
    HOSTPORT %{IPORHOST}:%{POSINT}
    
    # paths
    PATH (?:%{UNIXPATH}|%{WINPATH})
    UNIXPATH (?>/(?>[\w_%!$@:.,-]+|\\.)*)+
    TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
    WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
    URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
    URIHOST %{IPORHOST}(?::%{POSINT:port})?
    # uripath comes loosely from RFC1738, but mostly from what Firefox
    # doesn't turn into %XX
    URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
    #URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
    URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
    URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
    URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
    
    # Months: January, Feb, 3, 03, 12, December
    MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
    MONTHNUM (?:0?[1-9]|1[0-2])
    MONTHNUM2 (?:0[1-9]|1[0-2])
    MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
    
    # Days: Monday, Tue, Thu, etc...
    DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
    
    # Years?
    YEAR (?>\d\d){1,2}
    HOUR (?:2[0123]|[01]?[0-9])
    MINUTE (?:[0-5][0-9])
    # '60' is a leap second in most time standards and thus is valid.
    SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
    TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
    # datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
    DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
    DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
    ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
    ISO8601_SECOND (?:%{SECOND}|60)
    TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
    DATE %{DATE_US}|%{DATE_EU}
    DATESTAMP %{DATE}[- ]%{TIME}
    TZ (?:[PMCE][SD]T|UTC)
    DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
    DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
    DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
    DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}
    
    # Syslog Dates: Month Day HH:MM:SS
    SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
    PROG (?:[\w._/%-]+)
    SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
    SYSLOGHOST %{IPORHOST}
    SYSLOGFACILITY <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
    HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
    
    # Shortcuts
    QS %{QUOTEDSTRING}
    
    # Log formats
    SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
    COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
    COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
    
    # Log Levels
    LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
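
    Most entries in the list are built from other entries through %{NAME} references. As a rough illustration of how such references resolve into one regular expression, here is a small Ruby sketch. It is not how Logstash itself is implemented: expand_grok and grok_definitions are hypothetical helpers, only three definitions are copied over, and the input sshd[4222] is an invented sample.

```ruby
# Three definitions copied from the pattern list.
def grok_definitions
  {
    "PROG"       => '(?:[\w._/%-]+)',
    "POSINT"     => '\b(?:[1-9][0-9]*)\b',
    "SYSLOGPROG" => '%{PROG:program}(?:\[%{POSINT:pid}\])?'
  }
end

# Replaces every %{NAME} or %{NAME:field} reference with the regex it stands for,
# recursing so that patterns defined in terms of other patterns are fully resolved.
def expand_grok(expression, definitions)
  expression.gsub(/%\{(\w+)(?::(\w+))?\}/) do
    name  = Regexp.last_match(1)
    field = Regexp.last_match(2)
    body  = expand_grok(definitions.fetch(name), definitions)
    field ? "(?<#{field}>#{body})" : "(?:#{body})"
  end
end

regex = Regexp.new(expand_grok("%{SYSLOGPROG}", grok_definitions))
puts regex.source
puts regex.match("sshd[4222]").named_captures.inspect
```

    A reference with a field name becomes a named capture group, which is where the fields on the event come from; a reference without one only contributes its regex.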
    
  3. Grok demos for several different messages read from log files
    1. Handling the message of nginx error.log (with upstream)
    # message:   2018/09/18 16:33:51 [error] 15003#0: *545757 no live upstreams while connecting to upstream, client: 39.108.4.83, server: dev-springboot-admin.tvflnet.com, request: "POST /instances HTTP/1.1", upstream: "http://localhost/instances", host: "dev-springboot-admin.tvflnet.com"
    
    filter {
      # Define the data format
      grok {
        match => { "message" => "%{DATA:timestamp}\ \[%{DATA:level}\] %{DATA:nginxmessage}\, client: %{DATA:client}\, server: %{DATA:server}\, request: \"%{DATA:request}\", upstream: \"%{DATA:upstream}\", host: \"%{DATA:host}\""}
      }
    }
    
    2. Handling the message of nginx error.log (with referrer)
    # message:    2018/04/19 20:40:27 [error] 4222#0: *53138 open() "/data/local/project/WebSites/AppOTA/theme/js/frame/layer/skin/default/icon.png" failed (2: No such file or directory), client: 218.17.216.171, server: dev-app-ota.tvflnet.com, request: "GET /theme/js/frame/layer/skin/default/icon.png HTTP/1.1", host: "dev-app-ota.tvflnet.com", referrer: "http://dev-app-ota.tvflnet.com/theme/js/frame/layer/skin/layer.css"
    
    filter {
      # Define the data format
      grok {
        match => { "message" => "%{DATA:timestamp}\ \[%{DATA:level}\] %{DATA:nginxmessage}\, client: %{DATA:client}\, server: %{DATA:server}\, request: \"%{DATA:request}\", host: \"%{DATA:host}\", referrer: \"%{DATA:referrer}\""}
      }
    }
    
    3. Handling the message of the Lua error.log
    # message:    2018/09/05 18:02:19 [error] 2325#0: *17083157 [lua] PushFinish.lua:38: end push statistics, client: 119.137.53.205, server: dev-system-ota-statistics.tvflnet.com, request: "POST /upgrade/push HTTP/1.1", host: "dev-system-ota-statistics.tvflnet.com"
    
    filter {
      # Define the data format
      grok {
        match => { "message" => "%{DATA:timestamp}\ \[%{DATA:level}\] %{DATA:luamessage}\, client: %{DATA:client}\, server: %{DATA:server}\, request: \"%{DATA:request}\", host: \"%{DATA:host}\""}
      }
    }
    
    4. Handling the message of the TV-client API log
    # message:    traceid:[Thread:943-sn:sn-mac:mac] 2018-09-18 11:07:03.525 DEBUG com.flnet.utils.web.log.DogLogAspect 55 - Params-参数(JSON):{"backStr":"{\"groupid\":5}","build":201808310938,"ip":"119.147.146.189","mac":"mac","modelCode":"SHARP_0_50#SHARP#IQIYI#LCD_50SUINFCA_H","sn":"sn","version":"modelCode"}
    
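    filter {
      # Define the data format
      grok {
        match => { "message" => "traceid:%{DATA:traceid}\[Thread:%{DATA:thread}\-sn:%{DATA:sn}\-mac:%{DATA:mac}\]\ %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}"}
        # Replace the original message with the captured remainder instead of appending to it
        overwrite => [ "message" ]
      }
    }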
    
    5. Handling the message of project logs
    # message:    traceid:[] 2018-09-14 02:14:48.209 WARN  de.codecentric.boot.admin.client.registration.ApplicationRegistrator 115 - Failed to register application as Application(name=ta-system-ota, managementUrl=http://TV-DEV-API01:10005/actuator, healthUrl=http://TV-DEV-API01:10005/actuator/health, serviceUrl=http://TV-DEV-API01:10005/, metadata={startup=2018-09-10T10:20:41.812+08:00}) at spring-boot-admin ([https://dev-springboot-admin.tvflnet.com/instances]): I/O error on POST request for "https://dev-springboot-admin.tvflnet.com/instances": connect timed out; nested exception is java.net.SocketTimeoutException: connect timed out. Further attempts are logged on DEBUG level
    
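    filter {
      # Define the data format
      grok {
        match => { "message" => "traceid:\[%{DATA:traceid}\] %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}"}
        # Replace the original message with the captured remainder instead of appending to it
        overwrite => [ "message" ]
      }
    }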
    
    6. Logs in the JSON format configured in nginx
    # message:     {"@timestamp":"2018-09-20T02:47:00+08:00", "http_host":"system-ota-tvapi.tvflnet.com", "status":"200", "method":"HEAD / HTTP/1.1", "request_body":"-", "url":"/index.html", "host":"172.18.156.12", "clientip":"100.116.222.149", "size":"0", "responsetime":"0.000", "upstreamtime":"-", "upstreamhost":"-", "xff":"140.205.205.25", "referer":"-", "agent":"Go-http-client/1.1"}
    filter {
      # Define the data format
      grok {
        match => { "message" =>  "{\"@timestamp\":\"%{TIMESTAMP_ISO8601:timestamp}\", \"http_host\":\"%{DATA:http_host}\", \"status\":\"%{DATA:status}\", \"method\":\"%{DATA:method}\", \"request_body\":\"%{DATA:request_body}\", \"url\":\"%{DATA:url}\", \"host\":\"%{DATA:host}\", \"clientip\":\"%{DATA:clientip}\", \"size\":\"%{DATA:size}\", \"responsetime\":\"%{DATA:responsetime}\", \"upstreamtime\":\"%{DATA:upstreamtime}\", \"upstreamhost\":\"%{DATA:upstreamhost}\", \"xff\":\"%{DATA:xff}\", \"referer\":\"%{DATA:referer}\", \"agent\":\"%{DATA:agent}\"}"
      }
    }
    

    When the messages come in several different formats, configure multiple grok patterns.
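
    The idea of trying several patterns against one message can be sketched in Ruby as follows. This is only an approximation under stated assumptions: the two regexes are hand translations of the nginx error.log expressions (with DATA written as .*?), first_match and nginx_error_patterns are hypothetical helper names, and the first pattern that matches wins.

```ruby
# Hand-written Ruby versions of the two nginx error.log expressions; DATA becomes .*?
def nginx_error_patterns
  [
    /(?<timestamp>.*?) \[(?<level>.*?)\] (?<nginxmessage>.*?), client: (?<client>.*?), server: (?<server>.*?), request: "(?<request>.*?)", upstream: "(?<upstream>.*?)", host: "(?<host>.*?)"/,
    /(?<timestamp>.*?) \[(?<level>.*?)\] (?<nginxmessage>.*?), client: (?<client>.*?), server: (?<server>.*?), request: "(?<request>.*?)", host: "(?<host>.*?)", referrer: "(?<referrer>.*?)"/
  ]
end

# Tries each pattern in order and returns the captures of the first one that matches,
# or nil when none of them does.
def first_match(message, patterns)
  patterns.each do |pattern|
    match = pattern.match(message)
    return match.named_captures if match
  end
  nil
end

# Sample message taken from the first nginx demo.
sample = '2018/09/18 16:33:51 [error] 15003#0: *545757 no live upstreams while connecting to upstream, client: 39.108.4.83, server: dev-springboot-admin.tvflnet.com, request: "POST /instances HTTP/1.1", upstream: "http://localhost/instances", host: "dev-springboot-admin.tvflnet.com"'

fields = first_match(sample, nginx_error_patterns)
puts fields["level"]
puts fields["upstream"]
```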
    Logstash start command: nohup ./bin/logstash -f ./config/conf.d/logstash-simple.conf >/dev/null 2>&1 &

  4. Handling dates and times
    filter {
      date {
        # When several formats are listed, any of them can match
        match => [ "logdate", "MMM dd yyyy HH:mm:ss", "ISO8601" ]
        target => "fieldName1"
        timezone => "Asia/Shanghai"
      }
    }

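Options specific to the date plugin:

  • locale

    • string type
    • no default
      Specifies the locale, for example en or en-US. It is mainly used to parse non-numeric month and day names such as Monday or May. If the date and time are entirely numeric, this value can be ignored.
  • match

    • array type
    • default: []
      Parses the given field using the given formats, for example:
    match => ["createtime", "yyyyMMdd","yyyy-MM-dd"]
    

    The first value is the field name and the remaining values are the formats to try. If several formats are possible, list all of them.

  • tag_on_failure

    • array type
    • default: ["_dateparsefailure"]
      The value appended to the tags field when date parsing fails.
  • target

    • string type
    • default: @timestamp
    • The name of the field in which the converted date is stored
  • timezone

    • string type
    • no default
      Specifies the time zone of the time being parsed. The value is a canonical time zone ID such as Asia/Shanghai or Europe/Vienna.
      It usually does not need to be set, because the value is taken from the current system time zone.
      The time zone set here is not the time zone of the value Logstash finally stores; Logstash always stores the time in UTC.
      For example, if the time being parsed is 20171120:

    with the time zone Asia/Shanghai, the converted time is 2017-11-19T16:00:00.000Z;
    with the time zone Europe/Vienna, the converted time is 2017-11-19T23:00:00.000Z.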
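    Working around the time zone difference

    ruby {
      # Shift @timestamp forward by 8 hours and store the result in a scratch field
      code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
    }
    ruby {
      # Copy the shifted value back into @timestamp
      code => "event.set('@timestamp',event.get('timestamp'))"
    }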
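
    The two conversions quoted for 20171120, and the effect of the 8-hour shift, can be checked with plain time arithmetic. The Ruby sketch below is illustrative only. It assumes fixed offsets in place of the named zones (UTC+8 for Asia/Shanghai, and UTC+1 for Europe/Vienna, which is on CET in November), and zone_offsets, stored_utc and shifted_by_eight_hours are hypothetical helper names.

```ruby
require "time"

# Fixed UTC offsets standing in for the named zones on 2017-11-20 (an assumption
# of this sketch; a real zone database would also handle daylight saving time).
def zone_offsets
  { "Asia/Shanghai" => "+08:00", "Europe/Vienna" => "+01:00" }
end

# Interprets midnight of the given date in the zone and returns the UTC instant
# formatted as ISO 8601 with milliseconds.
def stored_utc(year, month, day, zone)
  Time.new(year, month, day, 0, 0, 0, zone_offsets.fetch(zone)).utc.iso8601(3)
end

# Adds eight hours to a stored UTC instant, as the ruby filter workaround does.
def shifted_by_eight_hours(utc_string)
  (Time.iso8601(utc_string) + 8 * 60 * 60).utc.iso8601(3)
end

zone_offsets.each_key do |zone|
  puts "#{zone}: #{stored_utc(2017, 11, 20, zone)}"
end
puts shifted_by_eight_hours(stored_utc(2017, 11, 20, "Asia/Shanghai"))
```

    After the shift the value reads like Shanghai wall-clock time but is still labelled as UTC, so as an instant it is eight hours off. Whether that trade-off is acceptable depends on how the timestamps are consumed downstream.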
    
  • Converting escape sequences (and other characters)
    mutate {
      gsub => [
        "request_body", "\\x22", '"',
        "request_body", "\\x0A", "\n"
      ]
    }
    
  • JSON handling
    json {
       source => "message"
     }
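
    Taken together, the gsub and json steps turn an escaped request_body into structured fields. A rough Ruby equivalent is sketched below, assuming that the substitutions behave like Ruby's String#gsub with a regex. The sample body is hypothetical, unescape_request_body is a made-up helper name, and the sketch substitutes a real newline for \x0A, which is what the rule appears to intend.

```ruby
require "json"

# Mirrors the two substitution rules: \x22 becomes a double quote, \x0A a newline.
def unescape_request_body(raw)
  raw.gsub(/\\x22/, '"').gsub(/\\x0A/, "\n")
end

# Hypothetical request_body with quotes escaped as \x22 (single quotes keep the
# backslashes literal in Ruby source).
raw_body = '{\x22groupid\x22:5,\x22sn\x22:\x22sn\x22}'

clean_body = unescape_request_body(raw_body)
puts clean_body
puts JSON.parse(clean_body).inspect
```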
    
  • Removing fields
    mutate {
      remove_field => [ "message" ]
    }
    
  • Type conversion
    mutate {
      convert => [ "upstream_response_time", "float" ]
    }
    
    Elasticsearch field data types

    A single Elasticsearch document can contain fields of several different data types.

  • Core datatypes
    • String datatype: string (split into text and keyword since Elasticsearch 5.x)
    • Numeric datatypes: long (64-bit), integer (32-bit), short (16-bit), byte (8-bit), double (64-bit double precision), float (32-bit single precision)
    • Date datatype: date
    • Boolean datatype: boolean
    • Binary datatype: binary
  • Complex datatypes
    • Array datatype: arrays need no dedicated type for their elements, for example:
      • array of strings: [ "one", "two" ]
      • array of integers: [ 1, 2 ]
      • array of arrays: [ 1, [ 2, 3 ]] is equivalent to [ 1, 2, 3 ]
      • array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
    • Object datatype: object, for a single JSON object;
    • Nested datatype: nested, for arrays of JSON objects;
  • Geo datatypes
    • Geo-point datatype: geo_point, for latitude/longitude coordinates;
    • Geo-shape datatype: geo_shape, for complex shapes such as polygons;
  • Specialised datatypes
    • IPv4 datatype: ip, for IPv4 addresses;
    • Completion datatype: completion, provides auto-complete suggestions;
    • Token count datatype: token_count, counts the tokens of an analyzed field; the count only grows and is not reduced by filters.
    • mapper-murmur3 datatype: through a plugin, computes a murmur3 hash of the values at index time;
    • Attachment datatype: with the mapper-attachments plugin, supports indexing attachments such as Microsoft Office formats, Open Document formats, ePub, HTML, and so on.
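  5. How Logstash handles the '\' backslash
    When mutate gsub is used on a string and a backslash has to be kept in the result, parsing can fail.
    To keep the backslash, it must be followed by another character, as shown below: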
    mutate {
      gsub => [
        "request_body", "\\x5C\\x22", '\\"'
      ]
    }
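
    The intended result of this rule, turning the logged sequence \x5C\x22 into a backslash followed by a double quote, can be reproduced in Ruby. This is only a sketch of the intended outcome, not of how Logstash parses the config string: restore_escaped_quote is a hypothetical helper, the input is invented, and the block form of gsub is used so that Ruby's own backslash rules for replacement strings do not interfere.

```ruby
# Replaces the logged sequence \x5C\x22 (an escaped backslash plus an escaped
# quote) with a backslash followed by a double quote. The block's return value
# is inserted verbatim, without any further escape processing.
def restore_escaped_quote(raw)
  raw.gsub(/\\x5C\\x22/) { '\\"' }
end

# Hypothetical request_body fragment; single quotes keep the backslashes literal.
raw_body = 'say \x5C\x22hi\x5C\x22'
puts restore_escaped_quote(raw_body)
```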
    
  6. Handling IP addresses in Logstash
    geoip {
      source => "clientip"
    }
    
  7. Online Logstash grok validator