Parsing Logs with Logstash

Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash".

Configuring Filebeat to Send Log Lines to Logstash

Before you create the Logstash pipeline, you’ll configure Filebeat to send log lines to Logstash.

The Filebeat client is a lightweight, resource-friendly tool that collects logs from files on the server and forwards these logs to your Logstash instance for processing.

Filebeat is designed for reliability and low latency. Filebeat has a light resource footprint on the host machine, and the Beats input plugin minimizes the resource demands on the Logstash instance.

In a typical use case, Filebeat runs on a separate machine from the machine running your Logstash instance. 

To install Filebeat on your data source machine, download the appropriate package from the Filebeat product page. You can also refer to Getting Started with Filebeat in the Beats documentation for additional installation instructions.
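For example, on a Linux machine you might download and extract the tar.gz package like this (the version and platform here are illustrative; substitute the ones that match your environment):

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.6.0-linux-x86_64.tar.gz
tar xzvf filebeat-5.6.0-linux-x86_64.tar.gz
cd filebeat-5.6.0-linux-x86_64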

After installing Filebeat, you need to configure it. Open the filebeat.yml file located in your Filebeat installation directory, and replace the contents with the following lines. Make sure paths points to the example Apache log file, logstash-tutorial.log (prepared earlier):

filebeat.prospectors:
- input_type: log
  paths:
    - /home/xx/logstash-tutorial.log
output.logstash:
  hosts: ["localhost:5043"]

Save your changes.

To keep the configuration simple, you won’t specify TLS/SSL settings as you would in a real-world scenario.
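For reference, enabling TLS would mean pointing Filebeat at your certificates in the output.logstash section of filebeat.yml; a hedged sketch (the certificate paths are placeholders, not files created in this tutorial):

output.logstash:
  hosts: ["localhost:5043"]
  # CA certificate used to verify the Logstash server (placeholder path):
  ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Client certificate and key (placeholder paths):
  ssl.certificate: "/etc/pki/client/cert.pem"
  ssl.key: "/etc/pki/client/cert.key"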

On the data source machine, run Filebeat with the following command:

./filebeat -e -c filebeat.yml -d "publish"

Filebeat will attempt to connect on port 5043. Until Logstash starts with an active Beats plugin, there won’t be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.

Configuring Logstash for Filebeat Input

Next, create a Logstash configuration pipeline that uses the Beats input plugin to receive events from Beats.

The following text represents the skeleton of a configuration pipeline:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}

This skeleton is non-functional, because the input and output sections don’t have any valid options defined.
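For contrast, here is a minimal pipeline that does run: it reads events from stdin and prints them to stdout, using plugins that ship with Logstash:

input { stdin { } }
output { stdout { codec => rubydebug } }

You could save this to a file and start Logstash with -f, or pass it inline with bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'.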

To get started, copy and paste the skeleton configuration pipeline into a file named first-pipeline.conf in your home Logstash directory.

Next, configure the Logstash instance to use the Beats input plugin by adding the following lines to the input section of the first-pipeline.conf file:

    beats {
        port => "5043"
    }

Add the following line to the output section so that the output is printed to stdout when you run Logstash:

stdout { codec => rubydebug }

When you’re done, the contents of first-pipeline.conf should look like this:

input {
    beats {
        port => "5043"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
    stdout { codec => rubydebug }
}

To verify your configuration, run the following command:

bin/logstash -f first-pipeline.conf --config.test_and_exit

The --config.test_and_exit option parses your configuration file and reports any errors.

If the configuration file passes the configuration test, start Logstash with the following command:

bin/logstash -f first-pipeline.conf --config.reload.automatic

The --config.reload.automatic option enables automatic config reloading so that you don’t have to stop and restart Logstash every time you modify the configuration file.

If your pipeline is working correctly, you should see a series of events like the following written to the console:

{
    "@timestamp" => 2016-10-11T20:54:06.733Z,
        "offset" => 325,
      "@version" => "1",
          "beat" => {
        "hostname" => "My-MacBook-Pro.local",
            "name" => "My-MacBook-Pro.local"
    },
    "input_type" => "log",
          "host" => "My-MacBook-Pro.local",
        "source" => "/path/to/file/logstash-tutorial.log",
       "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
          "type" => "log",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}
...

Parsing Logs with the Grok Filter Plugin

Now you have a working pipeline that reads log lines from Filebeat. However, you’ll notice that the format of the log messages is not ideal. You want to parse the log messages to create specific, named fields from the logs. To do this, you’ll use the grok filter plugin.

The grok filter plugin is one of several plugins that are available by default in Logstash. For details on how to manage Logstash plugins, see the reference documentation for the plugin manager.

The grok filter plugin enables you to parse the unstructured log data into something structured and queryable.

Because the grok filter plugin looks for patterns in the incoming log data, configuring the plugin requires you to make decisions about how to identify the patterns that are of interest to your use case. A representative line from the web server log sample looks like this:

83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

This line follows the Apache combined log format, so the predefined %{COMBINEDAPACHELOG} grok pattern can parse it into named fields.
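Grok patterns have the general form %{SYNTAX:SEMANTIC}, where SYNTAX names a predefined pattern and SEMANTIC is the field the matched text is stored in. As an illustration (the field names here are arbitrary), a custom pattern for a simpler line such as "55.3.244.1 GET /index.html" could look like this:

filter {
    grok {
        # Produces the fields client, method, and request:
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}" }
    }
}

The COMBINEDAPACHELOG pattern used below is simply a larger, predefined combination of this kind of building block.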

Edit the first-pipeline.conf file and replace the entire filter section with the following text:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}

When you’re done, the contents of first-pipeline.conf should look like this:

input {
    beats {
        port => "5043"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    stdout { codec => rubydebug }
}

Save your changes.

Because you’ve enabled automatic config reloading, you don’t have to restart Logstash to pick up your changes. If you haven’t enabled automatic config reloading, restart Logstash for the changes to take effect.

However, you do need to force Filebeat to read the log file from scratch. To do this, go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat. Then delete the Filebeat registry file. For example, run:

sudo rm data/registry

Since Filebeat stores the state of each file it harvests in the registry, deleting the registry file forces Filebeat to read all the files it’s harvesting from scratch.

Next, restart Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

After processing the log file with the grok pattern, the events will have the following JSON representation:

{
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
         "offset" => 325,
           "auth" => "-",
          "ident" => "-",
     "input_type" => "log",
           "verb" => "GET",
         "source" => "/path/to/file/logstash-tutorial.log",
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
           "type" => "log",
           "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
     "@timestamp" => 2016-10-11T21:04:36.167Z,
       "response" => "200",
          "bytes" => "203023",
       "clientip" => "83.149.9.216",
       "@version" => "1",
           "beat" => {
        "hostname" => "My-MacBook-Pro.local",
            "name" => "My-MacBook-Pro.local"
    },
           "host" => "My-MacBook-Pro.local",
    "httpversion" => "1.1",
      "timestamp" => "04/Jan/2015:05:13:42 +0000"
}

Indexing Your Data into a File

Edit the first-pipeline.conf file and replace the entire output section with the following text:

output {
    #stdout { codec => rubydebug }
    file {
        path => "/scratch/temp/example.log"
    }
}
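As an aside, the file output writes each event as a JSON line by default. If you wanted only the original log line in the file, the plugin accepts a codec; a hedged sketch (not needed for this tutorial):

file {
    path => "/scratch/temp/example.log"
    # Write the raw message field, one line per event, instead of JSON:
    codec => line { format => "%{message}" }
}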

At this point, your first-pipeline.conf file has input, filter, and output sections properly configured. Note that this version also adds the geoip filter plugin, which looks up the address in the clientip field and enriches the event with geographical information about its location; because it relies on the clientip field that grok creates, it appears after the grok filter. The file looks something like this:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
    beats {
        host => "slc08yld.us.oracle.com"
        port => "5043"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    #stdout { codec => rubydebug }
    file {
        path => "/scratch/temp/example.log"
    }
}

Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat, delete the registry file, and then restart Filebeat.

You can find the logs in /scratch/temp/example.log.
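To watch events arrive as Filebeat replays the file, you can tail the output with a plain shell command:

tail -f /scratch/temp/example.log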

Issues and Solutions

1. When running Filebeat:
   Error: found character that cannot start any token
   Solution: Check whether you are using tabs for indentation in the filebeat.yml file. YAML doesn't allow tabs; it requires spaces.

2. When running Logstash:
   NameError: cannot link Java class org.apache.logging.log4j.core.config.LoggerConfig needs Java 8 (java.lang.UnsupportedClassVersionError: org/logstash/log/LogstashLogEventFactory : Unsupported major.minor version 52.0)
   Solution: Set JAVA_HOME for the Logstash process:
   • Check the Java version on the machine:
     $ java -version
     java version "1.7.0_25"
   • Download and install JDK 1.8.
   • Export JAVA_HOME for the Logstash process:
     • Find the JDK directory, for example: /scratch/software/java
     • Edit the logstash script under bin/ by adding the following line:
       export JAVA_HOME='/scratch/software/java/jdk1.8.0_141'
   This changes JAVA_HOME for the Logstash process only; other processes are not affected.

3. dial tcp 10.245.226.153:5044: getsockopt: connection refused.
   Solution: Make sure the Logstash server is running on that machine with the given port, and that the server and port are accessible from the machine you are running Filebeat on.

4. Logstash is not able to start since configuration auto reloading was enabled but the configuration contains plugins that don't support it. Quitting... {:pipeline_id=>"main", :plugins=>[LogStash::Inputs::Stdin]}
   Solution: Logstash doesn't enable automatic config reloading by default, and some plugins, such as the stdin input, don't support it. Remove --config.reload.automatic from the command:
   bin/logstash -f first-pipeline.conf

5. Elasticsearch requires at least Java 8 but your Java version from /usr/bin/java does not meet this requirement
   Solution: Add export JAVA_HOME='/scratch/software/java/jdk1.8.0_141' to the top of bin/elasticsearch.

References

http://blog.csdn.net/u010454030/article/details/49659467

https://www.elastic.co/guide/en/logstash/current/advanced-pipeline.html
