Logstash: Handling Multiple Inputs

Elastic China Community Official Blog

Last modified: 2022-09-15 11:36:22

First published: 2019-09-18 16:39:51

Article link: https://blog.csdn.net/UbuntuTouch/article/details/100980709

Recall the Logstash architecture. Its pipeline is divided into three parts:

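input plugins: ingest data. The data can come from log files, TCP or UDP listeners, one of several protocol-specific plugins (such as syslog or IRC), or even a queuing system (such as Redis, AMQP, or Kafka). This stage tags incoming events with metadata about where they came from.
filter plugins: transform and enrich the data.
output plugins: load the processed events into something else, such as Elasticsearch or another document database, or a queuing system such as Redis, AMQP, or Kafka. An output can also be configured to talk to an API, and a service like PagerDuty can be hooked up to Logstash output as well.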
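
As a rough illustration of these three stages, here is a minimal pipeline sketch. It is not part of the sample files for this article; the stdin input, the field name stage, and its value are assumptions chosen only to show where each plugin type goes.

```
# Minimal three-stage pipeline (illustrative sketch, not from the sample repository)
input {
  # Read events typed on standard input
  stdin { }
}

filter {
  # Enrich every event with one extra field; "stage" is a hypothetical field name
  mutate {
    add_field => { "stage" => "filtered" }
  }
}

output {
  # Print each processed event in a readable form
  stdout { codec => rubydebug }
}
```

Each line typed into the terminal becomes one event, passes through the filter block, and is printed by the output block.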

The input stage can contain multiple inputs, and multiple workers can process the filter and output stages in parallel.

In this article, we look at how to use multiple inputs.

Sample files

For convenience, I have uploaded all the files needed for this example to GitHub at https://github.com/liu-xiao-guo/logstash_multi-input. You can download them as follows:

git clone https://github.com/liu-xiao-guo/logstash_multi-input

Logstash configuration file

The Logstash configuration file is as follows:

multi-input.conf

input {
  file {
    path => "/Users/liuxg/data/multi-input/apache.log"
    start_position => "beginning"
    # Do not persist the read position, so the file is re-read on every run
    sincedb_path => "/dev/null"
    # ignore_older => 100000
    type => "apache"
  }
}

input {
  file {
    path => "/Users/liuxg/data/multi-input/apache-daily-access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "daily"
  }
}

filter {
  # Both log files share the same format, so one grok pattern parses both
  grok {
    match => {
      "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}'
    }
  }

  # Tag each event according to the type set by its input
  if [type] == "apache" {
    mutate {
      add_tag => ["apache"]
    }
  }

  if [type] == "daily" {
    mutate {
      add_tag => ["daily"]
    }
  }
}

output {
  # Print every event to the console for inspection
  stdout {
    codec => rubydebug
  }

  # Route events to different indices based on their tag
  if "apache" in [tags] {
    elasticsearch {
      index => "apache_log"
      template => "/Users/liuxg/data/multi-input/apache_template.json"
      template_name => "apache_elastic_example"
      template_overwrite => true
    }
  }

  if "daily" in [tags] {
    elasticsearch {
      index => "apache_daily"
      template => "/Users/liuxg/data/multi-input/apache_template.json"
      template_name => "apache_elastic_example"
      template_overwrite => true
    }
  }
}

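To keep the example simple, we use two inputs, each reading a different log file. The two inputs are given different type values: apache and daily. Although both files have the same format and share the same grok filter, we still want to handle them separately. To do that, the filter section adds a tag based on the type. We could also add a field to tell them apart. In the output section, the tag set in the filter section determines which index each event is written to.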
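
The field-based alternative mentioned above could look roughly like the following sketch. It is not taken from the sample repository: the field name log_source is a hypothetical choice, and the grok filter and the template settings are left out for brevity. It relies on add_field, a common option available on input plugins.

```
input {
  file {
    path => "/Users/liuxg/data/multi-input/apache.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # "log_source" is a hypothetical field name used only for this sketch
    add_field => { "log_source" => "apache" }
  }
  file {
    path => "/Users/liuxg/data/multi-input/apache-daily-access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    add_field => { "log_source" => "daily" }
  }
}

output {
  # Route on the field set by each input instead of on a tag
  if [log_source] == "apache" {
    elasticsearch { index => "apache_log" }
  } else if [log_source] == "daily" {
    elasticsearch { index => "apache_daily" }
  }
}
```

With this variant the field is attached directly by each input, so no extra mutate step is needed in the filter section just to mark the source.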

Running Logstash

We can run Logstash with the following command:

$ pwd
/Users/liuxg/elastic/logstash-7.3.0
$ sudo ./bin/logstash -f ~/data/multi-input/multi-input.conf

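When you run this example, adjust the command above to match where you saved multi-input.conf, and update the file paths inside the configuration to match your own machine.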

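In the resulting output, the daily events are processed and printed first, and the apache data is processed after that. In a real deployment we may have other kinds of data sources, for example data sent by Beats to a port that Logstash listens on.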
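
As a hedged sketch of that scenario, a file input and a beats input can sit side by side in the same pipeline. Nothing below comes from the sample repository: port 5044 is simply the port conventionally used for Beats, and the tag from_beats and the index name beats_example are hypothetical names.

```
input {
  file {
    path => "/Users/liuxg/data/multi-input/apache.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "apache"
  }
  beats {
    # Assumed port; Beats shippers must be configured to send to it
    port => 5044
    # Hypothetical tag marking events that arrived through this input
    tags => ["from_beats"]
  }
}

output {
  if "from_beats" in [tags] {
    # Hypothetical index name for events shipped by Beats
    elasticsearch { index => "beats_example" }
  } else if [type] == "apache" {
    elasticsearch { index => "apache_log" }
  }
}
```

The routing idea is the same as in the two-file example: each input marks its events, and the output section decides the destination from that mark.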

Finally, we can inspect the resulting indices, apache_log and apache_daily, in Kibana.