Problem description
Convert data like: 2015/8/26 12:05:00:000000
into: 2015-08-26T04:05:00.000Z
Background
Goal:
Convert the string-typed times in the operation logs into @timestamp.
Approach:
Log formats vary a lot, and Logstash's built-in regular expressions may not cover our needs, but the grok plugin lets us bring in regular expressions we define ourselves.
Concrete steps:
Create a patterns directory under the Logstash install directory, /home/hadoop1/bms/logstash-1.5.4/conf;
In that directory (/home/hadoop1/bms/logstash-1.5.4/conf/patterns), create a file named mypattern and write the regular expressions you need into it;
In this experiment I only want to turn the string-typed time into @timestamp, so the regular expression I wrote is:
MYPATTERN (?>\d\d){2,2}\-(?:0?[1-9]|1[0-2])\-(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])\s(0\d{1}|1\d{1}|2[0-3]):[0-5]\d{1}:([0-5]\d{1}):([0-9]{6})
Note:
This regular expression is not necessarily optimal, but it covers my own needs. My time strings look roughly like: 2015-12-12 12:12:12:000000
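The pattern can be sanity-checked outside Logstash. A minimal sketch in Python, with the Oniguruma atomic group `(?>\d\d){2,2}` rewritten as a plain non-capturing group `(?:\d\d){2}` so that Python's `re` module accepts it:

```python
import re

# Equivalent of MYPATTERN; the atomic group (?>\d\d){2,2} is rewritten
# as (?:\d\d){2}, which behaves the same for these inputs.
MYPATTERN = (
    r"(?:\d\d){2}-(?:0?[1-9]|1[0-2])-"
    r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])\s"
    r"(0\d|1\d|2[0-3]):[0-5]\d:([0-5]\d):([0-9]{6})"
)

for sample in ["2015-12-12 12:12:12:000000", "2015-03-03 12:12:12:000000"]:
    m = re.search(MYPATTERN, sample)
    print(sample, "->", "match" if m else "no match")  # both should match
```

An invalid month such as `2015-13-03 ...` fails to match, since the month alternation only accepts 01-12.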
The Logstash configuration file
input {
  file {
    path => "/home/hadoop1/bms/mylog/http.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    patterns_dir => "./patterns"
    match => { "message" => ["%{IP:source_Ip},%{NUMBER:source_Port},%{IP:dest_Ip},%{NUMBER:dest_Port},%{MYPATTERN:create_Time}"] }
  }
  date {
    match => [ "create_Time", "yyyy-MM-dd HH:mm:ss:ssssss" ]
    target => "@timestamp"
    add_tag => [ "tmatch" ]
  }
  mutate {
    convert => { "dest_Port" => "integer" }
    convert => { "source_Port" => "integer" }
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
}
Note:
MYPATTERN in %{MYPATTERN:create_Time} is the name of the regular expression I defined myself to match my time string. Also note that in the Joda-Time syntax used by the date filter, fraction-of-second is uppercase SSSSSS; with the lowercase ssssss above, the trailing six digits are read as second-of-minute again and the seconds end up zeroed, which appears to be why the sample output below shows 04:12:00 rather than 04:12:12.
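What the date filter does here can be sketched in plain Python: parse the string, treat it as local time, and render it as a UTC ISO-8601 timestamp. The UTC+8 offset is an assumption inferred from the sample outputs, where a local 12:12 becomes 04:12 UTC:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log host runs on UTC+8 (inferred from the sample
# output, where local 12:12 becomes 04:12 UTC).
LOCAL_TZ = timezone(timedelta(hours=8))

def to_timestamp(s: str) -> str:
    # "%f" consumes the 6-digit fractional part after the last colon.
    local = datetime.strptime(s, "%Y-%m-%d %H:%M:%S:%f").replace(tzinfo=LOCAL_TZ)
    utc = local.astimezone(timezone.utc)
    # Render milliseconds only, matching Logstash's @timestamp format.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (utc.microsecond // 1000)

print(to_timestamp("2015-03-03 12:12:12:000000"))  # 2015-03-03T04:12:12.000Z
```

Note that this sketch keeps the seconds (…T04:12:12…), whereas the sample Logstash output below shows …T04:12:00…; that difference comes from the lowercase ssssss token in the date filter's format string, which Joda-Time treats as second-of-minute rather than fraction-of-second.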
Start Logstash with:
../bin/logstash agent -f kafkaInput_esOutPut3.conf
Send some simulated data:
echo 1.1.1.1,23,2.2.2.2,223,2015-03-03 12:12:12:000000 >> /home/hadoop1/bms/mylog/http.log
The test result is as follows:
[hadoop1@slave2 conf]$ curl 'localhost:9200/logstash-2015.03.03/_search?pretty'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2015.03.03",
      "_type" : "logs",
      "_id" : "AVDNDMeAlr_lum5SU6ix",
      "_score" : 1.0,
      "_source":{"message":"1.1.1.1,23,2.2.2.2,223,2015-03-03 12:12:12:000000","@version":"1","@timestamp":"2015-03-03T04:12:00.000Z","host":"slave2","path":"/home/hadoop1/bms/mylog/http.log","source_Ip":"1.1.1.1","source_Port":23,"dest_Ip":"2.2.2.2","dest_Port":223,"create_Time":"2015-03-03 12:12:12:000000","tags":["tmatch"]}
    } ]
  }
}
Storing the result in a designated field
In the description above I assigned the parsed time to @timestamp, but sometimes you need to keep the real value of that field, so you can instead use the following configuration to store the converted time in a field of your choosing:
input {
  file {
    path => "/home/hadoop1/bms/mylog/http.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    patterns_dir => "./patterns"
    match => { "message" => ["%{IP:source_Ip},%{NUMBER:source_Port},%{IP:dest_Ip},%{NUMBER:dest_Port},%{MYPATTERN:create_Time}"] }
  }
  date {
    match => [ "create_Time", "yyyy-MM-dd HH:mm:ss:ssssss" ]
    target => "begin_Time"
    add_tag => [ "tmatch" ]
  }
  mutate {
    convert => { "dest_Port" => "integer" }
    convert => { "source_Port" => "integer" }
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
}
With the configuration above, the converted time is stored in the begin_Time field; the experimental result is as follows:
[hadoop1@slave2 conf]$ curl 'localhost:9200/logstash-2015.11.03/_search?pretty'
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2015.11.03",
      "_type" : "logs",
      "_id" : "AVDNIYwTGyEV_57_F995",
      "_score" : 1.0,
      "_source":{"message":"1.1.1.1,23,2.2.2.2,223,2015-03-03 12:12:12:000000","@version":"1","@timestamp":"2015-11-03T11:35:38.945Z","host":"slave2","path":"/home/hadoop1/bms/mylog/http.log","source_Ip":"1.1.1.1","source_Port":23,"dest_Ip":"2.2.2.2","dest_Port":223,"create_Time":"2015-03-03 12:12:12:000000","begin_Time":"2015-03-03T04:12:00.000Z","tags":["tmatch"]}
    } ]
  }
}
Original posts:
http://blog.csdn.net/xuguokun1986/article/details/49620579
and
http://blog.csdn.net/xuguokun1986/article/details/49620797