logstash 数据类型的修改

logstash 数据类型的修改
logstash 中可以设置字段的类型为integer,float,string,boolean
filter{
mutate{
convert => ["request_time","float"]  #设置request_time的类型为float类型
}
}


注意:mutate 除了转化字符值,还支持对数组类型的字段进行转换,即将["1","2"]转换成[1,2],但是不支持哈希字段做类似处理。
input {
file {
path => [
"/home/raw_data/8_31/*.csv"
]
start_position => "beginning" # 从什么位置读取
sincedb_path => "/home/es/sincedb/apk"  #sincedb存放路径
type => "apk"   #设置type
tags => ["hgw", "gather"]
}
}


filter
{
if [type] == "apk"
{
csv
{
columns => ["Type","ProbeOUI","ProbeVersion","MAC",
"UploadTime","IptvAccount","STBId","OUI","ProductClass",
"SoftwareVersion","HardwareVersion","IpAddress","TeleOUI",
"TeleProductClass","TeleResolution","LogUploadInterval",
"ReportSerialNumber","WorkingTime","CPURate","MEMRate",
"FrameLR","FrameDelay","FrameJitter","VideoStreamingRate",
"RTPLossRate","RTPThroughPut","MdiMLR","MdiType","MdiDF","MAXMdiDF",
"Jitter","MAXJitter","ResponseDelay","ChannelSwitchDelay","StreamBreak",
"EPGDelay","ResourceURL","EPGVisitNum","EPGSuccessNum",
"KaNum","KaDelay"]
separator => "|"
quote_char => "‰"
remove_field => ["ProbeVersion",
"UploadTime","IptvAccount","STBId","ProductClass",
"SoftwareVersion","HardwareVersion","IpAddress","TeleOUI",
"TeleProductClass","TeleResolution","LogUploadInterval",
"ReportSerialNumber","WorkingTime","CPURate","MEMRate",
"FrameLR","FrameDelay","FrameJitter","VideoStreamingRate",
"RTPThroughPut","MdiMLR","MdiType","MdiDF","MAXMdiDF",
"Jitter","MAXJitter","ResponseDelay","ChannelSwitchDelay","StreamBreak",
"EPGDelay","ResourceURL","EPGVisitNum","EPGSuccessNum",
"KaNum","KaDelay"]   #删除不需要的字段
}
mutate {
        convert => ["RTPLossRate", "integer"] #修改字段类型
    }
if ([Type]!="1" or [ProbeOUI]!="YUCHUANG"){
drop{}
}
}
}


output{
if [type] == "apk"
{
elasticsearch
{
hosts => ["*:9200"]
index => "ana-%{type}"
document_type => "%{type}"
flush_size => 8000
idle_flush_time => 10
sniffing => true
template_overwrite => true
codec => "json"
}
}
}


filters/mutate 插件是 Logstash 另一个重要插件。它提供了丰富的基础类型数据处理能力。包括类型转换,字符串处理和字段处理等。


类型转换


类型转换是 filters/mutate 插件最初诞生时的唯一功能。其应用场景在之前 Codec/JSON 小节已经提到。


可以设置的转换类型包括:"integer","float" 和 "string"。示例如下:


filter {
    mutate {
        convert => ["request_time", "float"]
    }
}
注意:mutate 除了转换简单的字符值,还支持对数组类型的字段进行转换,即将 ["1","2"] 转换成 [1,2]。但不支持对哈希类型的字段做类似处理。有这方面需求的可以采用稍后讲述的 filters/ruby 插件完成。


字符串处理


gsub
仅对字符串类型字段有效


    gsub => ["urlparams", "[\\?#]", "_"]
split
filter {
    mutate {
        split => ["message", "|"]
    }
}
随意输入一串以|分割的字符,比如 "123|321|adfd|dfjld*=123",可以看到如下输出:


{
    "message" => [
        [0] "123",
        [1] "321",
        [2] "adfd",
        [3] "dfjld*=123"
    ],
    "@version" => "1",
    "@timestamp" => "2014-08-20T15:58:23.120Z",
    "host" => "raochenlindeMacBook-Air.local"
}
join
仅对数组类型字段有效


我们在之前已经用 split 割切的基础再 join 回去。配置改成:


filter {
    mutate {
        split => ["message", "|"]
    }
    mutate {
        join => ["message", ","]
    }
}
filter 区段之内,是顺序执行的。所以我们最后看到的输出结果是:


{
    "message" => "123,321,adfd,dfjld*=123",
    "@version" => "1",
    "@timestamp" => "2014-08-20T16:01:33.972Z",
    "host" => "raochenlindeMacBook-Air.local"
}
merge
合并两个数组或者哈希字段。依然在之前 split 的基础上继续:


filter {
    mutate {
        split => ["message", "|"]
    }
    mutate {
        merge => ["message", "message"]
    }
}
我们会看到输出:


{
       "message" => [
        [0] "123",
        [1] "321",
        [2] "adfd",
        [3] "dfjld*=123",
        [4] "123",
        [5] "321",
        [6] "adfd",
        [7] "dfjld*=123"
    ],
      "@version" => "1",
    "@timestamp" => "2014-08-20T16:05:53.711Z",
          "host" => "raochenlindeMacBook-Air.local"
}
如果 src 字段是字符串,会自动先转换成一个单元素的数组再合并。把上一示例中的来源字段改成 "host":


filter {
    mutate {
        split => ["message", "|"]
    }
    mutate {
        merge => ["message", "host"]
    }
}
结果变成:


{
       "message" => [
        [0] "123",
        [1] "321",
        [2] "adfd",
        [3] "dfjld*=123",
        [4] "raochenlindeMacBook-Air.local"
    ],
      "@version" => "1",
    "@timestamp" => "2014-08-20T16:07:53.533Z",
          "host" => [
        [0] "raochenlindeMacBook-Air.local"
    ]
}
看,目的字段 "message" 确实多了一个元素,但是来源字段 "host" 本身也由字符串类型变成数组类型了!


下面你猜,如果来源位置写的不是字段名而是直接一个字符串,会产生什么奇特的效果呢?


strip
lowercase
uppercase
字段处理


rename
重命名某个字段,如果目的字段已经存在,会被覆盖掉:


filter {
    mutate {
        rename => ["syslog_host", "host"]
    }
}
update
更新某个字段的内容。如果字段不存在,不会新建。


replace
作用和 update 类似,但是当字段不存在的时候,它会起到 add_field 参数一样的效果,自动添加新的字段。


执行次序


需要注意的是,filter/mutate 内部是有执行次序的。其次序如下:


    rename(event) if @rename
    update(event) if @update
    replace(event) if @replace
    convert(event) if @convert
    gsub(event) if @gsub
    uppercase(event) if @uppercase
    lowercase(event) if @lowercase
    strip(event) if @strip
    remove(event) if @remove
    split(event) if @split
    join(event) if @join
    merge(event) if @merge


    filter_matched(event)
而 filter_matched 这个 filters/base.rb 里继承的方法也是有次序的。


  @add_field.each do |field, value|
  end
  @remove_field.each do |field|
  end
  @add_tag.each do |tag|
  end
  @remove_tag.each do |tag|
  end
 

  • 1
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值