02. Commonly used pipeline processors


Ingest feels like an area Elasticsearch is investing in, since the number of ingest processors keeps growing.

  1. Append Processor: appends one or more values to an existing field
  2. Bytes Processor: converts human-readable byte values using the units "b", "kb", "mb", "gb", "tb", "pb" into their value in bytes
  3. Convert Processor: converts a field's value to a different type
  4. Date Processor: parses a date field in the source document into a timestamp field that Elasticsearch understands (by default @timestamp)
  5. Date Index Name Processor: routes documents into daily or monthly indices based on a date field
  6. Dissect Processor: like grok, but with a simpler syntax
  7. Dot Expander Processor: usually used together with other processors; it expands a field with dots in its name into an object so that subsequent processors can access the nested fields
  8. Drop Processor: drops the document
  9. Fail Processor: when its condition is met as a document passes through the pipeline, the configured error message is returned to the requester
  10. Foreach Processor: processes array fields by applying the same processor to every element of the array
  11. GeoIP Processor: resolves an IP address to geographic information such as latitude and longitude
  12. Grok Processor: like the powerful grok in Logstash, provides very strong log-parsing capabilities
  13. Gsub Processor: performs regex-based character replacement in a string
  14. HTML Strip Processor: strips HTML tags
  15. Join Processor: joins the elements of an array into a single string, much like Python's str.join
  16. JSON Processor: parses a JSON-formatted string into a structured object
  17. KV Processor: splits a field into key/value pairs using separators
  18. Lowercase Processor: converts a field's content to lowercase
  19. Pipeline Processor: executes another pipeline
  20. Remove Processor: removes fields
  21. Rename Processor: renames a field
  22. Script Processor: processes documents with an Elasticsearch script; any field a script can access is accessible here
  23. Set Processor: sets the value of a field, updating it if it exists and creating it otherwise; it can even modify _index
  24. Set Security User Processor: adds details about the authenticated user (such as username and roles) to the document
  25. Split Processor: splits a delimiter-separated string into an array field
  26. Sort Processor: sorts the elements stored in an array field in ascending or descending order
  27. Trim Processor: trims whitespace from both ends of a string
  28. Uppercase Processor: similar to the Lowercase Processor, but converts text to uppercase
  29. URL Decode Processor: URL-decodes a string
  30. User Agent Processor: extracts details from a standard HTTP User-Agent header

Below, only the processors I personally consider most commonly used are covered.

1. Set Processor: sets the value of a field; if the field exists its value is updated, otherwise the field is created and set. It can even modify _index.

Example: copy the value of one field into a new field.

PUT _ingest/pipeline/set_os
{
  "description": "sets the value of host.os.name from the field os",
  "processors": [
    {
      "set": {
        "field": "host.os.name",
        "value": "{{os}}"
      }
    }
  ]
}

POST _ingest/pipeline/set_os/_simulate
{
  "docs": [
    {
      "_source": {
        "os": "Ubuntu"
      }
    }
  ]
}

field: required. The field to insert, upsert, or update. Supports template snippets.

value: required. The value to be set for the field. Supports template snippets.

override: optional, defaults to true. Controls whether the processor updates fields that already have a non-null value. When set to false, such fields are not touched.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

2. Append Processor: appends one or more values to an existing field

field: required. The field to append the value to. Supports template snippets.

value: required. The value to be appended. Supports template snippets.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example

PUT script_test/_mapping
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "age": {
      "type": "integer"
    },
    "age_arr": {
      "type": "integer"
    }
  }
}



PUT script_test/_doc/2
{
  "name":"tengfei",
  "age":[22,23],
  "age_arr":[12,15,13,98,102]
}

PUT script_test/_doc/3
{
  "name":"tengfei",
  "age":22,
  "age_arr":[12,15,13,98,102]
}


PUT _ingest/pipeline/append_pipe
{
  "description": "append to friend",
  "processors": [
    {"append": {
      "field": "age",
      "value": [23,78]
    }}
  ]
}


PUT script_test/_doc/23?pipeline=append_pipe
{
  "name": "append test",
  "age": 88
}

The stored document is:
{
  "_index" : "script_test",
  "_type" : "_doc",
  "_id" : "23",
  "_score" : 1.0,
  "_source" : {
    "name" : "append test",
    "age" : [
      23,
      78,
      88
    ]
  }
}

Compare this with the equivalent script operation via _update_by_query:

POST script_test/_update_by_query
{
  "query":{
    "match_all":{}
  },
  "script":{
    "lang":"painless",
    "source":"ctx._source.age?.add(params.new_age)",
    "params":{
      "from":"china",
      "new_age":55
    }
  }
}

This operation fails because in some documents the age field is not an array but a plain integer:

 "script": "ctx._source.age?.add(params.new_age)",
    "lang": "painless",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "dynamic method [java.lang.Integer, add/1] not found"
    }

The same operation, however, runs normally inside an ingest pipeline.

3. Drop Processor: drops the document

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example

PUT _ingest/pipeline/drop_pipeline
{
  "description": "drop doc when name is chen",
  "processors": [
    {
      "drop": {
        "if": "ctx.name == 'chen'"
      }
    }
  ]
}


PUT script_test/_doc/31?pipeline=drop_pipeline
{
  "name":"chen",
  "age":88
}

Response:
{
  "_index" : "script_test",
  "_type" : "_doc",
  "_id" : "31",
  "_version" : -3,
  "result" : "noop", # noop: the document was dropped by the pipeline and not indexed
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  }
}



PUT script_test/_doc/32?pipeline=drop_pipeline
{
  "name":"chenchuang",
  "age":88
}

Response:
{
  "_index" : "script_test",
  "_type" : "_doc",
  "_id" : "32",
  "_version" : 1,
  "result" : "created", # created: the document was indexed successfully
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 21,
  "_primary_term" : 1
}

4. Remove Processor: removes fields

field: required. Fields to be removed. Supports template snippets.

ignore_missing: defaults to false. If true and field does not exist or is null, the processor quietly exits without modifying the document.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/remove_pipeline
{
  "description": "remove some fields",
  "processors": [
    {
      "remove": {
        "field": ["age01","age"]
      }
    }
  ]
}

PUT script_test/_doc/33?pipeline=remove_pipeline
{
  "name":"remove test",
  "age":[123,45,67],
  "age01":32,
  "age_arr":[34,21]
}

GET script_test/_doc/33

Response:
{
  "_index" : "script_test",
  "_type" : "_doc",
  "_id" : "33",
  "_version" : 1,
  "_seq_no" : 22,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "remove test",
    "age_arr" : [
      34,
      21
    ]
  }
}



5. Rename Processor: renames a field

field: required. The field to be renamed. Supports template snippets.

target_field: required. The new name of the field. Supports template snippets.

ignore_missing: defaults to false. If true and field does not exist or is null, the processor quietly exits without modifying the document.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/rename_pipeline
{
  "description": "rename fields",
  "processors": [
    {
      "rename": {
        "field": "age",
        "target_field": "life"
      }
    }
  ]
}

PUT script_test/_doc/35?pipeline=rename_pipeline
{
  "name":"rename test",
  "age":108
}

GET script_test/_doc/35

Response:
{
  "_index" : "script_test",
  "_type" : "_doc",
  "_id" : "35",
  "_version" : 1,
  "_seq_no" : 23,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "rename test",
    "life" : 108
  }
}


6. Join Processor: joins the array elements of a field into a single string, much like Python's str.join

field: required. The field containing the array values to join.

separator: required. The separator character.

target_field: The field to assign the joined value to; by default field is updated in-place.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/join_pipe
{
  "description": "join some fields",
  "processors": [
    {"join": {
      "field": "age_arr",
      "separator": "*",
      "target_field":"join_result"
    }}
  ]
}

PUT script_test/_doc/36?pipeline=join_pipe
{
  "name":"rename test",
  "age":108,
  "age_arr":[12,17,123,987,9]
}

GET script_test/_doc/36

Response:
"_source" : {
    "name" : "rename test",
    "join_result" : "12*17*123*987*9",
    "age_arr" : [
      12,
      17,
      123,
      987,
      9
    ],
    "age" : 108
  }

7. JSON Processor: parses a JSON-formatted string into a structured object

field: required. The field containing the JSON string to be parsed.

target_field: The field to insert the converted structured object into.

add_to_root: defaults to false. Flag that forces the serialized JSON to be injected into the top level of the document. target_field must not be set when this option is chosen.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/json_pipe
{
  "description": "json pipeline",
  "processors": [
    {
      "json": {
        "field": "child",
        "target_field": "child_obj"
      }
    }
  ]
}

PUT script_test/_doc/37?pipeline=json_pipe
{
  "name":"rename test",
  "age":108,
  "child":"{\"son\":\"datou\"}"
}

GET script_test/_doc/37

Response:
{
  "_index" : "script_test",
  "_type" : "_doc",
  "_id" : "37",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "rename test",
    "child_obj" : {
      "son" : "datou"
    },
    "age" : 108,
    "child" : """{"son":"datou"}"""
  }
}



8. KV Processor: splits a field into key/value pairs using separators

This one looks fairly complex; like Logstash, it parses a log line into multiple fields. For example, ip=1.2.3.4 error=REFUSED is parsed into the two fields ip and error.

Example


PUT _ingest/pipeline/kv_pipe
{
  "description": "kv pipeline",
  "processors": [
    {
      "kv": {
        "field": "message",
        "field_split": " ",
        "value_split": "="
      }
    }
  ]
}
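
To see the splitting in action, a _simulate run with a made-up log line works well:

POST _ingest/pipeline/kv_pipe/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "ip=1.2.3.4 error=REFUSED"
      }
    }
  ]
}

The resulting document gains the fields "ip": "1.2.3.4" and "error": "REFUSED" alongside the original message.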


9. Split Processor: splits a delimiter-separated string into an array field

field: required. The field to split.

separator: required. A regex which matches the separator, e.g. , or \s+.

target_field: The field to assign the split value to; by default field is updated in-place.

ignore_missing: defaults to false. If true and field does not exist, the processor quietly exits without modifying the document.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/split
{
  "description": "split pipeline",
  "processors": [
    {
      "split": {
        "field": "my_field",
        "separator": "\\s+"
      }
    }
  ]
}
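
The pipeline can be tried out with _simulate (sample document made up for illustration):

POST _ingest/pipeline/split/_simulate
{
  "docs": [
    {
      "_source": {
        "my_field": "foo bar   baz"
      }
    }
  ]
}

my_field comes back as the array ["foo", "bar", "baz"], since the regex \s+ matches any run of whitespace.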

10. Lowercase Processor: converts a field's content to lowercase

field: required. The field to convert to lowercase.

target_field: The field to assign the converted value to; by default field is updated in-place.

ignore_missing: defaults to false. If true and field does not exist or is null, the processor quietly exits without modifying the document.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/lowercase_pipe
{
  "description": "lowercase pipeline",
  "processors": [
    {
      "lowercase": {
        "field": "name"
      }
    }
  ]
}
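
A quick _simulate check (sample document for illustration):

POST _ingest/pipeline/lowercase_pipe/_simulate
{
  "docs": [
    {
      "_source": {
        "name": "TengFei"
      }
    }
  ]
}

The returned document has "name": "tengfei".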

11. Uppercase Processor: similar to the Lowercase Processor, but converts text to uppercase

field: required. The field to convert to uppercase.

target_field: The field to assign the converted value to; by default field is updated in-place.

ignore_missing: defaults to false. If true and field does not exist or is null, the processor quietly exits without modifying the document.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

tag: An identifier for this processor. Useful for debugging and metrics.

Example


PUT _ingest/pipeline/uppercase_pipe
{
  "description": "uppercase pipeline",
  "processors": [
    {
      "uppercase": {
        "field": "name"
      }
    }
  ]
}

12. Convert Processor: converts a field's value to a different type

Example

PUT _ingest/pipeline/my-pipeline-id
{
  "description": "converts the content of the id field to an integer",
  "processors" : [
    {
      "convert" : {
        "field" : "id",
        "type": "integer"
      }
    }
  ]
}
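
A _simulate run (sample document for illustration) confirms the conversion; besides integer, type also accepts long, float, double, string, boolean, and auto:

POST _ingest/pipeline/my-pipeline-id/_simulate
{
  "docs": [
    {
      "_source": {
        "id": "42"
      }
    }
  ]
}

In the response, id is the integer 42 rather than the string "42".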

13. Date Index Name Processor: routes documents into daily or monthly indices based on a date field

field: required. The field containing the date to read the index name from. Supports template snippets.

index_name_prefix: a prefix to prepend to the printed date when building the index name. Supports template snippets.

date_rounding: required. How to round the date when formatting it into the index name: y (year), M (month), w (week), d (day), h (hour), m (minute) or s (second).

date_formats: an array of formats used to parse the date field; defaults to yyyy-MM-dd'T'HH:mm:ss.SSSXX.

if: Conditionally execute this processor.

on_failure: Handle failures for this processor. See Handling Failures in Pipelines.

ignore_failure: defaults to false. Ignore failures for this processor. See Handling Failures in Pipelines.

Example

PUT _ingest/pipeline/monthlyindex
{
  "description": "monthly date-time index naming",
  "processors" : [
    {
      "date_index_name" : {
        "field" : "date1",
        "index_name_prefix" : "myindex-",
        "date_rounding" : "M"
      }
    }
  ]
}

PUT /myindex/_doc/1?pipeline=monthlyindex
{
  "date1" : "2016-04-25T12:02:01.789Z"
}

Response:
{
  "_index" : "myindex-2016-04-01",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 55,
  "_primary_term" : 1
}

Using the simulate API:

POST _ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "monthly date-time index naming",
    "processors" : [
      {
        "date_index_name" : {
          "field" : "date1",
          "index_name_prefix" : "myindex-",
          "date_rounding" : "M"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "date1": "2016-04-25T12:02:01.789Z"
      }
    }
  ]
}

Response:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "<myindex-{2016-04-25||/M{yyyy-MM-dd|UTC}}>",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "date1" : "2016-04-25T12:02:01.789Z"
        },
        "_ingest" : {
          "timestamp" : "2020-10-27T06:30:58.273Z"
        }
      }
    }
  ]
}

The _index value "<myindex-{2016-04-25||/M{yyyy-MM-dd|UTC}}>" is a date-math expression; it rounds the date down to the month and resolves to myindex-2016-04-01.

14. Dot Expander Processor: usually used together with other processors; it expands a field with dots in its name into an object so that subsequent processors can access the nested fields

Example


PUT _ingest/pipeline/dot_pipeline
{
  "description": "dot expand pipeline",
  "processors": [
    {
      "dot_expander": {
        "field": "foo.bar"
      }
    }
  ]
}


PUT script_test/_doc/38?pipeline=dot_pipeline
{
  "foo.bar" : "value2",
  "foo" : {
    "bar" : "value1"
  }
}

GET script_test/_doc/38

Response:
"_source" : {
    "foo" : {
      "bar" : [
        "value1",
        "value2"
      ]
    }
  }

15. Fail Processor: quite simple; when its condition is met as a document passes through the pipeline, the configured error message is returned to the requester

Example


PUT _ingest/pipeline/fail_pipeline
{
  "description": "fail pipeline",
  "processors": [
    {
      "fail": {
        "if": "ctx.tags.contains('production') != true",
        "message": "The production tag is not present, found tags: {{tags}}"
      }
    }
  ]
}
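
The condition can be exercised without indexing anything by inlining the pipeline into _simulate (sample document made up for illustration):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "fail": {
          "if": "ctx.tags.contains('production') != true",
          "message": "The production tag is not present, found tags: {{tags}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "tags": ["dev"]
      }
    }
  ]
}

Because the tags array does not contain production, the condition is true and the simulate response reports the configured fail message instead of a transformed document.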

16. Foreach Processor: processes array fields by applying the same processor to every element of the array

Example


PUT _ingest/pipeline/foreach_pipeline
{
  "description": "foreach pipeline",
  "processors": [
    {
      "foreach": {
        "field": "persons",
        "processor": {
          "remove": {
            "field": "_ingest._value.id"
          }
        }
      }
    }
  ]
}


PUT foreach_test/_doc/2?pipeline=foreach_pipeline
{
  "persons" : [
    {
      "id" : "1",
      "name" : "John Doe"
    },
    {
      "id" : "2",
      "name" : "Jane Doe"
    }
  ]
}


GET foreach_test/_search
Response:

"_source" : {
          "persons" : [
            {
              "name" : "John Doe"
            },
            {
              "name" : "Jane Doe"
            }
          ]
        }

17. Pipeline Processor: executes another pipeline

Example

PUT _ingest/pipeline/pipelineA
{
  "description" : "inner pipeline",
  "processors" : [
    {
      "set" : {
        "field": "inner_pipeline_set",
        "value": "inner"
      }
    }
  ]
}
 

PUT _ingest/pipeline/pipelineB
{
  "description" : "outer pipeline",
  "processors" : [
    {
      "pipeline" : {
        "name": "pipelineA"
      }
    },
    {
      "set" : {
        "field": "outer_pipeline_set",
        "value": "outer"
      }
    }
  ]
}

PUT /myindex/_doc/1?pipeline=pipelineB
{
  "field": "value"
}

The stored document is:
{
  "field": "value",
  "inner_pipeline_set": "inner",
  "outer_pipeline_set": "outer"
}

18. Script Processor: processes documents with an Elasticsearch script; any field a script can access is accessible here

Scripting is covered in detail in its own chapter; it feels like scripts show up throughout the processors.

Example

PUT _ingest/pipeline/my_index
{
    "description": "use index:my_index and type:_doc",
    "processors": [
      {
        "script": {
          "source": """
            ctx._index = 'my_index';
            ctx._type = '_doc';
          """
        }
      }
    ]
}


PUT any_index/_doc/1?pipeline=my_index
{
  "message": "text"
}


19. Sort Processor: sorts the elements stored in an array field in ascending or descending order

Example


PUT _ingest/pipeline/sort_pipeline
{
  "description": "sort pipeline",
  "processors": [
    {
      "sort": {
        "field": "age_arr",
        "order": "desc"
      }
    }
  ]
}

PUT sort_test/_doc/1?pipeline=sort_pipeline
{
  "name":"age to be sort",
  "ages":[56,23,78,45,99],
  "age_arr":[56,23,78,45,99]
}

GET sort_test/_doc/1

Response:

"_source" : {
    "name" : "age to be sort",
    "ages" : [
      56,
      23,
      78,
      45,
      99
    ],
    "age_arr" : [
      99,
      78,
      56,
      45,
      23
    ]
  }

20. Trim Processor: trims whitespace from both ends of a string field

Example

PUT _ingest/pipeline/trim_pipe
{
  "description": "trim field",
  "processors": [
    {
      "trim": {
        "field": "foo"
      }
    }
  ]
}
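
A _simulate run (sample document for illustration):

POST _ingest/pipeline/trim_pipe/_simulate
{
  "docs": [
    {
      "_source": {
        "foo": "  hello world  "
      }
    }
  ]
}

foo is returned as "hello world". Note that trim only removes leading and trailing whitespace, and only works on string (or array-of-string) fields.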
