Elasticsearch: Pipeline & Processor (processing data through pipelines)

### Pipeline & Processor

● Pipeline - a pipeline applies a sequence of transformations, in order, to the documents that pass through it
● Processor - Elasticsearch's abstraction over a single transformation step
● Elasticsearch ships with many built-in Processors, and you can also implement your own Processor as a plugin

A pipeline is simply an ordered set of processors.

We can test a pipeline without applying it to any index by simulating its behavior with the _ingest/pipeline/_simulate API.
description: describes what our ingest pipeline is for
processors: the pipeline's processing chain, which can be a list of several processors
docs: sample documents to feed through the pipeline
Split the tags field of our documents into an array:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You konw, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}
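
For reference, a _simulate call returns the transformed documents in a docs array, each wrapped under a doc key. Abridged, the first document of the response should look roughly like this (the _ingest.timestamp value will differ on your cluster):
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_id": "id",
        "_source": {
          "title": "Introducing big data......",
          "tags": ["hadoop", "elasticsearch", "spark"],
          "content": "You know, for big data"
        },
        "_ingest": {
          "timestamp": "2020-01-01T00:00:00.000Z"
        }
      }
    }
  ]
}
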
Split our tags field into an array, and also add a new views field to every document:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set": {
          "field": "views",
          "value": "0"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You konw, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}
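
If you want to see what each processor produced individually, the simulate API also accepts a verbose flag; each document in the response then carries one processor_results entry per processor. A minimal sketch reusing the pipeline above:
POST _ingest/pipeline/_simulate?verbose=true
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      { "split": { "field": "tags", "separator": "," } },
      { "set": { "field": "views", "value": "0" } }
    ]
  },
  "docs": [
    {
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark"
      }
    }
  ]
}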

Pipeline API

The main actions, with sample requests:

Add or update a pipeline:

PUT _ingest/pipeline/my-pipeline-id
{
  "description": "describe pipeline",
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": "bar"
      }
    }
  ]
}

Retrieve a pipeline:

GET _ingest/pipeline/my-pipeline-id

Delete a pipeline:

DELETE _ingest/pipeline/my-pipeline-id

Add a pipeline and test it:

PUT _ingest/pipeline/blog_pepeline
{
  "description": "a blog pipeline",
  "processors": [
    {
      "split": {
        "field": "tags",
        "separator": ","
      }
    },
    {
      "set": {
        "field": "views",
        "value": 0
      }
    }
  ]
}

View the pipeline:

GET _ingest/pipeline/blog_pepeline
Test the stored pipeline by its id:
POST _ingest/pipeline/blog_pepeline/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}

Index a document without the pipeline:

PUT tech_blogs/_doc/1
{
  "title": "Introducing big data......",
  "tags": "hadoop,elasticsearch,spark",
  "content": "You konw, for big data"
}

Index a document through the pipeline:

POST tech_blogs/_doc?pipeline=blog_pepeline
{
  "title": "Introducing cloud computering",
  "tags": "openstack,k8s",
  "content": "You konw, for cloud"
}
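
Instead of passing ?pipeline= on every indexing request, you can also attach the pipeline to the index itself through the index.default_pipeline setting (available since Elasticsearch 6.5), so it runs on every document indexed into tech_blogs by default. A minimal sketch:
PUT tech_blogs/_settings
{
  "index.default_pipeline": "blog_pepeline"
}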

Inspect the data: one document was processed by the pipeline, the other was not:

GET tech_blogs/_search
GET tech_blogs/_mapping

Because the pipeline converted tags into an array, running update by query directly over the whole index fails: the split processor cannot handle a tags field that is already an array.

POST tech_blogs/_update_by_query?pipeline=blog_pepeline

We can add a query to the update by query request to limit which documents it touches: only documents whose tags is still a string are modified, while documents whose tags is already an array are left alone. This also shows that _update_by_query supports such restrictions.

Here we run blog_pepeline only against documents that do not yet have a views field:
POST tech_blogs/_update_by_query?pipeline=blog_pepeline
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "views"
        }
      }
    }
  }
}

At this point, all of our documents have been processed by the pipeline:

GET tech_blogs/_search
GET tech_blogs
Some built-in Processors
● https://www.elastic.co/guide/en/elasticsearch/reference/7.1/ingest-processors.html
● Split Processor (e.g. split a given field's value into an array)
● Remove / Rename Processor (e.g. remove a field / rename a field)
● Append (e.g. add a new tag to a product)
● Convert (e.g. convert a product price from a string to a float; see the sketch after this list)
● Date / JSON (e.g. convert date formats / parse a string into a JSON object)
● Date Index Name Processor (e.g. route documents passing through the processor into indices named by a time pattern)
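
As a quick illustration of the Convert processor from the list above, here is a simulate sketch; the price field and its value are made-up sample data:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "convert price from string to float",
    "processors": [
      {
        "convert": {
          "field": "price",
          "type": "float"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "name": "some product", "price": "19.99" } }
  ]
}
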
Built-in Processors (continued)
● https://www.elastic.co/guide/en/elasticsearch/reference/7.1/ingest-processors.html
● Fail Processor (when an exception occurs, the pipeline returns a specified error message to the user)
● Foreach Processor (apply the same processor to every element of an array field; see the sketch after this list)
● Grok Processor (e.g. parse structured fields out of log lines)
● Gsub / Join / Split (string substitution / array to string / string to array)
● Lowercase / Uppercase (case conversion)
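
As an illustration of the Foreach processor from the list above, the following sketch uppercases every element of the tags array; inside a foreach, the current element is addressed as _ingest._value:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "tags",
          "processor": {
            "uppercase": { "field": "_ingest._value" }
          }
        }
      }
    ]
  },
  "docs": [
    { "_source": { "tags": ["openstack", "k8s"] } }
  ]
}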

Ingest Node vs. Logstash

|  | Logstash | Ingest Node |
| --- | --- | --- |
| Data input & output | Reads from and writes to a wide range of data sources | Receives data via the ES REST API and writes to Elasticsearch |
| Data buffering | Implements a simple data queue and supports retry | No buffering |
| Data processing | Large number of plugins, plus support for custom development | Built-in processors; extensible with custom Plugins (updating a Plugin requires a restart) |
| Configuration & usage | Adds some architectural complexity | No extra deployment needed |

https://www.elastic.co/cn/blog/should-i-use-logstash-or-elasticsearch-ingest-nodes
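
In 7.x every node carries the ingest role by default, so pipelines work out of the box; whether to run dedicated ingest nodes is a deployment decision. One way to check which nodes have the role (ingest shows up as an i in node.role):
GET _cat/nodes?v&h=name,node.role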