### Pipeline & Processor
● Pipeline - a pipeline applies a sequence of processing steps, in order, to the documents passing through it
● Processor - Elasticsearch's abstraction for a single processing step
● Elasticsearch ships with many built-in processors, and custom processors can also be implemented via plugins
A pipeline is essentially an ordered list of processors.
We can simulate a pipeline's behavior for testing via the _ingest/pipeline/_simulate API:
description: describes what this ingest pipeline is for
processors: the pipeline's processing chain, which can combine multiple processors
docs: sample documents to run through the pipeline
Split the tags field of our documents into an array:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You know, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computing",
        "tags": "openstack,k8s",
        "content": "You know, for cloud"
      }
    }
  ]
}
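What the split processor does to a document can be modeled in a few lines of plain Python. This is a simplified sketch of the behavior, not the actual Elasticsearch implementation:

```python
def split_processor(source, field, separator):
    # Model of the ingest "split" processor: replace a delimited
    # string field with a list, in place.
    source[field] = source[field].split(separator)
    return source

doc = {
    "title": "Introducing big data......",
    "tags": "hadoop,elasticsearch,spark",
    "content": "You know, for big data",
}
split_processor(doc, field="tags", separator=",")
print(doc["tags"])  # ['hadoop', 'elasticsearch', 'spark']
```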
Split the tags field into an array, and add a views field to each document:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set": {
          "field": "views",
          "value": 0
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You know, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computing",
        "tags": "openstack,k8s",
        "content": "You know, for cloud"
      }
    }
  ]
}
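Conceptually, running a pipeline is just applying each processor to the document source in order, each one receiving the previous one's output. A minimal Python sketch of that idea (the function names here are made up for illustration, not an Elasticsearch API):

```python
def run_pipeline(processors, source):
    # Apply each processor to the document source, in order.
    for proc in processors:
        source = proc(source)
    return source

# Toy stand-ins for the "split" and "set" processors above.
split_tags = lambda s: {**s, "tags": s["tags"].split(",")}
set_views = lambda s: {**s, "views": 0}

doc = {"title": "Introducing cloud computing", "tags": "openstack,k8s"}
result = run_pipeline([split_tags, set_views], doc)
print(result)  # tags is now a list, and views has been set to 0
```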
Pipeline API
Add or update a pipeline:
PUT _ingest/pipeline/my-pipeline-id
{
  "description": "describe pipeline",
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": "bar"
      }
    }
  ]
}
Retrieve a pipeline:
GET _ingest/pipeline/my-pipeline-id
Delete a pipeline:
DELETE _ingest/pipeline/my-pipeline-id
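The three calls above are plain CRUD operations on pipeline definitions stored by id. A toy in-memory model of that registry, purely to illustrate the upsert/get/delete semantics (the helper names are hypothetical, not a real client API):

```python
# Toy registry modeling PUT/GET/DELETE _ingest/pipeline/<id> semantics.
pipelines = {}

def put_pipeline(pipeline_id, definition):
    # PUT is an upsert: it creates or overwrites the definition.
    pipelines[pipeline_id] = definition

def get_pipeline(pipeline_id):
    return pipelines.get(pipeline_id)

def delete_pipeline(pipeline_id):
    pipelines.pop(pipeline_id, None)

put_pipeline("my-pipeline-id", {
    "description": "describe pipeline",
    "processors": [{"set": {"field": "foo", "value": "bar"}}],
})
print(get_pipeline("my-pipeline-id")["description"])  # describe pipeline
delete_pipeline("my-pipeline-id")
print(get_pipeline("my-pipeline-id"))  # None
```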
Create the pipeline and test it:
PUT _ingest/pipeline/blog_pepeline
{
  "description": "a blog pipeline",
  "processors": [
    {
      "split": {
        "field": "tags",
        "separator": ","
      }
    },
    {
      "set": {
        "field": "views",
        "value": 0
      }
    }
  ]
}
View the pipeline:
GET _ingest/pipeline/blog_pepeline
Simulate with the stored pipeline:
POST _ingest/pipeline/blog_pepeline/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computing",
        "tags": "openstack,k8s",
        "content": "You know, for cloud"
      }
    }
  ]
}
Index a document without the pipeline:
PUT tech_blogs/_doc/1
{
  "title": "Introducing big data......",
  "tags": "hadoop,elasticsearch,spark",
  "content": "You know, for big data"
}
Index a document using the pipeline:
POST tech_blogs/_doc?pipeline=blog_pepeline
{
  "title": "Introducing cloud computing",
  "tags": "openstack,k8s",
  "content": "You know, for cloud"
}
View the data: one document was processed by the pipeline, the other was not:
GET tech_blogs/_search
GET tech_blogs/_mapping
Because the processed document's tags field is already an array, running update by query with the pipeline on all documents raises an error (the split processor cannot split a field that is no longer a string):
POST tech_blogs/_update_by_query?pipeline=blog_pepeline
We can add a query to update by query to limit its scope: only documents whose tags is still a string are modified, and documents whose tags is already an array are skipped. This also shows that update by query supports such filtering.
Here we run blog_pepeline only on documents that lack a views field:
POST tech_blogs/_update_by_query?pipeline=blog_pepeline
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "views"
        }
      }
    }
  }
}
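The bool/must_not/exists filter above can be modeled locally: reprocess only documents that do not yet have a views field, so documents the pipeline already handled (whose tags is an array) are left untouched. A sketch of that selection logic, not the actual query execution:

```python
docs = [
    {"title": "a", "tags": ["hadoop", "spark"], "views": 0},  # already processed
    {"title": "b", "tags": "openstack,k8s"},                  # raw document
]

def needs_processing(doc):
    # Local equivalent of bool/must_not/exists on "views".
    return "views" not in doc

for doc in docs:
    if needs_processing(doc):
        doc["tags"] = doc["tags"].split(",")
        doc["views"] = 0

# After the loop, both documents have array tags and a views field.
```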
At this point, all documents have been processed by the pipeline:
GET tech_blogs/_search
GET tech_blogs