Data cleaning: using Elasticsearch ingest pipelines

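In the Elastic Stack, data cleaning is usually handled by Logstash. Here we use a different mechanism, the ingest pipeline, for simple data processing.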

This example splits a comma-separated string into an array.

PUT spring_blogs/_doc/1
{
  "title":"Introducing spring framework......",
  "tags":"spring,spring boot,spring cloud",
  "content":"You konw, for spring framework"
}

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing spring framework......",
        "tags": "spring,spring boot,spring cloud",
        "content": "You konw, for spring framework"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud"
      }
    }
  ]
}
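
To make the effect of the split processor concrete, here is a plain-Python sketch of what it does to each document's `_source`. It only illustrates the behavior and is not how Elasticsearch implements it; the function name `split_field` is hypothetical, and details such as the handling of trailing empty strings are ignored.

```python
import re

def split_field(source, field, separator):
    """Return a copy of `source` with the string in `field` split into a list.

    `separator` is treated as a regular expression, as in the split processor.
    """
    value = source.get(field)
    if not isinstance(value, str):
        # By default the real processor also rejects a missing or non-string field.
        raise ValueError(f"field [{field}] is missing or not a string")
    result = dict(source)
    result[field] = re.split(separator, value)
    return result

doc = {
    "title": "Introducing spring framework......",
    "tags": "spring,spring boot,spring cloud",
    "content": "You konw, for spring framework",
}
processed = split_field(doc, "tags", ",")
print(processed["tags"])  # ['spring', 'spring boot', 'spring cloud']
```

Only `tags` changes; the other fields are passed through untouched, which matches what the simulate response shows for each document.
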
# Add a pipeline to Elasticsearch
PUT _ingest/pipeline/spring_blog_pipeline
{
  "description": "a spring blog pipeline",
  "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },

      {
        "set":{
          "field": "views",
          "value": 0
        }
      }
    ]
}

# View the pipeline
GET _ingest/pipeline/spring_blog_pipeline

# Test the pipeline
POST _ingest/pipeline/spring_blog_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud"
      }
    }
  ]
}

DELETE spring_blogs

PUT spring_blogs/_doc/1
{
  "title":"Introducing spring framework......",
  "tags":"spring,spring boot,spring cloud",
  "content":"You konw, for spring framework"
}

# Index a document through the pipeline
PUT spring_blogs/_doc/2?pipeline=spring_blog_pipeline
{
  "title": "Introducing cloud computering",
  "tags": "docker,k8s,ingrest",
  "content": "You konw, for cloud"
}

POST spring_blogs/_search


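# Add a query condition to update_by_query so that only documents without a views field go through the pipeline.
# Without it, document 2 (already processed) would fail in the split processor, because its tags field is already an array.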
POST spring_blogs/_update_by_query?pipeline=spring_blog_pipeline
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "views"
                }
            }
        }
    }
}
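
The reason for filtering on the missing `views` field can be sketched in plain Python. This is a rough emulation under simplifying assumptions, not Elasticsearch code: `run_pipeline`, `needs_processing`, and the in-memory `index` dict are hypothetical stand-ins for the pipeline, the `must_not`/`exists` query, and the `spring_blogs` index.

```python
def run_pipeline(source):
    """Emulate spring_blog_pipeline: split `tags` on commas, then set `views` to 0."""
    if not isinstance(source["tags"], str):
        # Splitting a value that is already an array is an error.
        raise TypeError("tags is already an array; split expects a string")
    return {**source, "tags": source["tags"].split(","), "views": 0}

def needs_processing(source):
    """Emulate the must_not/exists query: match only documents without `views`."""
    return "views" not in source

# Document 1 was indexed without the pipeline, document 2 with it.
index = {
    "1": {"title": "Introducing spring framework......",
          "tags": "spring,spring boot,spring cloud"},
    "2": {"title": "Introducing cloud computering",
          "tags": ["docker", "k8s", "ingrest"], "views": 0},
}

for doc_id, source in list(index.items()):
    if needs_processing(source):
        index[doc_id] = run_pipeline(source)

print(index["1"]["tags"])  # ['spring', 'spring boot', 'spring cloud']
```

Because the filter skips documents that already carry `views`, running the same update a second time leaves the index unchanged instead of failing on already-split tags.
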

The final data after processing:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "spring_blogs",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Introducing cloud computering",
          "content" : "You konw, for cloud",
          "views" : 0,
          "tags" : [
            "docker",
            "k8s",
            "ingrest"
          ]
        }
      },
      {
        "_index" : "spring_blogs",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "Introducing spring framework......",
          "content" : "You konw, for spring framework",
          "views" : 0,
          "tags" : [
            "spring",
            "spring boot",
            "spring cloud"
          ]
        }
      }
    ]
  }
}

You can also use the Script processor, which offers more freedom and can handle somewhat more complex data.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split spring blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "script": {
          "source": """
          // Store the length of the content field, or 0 when the field is missing
          if (ctx.containsKey("content")) {
            ctx.content_length = ctx.content.length();
          } else {
            ctx.content_length = 0;
          }
          """
        }
      },
      {
        "set": {
          "field": "views",
          "value": 0
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing spring framework......",
        "tags": "spring,spring boot,spring cloud",
        "content": "You konw, for spring framework"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud"
      }
    }
  ]
}
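
The branch inside the script processor can be read as the following plain-Python sketch. It is an illustration only: `add_length` is a hypothetical helper, and the field to measure is passed as a parameter rather than hard-coded as in the Painless script.

```python
def add_length(source, field, target):
    """Store the length of `source[field]` in `target`, or 0 if the field is absent."""
    result = dict(source)
    if field in source:
        result[target] = len(source[field])
    else:
        result[target] = 0
    return result

with_content = add_length({"content": "You konw, for cloud"}, "content", "content_length")
without_content = add_length({"title": "Introducing cloud computering"}, "content", "content_length")
print(with_content["content_length"])     # 19
print(without_content["content_length"])  # 0
```

The explicit fallback to 0 matters because a script that reads a missing field would otherwise fail for that document.
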
