Elasticsearch: Advanced Painless Scripting

Elastic China Community Official Blog

Published 2020-02-08 14:05:52

Article link: https://blog.csdn.net/UbuntuTouch/article/details/104221275

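In earlier articles I introduced Painless scripting and covered its syntax and usage in detail. Those articles also covered some best practices, such as why to use parameters, when to use "doc" values rather than "_source" to access document fields, and how to create fields dynamically.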

Previous articles: "Elasticsearch: Painless scripting" and "Elasticsearch: Painless script programming".

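In this article we explore more uses of Painless scripts: using them in query context and filter context, using conditionals in scripts, removing fields and nested fields, accessing nested objects, using scripts in scoring, and so on.

Tutorial

First, let's load a dataset that we can use for the rest of this article: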

PUT tweets/_bulk
{"index":{"_id":1}}
{"username":"tom","posted_date":"2017/07/25" ,"message": "I brought apple stock at the best price" ,"tags": ["stock","money"] , "info":{"device":"mobile", "os": "ios"}, "likes": 10}
{"index":{"_id":2}}
{"username":"mary","posted_date":"2017/06/25" ,"message": "Machine learning is the future" ,"tags": ["ai","tech"] , "info":{"device":"desktop", "os": "ios"}, "likes": 100}
{"index":{"_id":3}}
{"username":"tom","posted_date":"2017/07/27" ,"message": "just tweeting" ,"tags": ["confused"] , "info":{"device":"mobile", "os": "win"}, "likes": 0}
{"index":{"_id":4}}
{"username":"mary","posted_date":"2017/07/28" ,"message": "exploring painless" ,"tags": ["elastic"] , "info":{"device":"mobile", "os": "linux"}, "likes": 100}
{"index":{"_id":5}}
{"username":"mary","posted_date":"2017/05/20" ,"message": "painless is fun but its a new scripting language in the town" ,"tags": ["elastic","painless","scripting"] , "info":{"device":"mobile", "os": "linux"}, "likes": 1000}

Above, we use the bulk API to load our sample data into the tweets index.
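
The bulk request above alternates an action line with a document line. As a minimal sketch of that layout, the following plain Python builds such a newline-delimited body with the standard library only. The helper name build_bulk_body is hypothetical, the documents are trimmed to a few fields, and nothing is sent to Elasticsearch.

```python
import json

# Two of the sample tweets, trimmed to a few fields for brevity.
tweets = {
    1: {"username": "tom", "posted_date": "2017/07/25", "likes": 10},
    2: {"username": "mary", "posted_date": "2017/06/25", "likes": 100},
}

def build_bulk_body(docs):
    """Return an NDJSON string: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"

bulk_body = build_bulk_body(tweets)
print(bulk_body)
```

Each document contributes exactly two lines, so the full five-tweet request has ten lines, as in the request shown above.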

Script Query

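A script query lets us execute a script against each document. Script queries are typically used in filter context. To include a script in a query or filter context, make sure the script is embedded in a script object ("script": {}). That is why the example below shows a script element nested inside another script element.

Let's try an example: find all tweets that contain the string "painless" and whose message is longer than 25 characters.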

GET tweets/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": "painless"
          }
        }
      ],
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['message.keyword'].value.length() > params.length",
              "params": {
                "length": 25
              }
            }
          }
        }
      ]
    }
  }
}

Result:

    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.60910475,
        "_source" : {
          "username" : "mary",
          "posted_date" : "2017/05/20",
          "message" : "painless is fun but its a new scripting language in the town",
          "tags" : [
            "elastic",
            "painless",
            "scripting"
          ],
          "info" : {
            "device" : "mobile",
            "os" : "linux"
          },
          "likes" : 1000
        }
      }
    ]
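
To see why only document 5 is returned, the two clauses can be mimicked in plain Python over the sample messages. This is only a rough sketch: the match clause is approximated with a lowercase whitespace split rather than the real analyzer, and the function name matches is hypothetical.

```python
# Messages of the five sample tweets, keyed by document id.
messages = {
    1: "I brought apple stock at the best price",
    2: "Machine learning is the future",
    3: "just tweeting",
    4: "exploring painless",
    5: "painless is fun but its a new scripting language in the town",
}

params = {"length": 25}

def matches(message, params):
    # "must" clause: the message contains the term "painless"
    # (approximated here with a lowercase whitespace split).
    has_term = "painless" in message.lower().split()
    # "filter" clause: same comparison as the Painless script.
    long_enough = len(message) > params["length"]
    return has_term and long_enough

hits = [doc_id for doc_id, msg in messages.items() if matches(msg, params)]
print(hits)  # [5]
```

Document 4 also contains "painless", but its message is only 18 characters long, so the filter drops it.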

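Scripts in Aggregations

Scripts can also be used in aggregations. Aggregations normally operate on the values of a (non-analyzed) field. With a script, you can extract a value from an existing field, or combine values from multiple fields, and then aggregate on the newly derived value.

The tweets above only carry a "posted_date". What if we want to know the number of tweets per month? The following example shows a script used in an aggregation: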

GET tweets/_search
{
  "size": 0,
  "aggs": {
    "my_terms_agg": {
      "terms": {
        "script": {
          "source": """
            ZonedDateTime date = doc['posted_date'].value;
            return date.getMonth();
          """
        }
      }
    }
  }
}

Above, the script derives the month for each document, and the aggregation is then run on that derived month:

  "aggregations" : {
    "my_terms_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "JULY",
          "doc_count" : 3
        },
        {
          "key" : "JUNE",
          "doc_count" : 1
        },
        {
          "key" : "MAY",
          "doc_count" : 1
        }
      ]
    }
  }
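
The bucket counts can be checked by hand with a small plain-Python sketch that derives the month from each posted_date and counts the results. The MONTHS list and the month_key helper are illustrative names, not part of any Elasticsearch API.

```python
from collections import Counter
from datetime import datetime

# posted_date values of documents 1 to 5 from the sample data.
posted_dates = ["2017/07/25", "2017/06/25", "2017/07/27", "2017/07/28", "2017/05/20"]

# Fixed upper-case names, so the result does not depend on the system locale.
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
          "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

def month_key(posted_date):
    # Parse the date and map its month number to a name such as "JULY",
    # the same kind of key the script produces per document.
    parsed = datetime.strptime(posted_date, "%Y/%m/%d")
    return MONTHS[parsed.month - 1]

# Count documents per derived key, like a terms aggregation does.
buckets = Counter(month_key(d) for d in posted_dates)
print(dict(buckets))
```

The counts agree with the response above: three tweets in July and one each in June and May.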

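Using a Script to Remove a Field

Scripts can be used to remove fields and nested fields. All you need to do is call the remove method and pass in the name of the field. For example, suppose we want to remove the nested field "device" from the document with ID 5.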

POST tweets/_update/5
{
  "script": {
    "source": "ctx._source.info.remove(params.fieldname)",
    "params": {
      "fieldname": "device"
    }
  }
}

Let's fetch the document with ID 5 again:

GET tweets/_doc/5

Result:

  "_source" : {
    "username" : "mary",
    "posted_date" : "2017/05/20",
    "message" : "painless is fun but its a new scripting language in the town",
    "tags" : [
      "elastic",
      "painless",
      "scripting"
    ],
    "info" : {
      "os" : "linux"
    },
    "likes" : 1000
  }

We can see that device has been removed from info.
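
Conceptually, the update script removes a key from the nested info map. A rough plain-Python analogue, with the document trimmed to the relevant fields and a dict standing in for ctx._source, looks like this:

```python
# _source of document 5 before the update (trimmed to the relevant fields).
source = {
    "username": "mary",
    "info": {"device": "mobile", "os": "linux"},
    "likes": 1000,
}
params = {"fieldname": "device"}

# Analogue of ctx._source.info.remove(params.fieldname):
# drop the key from the nested map; the None default avoids a KeyError
# if the field is already absent.
source["info"].pop(params["fieldname"], None)

print(source["info"])  # {'os': 'linux'}
```

Passing the field name as a parameter, as the update request does, keeps the script source identical across calls that remove different fields.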

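Customizing Scores with Scripts

When we run a match query, Elasticsearch returns the matching documents and computes a score for each one that indicates how well it matches the query. Although the default algorithm, BM25, handles scoring/relevance well, sometimes relevance has to be determined by a different algorithm or boosted with additional scoring heuristics. This is where Elasticsearch's script_score and function_score features become useful.

Suppose we want to search for the text "painless" but show tweets with more "likes" at the top of the results, much like a list of top or popular tweets. Let's see it in action.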

GET tweets/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "message": "painless"
        }
      },
      "script_score": {
        "script": {
          "source": "1 + doc['likes'].value"
        }
      }
    }
  }
}

Result:

    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 529.9271,
        "_source" : {
          "username" : "mary",
          "posted_date" : "2017/05/20",
          "message" : "painless is fun but its a new scripting language in the town",
          "tags" : [
            "elastic",
            "painless",
            "scripting"
          ],
          "info" : {
            "os" : "linux"
          },
          "likes" : 1000
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 98.51341,
        "_source" : {
          "username" : "mary",
          "posted_date" : "2017/07/28",
          "message" : "exploring painless",
          "tags" : [
            "elastic"
          ],
          "info" : {
            "device" : "mobile",
            "os" : "linux"
          },
          "likes" : 100
        }
      }
    ]

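In the example above, if we had run a regular query without the custom score, document 4 would come out on top: BM25 favors shorter fields, and its message is shorter, so its score would be higher than that of document 5.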

GET tweets/_search
{
  "query": {
    "match": {
      "message": "painless"
    }
  }
}

Result:

    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.9753803,
        "_source" : {
          "username" : "mary",
          "posted_date" : "2017/07/28",
          "message" : "exploring painless",
          "tags" : [
            "elastic"
          ],
          "info" : {
            "device" : "mobile",
            "os" : "linux"
          },
          "likes" : 100
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.5293977,
        "_source" : {
          "username" : "mary",
          "posted_date" : "2017/05/20",
          "message" : "painless is fun but its a new scripting language in the town",
          "tags" : [
            "elastic",
            "painless",
            "scripting"
          ],
          "info" : {
            "os" : "linux"
          },
          "likes" : 1000
        }
      }
    ]

Here the document with ID 4 scores higher than the document with ID 5.
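
The custom scores shown earlier can be reproduced from these plain match scores. The sketch below assumes that function_score's default boost_mode ("multiply") is in effect, so each query score is multiplied by the script result 1 + likes. The function name combined_score is illustrative only.

```python
import math

# Plain match scores and like counts taken from the results above.
docs = {
    "4": {"match_score": 0.9753803, "likes": 100},
    "5": {"match_score": 0.5293977, "likes": 1000},
}

def combined_score(match_score, likes):
    # Assumption: the default boost_mode ("multiply") applies, so the
    # query score is multiplied by the script value 1 + likes.
    return match_score * (1 + likes)

scores = {doc_id: combined_score(d["match_score"], d["likes"])
          for doc_id, d in docs.items()}

# Sort document ids by combined score, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Under that assumption the products come to roughly 529.927 for document 5 and 98.513 for document 4, which agrees with the function_score results and explains why document 5 moves to the top.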
