Elasticsearch Scripting

Ingenuity1992

Last modified on 2022-02-20 19:37:36

Category: Original Articles. Tags: elasticsearch, big data

First published on 2022-02-20 19:31:53

Article link: https://blog.csdn.net/Ingenuity1992/article/details/123031001

1. Introduction to scripting

1.1 Use cases

Consider using a script when you need to generate a custom field or compute a custom score.

1.2 Basic usage pattern

  "script": {
    "lang":   "...",  			// language
    "source" | "id": "...", 	// inline script code, or the ID of a stored script
    "params": { ... } 			// parameters
  }
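
As a rough illustration of this pattern, the Python sketch below builds a script object as a plain dictionary. The helper name `build_script` is hypothetical and not part of any Elasticsearch client; it only encodes the rule that a script carries either an inline "source" or a stored-script "id", plus optional "params".

```python
import json


def build_script(source=None, script_id=None, params=None, lang=None):
    """Build the "script" object described above as a plain dict.

    Hypothetical helper for illustration only: exactly one of `source`
    (inline code) or `script_id` (a stored script) must be given.
    """
    if (source is None) == (script_id is None):
        raise ValueError("provide exactly one of source or script_id")
    script = {}
    if lang is not None:
        script["lang"] = lang
    if source is not None:
        script["source"] = source
    else:
        script["id"] = script_id
    if params:
        script["params"] = params
    return script


inline = build_script(
    source="Math.log(_score * 2) + params.my_modifier",
    params={"my_modifier": 8},
    lang="painless",
)
print(json.dumps({"script": inline}, indent=2))
```

Keeping the either/or check in one place makes it harder to send a request body that sets both keys by accident.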

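1.3 Change of the default scripting language

-- "Breaking Changes in 5.0"
Starting with version 5.0, the default scripting language changed from Groovy to Painless. One major difference is how script parameters are referenced: Groovy can use them directly, whereas Painless requires the "params" prefix.
Groovy: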

{
  "script_score": {
    "script": {
      "lang": "groovy",
      "source": "Math.log(_score * 2) + my_modifier",
      "params": {
        "my_modifier": 8
      }
    }
  }
}

Painless (my_modifier is prefixed with params):

{
  "script_score": {
    "script": {
      "lang": "painless",
      "source": "Math.log(_score * 2) + params.my_modifier",
      "params": {
        "my_modifier": 8
      }
    }
  }
}

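To use a scripting language other than Painless, specify it explicitly in the "lang" parameter of "script". The languages currently supported are:
[Figure: table of supported scripting languages]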

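1.4 Notes

1. Compiling a script is expensive. ES compiles each distinct script once and caches the result in memory, so values that vary between requests should go in the "params" object rather than being hard-coded:

"source": "doc['my_field'].value * 2"

should be written as:

  "source": "doc['my_field'].value * params.multiplier",
  "params": {
    "multiplier": 2
  }

2. Do not submit a large number of distinct scripts in a short period. If ES has to compile too many scripts within a short time, it throws a circuit_breaking_exception. The default limit is 15 compilations per minute and can be changed dynamically through "script.max_compilations_rate".

3. To reduce the number of compilations, all scripts are cached. By default the cache holds at most 100 scripts and entries never expire; both can be changed with script.cache.max_size and script.cache.expire.

4. A stored script may be at most 65,535 bytes long. The limit can be changed with script.max_size_in_bytes, but the scripting engine's own limits also apply.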
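
To make the first three notes concrete, here is a toy model of a compile cache in Python. It is only a sketch: the class `ScriptCache` is invented for this example, it keys entries by the script source string, and it assumes simple least-recently-used eviction, which may differ from what ES does internally.

```python
from collections import OrderedDict


class ScriptCache:
    """Toy LRU cache keyed by script source, standing in for the ES script cache."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.compilations = 0

    def get(self, source):
        if source in self.entries:
            self.entries.move_to_end(source)  # mark as recently used
            return self.entries[source]
        self.compilations += 1  # cache miss: "compile" the script
        compiled = ("compiled", source)
        self.entries[source] = compiled
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return compiled


hard_coded = ScriptCache()
for n in range(20):
    # A different literal each time means a different source string.
    hard_coded.get(f"doc['my_field'].value * {n}")

parameterized = ScriptCache()
for n in range(20):
    # The source never changes; only params would differ per request.
    parameterized.get("doc['my_field'].value * params.multiplier")

print(hard_coded.compilations, parameterized.compilations)  # prints: 20 1
```

Twenty hard-coded variants cost twenty compilations, which would already exceed a 15-per-minute limit, while the parameterized form compiles once.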

2. Stored scripts

If a script rarely changes and is used repeatedly, it can be persisted in the cluster state under /_scripts/{id}:

POST _scripts/calculate-score
{
  "script": {
    "lang": "painless",
    "source": "Math.log(_score * 2) + params.my_modifier"
  }
}

GET _scripts/calculate-score

GET _search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": {
          "id": "calculate-score",
          "params": {
            "my_modifier": 2
          }
        }
      }
    }
  }
}

DELETE _scripts/calculate-score
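
The store, look up, reference, and delete cycle can be pictured with a small in-memory model. This is a sketch under stated assumptions: the dictionary `stored_scripts` merely stands in for the cluster state, and the functions `put_script` and `resolve` are hypothetical names, not client API calls.

```python
stored_scripts = {}  # stands in for the cluster state


def put_script(script_id, lang, source):
    """Counterpart of POST _scripts/{id}: keep lang and source under the ID."""
    stored_scripts[script_id] = {"lang": lang, "source": source}


def resolve(script_ref):
    """Turn {"id": ..., "params": ...} into the full script it refers to."""
    stored = stored_scripts[script_ref["id"]]
    return {**stored, "params": script_ref.get("params", {})}


put_script("calculate-score", "painless", "Math.log(_score * 2) + params.my_modifier")
resolved = resolve({"id": "calculate-score", "params": {"my_modifier": 2}})
print(resolved)

del stored_scripts["calculate-score"]  # counterpart of DELETE _scripts/calculate-score
```

The request only carries the ID and the parameters; the language and source come from what was stored.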

3. Usage examples

3.1 Sample data

Prepare two product documents:

{
	"id": "1",
	"name": "L＇oreal/欧莱雅复颜玻尿酸水光充盈导入膨润精华液",
	"brand": "欧莱雅",
	"price": 279,
	"desc": "补水 提拉紧致 淡化细纹"
}
{
	"id": "2",
	"name": "资生堂面膜",
	"brand": "资生堂",
	"price": 199,
	"desc": "补水 美白 修复",
	"comments": ["女朋友说很好用！",
	"一直用的这款，价格也实惠。"]
}

3.2 Script fields (adding a custom field)

Add a custom field "档次" (tier): when "price" >= 250 yuan its value is "高档" (high-end), otherwise "中低档" (mid/low-end).

GET test_index_20211220/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "档次": {
      "script": {
        "source": " if(doc['price'].value >= 250) {return '高档' } else { return '中低档' }"
      }
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index_20211220",
        "_type": "e-com",
        "_id": "1",
        "_score": 1,
        "fields": {
          "档次": [
            "高档"
          ]
        }
      },
      {
        "_index": "test_index_20211220",
        "_type": "e-com",
        "_id": "2",
        "_score": 1,
        "fields": {
          "档次": [
            "中低档"
          ]
        }
      }
    ]
  }
}
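
The Painless one-liner used for the script field amounts to a simple threshold test. A Python equivalent, applied to the two sample prices, is sketched below; the function name `tier` is made up for this illustration.

```python
def tier(price, threshold=250):
    """Mirror of the Painless script: '高档' at or above the threshold, else '中低档'."""
    return "高档" if price >= threshold else "中低档"


sample_prices = {"1": 279, "2": 199}
# Script fields come back as arrays, hence the single-element lists.
fields = {doc_id: {"档次": [tier(price)]} for doc_id, price in sample_prices.items()}
print(fields)
```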

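The result is slightly different from what one might expect: although the query matches all documents, only the script field is returned and the document fields are missing. To return the document fields as well, request them explicitly with "_source":

GET test_index_20211220/_search
{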
  "_source": {
    "includes": [
      "id",
      "name",
      "brand",
      "price",
      "desc"
    ]
  },
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "档次": {
      "script": {
        "source": " if(doc['price'].value >= 250) {return '高档' } else { return '中低档' }"
      }
    }
  }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index_20211220",
        "_type": "e-com",
        "_id": "1",
        "_score": 1,
        "_source": {
          "price": 279,
          "name": "L＇oreal/欧莱雅复颜玻尿酸水光充盈导入膨润精华液",
          "id": "1",
          "brand": "欧莱雅",
          "desc": "补水 提拉紧致 淡化细纹"
        },
        "fields": {
          "档次": [
            "高档"
          ]
        }
      },
      {
        "_index": "test_index_20211220",
        "_type": "e-com",
        "_id": "2",
        "_score": 1,
        "_source": {
          "price": 199,
          "name": "资生堂面膜",
          "id": "2",
          "brand": "资生堂",
          "desc": "补水 美白 修复"
        },
        "fields": {
          "档次": [
            "中低档"
          ]
        }
      }
    ]
  }
}

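3.3 Using scripts in search and aggregations

Besides script fields, scripts can also be used in the search context and in the aggregation context. The difference is that a script field runs only after the search has selected its hits, whereas a script in a query or aggregation runs as part of the matching or bucketing itself. The resulting performance therefore depends on the number of documents in the index and on how efficient the script is.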
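
The difference in cost can be sketched by counting script invocations in a small Python simulation. The setup is an assumption made for illustration: 1000 synthetic documents on a single shard, no other query clause narrowing the candidates, and a page of 10 hits.

```python
docs = [{"id": i, "price": 100 + 10 * i} for i in range(1000)]
calls = {"query_script": 0, "script_field": 0}


def query_script(doc):
    """Stand-in for a script query: evaluated for every candidate document."""
    calls["query_script"] += 1
    return doc["price"] >= 250


def script_field(doc):
    """Stand-in for a script field: evaluated only for the hits that are returned."""
    calls["script_field"] += 1
    return "高档" if doc["price"] >= 250 else "中低档"


matched = [d for d in docs if query_script(d)]  # the script runs on all 1000 docs
page = matched[:10]                             # only the first 10 hits are returned
rendered = [script_field(d) for d in page]      # the script runs on those 10 hits only
print(calls)  # prints: {'query_script': 1000, 'script_field': 10}
```

This is why a slow script hurts far more inside a query or aggregation than inside script_fields.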

3.3.1 Using a script in a search

GET test_index_20211220/_search
{
  "query": {
    "script": {
      "script": "return doc['price'].value >= 250"
    }
  }
}

// Put the script in a filter context to avoid scoring
GET test_index_20211220/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": "return doc['price'].value < 250"
        }
      }
    }
  }
}

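The results look like those of any ordinary search, so they are omitted here.

3.3.2 Using a script in an aggregation

Bucket the documents by price tier: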

GET test_index_20211220/_search
{
  "_source": {
    "includes": [
      "id",
      "name",
      "brand",
      "price",
      "desc"
    ]
  },
  "query": {
    "match_all": {}
  },
  "aggs": {
    "档次统计": {
      "terms": {
        "script": {
          "source": " if(doc['price'].value >= 250) {return '高档' } else { return '中低档' }"
        }
      }
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index_20211220",
        "_type": "e-com",
        "_id": "1",
        "_score": 1,
        "_source": {
          "price": 279,
          "name": "L＇oreal/欧莱雅复颜玻尿酸水光充盈导入膨润精华液",
          "id": "1",
          "brand": "欧莱雅",
          "desc": "补水 提拉紧致 淡化细纹"
        }
      },
      {
        "_index": "test_index_20211220",
        "_type": "e-com",
        "_id": "2",
        "_score": 1,
        "_source": {
          "price": 199,
          "name": "资生堂面膜",
          "id": "2",
          "brand": "资生堂",
          "desc": "补水 美白 修复"
        }
      }
    ]
  },
  "aggregations": {
    "档次统计": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "中低档",
          "doc_count": 1
        },
        {
          "key": "高档",
          "doc_count": 1
        }
      ]
    }
  }
}
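
The bucketing performed by the scripted terms aggregation can be approximated in a few lines of Python. This is a sketch, not the aggregation's actual implementation; the function `terms_buckets` is a made-up name, and the ordering rule (doc_count descending, ties broken by key ascending) is an assumption chosen to match the response shown.

```python
from collections import Counter


def terms_buckets(prices, threshold=250):
    """Group prices by the key the script would return and count each group."""
    counts = Counter("高档" if p >= threshold else "中低档" for p in prices)
    # Assumed ordering for this sketch: doc_count descending, then key ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


print(terms_buckets([279, 199]))
```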

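3.3.3 Where scripts can be used

The following figure comes from the official ES 1.7 documentation and may be out of date.
[Figure: places where scripts can be used, from the ES 1.7 documentation]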

3.4 How to access document values in a script

A script can access document values in three ways:

doc-values
stored fields
_source field

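3.4.1 doc['field_name']

The fastest option. Characteristics:

Cannot access analyzed (tokenized) fields
Cannot access JSON object fields (object type, nested type, and so on)
Can access arrays of ordinary, non-object values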

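3.4.2 _fields['field_name']

Slower than doc[]. Characteristics:

The field must be mapped with "store": true
Values are not loaded into memory
Values are accessed in one of the following two ways (in Painless, through params, as in params._fields['field_name']):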

_fields['field_name'].value
_fields['field_name'].values

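3.4.3 _source.field_name.field_name

_source is a special stored field, so its speed is comparable to _fields[]. Its distinguishing feature is that it can access object-typed fields, with a "." separating each level of the field name.

The drawback is that when the _source document is large, every access through _source is expensive:

... if at all possible, because _source must be loaded and parsed every time it's used. Using _source is very slow.

However, when script fields are computed only for something like the top ten hits, the cost of _source may be negligible.
Likewise, if _source is very large, _fields can be faster than _source.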
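
The dotted access into _source, and the reason it gets expensive for large documents, can be sketched in Python as follows. The nested "seller" field is hypothetical and not part of the sample data, and `source_lookup` is an invented name; the point is only that the whole stored document is parsed before a single value is read.

```python
import json

stored_source = json.dumps({
    "id": "2",
    "price": 199,
    "seller": {"address": {"city": "Shanghai"}},  # hypothetical nested field
})


def source_lookup(raw_source, path):
    """Parse the whole stored document, then walk a dotted path through it."""
    value = json.loads(raw_source)  # full parse on every access
    for key in path.split("."):
        value = value[key]
    return value


print(source_lookup(stored_source, "seller.address.city"))  # prints: Shanghai
print(source_lookup(stored_source, "price"))                # prints: 199
```

The parsing cost grows with the size of the document, not with the size of the value being read, which is why a few small stored fields can win when _source is very large.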
