python inverted index regular expression query_elasticsearch Study Notes, Advanced Part (14): Prefix Search, Wildcard Search, and Regexp Search in Practice...


Author: weixin_39929683


Article link: https://blog.csdn.net/weixin_39929683/article/details/111429207


Prepare the data:

PUT /test_index/_create/1
{
  "test_field": "C3D0-KD345"
}

PUT /test_index/_create/2
{
  "test_field": "C3K5-DFG65"
}

PUT /test_index/_create/3
{
  "test_field": "C4I8-UI365"
}
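To reason about the results below, it helps to see which terms actually land in the inverted index. This sketch assumes test_field was created by dynamic mapping as a text field analyzed by the standard analyzer, which (for these particular values) lowercases the text and splits it on the hyphen. The tokenize helper is a hypothetical, simplified stand-in for that analyzer, not Elasticsearch's implementation.

```python
import re
from collections import defaultdict

# The three documents indexed above, keyed by _id.
DOCS = {
    "1": "C3D0-KD345",
    "2": "C3K5-DFG65",
    "3": "C4I8-UI365",
}


def tokenize(text):
    """Hypothetical, simplified stand-in for the standard analyzer:
    split on runs of non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_inverted_index(docs):
    """Map every term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    # Sort by term, since a real term dictionary is kept in sorted order.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


if __name__ == "__main__":
    for term, ids in build_inverted_index(DOCS).items():
        print(term, ids)
```

Running it prints six lowercase terms (c3d0, c3k5, c4i8, dfg65, kd345, ui365), each pointing to one document. Under this assumption, the prefix, wildcard, and regexp queries below are matched against terms like these, not against the original strings.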

Prefix search:

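How it works: a prefix query does not calculate relevance scores, and its only difference from a prefix filter is that the filter's result can be cached as a bitset. It scans the inverted index for every term that begins with the prefix and then collects the documents containing those terms. The shorter the prefix, the more terms and documents it has to process and the worse the performance, so use as long a prefix as possible.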
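The lookup described above (find every term starting with the prefix, then collect its documents) can be sketched on a sorted term list. This is a toy model, not Lucene's implementation: it assumes test_field was dynamically mapped as text, so the standard analyzer lowercased the values and split them on the hyphen, and the TERMS and POSTINGS tables are hypothetical reconstructions of that index.

```python
from bisect import bisect_left

# Hypothetical sorted term dictionary and postings for test_field,
# assuming the values were lowercased and split on the hyphen.
TERMS = ["c3d0", "c3k5", "c4i8", "dfg65", "kd345", "ui365"]
POSTINGS = {
    "c3d0": ["1"], "c3k5": ["2"], "c4i8": ["3"],
    "dfg65": ["2"], "kd345": ["1"], "ui365": ["3"],
}


def prefix_terms(terms, prefix):
    """Return every term that starts with prefix.

    In a sorted list all matching terms are contiguous, so jump to the
    first candidate with bisect, then walk forward while terms still match."""
    matches = []
    i = bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1
    return matches


def prefix_docs(prefix):
    """Collect the ids of all documents containing at least one matching term."""
    ids = set()
    # Lowercase the input to line up with the lowercased terms
    # (match_phrase_prefix, used in the example, analyzes its input too).
    for term in prefix_terms(TERMS, prefix.lower()):
        ids.update(POSTINGS[term])
    return sorted(ids)


if __name__ == "__main__":
    print(prefix_terms(TERMS, "c3"))   # shorter prefix: two terms to process
    print(prefix_terms(TERMS, "c3k"))  # longer prefix: one term
    print(prefix_docs("C3"))
```

prefix_docs("C3") returns ["1", "2"], the same documents as the Elasticsearch example. In a real index a short prefix can match thousands of terms, each with its own postings list to merge, which is where the cost comes from.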

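Example: search for documents whose test_field starts with C3. Note that this example uses match_phrase_prefix, a full-text query that analyzes the query text and does compute relevance scores, as the _score values in the result show.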

GET /test_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "test_field": "C3"
    }
  }
}

Result:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808292,
        "_source" : {
          "test_field" : "C3D0-KD345"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "test_field" : "C3K5-DFG65"
        }
      }
    ]
  }
}
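The principle paragraph talks about the term-level prefix query and the prefix filter, neither of which appears in the example. A minimal sketch of their request bodies, built as Python dicts, follows; the helper names are hypothetical. Because a term-level prefix query does not analyze its value, the prefix is written in lowercase, assuming the indexed terms were lowercased by the standard analyzer.

```python
import json


def prefix_query(field, prefix):
    """Term-level prefix query in query context.

    The value is not analyzed, so it must match the indexed terms as stored
    (assumed lowercase here, e.g. "c3")."""
    return {"query": {"prefix": {field: {"value": prefix}}}}


def prefix_filter(field, prefix):
    """The same prefix clause in filter context: matches are not scored,
    and Elasticsearch may cache the set of matching documents."""
    return {"query": {"bool": {"filter": [{"prefix": {field: {"value": prefix}}}]}}}


if __name__ == "__main__":
    # Print bodies that can be pasted after GET /test_index/_search.
    print(json.dumps(prefix_query("test_field", "c3"), indent=2))
    print(json.dumps(prefix_filter("test_field", "c3"), indent=2))
```

The filter form is the one that benefits from the bitset caching mentioned in the principle paragraph; the query form is useful when the prefix clause has to sit alongside scored clauses.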

Wildcard search:

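Wildcard search is similar to prefix search but more powerful. It also has to scan the inverted index, so its performance is also poor, especially when the pattern begins with a wildcard character and there is no fixed prefix to narrow down the terms to check.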

?: matches exactly one arbitrary character

*: matches any number of characters, including none
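The two wildcard operators can be mimicked in Python by translating the pattern into a regular expression and testing it against each term. This is an illustrative sketch: the term list is a hypothetical reconstruction that assumes the standard analyzer lowercased the values and split them on the hyphen, and checking every term mirrors the scan described above. wildcard_to_regex and wildcard_terms are hypothetical helper names.

```python
import re

# Hypothetical term list for test_field, assuming the values were
# lowercased and split on the hyphen by the standard analyzer.
TERMS = ["c3d0", "c3k5", "c4i8", "dfg65", "kd345", "ui365"]


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a Python regex:
    '?' -> exactly one character, '*' -> zero or more characters,
    every other character is matched literally."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_terms(terms, pattern):
    """Check the pattern against every term; like the wildcard query,
    the pattern must match the whole term, not just part of it."""
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]


if __name__ == "__main__":
    print(wildcard_terms(TERMS, "*4?"))
    print(wildcard_terms(TERMS, "c3*"))
```

Only kd345 ends in 4 followed by exactly one character, which is why document 1 is the only hit in the example below.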

Example: search for documents matching the wildcard pattern *4?:

GET /test_index/_search
{
  "query": {
    "wildcard": {
      "test_field": {
        "value": "*4?"
      }
    }
  }
}

Output:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "test_field" : "C3D0-KD345"
        }
      }
    ]
  }
}

Regexp search:

The regexp query is even more powerful than wildcard search, but it also scans the inverted index, so its performance can be very poor.

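[0-9]: a digit in the given range

[a-z]: a letter in the given range

.: any single character

+: the preceding expression occurs one or more times

*: the preceding expression occurs zero or more times

{n}: n is a non-negative integer; the preceding expression occurs exactly n times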
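Lucene's regular expression syntax is not identical to Python's re module in general, but the operators listed above behave the same way in both, so Python can be used to check which terms a pattern would match. One detail matters: Elasticsearch regexp patterns are anchored, meaning they must match the whole term, which is why the example begins with .*; re.fullmatch reproduces that behavior. The term list is a hypothetical reconstruction assuming the standard analyzer lowercased the values and split them on the hyphen, and that lowercasing is also why [a-z] can match the DFG in C3K5-DFG65. regexp_terms is a hypothetical helper name.

```python
import re

# Hypothetical term list for test_field, assuming the values were
# lowercased and split on the hyphen by the standard analyzer.
TERMS = ["c3d0", "c3k5", "c4i8", "dfg65", "kd345", "ui365"]


def regexp_terms(terms, pattern):
    """Return the terms matched by pattern in full; a partial match does not
    count, mirroring the anchored regexp query."""
    regex = re.compile(pattern)
    return [t for t in terms if regex.fullmatch(t)]


if __name__ == "__main__":
    print(regexp_terms(TERMS, ".*[a-z]{3}[0-9]{2}"))  # three letters, then two digits, at the end
    print(regexp_terms(TERMS, "[0-9]{2}"))            # no term is exactly two digits long
    print(regexp_terms(TERMS, "c[0-9].*"))            # terms starting with c and a digit
```

Only dfg65 ends in three letters followed by two digits, so document 2 is the only hit in the example below; kd345 fails because the three characters before 45 are k, d, and 3.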

Example: search for documents matching the regular expression .*[a-z]{3}[0-9]{2}:

GET /test_index/_search
{
  "query": {
    "regexp": {
      "test_field": {
        "value": ".*[a-z]{3}[0-9]{2}"
      }
    }
  }
}

Output:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "test_field" : "C3K5-DFG65"
        }
      }
    ]
  }
}
