Elasticsearch in Practice, Part 3 of 3: Search Operations

2401_84831823

Published 2024-05-14 15:14:57


Original link: https://blog.csdn.net/2401_84831823/article/details/138856407


The result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": "1",
          "title": "Deep Learning",
          "language": "python",
          "author": "Yoshua Bengio",
          "price": 549,
          "publish_time": "2016-11-18",
          "description": "written by three experts in the field, deep learning is the only comprehensive book on the subject."
        }
      }
    ]
  }
}

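Because the request wraps the term query in constant_score, the term is executed in non-scoring (filter) mode, and every matching document gets the same score of 1.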
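To make "non-scoring mode" concrete, here is a toy Python sketch, not Elasticsearch's implementation. It assumes a simplified analyze() (word split plus lowercasing) in place of the real analyzer, uses the four book titles from this article as in-memory data, and constant_score_term is a hypothetical name; the term "deep" is just an illustrative choice. Matching is a plain yes/no test, and each hit simply receives the boost value (1.0 unless set otherwise).

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: a simplified stand-in for the standard analyzer
    # (split into word tokens, then lowercase each token).
    return [t.lower() for t in re.findall(r"\w+", text)]


def constant_score_term(field_values, term, boost=1.0):
    # Filter mode: a document either contains the exact term or it does not.
    # No relevance is computed; every hit receives the same score (boost, 1.0 by default).
    return {doc_id: boost for doc_id, text in field_values.items() if term in analyze(text)}


print(constant_score_term(TITLES, "deep"))  # {'1': 1.0}
print(constant_score_term(TITLES, "java"))  # {'3': 1.0, '4': 1.0}
```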

Inspecting analysis results

Fields of type text are analyzed (split into terms) before the inverted index is built. Let's look at how a title value of "Core Java" is analyzed:

GET englishbooks/_analyze
{
  "field": "title",
  "text": "Core Java"
}

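The response is shown below. "Core Java" is split into two terms, "core" and "java", which means that searching the title field for either the term "core" or the term "java" will find this document: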

{
  "tokens": [
    {
      "token": "core",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "java",
      "start_offset": 5,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

Note that the resulting terms are all lowercase; the analyzer lowercases them during analysis.
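The analysis step can be imitated in a few lines of Python. This is only a rough approximation under an assumption: the real standard analyzer follows Unicode text-segmentation rules, while this sketch just treats each run of word characters as a token and lowercases it. It records offsets and positions the way the _analyze response does, but omits the "type" field.

```python
import re


def analyze(text):
    # Assumption: a simplified stand-in for the standard analyzer. Each run of word
    # characters becomes a token; tokens are lowercased, and the character offsets and
    # token positions are recorded like in the _analyze response.
    tokens = []
    for position, m in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": m.group().lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        })
    return tokens


for token in analyze("Core Java"):
    print(token)
```

For "Core Java" this yields core at offsets 0 to 4 (position 0) and java at offsets 5 to 9 (position 1), the same terms and offsets as the response above.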

Term query

We saw above that "Core Java" is analyzed into the two terms "core" and "java". Now let's search using "java" as the term:

GET englishbooks/_search
{
  "query": {
    "term": { "title": "java" }
  }
}

The result is below; both documents whose title contains the term java are found:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5754429,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "4",
        "_score": 0.5754429,
        "_source": {
          "id": "4",
          "title": "Thinking in Java",
          "language": "java",
          "author": "Bruce Eckel",
          "price": 70.1,
          "publish_time": "2015-07-06",
          "description": "Thinking in Java should be read cover to cover by every Java programmer, then kept close at hand for frequent reference. The exercises are challenging, and the chapter on Collections is superb!"
        }
      },
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "id": "3",
          "title": "Core Java",
          "language": "java",
          "author": "Horstmann",
          "price": 85.9,
          "publish_time": "2016-06-01",
          "description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "
        }
      }
    ]
  }
}

Match query

A term query uses its input as a single term, exactly as given. For example, the following query returns no results:

GET englishbooks/_search
{
  "query": {
    "term": { "title": "core java" }
  }
}

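The query returns nothing because "core java" is looked up as one single term. The analyzed terms of the title field are only "core", "java" and the like; there is no term "core java", so nothing matches.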
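The behavior follows from how the inverted index is looked up. Below is a toy Python sketch, assuming a simplified analyzer (word split plus lowercasing) and the four titles from this article as in-memory data; build_index and term_query are hypothetical names. The term query looks its input up verbatim, without analyzing it, so neither "core java" nor the capitalized "Java" is found.

```python
import re
from collections import defaultdict

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_index(docs):
    # Inverted index: term -> set of ids of the documents whose field contains that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def term_query(index, term):
    # term query: the input is NOT analyzed; it is looked up verbatim as one term.
    return sorted(index.get(term, set()))


index = build_index(TITLES)
print(term_query(index, "java"))       # ['3', '4']
print(term_query(index, "core java"))  # []  (no single term "core java" exists)
print(term_query(index, "Java"))       # []  (indexed terms are lowercase)
```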

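If the query input "core java" were itself analyzed, and the resulting terms "core" and "java" were then used for the search, we would get results. That is exactly what a match query does: it analyzes its input before searching: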

GET englishbooks/_search
{
  "query": {
    "match": { "title": "Core Java" }
  }
}

The results are below; both records containing java are returned:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5754429,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "4",
        "_score": 0.5754429,
        "_source": {
          "id": "4",
          "title": "Thinking in Java",
          "language": "java",
          "author": "Bruce Eckel",
          "price": 70.1,
          "publish_time": "2015-07-06",
          "description": "Thinking in Java should be read cover to cover by every Java programmer, then kept close at hand for frequent reference. The exercises are challenging, and the chapter on Collections is superb!"
        }
      },
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "id": "3",
          "title": "Core Java",
          "language": "java",
          "author": "Horstmann",
          "price": 85.9,
          "publish_time": "2016-06-01",
          "description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "
        }
      }
    ]
  }
}

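If what we actually want is only documents matching "Core Java", the result above clearly does not meet that requirement. In that case we can add "operator": "and" to the query, so that only documents matching all of the terms are returned. Note that the JSON structure changes slightly: the value of title used to be the search string itself, whereas now it is a JSON object whose query property holds the original search string: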

GET englishbooks/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Core Java",
        "operator": "and"
      }
    }
  }
}

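This time the result contains only the record that matches both terms "core" and "java". (Why are core and java lowercase? Because "Core Java" is lowercased during analysis, before the search is run.)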

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "id": "3",
          "title": "Core Java",
          "language": "java",
          "author": "Horstmann",
          "price": 85.9,
          "publish_time": "2016-06-01",
          "description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "
        }
      }
    ]
  }
}
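The matching rule of match, with and without "operator": "and", can be sketched in Python. This is a toy simulation under stated assumptions: a simplified analyzer (word split plus lowercasing), the four titles from this article as in-memory data, a hypothetical match_query function, and no relevance scoring, so hits come back in id order rather than by score.

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_query(docs, query, operator="or"):
    # match query: the query string is analyzed first, then each resulting term
    # is checked against the terms of the document's field.
    query_terms = analyze(query)
    hits = []
    for doc_id, text in docs.items():
        doc_terms = set(analyze(text))
        found = [term in doc_terms for term in query_terms]
        # "or" (the default): any one term is enough; "and": every term is required.
        matched = all(found) if operator == "and" else any(found)
        if matched:
            hits.append(doc_id)
    return hits


print(match_query(TITLES, "Core Java"))                  # ['3', '4']
print(match_query(TITLES, "Core Java", operator="and"))  # ['3']
```

Because both the field and the query go through the same analysis, the capitalization of "Core Java" no longer matters, which is exactly what the term query lacked.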

match_phrase query

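A match_phrase query is similar to the match query above, with two additional properties:

All of the analyzed terms must match, which is the same effect as "operator": "and" above;
The analyzed terms must appear in the field in the same order as in the query (and, with the default slop of 0, right next to each other) for the document to match;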

GET englishbooks/_search
{
  "query": {
    "match_phrase": { "title": "Core Java" }
  }
}

The query above finds a result, but if "Core Java" is changed to "Java Core" nothing is found, whereas a match query with "Java Core" does return results.
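The order requirement is easiest to see with term positions. This toy Python sketch assumes a simplified analyzer (word split plus lowercasing), the four titles from this article as data, a hypothetical match_phrase function, and a slop of 0: the query terms must occur as a consecutive run, in order.

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_phrase(docs, phrase):
    # Every query term must occur, at consecutive positions and in the same order
    # (this corresponds to a slop of 0).
    query_terms = analyze(phrase)
    if not query_terms:
        return []
    n = len(query_terms)
    hits = []
    for doc_id, text in docs.items():
        doc_terms = analyze(text)
        # Slide a window of n terms over the field and compare it with the query terms.
        if any(doc_terms[i:i + n] == query_terms for i in range(len(doc_terms) - n + 1)):
            hits.append(doc_id)
    return hits


print(match_phrase(TITLES, "Core Java"))  # ['3']
print(match_phrase(TITLES, "Java Core"))  # []
```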

match_phrase_prefix query

match_phrase_prefix works like match_phrase, except that the last term may match as a prefix. In the example below, the search string "Core J" finds nothing with match_phrase, but it does with match_phrase_prefix, because "J" matches "Java" as a prefix:

GET englishbooks/_search
{
  "query": {
    "match_phrase_prefix": { "title": "Core J" }
  }
}
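A toy Python sketch of the prefix rule for the last term, under stated assumptions: a simplified analyzer (word split plus lowercasing), the four titles from this article as data, and a hypothetical match_phrase_prefix function. Real Elasticsearch expands the last term into a limited number of indexed terms (the max_expansions option); the sketch ignores that limit and simply checks startswith.

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_phrase_prefix(docs, phrase):
    # Like match_phrase (consecutive, in order), except that the LAST query term
    # only needs to be a prefix of the term at that position.
    query_terms = analyze(phrase)
    if not query_terms:
        return []
    head, last = query_terms[:-1], query_terms[-1]
    n = len(query_terms)
    hits = []
    for doc_id, text in docs.items():
        doc_terms = analyze(text)
        for i in range(len(doc_terms) - n + 1):
            window = doc_terms[i:i + n]
            if window[:-1] == head and window[-1].startswith(last):
                hits.append(doc_id)
                break
    return hits


print(match_phrase_prefix(TITLES, "Core J"))   # ['3']
print(match_phrase_prefix(TITLES, "Java Co"))  # []
```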

multi_match query

multi_match extends match to search several fields at once. The query below searches both the title and description fields with the two terms "1986" and "deep":

GET englishbooks/_search
{
  "query": {
    "multi_match": {
      "query": "1986 deep",
      "fields": ["title", "description"]
    }
  }
}

The response below shows that every document whose title or description contains the term "1986" or "deep" is returned:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.79237825,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "2",
        "_score": 0.79237825,
        "_source": {
          "id": "2",
          "title": "Compilers",
          "language": "c",
          "author": "Alfred V.Aho",
          "price": 62.5,
          "publish_time": "2011-01-01",
          "description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."
        }
      },
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "id": "1",
          "title": "Deep Learning",
          "language": "python",
          "author": "Yoshua Bengio",
          "price": 549,
          "publish_time": "2016-11-18",
          "description": "written by three experts in the field, deep learning is the only comprehensive book on the subject."
        }
      }
    ]
  }
}
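The matching rule behind this result can be sketched in Python. It is a toy simulation, assuming a simplified analyzer (word split plus lowercasing), the title and description values of the four books in this article as data, and a hypothetical multi_match function. Only the "any field contains any term" rule is modeled; how Elasticsearch combines the per-field scores is not.

```python
import re

# Title and description of the four books in this article (toy in-memory data).
BOOKS = {
    "1": {"title": "Deep Learning",
          "description": "written by three experts in the field, deep learning is the only comprehensive book on the subject."},
    "2": {"title": "Compilers",
          "description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."},
    "3": {"title": "Core Java",
          "description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets."},
    "4": {"title": "Thinking in Java",
          "description": "Thinking in Java should be read cover to cover by every Java programmer, then kept close at hand for frequent reference. The exercises are challenging, and the chapter on Collections is superb!"},
}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def multi_match(docs, query, fields):
    # Run the same analyzed match (default "or" operator) against each listed field;
    # a document is a hit if ANY of the fields contains ANY of the query terms.
    query_terms = set(analyze(query))
    return [doc_id for doc_id, doc in docs.items()
            if any(query_terms & set(analyze(doc[f])) for f in fields)]


print(multi_match(BOOKS, "1986 deep", ["title", "description"]))  # ['1', '2']
```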

terms query

terms extends the term query to several terms; a document matches if the field contains any one of them:

GET englishbooks/_search
{
  "query": {
    "terms": {
      "title": ["deep", "core"]
    }
  }
}

In the response below, the documents whose title contains deep or core are both found:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": "1",
          "title": "Deep Learning",
          "language": "python",
          "author": "Yoshua Bengio",
          "price": 549,
          "publish_time": "2016-11-18",
          "description": "written by three experts in the field, deep learning is the only comprehensive book on the subject."
        }
      },
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "3",
        "_score": 1,
        "_source": {
          "id": "3",
          "title": "Core Java",
          "language": "java",
          "author": "Horstmann",
          "price": 85.9,
          "publish_time": "2016-06-01",
          "description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "
        }
      }
    ]
  }
}
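A toy Python sketch of the terms rule, assuming a simplified analyzer (word split plus lowercasing) for the indexed side, the four titles from this article as data, and a hypothetical terms_query function. As with term, the listed values themselves are not analyzed, so capitalized values find nothing.

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def terms_query(docs, values):
    # terms query: the values are NOT analyzed; a document matches if its field
    # contains at least one of them exactly.
    wanted = set(values)
    return [doc_id for doc_id, text in docs.items() if wanted & set(analyze(text))]


print(terms_query(TITLES, ["deep", "core"]))  # ['1', '3']
print(terms_query(TITLES, ["Deep", "Core"]))  # []
```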

Range query

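A range query matches values within a range, for example documents whose publish_time falls between "2016-01-01" and "2016-12-31" (gte and lte make both bounds inclusive):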

GET englishbooks/_search
{
  "query": {
    "range": {
      "publish_time": {
        "gte": "2016-01-01",
        "lte": "2016-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

For brevity, the response is omitted here.
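The inclusive bounds can be illustrated with a toy Python sketch. It assumes the index holds only the four books shown in this article (their publish_time values are used as data), uses Python's "%Y-%m-%d" in place of the request's "yyyy-MM-dd" format, and range_query is a hypothetical name; it shows the comparison logic, not Elasticsearch's response.

```python
from datetime import datetime

# publish_time of the four books in this article (toy in-memory data).
PUBLISH_TIME = {"1": "2016-11-18", "2": "2011-01-01", "3": "2016-06-01", "4": "2015-07-06"}


def parse(value, fmt="%Y-%m-%d"):
    # Python's "%Y-%m-%d" plays the role of "format": "yyyy-MM-dd" in the request.
    return datetime.strptime(value, fmt).date()


def range_query(values, gte=None, lte=None):
    # gte = greater than or equal, lte = less than or equal: both bounds are inclusive,
    # and an omitted bound leaves that side of the range open.
    low = parse(gte) if gte is not None else None
    high = parse(lte) if lte is not None else None
    hits = []
    for doc_id, raw in values.items():
        day = parse(raw)
        if (low is None or day >= low) and (high is None or day <= high):
            hits.append(doc_id)
    return hits


print(range_query(PUBLISH_TIME, gte="2016-01-01", lte="2016-12-31"))  # ['1', '3']
```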

exists query

An exists query returns documents in which the field has at least one non-null value:

GET englishbooks/_search
{
  "query": {
    "exists": {
      "field": "author"
    }
  }
}
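What counts as "at least one non-null value" is shown in this toy Python sketch. All documents except id 1 are hypothetical examples invented for illustration, and has_value and exists_query are hypothetical names. Only the null/array rule is modeled; mapping-related reasons a field may have no indexed value are ignored.

```python
# Hypothetical documents (only id 1 comes from this article) showing which
# "author" values count as existing.
DOCS = [
    {"id": "1", "author": "Yoshua Bengio"},
    {"id": "5", "author": None},          # hypothetical: explicit null
    {"id": "6", "author": []},            # hypothetical: empty array
    {"id": "7", "author": [None]},        # hypothetical: array holding only null
    {"id": "8", "author": [None, "X"]},   # hypothetical: at least one non-null value
    {"id": "10"},                         # hypothetical: field absent
]


def has_value(value):
    # null, [] and an array holding only nulls all count as "no value".
    if isinstance(value, list):
        return any(item is not None for item in value)
    return value is not None


def exists_query(docs, field):
    # A missing field behaves like null, because dict.get returns None for it.
    return [d["id"] for d in docs if has_value(d.get(field))]


print(exists_query(DOCS, "author"))  # ['1', '8']
```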

Prefix query

A prefix query matches documents in which the field contains a term that starts with the given prefix:

GET englishbooks/_search
{
  "query": {
    "prefix": {
      "title": "cor"
    }
  }
}

The request above finds the document whose title is "Core Java":

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "3",
        "_score": 1,
        "_source": {
          "id": "3",
          "title": "Core Java",
          "language": "java",
          "author": "Horstmann",
          "price": 85.9,
          "publish_time": "2016-06-01",
          "description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "
        }
      }
    ]
  }
}
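Because the prefix is compared with individual indexed terms, not with the whole field value, a prefix of a later word also matches. A toy Python sketch, assuming a simplified analyzer (word split plus lowercasing), the four titles from this article as data, and a hypothetical prefix_query function:

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def prefix_query(docs, prefix):
    # prefix query: the prefix is NOT analyzed; it is compared with each indexed
    # term, so "cor" matches the term "core" while "Cor" matches nothing.
    return [doc_id for doc_id, text in docs.items()
            if any(term.startswith(prefix) for term in analyze(text))]


print(prefix_query(TITLES, "cor"))  # ['3']
print(prefix_query(TITLES, "jav"))  # ['3', '4']  ("Thinking in Java" matches too)
```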

Wildcard query

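The following query finds documents whose title field contains the term "core". Note that "?" matches exactly one character and "*" matches zero or more characters: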

GET englishbooks/_search
{
  "query": {
    "wildcard": {
      "title": "cor?"
    }
  }
}
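The two wildcard characters can be emulated by translating the pattern into a regular expression that must match a whole indexed term. This toy Python sketch assumes a simplified analyzer (word split plus lowercasing), the four titles from this article as data, and hypothetical function names; escaping of wildcard characters is not handled.

```python
import re

# The four book titles from this article, keyed by document id (toy in-memory data).
TITLES = {"1": "Deep Learning", "2": "Compilers", "3": "Core Java", "4": "Thinking in Java"}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def wildcard_to_regex(pattern):
    # "?" -> exactly one character, "*" -> zero or more characters;
    # every other character is matched literally.
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_query(docs, pattern):
    # The pattern is not analyzed and must match an entire indexed term.
    rx = wildcard_to_regex(pattern)
    return [doc_id for doc_id, text in docs.items()
            if any(rx.fullmatch(term) for term in analyze(text))]


print(wildcard_query(TITLES, "cor?"))  # ['3']
print(wildcard_query(TITLES, "*ing"))  # ['1', '4']
```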

Regular expression (regexp) query

The regexp query runs a regular-expression search, for example to find documents whose description field contains a term consisting of four digits:

GET englishbooks/_search
{
  "query": {
    "regexp": {
      "description": "[0-9]{4}"
    }
  }
}

The result is below; its description field contains the number 1986:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "2",
        "_score": 1,
        "_source": {
          "id": "2",
          "title": "Compilers",
          "language": "c",
          "author": "Alfred V.Aho",
          "price": 62.5,
          "publish_time": "2011-01-01",
          "description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."
        }
      }
    ]
  }
}
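The key detail is that the expression must match a whole indexed term, not just part of the text. This toy Python sketch assumes a simplified analyzer (word split plus lowercasing), the four descriptions from this article as data, and a hypothetical regexp_query function; it relies on re.fullmatch, and Python's regex syntax only coincides with Lucene's for simple patterns such as this one.

```python
import re

# Descriptions of the four books in this article (toy in-memory data).
DESCRIPTIONS = {
    "1": "written by three experts in the field, deep learning is the only comprehensive book on the subject.",
    "2": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly.",
    "3": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets.",
    "4": "Thinking in Java should be read cover to cover by every Java programmer, then kept close at hand for frequent reference. The exercises are challenging, and the chapter on Collections is superb!",
}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def regexp_query(docs, pattern):
    # The expression must match an ENTIRE indexed term (fullmatch), so "[0-9]{4}"
    # matches the term "1986" but would not match a term like "19860".
    rx = re.compile(pattern)
    return [doc_id for doc_id, text in docs.items()
            if any(rx.fullmatch(term) for term in analyze(text))]


print(regexp_query(DESCRIPTIONS, "[0-9]{4}"))  # ['2']
```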

Fuzzy query

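A fuzzy query finds results by computing the edit distance between the query term and the terms in the documents. For example, if we want documents whose description field contains the term "1986" but accidentally type "1987", a fuzzy query still finds the document, only with a lower score. The request is: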

GET englishbooks/_search
{
  "query": {
    "fuzzy": {
      "description": "1987"
    }
  }
}

The matching document is shown below. Its score is only 0.5942837, lower than the 0.79237825 it received when searched with "1986":

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5942837,
    "hits": [
      {
        "_index": "englishbooks",
        "_type": "IT",
        "_id": "2",
        "_score": 0.5942837,
        "_source": {
          "id": "2",
          "title": "Compilers",
          "language": "c",
          "author": "Alfred V.Aho",
          "price": 62.5,
          "publish_time": "2011-01-01",
          "description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."
        }
      }
    ]
  }
}

Note that fuzzy queries are relatively resource-intensive.
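The edit-distance idea can be sketched in Python. This is a toy simulation under stated assumptions: it uses the "AUTO" fuzziness rule (terms of length 0 to 2 must match exactly, 3 to 5 allow one edit, longer terms allow two), a transpositions flag mirroring the query option of that name (whose default may differ between versions), a simplified analyzer, the four descriptions from this article as data, and hypothetical function names. Scoring, prefix_length and max_expansions are not modeled. Comparing the query term with every term of every document is also a hint at why fuzzy queries are costly.

```python
import re

# Descriptions of the four books in this article (toy in-memory data).
DESCRIPTIONS = {
    "1": "written by three experts in the field, deep learning is the only comprehensive book on the subject.",
    "2": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly.",
    "3": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets.",
    "4": "Thinking in Java should be read cover to cover by every Java programmer, then kept close at hand for frequent reference. The exercises are challenging, and the chapter on Collections is superb!",
}


def analyze(text):
    # Assumption: simplified stand-in for the standard analyzer (word split + lowercase).
    return [t.lower() for t in re.findall(r"\w+", text)]


def edit_distance(a, b, transpositions=True):
    # Levenshtein distance (insert, delete, substitute); with transpositions=True a swap
    # of two adjacent characters also counts as one edit (optimal string alignment).
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if transpositions and i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def auto_fuzziness(term):
    # "AUTO": length 0-2 -> exact match, 3-5 -> one edit, more than 5 -> two edits.
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2


def fuzzy_query(docs, term):
    # The query term is not analyzed; it is compared with every indexed term.
    max_edits = auto_fuzziness(term)
    return [doc_id for doc_id, text in docs.items()
            if any(edit_distance(term, t) <= max_edits for t in analyze(text))]


print(edit_distance("1987", "1986"))      # 1
print(fuzzy_query(DESCRIPTIONS, "1987"))  # ['2']
```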

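Finally

A word about myself: I graduated from Shanghai Jiao Tong University in 2013, worked at a small company and then at big companies such as Huawei and OPPO, and have been at Alibaba since 2018.

I know that most Java engineers who want to improve their skills end up figuring things out on their own, and unstructured self-study is inefficient, slow and discouraging.

So I collected and organized a "2024 Complete Java Development Learning Package". The motivation is simple: to help people who want to teach themselves but don't know where to start, and to lighten everyone's load.

It contains zero-background materials for beginners as well as advanced courses for developers with three or more years of experience, covering more than 95% of Java development topics. Whether you are just getting started with Java or are an experienced developer who wants to keep improving, these materials will open a new door for your learning!

If you find this helpful and would like the full package, click here to get it!

Because the files are large, only screenshots of part of the table of contents are shown here. Each node contains interview experiences from big companies, study notes, source-code walkthroughs, hands-on projects and lecture videos, and will keep being updated!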
