Elasticsearch: Search API in Detail

Author: 码炫课堂-码哥

Published: 2024-07-29 10:48:46

Article link: https://blog.csdn.net/smart_an/article/details/140765293

Search

1. Getting started with search

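Search involves two phases:

Indexing: when a document is saved to an index, es by default stores two things. One is the original data in _source; the other is the inverted index, built through analysis (tokenization), sorting, and related steps, which records the mapping between terms and documents.
Searching: when es receives a search request, it queries the inverted index, uses the postings lists kept there to find the set of documents for each keyword, then scores, sorts, and highlights those documents before returning them.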
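
The two phases can be illustrated with a toy inverted index in Python. This is only a minimal sketch and not how es implements it: the whitespace tokenizer, the function names `tokenize`, `build_index`, and `search`, and the three sample addresses are assumptions chosen for illustration, and scoring is left out entirely.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical analyzer: lowercase the text and split on whitespace.
    return text.lower().split()

def build_index(docs):
    # Indexing phase: map each term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    # Search phase: look up each query term and union the matching id sets.
    result = set()
    for term in tokenize(query):
        result |= index.get(term, set())
    return sorted(result)

docs = {
    1: "880 Holmes Lane",
    6: "671 Bristol Street",
    13: "789 Madison Street",
}
index = build_index(docs)
print(search(index, "street"))  # [6, 13]
```

The lookup touches only the entries for the query terms, which is why finding candidate documents does not require scanning every stored document.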

2. Simple search

2.1. match_all: query all documents

    GET /bank/_search
    {
      "query": {
        "match_all": {}
      }
    }

Shorthand:

    GET /bank/_search

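Result: because no query condition is set, every document gets the same score, so the maximum score is 1.0.

The response does not contain every document, because results are paginated by default.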

2.2. term: term query

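A term query looks up documents whose specified field contains the given term. The search term is not analyzed, so a document is returned only when the search term exactly matches a term stored for that field. Typical use cases are exact values such as person names and place names.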

    GET /bank/_search
    {
      "query": {
        "term": {
          "city.keyword": {
            "value": "Brogan"
          }
        }
      }
    }

Result:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 6.5032897,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 6.5032897,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            }
          }
        ]
      }
    }

2.3. from/size: pagination

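By default the first 10 hits are returned. As with a relational database, es accepts paging parameters:

from: the offset of the first hit to return, starting at 0.
size: the number of hits to return.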

    GET /bank/_search
    {
      "query": {
        "term": {
          "age": {
            "value": 32
          }
        }
      },
      "from": 0,
      "size": 2
    }

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 52,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "56",
            "_score" : 1.0,
            "_source" : {
              "account_number" : 56,
              "balance" : 14992,
              "firstname" : "Josie",
              "lastname" : "Nelson",
              "age" : 32,
              "gender" : "M",
              "address" : "857 Tabor Court",
              "employer" : "Emtrac",
              "email" : "josienelson@emtrac.com",
              "city" : "Sunnyside",
              "state" : "UT"
            }
          }
        ]
      }
    }
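
Client code usually thinks in page numbers rather than offsets. The sketch below shows one way to turn a 1-based page number into from and size; the helper name `paged_body` is hypothetical, and the check reflects the default `index.max_result_window` of 10000, which limits `from + size`.

```python
import json

MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window

def paged_body(query, page, size):
    # page is 1-based; from is the 0-based offset of the first hit.
    if page < 1 or size < 0:
        raise ValueError("page must be >= 1 and size must be >= 0")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return {"query": query, "from": offset, "size": size}

body = paged_body({"term": {"age": {"value": 32}}}, page=3, size=2)
print(json.dumps(body))
```

With page=3 and size=2 the body asks for hits 5 and 6, that is, from 4 and size 2.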

2.4. _source: filtering returned fields

If documents have many fields and you do not need all of them, specify which fields to return:

    GET /bank/_search
    {
      "query": {
        "term": {
          "age": {
            "value": 32
          }
        }
      },
      "from": 0,
      "size": 2,
      "_source": ["firstname", "lastname"]
    }

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 52,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "firstname" : "Amber",
              "lastname" : "Duke"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "56",
            "_score" : 1.0,
            "_source" : {
              "firstname" : "Josie",
              "lastname" : "Nelson"
            }
          }
        ]
      }
    }
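
The effect on each hit can be pictured as a projection of the stored document. The Python below is a simplified sketch under that assumption: `filter_source` is a hypothetical helper that handles only top-level field names, without wildcard patterns or nested paths.

```python
def filter_source(source, includes):
    # Keep only the listed top-level fields, in the requested order.
    return {field: source[field] for field in includes if field in source}

source = {
    "account_number": 1,
    "balance": 39225,
    "firstname": "Amber",
    "lastname": "Duke",
    "age": 32,
}
print(filter_source(source, ["firstname", "lastname"]))
```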

2.5. min_score: minimum score

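A very low score means a document has little relevance to the query keywords. You can set a minimum score so that only documents scoring at least that value are returned.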

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "Street"
        }
      },
      "min_score": 0.9
    }

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 385,
          "relation" : "eq"
        },
        "max_score" : 0.95395315,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 0.95395315,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          },
          ...
        ]
      }
    }
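
Conceptually, min_score drops every hit whose _score is below the threshold. A minimal Python sketch of that filter follows; `apply_min_score` is a hypothetical helper, and the second hit is invented to show a document being excluded.

```python
def apply_min_score(hits, min_score):
    # Keep hits whose score is not below the threshold.
    return [hit for hit in hits if hit["_score"] >= min_score]

hits = [
    {"_id": "6", "_score": 0.95395315},
    {"_id": "low", "_score": 0.42},  # hypothetical low-scoring hit
]
kept = apply_min_score(hits, 0.9)
print([hit["_id"] for hit in kept])  # ['6']
```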

2.6. highlight: highlighting

Highlight the query keyword:

    GET /bank/_search
    {
      "query": {
        "term": {
          "city.keyword": {
            "value": "Brogan"
          }
        }
      },
      "highlight": {
        "fields": {"city.keyword": {}}
      }
    }

    {
      "took" : 59,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 6.5032897,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 6.5032897,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            },
            "highlight" : {
              "city.keyword" : [
                "<em>Brogan</em>"
              ]
            }
          }
        ]
      }
    }
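
The highlight section of each hit contains the field text with the matched term wrapped in tags, `<em>` and `</em>` by default. The Python below imitates that wrapping for a single exact term. It is a rough sketch only: the `highlight` function and its `pre` and `post` parameters are hypothetical names, and real highlighters also handle fragments and analyzed text.

```python
import re

def highlight(text, term, pre="<em>", post="</em>"):
    # Wrap each whole-word, case-sensitive occurrence of term in the tags.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    return pattern.sub(lambda m: pre + m.group(0) + post, text)

print(highlight("Brogan", "Brogan"))           # <em>Brogan</em>
print(highlight("880 Holmes Lane", "Holmes"))  # 880 <em>Holmes</em> Lane
```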

3. Full-text search

3.1. match query: analyzed query

A match query analyzes the query string into terms; a document matches if any one of those terms matches.

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "Bristol Street"
        }
      },
      "from": 0,
      "size": 2
    }

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 385,
          "relation" : "eq"
        },
        "max_score" : 7.455468,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 7.455468,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "13",
            "_score" : 0.95395315,
            "_source" : {
              "account_number" : 13,
              "balance" : 32838,
              "firstname" : "Nanette",
              "lastname" : "Bates",
              "age" : 28,
              "gender" : "F",
              "address" : "789 Madison Street",
              "employer" : "Quility",
              "email" : "nanettebates@quility.com",
              "city" : "Nogal",
              "state" : "VA"
            }
          }
        ]
      }
    }

A document counts as relevant and is returned as long as either word of Bristol Street matches. To require both words, set operator to and (the default is or):

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": {
            "query": "Bristol Street",
            "operator": "and"
          }
        }
      },
      "from": 0,
      "size": 2
    }

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 7.455468,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 7.455468,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          }
        ]
      }
    }
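
The difference between the two operators can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the es implementation: `tokenize` is a hypothetical whitespace analyzer, `match` is a hypothetical helper, and scoring is ignored.

```python
def tokenize(text):
    # Hypothetical analyzer: lowercase the text and split on whitespace.
    return text.lower().split()

def match(field_text, query, operator="or"):
    doc_terms = set(tokenize(field_text))
    query_terms = tokenize(query)
    if operator == "and":
        # Every query term must be present in the field.
        return all(term in doc_terms for term in query_terms)
    # Default "or": one matching term is enough.
    return any(term in doc_terms for term in query_terms)

addresses = {"6": "671 Bristol Street", "13": "789 Madison Street"}
or_hits = [i for i, a in addresses.items() if match(a, "Bristol Street")]
and_hits = [i for i, a in addresses.items() if match(a, "Bristol Street", "and")]
print(or_hits)   # ['6', '13']
print(and_hits)  # ['6']
```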

3.2. match_phrase query: analyzed and ordered

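A match_phrase query also analyzes the query string, but imposes two requirements on the resulting terms:

The terms must appear in the document in the same order as in the query.
All the terms must appear in the document.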

    GET /bank/_search
    {
      "query": {
        "match_phrase": {
          "address": {
            "query": "671 street",
            "slop": 1
          }
        }
      }
    }

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 4.1140327,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "206",
            "_score" : 4.1140327,
            "_source" : {
              "account_number" : 206,
              "balance" : 47423,
              "firstname" : "Kelli",
              "lastname" : "Francis",
              "age" : 20,
              "gender" : "M",
              "address" : "671 George Street",
              "employer" : "Exoswitch",
              "email" : "kellifrancis@exoswitch.com",
              "city" : "Babb",
              "state" : "NJ"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 4.1140327,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          }
        ]
      }
    }

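query is the search phrase; the analyzer breaks it into terms, which are then matched against the inverted index.

slop is the maximum number of positions the query terms may be moved for the phrase to still match. Note that it is counted in term positions, not in characters between the keywords. When the analyzer processes a field, every resulting term carries a position; after the query phrase is analyzed, the gaps between its terms' positions in the document must satisfy slop. With slop set to 1, "671 street" matches "671 George Street" because street sits only one position further away than the query expects.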
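
The position arithmetic can be sketched in Python. This is a simplified model and an assumption on my part, not the actual Lucene scorer: it shifts each term's document position by its position in the query and requires the spread of the shifted values to be at most slop. The names `tokenize` and `phrase_match` are hypothetical, and repeated query terms are not handled specially.

```python
from itertools import product

def tokenize(text):
    # Hypothetical analyzer: lowercase the text and split on whitespace.
    return text.lower().split()

def phrase_match(field_text, query, slop=0):
    doc_terms = tokenize(field_text)
    query_terms = tokenize(query)
    if not query_terms:
        return False
    # Document positions at which each query term occurs.
    positions = [
        [pos for pos, term in enumerate(doc_terms) if term == q]
        for q in query_terms
    ]
    if any(not p for p in positions):
        return False  # every query term must appear in the document
    for combo in product(*positions):
        # Shift each document position by the term's position in the query.
        shifted = [doc_pos - q_pos for q_pos, doc_pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

print(phrase_match("671 George Street", "671 street", slop=1))  # True
print(phrase_match("671 George Street", "671 street", slop=0))  # False
```

In the first call, 671 is at position 0 and street at position 2, while the query expects them at 0 and 1, so one move is needed and slop 1 is enough.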

    PUT /b
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "ik_smart"
          }
        }
      }
    }