Elasticsearch: Search API Explained

About the author: Hi everyone, I'm smart哥, a former architect at ZTE and Meituan and now CTO of an internet company.

Search

1. Getting Started with Search

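Search involves two processes:

  1. Indexing. When a document is saved to an index, es stores two things by default: the original data in _source, and an inverted index built through analysis (tokenization, sorting of terms, and so on). The inverted index records which documents each term appears in.
  2. Searching. When es receives a search request, it looks the keywords up in the inverted index, uses the postings lists kept there to find the set of matching documents, then scores, sorts, and highlights those documents before returning them.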
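
The two processes above can be sketched in a few lines of Python. This is only a toy model: the `analyze` function is an assumed stand-in for a real analyzer, and `build_inverted_index` and `search` are hypothetical helper names, not part of es. The sample addresses are taken from the bank data used later in this article.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Address values keyed by document id (a tiny subset of the bank index).
docs = {
    6: "671 Bristol Street",
    13: "789 Madison Street",
    1: "880 Holmes Lane",
}
index = build_inverted_index(docs)

def search(term):
    """Look a single term up in the postings lists; unknown terms match nothing."""
    return index.get(term.lower(), [])

print(search("Street"))
print(search("holmes"))
```

A real engine also stores term positions and frequencies in the postings so that it can score and highlight; the sketch keeps only document ids.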

2. Simple Search

2.1 match_all: query all documents

    GET /bank/_search
    {
      "query": {
        "match_all": {}
      }
    }

Shorthand:

    GET /bank/_search

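Result:

Because no query condition is set, every document matches equally, so the maximum score is 1.0.

The response does not contain every document in the index, because results are paginated by default.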

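2.2 term: term query

A term query finds documents whose given field contains exactly the given term. The search value is not analyzed, so a document is returned only when the value exactly matches a term stored in the index. Typical uses are exact values such as person names and place names.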

    GET /bank/_search
    {
      "query": {
        "term": {
          "city.keyword": {
            "value": "Brogan"
          }
        }
      }
    }

Result:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 6.5032897,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 6.5032897,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            }
          }
        ]
      }
    }
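
Why the request targets city.keyword rather than city can be illustrated with a small sketch. It assumes that city is a text field analyzed by the default analyzer (which lowercases tokens) and that city.keyword keeps the original value unchanged; `standard_like_tokens` and `term_matches` are hypothetical helpers for illustration only.

```python
def standard_like_tokens(text):
    """Assumed stand-in for the default analyzer on a text field: split on whitespace, lowercase."""
    return [word.lower() for word in text.split()]

def term_matches(indexed_terms, value):
    """A term query compares the value as-is against the indexed terms."""
    return value in indexed_terms

city = "Brogan"
text_field_terms = standard_like_tokens(city)  # what an analyzed `city` field would hold
keyword_field_terms = [city]                   # `city.keyword` stores the value unchanged

print(term_matches(keyword_field_terms, "Brogan"))
print(term_matches(text_field_terms, "Brogan"))
print(term_matches(text_field_terms, "brogan"))
```

Under these assumptions, a term query for "Brogan" against the analyzed field would find nothing, because the index only holds the lowercased token.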

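2.3 from/size: pagination

By default the first 10 hits are returned. As with a relational database, es accepts paging parameters:

from: the offset of the first hit to return, starting at 0.
size: the number of hits to return.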

    GET /bank/_search
    {
      "query": {
        "term": {
          "age": {
            "value": 32
          }
        }
      },
      "from": 0,
      "size": 2
    }

Response:

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 52,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "56",
            "_score" : 1.0,
            "_source" : {
              "account_number" : 56,
              "balance" : 14992,
              "firstname" : "Josie",
              "lastname" : "Nelson",
              "age" : 32,
              "gender" : "M",
              "address" : "857 Tabor Court",
              "employer" : "Emtrac",
              "email" : "josienelson@emtrac.com",
              "city" : "Sunnyside",
              "state" : "UT"
            }
          }
        ]
      }
    }
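
User interfaces usually think in page numbers rather than offsets. The following sketch shows one way to translate between the two; `page_params` and `paginate` are hypothetical helper names, and the list slicing only mimics what from/size do to an already sorted hit list.

```python
def page_params(page, page_size):
    """Translate a 1-based page number into the from/size pair of a search request."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

def paginate(hits, params):
    """Mimic from/size on an already sorted list of hits."""
    start = params["from"]
    return hits[start:start + params["size"]]

all_hits = list(range(52))  # stand-in for the 52 matching documents
print(page_params(1, 2))
print(paginate(all_hits, page_params(2, 2)))
```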

2.4 _source: filter returned fields

If documents have many fields and you only need a few of them, specify which fields to return:

    GET /bank/_search
    {
      "query": {
        "term": {
          "age": {
            "value": 32
          }
        }
      },
      "from": 0,
      "size": 2,
      "_source": ["firstname", "lastname"]
    }

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 52,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "firstname" : "Amber",
              "lastname" : "Duke"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "56",
            "_score" : 1.0,
            "_source" : {
              "firstname" : "Josie",
              "lastname" : "Nelson"
            }
          }
        ]
      }
    }
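
The effect of the _source list on each hit can be pictured as a simple projection. The sketch below handles only plain top-level field names; `filter_source` is a hypothetical helper, and it does not attempt the wildcard or nested-path patterns that es also accepts.

```python
def filter_source(source, fields):
    """Keep only the requested top-level fields, like "_source": [...] in the request."""
    return {name: source[name] for name in fields if name in source}

doc = {
    "account_number": 1,
    "balance": 39225,
    "firstname": "Amber",
    "lastname": "Duke",
    "age": 32,
}
print(filter_source(doc, ["firstname", "lastname"]))
```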

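2.5 min_score: minimum score

Some documents score very low, which means they are only weakly related to the query keywords. You can set a minimum score so that only documents scoring at least that value are returned.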

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "Street"
        }
      },
      "min_score": 0.9
    }

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 385,
          "relation" : "eq"
        },
        "max_score" : 0.95395315,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 0.95395315,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          },
          ...
        ]
      }
    }
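
Conceptually, min_score is a threshold applied to the scored hits. A minimal sketch, assuming the hits have already been scored; `apply_min_score` is a hypothetical helper and the second hit below is invented for illustration.

```python
def apply_min_score(hits, min_score):
    """Drop hits whose _score is below min_score."""
    return [hit for hit in hits if hit["_score"] >= min_score]

hits = [
    {"_id": "6", "_score": 0.95395315},      # score taken from the response above
    {"_id": "example", "_score": 0.4},       # hypothetical low-scoring hit
]
print(apply_min_score(hits, 0.9))
```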

2.6 highlight: highlighting

Highlight the query keyword in the results:

    GET /bank/_search
    {
      "query": {
        "term": {
          "city.keyword": {
            "value": "Brogan"
          }
        }
      },
      "highlight": {
        "fields": {"city.keyword": {}}
      }
    }

Response:

    {
      "took" : 59,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 6.5032897,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 6.5032897,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            },
            "highlight" : {
              "city.keyword" : [
                "<em>Brogan</em>"
              ]
            }
          }
        ]
      }
    }
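
The highlight fragment in the response is the field value with the matched term wrapped in tags (`<em>` and `</em>` by default). The sketch below imitates that with a regular expression; `highlight` is a hypothetical helper that matches whole words case-insensitively and is far simpler than the highlighters es actually ships.

```python
import re

def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of term in the given tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)

print(highlight("Brogan", "Brogan"))
print(highlight("671 Bristol Street", "street"))
```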

3. Full-Text Search

3.1 match query: analyzed query

A match query analyzes the query string into terms. A document matches if any one of those terms appears in the field.

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "Bristol Street"
        }
      },
      "from": 0,
      "size": 2
    }

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 385,
          "relation" : "eq"
        },
        "max_score" : 7.455468,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 7.455468,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "13",
            "_score" : 0.95395315,
            "_source" : {
              "account_number" : 13,
              "balance" : 32838,
              "firstname" : "Nanette",
              "lastname" : "Bates",
              "age" : 28,
              "gender" : "F",
              "address" : "789 Madison Street",
              "employer" : "Quility",
              "email" : "nanettebates@quility.com",
              "city" : "Nogal",
              "state" : "VA"
            }
          }
        ]
      }
    }

With the query Bristol Street, a document counts as a hit as soon as it matches one of the two terms. To require both terms, set operator to and (the default is or):

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": {
            "query": "Bristol Street",
            "operator": "and"
          }
        }
      },
      "from": 0,
      "size": 2
    }

Response:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 7.455468,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 7.455468,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          }
        ]
      }
    }
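
The difference between the two operators comes down to set logic on the analyzed terms. A minimal sketch, assuming a whitespace-and-lowercase analyzer; `analyze` and `match` are hypothetical helpers, and scoring is ignored entirely.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query, operator="or"):
    """Return True if the field satisfies the query under the given operator."""
    query_terms = set(analyze(query))
    doc_terms = set(analyze(field_value))
    if operator == "and":
        return query_terms <= doc_terms        # every query term must be present
    if operator == "or":
        return bool(query_terms & doc_terms)   # one shared term is enough
    raise ValueError("operator must be 'or' or 'and'")

print(match("789 Madison Street", "Bristol Street"))
print(match("789 Madison Street", "Bristol Street", operator="and"))
print(match("671 Bristol Street", "Bristol Street", operator="and"))
```

This mirrors the two responses above: with or, both 671 Bristol Street and 789 Madison Street qualify; with and, only 671 Bristol Street does.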

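3.2 match_phrase query: analyzed and ordered

A match_phrase query also analyzes the query keywords, but the resulting terms are subject to two constraints:

  1. The terms must appear in the document in the same order as in the query.
  2. Every term must appear in the document.
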
    GET /bank/_search
    {
      "query": {
        "match_phrase": {
          "address": {
            "query": "671 street",
            "slop": 1
          }
        }
      }
    }

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 4.1140327,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "206",
            "_score" : 4.1140327,
            "_source" : {
              "account_number" : 206,
              "balance" : 47423,
              "firstname" : "Kelli",
              "lastname" : "Francis",
              "age" : 20,
              "gender" : "M",
              "address" : "671 George Street",
              "employer" : "Exoswitch",
              "email" : "kellifrancis@exoswitch.com",
              "city" : "Babb",
              "state" : "NJ"
            }
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "6",
            "_score" : 4.1140327,
            "_source" : {
              "account_number" : 6,
              "balance" : 5686,
              "firstname" : "Hattie",
              "lastname" : "Bond",
              "age" : 36,
              "gender" : "M",
              "address" : "671 Bristol Street",
              "employer" : "Netagy",
              "email" : "hattiebond@netagy.com",
              "city" : "Dante",
              "state" : "TN"
            }
          }
        ]
      }
    }

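query is the search phrase. The analyzer splits it into terms, which are then matched against the inverted index.

slop is the maximum distance allowed between the query terms (the default is 0). Note that it is measured in term positions, not in the number of characters between the keywords. When a document field is analyzed, each resulting term carries a position attribute recording where it occurs; after the query phrase is analyzed, the gaps between the positions of its terms must stay within slop. In the example above, 671 and Street are separated by one term (Bristol or George), so slop 1 lets both documents match.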
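
The position check can be sketched as follows. This is a simplified model rather than the engine's actual implementation: it assumes a whitespace-and-lowercase analyzer, ignores scoring, and does not treat repeated query terms specially. `phrase_matches` is a hypothetical helper. Each document position is shifted by the term's position in the query; the phrase matches when the shifted positions differ by at most slop.

```python
from itertools import product

def analyze(text):
    """Toy analyzer: lowercase and split on whitespace; list index = term position."""
    return text.lower().split()

def phrase_matches(field_value, query, slop=0):
    """Return True if the query terms occur in the field within the allowed slop."""
    doc_terms = analyze(field_value)
    query_terms = analyze(query)
    if not query_terms:
        return False
    # positions[i] lists every document position of the i-th query term
    positions = [[p for p, t in enumerate(doc_terms) if t == q] for q in query_terms]
    if any(not plist for plist in positions):
        return False  # every term must be present
    for combo in product(*positions):
        # align each occurrence with its place in the query phrase
        shifted = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

print(phrase_matches("671 Bristol Street", "671 street", slop=1))
print(phrase_matches("671 Bristol Street", "671 street", slop=0))
```

With slop 0 the terms must be adjacent and in order, which is why 671 street only matches once slop is raised to 1.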

    PUT /b
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "ik_smart"
          }
        }
      }
    }