About the author: Hi everyone, I'm smart哥, formerly an architect at ZTE and Meituan, now CTO of an internet company.
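Search
1. Introduction to search
Search involves two processes:
- Indexing: when a document is saved to an index, es stores two things by default. One is the original data in _source; the other is the inverted index, produced by analyzing the text (tokenization, sorting of terms, and so on). The inverted index records which documents each term appears in.
- Searching: when es receives a search request, it looks up the keywords in the inverted index, uses the postings lists maintained there to find the set of documents for each keyword, then scores, sorts, and highlights those documents before returning them.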
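The two processes above can be sketched in a few lines of Python. This is only a toy model, not how es or Lucene store data: the analyze function is a crude stand-in for a real analyzer (lowercase, split on whitespace), and the names build_inverted_index and search_ids are made up for illustration. The sample addresses are taken from the bank examples in this article.

```python
from collections import defaultdict

def analyze(text):
    # Crude stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()

def build_inverted_index(docs, field):
    # Map each term to a postings list of (doc_id, position) pairs.
    index = defaultdict(list)
    for doc_id, source in docs.items():
        for position, term in enumerate(analyze(source[field])):
            index[term].append((doc_id, position))
    return index

def search_ids(index, term):
    # Look the term up in the postings lists and return the matching doc ids.
    postings = index.get(term.lower(), [])
    return sorted({doc_id for doc_id, _ in postings})

# _source is kept as-is; the inverted index is derived from it.
docs = {
    "6": {"address": "671 Bristol Street"},
    "13": {"address": "789 Madison Street"},
    "1": {"address": "880 Holmes Lane"},
}
index = build_inverted_index(docs, "address")

print(search_ids(index, "Street"))
print(search_ids(index, "lane"))
```

Scoring, sorting, and highlighting are left out here; the sketch only shows how a term leads to a set of documents.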
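2. Simple search
2.1 match_all: match all documents
GET /bank/_search
{
  "query": {
    "match_all": {}
  }
}
Shorthand:
GET /bank/_search
In the result, because no query condition is set, every document gets the same score, so max_score is 1.0.
Not every document is shown in the response, because results are paginated by default.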
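2.2 term: term query
A term query looks up documents whose specified field contains the given term. The search text is not analyzed, so a document is returned only when the search term exactly matches a term stored in the index. Typical use cases are exact values such as person names and place names.
GET /bank/_search
{
  "query": {
    "term": {
      "city.keyword": {
        "value": "Brogan"
      }
    }
  }
}
Result: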
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.5032897,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 6.5032897,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      }
    ]
  }
}
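The reason the example queries city.keyword rather than city can be sketched as follows. This assumes the bank index uses the default dynamic mapping, where city is an analyzed text field with a keyword sub-field, and that the text field goes through an analyzer that lowercases its terms (as the standard analyzer does). The helper names are hypothetical.

```python
def standard_like_analyze(text):
    # Rough approximation of the standard analyzer: split on whitespace, lowercase.
    return [token.lower() for token in text.split()]

def term_query(indexed_terms, value):
    # A term query does not analyze the search value: it must equal an indexed term.
    return value in indexed_terms

city = "Brogan"
text_terms = set(standard_like_analyze(city))  # what an analyzed text field would index
keyword_terms = {city}                         # a keyword field indexes the value unchanged

print(term_query(keyword_terms, "Brogan"))  # exact value stored as-is
print(term_query(text_terms, "Brogan"))     # indexed term is lowercase, so no match
print(term_query(text_terms, "brogan"))     # matches the lowercase indexed term
```

Under these assumptions, a term query for Brogan against the analyzed city field would find nothing, while the same query against city.keyword matches.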
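2.3 from/size: pagination
By default a search returns the first 10 hits. As with a relational database, es accepts paging parameters:
from: the offset of the first hit to return (starting at 0).
size: the number of hits to return.
GET /bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": 32
      }
    }
  },
  "from": 0,
  "size": 2
}
Response: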
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 52,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "56",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 56,
          "balance" : 14992,
          "firstname" : "Josie",
          "lastname" : "Nelson",
          "age" : 32,
          "gender" : "M",
          "address" : "857 Tabor Court",
          "employer" : "Emtrac",
          "email" : "josienelson@emtrac.com",
          "city" : "Sunnyside",
          "state" : "UT"
        }
      }
    ]
  }
}
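Application code usually thinks in page numbers rather than offsets. The following sketch shows one way to turn a 1-based page number into from and size, and models the effect of the two parameters as a slice over the sorted hits. The function names are hypothetical and the list of integers merely stands in for 52 matching documents.

```python
def page_to_from_size(page, page_size):
    # Convert a 1-based page number into the from/size pair of a search body.
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}

def paginate(sorted_hits, from_, size):
    # Model of from/size: skip from_ hits, then take at most size hits.
    return sorted_hits[from_:from_ + size]

all_hits = list(range(52))  # stand-in for the 52 documents with age 32

print(page_to_from_size(1, 2))
print(paginate(all_hits, 0, 2))
print(paginate(all_hits, 50, 10))  # last page is shorter than size
```

Note that hits.total in the response still reports all 52 matches; from and size only control which of them are returned.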
2.4 _source: filtering returned fields
When documents have many fields and you only need some of them, you can specify which fields to return:
GET /bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": 32
      }
    }
  },
  "from": 0,
  "size": 2,
  "_source": ["firstname", "lastname"]
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 52,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "firstname" : "Amber",
          "lastname" : "Duke"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "56",
        "_score" : 1.0,
        "_source" : {
          "firstname" : "Josie",
          "lastname" : "Nelson"
        }
      }
    ]
  }
}
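2.5 min_score: minimum score
A very low score means the document is only weakly related to the search keywords. You can set a minimum score so that only documents scoring at least that value are returned.
GET /bank/_search
{
  "query": {
    "match": {
      "address": "Street"
    }
  },
  "min_score": 0.9
}
Response: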
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 385,
      "relation" : "eq"
    },
    "max_score" : 0.95395315,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.95395315,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        }
      },
      ...
    ]
  }
}
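The effect of min_score can be modeled as a plain filter over scored hits. This is only a sketch: apply_min_score is a made-up helper, and the third hit with score 0.5 is invented to show a document being dropped.

```python
def apply_min_score(hits, min_score):
    # Keep only hits whose _score is not below min_score.
    return [hit for hit in hits if hit["_score"] >= min_score]

hits = [
    {"_id": "6", "_score": 0.95395315},
    {"_id": "13", "_score": 0.95395315},
    {"_id": "x", "_score": 0.5},  # hypothetical low-scoring hit
]

kept = apply_min_score(hits, 0.9)
print([hit["_id"] for hit in kept])
```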
2.6 highlight: highlighting
Highlight the matched keywords:
GET /bank/_search
{
  "query": {
    "term": {
      "city.keyword": {
        "value": "Brogan"
      }
    }
  },
  "highlight": {
    "fields": {"city.keyword": {}}
  }
}
Response:
{
  "took" : 59,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.5032897,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 6.5032897,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        },
        "highlight" : {
          "city.keyword" : [
            "<em>Brogan</em>"
          ]
        }
      }
    ]
  }
}
3. Full-text search
3.1 match query: analyzed query
A match query analyzes the query text into terms. A document is returned if any one of those terms matches.
GET /bank/_search
{
  "query": {
    "match": {
      "address": "Bristol Street"
    }
  },
  "from": 0,
  "size": 2
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 385,
      "relation" : "eq"
    },
    "max_score" : 7.455468,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 7.455468,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "13",
        "_score" : 0.95395315,
        "_source" : {
          "account_number" : 13,
          "balance" : 32838,
          "firstname" : "Nanette",
          "lastname" : "Bates",
          "age" : 28,
          "gender" : "F",
          "address" : "789 Madison Street",
          "employer" : "Quility",
          "email" : "nanettebates@quility.com",
          "city" : "Nogal",
          "state" : "VA"
        }
      }
    ]
  }
}
With the query text Bristol Street, a document counts as relevant and is returned as long as one of the two terms matches. To require both terms, set operator to and (the default is or):
GET /bank/_search
{
  "query": {
    "match": {
      "address": {
        "query": "Bristol Street",
        "operator": "and"
      }
    }
  },
  "from": 0,
  "size": 2
}
Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 7.455468,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 7.455468,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        }
      }
    ]
  }
}
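The difference between the two operators comes down to "any term" versus "all terms". The sketch below models that with sets; it ignores scoring, uses a whitespace-and-lowercase analyze function as a rough stand-in for a real analyzer, and the match helper is a hypothetical name.

```python
def analyze(text):
    # Crude stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()

def match(query, field_value, operator="or"):
    query_terms = set(analyze(query))
    doc_terms = set(analyze(field_value))
    if operator == "or":
        # At least one query term must be present.
        return bool(query_terms & doc_terms)
    if operator == "and":
        # Every query term must be present.
        return query_terms <= doc_terms
    raise ValueError("operator must be 'or' or 'and'")

print(match("Bristol Street", "789 Madison Street"))         # or: one term is enough
print(match("Bristol Street", "789 Madison Street", "and"))  # and: Bristol is missing
print(match("Bristol Street", "671 Bristol Street", "and"))  # and: both terms present
```

This mirrors the two responses in this section: with or, both 671 Bristol Street and 789 Madison Street qualify, while with and only 671 Bristol Street does.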
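3.2 match_phrase query: analyzed and ordered
A match_phrase query also analyzes the query text, but the resulting terms must satisfy two conditions:
- The terms must appear in the document in the same order as in the query.
- All of the terms must appear in the document.
GET /bank/_search
{
  "query": {
    "match_phrase": {
      "address": {
        "query": "671 street",
        "slop": 1
      }
    }
  }
}
Response: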
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 4.1140327,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "206",
        "_score" : 4.1140327,
        "_source" : {
          "account_number" : 206,
          "balance" : 47423,
          "firstname" : "Kelli",
          "lastname" : "Francis",
          "age" : 20,
          "gender" : "M",
          "address" : "671 George Street",
          "employer" : "Exoswitch",
          "email" : "kellifrancis@exoswitch.com",
          "city" : "Babb",
          "state" : "NJ"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 4.1140327,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        }
      }
    ]
  }
}
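query is the search text. It is split into terms by the analyzer, and those terms are then matched against the inverted index.
slop is the maximum distance allowed between the terms; note that this distance is measured in term positions, not in characters. When a document field is analyzed, each resulting term carries a position value that records where it occurs. After the query phrase is analyzed, the gaps between the positions of its terms in the document must stay within slop. In the example above, 671 and street have one term between them in 671 Bristol Street, so slop 1 matches; with the default slop of 0 they would have to be adjacent.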
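The position-based check can be sketched as follows. This is a simplified model under stated assumptions, not Lucene's actual algorithm: it only accepts terms that appear in query order and counts how many extra positions lie between the first and last term, whereas the real implementation can also match reordered terms when slop is large enough. The tokenize and phrase_matches names are hypothetical, and tokenize is a crude whitespace-and-lowercase stand-in for an analyzer.

```python
from itertools import product

def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, then split on whitespace.
    # The list index of each token plays the role of its position.
    return text.lower().split()

def phrase_matches(query, field_value, slop=0):
    query_terms = tokenize(query)
    doc_terms = tokenize(field_value)
    # For each query term, collect every position where it occurs in the document.
    positions = [
        [i for i, term in enumerate(doc_terms) if term == q] for q in query_terms
    ]
    if any(not p for p in positions):
        return False  # every query term must appear in the document
    for combo in product(*positions):
        in_order = all(a < b for a, b in zip(combo, combo[1:]))
        # Extra positions skipped between the first and the last query term.
        gap = (combo[-1] - combo[0]) - (len(combo) - 1)
        if in_order and gap <= slop:
            return True
    return False

print(phrase_matches("671 street", "671 Bristol Street", slop=1))  # one term in between
print(phrase_matches("671 street", "671 Bristol Street", slop=0))  # not adjacent
print(phrase_matches("671 street", "880 Holmes Lane", slop=1))     # 671 is missing
```

In this model 671 sits at position 0 and street at position 2, so exactly one position is skipped, which is why slop 1 is enough and slop 0 is not.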
PUT /b
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart"
      }
    }
  }
}