Goal
Learn the multi_match query: its basic syntax, its query types, and when to apply each of them.
ES version
7.17.5
Official documentation
Create the test data
PUT /boss_db
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}
PUT /boss_db/_bulk
{"index":{"_id":"1"}}
{"company":"星耀科技有限公司","min_num":0,"max_num":20,"province":"广东省","city":"深圳市","county":"南山区","post":"前端开发实习生","min_salary":10,"max_salary":16,"qualification":"本科","min_work_time":3,"max_work_time":5,"skill":["html","css","vue","js"]}
{"index":{"_id":"2"}}
{"company":"恒和科技有限公司","min_num":100,"max_num":500,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA开发工程师","min_salary":20,"max_salary":30,"qualification":"硕士","min_work_time":1,"max_work_time":3,"skill":["k8s","springboot","mybatis","微服务"]}
{"index":{"_id":"3"}}
{"company":"天心科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA架构师","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":3,"max_work_time":5,"skill":["mybatis","spring","kafka","微服务"]}
{"index":{"_id":"4"}}
{"company":"黄河科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":3,"max_work_time":5,"skill":["es","mysql","分布式","soa"]}
{"index":{"_id":"5"}}
{"company":"长江科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"资深大数据开发工程师","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":0,"max_work_time":5,"skill":["redis","kafka","mq","数据结构"]}
{"index":{"_id":"6"}}
{"company":"黄山科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"前端开发","min_salary":20,"max_salary":30,"qualification":"大专","min_work_time":0,"max_work_time":5,"skill":["html","css","js","vue"]}
{"index":{"_id":"7"}}
{"company":"黄山科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"前端开发实习生","min_salary":10,"max_salary":13,"qualification":"不限","min_work_time":0,"max_work_time":5}
{"index":{"_id":"8"}}
{"company":"银河大数据科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"大数据实习生","min_salary":10,"max_salary":13,"qualification":"不限","min_work_time":0,"max_work_time":5,"skill":["电商","spring","容器技术","微服务技术"]}
{"index":{"_id":"9"}}
{"company":"银河大数据科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"JAVA实习生","min_salary":30,"max_salary":60,"qualification":"本科","min_work_time":0,"max_work_time":5,"skill":["数据结构","k8s","云原生技术","电商"]}
PUT /blog_db
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}
PUT /blog_db/_bulk
{"index":{"_id":"1"}}
{"title":"kafka入门手册","content":"kafka命令、集群、优化"}
{"index":{"_id":"2"}}
{"title":"kafka命令手册","content":"命令详情、命令实战"}
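Basic syntax in practice
Basic form
Requirement: on a job site, the keyword typed into the search bar should be matched against both the position and the company. Here we search for "天心" and then for "大数据" (big data).
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "天心",
      "fields": ["company", "post"]
    }
  }
}
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "大数据",
      "fields": ["company", "post"]
    }
  }
}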
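Matching multiple fields with a wildcard
Requirement: use every field whose name ends with "work_time" as a search field. The pattern "*work_time" covers min_work_time and max_work_time in the test data.
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": 5,
      "fields": ["*work_time"]
    }
  }
}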
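To check which fields a wildcard pattern actually resolves to before relying on it in a query, the get field mapping API can be called with the same pattern. This is only a quick sanity check, assuming the boss_db index was created and loaded as shown above; with that data the response should list min_work_time and max_work_time.

```console
# List the mapped fields whose names match the pattern
GET boss_db/_mapping/field/*work_time
```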
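Logical operators
Requirement 1: the search text is "前端开发实习生" (front-end development intern), and every token produced by the analyzer must match.
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "前端开发实习生",
      "fields": ["post"],
      "operator": "and"
    }
  }
}
Requirement 2: the search text is "前端开发实习生", and a document qualifies as soon as any one token matches.
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "前端开发实习生",
      "fields": ["post"],
      "operator": "or"
    }
  }
}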
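Whether "and" or "or" matches a document depends on which tokens the analyzer produces from the search text. A minimal way to inspect them is the _analyze API, sketched here under the assumption that the IK plugin is installed and that the index uses ik_max_word as configured above.

```console
# Show the tokens that ik_max_word produces for the search text
GET boss_db/_analyze
{
  "analyzer": "ik_max_word",
  "text": "前端开发实习生"
}
```

With "operator": "and", every token in the response has to be present in the post field; with "or", a single one is enough.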
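Setting score weights
Requirement: the keyword is "大数据"; match against the company and post fields, and make the post field weigh more than the company field in the score.
# No boost applied
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "大数据",
      "fields": ["company", "post"]
    }
  }
}
# Multiply the score of the post field by 4
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "大数据",
      "fields": ["company", "post^4"]
    }
  }
}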
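To see how the boost changes the score of each hit, the same request can be sent with "explain": true. This is a sketch for inspection only; each hit in the response then carries an _explanation section that breaks the score down per field.

```console
# Same boosted query, with a per-hit score breakdown in the response
GET boss_db/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "大数据",
      "fields": ["company", "post^4"]
    }
  }
}
```

Comparing the output with and without "^4" shows where the multiplier enters the calculation.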
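multi_match types in practice
best_fields (the default)
Purpose: among all the fields being searched, find the single best-matching field and score the document by it. For example, with the keyword "棕色的狐狸" (brown fox): field a contains "棕色的狐狸", field b contains only "棕色的", and field c contains only "狐狸". ES treats field a as the best field. tie_breaker takes a value in the range [0, 1] and defaults to 0, which means only the best field's score counts. The settings work as follows:
- tie_breaker = 0: total score = score of the best field.
- 0 < tie_breaker < 1: total score = score of the best field + tie_breaker * (sum of the scores of the other matching fields).
- tie_breaker = 1: every field's score carries the same weight, so there is effectively no best field; total score = sum of the scores of all matching fields.
Requirement: search for "kafka命令" (kafka commands) in both the title and the content, favoring the title.
Analysis: without tie_breaker, both documents end up with the same final score for "kafka命令", so the document with the smaller id is listed first. From a business point of view, though, the id=2 document is clearly the better match. The fix is to let the other field's score contribute a little as well; here I add 0.1 times that score.
GET /blog_db/_search
{
  "query": {
    "multi_match": {
      "query": "kafka命令",
      "type": "best_fields",
      "fields": [
        "title",
        "content"
      ],
      "tie_breaker": 0.1
    }
  }
}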
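It can help to see what a best_fields request expands to. According to the Elasticsearch documentation, this type generates one match query per field and wraps them in a dis_max query, so the request above should behave roughly like the following sketch.

```console
# Approximate hand-written equivalent of the best_fields request
GET /blog_db/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "kafka命令" } },
        { "match": { "content": "kafka命令" } }
      ],
      "tie_breaker": 0.1
    }
  }
}
```

dis_max takes the highest-scoring clause as the base score and adds tie_breaker times the scores of the other matching clauses, which is the formula listed for best_fields.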
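most_fields
In terms of scoring, this type behaves like best_fields with tie_breaker set to 1, so it is the better fit when every field should carry the same weight. It is not demonstrated separately here; see the tie_breaker = 1 case above.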
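For readers who want to try it anyway, a minimal most_fields request only needs the type changed. The sketch below reuses the blog_db index and the "kafka命令" keyword from the best_fields example; no specific scores are claimed for it.

```console
# Scores from title and content are added together
GET /blog_db/_search
{
  "query": {
    "multi_match": {
      "query": "kafka命令",
      "type": "most_fields",
      "fields": ["title", "content"]
    }
  }
}
```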
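Cross-field matching
Requirement: find records where the company's county is "天河区" (Tianhe District) and the required qualification is "硕士" (master's degree).
GET boss_db/_search
{
  "query": {
    "multi_match": {
      "query": "天河区硕士",
      "type": "cross_fields",
      "fields": [
        "county",
        "qualification"
      ],
      "operator": "and"
    }
  }
}
Analysis: with operator set to "and", every token must appear in at least one of the fields for a document to match. This is similar to copy_to, but copy_to needs extra storage, whereas cross_fields needs none and still lets you boost individual fields. Personally, I find this approach best suited to searching place names and English personal names.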
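For comparison, the copy_to alternative mentioned above could look like the sketch below. The index name boss_db_copy and the combined field county_qualification are hypothetical names chosen for illustration, and only the two relevant fields are mapped.

```console
# Hypothetical index: both fields are copied into one combined field
PUT /boss_db_copy
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  },
  "mappings": {
    "properties": {
      "county": { "type": "text", "copy_to": "county_qualification" },
      "qualification": { "type": "text", "copy_to": "county_qualification" },
      "county_qualification": { "type": "text" }
    }
  }
}

# Index one sample document and make it searchable immediately
PUT /boss_db_copy/_doc/2?refresh=true
{"county":"天河区","qualification":"硕士"}

# A plain match query on the combined field replaces cross_fields
GET /boss_db_copy/_search
{
  "query": {
    "match": {
      "county_qualification": {
        "query": "天河区硕士",
        "operator": "and"
      }
    }
  }
}
```

The combined field is indexed in addition to the two source fields, which is the extra storage cost that cross_fields avoids.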