Basics
A bool query is a combination of one or more query clauses. There are four clause types in total: two of them (must and should) contribute to the relevance score, and two (filter and must_not) do not.
Positive matching
Clause order does not matter, but if there is no must clause, then at least one should clause has to match.
Here is an example. First, insert some data:
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }
Then run a bool query:
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "price": "30" }
      },
      "filter": {
        "term": { "avaliable": "true" }
      },
      "must_not": {
        "range": {
          "price": { "lte": 10 }
        }
      },
      "should": [
        { "term": { "productID.keyword": "JODL-X-1937-#pV7" } },
        { "term": { "productID.keyword": "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match": 1
    }
  }
}
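As a rough mental model of the query above (an illustrative sketch only, not Elasticsearch's actual Lucene scoring code; `bool_score` is a hypothetical helper), the scoring clauses add up while the filter-context clauses only include or exclude the document:

```python
def bool_score(must_scores, should_scores, filter_ok, must_not_hit):
    """Toy model of bool query scoring.

    must_scores / should_scores: scores of the matching scoring clauses.
    filter_ok: True if all filter clauses match (filter context, no score).
    must_not_hit: True if any must_not clause matches.
    """
    if not filter_ok or must_not_hit:
        return None  # document is excluded entirely
    # Scoring clauses (must, should) simply sum.
    return sum(must_scores) + sum(should_scores)

# A document matching the must clause (1.2) and one should clause (0.8):
print(bool_score([1.2], [0.8], filter_ok=True, must_not_hit=False))  # 2.0
```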
Negative matching
You can also demote documents with a boosting query: the negative clause names the condition to demote, and negative_boost is the factor that the score of matching documents is multiplied by.
Here is an example. First, insert some data:
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }
Then run a boosting query with a negative clause:
POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": "apple"
        }
      },
      "negative": {
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5
    }
  }
}
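The effect on a single document can be sketched like this (a simplified model of the documented boosting-query behaviour, not the real implementation; `boosting_score` is a hypothetical helper):

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Toy model of the boosting query: a document that also matches the
    negative clause keeps its positive score multiplied by negative_boost;
    other documents keep their score unchanged."""
    return positive_score * negative_boost if matches_negative else positive_score

# Document 3 mentions "pie", so its score for "apple" is halved:
print(boosting_score(2.0, True, 0.5))   # 1.0
print(boosting_score(2.0, False, 0.5))  # 2.0
```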
Nested bool queries
Query clauses at the same level carry the same weight, so we can change the relative weight of clauses by nesting them:
POST /animals/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" }},
        { "term": { "text": "dog" }},
        {
          "bool": {
            "should": [
              { "term": { "text": "brown" }},
              { "term": { "text": "brown" }}
            ]
          }
        }
      ]
    }
  }
}
Single-string queries
dis_max (disjunction max) runs all of its clauses, but scores each document by its single best-matching clause rather than their sum, so the document with the strongest individual clause match comes first.
PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}
PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}
should query
With the following should query, the two documents receive almost the same score:
POST /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}
But replacing should with dis_max changes that.
dis_max query
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}
Since document 2 matches better (its body contains the exact phrase "brown fox"), it now ranks first.
For the following dis_max, however, neither document matches either clause in full, so the two scores come out the same. Note that dis_max, like should, treats a document as a hit as long as any single clause matches.
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ]
    }
  }
}
But document 2 clearly matches more terms overall ("pets" in its title and "quick" in its body), so we add the tie_breaker parameter, which folds the non-best clauses back into the score and thereby demotes document 1 relative to document 2:
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ],
      "tie_breaker": 0.2
    }
  }
}
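Per the dis_max documentation, the final score is the best clause's score plus tie_breaker times the score of each remaining matching clause; with tie_breaker at its default of 0 this reduces to a plain maximum. A small sketch (`dis_max_score` is a hypothetical helper; the clause scores are made-up numbers):

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Score = best clause + tie_breaker * (sum of the other clauses)."""
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others

# Document 2 matches both clauses (say 1.0 on body, 0.5 on title),
# document 1 matches only one clause (say 1.0 on title):
print(dis_max_score([1.0, 0.5], tie_breaker=0.2))  # 1.1 -> doc 2 wins
print(dis_max_score([1.0], tie_breaker=0.2))       # 1.0
```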
Single-string, multi-field queries
There are three scenarios: best fields, most fields, and cross fields.
Best fields
When the fields compete with each other but are also related, the score comes from the best-matching field (best_fields is essentially multi_match's shorthand for a dis_max over per-field queries):
POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "quick pets",
      "fields": ["title", "body"],
      "tie_breaker": 0.2
    }
  }
}
Most fields
The more fields that match, the better. A common pattern: the main field uses the English analyzer, which stems terms (and can add synonyms) to match more documents, while a subfield over the same text uses the Standard analyzer to provide more precise matching.
First, insert some data:
PUT test_mul/_doc/1
{
  "k1": "a",
  "k2": "a b",
  "k3": "a c d",
  "k4": "e f c"
}
PUT test_mul/_doc/2
{
  "k1": "a",
  "k2": "b",
  "k3": "a c d",
  "k4": "e f c"
}
Then run a most_fields query:
POST test_mul/_search
{
  "query": {
    "multi_match": {
      "query": "a",
      "type": "most_fields",
      "fields": ["k1", "k2", "k3", "k4"]
    }
  }
}
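most_fields sums the per-field scores, so the more fields a document matches, the higher it ranks. With the sample data above, document 1 contains the term a in three fields while document 2 contains it in only two, which is why document 1 ranks first. A quick way to see this (naive whitespace tokenization stands in for the analyzer; `matching_fields` is a hypothetical helper):

```python
docs = {
    1: {"k1": "a", "k2": "a b", "k3": "a c d", "k4": "e f c"},
    2: {"k1": "a", "k2": "b",   "k3": "a c d", "k4": "e f c"},
}

def matching_fields(doc, term):
    # Naive whitespace tokenization instead of a real analyzer.
    return [field for field, text in doc.items() if term in text.split()]

for doc_id, doc in docs.items():
    print(doc_id, matching_fields(doc, "a"))
# 1 ['k1', 'k2', 'k3']
# 2 ['k1', 'k3']
```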
Cross fields
When the information you are looking for is spread across several fields, you want to find as many of the query terms as possible across those fields.
For example, insert this document:
PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3DG"
}
A cross_fields query looks like this:
POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Kingdom W1V",
      "type": "cross_fields",
      "fields": ["street", "country", "postcode"]
    }
  }
}
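cross_fields is term-centric: it treats the listed fields as one combined field and looks for each query term in any of them (so with operator set to and, every term must appear in at least one field). A toy illustration of that check, again with naive whitespace tokenization (`all_terms_somewhere` is a hypothetical helper):

```python
address = {
    "street": "5 Poland Street",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}

def all_terms_somewhere(doc, query):
    """Term-centric check: every query term occurs in at least one field."""
    tokens_per_field = [set(value.lower().split()) for value in doc.values()]
    return all(any(term in tokens for tokens in tokens_per_field)
               for term in query.lower().split())

print(all_terms_somewhere(address, "Poland Kingdom W1V"))  # True
print(all_terms_somewhere(address, "Poland Berlin"))       # False
```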
Chinese and multilingual analysis
Chinese analysis
Chinese word segmentation can be added to Elasticsearch by installing analysis plugins,
for example the IK analyzer and the HanLP analyzer:
# install the IK plugin
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip
# install the HanLP plugin
./elasticsearch-plugin install https://github.com/KennFalcon/elasticsearch-analysis-hanlp/releases/download/v7.1.0/elasticsearch-analysis-hanlp-7.1.0.zip
Pinyin analyzer
The pinyin token filter used below comes from yet another plugin, elasticsearch-analysis-pinyin (installed the same way as above). With it we can define an analyzer that tokenizes names into pinyin:
#Pinyin
PUT /artists/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "user_name_analyzer": {
          "tokenizer": "whitespace",
          "filter": "pinyin_first_letter_and_full_pinyin_filter"
        }
      },
      "filter": {
        "pinyin_first_letter_and_full_pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_none_chinese": true,
          "keep_original": false,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "trim_whitespace": true,
          "keep_none_chinese_in_first_letter": true
        }
      }
    }
  }
}
GET /artists/_analyze
{
  "text": ["刘德华 张学友 郭富城 黎明 四大天王"],
  "analyzer": "user_name_analyzer"
}
HanLP analysis
Tokenize with the HanLP standard analyzer:
POST _analyze
{
  "analyzer": "hanlp_standard",
  "text": ["剑桥分析公司多位高管对卧底记者说,他们确保了唐纳德·特朗普在总统大选中获胜"]
}
Precise English matching
For precise English matching, you can index the same field in two ways by giving it a subfield with a different analyzer. In the example below, title is analyzed with the standard analyzer, and its subfield title.tag_e is analyzed with the english analyzer:
PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "tag_e": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
Test it:
PUT /my_index/_doc/1
{ "title": "I'm happy for this fox" }
PUT /my_index/_doc/2
{ "title": "I'm not happy about my fox problem" }
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "type": "most_fields",
      "query": "not happy foxes",
      "fields": [ "title", "title.tag_e" ]
    }
  }
}
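The english subfield matters here because the english analyzer stems tokens, so the query term foxes can match documents that only contain fox, while the standard-analyzed title field only matches exact token forms. A toy illustration of the idea (`naive_stem` is a crude suffix-stripping stand-in, nothing like the real stemmer):

```python
def naive_stem(token):
    # Crude stand-in for the english analyzer's stemmer:
    # strip a plural "-es"/"-s" suffix from long-enough tokens.
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Exact matching (standard analyzer) misses; stemmed matching hits:
print("foxes" == "fox")                          # False
print(naive_stem("foxes") == naive_stem("fox"))  # True
```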