Article link: https://blog.csdn.net/xsfshuaijile/article/details/114961765

Elasticsearch

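Queries

Term-level queries

Exact-match queries

term query

A term query is the equivalent of = in SQL: it matches the exact value stored in the field, without analyzing it.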

{
  "from":0,
  "size":10,
  "query":{
      "term":{
          "products.id":"id1"
      }
  },
  "_source":{
    "includes": "products.*",
    "excludes": "products.tax_amount"
  },
  "sort":
    {"products.taxless_price":"asc"}
}

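terms query

A terms query matches any of several values, like IN in SQL. It can also look up its values across indices (terms lookup).

Example: a user index (keyed by user id) and an article index (keyed by article id).

To fetch the articles published by the user with id 1, the lookup reads the article ids from the articles field of document 1 in the user index, then returns the articles whose _id is among them.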

{
    "query":{
        "terms":{
            "_id":{
                "index":"user",
                "id":"1",
                "path":"articles"
            }
        }
    }
}
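The two-step lookup described above can be sketched in plain Python. This is only a toy model: the dictionaries stand in for the user and article indices, and the sample documents and the helper names terms_lookup and search_by_ids are made up for illustration, not part of Elasticsearch.

```python
# Toy in-memory stand-ins for the two indices (hypothetical data).
user_index = {
    "1": {"name": "alice", "articles": ["a1", "a3"]},
    "2": {"name": "bob", "articles": ["a2"]},
}
article_index = {
    "a1": {"title": "Intro to term queries"},
    "a2": {"title": "Analyzers"},
    "a3": {"title": "Compound queries"},
}


def terms_lookup(source_index, doc_id, path):
    """Step 1: read the list of terms from one field of one document."""
    doc = source_index.get(doc_id, {})
    return doc.get(path, [])


def search_by_ids(target_index, ids):
    """Step 2: keep the documents whose _id is among the looked-up terms."""
    wanted = set(ids)
    return {doc_id: doc for doc_id, doc in target_index.items() if doc_id in wanted}


article_ids = terms_lookup(user_index, "1", "articles")
hits = search_by_ids(article_index, article_ids)
print(sorted(hits))  # ['a1', 'a3']
```

In the real query both steps happen inside a single search request; the sketch only separates them to make the data flow visible.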

range query

gt: greater than
lt: less than
gte: greater than or equal to
lte: less than or equal to
boost: weight applied to the relevance score

{
    "query":{
        "range":{
            "fieldname":{
                "gte":100,
                "lt":200
            }
        }
    }
}
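As a quick illustration of how the four bound keywords combine, here is a small Python sketch. The function in_range and the BOUNDS table are hypothetical names; the sketch only models the comparison semantics, not how Elasticsearch evaluates ranges internally.

```python
import operator

# Map each range keyword to the comparison it stands for.
BOUNDS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 100, "lt": 200}."""
    return all(BOUNDS[name](value, limit) for name, limit in bounds.items())


print(in_range(100, {"gte": 100, "lt": 200}))  # True: gte includes the bound
print(in_range(200, {"gte": 100, "lt": 200}))  # False: lt excludes the bound
```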

exists query

Matches documents in which the field has a non-null value, like IS NOT NULL in SQL.

{
	"query":{
	    "exists": {"field":"products.taxless_price"}
	}
}

prefix query

Equivalent to LIKE 'xxxx%' in SQL: matches terms that start with the given prefix.

{
	"query":{
		"prefix":{
			"products._id":"idprefix"
		}
	}
}

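wildcard and regexp queries

wildcard: pattern matching with wildcards; * matches any number of characters, ? matches exactly one
regexp: matches terms against a regular expression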

{
	"query":{
		"wildcard":{
			"products.manufacturer":"Gno*ho?se"
		}
	}
}

{
	"query":{
		"regexp":{
			"products.manufacturer":"Gno.*ho.se"
		}
	}
}
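The same pattern string means different things to the two queries. The sketch below uses Python's fnmatch and re modules as rough stand-ins for the wildcard and regexp syntaxes (an assumption that holds for this simple pattern, not for the full Lucene regular expression syntax); the helper names are made up.

```python
import fnmatch
import re

pattern = "Gno*ho?se"


def wildcard_match(term, pattern):
    # fnmatchcase gives * and ? the same meaning as the wildcard query
    # (it also understands [...] classes, which this pattern does not use).
    return fnmatch.fnmatchcase(term, pattern)


def regexp_match(term, pattern):
    # A regexp query has to match the whole term, hence fullmatch.
    return re.fullmatch(pattern, term) is not None


print(wildcard_match("Gnomehouse", pattern))      # True
print(regexp_match("Gnomehouse", pattern))        # False: o* and o? are quantifiers here
print(regexp_match("Gnohose", pattern))           # True
print(regexp_match("Gnomehouse", "Gno.*ho.se"))   # True: regex equivalent of the wildcard
```

Read as a regular expression, Gno*ho?se means "Gn, any number of o, h, an optional o, se", so the regular expression that corresponds to the wildcard pattern is Gno.*ho.se.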

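Full-text queries

Unlike term-level queries, full-text queries analyze the query text before matching.

Analyzers at index time

The analyzer set on the field
The analyzer set on the index
The default analyzer

Setting the index-wide analyzer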

put test
{
    "settings":{
        "analysis":{
            "analyzer":{
                "default":{
                    "type":"simple"
                },
                "my_analyzer":{
                    "type":"standard",
                    "stopwords":["the","a"]
                }
            }
        }
    }
}

Setting a field-level analyzer; the index-time and search-time analyzers can be set separately.

put test
{
    "mappings":{
        "properties":{
            "field":{
                "type":"text",
                "analyzer":"standard",
                "search_analyzer":"simple"
            }
        }
    }
}

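Analyzers at search time

The analyzer passed in the search request
The search_analyzer set on the field when the index was created
The analyzer set on the field when the index was created
The default_search analyzer configured in the index settings
The standard analyzer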

match

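query: the text to match
operator: and/or; whether all of the terms in query must match (and) or any of them (or)
minimum_should_match: the minimum number of those terms that must match; defaults to 1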

get /kibana_sample_data_ecommerce/_search
{
  "query":{
    "match": {
      "customer_full_name":{
        "query":"Eddie Underwood",
        "operator":"and"
      }
    }
  },
  "_source":{
    "includes":"customer_full_name"
  }
}
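To make the effect of operator and minimum_should_match concrete, here is a deliberately simplified Python model. The analyze function is only a rough stand-in for the standard analyzer, the match function ignores scoring, and both names are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def match(field_value, query, operator="or", minimum_should_match=1):
    """Simplified model of a match query; ignores scoring entirely."""
    doc_terms = set(analyze(field_value))
    query_terms = analyze(query)
    matched = sum(1 for term in query_terms if term in doc_terms)
    if operator == "and":
        # Every query term has to be present.
        return matched == len(query_terms)
    # With "or", it is enough that minimum_should_match terms are present.
    return matched >= minimum_should_match


print(match("Eddie Weber", "Eddie Underwood"))                          # True
print(match("Eddie Weber", "Eddie Underwood", operator="and"))          # False
print(match("Eddie Weber", "Eddie Underwood", minimum_should_match=2))  # False
```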

multi_match

Runs the same query against multiple fields.

get /kibana_sample_data_ecommerce/_search
{
  "query":{
    "multi_match": {
        "query":"FEMALE",
        "fields": ["customer_full_name","customer_gender"],
        "operator": "and",
        "minimum_should_match":2
		}
  },
  "_source":{
    "includes":["customer_full_name","customer_gender"]
  }
  
}

match_phrase

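Exact phrase matching

slop: the maximum number of positions the terms may be apart and still match

For example, the phrase java spark matches the following customer_full_name value only with a slop of at least 3, because three words sit between java and spark:

hello world, java is very good, spark is also very good.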

get /kibana_sample_data_ecommerce/_search
{
  "query":{
    "match_phrase": {
      "customer_full_name":{
        "query":"Eddie Underwood",
        "slop":3
      }
    }
  },
  "_source":{
    "includes":["customer_full_name","customer_gender"]
  }
}
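For the java spark example above, the slop a phrase needs can be estimated with a simplified position model: shift each matched position back by the term's offset in the phrase and measure how far apart the shifted positions are. This is a sketch under that assumption, not Lucene's actual scorer, and analyze and required_slop are hypothetical helpers.

```python
import re
from itertools import product


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def required_slop(field_value, phrase):
    """Smallest slop at which the phrase would match, or None if a term is missing."""
    tokens = analyze(field_value)
    terms = analyze(phrase)
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if not terms or any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Shift each position back by the term's offset in the phrase; an
        # exact phrase match makes all shifted positions equal.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        width = max(shifted) - min(shifted)
        best = width if best is None else min(best, width)
    return best


text = "hello world, java is very good, spark is also very good."
print(required_slop(text, "java spark"))  # 3
```

An exact phrase such as "very good" needs a slop of 0 in this model, and a phrase containing a word that is absent from the text never matches.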

match_phrase_prefix

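Phrase matching with a prefix: the last word of the query is matched as a prefix.

max_expansions: the maximum number of terms the prefix is expanded to before the search stops looking; defaults to 50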

get /kibana_sample_data_ecommerce/_search
{
  "query":{
    "match_phrase_prefix": {
      "customer_full_name":{
        "query":"Eddie Underwood",
        "max_expansions":50
      }
    }
  },
  "_source":{
    "includes":["customer_full_name","customer_gender"]
  }
}

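Fuzzy queries and spelling suggestions

Edit-distance algorithms

Levenshtein

Edits: substitution, insertion, deletion (and transposition)

NGram size

Unigram, bigram, trigram

fuzziness: 0/1/2/AUTO, the maximum edit distance allowed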

get /kibana_sample_data_ecommerce/_search
{
  "query":{
    "fuzzy": {
        "customer_full_name": {
          "value": "Dawsun",
          "fuzziness": 2
        }
    }
  },
  "_source":{
    "includes":["customer_full_name","customer_gender"]
  }
}
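The edit distance that fuzziness bounds can be computed with a short dynamic program. The sketch below is a textbook Levenshtein implementation with an optional adjacent-transposition step (the optimal string alignment variant), plus a character n-gram helper; the function names are made up and this is not the code Elasticsearch runs.

```python
def edit_distance(a, b, transpositions=True):
    """Levenshtein distance; optionally count an adjacent swap as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # Swap of two adjacent characters counts as a single edit.
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def char_ngrams(text, n):
    """Character n-grams: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(edit_distance("dawsun", "dawson"))  # 1: one substitution
print(edit_distance("Dawsun", "dawson"))  # 2: the capital D costs one more edit
print(char_ngrams("java", 2))             # ['ja', 'av', 'va']
```

The second call hints at why the query above uses a fuzziness of 2: a fuzzy query does not analyze its value, so if the indexed terms are lowercased, the capital letter in Dawsun counts as an edit of its own.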

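Suggesters (search suggestions)

term: analyzes the text and suggests corrections term by term
phrase: suggests corrections for a whole phrase
completion: autocomplete suggester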

get /kibana_sample_data_ecommerce/_search
{
  "suggest":{
  	"msg-suggest":{
  		"text":"dawsun",
  		"term":{
  			"field":"customer_full_name"
  		}
  	}
  }
}

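Compound queries

bool (Boolean) query

must: clauses that must match; they contribute to the relevance score
filter: clauses that must match; they do not contribute to the relevance score
should: clauses that need not match but contribute to the relevance score when they do. If there is no must or filter clause, at least one should clause has to match
must_not: clauses that must not match; they do not contribute to the relevance score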

get /kibana_sample_data_ecommerce/_search
{
	"query":{
		"bool":{
			"must":[
				{"match":{"customer_full_name":"Underwood"}}
			],
			"should":[
				{"term":{"currency":"EUR"}},
				{"term":{"customer_phone":"1111"}}
			],
			"filter":{
				"term":{
					"day_of_week":"Sunday"
				}
			}
		}
	}
}
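The matching rules of the four clause types, including the special case for should, can be summed up in a few lines of Python. This sketch only models whether a document matches, not how it is scored; bool_matches is a hypothetical helper and its filters parameter corresponds to the filter clause.

```python
def bool_matches(must=(), filters=(), should=(), must_not=()):
    """Each argument is a sequence of booleans: did that clause match the document?"""
    if not all(must) or not all(filters):
        return False
    if any(must_not):
        return False
    # should clauses become mandatory only when there is no must or filter clause.
    if should and not must and not filters:
        return any(should)
    return True


# Shape of the query above: must and filter match, neither should clause does.
print(bool_matches(must=[True], filters=[True], should=[False, False]))  # True
# With should clauses only, at least one of them has to match.
print(bool_matches(should=[False, False]))  # False
```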

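dis_max query

The highest score among the matching subqueries becomes the document's score; the lower scores of the other subqueries are ignored.

tie_breaker: the weight with which the lower-scoring subqueries still count toward the score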

{
	"query":{
		"dis_max":{
			"queries":[
				{"match":{"message":"firefox"}},
				{"term":{"geo.src":"CN"}},
				{"term":{"geo.dest":"CN"}}
			],
			"tie_breaker":0.7
		}
	}
}
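How tie_breaker changes the result is easiest to see as arithmetic: the best subquery score is taken in full, and the remaining scores are added after being multiplied by tie_breaker. A minimal Python sketch, with made-up scores and a hypothetical function name:

```python
def dis_max_score(scores, tie_breaker=0.0):
    """scores: relevance scores of the subqueries that matched the document."""
    if not scores:
        return 0.0
    best = max(scores)
    # The best score counts in full, every other score is scaled down.
    return best + tie_breaker * (sum(scores) - best)


print(dis_max_score([2.0, 1.0, 0.5]))                             # 2.0
print(round(dis_max_score([2.0, 1.0, 0.5], tie_breaker=0.7), 2))  # 3.05
```

With tie_breaker at 0 only the best subquery counts; with tie_breaker at 1 the scores are simply added up.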

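constant_score query

Every matching document gets the same fixed score. What is that good for? When the query is combined into a larger one, all documents matched by the subquery contribute the same score.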

get /kibana_sample_data_ecommerce/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "customer_full_name": "Underwood"
              }
            },
            {
              "match": {
                "customer_gender": "MALE"
              }
            }
          ]
        }
      },
      "boost": 1.2
    }
  }
}

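boosting query

positive: like must in a bool query; documents have to match it.
negative: like must_not in a bool query, except that a match only lowers the score instead of excluding the document.
negative_boost: the factor, between 0 and 1, by which the score of documents matching negative is multiplied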

GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {"customer_full_name": "Underwood"}
      },
      "negative": {
        "match": {"customer_gender": "MALE"}
      },
      "negative_boost": 0.2
    }
  }
}
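The effect of negative_boost on the final score can be sketched as follows. The function boosting_score and the sample scores are hypothetical; the sketch only restates the rule described above.

```python
def boosting_score(positive_score, negative_matches, negative_boost):
    """positive_score is None when the positive query does not match."""
    if positive_score is None:
        return None  # the document is not returned at all
    if negative_matches:
        # Matching the negative query demotes the document, it does not remove it.
        return positive_score * negative_boost
    return positive_score


print(boosting_score(2.0, negative_matches=False, negative_boost=0.2))  # 2.0
print(boosting_score(2.0, negative_matches=True, negative_boost=0.2))   # 0.4
```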