Implementing CRUD with the ElasticSearch API

Original link: https://blog.csdn.net/qq_41851454/article/details/81353359


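2.5 Version control

2.6 Mapping

2.7 Basic queries (Query)

2.7.1 Data preparation

2.7.2 term and terms queries

2.7.3 Controlling the number of results returned

2.7.4 Returning the version number

2.7.5 match queries (analyzer-aware)

2.7.6 Controlling which fields are loaded

2.7.7 Sorting

2.7.8 Prefix match query

2.7.9 Range query

2.7.10 wildcard query

2.7.11 Fuzzy queries with fuzzy

2.7.12 Highlighting search results

2.8 Basic queries on Chinese text (Query)

2.8 Filter queries

2.8.1 Simple filter queries

2.8.2 bool filter queries

2.8.3 Range filter queries

2.8.4 Filtering non-null values

2.8.5 Filter cache

2.8.6 Aggregation queries

2.8.7 Compound queries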
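Create an index:

PUT /lib/
{
"settings":{
"index":{
"number_of_shards": "5",
"number_of_replicas": "1"
}
}
}
# Alternatively, create the index with default settings:
PUT lib

View the settings of an index:

GET /lib/_settings
View the settings of all indices:

GET _all/_settings
Add a document:

PUT /lib/user/1
{"first_name":"Fir"}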

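2.5 Version control
ElasticSearch uses optimistic locking to keep data consistent. A client that operates on a document does not lock and unlock it; it only states which version it expects to operate on. If that version matches, ElasticSearch lets the operation proceed. If the versions conflict, ElasticSearch reports the conflict and throws a VersionConflictEngineException.

Version numbers range from 1 to 2^63 - 1.

Internal versioning: uses _version.

External versioning: ElasticSearch handles an external version number slightly differently from an internal one. Instead of checking that _version equals the value given in the request, it checks that the current _version is less than the given value. If the request succeeds, the external version number is stored as the document's _version.

To keep _version consistent with the externally managed version, use version_type=external.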
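
The two version checks described above can be sketched in a few lines of Python. This is only a model of the rule, not ElasticSearch code: the function name next_version and the VersionConflict class are hypothetical stand-ins introduced for illustration.

```python
class VersionConflict(Exception):
    """Hypothetical stand-in for VersionConflictEngineException."""


def next_version(current, requested, version_type="internal"):
    """Return the _version stored after a write, or raise on conflict.

    current:   the document's current _version
    requested: the version supplied with the write request
    """
    if version_type == "internal":
        # Internal: the request must name exactly the current version.
        if requested != current:
            raise VersionConflict(f"current={current}, requested={requested}")
        return current + 1
    if version_type == "external":
        # External: the supplied version must be greater than the current
        # one, and it becomes the new _version when the write succeeds.
        if requested <= current:
            raise VersionConflict(f"current={current}, requested={requested}")
        return requested
    raise ValueError(f"unknown version_type: {version_type}")


print(next_version(3, 3))               # 4
print(next_version(3, 10, "external"))  # 10
```

A write that names a stale version, such as next_version(3, 2), raises VersionConflict, which mirrors the conflict error a client has to handle by re-reading the document and retrying.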

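2.6 Mapping

When creating an index, you can define field types and related properties in advance, so that date fields are handled as dates, numeric fields as numbers, and string fields as strings. The supported data types are:

(1) Core datatypes

String: the string family comprises text and keyword

text is used to index long text. Before indexing, the text is analyzed into individual terms, and those terms are indexed so that ES can search on them. By default, text fields cannot be used for sorting or aggregations.

keyword is not analyzed and can be used for filtering, sorting, and aggregations. A keyword field can be searched only by its exact value.

Numeric: long, integer, short, byte, double, float

Date: date

Boolean: boolean

Binary: binary
(2) Complex datatypes

Array datatype: an array does not need a dedicated type for its elements, for example:

String array: ["one","two"]

Integer array: [1,2]

Nested array: [1,[2,3]] is equivalent to [1,2,3]

Array of objects: [{"name": "Mary", "age":12},{"name" : "john" , "age" : 10}]

Object datatype: object, for a single JSON object;

Nested datatype: nested, for arrays of JSON objects;
(3) Geo datatypes

Geo-point datatype: geo_point, for latitude/longitude coordinates;

Geo-shape datatype: geo_shape, for complex shapes such as polygons;
(4) Specialised datatypes

IPv4 datatype: ip, for IPv4 addresses;

Completion datatype: completion, provides auto-complete suggestions;

Token count datatype: token_count, counts the number of tokens in an analyzed string field; by default the count is not reduced when a filter removes tokens.

mapper-murmur3 type: with the plugin installed, murmur3 computes a hash of the field values at index time;

Attachment datatype: with the mapper-attachments plugin, attachments can be indexed
Supported properties: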

2.7 Basic queries (Query)
2.7.1 Data preparation
Create a mapping:

PUT /lib3
{
"settings": {
"number_of_shards": 3
, "number_of_replicas": 0
},
"mappings": {
"user":{
"properties": {
"name":{"type": "text"},
"address":{"type": "text"},
"age":{"type": "integer"},
"interests":{"type": "text"},
"birthday":{"type": "date"}
}
}
}
}
Insert a few documents:

PUT /lib3/user/1
{
"name" : "zhaoliu",
"address" : "hei long jiang sheng tie ling shi",
"age" : 50,
"birthday" : "1970-12-12",
"interests" : "xi huan hejiu,duanlian,lvyou"
}

PUT /lib3/user/2
{
"name" : "zhaoming",
"address" : "bei jing hai dian qu ",
"age" : 20,
"birthday" : "1998-10-12",
"interests" : "xi huan hejiu,duanlian,lvyou"
}
PUT /lib3/user/3
{
"name" : "lisi",
"address" : "hei long jiang sheng tie ling shi",
"age" : 23,
"birthday" : "1970-12-12",
"interests" : "xi huan hejiu,duanlian,lvyou"
}
PUT /lib3/user/4
{
"name" : "wangwu",
"address" : "bei jing hai dian qu",
"age" : 26,
"birthday" : "1995-12-12",
"interests" : "xi huan hejiu,duanlian,lvyou"
}
PUT /lib3/user/5
{
"name" : "zhangsan",
"address" : "bei jing chao yang qu",
"age" : 29,
"birthday" : "1988-12-12",
"interests" : "xi huan hejiu,duanlian,lvyou"
}
View all documents:

GET /lib3/user/_search
Query by condition:

# "max_score": 0.6931472 in the result is the relevance score of the best match for this search
GET /lib3/user/_search?q=name:lisi
Search result:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "lib3",
"_type": "user",
"_id": "3",
"_score": 0.6931472,
"_source": {
"name": "lisi",
"address": "hei long jiang sheng tie ling shi",
"age": 23,
"birthday": "1970-12-12",
"interests": "xi huan hejiu,duanlian,lvyou"
}
}
]
}
}
GET lib3/user/_search?q=interests:hejiu&sort=age:desc
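2.7.2 term and terms queries
A term query looks up the exact term in the inverted index and is unaware of the analyzer. It suits keyword, numeric, and date fields.

term: finds documents whose field contains the given term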

GET lib3/user/_search
{
"query" :{
"term": {
"name": "zhaoliu"
}
}
}
terms: finds documents whose field contains any of the given terms

GET lib3/user/_search
{
"query" :{
"terms": {
"interests": ["hejiu","lvyou"]
}
}
}
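
To make "unaware of the analyzer" concrete, here is a small Python model of an inverted index. The analyze function is a rough approximation of the standard analyzer (lowercase, split on non-alphanumeric characters), and all function names are hypothetical; the point is only that text is analyzed at index time while a term lookup uses the query value exactly as given.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough approximation of the standard analyzer (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, field):
    """Map each analyzed token of `field` to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index[token].add(doc_id)
    return index


def term_query(index, term):
    # No analysis here: the term is looked up exactly as given.
    return sorted(index.get(term, set()))


def terms_query(index, terms):
    # A document matches if it contains any one of the terms.
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


docs = {
    "1": {"name": "zhaoliu"},
    "2": {"name": "zhaoming"},
    "3": {"name": "lisi"},
}
name_index = build_index(docs, "name")
print(term_query(name_index, "zhaoliu"))            # ['1']
print(term_query(name_index, "Zhaoliu"))            # []
print(terms_query(name_index, ["zhaoliu", "lisi"]))  # ['1', '3']
```

The second lookup finds nothing because the index holds the lowercased token while the query term keeps its capital letter, which is the usual surprise when running term queries against analyzed text fields.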
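2.7.3 Controlling the number of results returned
from: the offset of the first document to return

size: the number of documents to return

Take the first 2 documents: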

GET lib3/user/_search
{
"from": 0,
"size": 2,
"query" :{
"terms": {
"interests": ["hejiu","lvyou"]
}
}
}
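
Paging through results comes down to computing from out of a page number. A minimal sketch in Python, assuming 1-based page numbers; the helper name page_body is hypothetical and the function only builds the request body, it does not talk to a cluster.

```python
def page_body(query, page, page_size):
    """Build a search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
        "query": query,
    }


query = {"terms": {"interests": ["hejiu", "lvyou"]}}
print(page_body(query, 1, 2)["from"])  # 0
print(page_body(query, 3, 2)["from"])  # 4
```

Page 1 with a page size of 2 reproduces the request above (from 0, size 2).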
2.7.4 Returning the version number
Set version to true to include the version number in each hit:

GET lib3/user/_search
{
"version": true,
"query" :{
"terms": {
"interests": ["hejiu","lvyou"]
}
}
}
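2.7.5 match queries (analyzer-aware)
A match query is aware of the analyzer: it analyzes the query text for the field and then searches for the resulting terms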

GET lib3/user/_search
{
"query" :{
"match": {
"name": "zhaoliu wangwu"
}
}
}
GET lib3/user/_search
{
"query" :{
"match": {
"interests": "duanlian changge"
}
}
}
GET lib3/user/_search
{
"query" :{
"match": {
"age": "20"
}
}
}
match_all: matches all documents

GET lib3/user/_search
{
"query" :{
"match_all": {}
}
}
multi_match: searches several fields at once

GET lib3/user/_search
{
"query" :{
"multi_match": {
"query": "hejiu",
"fields": ["interests","name"]
}
}
}
match_phrase: phrase match query

GET lib3/user/_search
{
"query" :{
"match_phrase": {
"interests": "duanlian,lvyou"
}
}
}
Specify which fields to return:

GET lib3/user/_search
{
"_source": ["address","name"],
"query": {
"match": {
"interests": "duanlian"
}
}
}
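For match_phrase, ElasticSearch first analyzes the query string and builds a phrase query from the analyzed text. All terms of the phrase must match, and their relative positions must be preserved.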
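
The position requirement of match_phrase can be modelled in Python as a search for a contiguous run of tokens. This is a simplified sketch: analyze only approximates the standard analyzer, the function names are hypothetical, and the model allows no gaps between the terms.

```python
import re


def analyze(text):
    """Rough approximation of the standard analyzer (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_phrase(field_text, phrase):
    """True if the analyzed phrase occurs as consecutive tokens in the field."""
    doc_tokens = analyze(field_text)
    phrase_tokens = analyze(phrase)
    if not phrase_tokens:
        return False
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


interests = "xi huan hejiu,duanlian,lvyou"
print(match_phrase(interests, "duanlian,lvyou"))  # True
print(match_phrase(interests, "lvyou duanlian"))  # False, wrong order
print(match_phrase(interests, "hejiu lvyou"))     # False, not adjacent
```

A plain match query would accept all three inputs, since each contains at least one term that occurs in the field.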

2.7.6 Controlling which fields are loaded
includes: the fields to include; excludes: the fields to exclude

GET lib3/user/_search
{
"query": {
"match_all": {}
},
"_source": {
"includes": ["name","address"]
, "excludes": ["age","birthday"]
}
}
Wildcards can also be used in field names:

GET lib3/user/_search
{
"query": {
"match_all": {}
},
"_source": {
"includes": "addr*"
, "excludes": ["age","bir*"]
}
}
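
The effect of includes and excludes on a flat document can be approximated in Python with the standard fnmatch module. The function filter_source is a hypothetical helper written for this illustration; it ignores nested objects and simply keeps a field when it matches an include pattern (or no includes are given) and matches no exclude pattern.

```python
from fnmatch import fnmatchcase


def as_list(patterns):
    """Accept a single pattern string, a list of patterns, or None."""
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_source(source, includes=None, excludes=None):
    """Return the subset of top-level fields selected by the patterns."""
    includes, excludes = as_list(includes), as_list(excludes)
    result = {}
    for field, value in source.items():
        included = not includes or any(fnmatchcase(field, p) for p in includes)
        excluded = any(fnmatchcase(field, p) for p in excludes)
        if included and not excluded:
            result[field] = value
    return result


doc = {
    "name": "lisi",
    "address": "hei long jiang sheng tie ling shi",
    "age": 23,
    "birthday": "1970-12-12",
}
print(filter_source(doc, includes="addr*", excludes=["age", "bir*"]))
# {'address': 'hei long jiang sheng tie ling shi'}
```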
2.7.7 Sorting
Use sort to order the results: desc for descending, asc for ascending

GET lib3/user/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
2.7.8 Prefix match query
match_phrase_prefix treats the last term of the query text as a prefix:

GET lib3/user/_search
{
"query": {
"match_phrase_prefix": {
"name": {
"query": "zhao"
}
}
}
}
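2.7.9 Range query
range: performs a range query

Parameters: from, to, include_lower, include_upper, boost

include_lower: whether the lower bound is included, true by default

include_upper: whether the upper bound is included, true by default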

GET lib3/user/_search
{
"query": {
"range": {
"birthday": {
"from": "1990-10-10",
"to": "2018-05-01"
}
}
}
}
GET lib3/user/_search
{
"query": {
"range": {
"age": {
"from": 20,
"to": 25,
"include_lower":true,
"include_upper":false
}
}
}
}
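
The boundary flags are easy to get wrong, so here is the second request above replayed in plain Python against the ages of the five sample users. The helper in_range is hypothetical; it only models how include_lower and include_upper decide whether a value on the boundary counts.

```python
def in_range(value, lower=None, upper=None,
             include_lower=True, include_upper=True):
    """Model of a range check with optional, inclusive-by-default bounds."""
    if lower is not None:
        if value < lower or (value == lower and not include_lower):
            return False
    if upper is not None:
        if value > upper or (value == upper and not include_upper):
            return False
    return True


ages = {"zhaoliu": 50, "zhaoming": 20, "lisi": 23, "wangwu": 26, "zhangsan": 29}

# from 20 (included) to 25 (excluded)
matched = sorted(
    name for name, age in ages.items()
    if in_range(age, 20, 25, include_lower=True, include_upper=False)
)
print(matched)  # ['lisi', 'zhaoming']
```

With these flags the condition is 20 <= age < 25, so zhaoming (20) is on the included lower bound and matches.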
2.7.10 wildcard query
The wildcards * and ? can be used in the query

* matches zero or more characters

? matches exactly one character

GET lib3/user/_search
{
"query": {
"wildcard": {
"name": "zhao*"
}
}
}

GET lib3/user/_search
{
"query": {
"wildcard": {
"name": "li?i"
}
}
}
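
The meaning of the two wildcards can be checked offline by translating a pattern into a regular expression. The sketch below is plain Python with hypothetical function names; it models only the pattern semantics for a single indexed term and says nothing about how ElasticSearch executes the query.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into a regex; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(term, pattern):
    # The pattern has to cover the whole term, not just part of it.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


names = ["zhaoliu", "zhaoming", "lisi", "wangwu", "zhangsan"]
print([n for n in names if wildcard_match(n, "zhao*")])  # ['zhaoliu', 'zhaoming']
print([n for n in names if wildcard_match(n, "li?i")])   # ['lisi']
```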
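2.7.11 Fuzzy queries with fuzzy
value: the term to search for

boost: the weight of the query, 1.0 by default

min_similarity: the minimum similarity required for a match, 0.5 by default. For strings the value lies between 0 and 1 inclusive; for numbers it may be greater than 1; for dates it takes values such as 1d or 1m, where 1d means one day. Later versions replace this parameter with fuzziness, the maximum edit distance

prefix_length: the length of the common prefix that a candidate term must share with the query term, 0 by default

max_expansions: the maximum number of terms the query may expand to (unbounded by default in early versions; recent versions default to 50)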

GET lib3/user/_search
{
"query": {
"fuzzy": {
"name": "zholiu"
}
}
}
GET lib3/user/_search
{
"query": {
"fuzzy": {
"interests": {
"value": "duanlin"
}
}
}
}
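
Fuzzy matching is based on edit distance, which can be sketched in Python as follows. This uses the plain Levenshtein distance (insertions, deletions, substitutions) as a simplification; the names edit_distance, fuzzy_match and max_edits are hypothetical and chosen for this example only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def fuzzy_match(term, query, max_edits=2, prefix_length=0):
    """True if `term` is within max_edits of `query` and shares its prefix."""
    if term[:prefix_length] != query[:prefix_length]:
        return False
    return edit_distance(term, query) <= max_edits


print(edit_distance("zholiu", "zhaoliu"))    # 1
print(edit_distance("duanlin", "duanlian"))  # 1
print(fuzzy_match("zhaoliu", "zholiu"))                   # True
print(fuzzy_match("zhaoliu", "zholiu", prefix_length=3))  # False
```

Both misspelled queries above are one edit away from an indexed term, which is why they still find documents. The last call shows the effect of prefix_length: "zha" and "zho" differ, so the candidate is rejected before any distance is computed.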
2.7.12 Highlighting search results
GET lib3/user/_search
{
"query": {
"match": {
"interests": "duanlian"
}
},
"highlight": {
"fields": {
"interests": {}
}
}
}
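
In the response, each hit gains a highlight section in which the matching terms of the field are wrapped in tags, <em> and </em> by default. The Python sketch below imitates that wrapping for a single field value; the function highlight and its word-splitting rule are assumptions made for illustration and do not reproduce ElasticSearch's fragment handling.

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every word of `text` that equals a query term in the given tags."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))

    def wrap(match):
        word = match.group(0)
        if word.lower() in terms:
            return f"{pre_tag}{word}{post_tag}"
        return word

    return re.sub(r"[A-Za-z0-9]+", wrap, text)


print(highlight("xi huan hejiu,duanlian,lvyou", "duanlian"))
# xi huan hejiu,<em>duanlian</em>,lvyou
```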
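2.8 Basic queries on Chinese text (Query)

The ik plugin ships with two analyzers

ik_max_word: splits text at the finest granularity, producing as many terms as possible

ik_smart: splits text at the coarsest granularity; text already assigned to one term is not reused by another term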

PUT /lib4
{
"settings": {
"number_of_shards": 3
, "number_of_replicas": 0
},
"mappings": {
"user":{
"properties": {
"name":{"type": "text","analyzer": "ik_max_word"},
"address":{"type": "text","analyzer": "ik_max_word"},
"age":{"type": "integer"},
"interests":{"type": "text","analyzer": "ik_max_word"},
"birthday":{"type": "date"}
}
}
}
}
Queries then work the same way as for English text.

2.8 Filter queries
A filter does not compute relevance scores, and its results can be cached, so a filter is generally faster than a query

Create the data:

POST /lib4/items/_bulk
{"index":{"_id":1}}
{"price":40,"itemID":"ID100123"}
{"index":{"_id":2}}
{"price":50,"itemID":"ID100124"}
{"index":{"_id":3}}
{"price":25,"itemID":"ID100125"}
{"index":{"_id":4}}
{"price":30,"itemID":"ID100126"}
{"index":{"_id":5}}
{"price":null,"itemID":"ID100127"}
2.8.1 Simple filter queries
GET /lib4/items/_search
{
"query": {
"bool": {
"filter": [
{"term":{"price":40}}
]
}
}
}

GET /lib4/items/_search
{
"query": {
"bool": {
"filter": [
{"terms":{"price":[25,40]}}
]
}
}
}

GET /lib4/items/_search
{
"query": {
"bool": {
"filter": [
{"term":{ "itemID": "id100123" }}
]
}
}
}
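The last query uses a lowercase id because itemID is an analyzed text field: the standard analyzer indexes ID100123 as id100123. View the field mapping: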

GET /lib4/_mapping
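To keep the item id from being analyzed, recreate the index with itemID mapped as keyword (an existing lib4 index has to be deleted first). With this mapping a term query must use the exact stored value, ID100123; the lowercase id100123 used in the queries of this section matches only while itemID is an analyzed text field.

PUT lib4
{
"mappings": {
"items": {
"properties": {
"itemID":{
"type": "keyword"
}
}
}
}
}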
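2.8.2 bool filter queries
Combines several filter conditions

Format:

{"bool":{"must":[],"should":[],"must_not":[]}}

must: conditions that must be satisfied (AND)

should: conditions that may or may not be satisfied (OR)

must_not: conditions that must not be satisfied (NOT)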

GET /lib4/items/_search
{
"query": {
"bool": {
"should":[
{"term":{"price":25}},
{"term":{"itemID":"id100123"}}
],
"must_not": [
{"term": {
"price": "30"
}}
]
}
}
}
Nested bool:

GET /lib4/items/_search
{
"query": {
"bool": {
"should": [
{"term": {"itemID": "id100123"}},
{
"bool": {
"must": [
{"term": {
"itemID": "id100124"
}},
{
"term": {
"price": "40"
}
}
]
}
}
]
}
}
}
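
The two bool requests above can be replayed in plain Python to see which items they select. This is a sketch under assumptions: term and bool_filter are hypothetical helpers, item ids are stored in lowercase to mimic an analyzed text field, prices are compared as numbers, and a bool with no must clause is taken to require at least one matching should clause.

```python
def term(field, value):
    """Condition: the field equals the value exactly."""
    return lambda doc: doc.get(field) == value


def bool_filter(must=(), should=(), must_not=()):
    """Combine conditions with AND (must), OR (should) and NOT (must_not)."""
    def matches(doc):
        if not all(cond(doc) for cond in must):
            return False
        if any(cond(doc) for cond in must_not):
            return False
        # Without a must clause, at least one should clause has to match.
        if should and not must:
            return any(cond(doc) for cond in should)
        return True
    return matches


items = {
    1: {"price": 40, "itemID": "id100123"},
    2: {"price": 50, "itemID": "id100124"},
    3: {"price": 25, "itemID": "id100125"},
    4: {"price": 30, "itemID": "id100126"},
    5: {"price": None, "itemID": "id100127"},
}

first = bool_filter(
    should=[term("price", 25), term("itemID", "id100123")],
    must_not=[term("price", 30)],
)
nested = bool_filter(
    should=[
        term("itemID", "id100123"),
        bool_filter(must=[term("itemID", "id100124"), term("price", 40)]),
    ],
)
print(sorted(i for i, d in items.items() if first(d)))   # [1, 3]
print(sorted(i for i, d in items.items() if nested(d)))  # [1]
```

In the nested case item 2 is not selected: its id matches id100124 but its price is 50, so the inner must fails and only the first should clause contributes a hit.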
2.8.3 Range filter queries
gt: >

lt: <

gte: >=

lte: <=

GET lib4/items/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gte": 20,
"lte": 50
}
}
}
}
}
}
2.8.4 Filtering non-null values
GET lib4/items/_search
{
"query": {
"bool": {
"filter": {
"exists": {
"field": "price"
}
}
}
}
}
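2.8.5 Filter cache
ElasticSearch provides a special cache, the filter cache, which stores the results of filters. Cached filters do not consume much memory (they store only which documents match the filter), and they can be reused by all later queries that use the same filter, which greatly improves query performance.

Note: ElasticSearch does not cache every filter by default. The following filters are not cached by default: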

2.8.6 Aggregation queries
# Aggregation queries

#SUM
GET lib4/items/_search
{
"size" : 0,
"aggs":{
"price_of_sum":{
"sum" : {
"field" : "price"
}
}
}
}

# Minimum
GET lib4/items/_search
{
"size": 0,
"aggs": {
"price_of_min": {
"min": {
"field": "price"
}
}
}
}

# Maximum
GET lib4/items/_search
{
"size": 0,
"aggs": {
"price_of_max": {
"max": {
"field": "price"
}
}
}
}

# Average
GET lib4/items/_search
{
"size": 0,
"aggs": {
"price_of_avg": {
"avg": {
"field": "price"
}
}
}
}

# Number of distinct values
GET lib4/items/_search
{
"size": 0,
"aggs": {
"price_of_cardi": {
"cardinality": {
"field": "price"
}
}
}
}

# Grouping
GET lib4/items/_search
{
"size": 0,
"aggs": {
"price_of_group": {
"terms": {
"field": "price"
}
}
}
}

# Group users interested in duanlian (exercise) by age, ordered by average age
GET lib3/user/_search
{
"query": {
"match": {
"interests": "duanlian"
}
}
, "size": 0,
"aggs": {
"age_of_group": {
"terms": {
"field": "age"
, "order": {
"age_of_avg": "desc"
}
}
, "aggs": {
"age_of_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
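
As a sanity check on the metric aggregations, the same figures can be computed by hand for the five items indexed in lib4. The Python below is only an offline equivalent, written under the assumption that a null price contributes nothing to any aggregation; the dictionary keys reuse the aggregation names from the requests above.

```python
from collections import Counter

# Prices of the five items; item 5 was indexed with a null price.
prices = [40, 50, 25, 30, None]

# Documents without a value for the field are left out.
values = [p for p in prices if p is not None]

aggs = {
    "price_of_sum": sum(values),
    "price_of_min": min(values),
    "price_of_max": max(values),
    "price_of_avg": sum(values) / len(values),
    "price_of_cardi": len(set(values)),       # distinct values
    "price_of_group": dict(Counter(values)),  # value -> document count
}
print(aggs["price_of_sum"])    # 145
print(aggs["price_of_avg"])    # 36.25
print(aggs["price_of_cardi"])  # 4
```

The cardinality aggregation in ElasticSearch is an approximation on large data sets, so an exact set-based count like this one is comparable only for small inputs.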
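2.8.7 Compound queries
A compound query combines several basic queries into a single query

1. Using the bool query

It accepts the following parameters:

must: the document must match these conditions to be included.

must_not: the document must not match these conditions to be included.

should: if any of these clauses match, the document's _score increases; otherwise they have no effect. They are mainly used to adjust each document's relevance score.

filter: must match, but runs in non-scoring, filtering mode. These clauses contribute nothing to the score; they only include or exclude documents according to the filter criteria.

How relevance scores are combined: each sub-query computes the document's relevance score independently. Once the scores are computed, the bool query merges them and returns a single score representing the whole boolean operation.

The queries below show these clauses in use. The first two contain only a filter clause. The third finds users whose interests match duanlian and do not match lvyou; users whose address matches bei jing or whose birthday is on or after 1996-01-01 rank higher than the rest, and users who satisfy both rank higher still. The fourth moves the birthday range into a filter clause, where it becomes a hard requirement that does not affect the score: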

GET lib4/items/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gte": 20,
"lte": 50
}
}
}
}
}
}
GET lib4/items/_search
{
"query": {
"bool": {
"filter": {
"exists": {
"field": "price"
}
}
}
}
}

GET lib3/user/_search
{
"query": {
"bool": {
"must": [
{
"match": {"interests": "duanlian"}}],
"must_not": [{"match": {"interests": "lvyou"}}]
, "should": [
{"match": {"address": "bei jing"}},
{ "range": {"birthday": {"gte": "1996-01-01"}}}

]

}
}
}

GET lib3/user/_search
{
"query": {
"bool": {
"must": [
{"match": {
"interests": "duanlian"
}}

]
, "must_not": [
{"match": {
"interests": "lvyou"
}}
]
, "should": [
{"match": {
"address": "beijing"
}}
]
, "filter": {
"range": {
"birthday": {
"gte": "1996-01-01"
}
}
}
}
}
}
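constant_score query (no relevance scoring)

It applies a constant score to all matching documents. It is often used when you only need to run a filter and no other (scoring) query.

Placing a term query inside constant_score turns it into a non-scoring filter. This can replace a bool query that contains only a filter clause.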

GET lib3/user/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"interests": "duanlian"
}
}
}
}
}
