1. Runtime environment
JDK 1.8, Elasticsearch 6.2.4, Kibana 6.2.4
Installation steps are omitted here.
2. CRUD with the Elasticsearch API
- Create an index
PUT /lib/
{
"settings":{
"index":{
"number_of_shards": 3,
"number_of_replicas": 0
}
}
}
- View index settings
GET /lib/_settings
GET /_all/_settings
- Add a document
# with an explicit id
PUT /lib/user/1
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
# without an id; one is generated automatically
POST /lib/user/
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 23,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
- Get a document
# by id
GET /lib/user/1
# search all documents
GET /lib/user/_search
# return only selected fields
GET /lib/user/1?_source=age,interests
- Update a document
# replaces the whole document
PUT /lib/user/1
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 36,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
# update a single field (partial update)
POST /lib/user/1/_update
{
"doc":{
"age":33
}
}
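The difference between the two update styles above can be illustrated with a small in-memory model. This is only a sketch of the semantics, not Elasticsearch code: `store`, `index_doc`, `update_doc` and `merge` are hypothetical names, and a plain dict stands in for the index.

```python
import copy


def index_doc(store, doc_id, body):
    """Model PUT /lib/user/<id>: the stored document is replaced wholesale."""
    store[doc_id] = copy.deepcopy(body)


def merge(target, partial):
    """Recursively merge `partial` into `target`; non-dict values overwrite."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def update_doc(store, doc_id, doc):
    """Model POST /lib/user/<id>/_update with a "doc" body: only given fields change."""
    if doc_id not in store:
        raise KeyError(f"document {doc_id} is missing")
    merge(store[doc_id], doc)


store = {}
index_doc(store, "1", {"first_name": "Jane", "last_name": "Smith", "age": 32})

# Partial update: the other fields survive.
update_doc(store, "1", {"age": 33})
after_update = dict(store["1"])

# Full replace with a body that only has "age": the name fields are gone.
index_doc(store, "1", {"age": 36})
after_replace = dict(store["1"])

print(after_update)
print(after_replace)
```

This is why the PUT example above resends every field, while the _update example sends only age.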
- Delete
# delete a document
DELETE /lib/user/1
# delete an index
DELETE /lib
3. Fetching multiple documents with _mget
# documents can come from different indices
GET /_mget
{
"docs":[
{
"_index": "lib",
"_type": "user",
"_id": 1
},
{
"_index": "lib",
"_type": "user",
"_id": 2
},
{
"_index": "lib",
"_type": "user",
"_id": 3
}
]
}
# use _source to select specific fields
GET /_mget
{
"docs":[
{
"_index": "lib",
"_type": "user",
"_id": 1,
"_source": "interests"
},
{
"_index": "lib",
"_type": "user",
"_id": 2,
"_source": ["age","interests"]
}
]
}
# different documents in the same index and type
GET /lib/user/_mget
{
"docs":[
{
"_id": 1
},
{
"_type": "user",
"_id": 2
}
]
}
# different documents in the same index and type (shorthand)
GET /lib/user/_mget
{
"ids": ["1","2"]
}
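When the list of ids comes from a program rather than being typed by hand, the long-form "docs" body can be generated. The helper below is a rough sketch; `mget_docs` and its parameters are hypothetical names, and it only builds the JSON body without sending it anywhere.

```python
import json


def mget_docs(index, ids, doc_type=None, source=None):
    """Build the long-form _mget body: one entry per id, all from one index."""
    docs = []
    for doc_id in ids:
        entry = {"_index": index, "_id": str(doc_id)}
        if doc_type is not None:
            entry["_type"] = doc_type      # 6.x style; leave as None to omit it
        if source is not None:
            entry["_source"] = source      # a field name or a list of field names
        docs.append(entry)
    return {"docs": docs}


body = mget_docs("lib", [1, 2], doc_type="user", source=["age", "interests"])
print(json.dumps(body, indent=2))
```

The printed body can be pasted under GET /_mget in the Kibana console.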
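4. Batch operations with the Bulk API
Bulk format:
# the format is fixed: each JSON object must sit on its own line, terminated by a newline
{action:{metadata}}
{request body}
- action:
create: create the document only if it does not already exist
update: update a document
index: create a new document or replace an existing one
delete: delete a document
(Difference between create and index: if the document already exists, create fails with a "document already exists" error, whereas index succeeds and replaces the existing document.)
- metadata: _index, _type, _id
- Example 1:
# note that delete takes no request body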
POST /lib/user/_bulk
{"delete":{"_index":"lib","_type":"user","_id":"1"}}
- Example 2:
Bulk add:
POST /lib2/books/_bulk
{"index":{"_id":1}}
{"title":"Java","price":55}
{"index":{"_id":2}}
{"title":"Html5","price":45}
{"index":{"_id":3}}
{"title":"Php","price":35}
{"index":{"_id":4}}
{"title":"Python","price":50}
Bulk get:
GET /lib2/books/_mget
{
"ids": ["1","2","3","4"]
}
# delete has no request body; create, index and update are each followed by one
POST /lib2/books/_bulk
{"delete":{"_index":"lib2","_type":"books","_id":4}}
{"create":{"_index":"tt","_type":"ttt","_id":"100"}}
{"name":"lisi"}
{"index":{"_index":"tt","_type":"ttt"}}
{"name":"zhaosi"}
{"update":{"_index":"lib2","_type":"books","_id":"3"}}
{"doc":{"price":58}}
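- Maximum amount of data per bulk request
(1). A bulk request is loaded into memory before it is processed, so its size is limited. There is no single best size; it depends on your hardware, the size and complexity of your documents, and your indexing and search load.
(2). A common recommendation is 1,000 to 5,000 documents, or 5 to 15 MB, per request. By default a request cannot exceed 100 MB; this limit can be changed with the http.max_content_length setting in the Elasticsearch configuration file (elasticsearch.yml in the config directory under $ES_HOME).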
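The line-oriented format and the size guidance above can be combined in a short sketch. It only builds request bodies and sends nothing; `to_ndjson` and `chunk` are hypothetical helper names, and the default limits of 1,000 documents and 5 MB are simply the lower ends of the recommended ranges, not tuned values.

```python
import json


def to_ndjson(actions):
    """Serialize (action, metadata, source) tuples into a bulk request body."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}, separators=(",", ":")))
        if action != "delete":             # delete is the only action without a body
            if source is None:
                raise ValueError(f"{action} needs a request body")
            lines.append(json.dumps(source, separators=(",", ":")))
    return "\n".join(lines) + "\n"         # the final newline is required


def chunk(actions, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Split actions into batches bounded by document count and encoded size."""
    batch, size = [], 0
    for item in actions:
        item_size = len(to_ndjson([item]).encode("utf-8"))
        if batch and (len(batch) >= max_docs or size + item_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(item)
        size += item_size
    if batch:
        yield batch


actions = [
    ("index", {"_id": "1"}, {"title": "Java", "price": 55}),
    ("delete", {"_id": "4"}, None),
]
body = to_ndjson(actions)
print(body, end="")

# Six actions with at most four per request give two batches.
batches = list(chunk(actions * 3, max_docs=4))
print([len(b) for b in batches])
```

Each batch would be serialized with to_ndjson and sent as one _bulk request; the right limits still have to be found by measuring on your own cluster.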
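5. Version control
- Elasticsearch version numbers range from 1 to 2^63-1.
- Elasticsearch uses optimistic locking to keep data consistent. A client does not lock or unlock a document before operating on it; it only states which version it intends to operate on. If that version matches, Elasticsearch lets the operation proceed; if it does not, Elasticsearch reports the conflict by raising a version conflict exception.
- Internal versioning: uses the _version field.
- External versioning: Elasticsearch handles external version numbers somewhat differently from internal ones. Instead of checking that the current _version equals the value given in the request, it checks that the current _version is less than the given value. If the request succeeds, the external version number is stored as the document's _version.
- External versioning is enabled with version_type=external.
# internal versioning: the version passed in must equal the current version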
PUT /lib4/user/1?version=4
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 36,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
# external versioning: the version passed in must be greater than the current version
PUT /lib4/user/1?version=5&version_type=external
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 36,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
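The two checks described in this section can be modeled in a few lines. This is a toy simulation of the rules as stated above, not how Elasticsearch is implemented; `write`, `VersionConflict` and the dict-based `store` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the requested version fails the check."""


def write(store, doc_id, body, version=None, version_type="internal"):
    """Store `body` and return its new version, applying the version check."""
    current = store[doc_id][0] if doc_id in store else 0
    if version is None:
        new_version = current + 1          # no check requested
    elif version_type == "internal":
        if version != current:             # must equal the stored version
            raise VersionConflict(f"current {current}, requested {version}")
        new_version = current + 1
    elif version_type == "external":
        if version <= current:             # must be greater than the stored version
            raise VersionConflict(f"current {current}, requested {version}")
        new_version = version              # the external number is stored as-is
    else:
        raise ValueError(f"unknown version_type {version_type!r}")
    store[doc_id] = (new_version, body)
    return new_version


store = {}
v1 = write(store, "1", {"age": 32})
v2 = write(store, "1", {"age": 36}, version=1)
v3 = write(store, "1", {"age": 36}, version=5, version_type="external")

conflicts = []
for kwargs in ({"version": 1}, {"version": 5, "version_type": "external"}):
    try:
        write(store, "1", {"age": 40}, **kwargs)
    except VersionConflict as exc:
        conflicts.append(str(exc))

print(v1, v2, v3)
print(conflicts)
```

The stale internal version and the repeated external version are both rejected, and the stored document is left untouched.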
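6. What is a mapping
- Mapping:
A mapping defines the data type of every field in a type, along with related properties such as how each field is analyzed.
- Elasticsearch creates the index and type automatically; if no mapping is specified, it also creates a dynamic mapping for the type.
# view the mapping Elasticsearch created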
GET /lib4/user/_mapping
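- When creating an index you can define field types and related properties up front, so that date fields are handled as dates, numeric fields as numbers, string fields as strings, and so on.
# create the index together with its field mappings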
PUT /lib6
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"mappings": {
"books": {
"properties": {
"title": {
"type": "text"
},
"name": {
"type": "text",
"index": false
},
"publish_date": {
"type": "date",
"index": false
},
"price": {
"type": "double"
},
"number": {
"type": "integer"
}
}
}
}
}
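For comparison with the explicit mapping above, the sketch below approximates which field types dynamic mapping would pick for a document when no mapping is given. It follows the default rules as I understand them (integers become long, floating-point numbers float, strings text with a keyword sub-field, null values add no field) and deliberately ignores date detection for date-like strings. `infer_type` and `infer_properties` are hypothetical names.

```python
def infer_type(value):
    """Return an approximate dynamic mapping for one JSON value, or None."""
    if isinstance(value, bool):            # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    if isinstance(value, list):
        for item in value:                 # arrays take the type of the first non-null item
            if item is not None:
                return infer_type(item)
        return None
    return None                            # null: no field is added


def infer_properties(doc):
    """Infer a "properties" block for a whole document."""
    props = {}
    for name, value in doc.items():
        mapping = infer_type(value)
        if mapping is not None:
            props[name] = mapping
    return props


doc = {
    "first_name": "Jane",
    "age": 32,
    "interests": ["music"],
    "active": True,
    "score": 1.5,
    "nickname": None,
}
props = infer_properties(doc)
for name, mapping in props.items():
    print(name, mapping.get("type", "object"))
```

Comparing this output with the result of GET /lib4/user/_mapping is a quick way to see where an explicit mapping, such as integer for number, differs from the dynamic default.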
The following requests were tested against Elasticsearch 7.9.2 on 2020-09-29.
The main change is the removal of the type concept.
# create indices
PUT /lib/
{
"settings":{
"index":{
"number_of_shards": 3,
"number_of_replicas": 0
}
}
}
PUT /lib2/
{
"settings":{
"index":{
"number_of_shards": 3,
"number_of_replicas": 0
}
}
}
# view the settings of the lib index
GET /lib/_settings
# view the settings of all indices
GET /_all/_settings
# add documents
# with id 1
PUT /lib/_doc/1
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
# without an id; one is generated automatically
POST /lib/_doc/
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 23,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
# get a document
GET /lib/_doc/1
# search all documents
GET /lib/_search
# return only selected fields
GET /lib/_doc/1?_source=age,interests
# replaces the whole document
PUT /lib/_doc/1
{
"first_name" : "Jane111",
"last_name" : "Smith111",
"age" : 36,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
# update a single field (partial update)
POST /lib/_update/1
{
"doc":{
"age":40
}
}
# remove a field from a document
POST /lib/_update/1
{
"script": {
"source": "ctx._source.remove(\"first_name\")"
}
}
# delete a document
DELETE /lib/_doc/1
# delete an index
DELETE /lib
# fetch multiple documents with _mget
# documents can come from different indices
GET /_mget
{
"docs":[
{
"_index": "lib",
"_id": 1
},
{
"_index": "lib",
"_id": 2
}
]
}
# use _source to select specific fields
GET /_mget
{
"docs":[
{
"_index": "lib",
"_id": 1,
"_source": "interests"
},
{
"_index": "lib",
"_id": 2,
"_source": ["age","interests"]
}
]
}
# different documents in the same index
GET /lib/_mget
{
"docs":[
{
"_id": 1,
"_source": "interests"
},
{
"_id": 2,
"_source": ["age","interests"]
}
]
}
# different documents in the same index (shorthand)
GET /lib/_mget
{
"ids": ["1","2"]
}
# bulk operations
POST /lib2/_bulk
{"index":{"_id":1}}
{"title":"Java","price":55}
{"index":{"_id":2}}
{"title":"Html5","price":45}
{"index":{"_id":3}}
{"title":"Php","price":35}
{"index":{"_id":4}}
{"title":"Python","price":50}
# bulk get
GET /lib2/_mget
{
"ids": ["1","2","3","4"]
}
# a bulk delete has no request body
POST /lib2/_bulk
{"delete":{"_index":"lib2","_id":4}}
# view the mapping Elasticsearch created
GET /lib2/_mapping
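The lib6 request in the 6.2.4 part nests "properties" under the books type, while a 7.x mapping has no type level. A minimal sketch of that conversion is shown below, assuming the body defines exactly one type; `drop_mapping_type` is a hypothetical helper name and the sample body is a shortened version of the lib6 request.

```python
import json


def drop_mapping_type(body):
    """Return a copy of a create-index body with the mapping type level removed."""
    mappings = body.get("mappings", {})
    if not mappings or "properties" in mappings:
        return dict(body)                  # nothing to do: already typeless
    if len(mappings) != 1:
        raise ValueError("expected exactly one mapping type")
    (type_mapping,) = mappings.values()    # the single type's mapping
    result = dict(body)                    # shallow copy; the input is not modified
    result["mappings"] = type_mapping
    return result


typed = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 0},
    "mappings": {
        "books": {
            "properties": {
                "title": {"type": "text"},
                "price": {"type": "double"},
            }
        }
    },
}
typeless = drop_mapping_type(typed)
print(json.dumps(typeless, indent=2))
```

The printed body is the shape to use with PUT /lib6 on 7.x, with the settings left unchanged.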