ElasticSearch Study Notes

小马宝莉马

Original article: https://blog.csdn.net/congbao_/article/details/139906587

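Index (roughly a database)

Type (roughly a table): limited to one per index in ES 6.x, deprecated in 7.x and removed in 8.x

Document (roughly a row)

Search API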

1. _cat
http://192.168.56.10:9200/_cat/nodes  list all nodes
http://192.168.56.10:9200/_cat/health  check cluster health
http://192.168.56.10:9200/_cat/master  show the master node
http://192.168.56.10:9200/_cat/indices  list all indices
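The _cat endpoints return plain-text tables, and the standard `v` parameter adds a header row. A minimal standard-library sketch, assuming the node address used in these notes and no authentication; `cat_url` and `fetch_cat` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ES_HOST = "http://192.168.56.10:9200"  # node address used in these notes (assumption: no auth)


def cat_url(endpoint: str, verbose: bool = True) -> str:
    """Build a _cat URL, e.g. .../_cat/health?v=true (v adds a header row)."""
    query = "?" + urlencode({"v": "true"}) if verbose else ""
    return f"{ES_HOST}/_cat/{endpoint}{query}"


def fetch_cat(endpoint: str) -> str:
    """GET the endpoint and return the plain-text table (needs a reachable node)."""
    with urlopen(cat_url(endpoint), timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    for ep in ("nodes", "health", "master", "indices"):
        print(cat_url(ep))  # swap in fetch_cat(ep) when a node is running
```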

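2. Index (save) a document, i.e. save one record

To save data, specify which index and which type it goes under, and which unique id to use (in database terms: which table of which database).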

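PUT & POST

PUT must include an id.
Sending the same PUT with an id several times updates the document each time (the version number increases).
PUT http://192.168.56.10:9200/customer/external/1
{
"name":"John Doe"
}

POST also saves a document:
It creates a new document if no id is given (ES generates one), or if an id is given but no document with that id exists yet.
It updates the document if an id is given and that document already exists.
POST http://192.168.56.10:9200/customer/external/
{
"name":"John Doe"
}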
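To make the two forms concrete, here is a sketch that only builds the HTTP requests with the standard library, without sending them. The helper `index_request` is hypothetical, and it picks PUT whenever an id is given even though POST with an id also works, as noted above.

```python
import json
from typing import Optional
from urllib.request import Request

ES_HOST = "http://192.168.56.10:9200"  # node address used in these notes


def index_request(index: str, doc_type: str, doc: dict,
                  doc_id: Optional[str] = None) -> Request:
    """Build (but do not send) a save request.

    With an id: PUT /index/type/id, which creates or overwrites that document.
    Without an id: POST /index/type/, and ES generates the id.
    """
    url = f"{ES_HOST}/{index}/{doc_type}/"
    method = "POST"
    if doc_id is not None:
        url += doc_id
        method = "PUT"
    return Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method=method,
        # ES 6.0+ rejects request bodies that lack a JSON Content-Type header
        headers={"Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(req)` against a live node should return `"result": "created"` the first time and `"updated"` (with a higher `_version`) when the same PUT is repeated.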

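3. Get a document

GET http://192.168.56.10:9200/customer/external/1
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1", // document id
"_version": 2, // version number
"_seq_no": 1, // concurrency-control field; increases with every write and is used for optimistic locking
"_primary_term": 1, // also for concurrency control; changes when the primary shard is reassigned, e.g. after a restart
"found": true,
"_source": {
"name": "John Doe" // the actual document content
}
}

To update with optimistic locking, append ?if_seq_no=1&if_primary_term=1 to the write request; if the document's current _seq_no or _primary_term no longer matches, ES rejects the write with a 409 version conflict.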
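The check behind if_seq_no/if_primary_term can be illustrated with a toy in-memory model. `ToyShard` and `VersionConflict` are hypothetical names for illustration only; real ES tracks these values per shard and answers a failed check with HTTP 409.

```python
from urllib.parse import urlencode


def conditional_update_url(base: str, seq_no: int, primary_term: int) -> str:
    """Append the optimistic-locking parameters to a document URL."""
    params = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"{base}?{params}"


class VersionConflict(Exception):
    """Stands in for the HTTP 409 version conflict returned by ES."""


class ToyShard:
    """Toy model of one shard, only to show the compare-then-write rule."""

    def __init__(self) -> None:
        self.seq_no = -1       # last sequence number handed out on this shard
        self.primary_term = 1
        self.docs = {}         # id -> (source, seq_no of its last write)

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            stale = (current is None or current[1] != if_seq_no
                     or self.primary_term != if_primary_term)
            if stale:
                raise VersionConflict(doc_id)  # someone wrote since we read
        self.seq_no += 1
        self.docs[doc_id] = (source, self.seq_no)
        return self.seq_no
```

Two clients that read the same `_seq_no` can both try to write; only the first succeeds, and the second has to re-read and retry.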

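4. Update a document

POST with _update compares the new data with the stored document; if they are identical it does nothing, and _version and _seq_no stay unchanged. The fields to change go inside "doc".
POST http://192.168.56.10:9200/customer/external/1/_update
{
"doc":{
"name":"John Doe"
}
}

Result (nothing changed):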
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 4,
"result": "noop",
"_shards": {
"total": 0,
"successful": 0,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1
}

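PUT, or POST without _update, does not compare with the stored data: it always overwrites the document and updates the version information.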
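The noop behaviour of _update can be sketched as "merge the partial doc, then compare". This is a toy model, not ES code: `partial_update` is a hypothetical name, and it assumes objects merge recursively while scalars and arrays are replaced.

```python
import copy
from typing import Dict, Tuple


def _deep_merge(target: Dict, patch: Dict) -> None:
    """Merge `patch` into `target` in place: objects merge recursively, other values are replaced."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _deep_merge(target[key], value)
        else:
            target[key] = value


def partial_update(stored: Dict, doc: Dict) -> Tuple[Dict, str]:
    """Toy model of POST .../_update with body {"doc": doc}.
    Returns (new_source, result); result is "noop" when the merge changes nothing."""
    merged = copy.deepcopy(stored)
    _deep_merge(merged, doc)
    return merged, ("noop" if merged == stored else "updated")
```

A plain PUT skips this comparison, which is why it always bumps the version.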

5. Delete a document and an index

DELETE customer/external/1
DELETE customer

6. Bulk operations with _bulk
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

POST _bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"My second blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"My second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"My update blog post"}}
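The _bulk body is newline-delimited JSON: one action line, optionally followed by one source line (delete has none), and the whole body must end with a newline. A sketch of a serializer; `bulk_body` is a hypothetical helper, and the body would be sent with Content-Type `application/x-ndjson`.

```python
import json
from typing import Iterable, Optional, Tuple


def bulk_body(actions: Iterable[Tuple[dict, Optional[dict]]]) -> str:
    """Serialize (action, source) pairs into the _bulk format.
    Pass source=None for actions without a second line, such as delete."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))      # json.dumps emits a single line
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"            # the body must end with a newline
```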

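ES Study: Search

Basic search
There are two ways to search:
1. Send the search parameters in the REST request URI (URI + query parameters)
2. Send them in the REST request body (URI + request body)

1. GET bank/_search?q=*&sort=account_number:asc
Searches the bank index: q=* matches all documents, and sort=account_number:asc sorts the results by account_number in ascending order.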

2. GET /bank/_search
{ // this request body is called the Query DSL
"query": {
"match_all": {}
},
"sort": [
{
"account_number": "asc"
}
]
}

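Searches the bank index, matching all documents, sorted by account_number in ascending order.

The matched documents are listed in hits (hits.hits); each hit's _source holds the full content of that document.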
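The two forms above express the same search. A sketch that builds both, with hypothetical helper names; note that `urlencode` percent-escapes `*` and `:` in the URI form, which ES decodes again.

```python
from urllib.parse import urlencode


def uri_search(index: str, q: str, sort: str) -> str:
    """Form 1: search parameters in the request URI."""
    return f"/{index}/_search?" + urlencode({"q": q, "sort": sort})


def body_search(sort_field: str, order: str = "asc") -> dict:
    """Form 2: the equivalent Query DSL body for GET /<index>/_search."""
    return {"query": {"match_all": {}}, "sort": [{sort_field: order}]}
```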

Query DSL basic syntax

GET /bank/_search
{
"query": {"match_all": {}},
"sort": [
{
"account_number":"asc"
}],
"from": 0, // skip the first 0 hits
"size": 5, // return 5 hits
"_source": ["account_number","firstname"] // only return these source fields
}
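from and size give offset-based paging: page n (1-based) starts at from = (n - 1) * size. A sketch with a hypothetical `page_body` helper that reuses the sort and _source list above; by default ES caps from + size at 10,000 (index.max_result_window).

```python
from typing import Dict, List


def page_body(page: int, size: int, fields: List[str]) -> Dict:
    """Query DSL body for 1-based page `page` of a match_all search."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": (page - 1) * size,  # number of hits to skip
        "size": size,               # number of hits to return
        "_source": fields,          # only return these source fields
    }
```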
Full-text search
match usage

Match a specific value (balance is a numeric field, so this is an exact match)
GET /bank/_search
{
"query": {
"match": {
"balance": 16623
}
}
}

Full-text match (the query text is analyzed, so addresses containing mill or lane both match)
GET /bank/_search
{
"query": {
"match": {
"address" : "mill lane"
}
}
}

##Full-text results are sorted by relevance score; the query text is analyzed (split into terms) before matching
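A toy model of match, to show why "mill lane" also returns addresses containing only one of the words. This is a simplification, not ES's behaviour in detail: the analyzer here only lowercases and splits ASCII text, and the score is just the number of shared terms instead of BM25.

```python
import re
from typing import Dict, List


def analyze(text: str) -> List[str]:
    """Toy stand-in for the standard analyzer (ASCII only): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match(docs: List[Dict], field: str, query: str) -> List[Dict]:
    """Toy match: a doc matches if it shares any term with the query (OR);
    more shared terms give a higher toy score, best first."""
    q_terms = set(analyze(query))
    hits = []
    for doc in docs:
        score = len(q_terms & set(analyze(str(doc.get(field, "")))))
        if score > 0:
            hits.append((score, doc))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [doc for _, doc in hits]
```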

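match_phrase [phrase match]
The query text is still analyzed, but its terms must all appear in the field together and in the same order, so the value is matched as a whole phrase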
GET /bank/_search
{
"query": {
"match_phrase": {
"address" : "mill lane"
}
}
}
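The phrase rule can be sketched in the same toy style: analyze both sides, then look for the query terms as a consecutive, ordered run. ASCII-only toy analyzer, hypothetical helper names.

```python
import re
from typing import Dict, List


def analyze(text: str) -> List[str]:
    """Toy stand-in for the standard analyzer (ASCII only)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match_phrase(docs: List[Dict], field: str, phrase: str) -> List[Dict]:
    """Toy match_phrase: the analyzed terms must appear consecutively and in order."""
    p = analyze(phrase)
    if not p:
        return []
    out = []
    for doc in docs:
        tokens = analyze(str(doc.get(field, "")))
        if any(tokens[i:i + len(p)] == p for i in range(len(tokens) - len(p) + 1)):
            out.append(doc)
    return out
```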

multi_match [multi-field match]
Matches documents whose state or address contains mill or urie (the query text is analyzed)
GET /bank/_search
{
"query": {
"multi_match": {
"query": "Mill Urie",
"fields": ["address","state"]
}
}
}

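bool [compound query]

must: clauses that must match; must_not: clauses that must not match; should: optional clauses, and each one that matches raises the relevance score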
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "Mill"
}
}
],
"must_not": [
{
"match": {
"age": "18"
}
}
],
"should": [
{
"match": {
"lastname": "Holland"
}
}
]
}
}
}
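The clause semantics can be sketched with predicates standing in for the inner match queries. A toy model: the score adds 1 per matching must/should clause, whereas ES sums real relevance scores; it also mimics the default rule that, with no must (or filter) clause, at least one should clause has to match.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def bool_query(docs: List[Doc], must: Sequence[Clause] = (),
               must_not: Sequence[Clause] = (),
               should: Sequence[Clause] = ()) -> List[Tuple[int, Doc]]:
    """Toy bool query; returns (toy_score, doc) pairs, best first."""
    hits = []
    for doc in docs:
        if not all(c(doc) for c in must):
            continue  # every must clause has to match
        if any(c(doc) for c in must_not):
            continue  # no must_not clause may match (it never adds score)
        matched_should = sum(1 for c in should if c(doc))
        if not must and should and matched_should == 0:
            continue  # no must clause (this toy has no filter): need one should match
        hits.append((len(must) + matched_should, doc))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits
```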

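filter [result filtering]

The difference from must is that filter does not compute a relevance score: with only a filter clause, every hit has "_score" : 0.0. The two queries below return the same documents; the first scores them with must, the second only filters.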
GET bank/_search
{
"query": {
"bool": {
"must": [
{"range": {
"age": {
"gte": 18,
"lte": 28
}
}}
]
}
}
}

GET bank/_search
{
"query": {
"bool": {
"filter":{
"range": {
"age": {
"gte": 18,
"lte": 28
}
}
}
}
}
}
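A sketch of the filter-only query: one helper builds the same DSL body for any field, and a toy evaluator shows that filtering only keeps or drops documents, giving every hit a score of 0.0. Helper names are hypothetical.

```python
from typing import Dict, List, Optional


def range_filter_body(field: str, gte: float, lte: float) -> Dict:
    """The filter-only bool query from the notes, for any numeric field."""
    return {"query": {"bool": {"filter": {"range": {field: {"gte": gte, "lte": lte}}}}}}


def filter_range(docs: List[Dict], field: str,
                 gte: Optional[float] = None, lte: Optional[float] = None) -> List[Dict]:
    """Toy filter context: keep docs with gte <= field <= lte, every hit scored 0.0."""
    hits = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # docs without the field never match a range
        if gte is not None and value < gte:
            continue
        if lte is not None and value > lte:
            continue
        hits.append({"_score": 0.0, "_source": doc})  # filters do not score
    return hits
```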

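term

Like match, term matches the value of a field. Use match for full-text (text) fields and term for other, non-text fields.
Using term on a text field often finds nothing, because text fields are analyzed (split into lowercase terms) when indexed, while term does not analyze its input.
To match a fixed string you can use either match_phrase or the address.keyword sub-field. The difference: address.keyword is not analyzed, so the value must be the complete field value, while match_phrase returns a record as long as the field contains the phrase.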
GET bank/_search
{
"query": {
"term": {
"balance" : 5686
}
}
}

GET bank/_search
{
"query": {
"match_phrase": {
"address": "Street"
}
}
}

GET bank/_search
{
"query": {
"match": {
"address.keyword": "451 Humboldt Street"
}
}
}
##Exact match
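Why term misses on a text field but hits on .keyword can be shown with a toy inverted index. It assumes the default dynamic mapping for strings (a text field plus a keyword sub-field) and an ASCII-only toy analyzer; helper names are hypothetical.

```python
import re
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Toy stand-in for the standard analyzer (ASCII only)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def index_address(address: str) -> Dict[str, Set[str]]:
    """Terms stored for a dynamically mapped string: a text field plus a .keyword sub-field."""
    return {
        "address": set(analyze(address)),  # text: analyzed terms
        "address.keyword": {address},      # keyword: the whole value, unchanged
    }


def term(indexed: Dict[str, Set[str]], field: str, value: str) -> bool:
    """Toy term query: the value is not analyzed and must equal an indexed term exactly."""
    return value in indexed[field]
```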

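aggregations [running aggregations]

Aggregations group data and extract statistics from it, roughly like SQL GROUP BY combined with SQL aggregate functions.

##Search for everyone whose address contains mill, and return their age distribution, average age and average balance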

GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg":{
"avg": {
"field": "age"
}
},
"balanceAvg":{
"avg": {
"field": "balance"
}
}
},
"size": 0
}

##size: 0 returns no hit documents, only the total count and the aggregation results
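The SQL comparison can be made concrete with a toy version of the three aggregations above, run over documents assumed to have already matched the query. Buckets are ordered like a default terms aggregation (doc_count descending, ties by key ascending), and an empty input yields None, as ES returns null for an avg with no values. Helper names are hypothetical.

```python
from typing import Dict, List, Optional


def _avg(values) -> Optional[float]:
    values = list(values)
    return sum(values) / len(values) if values else None  # ES reports null here


def age_stats(hits: List[Dict], size: int = 10) -> Dict:
    """Toy ageAgg/ageAvg/balanceAvg: ageAgg is roughly
    SELECT age, COUNT(*) ... GROUP BY age, keeping the top `size` buckets."""
    counts: Dict[int, int] = {}
    for d in hits:
        counts[d["age"]] = counts.get(d["age"], 0) + 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return {
        "ageAgg": {"buckets": [{"key": k, "doc_count": n} for k, n in ordered]},
        "ageAvg": {"value": _avg(d["age"] for d in hits)},
        "balanceAvg": {"value": _avg(d["balance"] for d in hits)},
    }
```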

##Harder: aggregate by age, and for each age bucket get the average balance of the people in it

GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
}
}

##Harder: get the full age distribution, and within each age bucket the average balance for gender M, for gender F, and for the whole bucket

GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword",
"size": 10
},
"aggs": {
"balanceAgg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg":{
"avg": {
"field": "balance"
}
}
}
}
}
}
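The nesting works like a GROUP BY inside a GROUP BY: each age bucket is split again by gender, with averages at both levels. A toy sketch with a hypothetical helper; for readability it orders buckets by key, whereas ES orders terms buckets by doc_count by default.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def _avg(values) -> Optional[float]:
    values = list(values)
    return sum(values) / len(values) if values else None


def age_gender_balance(docs: List[Dict]) -> Dict:
    """Toy ageAgg -> genderAgg -> balanceAgg, plus ageBalanceAvg per age bucket."""
    by_age = defaultdict(list)
    for d in docs:
        by_age[d["age"]].append(d)
    result = {}
    for age, group in sorted(by_age.items()):
        by_gender = defaultdict(list)
        for d in group:
            by_gender[d["gender"]].append(d["balance"])
        result[age] = {
            "doc_count": len(group),
            "genderAgg": {g: _avg(b) for g, b in sorted(by_gender.items())},
            "ageBalanceAvg": _avg(d["balance"] for d in group),
        }
    return result
```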

Mapping

View the mapping

GET bank/_mapping

For changing mappings, see: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping.html

##Create an index with an explicit mapping

PUT /my_index
{
"mappings": {
"properties": {
"age":{"type": "integer"},
"email":{"type": "keyword"},
"name":{"type": "text"}
}
}
}

##Changing an existing index's mapping is limited to adding new field mappings

PUT /my_index/_mapping
{
"properties":{
"employee-id":{
"type":"keyword",
"index":false
}
}
}

Updating mappings: the mapping of an existing field cannot be changed. To change it, create a new index and migrate the data into it.
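The rule can be summarised as "add, never redefine". A toy model with a hypothetical `add_fields` helper that only checks the field type (real ES validates more parameters than this):

```python
from typing import Dict


def add_fields(existing: Dict[str, Dict], new: Dict[str, Dict]) -> Dict[str, Dict]:
    """Toy model of PUT /<index>/_mapping: adding fields is fine, but changing the
    type of an existing field is rejected (only the type is checked here)."""
    merged = dict(existing)
    for field, spec in new.items():
        old = existing.get(field)
        if old is not None and old.get("type") != spec.get("type"):
            raise ValueError(f"cannot change [{field}] from {old.get('type')} to "
                             f"{spec.get('type')}; create a new index and reindex")
        merged[field] = {**(old or {}), **spec}
    return merged
```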

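Data migration

First create the new index with the correct mapping, then migrate the data as follows:

##Migrating data before 6.0 (the source index still has a type)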
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "newbank"
}
}

##Migrating data after 6.0 (no type)
POST _reindex
{
"source": {
"index": "bank"
},
"dest": {
"index": "newbank"
}
}
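The two-step migration can be generated as console requests. A sketch with hypothetical helpers: `migration_plan` returns (method, path, body) tuples for creating the target index and running _reindex, adding `type` to the source only for a typed source index, and `to_console` prints them in Dev Tools syntax.

```python
import json
from typing import Dict, List, Optional, Tuple

ConsoleRequest = Tuple[str, str, Dict]  # (method, path, body)


def migration_plan(src: str, dest: str, properties: Dict,
                   src_type: Optional[str] = None) -> List[ConsoleRequest]:
    """1) create `dest` with the corrected mapping, 2) copy documents with _reindex."""
    source = {"index": src}
    if src_type:
        source["type"] = src_type  # only for a source index that still has a type
    return [
        ("PUT", f"/{dest}", {"mappings": {"properties": properties}}),
        ("POST", "/_reindex", {"source": source, "dest": {"index": dest}}),
    ]


def to_console(plan: List[ConsoleRequest]) -> str:
    """Render the requests in Kibana Dev Tools syntax."""
    return "\n\n".join(f"{m} {p}\n{json.dumps(b, indent=2)}" for m, p, b in plan)
```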

Analysis (tokenization)

With the IK analyzer plugin installed, test it:

POST _analyze
{
"analyzer": "ik_max_word",
"text":"我是中国人"
}
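A sketch for calling _analyze from code: build the same POST request (not sent here), then pull the token strings out of the response, whose `tokens` array holds objects with `token`, `start_offset`, `end_offset`, `type` and `position`. Helper names are hypothetical; the actual IK tokens depend on the installed dictionary, so none are assumed here.

```python
import json
from urllib.request import Request


def analyze_request(host: str, analyzer: str, text: str) -> Request:
    """Build POST /_analyze; send it with urllib.request.urlopen against a live node."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False).encode("utf-8")
    return Request(f"{host}/_analyze", data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def tokens_of(response: dict) -> list:
    """Extract the token strings from a parsed _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]
```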

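ElasticSearch integration

elasticsearch-Rest-Client
1) 9300: TCP

spring-data-elasticsearch: transport-api.jar;
Different Spring Boot versions bundle different transport-api.jar versions, which may not match the ES version.
The transport client is deprecated in 7.x and removed in 8.0.
2) 9200: HTTP

JestClient: unofficial and slow to update;

RestTemplate: sends raw HTTP requests; many ES operations have to be wrapped by hand, which is tedious;

HttpClient: same as above;

Elasticsearch-Rest-Client: the official RestClient; it wraps ES operations, has a clearly layered API and is easy to pick up.
Final choice: Elasticsearch-Rest-Client (elasticsearch-rest-high-level-client); see Search API | Java REST Client [7.17] | Elastic

1. Add the elasticsearch-rest-high-level-client dependency

2. Write a configuration class that registers a RestHighLevelClient bean in the Spring container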