Elasticsearch Basics

Latest recommended article published on 2022-05-27 21:47:12

拾 -.-

Category: Databases. Tags: elasticsearch, es, distributed

Article link: https://blog.csdn.net/zk86547462/article/details/109905310

Basic concepts (a rough analogy with a relational database):

Index (indices) ------------------- Database
Type (type) ----------------------- Table
Document (document) --------------- Row
Field (field) --------------------- Column

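Details:

Documents, types, indices, and mappings

Concept	Description
Indices (indices)	"indices" is the plural of "index" and refers to a set of indexes.
Type (type)	A type imitates the table concept in MySQL. One index can hold several types, for example a product type and an order type, whose data formats differ. This tends to make the index messy, so the concept is being phased out: types are deprecated in 7.x and removed in 8.0.
Document (document)	The original data stored in an index. For example, each product record is one document.
Field (field)	An attribute of a document.
Mappings (mappings)	Per-field settings such as the data type, attributes, whether the field is indexed, and whether it is stored.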

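Nodes, clusters, shards, and replicas

Concept	Description
Node (node)	A node is one running instance of Elasticsearch. Starting Elasticsearch on a server gives you one node. Starting Elasticsearch on another server gives you a second node. You can even run several nodes on the same server by starting several Elasticsearch processes.
Cluster (cluster)	A set of Elasticsearch nodes working together is called a cluster. In a multi-node cluster the same data can be spread across several servers, which helps performance. It also helps availability: if every shard has at least one replica, Elasticsearch can keep serving requests and return all data after any single node goes down. The drawback is that the nodes must be able to communicate with each other quickly enough, and split brain must be avoided (two parts of the cluster cannot reach each other and each assumes the other is down).
Shard (shard)	A cluster lets the system store more data than a single machine can hold. To do this, Elasticsearch spreads the data over several physical Lucene indexes. These Lucene indexes are called shards, and the process of spreading data over them is called sharding. Elasticsearch shards automatically and presents the shards to the user as one large index. Even so, tuning the settings for the specific application is crucial, because the number of primary shards is set when the index is created and cannot be changed afterwards except by creating a new index and reindexing all the data.
Replica (replica)	Sharding lets users push more data into the cluster than one machine can hold. Replicas address the case where a single machine cannot handle all requests under heavy load. A shard is either a primary shard or a replica shard, and a replica shard is a full copy of a primary shard. Replica shards serve searches and can be promoted to primary if the original primary shard is lost.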

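Note: Elasticsearch is distributed by design, so even with a single node it shards your data by default (replica shards are configured as well, although they stay unassigned until a second node is available). When new nodes join the cluster, shards are rebalanced onto them.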
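
Why the primary shard count is fixed follows from how a document is assigned to a shard: the routing value (by default the document ID) is hashed and reduced modulo the number of primary shards. The sketch below is a simplified model of that rule, not Elasticsearch's implementation. It assumes `zlib.crc32` as a stand-in hash (Elasticsearch uses a Murmur3 hash internally), and the function name `pick_shard` is hypothetical.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Toy routing rule: hash(routing value) % number of primary shards."""
    # crc32 is only a stand-in hash for illustration.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Count how many documents would land on a different shard
# if the primary shard count changed from 5 to 6.
ids = [str(i) for i in range(1000)]
moved = sum(1 for d in ids if pick_shard(d, 5) != pick_shard(d, 6))
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because most documents would map to a different shard after the change, lookups by ID would go to the wrong place, which is why the data has to be reindexed into a new index instead.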

Getting started

1. _cat
GET /_cat/nodes: list all nodes
GET /_cat/health: show cluster health
GET /_cat/master: show the elected master node
GET /_cat/indices: list all indices (comparable to show databases; in MySQL)
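Basic CRUD operations

1. List all indices

GET:192.168.100.102:9200/_cat/indices

2. Create an index named news

PUT:192.168.100.102:9200/news

3. Add a document (the type segment "new" in the URL is 6.x style; in 7.x typed URLs are deprecated in favor of /news/_doc/6)

PUT:192.168.100.102:9200/news/new/6
{
  "title":"title test",
  "content":"content test"
}

4. Use PUT for a full update and POST with _update for a partial update. A full update (PUT) replaces the whole document, which amounts to deleting it and creating it again. A partial update wraps the changed fields in "doc"; a plain POST to the document URL without _update also replaces the whole document. In 7.x the partial-update path is /news/_update/6.

PUT:192.168.100.102:9200/news/new/6
POST:192.168.100.102:9200/news/new/6/_update
{
  "doc": { "title":"title test" }
}

5. Get a document

GET:192.168.100.102:9200/news/new/6

6. Delete a document

DELETE:192.168.100.102:9200/news/new/6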
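
The difference between a full and a partial update can be sketched from a client's point of view. The example below only builds the HTTP requests with the standard library and does not send them. It assumes the host from the examples above and the 6.x-style typed URLs; the helper name `build_request` and the field value "new title" are made up for illustration.

```python
import json
import urllib.request

BASE = "http://192.168.100.102:9200"  # host taken from the examples above


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)


# Full update: the stored document becomes exactly this body.
full = build_request("PUT", "/news/new/6",
                     {"title": "title test", "content": "content test"})

# Partial update: only the fields under "doc" are merged into the stored document.
# Assumption: 6.x-style typed URL; on 7.x the path would be /news/_update/6.
partial = build_request("POST", "/news/new/6/_update",
                        {"doc": {"title": "new title"}})

for req in (full, partial):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To actually send a request: urllib.request.urlopen(full)
```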

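Queries

# Match all documents, sort by account number ascending and, for equal account numbers, by balance descending; return hits 10 through 19 (zero-based)
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "balance": "desc" }
  ],
  "from": 10,
  "size": 10
}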
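
The from and size values above are what a client would usually derive from a page number. A minimal sketch, assuming 1-based page numbers; the helper name `page_to_from_size` is hypothetical.

```python
def page_to_from_size(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into Elasticsearch from/size values."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


# Page 2 with 10 hits per page reproduces the request body shown above.
body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}, {"balance": "desc"}],
    **page_to_from_size(page=2, size=10),
}
print(body["from"], body["size"])  # prints: 10 10
```

Note that from + size is capped by the index setting index.max_result_window (10000 by default), so this style of paging is meant for the first pages of a result set.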

# Return only the specified fields

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "balance": "desc" }
  ],
  "from": 10,
  "size": 10,
  "_source": ["firstname","balance"]
}

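# Match a value: match analyzes the query text and returns documents that contain any of its terms (mill or lane)
GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
# match_phrase returns only documents that contain the terms as one contiguous phrase, in order
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}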
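
The difference between match and match_phrase can be illustrated with a toy model on whitespace tokens. This is a simplified sketch, not how Elasticsearch scores or analyzes text: it assumes lowercase whitespace tokenization, ignores relevance scoring and slop, and the sample addresses are made up.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(doc_text, query):
    """True if the document contains ANY query term (the default OR behavior)."""
    doc_terms = set(tokens(doc_text))
    return any(term in doc_terms for term in tokens(query))


def match_phrase(doc_text, query):
    """True only if all query terms appear next to each other, in order."""
    doc, phrase = tokens(doc_text), tokens(query)
    return any(doc[i:i + len(phrase)] == phrase
               for i in range(len(doc) - len(phrase) + 1))


addresses = ["990 Mill Road", "198 Mill Lane", "715 Lane Street"]  # made-up data
for address in addresses:
    print(address, match(address, "mill lane"), match_phrase(address, "mill lane"))
```

In this model all three addresses satisfy match, but only "198 Mill Lane" satisfies match_phrase.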
# Multi-field match (a hit on mill in either the address field or the city field is enough)
GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["address","city"]
    }
  }
}
# Exact-value match with term (for non-text fields)
GET /bank/_search
{
  "query": {
    "term": { "age": 13 }
  }
}
# Exact match on a text value via the .keyword sub-field (the whole field value must equal the query string)
GET /bank/_search
{
 "query": {
   "match": {
     "address.keyword": "mill lane"
   }
 }
}


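# Compound queries: bool combines several query clauses
# must: the clause must match
# must_not: the clause must not match
# should: optional; a document that does not match is still returned, but a match raises its score
# filter: must match, like must, but does not contribute to the relevance score; documents that fail the filter are dropped


GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match_phrase": {
            "address": "mill lane"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "age": "38"
          }
        }
      ],
      "should": [
        {
          "match": {
            "lastname": "Wallace"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 30
            }
          }
        }
      ]
    }
  }
}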
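
How the four bool clauses interact can be modelled on a small in-memory list. This is a toy sketch under loud assumptions: the score is just a count of matching must and should clauses (Elasticsearch uses a real relevance score), the special case where a bool query with only should clauses requires at least one of them to match is ignored, and the accounts and the function name `bool_search` are made up.

```python
def bool_search(docs, must=(), must_not=(), should=(), filter_=()):
    """Toy bool query: clauses are predicates taking a document dict."""
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every must clause has to match
        if not all(clause(doc) for clause in filter_):
            continue  # filter clauses restrict results but add no score
        if any(clause(doc) for clause in must_not):
            continue  # any must_not match excludes the document
        # should clauses never exclude a document here; they only add to the score
        score = len(must) + sum(1 for clause in should if clause(doc))
        hits.append((score, doc))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


accounts = [  # made-up sample data
    {"gender": "M", "age": 25, "lastname": "Smith"},
    {"gender": "M", "age": 20, "lastname": "Wallace"},
    {"gender": "M", "age": 38, "lastname": "Wallace"},
    {"gender": "F", "age": 22, "lastname": "Wallace"},
]

hits = bool_search(
    accounts,
    must=[lambda d: d["gender"] == "M"],
    must_not=[lambda d: d["age"] == 38],
    should=[lambda d: d["lastname"] == "Wallace"],
    filter_=[lambda d: 18 <= d["age"] <= 30],
)
for score, doc in hits:
    print(score, doc)
```

Both remaining accounts are male and aged 18 to 30; the one named Wallace ranks first because the should clause raised its score.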


# Aggregations

# Find the age distribution of all accounts and the average age
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "aggAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "aggAvg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "size": 0
}

# (Advanced) Bucket by age and compute the average balance of the accounts in each age bucket
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
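
What a terms aggregation with an avg sub-aggregation computes can be sketched on an in-memory list: group documents by a field, count each group, and average another field within it. This is an illustrative model only; the sample accounts and the function name `terms_with_avg` are made up, and the result shape is simplified compared with a real response.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, avg_field, size=10):
    """Group docs by group_field, then average avg_field inside each bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[avg_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Like a terms aggregation, order buckets by doc_count descending
    # and keep only the top `size` buckets.
    buckets.sort(key=lambda bucket: bucket["doc_count"], reverse=True)
    return buckets[:size]


accounts = [  # made-up sample data
    {"age": 30, "balance": 1000},
    {"age": 30, "balance": 3000},
    {"age": 25, "balance": 500},
]
for bucket in terms_with_avg(accounts, "age", "balance"):
    print(bucket)
```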

# (Advanced) Find the age distribution, the average balance of M and F within each age bucket, and the overall average balance of each age bucket
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "aggAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Mapping
Type reference
Mapping parameters available for field mappings

# View the mapping
GET my_index/_mapping

PUT my_index
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/7.9/mapping-params.html
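# Add a field to an existing mapping (index controls whether the field is searchable and defaults to true; with index set to false the value is kept only as redundant data and cannot be queried)
PUT my_index/_mapping
{
  "properties": {
    "address": {
      "type": "text",
      "index": false,
      "doc_values": false
    }
  }
}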

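# Handling flattened object arrays https://www.elastic.co/guide/en/elasticsearch/reference/7.9/nested.html
By default an array of objects is flattened, so the association between the fields of each object is lost; the nested type stores each object separately and preserves that relationship.
You need to specify the field types of the index manually and declare the related object field as type nested.
https://blog.csdn.net/qq_29857681/article/details/88011313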
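
The flattening problem can be reproduced with a small model. The sketch below assumes a made-up product document with an attrs array; `flatten`, `flat_match`, and `nested_match` are hypothetical helper names that imitate, in a simplified way, how a plain object array versus a nested field answers a query with two conditions on the same object.

```python
product = {  # made-up document
    "attrs": [
        {"attrId": 27, "attrValue": "A"},
        {"attrId": 30, "attrValue": "B"},
    ]
}


def flatten(doc, path):
    """Model a plain object array: one multi-value list per sub-field."""
    flat = {}
    for obj in doc[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


def flat_match(doc, path, **conditions):
    """Each condition is checked against the whole list, not per object."""
    flat = flatten(doc, path)
    return all(value in flat.get(f"{path}.{key}", [])
               for key, value in conditions.items())


def nested_match(doc, path, **conditions):
    """All conditions must hold within one and the same object."""
    return any(all(obj.get(key) == value for key, value in conditions.items())
               for obj in doc[path])


print(flatten(product, "attrs"))
# attrId 27 belongs to value "A", yet the flattened model also matches "B".
print(flat_match(product, "attrs", attrId=27, attrValue="B"))
print(nested_match(product, "attrs", attrId=27, attrValue="B"))
```

The flattened model reports a match for a combination that exists in no single object, which is the false positive the nested type is meant to prevent.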

# Data migration with _reindex
POST _reindex
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

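Analysis (tokenization)
IK analyzer on GitHub

# IK analyzer
# Configure ik_max_word, which splits text at the finest granularity. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 大会堂, 大会, 会堂, and so on.
POST /jingdongs/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}
# ik_smart splits at the coarsest granularity. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国 and 人民大会堂. (This is an alternative to the mapping above; the analyzer of an existing field cannot be changed in place.)
POST /jingdongs/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart"
    }
  }
}
# Best practice for combining the two analyzers: use ik_max_word at index time and ik_smart at search time.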
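
To compare what the two analyzers produce for the same text, the _analyze API can be called once per analyzer. The sketch below only builds and prints the request bodies; it assumes the IK plugin is installed on the cluster, and the helper name `analyze_body` is hypothetical.

```python
import json


def analyze_body(analyzer: str, text: str) -> dict:
    """Request body for POST /_analyze with a named analyzer."""
    return {"analyzer": analyzer, "text": text}


text = "中华人民共和国人民大会堂"
for analyzer in ("ik_max_word", "ik_smart"):
    # Paste the printed request into the Kibana console to see the tokens.
    print("POST /_analyze")
    print(json.dumps(analyze_body(analyzer, text), ensure_ascii=False, indent=2))
```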

Custom tokenization
https://blog.csdn.net/fadgafdgfdg/article/details/82702163

GET product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "skuTitle": "手机"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "catalogId": 225
          }
        },
        {
          "terms": {
            "brandId": [
              "1",
              "2",
              "5",
              "9"
            ]
          }
        },
        {
          "term": {
            "hasStock": "true"
          }
        },
        {
          "nested": {
            "path": "attrs",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "attrs.attrId": {
                        "value": "27"
                      }
                    }
                  },
                  {
                    "terms": {
                      "attrs.attrValue": [
                        "海思（Hisilicon）",
                        "骁龙765"
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "range": {
            "skuPrice": {
              "gte": 0,
              "lte": 5000
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "skuPrice": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 20,
  "highlight": {
    "fields": {
      "skuTitle": {}
    },
    "pre_tags": "<span style='color:red'>",
    "post_tags": "</span>"
  },
  "aggs": {
    "brand-agg": {
      "terms": {
        "field": "brandId",
        "size": 10
      },
      "aggs": {
        "brandname-agg": {
          "terms": {
            "field": "brandName",
            "size": 10
          }
        },
        "brandimg-agg": {
          "terms": {
            "field": "brandImg",
            "size": 10
          }
        }
      }
    },
    "catalogId-agg": {
      "terms": {
        "field": "catalogId",
        "size": 10
      },
      "aggs": {
        "catalogname-agg": {
          "terms": {
            "field": "catalogName",
            "size": 10
          }
        }
      }
    },
    "attr-agg": {
      "nested": {
        "path": "attrs"
      },
      "aggs": {
        "attrid-agg": {
          "terms": {
            "field": "attrs.attrId",
            "size": 10
          },
          "aggs": {
            "attr-name": {
              "terms": {
                "field": "attrs.attrName",
                "size": 10
              }
            },
            "attr-value": {
              "terms": {
                "field": "attrs.attrValue",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
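
On the client side, the nested bucket structure returned for the aggregations above has to be unpacked. The sketch below reads brand-agg and its brandname-agg sub-aggregation from a search response. The response dictionary is a hand-written, hypothetical sample that only mirrors the general bucket layout (key, doc_count, sub-aggregations); the brand values and the function name `brand_summaries` are made up.

```python
def brand_summaries(response: dict) -> list:
    """Collect brand id, hit count, and brand name from the brand-agg buckets."""
    summaries = []
    for bucket in response["aggregations"]["brand-agg"]["buckets"]:
        name_buckets = bucket["brandname-agg"]["buckets"]
        summaries.append({
            "brandId": bucket["key"],
            "count": bucket["doc_count"],
            # Each brand id is expected to have one name; take the first bucket.
            "brandName": name_buckets[0]["key"] if name_buckets else None,
        })
    return summaries


sample_response = {  # hypothetical response fragment, not real output
    "aggregations": {
        "brand-agg": {
            "buckets": [
                {
                    "key": 9,
                    "doc_count": 12,
                    "brandname-agg": {"buckets": [{"key": "ExampleBrand", "doc_count": 12}]},
                    "brandimg-agg": {"buckets": []},
                }
            ]
        }
    }
}
print(brand_summaries(sample_response))
```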