ES Database Basics

望晓天

Last modified: 2024-02-01 11:08:11

Column: Applications and Tools    Tags: elasticsearch, database

First published: 2024-02-01 10:52:34

Article link: https://blog.csdn.net/sinat_41117967/article/details/135968839

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html
API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/rest-apis.html

Elasticsearch Introduction

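Elasticsearch is a document-oriented database. The document is its basic storage unit, and all data is stored as JSON. Its main strength is querying. Elasticsearch is written in Java and built on the Lucene search library. Lucene builds an inverted index that maps each term to the documents containing it, so fields can be searched very quickly. Elasticsearch is also a distributed store: several Elasticsearch server nodes can form a cluster. Clients talk to the cluster as a whole and do not need to know which node holds which data.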
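The Lucene point is easier to see with a toy inverted index. This is a minimal Python sketch under simplifying assumptions: lowercase whitespace tokenization, an in-memory dict, and AND semantics for multi-term queries. Lucene's real index is a compressed on-disk structure with much more machinery.

```python
from collections import defaultdict


def tokenize(text):
    # Simplified analysis: lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the IDs of documents that contain every query term."""
    lists = [set(index.get(term, [])) for term in tokenize(query)]
    if not lists:
        return []
    return sorted(set.intersection(*lists))


docs = {1: "Quick brown fox", 2: "Lazy brown dog", 3: "Quick red fox"}
index = build_inverted_index(docs)
print(index["brown"])              # [1, 2]
print(search(index, "quick fox"))  # [1, 3]
```

A search only reads the posting lists of the query terms. It never scans every document, and that is where the speed comes from.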

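Trivia:
Lucene is a high-performance search engine library that provides indexing and searching, but its internals are very complex. Elasticsearch keeps Lucene's performance and hides its complexity behind a REST interface that applications in any language can call. In 2004 Shay Banon was out of work and set out to build a recipe search engine for his wife, a chef. Using Lucene directly was hard, so he wrote an abstraction over it and open-sourced it as Compass, which let developers add search to their programs. By 2010 he had rewritten Compass and named the result Elasticsearch, with support for distribution and horizontal scaling.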

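Elasticsearch Key Concepts

Some key concepts in ES:

Document: the basic unit of information in ES, usually represented as JSON.
Index: a collection of related documents.
Type: older versions let one index hold several mapping types for different categories of documents. 6.x limited new indices to a single type, 7.x deprecated types, and 8.x removed them from the APIs. Each index should hold one kind of document.
Shard: a unit that an index is split into. Each shard is itself a complete Lucene index and can be placed on any node in the cluster.
Replica: a copy of a primary shard, used for data redundancy and to increase query throughput.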
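How does a document land on a particular shard? The Elasticsearch documentation describes default routing as `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below models that formula in Python. It uses md5 only as a stand-in for the Murmur3 hash Elasticsearch actually uses and ignores routing partitions, so its shard numbers will not match a real cluster.

```python
import hashlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Simplified model of shard_num = hash(_routing) % num_primary_shards.

    Elasticsearch uses Murmur3; md5 is only a stable stand-in here.
    """
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")  # first 4 bytes as an unsigned int
    return h % number_of_primary_shards


ids = ["1", "2", "3", "4", "5", "6"]
with_3 = {doc_id: shard_for(doc_id, 3) for doc_id in ids}
with_4 = {doc_id: shard_for(doc_id, 4) for doc_id in ids}
print(with_3)
print(with_4)  # with a different shard count, the same ID can map to another shard
```

The shard is derived from the primary shard count, so that count is fixed when an index is created. Changing it requires reindexing or the split/shrink APIs. The number of replicas, by contrast, can be changed at any time.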

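Elasticsearch Deployment

Elasticsearch Installation

Platform: VirtualBox virtual machine
Operating system: Fedora 36
ES version: 8.12.0

Steps:

Install the operating system (not covered here).
Download the RPM package from the official site:
Download URL: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.12.0-x86_64.rpm
Run the install command: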

   rpm -i elasticsearch-8.12.0-x86_64.rpm

These steps complete a basic ES installation.

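Elasticsearch Single-Node Cluster Deployment

Below is the default configuration file (/etc/elasticsearch/elasticsearch.yml for an RPM install):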

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
#network.host: 192.168.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false

#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------
#
# The following settings, TLS certificates, and keys have been automatically      
# generated to configure Elasticsearch security features on 31-01-2024 05:29:02
#
# --------------------------------------------------------------------------------

# Enable security features
xpack.security.enabled: true

xpack.security.enrollment.enabled: true

# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
# Create a new cluster with the current node only
# Additional nodes can still join the cluster later
cluster.initial_master_nodes: ["localhost"]

# Allow HTTP API connections from anywhere
# Connections are encrypted and require user authentication
http.host: 0.0.0.0

# Allow other nodes to join the cluster from anywhere
# Connections are encrypted and mutually authenticated
#transport.host: 0.0.0.0

#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

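Key settings:

cluster.name: my-application: the cluster name. It must be identical on every node of the same cluster.
node.name: node-1: the node name. Each node in the cluster needs its own name.
path.data: /var/lib/elasticsearch: where ES stores its data.
path.logs: /var/log/elasticsearch: where ES writes its logs.
network.host: 192.168.0.1: the node's IP address.
http.port: 9200: the HTTP port.
discovery.seed_hosts: ["host1", "host2"]: the addresses of cluster nodes used for discovery. This setting can also be written another way, explained later.
cluster.initial_master_nodes: ["node-1", "node-2"]: the names of the master-eligible nodes whose votes count in the very first master election.
xpack.security.enabled: true: whether the X-Pack security features are enabled.
xpack.security.enrollment.enabled: true: enables enrollment, which lets new nodes join with a token to simplify cluster setup.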

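By default the ES service listens on port 9200 for client requests. After an RPM install, X-Pack security is enabled by default. It covers SSL/TLS-encrypted communication, user and password management, and so on. To make testing and experimenting easier, parts of X-Pack are turned off below, mainly login authentication and SSL encryption.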

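Basic configuration for a single-node ES deployment:

Compared with the default configuration above, these changes were made:

The cluster name is changed to myapp.
Parts of X-Pack security are turned off.
cluster.initial_master_nodes: ["node1"]: this setting determines which nodes take part in the first master election. The auto-generated value, ["localhost"], only refers to the local host name. Change it to this node's node.name (node1).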


cluster.name: myapp

node.name: node1

path.data: /var/lib/elasticsearch

path.logs: /var/log/elasticsearch

network.host: 172.16.20.73

http.port: 9200

xpack.security.enabled: false

xpack.security.enrollment.enabled: true

# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: false
  keystore.path: certs/http.p12

# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: false
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12

cluster.initial_master_nodes: ["node1"]


http.host: 0.0.0.0

The comments have been removed from the configuration above. Now run systemctl restart elasticsearch to start the ES service.

Check the service status with systemctl status elasticsearch:

[root@localhost elasticsearch]# systemctl status elasticsearch 
 elasticsearch.service - Elasticsearch
     Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Wed 2024-01-31 13:13:11 CST; 1h 49min ago
       Docs: https://www.elastic.co
   Main PID: 1707 (java)
      Tasks: 92 (limit: 4645)
     Memory: 1.4G
        CPU: 1min 58.844s
     CGroup: /system.slice/elasticsearch.service
             ├─1707 /usr/share/elasticsearch/jdk/bin/java -Xms4m -Xmx64m -XX:+UseSerialGC -Dcli.name=server -Dcli.script=/usr/share/elasticsearch/bin/elasticsearch -Dcli.libs=lib>
             ├─1767 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -Djava.security.manager=allow -XX:+AlwaysPreT>
             └─1788 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Jan 31 13:12:46 localhost.localdomain systemd[1]: Starting elasticsearch.service - Elasticsearch...
Jan 31 13:12:48 localhost.localdomain systemd-entrypoint[1767]: CompileCommand: exclude org/apache/lucene/util/MSBRadixSorter.computeCommonPrefixLengthAndBuildHistogram bool excl>
Jan 31 13:12:48 localhost.localdomain systemd-entrypoint[1767]: CompileCommand: exclude org/apache/lucene/util/RadixSelector.computeCommonPrefixLengthAndBuildHistogram bool exclu>
Jan 31 13:12:48 localhost.localdomain systemd-entrypoint[1707]: Jan 31, 2024 1:12:48 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
Jan 31 13:12:48 localhost.localdomain systemd-entrypoint[1707]: WARNING: COMPAT locale provider will be removed in a future release
Jan 31 13:13:11 localhost.localdomain systemd[1]: Started elasticsearch.service - Elasticsearch.

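Opening port 9200 in a browser now returns:
[Figure: response from port 9200 shown in the browser]

The response shows that the ES service is running, with cluster name myapp, node name node1, and some other information.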

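One more thing to watch out for: servers usually run a firewall. If the firewall configuration is not changed, the service can be running and still be unreachable. For simplicity I disabled the firewall here with systemctl stop firewalld. In production, adjust the firewall rules instead, for example to allow ports 9200 and 9300.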
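Before changing the firewall, it helps to confirm from another machine whether the ports are reachable at all. Here is a minimal Python check. It only tests plain TCP reachability and does not speak HTTP or the transport protocol.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts,
        # e.g. when a firewall drops or rejects the traffic.
        return False
```

For example, run `port_open("172.16.20.73", 9200)` and `port_open("172.16.20.73", 9300)` from another host. If they return False while systemctl status elasticsearch reports the service as active, the usual suspects are the firewall or the network.host/http.host binding.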

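The single-node ES cluster is now ready to use.

Elasticsearch Multi-Node Cluster Deployment

Single-node deployment is simple, right? Multi-node deployment really only means adjusting the configuration file and restarting. The examples here target version 8.12. Some settings from earlier versions have since been deprecated, and many steps are now automated to keep configuration minimal.
The adjusted configuration file is shown below: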

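Compared with the single-node cluster, the main setting to change is discovery.seed_hosts. Add the addresses of the nodes in the cluster. An entry can be an IP with a port, a domain name, and so on, as long as the node can actually be reached at that address. The default transport port is 9300. node.name and network.host must also be set to each node's own values.
The other setting to change is cluster.initial_master_nodes, which names the nodes that compete to become master. Although ES runs as a cluster, it still needs a master node to coordinate and manage the whole cluster. In practice you do not let every node compete for master, because that would hurt efficiency. This setting specifies which nodes take part. It is only needed the first time the cluster starts. Once the first master has been elected, remove it from the configuration, or later restarts may fail in unpredictable ways.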
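Master elections need a majority of the master-eligible nodes in the voting configuration. That is why a small odd number of master-eligible nodes, such as the three listed in cluster.initial_master_nodes below, is common. The arithmetic is easy to sketch. This is a simplified model: Elasticsearch manages the voting configuration itself and may leave one node out of it when the number of master-eligible nodes is even.

```python
def quorum(master_eligible: int) -> int:
    """Votes needed to elect a master: a strict majority of voting nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in range(1, 6):
    print(f"{n} master-eligible: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four master-eligible nodes tolerate no more failures than three, which is one reason odd counts are preferred.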

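Notes:

These two settings are written differently here than before. Both forms are supported: the inline array form used earlier, ["a", "b"], and the YAML block-list form used here, with one "- " entry per line.
I ran into a problem during the multi-node deployment. When I first tried to expand the single node to two nodes, I could not get them to join each other. I checked the firewall and even captured packets, which clearly showed the two nodes talking to each other. Yet querying node information showed each node still running on its own. After some investigation I found the cause. During the single-node deployment, cluster.initial_master_nodes had already made the node the master of its own cluster, and this was recorded in the data directory. When the two nodes negotiated, each had already bootstrapped its own cluster with itself as master, so the join failed. Deleting the contents of the path.data directory solved the problem. This erases all data on that node, so only do it on a node whose data you can discard.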

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: myapp
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 172.16.20.73
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: 
  - 172.16.20.73:9300
  - 172.16.20.74:9300
  - 172.16.20.75:9300
  - 172.16.20.76:9300
  - 172.16.20.77:9300
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1"]
#discovery.type: single-node
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false

#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------
#
# The following settings, TLS certificates, and keys have been automatically      
# generated to configure Elasticsearch security features on 29-01-2024 02:35:42
#
# --------------------------------------------------------------------------------

# Enable security features
xpack.security.enabled: false

xpack.security.enrollment.enabled: true

# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: false 
  keystore.path: certs/http.p12

# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: false
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
# Create a new cluster with the current node only
# Additional nodes can still join the cluster later
#cluster.initial_master_nodes: 
#  - node1
#  - node2

cluster.initial_master_nodes: 
  - node1
  - node2
  - node3

# Allow HTTP API connections from anywhere
# Connections are encrypted and require user authentication
http.host: 0.0.0.0

# Allow other nodes to join the cluster from anywhere
# Connections are encrypted and mutually authenticated
#transport.host: 0.0.0.0

#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

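Multi-node deployment result:

[Figure: the multi-node cluster shown in elasticvue]

The graphical tool shown here is elasticvue, an open-source tool for browsing and searching ES. Elastic also provides Kibana, which is more complex to use.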
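Without a GUI, the cluster health API (GET /_cluster/health) shows how many nodes have joined. Below is a minimal Python sketch. It assumes security is disabled as in the configuration above (no credentials, plain HTTP), and the base URL is a placeholder. A node that is still "on its own", as in the troubleshooting note, would report number_of_nodes: 1.

```python
import json
import urllib.request

# Placeholder address; security is disabled in this article's config,
# so no credentials or TLS are used.
BASE_URL = "http://172.16.20.73:9200"


def cluster_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /_cluster/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(health: dict, expected_nodes: int) -> str:
    """Turn a _cluster/health response into a one-line verdict."""
    status = health["status"]  # "green", "yellow" or "red"
    nodes = health["number_of_nodes"]
    if nodes < expected_nodes:
        return f"only {nodes}/{expected_nodes} nodes joined (status {status})"
    return f"{nodes} nodes joined (status {status})"


# Against a live cluster: print(summarize(cluster_health(), expected_nodes=5))
```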

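Elasticsearch Client-Server Communication

Before 7.x, Elasticsearch maintained three different communication protocols:

The HTTP-based Elasticsearch REST APIs.
A TCP-based client protocol used by the Java Transport Client. It was deprecated in 7.0 and removed in 8.0. Elastic has always promoted the REST APIs, and this protocol was never officially documented, so understanding it means reading the client source code.
A TCP-based node-to-node protocol used for communication inside the cluster. It is likewise undocumented apart from the open-source code.

Elasticsearch is a document database written in Java and built on the Lucene search library, with powerful search capabilities. Its basic storage unit is the JSON document, and it uses techniques such as tokenization and sharding to speed up searching. This article does not cover implementation details or usage. It focuses on the Elasticsearch REST APIs.

Communication between clients and Elasticsearch is based on HTTP. Elasticsearch exposes REST APIs through which all operations on the database are performed.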

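API conventions

Content-type requirements
The type of the content sent in a request body must be specified using the Content-Type header. The value of this header must map to one of the formats that the API supports. Most APIs support JSON, YAML, CBOR, and SMILE. The bulk and multi-search APIs support NDJSON, JSON, and SMILE; other types will result in an error response.
When using the source query string parameter, the content type must be specified using the source_content_type query string parameter.
Elasticsearch only supports UTF-8-encoded JSON. Elasticsearch ignores any other encoding headings sent with a request. Responses are also UTF-8 encoded.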

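The documentation says the content type of a request body must be given with the Content-Type header. GET and HEAD requests normally have no body, so in practice the header is usually omitted for them. PUT and POST requests that carry a body must include it.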
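Here is a minimal Python sketch of that rule, using only the standard library. The Content-Type header is attached only when the request has a body. The base URL and index name are placeholders, and the function only builds the request; passing it to urllib.request.urlopen would send it.

```python
import json
import urllib.request


def build_request(base_url, method, path, body=None, content_type="application/json"):
    """Build (without sending) a REST request to Elasticsearch.

    Content-Type is set only when there is a body.
    """
    data = None
    headers = {}
    if body is not None:
        # Dicts are serialized to JSON; strings (e.g. YAML text) are sent as-is.
        text = json.dumps(body) if isinstance(body, dict) else body
        data = text.encode("utf-8")  # Elasticsearch expects UTF-8
        headers["Content-Type"] = content_type
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


put = build_request("http://localhost:9200", "PUT", "/my-index-000001/_doc/1", {"message": "hello"})
get = build_request("http://localhost:9200", "GET", "/my-index-000001/_doc/1")
# urllib stores the header under the name "Content-type".
print(put.get_method(), put.get_header("Content-type"))  # PUT application/json
print(get.get_method(), get.get_header("Content-type"))  # GET None
```

Passing a YAML string with content_type="application/yaml" covers the non-JSON case.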

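ES is designed around JSON and in principle expects JSON documents. In practice, other content types such as YAML are also accepted, for example with Content-Type: application/yaml.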

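GET and POST requests
Many Elasticsearch GET APIs, most notably the search API, support a request body. While the GET action makes sense in the context of retrieving information, GET requests with a body are not supported by all HTTP libraries. All Elasticsearch GET APIs that require a body can also be submitted as POST requests. Alternatively, you can pass the request body as the source query string parameter when using GET.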
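Both workarounds can be sketched as follows. The helper names are hypothetical, and the helpers return the method, path, and body rather than sending anything. The source parameter must be accompanied by source_content_type, as the conventions above require.

```python
import json
import urllib.parse


def search_via_post(index, query):
    """Option 1: send the same body with POST instead of GET."""
    return "POST", f"/{index}/_search", json.dumps(query)


def search_via_source_param(index, query):
    """Option 2: keep GET and pass the body in the `source` query parameter.

    `source_content_type` must accompany `source`.
    """
    params = urllib.parse.urlencode({
        "source": json.dumps(query),
        "source_content_type": "application/json",
    })
    return "GET", f"/{index}/_search?{params}", None


query = {"query": {"match": {"message": "search"}}}
print(search_via_post("my-index-000001", query))
print(search_via_source_param("my-index-000001", query))
```

POST is usually the simpler choice. The source parameter is mainly useful when a client can only issue GET, and long queries can run into URL length limits.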

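REST API compatibility

To help REST clients mitigate the impact of non-compatible (breaking) API changes, Elasticsearch provides a per-request, opt-in API compatibility mode.

Compatibility is requested per request through the Accept and Content-Type headers, for example: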

Accept: "application/vnd.elasticsearch+json;compatible-with=7"
Content-Type: "application/vnd.elasticsearch+json;compatible-with=7"
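A small helper can build these headers for one request. This is a sketch: `compat_headers` is a hypothetical name, and the format names it accepts correspond to the four compatible media types Elasticsearch lists for 7.x compatibility (json, yaml, smile, cbor).

```python
SUPPORTED_FORMATS = ("json", "yaml", "smile", "cbor")


def compat_headers(fmt: str = "json", version: int = 7) -> dict:
    """Headers that opt one request into REST API compatibility mode."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    media_type = f"application/vnd.elasticsearch+{fmt};compatible-with={version}"
    # Accept covers the response; Content-Type describes the request body.
    return {"Accept": media_type, "Content-Type": media_type}


print(compat_headers())
```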

Between 7.x and 8.x, four compatible media types are generally supported:

"application/vnd.elasticsearch+json;compatible-with=7"
"application/vnd.elasticsearch+yaml;compatible-with=7"
"application/vnd.elasticsearch+smile;compatible-with=7"
"application/vnd.elasticsearch+cbor;compatible-with=7"

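Document APIs

The index API adds a JSON document to the specified index. If a document with that ID already exists, it is updated (replaced). The other document APIs get, delete, and update documents.

Request formats: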

PUT /<target>/_doc/<_id>

POST /<target>/_doc/

PUT /<target>/_create/<_id>

POST /<target>/_create/<_id>

GET <index>/_doc/<_id>

HEAD <index>/_doc/<_id>

GET <index>/_source/<_id>

HEAD <index>/_source/<_id>

DELETE /<index>/_doc/<_id>

POST /<target>/_delete_by_query

POST /<index>/_update/<_id>

POST /<target>/_update_by_query

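Version 7.9 introduced a new structure, the data stream. How it differs from an index is explained later. You cannot add new documents to a data stream with PUT /<target>/_doc/<_id>. To specify a document ID, use PUT /<target>/_create/<_id> instead.

For the _create/<_id> endpoint, PUT and POST behave the same.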
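The endpoint rules above can be summarized in a small function. It is a hypothetical helper for illustration and only covers the index and create endpoints.

```python
def index_request_line(target, doc_id=None, create_only=False, is_data_stream=False):
    """Pick the method and path for adding one document.

    - No ID: POST /<target>/_doc/ and Elasticsearch generates the ID.
    - ID on a data stream, or create-only: PUT /<target>/_create/<_id>,
      which fails if the ID already exists.
    - ID on a regular index: PUT /<target>/_doc/<_id> (create or replace).
    """
    if doc_id is None:
        return "POST", f"/{target}/_doc/"
    if create_only or is_data_stream:
        return "PUT", f"/{target}/_create/{doc_id}"
    return "PUT", f"/{target}/_doc/{doc_id}"


print(index_request_line("my-index-000001", "1"))
print(index_request_line("my-data-stream", "1", is_data_stream=True))
```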

Parameters can be appended to the request URL, for example:

PUT my-index-000001/_doc/1?timeout=5m
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

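This PUT request creates or updates a document with a specific ID in Elasticsearch.

PUT: the HTTP method used to create or replace a resource.
my-index-000001: the target index in which the document is created or updated.
/_doc/1: the request path. _doc is the fixed endpoint name that replaced document types (deprecated since 7.x), and 1 is the document ID.
The URL includes a query parameter:

?timeout=5m: how long the request waits before failing with an error. This covers waiting for the primary shard to become available and related steps such as automatic index creation or dynamic mapping updates. The default is 1m.
The request body contains the document as JSON.

This request stores the JSON above as document 1 in the index my-index-000001. If document 1 already exists in that index, it is replaced by the new data; otherwise a new document is created.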
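The create-or-replace behaviour can be modelled with a toy in-memory index. This sketch only imitates the `result` and `_version` fields of the real index API response and is not how Elasticsearch stores documents. Its main point is that a PUT to an existing ID replaces the whole document instead of merging fields. Partial changes go through the `_update` API.

```python
class ToyIndex:
    """In-memory model of PUT /<index>/_doc/<_id> (create or replace)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        if doc_id in self._docs:
            version, result = self._docs[doc_id]["_version"] + 1, "updated"
        else:
            version, result = 1, "created"
        # The whole document is replaced; fields are not merged.
        self._docs[doc_id] = {"_version": version, "_source": dict(source)}
        return {"_id": doc_id, "_version": version, "result": result}

    def get(self, doc_id):
        return self._docs[doc_id]["_source"]


idx = ToyIndex()
print(idx.put("1", {"message": "GET /search HTTP/1.1 200 1070000", "user": {"id": "kimchy"}}))
print(idx.put("1", {"message": "replaced"}))
print(idx.get("1"))  # {'message': 'replaced'}: the user field is gone
```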

Multi get (_mget)

GET /_mget
{
  "docs": [
    {
      "_index": "my-index-000001",
      "_id": "1"
    },
    {
      "_index": "my-index-000001",
      "_id": "2"
    }
  ]
}
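Building the _mget body and reading the response can be sketched in Python. The helper names are hypothetical. In the response, each entry in `docs` carries `found: true` and a `_source` when the document exists, and `found: false` otherwise.

```python
import json


def mget_body(index, ids):
    """Build the _mget body for several document IDs in one index."""
    return {"docs": [{"_index": index, "_id": doc_id} for doc_id in ids]}


def found_sources(response):
    """Map _id -> _source for the documents that exist (found is true)."""
    return {d["_id"]: d["_source"] for d in response["docs"] if d.get("found")}


print(json.dumps(mget_body("my-index-000001", ["1", "2"]), indent=2))

# Shape of a response where document 2 does not exist (abridged):
sample = {"docs": [
    {"_index": "my-index-000001", "_id": "1", "found": True, "_source": {"user": {"id": "kimchy"}}},
    {"_index": "my-index-000001", "_id": "2", "found": False},
]}
print(found_sources(sample))  # {'1': {'user': {'id': 'kimchy'}}}
```

When all IDs are in the same index, GET /my-index-000001/_mget with {"ids": ["1", "2"]} is a shorter form.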

Bulk operations

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
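The bulk body is newline-delimited JSON (NDJSON). Each action line is optionally followed by a source line, and the whole body must end with a newline. Here is a minimal Python sketch that rebuilds the request above. `bulk_body` is a hypothetical helper.

```python
import json


def bulk_body(actions):
    """Serialize (action, source) pairs as NDJSON for POST _bulk.

    `source` is None for delete, the document for index/create,
    and {"doc": {...}} for update.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the final newline is required


body = bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_id": "1", "_index": "test"}}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```

Send the body with Content-Type: application/x-ndjson. Note that delete has no source line, and update's source line wraps the changes in "doc".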

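Q&A

How should tokenization in ES be understood?

In Elasticsearch (ES), tokenization is the process of splitting a text field into a sequence of individual words or terms. It is the foundation of full-text search: it lets the search engine build an index through which documents containing a given term can be found quickly.
Key points of the process:
1. Tokenizer: the component that performs tokenization in ES. It reads text and emits words, phrases, or other meaningful elements as tokens. A tokenizer can be simple, such as splitting on whitespace, or sophisticated, such as handling multiple languages or compound words.
2. Token: the output unit of tokenization, which may be a word, a number, or a symbol. Tokens become part of the index, and searches match against them.
3. Analyzer: an ES analyzer combines a tokenizer with filters that run before it (character filters) and after it (token filters such as the lowercase or stop-word filters). The analyzer handles the whole pipeline, including normalization (such as lowercasing), stop-word removal, and stemming, to build a more effective and relevant index.
4. Mapping: when you define a field in ES, you can specify which analyzer it uses. This definition is called the mapping. It determines how the field is indexed and later searched.
For example, the standard analyzer splits the text "Quick brown fox" into the three lowercased tokens "quick", "brown", and "fox". When a user searches for "Quick" or "quick", the query text is analyzed the same way and becomes "quick", so both searches find the text.
Tokenization is a crucial step in full-text search and directly affects both the quality of results and search performance. Understanding and configuring analyzers properly is key to a good search experience in ES.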
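The analyzer pipeline (a tokenizer, then token filters in order) fits in a few lines of Python. This is a simplified approximation: the regex tokenizer only roughly mimics the standard tokenizer, character filters are omitted, and the stop-word list is illustrative. Like the standard analyzer, the default pipeline here lowercases but does not remove stop words.

```python
import re

STOP_WORDS = {"a", "an", "and", "of", "the"}  # illustrative list, not ES's


def tokenizer(text):
    # Rough stand-in for the standard tokenizer, which follows Unicode
    # text segmentation rules: split text into runs of word characters.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, filters=(lowercase_filter,)):
    """Analyzer = tokenizer followed by token filters, applied in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("Quick brown fox"))  # ['quick', 'brown', 'fox']
print(analyze("The Quick brown fox", filters=(lowercase_filter, stop_filter)))
```

Because both indexed text and query text go through the same analyzer, a query for "Quick" becomes the token "quick" and matches.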

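Should I think of ES as a database or as a search engine?

Elasticsearch can be viewed as either a database or a search engine, depending on how it is used in a system.
As a search engine, Elasticsearch is designed specifically for complex search queries, especially full-text search. Its core capabilities include indexing data and searching it quickly. It also supports complex queries such as fuzzy search, synonym search, and geo search. It uses an inverted index to run text searches efficiently.
As a database, Elasticsearch can store and index data as documents, usually JSON. It has some database characteristics, such as data persistence, CRUD operations (create, read, update, delete), and simple data aggregation.
Although Elasticsearch has some database features, it is much weaker than traditional relational databases in data consistency, transactions, and several other areas. Elasticsearch is therefore often used alongside a relational database (such as MySQL or PostgreSQL) or a non-relational database (such as MongoDB). The other database handles transactional data and complex queries, while Elasticsearch provides high-performance text search and data analysis.
In short, Elasticsearch is most often seen as a powerful search engine, but it also has the storage and processing features of a database.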

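Since ES is a distributed store, what is the relationship between the nodes of a cluster? Is it a decentralized cluster or something else?

Elasticsearch builds a distributed, decentralized cluster. Each node plays certain roles and works with the other nodes to keep the whole cluster functioning and performing well. The node roles and their relationships are:
1. Master node: manages and controls the cluster. For example, it creates or deletes indices, tracks which nodes are part of the cluster, and decides which shards are allocated to which nodes. Only one master is active at a time, while other master-eligible nodes stand by to take over if needed.
2. Data node: stores data and performs data-related operations such as CRUD (create, read, update, delete), search, and aggregations. Data nodes hold and manage shards.
3. Coordinating node: receives client requests, routes them to the right data nodes, and merges the results before returning them to the client. In fact, every node acts as a coordinating node by default.
4. Ingest node: preprocesses documents by running a series of processing steps before they are indexed.
5. Dedicated node: a node can be configured as a dedicated master node or a dedicated data node. It then performs only that task, which improves performance and stability.
Nodes in Elasticsearch are connected over the network and communicate with each other. They coordinate through a centrally managed cluster state covering all nodes, indices, and shards. The elected master maintains this state and publishes it to the other nodes. Although there is a master, the design is decentralized in the sense that every node can handle requests; the master's responsibilities are mainly cluster-level management.
Nodes can join or leave the cluster dynamically, and the master reallocates shards to rebalance the cluster. If the master fails, the cluster automatically elects a new one. This design lets Elasticsearch handle node failures well, providing high availability and scalability.
When deploying Elasticsearch, plan the roles and number of nodes carefully to meet your performance, data redundancy, and fault-tolerance requirements.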
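In 8.x, roles are set per node with node.roles in elasticsearch.yml. If the setting is absent, the node takes all default roles; an empty list makes a coordinating-only node. The sketch below renders that line for a hypothetical five-node layout. The role assignment is only an example, not a recommendation from this article.

```python
def roles_line(roles):
    """Render the node.roles setting for one node's elasticsearch.yml.

    An empty list gives a coordinating-only node.
    """
    return "node.roles: [" + ", ".join(roles) + "]"


# Hypothetical layout for a five-node cluster like the one above.
plan = {
    "node1": ["master", "data"],
    "node2": ["master", "data"],
    "node3": ["master", "data", "ingest"],
    "node4": ["data", "ingest"],
    "node5": [],  # coordinating only
}
for name, roles in plan.items():
    print(f"{name}: {roles_line(roles)}")

master_eligible = [name for name, roles in plan.items() if "master" in roles]
print("cluster.initial_master_nodes candidates:", master_eligible)
```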

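ES clients talk to the server over HTTP. Can you explain in detail which request to use when?

Elasticsearch does communicate over an HTTP REST interface, which means you use the standard HTTP methods (GET, POST, PUT, DELETE, HEAD) for different kinds of operations. Some basic use cases for each method:
GET: retrieves information, including:

Get a document: retrieve a document by its ID.
Search: run a query and retrieve matching documents.
Health check: get the health status of the cluster (GET /_cluster/health).
Get mapping: get the mapping of an index.
List indices: get a list of all indices in the cluster (GET /_cat/indices).
POST: submits data, for example:
Index a document when Elasticsearch generates the document ID.
Search: run complex queries. GET also works for search, but POST is more common when the query body is large.
Update a document: partially update a document.
Bulk operations: run multiple index/delete/update operations in one request.
PUT: creates or updates data, typically to:
Create an index: create a new index.
Index a document with a given ID: create a new document or replace an existing one.
Create or update a mapping: set or update the mapping of an index.
DELETE: deletes data:
Delete a document: delete a specific document by ID.
Delete an index: delete an entire index and all of its data.
HEAD: checks whether a resource exists:
Check a document: check whether the document with a given ID exists.
These HTTP verbs map onto Elasticsearch operations following REST principles and are fairly intuitive. For example:
GET /index_name/_doc/doc_id gets the document with the given ID.
POST /index_name/_doc/ creates a new document with an ID generated by Elasticsearch.
PUT /index_name/_doc/doc_id creates or replaces the document with the given ID.
DELETE /index_name/_doc/doc_id deletes the document with the given ID.
POST /index_name/_search runs a search query against the index.
For bulk operations you can also use:
POST /_bulk runs multiple create, delete, and update operations.
The method to use ultimately depends on the operation you want to perform and whether you need to send a payload, such as query JSON or document data. This matches Elasticsearch's RESTful style: resources (documents, indices) are identified by URLs, and actions (create, get) by HTTP methods.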

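What is the DSL, and what is its basic syntax?

DSL (Domain Specific Language) means a language specific to one domain. The Elasticsearch DSL is a JSON-style domain-specific language for building Elasticsearch queries. It lets you declare the searching, sorting, filtering, and other operations to perform. In Elasticsearch, the DSL drives the search capabilities as well as features such as aggregations and filtering.
The Elasticsearch DSL is based on JSON and has these basic components:
Query and filter

Query: specifies search conditions and computes a score for each document to determine relevance.
Match query: analyzes the provided text and checks whether the field contains the resulting terms.
Term query: matches the exact value of a field.
Range query: finds documents whose field value falls within a given range.
Filter: includes or excludes documents without computing a relevance score.
Term filter: exact-match filtering.
Range filter: range filtering, such as date or numeric ranges.
Bool filter: combines multiple filter conditions.
Sorting
Defines how documents are sorted by the values of one or more fields.
Aggregations
Group and compute statistics over search results, such as sum, average, maximum, and minimum.
Basic query example
A simple match query: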

{
  "query": {
    "match": {
      "field_name": "search_value"
    }
  }
}

Basic bool query example
The following bool query combines several conditions, using the must, must_not, should, and filter clauses to define their logical relationship:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "field1": "value1" } }
      ],
      "must_not": [
        { "match": { "field2": "value2" } }
      ],
      "should": [
        { "match": { "field3": "value3" } }
      ],
      "filter": [
        { "term": { "field4": "value4" } }
      ]
    }
  }
}

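In this example:

Conditions in must must match, like a logical AND.
Conditions in must_not must not match, like a logical NOT.
When there is no must or filter clause, at least one should condition must match, like a logical OR. Otherwise, should clauses are optional and only raise the score.
The filter clause filters results without affecting the score.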
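A toy evaluator makes these clause semantics easy to check. It is a deliberate simplification: both match and term are treated as exact equality, and the "score" just counts matching must and should clauses, whereas Elasticsearch computes BM25 relevance. The keyword `filters` stands in for the filter clause to avoid shadowing Python's built-in.

```python
def clause_matches(doc, clause):
    """Leaf check, simplified: match and term both mean exact equality here."""
    ((_kind, condition),) = clause.items()  # e.g. ("match", {"field1": "value1"})
    ((field, value),) = condition.items()
    return doc.get(field) == value


def bool_query(doc, must=(), must_not=(), should=(), filters=()):
    """Return (matched, toy_score) following the bool clause rules."""
    if not all(clause_matches(doc, c) for c in must):
        return False, 0
    if not all(clause_matches(doc, c) for c in filters):  # required, never scored
        return False, 0
    if any(clause_matches(doc, c) for c in must_not):
        return False, 0
    should_hits = sum(clause_matches(doc, c) for c in should)
    # Without must or filter clauses, at least one should clause must match.
    if should and not must and not filters and should_hits == 0:
        return False, 0
    # Toy score: only must and should clauses contribute.
    return True, len(must) + should_hits


query = {
    "must": [{"match": {"field1": "value1"}}],
    "must_not": [{"match": {"field2": "value2"}}],
    "should": [{"match": {"field3": "value3"}}],
    "filters": [{"term": {"field4": "value4"}}],  # the "filter" clause
}
doc = {"field1": "value1", "field3": "value3", "field4": "value4"}
print(bool_query(doc, **query))                        # (True, 2)
print(bool_query(dict(doc, field3="other"), **query))  # (True, 1): should is optional here
print(bool_query(dict(doc, field2="value2"), **query)) # (False, 0)
```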
Aggregation query example
A simple aggregation query:

{
  "size": 0,
  "aggs": {
    "popular_tags": {
      "terms": {
        "field": "tags.keyword"
      },
      "aggs": {
        "max_likes": {
          "max": {
            "field": "likes"
          }
        }
      }
    }
  }
}

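In this example, documents are first grouped by the "tags" field (through its tags.keyword sub-field), and the maximum "likes" value is found within each group. "size": 0 means no search hits are returned, only the aggregation results.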
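The aggregation's result can be reproduced in plain Python over a few in-memory documents. The sketch assumes each document has a list of tags and a numeric likes field. Buckets are ordered by document count and then by key, matching the terms aggregation's default order. The output shape loosely mirrors the buckets in the real response.

```python
from collections import defaultdict


def terms_with_max(docs, group_field, max_field, size=10, max_name="max_likes"):
    """In-memory imitation of a terms aggregation with a max sub-aggregation.

    Buckets are sorted by doc_count descending, then key ascending.
    A document with several tags is counted in each of its tags' buckets.
    """
    counts = defaultdict(int)
    maxima = {}
    for doc in docs:
        value = doc.get(max_field)
        for key in doc.get(group_field, []):
            counts[key] += 1
            if value is not None:
                maxima[key] = value if key not in maxima else max(maxima[key], value)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return [
        {"key": key, "doc_count": n, max_name: {"value": maxima.get(key)}}
        for key, n in ordered
    ]


docs = [
    {"tags": ["es", "db"], "likes": 5},
    {"tags": ["es"], "likes": 9},
    {"tags": ["db"], "likes": 2},
    {"tags": ["search"], "likes": 7},
]
for bucket in terms_with_max(docs, "tags", "likes"):
    print(bucket)
```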
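The Elasticsearch DSL is very flexible and powerful and can express queries from simple to very complex. In practice, writing these queries usually requires a solid understanding of your data model and your specific use case. Because JSON is hierarchical and composable, the DSL can express very complex logic.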

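What is the difference between data streams and indices in ES?

In Elasticsearch, data streams and indices are two different data structures, each with its own purpose and optimized scenarios.

Indices:

An index is Elasticsearch's basic data structure: a collection of documents with similar characteristics. Each document is a JSON object containing data that can be searched and retrieved. An index includes the inverted index, stored fields, and the mapping, which defines field names and data types. Together these let users run full-text search, exact-value search, filtering, and other operations quickly.

Indices are versatile and suit many uses. Managing large volumes of data in them can be challenging, especially time series data such as logs or metrics. Typical challenges are data lifecycle management and performance tuning.
Data streams:

Data streams are a feature introduced in Elasticsearch 7.9, designed specifically for append-only time series data. Behind a data stream is a series of hidden backing indices organized in time order. Users read and write through the data stream name rather than through the backing indices.

When you write to a data stream, Elasticsearch automatically routes the data to the right backing index, the current write index. Each backing index has a generation number. Over time, as the stream rolls over, new data is appended to a new backing index. Older backing indices stop receiving writes and, depending on the lifecycle policy, may eventually be frozen or deleted.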
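In recent versions, backing indices follow the naming pattern .ds-<data-stream>-<yyyy.MM.dd>-<generation>. The date is the backing index's creation date, and the generation is a six-digit, zero-padded counter. The toy model below is a hypothetical class, not an Elasticsearch API. It shows writes always targeting the newest backing index and rollover adding a new one.

```python
from datetime import date


def backing_index_name(data_stream: str, generation: int, created: date) -> str:
    """.ds-<data-stream>-<yyyy.MM.dd>-<generation>, generation zero-padded to 6 digits."""
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


class ToyDataStream:
    """Toy model: writes go to the newest backing index; rollover adds a new one."""

    def __init__(self, name: str, created: date):
        self.name = name
        self.generation = 1
        self.backing_indices = [backing_index_name(name, 1, created)]

    @property
    def write_index(self) -> str:
        return self.backing_indices[-1]

    def rollover(self, created: date) -> str:
        # The generation keeps counting even if older backing indices are deleted.
        self.generation += 1
        self.backing_indices.append(backing_index_name(self.name, self.generation, created))
        return self.write_index

    def delete_oldest(self) -> str:
        # Roughly what a lifecycle policy's delete phase does to old data.
        return self.backing_indices.pop(0)


ds = ToyDataStream("my-data-stream", date(2099, 3, 7))
ds.rollover(date(2099, 3, 8))
print(ds.backing_indices)
```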

The main advantages of data streams are: