Learning the ES (Elasticsearch) Database

TONG0S

Last modified 2023-07-24 18:26:37


Tags: elasticsearch, database, big data

First published 2023-07-24 18:22:58

Article link: https://blog.csdn.net/weixin_44035340/article/details/131902384


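I. Introduction to the ES database

1. Overview

ES is short for Elasticsearch, the distributed search and analytics engine at the core of the Elastic Stack. It is an open-source, highly scalable full-text search and analytics engine built on Apache Lucene, and it also provides NoSQL-database-style storage. It can store, search and analyze very large volumes of data in near real time. Elasticsearch offers a rich and flexible query language called the Query DSL, which can express complex and powerful queries. DSL (Domain Specific Language) queries are written as JSON request bodies.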

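Full-text search: an indexing program scans every word of a document and builds an index entry for each word. Each entry records how many times the word appears and where. At query time, the search program looks the query terms up in this prebuilt index and returns the matching documents to the user. The process resembles looking up a character through a dictionary's index, and search engines use it to find data in their databases.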
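The idea can be made concrete with a minimal inverted index written in plain Python. It is a toy that assumes a whitespace tokenizer plus lowercasing; real ES analyzers are far richer. `build_inverted_index` and `search` are hypothetical helper names, not ES APIs. For every term, the index records which documents contain it and at which positions, so the term frequency falls straight out of the posting list.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only (not part of any ES client).
def build_inverted_index(docs):
    """Map term -> {doc_id: [positions]} using a toy whitespace tokenizer."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search(index, term):
    """Return {doc_id: term frequency} for every document containing `term`."""
    postings = index.get(term.lower(), {})
    return {doc_id: len(positions) for doc_id, positions in postings.items()}

docs = {1: "quick brown fox", 2: "the fox jumps over the lazy dog", 3: "quick quick slow"}
idx = build_inverted_index(docs)
print(search(idx, "quick"))  # {1: 1, 3: 2}
print(idx["fox"])            # {1: [2], 2: [1]}
```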

ELK is the combination of Elasticsearch, Logstash and Kibana.
The core ES concepts are index, document, cluster, node and instance.

2. Setting up ES

See the Kibana setup guide for details.

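3. Features

① Built on Java/Lucene; supports near-real-time search

② Distributed deployment; scales out horizontally by adding nodes to the cluster

③ Handles millions of records

④ Supports multi-condition queries, such as aggregation queries

⑤ Highly available: data can be split into shards and backed up with replicas

⑥ Exposes a RESTful API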

4. Comparing ES with relational databases

Relational database ⇒ Database ⇒ Table ⇒ Row ⇒ Column
Elasticsearch ⇒ Index ⇒ Type ⇒ Document ⇒ Field

ES	Relational database
Index	Database
Type	Table
Mapping	Schema
Document	Row
Field	Column
Inverted index	Forward index
DSL query	SQL query

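Segment: Lucene stores data in segments; each segment is a self-contained subset of the data.

Commit point: records the set of all segments in a Lucene index.

Lucene index: made up of a set of segments plus a commit point.

Lucene: Apache's open-source full-text search library, distributed as a Java jar.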

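	Redis	MySQL	Elasticsearch	HBase	Hadoop/Hive
Capacity / scalability	Low	Medium	Large	Massive	Massive
Query timeliness	Very high	Medium	High	Medium	Low
Query flexibility	Poor (key-value only)	Very good, supports SQL	Good: weak at joins, but supports full-text search, and the DSL handles filtering, matching, sorting, aggregation and more	Poor: relies mainly on the rowkey; scans perform badly unless you build secondary indexes	Very good, supports SQL
Write speed	Very fast	Medium	Fast	Fast	Slow
Consistency / transactions	Weak	Strong	Weak	Weak	Weak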

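① A database in a relational database corresponds to an index in ES.
② A database contains N tables, just as an index contains N types. Mapping types are legacy: since 7.x every index has the single type _doc, and 8.x removes types entirely.
③ A table's data consists of rows and columns (attributes), just as a type consists of documents and fields.
④ In a relational database, the schema defines the tables, each table's columns, and the relationships between tables and columns. The corresponding concept in ES is the mapping. It defines how the fields of an index are handled: the field types, whether and how each field is indexed, whether the original JSON document is stored (and compressed), and whether and how each field is analyzed (tokenized).
⑤ The database operations insert, delete, update and select correspond in ES to PUT/POST, DELETE, _update and GET.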

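5. Analyzers

The default ES analyzer is standard. It splits Chinese text into single characters.

With the IK analyzer plugin installed, two modes are available:

ik_smart keeps "清华大学" (Tsinghua University) as a single token.

ik_max_word splits "清华大学" into "清华大学", "清华" and "大学".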

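6. Documents

6.1 Overview

ES is document-oriented. A document is the smallest searchable unit in ES. It consists of one or more fields, much like a row in a relational database.

6.2 Metadata

_index: name of the index the document belongs to
_source: the original JSON data
_type: the document's type; since ES 7.0 it is always _doc
_version: the document version, incremented each time the document is modified
_score: relevance score
_id: the document's unique ID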

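7. Field types

Core data types
- String: text, keyword (not analyzed; matches only the complete value)
- Numeric: long, integer, short, byte, double, float, half_float, scaled_float
- Boolean: boolean
- Binary: binary (a Base64-encoded string; not searchable)
- Range: integer_range, float_range, long_range, double_range, date_range
- Date: date
Complex data types
- Array: ES has no dedicated array type. You simply supply an array when indexing a document, e.g. [1, 2] (integer array), ["1", "2"] (string array), or [{"name": "nick"}, {"name": "elaine"}] (array of objects). All values in one array must have the same type.
- Object: a JSON object, i.e. a field that contains other fields
Specialized data types, such as ip.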

Commands

Environment:

Kibana → Dev Tools console

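1. Basic commands

method	URL	description
PUT	localhost:9200/index_name/type_name/doc_id	create a document with a given ID
POST	localhost:9200/index_name/type_name	create a document with a random ID
POST	localhost:9200/index_name/type_name/doc_id/_update	update a document
DELETE	localhost:9200/index_name/type_name/doc_id	delete a document
GET	localhost:9200/index_name/type_name/doc_id	get a document by ID
POST	localhost:9200/index_name/type_name/_search	search all documents

Since ES 7.x the type name is always _doc, and partial updates use the typeless form POST localhost:9200/index_name/_update/doc_id.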

String types
text, keyword
Numeric types
long, integer, short, byte, double, float, half_float, scaled_float
Date type
date
Boolean type
boolean
Binary type
binary
Range types
integer_range, float_range, long_range, double_range, date_range

1.1 List all _cat commands

GET /_cat

1.2 Check cluster health

GET /_cat/health?v

Result:

epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1684116543 02:09:03  my-application yellow       1         1     15  15    0    0        2             0                  -                 88.2%

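cluster: cluster name
status: cluster status. green means healthy. yellow means all primary shards are allocated but at least one replica is missing, so the data is still complete. red means some primary shards are unavailable, and data may have been lost.
node.total: number of nodes online
node.data: number of data nodes online
shards: active_shards, the number of active shards
pri: active_primary_shards, the number of active primary shards. With one replica per primary and all replicas allocated, shards is twice pri.
relo: relocating_shards, shards being moved; normally 0
init: initializing_shards, shards being initialized; normally 0
unassign: unassigned_shards, shards not yet allocated; normally 0
pending_tasks: cluster-level tasks waiting to run, such as shard relocation; normally 0
max_task_wait_time: the longest time a pending task has waited
active_shards_percent: percentage of shards that are active; normally 100%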
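You can cross-check the active_shards_percent in the sample output above against the other columns. The sketch below parses `_cat/health?v` text (a header row plus data rows). It recomputes the percentage as active / (active + initializing + unassigned), which approximates how ES derives it. `parse_cat` is a hypothetical helper. The sample has 15 active shards and 2 unassigned shards, giving 15 / 17 ≈ 88.2%. Since pri equals shards, every active shard is a primary. The two unassigned shards must therefore be replicas, which cannot be placed on the same single node as their primaries, and that is why the status is yellow.

```python
# Sample copied from the `_cat/health?v` output above.
SAMPLE = """\
epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1684116543 02:09:03  my-application yellow          1         1     15  15    0    0        2             0                  -                 88.2%
"""

def parse_cat(text):
    """Hypothetical helper: first line is the header, one dict per data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

row = parse_cat(SAMPLE)[0]
active = int(row["shards"])
total = active + int(row["init"]) + int(row["unassign"])  # all shards the cluster should have
print(row["status"], f"{100 * active / total:.1f}%", row["active_shards_percent"])
# yellow 88.2% 88.2%
```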

1.3 View node information

GET /_cat/nodes?v

1.4 View a specific node (node-1)

GET /_nodes/nodeName?pretty=true
Example: GET /_nodes/node-1?pretty=true

2. Index commands

2.1 List all indices

GET /_cat/indices?v

Result:

health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index_1     rerwerwrewrwrwe  20   1        208            0      1.1mb        609.8kb
green  open   index_2     eewfdsffhwehfoeif3  30   1          4            1    222.4kb        111.2kb

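health: green means healthy. yellow means all primary shards are allocated but at least one replica is missing, so the data is still complete. red means some primary shards are unavailable, and data may have been lost.
pri: short for primary, the number of primary shards
rep: the number of replicas per primary shard
docs.count: document count at the Lucene level (hidden nested documents are counted too)
docs.deleted: deleted documents not yet purged by segment merges
store.size: total size of all shards, including replicas
pri.store.size: size of the primary shards only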

2.2 Create an index

PUT /test

Successful response:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test"
}

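demo1 (the test index already exists after the step above, so run DELETE /test first; otherwise this PUT fails with resource_already_exists_exception):

# Custom mapping with explicit field types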
PUT /test
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart"  # analyzer to use (requires the IK plugin)
      },
      "email": {
        "type": "keyword", # field type
        "index": false     # not added to the inverted index (generally not searchable)
      },
      "name": {
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

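demo2

#----------- user index -----------
# Create the index without a custom mapping
PUT /user

# Without a custom mapping, ES applies defaults: field types are detected dynamically and default shard settings are used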
PUT /user/_doc/1
{
  "name":"张三",
  "age":10,
  "sex":"男",
  "address":"江苏苏州"
}

GET /user/_search

# Bulk-create documents (create fails if the _id already exists)
POST _bulk
{"create":{"_index":"user", "_type":"_doc", "_id":2}}
{"id":2,"name":"李四","age":"20","sex":"男","address":"苏州园区"}
{"create":{"_index":"user", "_type":"_doc", "_id":3}}
{"id":3,"name":"王芳","age":"30","sex":"女","address":"园区华为"}
{"create":{"_index":"user", "_type":"_doc", "_id":4}}
{"id":4,"name":"赵六","age":"40","sex":"女","address":"华为汽车"}

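# Fetch multiple documents at once (_mget)
docs: array of documents to fetch
_index: index name
_type: type name (legacy; always _doc since 7.x)
_id: document ID
_source: fields to return
--------------------------------------------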
GET _mget
{
  "docs": [
    {
      "_index": "user",
      "_type": "_doc",
      "_id": 1
    },
    {
      "_index": "user",
      "_type": "_doc",
      "_id": 2
    }
  ]
}

GET /user/_mget
{
  "docs": [
    {
      "_type": "_doc",
      "_id": 1
    },
    {
      "_type": "_doc",
      "_id": 2
    }
  ]
}

GET /user/_doc/_mget
{
  "docs": [
    {
      "_id": 1
    },
    {
      "_id": 2
    }
  ]
}

GET /user/_mget
{
  "docs": [
    {
      "_id": 1
    },
    {
      "_id": 2
    },
    {
      "_id": 3
    },
    {
      "_id": 4
    }
  ]
}

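# Bulk index: creates the document if it does not exist and replaces it if it does (the last action uses create, which fails with a version conflict because _id 4 already exists)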
POST _bulk
{"index":{"_index":"user", "_type":"_doc", "_id":2}}
{"id":2,"name":"李四","age":"20","sex":"男","address":"苏州园区"}
{"index":{"_index":"user", "_type":"_doc", "_id":3}}
{"id":3,"name":"王芳","age":"30","sex":"女","address":"园区华为"}
{"create":{"_index":"user", "_type":"_doc", "_id":4}}
{"id":4,"name":"赵六","age":"40","sex":"女","address":"华为汽车"}

# Bulk partial update
POST _bulk
{"update":{"_index":"user","_type":"_doc","_id":2}}
{"doc":{"address":"苏州园区XX"}}
{"update":{"_index":"user","_type":"_doc","_id":3}}
{"doc":{"address":"园区华为XX"}}

# Bulk delete
POST _bulk
{"delete":{"_index":"user", "_type":"_doc", "_id":3}}
{"delete":{"_index":"user", "_type":"_doc", "_id":4}}
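All of the _bulk requests above share one NDJSON layout. Each entry is an action line followed by a source line, except delete, which has no source line, and every line ends with a newline. Below is a small Python sketch that assembles such a body. `bulk_body` is a hypothetical helper, and the typeless action lines (no _type) assume ES 7.x or later. For update actions, the caller supplies the {"doc": ...} wrapper as the source.

```python
import json

VALID_OPS = {"create", "index", "update", "delete"}

def bulk_body(index, items, op="create"):
    """Hypothetical helper: build an NDJSON _bulk body from (doc_id, source) pairs."""
    if op not in VALID_OPS:
        raise ValueError(f"unsupported bulk op: {op}")
    lines = []
    for doc_id, source in items:
        lines.append(json.dumps({op: {"_index": index, "_id": doc_id}}, ensure_ascii=False))
        if op != "delete":  # delete actions have no source line
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

print(bulk_body("user", [(2, {"name": "李四", "age": 20})]), end="")
print(bulk_body("user", [(2, {"doc": {"address": "苏州园区XX"}})], op="update"), end="")
print(bulk_body("user", [(3, None), (4, None)], op="delete"), end="")
```

The resulting string can be sent as the body of POST _bulk with the Content-Type application/x-ndjson.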

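2.3 Delete an index ("acknowledged": true means the deletion succeeded)

DELETE /test

2.4 View index statistics

GET /_stats?pretty

2.5 Modify an index

ES is built on an inverted index. Changing the data structure of an existing field (for example, switching its analyzer) would require rebuilding the inverted index, which is extremely costly. For this reason, an existing field's mapping cannot be changed once the index has been created.

Existing fields in a mapping cannot be modified, but new fields can be added, because adding them does not affect the existing inverted index.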

Method 1: overwrite with PUT (replaces the entire document)

PUT first/_doc/1
{
  "name":"林",
  "age":18,
  "from":"gu",
  "desc":"念能力,学生，暗属性",
  "tags":["能力者","男","暗"]
}

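Method 2: partial update with POST

Use POST with _update after the document ID, and put the fields to change inside doc. Fields not listed in doc keep their current values. On ES 7.x and later, the preferred path is POST first/_update/3.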

POST first/_doc/3/_update 
{
  "doc": {
    "name":"愚者",
    "desc":"塔罗",
    "tags":["魔法","超能力","塔罗"]
  }
}
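Methods 1 and 2 differ in how _source is rewritten: PUT replaces the whole document, while _update merges doc into the stored source. The plain-Python sketch below approximates that merge. Nested objects are merged recursively, while scalars and arrays are replaced outright. `apply_partial_update` is a hypothetical name; the function models the behaviour and is not ES code.

```python
import copy

def apply_partial_update(source, doc):
    """Hypothetical model of the update API's partial-document merge."""
    merged = copy.deepcopy(source)  # never mutate the stored document
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are replaced
    return merged

stored = {"name": "林", "age": 22, "desc": "念能力", "tags": ["能力者", "男"]}
patch = {"name": "愚者", "desc": "塔罗", "tags": ["魔法", "超能力", "塔罗"]}
result = apply_partial_update(stored, patch)
print(result)
# age survives because the patch does not mention it; a PUT with `patch` would drop it
```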

2.6 Insert data

PUT first/_doc/1
{
  "name":"林",
  "age":18,
  "from":"gu",
  "desc":"念能力",
  "tags":["能力者","学院","男"]
}

PUT first/_doc/2
{
  
  "name":"宝儿姐",
  "age":22,
  "from":"gu", 
  "desc":"道法",
  "tags":["道", "驱魔","女"]
}

2.7 Querying an index

2.7.1 View a specific index

GET /first?pretty  # view the index structure (settings and mappings)

GET /first/_search # view all documents, like SELECT * FROM first
or
GET /first/_search
{
  "query": {
    "match_all": {}
  }
}

{
  "took" : 787,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "春生",
          "age" : 18,
          "from" : "gu",
          "desc" : "念能力,学生，暗属性",
          "tags" : [
            "能力者",
            "男",
            "暗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "愚者",
          "age" : 22,
          "from" : "gu",
          "desc" : "塔罗",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "宝儿姐",
          "age" : 18,
          "from" : "sheng",
          "desc" : "道法",
          "tags" : [
            "长生",
            "超能力",
            "道法"
          ]
        }
      }
    ]
  }
}

2.7.2 Simple query

GET first/_search?q=from:gu
# The request below returns the same result, with the condition placed in a match query
GET /first/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.4700036,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 0.4700036,
        "_source" : {
          "name" : "愚者",
          "age" : 22,
          "from" : "gu",
          "desc" : "塔罗",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : 0.4700036,
        "_source" : {
          "name" : "春生",
          "age" : 18,
          "from" : "gu",
          "desc" : "念能力,学生，暗属性",
          "tags" : [
            "能力者",
            "男",
            "暗"
          ]
        }
      }
    ]
  }
}

2.7.3 Controlling the returned fields

Use _source to return only the listed fields

GET /first/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["tags","name"]
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "宝儿姐",
          "tags" : [
            "长生",
            "超能力",
            "道法"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "愚者",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "春生",
          "tags" : [
            "能力者",
            "男",
            "暗"
          ]
        }
      }
    ]
  }
}

2.7.4 Sorting: sort

order: desc (descending) or asc (ascending)

GET /first/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["age","name"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "宝儿姐",
          "age" : 18
        },
        "sort" : [
          18
        ]
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "春生",
          "age" : 18
        },
        "sort" : [
          18
        ]
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "愚者",
          "age" : 22
        },
        "sort" : [
          22
        ]
      }
    ]
  }
}

2.7.5 Pagination: from and size

GET /first/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["age","name"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ],
  "from": 0, # offset of the first hit to return (0-based)
  "size": 1  # number of hits to return
}
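Because from is a 0-based offset, a 1-based page number maps to from = (page - 1) * size. Below is a hedged sketch of that arithmetic. It also enforces the default index.max_result_window of 10,000, since from + size must not exceed that limit. `page_params` is a hypothetical helper.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def page_params(page, size):
    """Hypothetical helper: turn a 1-based page number into from/size for a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window; use search_after instead")
    return {"from": offset, "size": size}

print(page_params(1, 1))   # {'from': 0, 'size': 1}  (the request above)
print(page_params(3, 10))  # {'from': 20, 'size': 10}
```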

2.7.6 Boolean queries

must

SELECT age, name FROM first WHERE from = 'gu' AND age = 18

GET /first/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "from": "gu"
        }
        },
        {"match": {
          "age": "18"}
        }
      ]
    }
  },
  "_source": ["age","name"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "春生",
          "age" : 18
        },
        "sort" : [
          18
        ]
      }
    ]
  }
}

should

SELECT age, name, from FROM first WHERE from = 'gu' OR age = 18

GET /first/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "from": "gu"
        }
        },
        {"match": {
          "age": "18"}
        }
      ]
    }
  },
  "_source": ["age","name","from"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

must_not

SELECT age, name, from FROM first WHERE from != 'gu' AND age != 22

GET /first/_search
{
  "query": {
    "bool": {
      "must_not": [
        {"match": {
          "from": "gu"
        }
        },
        {"match": {
          "age": "22"}
        }
      ]
    }
  },
  "_source": ["age","name","from"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

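filter

Range conditions are expressed with range:

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to

Documents must match every clause in filter, but these clauses do not contribute to the relevance score, so ES can cache them.

SELECT age, name, from FROM first WHERE from = 'gu' AND age >= 18 AND age <= 20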

GET /first/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "from": "gu"
        }
        }
      ],
      "filter": [
        {"range": {
          "age": {
            "gte": 18,
            "lte": 20
          }
        }}
      ]
    }
  },
  "_source": ["age","name","from"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
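The four bool clauses map roughly onto SQL as follows:
- must → AND
- should → OR (with no must or filter clause, at least one should clause has to match)
- must_not → AND NOT
- filter → AND, without scoring

Below is a small Python sketch that assembles these request bodies. `match`, `range_query` and `bool_query` are hypothetical helper names, not the client library's API.

```python
# Hypothetical helpers for building Query DSL bodies as plain dicts.
def match(field, value):
    return {"match": {field: value}}

def range_query(field, **bounds):
    """bounds may use gt, gte, lt, lte."""
    unknown = set(bounds) - {"gt", "gte", "lt", "lte"}
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    return {"range": {field: bounds}}

def bool_query(must=(), should=(), must_not=(), filter_=()):
    """Build {"query": {"bool": ...}}, omitting empty clause lists."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filter_}
    return {"query": {"bool": {name: list(c) for name, c in clauses.items() if c}}}

# from = 'gu' AND age BETWEEN 18 AND 20 (the filter example above)
body = bool_query(must=[match("from", "gu")],
                  filter_=[range_query("age", gte=18, lte=20)])
print(body)
```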

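2.7.7 Match and phrase search (also works on array fields)

Full-text match: match

GET /first/_search
{
  "query": {
    "match": {
      "tags": "暗 魔"  # space-separated; by default a document matches if any term matches (OR)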
    }
    } 
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0732633,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : 1.0732633,
        "_source" : {
          "name" : "春生",
          "age" : 18,
          "from" : "gu",
          "desc" : "念能力,学生，暗属性",
          "tags" : [
            "能力者",
            "男",
            "暗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 0.9403362,
        "_source" : {
          "name" : "愚者",
          "age" : 22,
          "from" : "gu",
          "desc" : "塔罗",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        }
      }
    ]
  }
}

Exact phrase match: match_phrase (all terms must appear, adjacent and in the same order)

GET /first/_search
{
  "query": {
    "match_phrase": {
      "tags": "魔法"
    }
    } 
}

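2.7.8 term queries

A term query looks up the given term directly in the inverted index, which makes it an exact lookup.

Difference between term and match:

match is analyzed. The query text is first run through the field's analyzer, the same one used on the documents at index time, so different analyzers produce different tokens. The resulting tokens are then matched.
term is not analyzed. The value is looked up in the inverted index exactly as given.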
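Here is a toy illustration of why term and match behave differently on text fields. It also shows why term queries on dynamically mapped strings usually target the .keyword sub-field. `toy_analyze` only approximates the standard analyzer: it lowercases, splits on non-word characters, and emits one token per CJK character. It is an assumption-laden sketch, not ES's tokenizer, and `term_hits`/`match_hits` are hypothetical names.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")

def toy_analyze(text):
    """Hypothetical stand-in for the standard analyzer."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        latin = ""
        for ch in word:
            if CJK.match(ch):          # each CJK character becomes its own token
                if latin:
                    tokens.append(latin)
                    latin = ""
                tokens.append(ch)
            else:
                latin += ch
        if latin:
            tokens.append(latin)
    return tokens

def term_hits(field_text, value):
    # term: the value is NOT analyzed; it must equal one indexed token
    return value in toy_analyze(field_text)

def match_hits(field_text, query):
    # match: the query is analyzed too; tokens are OR-ed by default
    indexed = set(toy_analyze(field_text))
    return any(tok in indexed for tok in toy_analyze(query))

print(toy_analyze("Hello World 长生"))     # ['hello', 'world', '长', '生']
print(term_hits("Hello World", "Hello"))   # False: the index holds 'hello'
print(match_hits("Hello World", "Hello"))  # True
print(term_hits("长生", "长生"))            # False: indexed as '长' and '生'
```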

2.7.8.1 Field exists: exists

GET /first/_search
{
  "query": {
    "exists": {
      "field": "from"
    }
  }
  
}

2.7.8.2 ID query: ids

ids looks documents up by _id

GET /first/_search
{
  "query": {
    "ids": {
      "values": [3, 1]
    }
  }
}

2.7.8.3 Prefix: prefix

Finds documents where a field contains a term that starts with the given prefix

GET /first/_search
{
  "query": {
    "prefix": {
      "desc": {
        "value": "道"
      }
    }
  }
}

Roughly: SELECT * FROM first WHERE desc LIKE '道%'

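2.7.8.4 Exact term match: term

This is the most common exact-value lookup. On a dynamically mapped string field, query the keyword sub-field, because the standard analyzer splits the text field itself into single Chinese characters.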

GET /first/_search
{
  "query": {
    "term": {
      "tags.keyword": "长生"
    }
  }
}

select * from first where "长生" in tags

2.7.8.5 Multiple terms: terms

Matches documents containing any of the listed terms (the terms are OR-ed)

GET /test-dsl-term-level/_search
{
  "query": {
    "terms": {
      "programming_languages": ["php","c++"]
    }
  }
}

2.7.8.6 Wildcard: wildcard

GET /first/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "儿*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}

Roughly: SELECT * FROM first WHERE name LIKE '%儿%' (on an analyzed field, 儿* matches any indexed term that starts with 儿)

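2.7.8.7 Fuzzy match: fuzzy

The official documentation defines fuzzy matching in terms of edit distance: the number of one-character changes needed to turn one term into another. These changes can be:

Changing a character (box → fox)
Removing a character (black → lack)
Inserting a character (sic → sick)
Transposing two adjacent characters (act → cat)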

GET /first/_search
{
  "query": {
    "fuzzy": {
      "from": {
        "value": "shong"
      }
    }
  }
}
# matches "sheng" in the from field (one edit away)
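The edit operations listed above can be modelled with the optimal string alignment (OSA) distance, a variant of Damerau-Levenshtein distance. Transposition is included because the fuzzy query's transpositions option defaults to true. The sketch below computes that distance in Python. It also encodes the rule behind the default fuzziness: AUTO, which allows 0 edits for terms shorter than 3 characters, 1 edit for 3 to 5 characters, and 2 edits for longer terms. `osa_distance` and `auto_fuzziness` are hypothetical helper names. "shong" has 5 characters, so one edit is allowed, and it is exactly one substitution away from "sheng".

```python
def osa_distance(a, b):
    """Hypothetical helper: substitutions, insertions, deletions and adjacent
    transpositions each cost 1 (optimal string alignment distance)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def auto_fuzziness(term, low=3, high=6):
    """Hypothetical helper: edits allowed by fuzziness AUTO:low,high (default AUTO:3,6)."""
    if len(term) < low:
        return 0
    return 1 if len(term) < high else 2

for a, b in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat"), ("shong", "sheng")]:
    print(a, b, osa_distance(a, b))
print(osa_distance("shong", "sheng") <= auto_fuzziness("shong"))  # True
```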

2.8 Highlighting

GET /first/_search
{
  "query": {
    "match_phrase": {
      "tags": "魔法"
    }
    },
    "highlight": {
      "fields": {
        "tags": {}
      }
    }
}

Result

{
  "took" : 108,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.390936,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 1.390936,
        "_source" : {
          "name" : "愚者",
          "age" : 22,
          "from" : "gu",
          "desc" : "塔罗",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        },
        "highlight" : {
          "tags" : [
            "<em>魔</em><em>法</em>"  #this 
          ]
        }
      }
    ]
  }
}

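2.9 Deep pagination

from + size cannot exceed index.max_result_window (10,000 by default), so paging deeper requires scroll or search_after.

ES deep pagination: https://blog.csdn.net/weixin_44799217/article/details/127100272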
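search_after avoids large offsets. Each request sorts on a combination of fields that is unique per document, and passes the sort values of the previous page's last hit as "search_after". The in-memory Python sketch below imitates that cursor logic. `search_after_pages` is a hypothetical helper, and the sample hits (with a unique `id` field used as the tiebreaker) are made up. Against a real cluster, the cursor values come from each hit's sort array. Since 7.10, ES recommends pairing search_after with a point in time (PIT) so that pages stay consistent.

```python
def search_after_pages(hits, size, sort_key):
    """Hypothetical helper: yield pages, each starting strictly after the previous page's last sort key."""
    ordered = sorted(hits, key=sort_key)
    after = None  # like omitting "search_after" on the first request
    while True:
        page = [h for h in ordered if after is None or sort_key(h) > after][:size]
        if not page:
            return
        yield page
        after = sort_key(page[-1])  # the last hit's sort values become the next cursor

hits = [{"id": "1", "age": 18}, {"id": "3", "age": 22}, {"id": "2", "age": 18}]
# Sort by age, then by a unique id field as a tiebreaker, so every sort key is unique
pages = list(search_after_pages(hits, size=2, sort_key=lambda h: (h["age"], h["id"])))
print([[h["id"] for h in page] for page in pages])  # [['1', '2'], ['3']]
```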

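3.0 Regular expression syntax

Reference: Regular expression syntax | Elasticsearch Guide [8.7] | Elastic (elastic.co)

# Find all external (public) IPs with @&~( ... )

The @ operator matches any entire string. Combined with the & (intersection) and ~ (complement) operators, it creates "everything except" logic. For example, @&~(abc.+) matches everything except terms beginning with abc.
The <> operators match a numeric range. For example:

foo<1-100>      # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
foo<01-100>     # matches 'foo01', 'foo02' ... 'foo99', 'foo100'

These optional operators are controlled by the regexp query's flags parameter, which defaults to ALL.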

GET /_indexs-20230523*/_search
{
    "query": {
        "regexp":{
            "realip": "@&~((192\\.168\\.<0-255>\\.<0-255>)|(10\\..*)|(172\\.<16-31>\\.<0-255>\\.<0-255>))"
        }
    }
}
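The regexp above excludes the three RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. When you post-process results in Python, you can write the same test with the standard ipaddress module instead of a regex. `is_external` is a hypothetical helper. Unlike the ES regexp, it raises ValueError on strings that are not valid IP addresses instead of treating them as external.

```python
import ipaddress

# The three private ranges the regexp above excludes
PRIVATE_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_external(ip):
    """Hypothetical helper: True if `ip` lies outside 10/8, 172.16/12 and 192.168/16."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    return not any(addr in net for net in PRIVATE_NETS)

for ip in ["8.8.8.8", "10.1.2.3", "172.31.0.1", "172.32.0.1", "192.168.1.10"]:
    print(ip, is_external(ip))
```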

3.1 Aggregations

3.1.1 Single aggregation

GET /test-agg-cars/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color.keyword"
            }
        }
    }
}

# Sample source document
 {
        "_index" : "test-agg-cars",
        "_type" : "_doc",
        "_id" : "W8W6dYgBfCbtsoUlEOxh",
        "_score" : 1.0,
        "_source" : {
          "price" : 30000,
          "color" : "green",
          "make" : "ford",
          "sold" : "2014-05-18"
        }
      },
# Response
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    }
  }
}

select color,count(color) from test-agg-cars group by color

3.1.2 Multiple aggregations

{
  "aggs": {
    "actionflag_info": {
      "terms": {
        "script": {
          "source": "doc['host'].value + ':' + doc['post'].value",
          "lang": "painless"
        },
        "size": 1000
      }
    }
  }
}

# roughly: SELECT host + ':' + post, COUNT(*) FROM ttt GROUP BY host, post

{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color.keyword"
            }
        },
        "make_by" : { 
            "terms" : { 
              "field" : "make.keyword"
            }
        }
    }
}

select color,count(color) from test-agg-cars group by color
select make,count(make) from test-agg-cars group by make

  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    },
    "make_by" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : 3
        },
        {
          "key" : "ford",
          "doc_count" : 2
        },
        {
          "key" : "toyota",
          "doc_count" : 2
        },
        {
          "key" : "bmw",
          "doc_count" : 1
        }
      ]
    }
  }

GET /test-agg-cars/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color.keyword"
         },
         "aggs": { 
            "avg_price": { 
               "avg": {
                  "field": "price" 
               }
            }
         }
      }
   }
}

select color,count(color),avg(price) from test-agg-cars group by color
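To see what the terms + avg request computes, here is the same GROUP BY in plain Python. The sample documents are illustrative only: they are chosen so the colour counts match the response shown earlier (red 4, blue 2, green 2), and the prices are not taken from this article. Buckets follow a terms aggregation's default order, doc_count descending with ties broken by key ascending. `terms_with_avg` is a hypothetical helper.

```python
from collections import defaultdict

def terms_with_avg(docs, group_field, metric_field):
    """Hypothetical helper: bucket docs by group_field, count them, average metric_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [{"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
               for key, vals in groups.items()]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))  # _count desc, then _key asc
    return buckets

# Illustrative sample data (hypothetical prices)
cars = [
    {"color": "red", "price": 10000}, {"color": "red", "price": 20000},
    {"color": "green", "price": 30000}, {"color": "blue", "price": 15000},
    {"color": "green", "price": 12000}, {"color": "red", "price": 20000},
    {"color": "red", "price": 80000}, {"color": "blue", "price": 25000},
]
buckets = terms_with_avg(cars, "color", "price")
for bucket in buckets:
    print(bucket)
# {'key': 'red', 'doc_count': 4, 'avg': 32500.0}
# {'key': 'blue', 'doc_count': 2, 'avg': 20000.0}
# {'key': 'green', 'doc_count': 2, 'avg': 21000.0}
```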

3.1.3 Filtered aggregation

GET /test-agg-cars/_search
{
  "size": 0,
  "aggs": {
    "make_by": {
      "filter": { "term": { "make.keyword": "honda" } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

SELECT COUNT(*), AVG(price) FROM test-agg-cars WHERE make = 'honda'

3.1.4 Numeric range aggregation

GET /test-agg-cars/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 20000 },
          { "from": 20000, "to": 40000 },
          { "from": 40000 }
        ]
      }
    }
  }
}

SELECT COUNT(*) FROM test-agg-cars WHERE price < 20000
SELECT COUNT(*) FROM test-agg-cars WHERE price >= 20000 AND price < 40000
SELECT COUNT(*) FROM test-agg-cars WHERE price >= 40000
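The range aggregation includes each range's from value and excludes its to value. A price of exactly 20000 therefore lands in the second bucket, not the first. Below is a Python sketch of that bucketing rule. `range_buckets` is a hypothetical helper, and the prices are made-up sample values.

```python
def range_buckets(values, ranges):
    """Hypothetical helper: `from` is inclusive, `to` is exclusive, a missing bound is open-ended."""
    result = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(1 for v in values
                    if (low is None or v >= low) and (high is None or v < high))
        result.append({**r, "doc_count": count})
    return result

prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]  # made-up sample
result = range_buckets(prices, [{"to": 20000},
                                {"from": 20000, "to": 40000},
                                {"from": 40000}])
for bucket in result:
    print(bucket)
# {'to': 20000, 'doc_count': 3}
# {'from': 20000, 'to': 40000, 'doc_count': 4}
# {'from': 40000, 'doc_count': 1}
```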

3.1.5 Aggregating IP fields: ip_range

GET /ip_addresses/_search
{
  "size": 10,
  "aggs": {
    "ip_ranges": {
      "ip_range": {
        "field": "ip",
        "ranges": [
          { "to": "10.0.0.5" },
          { "from": "10.0.0.5" }
        ]
      }
    }
  }
}

3.1.6 Grouping by CIDR mask

Ranges can also be given as CIDR masks

GET /ip_addresses/_search
{
  "size": 0,
  "aggs": {
    "ip_ranges": {
      "ip_range": {
        "field": "ip",
        "ranges": [
          { "mask": "10.0.0.0/25" },
          { "mask": "10.0.0.127/25" }
        ]
      }
    }
  }
}
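A CIDR mask covers a whole block of addresses, and any host bits written into the mask are ignored. Python's standard ipaddress module shows that 10.0.0.127/25 normalizes to the same block as 10.0.0.0/25 (10.0.0.0 to 10.0.0.127). If ES normalizes the mask the same way, both buckets in the request above cover the same 128 addresses, and 10.0.0.128/25 would be needed for the upper half. `mask_range` is a hypothetical helper.

```python
import ipaddress

def mask_range(mask):
    """Hypothetical helper: (first address, last address, size) of a CIDR mask."""
    net = ipaddress.ip_network(mask, strict=False)  # strict=False tolerates host bits like .127
    return str(net.network_address), str(net.broadcast_address), net.num_addresses

for mask in ["10.0.0.0/25", "10.0.0.127/25", "10.0.0.128/25"]:
    print(mask, mask_range(mask))
# 10.0.0.0/25 ('10.0.0.0', '10.0.0.127', 128)
# 10.0.0.127/25 ('10.0.0.0', '10.0.0.127', 128)
# 10.0.0.128/25 ('10.0.0.128', '10.0.0.255', 128)
```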

3.1.7 Aggregating date fields: date_range

A range aggregation dedicated to date values

GET /test-agg-cars/_search
{
  "size": 0,
  "aggs": {
    "range": {
      "date_range": {
        "field": "sold",
        "format": "yyyy-MM",
        "ranges": [
          { "from": "2014-01-01" },  
          { "to": "2014-12-31" } 
        ]
      }
    }
  }
}

3.2 Metric aggregations

3.2.1 avg (average)

POST /exams/_search?size=0
{
  "aggs": {
    "avg_grade": { "avg": { "field": "grade" } }
  }
}

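II. The Python elasticsearch module

1. Inserting data

1.1 Single-document inserts (one request per document, which is expensive; not recommended for large volumes)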

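from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumption: a local single-node cluster

def create_data():
    """ Write data one document at a time (one HTTP request per document) """
    for line in range(100):
        # document= is the elasticsearch-py 8.x keyword; doc_type was removed along with mapping types
        es.index(index='second', document={'title': line})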

1.2 Bulk insert

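# helpers.bulk (from the elasticsearch.helpers module) writes large amounts of data in batches. First define every document as a dict (an "action").
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch("http://localhost:9200")  # assumption: a local single-node cluster

def batch_data():
    """ Write data in bulk """
    actions = [{
        "_index": "second",
        # no "_type": mapping types are deprecated in 7.x and removed in 8.x
        "_source": {
            "title": i
        }
    } for i in range(100)]
    # helpers.bulk returns (number of successful actions, list of errors)
    success, errors = helpers.bulk(es, actions)
    print(success, errors)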
