Installing and Deploying ES, and Basic Operations
1. Background
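Web search engines such as Baidu and Google fuzzily match the keywords we type, return relevant results, and highlight those keywords. Relational databases handle this kind of large-scale unstructured text retrieval poorly: (1) they do not support unstructured data well; (2) even if you index the columns commonly used for searching, fuzzy queries (such as a LIKE with a leading wildcard) are still slow because they cannot use the index at all; (3) search also needs to be flexible, which relational databases are not designed for.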
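A full-text search engine scans every word of a document and builds an index entry for each word (an inverted index), recording where each word appears in the document and how many times. At query time it looks up matches in this index and returns the results.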
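To make the idea of a per-word index concrete, here is a minimal inverted-index sketch in plain Python. It is a toy, not how Lucene/ES stores data: the tokenizer is a simplifying assumption (lowercase, split on non-word characters), and the helper names `tokenize`, `build_inverted_index` and `search` are hypothetical. Each term maps to the documents it appears in and its positions there, so a query looks up terms instead of scanning every document.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplified tokenizer (assumption): lowercase, split on non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} so lookups go term -> documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result

docs = {
    1: "Elasticsearch is a search engine",
    2: "A relational database is not a search engine",
}
idx = build_inverted_index(docs)
```

Lookups such as `search(idx, "search engine")` touch only the postings for two terms, no matter how many documents exist, which is why this structure scales where a leading-wildcard LIKE does not.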
2. What Is ES
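Elasticsearch is a distributed, RESTful search and analytics engine that can handle a growing variety of use cases. ES is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales out well, to hundreds of servers, and can handle petabytes of data.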
3. Download and Installation
Download link:
Past Releases of Elastic Stack Software | Elastic
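At the time of writing, the latest version is Elasticsearch 8.17.1 (released January 22, 2025). Using the very latest release directly is not recommended; a version slightly behind the latest is usually more stable. (The responses shown below come from a 7.x release, as the "_type" fields and the "created": "7080099" version string indicate; 8.x responses differ slightly.)
On Windows, installation is just unzipping the archive.
Then add an environment variable ES_HOME whose value is the ES root directory, and add %ES_HOME%\bin to Path.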
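3.1 Main Directories
Directory | Purpose |
---|---|
bin | Scripts and tools for starting and managing Elasticsearch |
config | Elasticsearch configuration files |
jdk | Bundled JDK; if JAVA_HOME is set on the machine, that local JDK is used instead |
lib | Dependency libraries |
logs | Log files |
modules | Core modules |
plugins | Plugins |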
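Port 9300 is the transport port used for communication between nodes, and port 9200 is the HTTP port for the RESTful API, which you can also open in a browser.
http://localhost:9200
After starting ES (bin\elasticsearch.bat on Windows), open the URL above in a browser; you should see a normal JSON response.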
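3.2 What Is RESTful
REST is a set of architectural constraints and principles; an application or design that satisfies them is called RESTful. The most important REST principle for web applications is that interactions between client and server are stateless between requests: every request from the client to the server must contain all the information needed to understand it. If the server restarts at any point between requests, the client is not notified. Stateless requests can also be answered by any available server, which suits environments such as cloud computing. Clients can cache data to improve performance.
(I have not fully understood the more detailed rules yet; I will add them later.)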
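4. HTTP Operations
4.1 Indices
An index can be loosely compared to a schema (database) in a relational database; since ES 7 removed multiple mapping types per index, it is also often compared to a single table.
(1) Create an index
Use Apipost (an API testing tool) to send a PUT request to ES:
http://127.0.0.1:9200/test
The response is: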
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "test"
}
The index test was created successfully. Creating the same index again fails with the following response:
{
"error": {
"root_cause": [
{
"type": "resource_already_exists_exception",
"reason": "index [test/M6Qd4rchT56iUlDJZWHWmA] already exists",
"index_uuid": "M6Qd4rchT56iUlDJZWHWmA",
"index": "test"
}
],
"type": "resource_already_exists_exception",
"reason": "index [test/M6Qd4rchT56iUlDJZWHWmA] already exists",
"index_uuid": "M6Qd4rchT56iUlDJZWHWmA",
"index": "test"
},
"status": 400
}
That is, the index test already exists.
(2) List existing indices
Send a GET request:
http://127.0.0.1:9200/_cat/indices?v
The response is:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open test M6Qd4rchT56iUlDJZWHWmA 1 1 0 0 208b 208b
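Column | Description | Example |
---|---|---|
health | Index health: green (all primary and replica shards allocated), yellow (all primary shards allocated but some replicas are not, as on a single-node cluster), red (at least one primary shard unallocated) | yellow |
status | Whether the index is open or closed | open |
index | Index name | test |
uuid | Unique index identifier | M6Qd4rchT56iUlDJZWHWmA |
pri | Number of primary shards | 1 |
rep | Number of replicas per primary shard | 1 |
docs.count | Number of available documents | 0 |
docs.deleted | Number of deleted documents (marked as deleted, not yet purged) | 0 |
store.size | Total storage used by primary and replica shards | 208b |
pri.store.size | Storage used by primary shards only | 208b |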
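If you want to process the `_cat/indices?v` output in a script, a minimal sketch, assuming the plain-text layout shown above (a header row followed by whitespace-separated values, none of them empty), is to zip each row with the header. The function name `parse_cat_output` is hypothetical. For anything beyond quick experiments, the `_cat` APIs also accept `format=json`, which avoids text parsing altogether.

```python
def parse_cat_output(text):
    """Parse `_cat/...?v` plain-text output into a list of dicts keyed by column name."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Assumes no value is empty or contains spaces; otherwise columns would shift.
    return [dict(zip(header, line.split())) for line in lines[1:]]

sample = """health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open test M6Qd4rchT56iUlDJZWHWmA 1 1 0 0 208b 208b"""
rows = parse_cat_output(sample)
```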
(3) View a single index
Send a GET request:
http://127.0.0.1:9200/test
The response is:
{
"test": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"creation_date": "1738905447076",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "M6Qd4rchT56iUlDJZWHWmA",
"version": {
"created": "7080099"
},
"provided_name": "test"
}
}
}
}
(4) Delete an index
Send a DELETE request:
http://127.0.0.1:9200/test
The response is:
{
"acknowledged": true
}
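The index operations above differ only in HTTP method and path. As a minimal sketch of the same calls from code, assuming a local single-node ES at http://127.0.0.1:9200 without security enabled, the hypothetical helper `es_request` below builds standard-library `urllib.request.Request` objects without sending them; sending one requires a running ES, and an error status such as the 400 for a duplicate index makes `urlopen` raise `urllib.error.HTTPError`.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumption: local ES, security disabled

def es_request(method, path, body=None):
    """Build (but do not send) a urllib Request for an ES REST call."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)

create_index = es_request("PUT", "/test")             # (1) create
list_indices = es_request("GET", "/_cat/indices?v")   # (2) list
get_index = es_request("GET", "/test")                # (3) view one
delete_index = es_request("DELETE", "/test")          # (4) delete

# Sending requires a running ES, e.g.:
# with urllib.request.urlopen(create_index) as resp:
#     print(json.loads(resp.read()))
```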
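4.2 Documents
(1) Create a document
A document is analogous to a row (record) in a relational database table.
Send a POST request:
http://127.0.0.1:9200/test/_doc
In the request body, choose raw with type JSON, and enter: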
{
"title":"天选之子",
"category":"老司机",
"images":"",
"price":100
}
The response is:
{
"_index": "test",
"_type": "_doc",
"_id": "0usM35QBJ6wHPuS349Jo",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
You can also specify the document id yourself, for example 123:
http://127.0.0.1:9200/test/_doc/123
(2) View a document
You can retrieve a specific document by its id.
Send a GET request:
http://127.0.0.1:9200/test/_doc/123
The response is:
{
"_index": "test",
"_type": "_doc",
"_id": "123",
"_version": 1,
"_seq_no": 1,
"_primary_term": 1,
"found": true,
"_source": {
"title": "天选之子",
"category": "老司机",
"images": "",
"price": 100
}
}
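(3) Update a document (full replacement)
Send a POST request to overwrite the document with id 123. The whole document is replaced by the new body, so any field left out of the body is lost:
http://127.0.0.1:9200/test/_doc/123
Body: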
{
"title":"天选之子123",
"category":"老司机",
"images":"",
"price":100
}
The response is:
{
"_index": "test",
"_type": "_doc",
"_id": "123",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1
}
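(4) Update specific fields (partial update)
Send a POST request to update only some fields of the document with id 123:
http://127.0.0.1:9200/test/_update/123
Body (the fields to change go under doc):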
{
"doc":{
"price":111
}
}
The response is:
{
"_index": "test",
"_type": "_doc",
"_id": "123",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 3,
"_primary_term": 1
}
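The difference between (3) and (4) is replace versus merge. A minimal local model of that behavior, using a plain dict as a stand-in for the index (the names `index_doc` and `update_doc` are hypothetical): indexing to `_doc/{id}` swaps in the whole new source, while `_update/{id}` merges the fields under `doc` into the existing source. This sketch merges top-level fields only; ES also merges nested objects.

```python
import copy

def index_doc(store, doc_id, source):
    """Model of POST/PUT /test/_doc/{id}: the whole _source is replaced."""
    store[doc_id] = copy.deepcopy(source)

def update_doc(store, doc_id, partial):
    """Model of POST /test/_update/{id} with {"doc": {...}}: merge fields into _source."""
    merged = copy.deepcopy(store[doc_id])
    merged.update(partial["doc"])  # top-level merge only in this sketch
    store[doc_id] = merged

store = {}
index_doc(store, "123", {"title": "天选之子", "category": "老司机", "images": "", "price": 100})
update_doc(store, "123", {"doc": {"price": 111}})
```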
(5) Delete a document
Send a DELETE request to delete the document with id 123:
http://127.0.0.1:9200/test/_doc/123
The response is:
{
"_index": "test",
"_type": "_doc",
"_id": "123",
"_version": 4,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1
}
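You can also delete the documents that match a condition (not recommended; to be safe, deleting by id is better).
Send a POST request to delete the documents whose price field is 111:
http://127.0.0.1:9200/test/_delete_by_query
Request body: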
{
"query":{
"match":{
"price":111
}
}
}
The response is:
{
"took": 1033,
"timed_out": false,
"total": 1,
"deleted": 1,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}
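4.3 Mappings
As I understand it, a mapping describes the documents in an index. If documents are like the rows of a relational table, the mapping is like that table's schema (its metadata): field names, types, and how each field is indexed.
(1) Create a mapping
Send a PUT request: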
http://127.0.0.1:9200/test/_mapping
body
{
"properties": {
"name": {
"type": "text",
"index": true,
"store": false
},
"sex": {
"type": "text",
"index": false,
"store": false
},
"age": {
"type": "long",
"index": false,
"store": false
}
}
}
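Field types (type):
Type | Description |
---|---|
String | text: analyzed (split into tokens); keyword: not analyzed, the whole value is matched as-is |
Numerical | Basic types: long, integer, short, byte, double, float, half_float; scaled_float: a floating-point value stored as a long scaled by a fixed factor |
Date | Date type |
Array | No dedicated type; any field can hold an array of values of the same type |
Object | A nested JSON object |
index: whether the field is indexed; defaults to true. Only indexed fields can be searched.
store: whether the field value is stored separately from _source; defaults to false. Retrieving a separately stored field can be faster, but it uses more disk space.
analyzer: the analyzer used to split text fields into tokens.
The response is: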
{
"acknowledged": true
}
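(2) View a mapping
Send a GET request:
http://127.0.0.1:9200/test/_mapping
The response is shown below. Fields such as title, category, images and price were not in the mapping above; ES added them automatically (dynamic mapping) when the documents from 4.2 were indexed.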
{
"test": {
"mappings": {
"properties": {
"age": {
"type": "long",
"index": false
},
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"images": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text"
},
"price": {
"type": "long"
},
"sex": {
"type": "text",
"index": false
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
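The automatically added fields above follow ES's default dynamic mapping: a JSON string becomes text with a keyword sub-field (ignore_above 256), and a JSON integer becomes long. The rough sketch below reproduces only those defaults for a flat document; it is a simplification that ignores date detection, objects and arrays, and the function name `infer_dynamic_mapping` is hypothetical.

```python
def infer_dynamic_mapping(source):
    """Rough sketch of ES default dynamic mapping for a flat JSON document."""
    props = {}
    for field, value in source.items():
        if isinstance(value, bool):        # check bool first: bool is a subclass of int
            props[field] = {"type": "boolean"}
        elif isinstance(value, int):
            props[field] = {"type": "long"}
        elif isinstance(value, float):
            props[field] = {"type": "float"}
        elif isinstance(value, str):       # text for full-text search + keyword sub-field
            props[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
    return {"properties": props}

mapping = infer_dynamic_mapping({"title": "天选之子", "images": "", "price": 100})
```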
(3) Create an index together with its mapping
Create an index and define its mapping in the same request.
Send a PUT request:
http://127.0.0.1:9200/test1
body
{
"settings": {},
"mappings": {
"properties": {
"name": {
"type": "text",
"index": true
},
"sex": {
"type": "text",
"index": false
},
"age": {
"type": "long",
"index": false
}
}
}
}
The response is:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "test1"
}
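4.4 Queries
(1) Query all documents
Query all documents in index test1.
Send a GET request (the _search endpoint also accepts POST with the same body):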
http://127.0.0.1:9200/test1/_search
body
{
"query": {
"match_all": {}
}
}
The response is:
{
"took": 181,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 1,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
}
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1002",
"_score": 1,
"_source": {
"name": "lisi",
"nickname": "lisi",
"sex": "男",
"age": 20
}
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1003",
"_score": 1,
"_source": {
"name": "wangwu",
"nickname": "wangwu",
"sex": "女",
"age": 40
}
}
]
}
}
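(2) Query by field
Query documents whose name is zhangsan. (This works because name is indexed; sex and age are mapped with index: false, so they cannot be searched this way.)
Send a GET request: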
http://127.0.0.1:9200/test1/_search
body
{
"query": {
"match": {"name":"zhangsan" }
}
}
The response is:
{
"took": 360,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.9808291,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 0.9808291,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
}
}
]
}
}
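(3) Query multiple fields
Query documents where name or nickname matches zhangsan (by default multi_match returns a document if any of the listed fields matches):
Send a GET request: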
http://127.0.0.1:9200/test1/_search
body
{
"query": {
"multi_match": {
"query": "zhangsan",
"fields": [
"name",
"nickname"
]
}
}
}
The response is:
{
"took": 116,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.9808291,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 0.9808291,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
}
}
]
}
}
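(4) Exact (term) query on a field
(Again, only an indexed field such as name can be used here. A term query is not analyzed, so the value must exactly equal a token stored in the index.)
Send a GET request: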
http://127.0.0.1:9200/test1/_search
body
{
"query": {
"term": {
"name": {
"value": "zhangsan"
}
}
}
}
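Why match and term behave differently comes down to analysis. The sketch below uses a very rough stand-in for the standard analyzer (split on word characters, lowercase); the real analyzer does more (for example it splits CJK text into single characters), and the names `analyze`, `match_query` and `term_query` are hypothetical. match analyzes the query text the same way the field was analyzed, while term compares the raw value against the stored tokens.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def match_query(field_value, query_text):
    """match: analyze the query too; hit if any query token is among the field's tokens."""
    tokens = set(analyze(field_value))
    return any(t in tokens for t in analyze(query_text))

def term_query(field_value, term):
    """term: the term is NOT analyzed; it must equal one stored token exactly."""
    return term in set(analyze(field_value))
```

With this model, a term query for "ZhangSan" finds nothing even though a match query for the same text succeeds, because only the lowercase token "zhangsan" was stored.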
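(5) Exact query on multiple values
Send a GET request (terms matches documents whose name equals any of the listed values):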
http://127.0.0.1:9200/test1/_search
body
{
"query": {
"terms": {
"name": [
"zhangsan",
"lisi"
]
}
}
}
(6) Return only some fields
By default, a query returns every field stored in _source. To return only some of them:
Send a GET request:
http://127.0.0.1:9200/test1/_search
body
{
"_source": [
"name",
"nickname"
],
"query": {
"terms": {
"name": [
"zhangsan"
]
}
}
}
You can also list the fields to return with includes, or the fields to hide with excludes:
body
{
"_source": {
"includes": [
"name",
"nickname"
]
},
"query": {
"terms": {
"name": [
"zhangsan"
]
}
}
}
With excludes:
{
"_source": {
"excludes": [
"age",
"sex"
]
},
"query": {
"terms": {
"name": [
"zhangsan"
]
}
}
}
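The includes/excludes filtering can be modeled in a few lines. This is a minimal sketch assuming exact top-level field names only (ES also accepts wildcard patterns and dotted paths, which it does not handle); `filter_source` is a hypothetical name.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep only `includes` (if given), then drop `excludes`; top-level names only."""
    kept = {k: v for k, v in source.items() if includes is None or k in includes}
    return {k: v for k, v in kept.items() if not excludes or k not in excludes}

src = {"name": "zhangsan", "nickname": "zhangsan", "sex": "男", "age": 30}
```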
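(7) Compound (bool) query
Combine conditions with must (must match), must_not (must not match), and should (should match).
When a bool query has a must clause, should clauses do not change which documents match; documents that also satisfy a should clause simply score higher. The minimum_should_match parameter sets how many should clauses must be satisfied. If it is not specified, it defaults to 0 when the query has must or filter clauses, and to 1 otherwise.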
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "zhangsan"
}
}
],
"must_not": [
{
"match": {
"age": "40"
}
}
],
"should": [
{
"match": {
"sex": "男"
}
}
]
}
}
}
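The matching rules of bool can be evaluated locally to see what each clause does. In the sketch below each clause is a Python predicate over a document's _source (an assumption for illustration), the score is a toy count of satisfied clauses rather than ES's BM25 relevance, and `bool_query` is a hypothetical name; the default for minimum_should_match follows the rule described above.

```python
def bool_query(docs, must=(), must_not=(), should=(), minimum_should_match=None):
    """Evaluate bool semantics over {id: _source}; returns [(id, toy_score)] best first."""
    if minimum_should_match is None:
        # Default: 0 when must clauses exist; 1 when only should clauses exist.
        minimum_should_match = 0 if must else (1 if should else 0)
    hits = []
    for doc_id, src in docs.items():
        if not all(p(src) for p in must) or any(p(src) for p in must_not):
            continue
        matched_should = sum(1 for p in should if p(src))
        if matched_should < minimum_should_match:
            continue
        hits.append((doc_id, len(must) + matched_should))  # toy score, not BM25
    return sorted(hits, key=lambda h: h[1], reverse=True)

docs = {
    "1001": {"name": "zhangsan", "sex": "男", "age": 30},
    "1002": {"name": "lisi", "sex": "男", "age": 20},
    "1003": {"name": "wangwu", "sex": "女", "age": 40},
    "1005": {"name": "zhangsan", "sex": "男", "age": 40},
}
hits = bool_query(
    docs,
    must=[lambda s: s["name"] == "zhangsan"],
    must_not=[lambda s: s["age"] == 40],
    should=[lambda s: s["sex"] == "男"],
)
```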
(8) Range query
Operator | Meaning |
---|---|
gt | > |
lt | < |
gte | >= |
lte | <= |
body
{
"query": {
"range": {
"age": {
"gte": 30,
"lte": 35
}
}
}
}
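The author's query failed here because age is mapped with index: false and therefore cannot be searched (on 7.x the must_not clause on age in the previous example is likely to fail for the same reason).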
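On an indexed numeric field, a range clause is simply the conjunction of its operators. A minimal sketch mapping the four operators from the table to Python comparisons (`RANGE_OPS` and `in_range` are hypothetical names):

```python
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def in_range(value, bounds):
    """True if value satisfies every operator in a clause like {"gte": 30, "lte": 35}."""
    return all(RANGE_OPS[op](value, limit) for op, limit in bounds.items())

ages = {"1001": 30, "1002": 20, "1003": 40}
selected = [doc_id for doc_id, age in ages.items() if in_range(age, {"gte": 30, "lte": 35})]
```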
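(9) Fuzzy query
The number of single-character edits (insertions, deletions, substitutions) needed to turn one word into another is called the edit distance.
A fuzzy query returns documents containing terms within a given edit distance of the search term.
Query documents whose name is within the default edit distance of zhangsan (fuzziness defaults to AUTO, which allows 2 edits for a term of this length):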
{
"query": {
"fuzzy": {
"name": {
"value": "zhangsan"
}
}
}
}
Response:
{
"took": 365,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.2039728,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 1.2039728,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
}
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1004",
"_score": 1.0534762,
"_source": {
"name": "zhangsan1",
"nickname": "zhangsan1",
"sex": "女",
"age": 50
}
}
]
}
}
Query documents whose name is within edit distance 1 of zhangsan:
{
"query": {
"fuzzy": {
"name": {
"value": "zhangsan",
"fuzziness": 1
}
}
}
}
Response:
{
"took": 365,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.2039728,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 1.2039728,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
}
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1004",
"_score": 1.0534762,
"_source": {
"name": "zhangsan1",
"nickname": "zhangsan1",
"sex": "女",
"age": 50
}
}
]
}
}
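Both queries return zhangsan1 because it is one edit (an appended "1") away from zhangsan. The sketch below computes the classic Levenshtein distance with dynamic programming, plus the AUTO rule (AUTO:3,6 by default: terms of length 0 to 2 must match exactly, 3 to 5 allow 1 edit, longer terms allow 2). Note that ES by default also counts an adjacent transposition as a single edit, which this plain Levenshtein sketch does not; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]

def auto_fuzziness(term):
    """fuzziness AUTO (AUTO:3,6): 0 edits for length 0-2, 1 for 3-5, 2 for longer."""
    n = len(term)
    return 0 if n < 3 else (1 if n < 6 else 2)
```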
(10) Sorting
Query documents whose name is zhangsan, sorted by age descending, then by score descending:
{
"query": {
"match": {
"name": "zhangsan"
}
},
"sort": [
{
"age": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
Response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1005",
"_score": 0.87546873,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 40
},
"sort": [
40,
0.87546873
]
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 0.87546873,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
},
"sort": [
30,
0.87546873
]
}
]
}
}
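Multi-key sorting compares the keys left to right; the second key only breaks ties in the first, which is why each hit carries a two-element "sort" array. A minimal local equivalent using the two hits from the response (both descending, so both numeric keys are negated):

```python
hits = [
    {"_id": "1001", "_score": 0.87546873, "_source": {"age": 30}},
    {"_id": "1005", "_score": 0.87546873, "_source": {"age": 40}},
]
# Sort by age desc, then _score desc: negate both numeric keys in one tuple key.
ordered = sorted(hits, key=lambda h: (-h["_source"]["age"], -h["_score"]))
```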
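(11) Highlighting
Adding a highlight property alongside a match query highlights the matched terms (the highlighted field must be indexed).
Option | Description |
---|---|
pre_tags | Tag inserted before each match |
post_tags | Tag inserted after each match |
fields | Fields to highlight |
name (any field name under fields) | Declares that this field should be highlighted; field-specific settings can go inside its object, which may also be left empty |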
{
"query": {
"match": {
"name": "zhangsan"
}
},
"highlight": {
"pre_tags": "<font color='red'>",
"post_tags": "</font>",
"fields": {
"name": {}
}
}
}
The response is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.87546873,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1001",
"_score": 0.87546873,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 30
},
"highlight": {
"name": [
"<font color='red'>zhangsan</font>"
]
}
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1005",
"_score": 0.87546873,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 40
},
"highlight": {
"name": [
"<font color='red'>zhangsan</font>"
]
}
}
]
}
}
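Conceptually, the highlighter wraps each matched token in pre_tags/post_tags (ES uses `<em>` and `</em>` when no tags are given). The simplified sketch below does that for whole-word, case-insensitive matches; it is not ES's highlighter (no fragments, no analyzer, no phrase handling), and `highlight` is a hypothetical name.

```python
import re

def highlight(text, query, pre_tag="<font color='red'>", post_tag="</font>"):
    """Wrap each word of text that matches a query word (case-insensitive) in tags."""
    wanted = {t.lower() for t in re.findall(r"\w+", query)}

    def wrap(m):
        word = m.group(0)
        return pre_tag + word + post_tag if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)
```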
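(12) Pagination
When a query returns many documents, you can page through them.
from: the offset of the first hit to return (how many hits to skip); defaults to 0.
size: the number of hits per page; defaults to 10.
Performance drops noticeably when from is large, because Elasticsearch has to collect and skip all the preceding hits.
By default Elasticsearch limits from + size to 10,000 (index.max_result_window); queries beyond this limit fail.
(Pagination is usually combined with sorting.)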
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 1,
"size": 2
}
The response is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "test1",
"_type": "_doc",
"_id": "1003",
"_score": null,
"_source": {
"name": "wangwu",
"nickname": "wangwu",
"sex": "女",
"age": 40
},
"sort": [
40
]
},
{
"_index": "test1",
"_type": "_doc",
"_id": "1005",
"_score": null,
"_source": {
"name": "zhangsan",
"nickname": "zhangsan",
"sex": "男",
"age": 40
},
"sort": [
40
]
}
]
}
}
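Because from is an offset rather than a page number, the example above with "from": 1 skips the first hit (the age-50 document) and returns the next two. A minimal sketch converting a 1-based page number into from/size and enforcing the default 10,000 window (`page_params` is a hypothetical helper; for deep paging ES offers search_after instead):

```python
MAX_RESULT_WINDOW = 10_000  # default index.max_result_window

def page_params(page, size):
    """Convert a 1-based page number into ES from/size, enforcing from + size <= window."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window; consider search_after")
    return {"from": start, "size": size}
```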
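(13) Metric aggregations
ES also supports aggregations such as maximum, minimum, average and sum; the keywords are max, min, avg and sum.
Average age (avg_age is just a name you choose for the result). Setting size to 0 returns no hits, only the aggregation result: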
{
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
},
"size": 0
}
Response:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"avg_age": {
"value": 36
}
}
}
Distinct count (cardinality):
{
"aggs": {
"distinct_age": {
"cardinality": {
"field": "age"
}
}
},
"size": 0
}
Response:
{
"took": 112,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"distinct_age": {
"value": 4
}
}
}
The stats aggregation returns five metrics for a field at once: count, max, min, avg and sum.
{
"aggs": {
"stats_age": {
"stats": {
"field": "age"
}
}
},
"size": 0
}
Response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"stats_age": {
"count": 5,
"min": 20,
"max": 50,
"avg": 36,
"sum": 180
}
}
}
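The metric results above can be checked by hand. The five ages in the examples (documents 1001 to 1005) are 30, 20, 40, 50 and 40; the sketch below recomputes avg, stats and the distinct count locally. One caveat: ES's cardinality aggregation is an approximate count (HyperLogLog++), although it is effectively exact for small sets like this one, whereas `len(set(...))` here is exact.

```python
ages = [30, 20, 40, 50, 40]  # ages of documents 1001-1005 in the examples above

def stats(values):
    """The five numbers the stats aggregation returns."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }

distinct = len(set(ages))  # exact here; ES's cardinality is approximate in general
```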
(14) Bucket aggregation (group by)
Count documents grouped by age:
{
"aggs": {
"age_groupby": {
"terms": {
"field": "age"
}
}
},
"size": 0
}
Response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"age_groupby": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 40,
"doc_count": 2
},
{
"key": 20,
"doc_count": 1
},
{
"key": 30,
"doc_count": 1
},
{
"key": 50,
"doc_count": 1
}
]
}
}
}
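A terms aggregation is essentially a group-by count. As the response above shows, buckets come back ordered by doc_count descending, with ties ordered by key ascending, and by default only the top 10 buckets are returned. A minimal local equivalent with `collections.Counter` (`terms_buckets` is a hypothetical name; ES computes this per shard and merges, which is where doc_count_error_upper_bound comes from):

```python
from collections import Counter

def terms_buckets(values, size=10):
    """Group-by count ordered like the response: doc_count desc, ties by key asc."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]

buckets = terms_buckets([30, 20, 40, 50, 40])
```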