1-2 Introduction to Elasticsearch

启航zpyl

Last modified: 2023-05-04 14:42:49

Category: Middleware. Tags: elasticsearch, search engine

First published: 2023-05-04 14:30:37

Article link: https://blog.csdn.net/qq_40983975/article/details/130486226

Elasticsearch

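I. Introduction

1.1 Overview

Elasticsearch is an open-source distributed search engine that provides three core capabilities: collecting, analyzing, and storing data. Its main characteristics are:

Distributed: nodes form a cluster without manual setup (Solr, by contrast, has to be configured by hand and uses ZooKeeper as its registry)
RESTful: every API follows REST principles, which makes it easy to pick up
Near-real-time search: a newly indexed or updated document becomes searchable after a short delay (the default refresh interval is 1 second) rather than instantly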

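1.2 Directory Structure

bin: startup and utility scripts
config: configuration files
lib: dependency libraries
logs: log files
modules: built-in modules
plugins: installed plugins

1.3 Basic Concepts

index: similar to a database in MySQL
type: comparable to a table in a database (deprecated in 7.x, where an index has the single type _doc, and removed in 8.x)
field: equivalent to a column in a table
mapping: the definition of an index's fields and their types (see 2.2)
document: the stored record itself, a JSON object
shard: a partition of an index's data
replica: a copy of a shard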

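1.4 Ways to Call ES

ES ports

9200: the HTTP port used by external clients
9300: the port used for communication between cluster nodes (not intended for external callers)

Calling the RESTful API
- GET request: http://localhost:9200/
- curl can send the same request: curl -X GET "localhost:9200/?pretty"
Using Kibana Dev Tools
Lets you operate on ES freely (under the hood it also uses the RESTful API)
Dev Tools is not recommended in production
Calling from a client library
Java client, Go client
Reference: https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/7.17/_getting_started.html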
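
Since every calling method above ends up as an HTTP request against port 9200, the same calls can be sketched with Python's standard library alone. This is only an illustration: the helper name build_request is made up here, and the requests are built but not sent, because sending them needs a running node at localhost:9200.

```python
import json
import urllib.request


def build_request(method, path, body=None, base_url="http://localhost:9200"):
    """Build (but do not send) an HTTP request for an ES REST endpoint.

    build_request is a hypothetical helper for this article, not part of any ES client.
    """
    data = None
    headers = {}
    if body is not None:
        # ES expects JSON bodies with a JSON content type
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


# Same request as: curl -X GET "localhost:9200/?pretty"
root = build_request("GET", "/?pretty")
# A search request with a JSON body, as typed in Kibana Dev Tools
search = build_request("GET", "/zpyl/_search", {"query": {"match_all": {}}})

print(root.get_method(), root.full_url)
# To actually send it (requires a running node): urllib.request.urlopen(root).read()
```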

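II. Basic Concepts

2.1 ES Query Syntax

2.1.1 DSL

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html
The Query DSL is written in JSON, so it is easy to read and fits naturally into HTTP requests. There is no need to memorize the commands; look them up when you forget.
Basic operations:

Create the index and insert data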

POST /zpyl/_doc/student
{
  "id":"20230424",
  "name":"张三"
}

Query

# Query all documents
GET /zpyl/_search
{
  "query": {
    "match_all": {}
  }
}
# Query by ID
GET /zpyl/_doc/student

Update

PUT /zpyl/_doc/student
{
  "id":"20230424",
  "name":"李四"
}

Delete (this removes the whole index)

DELETE zpyl

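2.1.2 EQL

A query language for event data, such as documents in Elastic Common Schema (ECS) format. It is more standardized, but only suits specific scenarios (for example, event streams).
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/eql.html
Example:

# Insert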
POST /zp/_doc
{
  "id":"1234343434",
  "@timestamp":"2099-05-06",
  "name":"12312412423"
}
# Query
GET /zp/_eql/search
{
  "query": """ 
  any where 1==1
  """
}

2.1.3 SQL

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sql-getting-started.html
Lowest learning cost, but it may require plugin support and tends to perform worse
Example:

POST /_sql?format=txt
{
  "query": "SELECT * FROM zpyl where name ='李四'"
}

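2.1.4 Painless Scripting Language

Programmatic access to values: more flexible, but with a higher learning cost

2.2 Mapping

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/explicit-mapping.html#explicit-mapping
A mapping is comparable to a table schema in a database: it defines the fields and their types. ES is more flexible than a database here because it supports dynamic mapping. In MySQL the table must be created by hand and a row cannot contain a column that does not exist; in ES, when a document contains a new field, the field is added to the mapping automatically.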
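
To make the idea of dynamic mapping concrete, here is a rough sketch of how a field type can be guessed from a JSON value. It imitates the default rules only loosely: date detection for strings is left out, and the function names are invented for this example, so treat it as a mental model rather than the actual ES logic.

```python
def infer_dynamic_type(value):
    """Guess a field type from a JSON value (simplified; date detection omitted)."""
    if value is None:
        return None  # a null value does not add a field
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # an array takes the type of its first non-null element
        for item in value:
            if item is not None:
                return infer_dynamic_type(item)
        return None
    return "text"  # strings become text (ES also adds a keyword sub-field)


def infer_mapping(doc):
    """Build a field -> type table for one document, skipping fields with no type."""
    types = {field: infer_dynamic_type(value) for field, value in doc.items()}
    return {field: t for field, t in types.items() if t is not None}


mapping = infer_mapping({"id": "20230424", "name": "张三", "age": 18, "score": 91.5, "note": None})
print(mapping)
```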

2.2.1 View the Mapping

GET /zpyl/_mapping

2.2.2 Create an Explicit Mapping

# age: an integer field
# email: a keyword field, which is not analyzed and is matched as a whole value
# name: a text field, which is analyzed into terms for searching
PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },
      "email":  { "type": "keyword" },
      "name":   { "type": "text" }
    }
  }
}

2.3 Analyzers

2.3.1 Built-in Analyzers

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-analyzers.html

Whitespace analyzer

POST _analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

This analyzer produces the following terms:

[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]

Standard analyzer

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

This analyzer produces the following terms:

[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]

Keyword analyzer: performs no tokenization; the whole input is treated as a single term

POST _analyze
{
  "analyzer": "keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

This analyzer produces the following term:

[The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. ]
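
The difference between the three built-in analyzers can be imitated in a few lines of Python. These functions are rough stand-ins written for this article, not the real implementations: in particular, the standard analyzer uses Unicode text segmentation, which the regular expression below only approximates for this English sentence.

```python
import re

TEXT = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."


def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def standard_like_analyze(text):
    """Approximation of the standard analyzer: lowercase, split on punctuation."""
    return re.findall(r"[\w']+", text.lower())


def keyword_analyze(text):
    """No tokenization: the whole input is one term."""
    return [text]


for analyze in (whitespace_analyze, standard_like_analyze, keyword_analyze):
    print(analyze.__name__, analyze(TEXT))
```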

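Note: the built-in analyzers do not handle Chinese text well, so the IK analyzer is introduced here.

2.3.2 IK Analyzer

GitHub: https://github.com/medcl/elasticsearch-analysis-ik
Download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.17.7 (the plugin version must match your ES version)
Put the downloaded file in the ES plugins directory and unzip it there, then restart ES so the plugin is loaded

ik_smart: smart segmentation, which picks the split that looks most like whole words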

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "我是小黑子"
}

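This analyzer produces the following terms:

[ 我, 是, 小, 黑子 ]

ik_max_word: splits as finely as possible, including overlapping word combinations

# Analyze the text with ik_max_word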
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是小黑子"
}

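This analyzer produces the following terms:

[ 我, 是, 小黑, 黑子 ]

2.3.3 Scoring

Reference article: https://liyupi.blog.csdn.net/article/details/119176943
Official documentation: https://www.elastic.co/guide/en/elasticsearch/guide/master/controlling-relevance.html

III. Basic Operations

3.1 Indices

Comparable to creating a database in MySQL

3.1.1 Create an Index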

PUT /student
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

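Here student is the index name (comparable to a database in MySQL) and settings configures the index: number_of_shards is the number of primary shards, and number_of_replicas is the number of replica copies kept for each primary shard.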
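
As a quick check on what these two settings mean together, the total number of shard copies in an index is the number of primaries times one plus the number of replicas. The helper below is a sketch written for this article (the function name is made up):

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all of their replica copies."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return number_of_shards * (1 + number_of_replicas)


# The settings used above: 1 primary, 0 replicas
print(total_shard_copies(1, 0))
# A hypothetical larger index: 3 primaries, each with 1 replica
print(total_shard_copies(3, 1))
```

With the settings from the example there is a single shard and no redundancy, which is fine on a one-node development setup.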

3.1.2 View an Index

GET /student

3.1.3 Delete an Index

DELETE /student

3.2 Mappings

Comparable to a table definition in a database

3.2.1 Create a Mapping

PUT student/_mapping
{
  "properties": {
    "stuId": {
      "type": "keyword",
      "index": true,
      "store": false
    },
    "text": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "name": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "age": {
      "type": "integer"
    }
  }
}

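Properties:

type
- String types
  - text: analyzed (tokenized)
  - keyword: not analyzed
- Numeric types
  - Basic types: long, integer, short, byte, double, float, half_float
  - scaled_float: a floating-point number stored as a long using a fixed scaling factor
- date type
  - Dates can be supplied as formatted strings; ES parses them and stores them internally as a number of milliseconds since the epoch
index
- true: the default; the field is indexed
- false: the field is not indexed and cannot be searched
store
- true: store an extra copy of the value outside _source
- false: the default; no extra copy is stored
analyzer: the analyzer used for the field

3.2.2 View the Mapping

GET student/_mapping

3.3 Data

3.3.1 Add Data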

POST /student/_doc
{ 
    "stuId":"2020120726",
    "name":"张三" ,
    "text":"我叫张三，来自地球村",
    "age":"18"
}

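When no ID is specified, ES generates one automatically when saving. Querying the index shows the generated value, for example "_id" : "iI2nwocBAM17hqHZLiMz".

3.3.2 Update Data

Update by ID. Here iI2nwocBAM17hqHZLiMz is the ID generated by the insert above; indexing to an existing ID replaces the previous document as a whole.

POST /student/_doc/iI2nwocBAM17hqHZLiMz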
{ 
    "stuId":"2020120726",
    "name":"管理员" ,
    "text":"我叫管理员，来自安徽合肥",
    "age":"31"
}

If the given ID does not exist, a new document is created

POST /student/_doc/1
{ 
    "stuId":"2020120729",
    "name":"王五" ,
    "text":"我叫王五，来自安徽合肥",
    "age":"20"
}

3.3.3 Delete Data

DELETE /student/_doc/1

3.4 Queries

3.4.1 Match All

GET /student/_search
{
  "query": {
    "match_all": {}
  }
}

3.4.2 Match Queries

For fields that are analyzed (text fields)
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-your-data.html

Single-field query: match

GET /student/_search
{
  "query": {
    "match": {
      "text": "来自安徽"
    }
  }
}

Note that match first analyzes the query string into terms, and by default the terms are combined with OR.
To require all terms to match, set "operator": "and"

GET /student/_search
{
  "query": {
    "match": {
      "text": {
        "query": "来自安徽", 
        "operator": "and"
      }
    }
  }
}

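If neither a strict AND nor a plain OR fits, use the minimum_should_match parameter to set how many of the terms must match, for example "minimum_should_match": "70%"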

GET /student/_search
{
  "query": {
    "match": {
      "text": {
        "query": "来自安徽", 
        "minimum_should_match": "70%"
      }
    }
  }
}
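
For a positive percentage, the number of terms that must match is the percentage of the term count, rounded down. The sketch below works through that arithmetic; the function name is made up, and how many terms "来自安徽" actually yields depends on the analyzer, so the counts used here are assumptions.

```python
def required_matches(term_count, percent):
    """Number of query terms that must match for a positive percentage (rounded down)."""
    if term_count < 0 or not 0 <= percent <= 100:
        raise ValueError("term_count must be >= 0 and percent within 0..100")
    return term_count * percent // 100


# Assuming the query string is analyzed into 2 terms, 70% rounds down to 1 term
print(required_matches(2, 70))
# With 3 terms, 70% rounds down to 2 terms
print(required_matches(3, 70))
```

So with only two terms, 70% behaves like OR, while 100% behaves like AND.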

Multi-field query: multi_match

GET student/_search
{
  "query": {
    "multi_match": {
      "query": "张三",
      "fields": ["name","text"]
    }
  }
}

3.4.3 Term Queries

For numeric, date, and boolean fields, and for strings that are not analyzed (keyword fields)

Single-term query: term

GET student/_search
{
  "query": {
    "term": {
      "stuId": {
        "value": "2020120726"
      }
    }
  }
}

Multi-term query: terms

GET student/_search
{
  "query": {
    "terms": {
      "stuId": ["2020120726","2020120727"]
    }
  }
}

3.4.4 Result Filtering

Filter the fields that are returned: _source

List the fields to return directly

GET student/_search
{
  "_source": ["stuId","name"], 
  "query": {"match_all": {}}
}

Use includes to specify the fields to return

GET student/_search
{
  "_source": {
    "includes": ["stuId","name"]
    }, 
  "query": {"match_all": {}}
}

Use excludes to specify the fields to leave out

GET student/_search
{
  "_source": {
    "excludes": ["text"] 
  }, 
  "query": {"match_all": {}}
}

3.4.5 Advanced Queries

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html

Boolean combination: bool

must: AND
Find documents whose text contains 安徽 and whose name is 李四

GET student/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text": "安徽"
          }
        },
        {
          "match": {
            "name": "李四"
          }
        }
      ]
    }
  }
}

must_not: NOT
Find documents whose text does not contain 安徽 and whose name is not 李四

GET student/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "text": "安徽"
          }
        },
        {
          "match": {
            "name": "李四"
          }
        }
      ]
    }
  }
}

should: OR
Find documents whose text contains 安徽 or whose name is 李四

GET student/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "text": "安徽"
          }
        },
        {
          "match": {
            "name": "李四"
          }
        }
      ]
    }
  }
}
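
The way must, must_not, and should combine can be modeled on plain Python dictionaries. This is a toy evaluator written for this article: it uses substring checks instead of analyzed terms, ignores scoring, and runs on made-up sample documents. It does reflect one rule worth knowing: should clauses are only required when the bool query has no must (or filter) clauses.

```python
def contains(field, text):
    """Toy stand-in for a match clause: substring check on one field."""
    return lambda doc: text in doc.get(field, "")


def bool_match(doc, must=(), must_not=(), should=()):
    """Return True if doc satisfies the combined clauses (scoring is ignored)."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must:
        # with no must clause, at least one should clause has to match
        return any(clause(doc) for clause in should)
    return True


# Hypothetical sample documents
docs = [
    {"name": "李四", "text": "我叫李四，来自安徽合肥"},
    {"name": "王五", "text": "我叫王五，来自安徽合肥"},
    {"name": "张三", "text": "我叫张三，来自地球村"},
]
clauses = [contains("text", "安徽"), contains("name", "李四")]

must_hits = [d["name"] for d in docs if bool_match(d, must=clauses)]
must_not_hits = [d["name"] for d in docs if bool_match(d, must_not=clauses)]
should_hits = [d["name"] for d in docs if bool_match(d, should=clauses)]
print(must_hits, must_not_hits, should_hits)
```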

Range query: range
Find documents whose stuId is between 2020120726 and 2020120727

GET /student/_search
{
  "_source": ["stuId","name"], 
  "query": {
    "range": {
      "stuId": {
        "gte": 2020120726,
        "lte": 2020120727
      }
    }
  }
}

gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to
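
One caveat with this example: in the mapping from 3.2.1, stuId is a keyword field, so the range bounds are compared as strings rather than as numbers. For IDs that all have the same number of digits the two orderings agree, but they can diverge otherwise. The sketch below (helper names invented for this article) shows the difference:

```python
def in_range_as_keyword(value, gte, lte):
    """Compare as strings, the way terms of a keyword field are ordered."""
    return str(gte) <= str(value) <= str(lte)


def in_range_as_number(value, gte, lte):
    """Compare as integers, the way a numeric field would be ordered."""
    return int(gte) <= int(value) <= int(lte)


# Same-length IDs: both orderings agree
print(in_range_as_keyword("2020120726", 2020120726, 2020120727))
print(in_range_as_number("2020120726", 2020120726, 2020120727))

# Different lengths: "15" sorts before "2" as a string, so the results differ
print(in_range_as_keyword("15", 2, 100))
print(in_range_as_number("15", 2, 100))
```

If numeric ordering matters, mapping the field as a numeric type such as long is the safer choice.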

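Fuzzy query: fuzzy
It tolerates spelling differences between the search term and the indexed term, as long as the edit distance between them does not exceed 2. Note that the search term itself is not analyzed.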

GET student/_search
{
 "query": {
   "fuzzy": {
      "stuId": "2020120722"
   }
 }
}

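Use fuzziness to set the allowed edit distance. The maximum is 2; the default is AUTO, which allows 0, 1, or 2 edits depending on the length of the term.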
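
Edit distance counts the single-character changes needed to turn one term into another. The function below is a plain Levenshtein implementation (insertions, deletions, substitutions) written for illustration; ES by default also counts a swap of two adjacent characters as one edit, which this sketch does not do.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# The search term from the example differs from a stored stuId by one character
print(edit_distance("2020120722", "2020120726"))
```

A distance of 1 is within the limit of 2, so the fuzzy query above would match the document with stuId 2020120726.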

3.4.6 Filtering

filter

GET student/_search
{
 "query": {
   "bool": {
     "must": [{  
         "range": {
           "stuId": {
            "gte": 2020120726,
            "lte": 2020120727
           }
         }
       }],
      "filter": {
        "term": {
          "name": "李四"
        }
      }
    }
  }
}

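This kind of filtering is different from _source filtering. Ordinary query clauses (such as those in must) contribute to each document's relevance score and ranking. If you need to narrow the results and do not want the condition to affect the score, put it in filter instead of using it as a query clause.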

3.4.7 Sorting

sort

Single-field sort

GET student/_search
{
  "_source": ["stuId","name"], 
  "query": {
    "match_all": {}
  }
  , "sort": [
    {
      "stuId": {
        "order": "desc"
      }
    }
  ]
}

Multi-field sort

GET student/_search
{
  "_source": ["stuId","name"],
  "query": {
    "match_all": {}
  },
  "sort": [
    { "stuId": { "order": "desc" } },
    { "age": { "order": "asc" } }
  ]
}
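
In a multi-field sort, the first field decides the order and the next field only breaks ties. The same ordering can be reproduced in Python with two stable sorts, applied from the least significant key to the most significant one. The documents below are made up for the example.

```python
# Hypothetical documents; two of them share the same stuId
docs = [
    {"stuId": "2020120726", "age": 31},
    {"stuId": "2020120729", "age": 20},
    {"stuId": "2020120726", "age": 18},
]

# Python's sort is stable, so sort by the secondary key first (age asc),
# then by the primary key (stuId desc); ties on stuId keep the age order.
by_age = sorted(docs, key=lambda d: d["age"])
result = sorted(by_age, key=lambda d: d["stuId"], reverse=True)

for doc in result:
    print(doc["stuId"], doc["age"])
```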

3.4.8 Aggregations

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-aggregations.html

Buckets: group the documents in some way; each group is called a bucket in ES

GET /student/_search
{ 
  "aggs": {
    "name": {
      "terms": {
        "field": "stuId" 
      }
    }
  }
}

Metrics
Calculations over the documents in a bucket, such as average, maximum, minimum, and sum, are called metrics in ES

GET /student/_search
{ 
  "size": 0, 
  "aggs": {
    "student": {
      "terms": {
        "field": "stuId" 
      },
      "aggs": {
        "sumAge":{
          "sum": {
            "field": "age"
          }
        }
      }
    }
  }
}
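
The bucket-then-metric idea maps closely to a group-by in ordinary code. The following toy version, written for this article with made-up documents, groups by one field (like a terms aggregation) and sums another field inside each group (like the nested sum aggregation above). Real responses have a different shape; only the logic is the same.

```python
from collections import defaultdict


def terms_with_sum(docs, bucket_field, metric_field):
    """Group docs by bucket_field, then count and sum metric_field per group."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sum": 0})
    for doc in docs:
        bucket = buckets[doc[bucket_field]]
        bucket["doc_count"] += 1
        bucket["sum"] += doc[metric_field]
    return dict(buckets)


# Hypothetical documents
docs = [
    {"stuId": "2020120726", "age": 18},
    {"stuId": "2020120726", "age": 31},
    {"stuId": "2020120729", "age": 20},
]
buckets = terms_with_sum(docs, "stuId", "age")
print(buckets)
```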