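Learning Elasticsearch
What is Elasticsearch
Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache License; it is a popular enterprise search engine. It is designed for cloud environments and aims to deliver near-real-time search that is stable, reliable, fast, and easy to install and use.
Pros:
- Distributed: nodes appear as peers to clients, and data is rebalanced automatically when a node joins
- Elasticsearch fully supports Apache Lucene's near-real-time search
- Nodes form a peer-to-peer network; when a node fails, its work is automatically reassigned to other nodes
- Horizontally scalable: to add a server, you only need a little configuration before starting it
- High availability: a replication (replica) mechanism lets each shard have several replicas, so the cluster keeps running when a server goes down, and the replicas lost with that server are rebuilt on other available nodes; this resembles HDFS replication (which keeps 3 copies by default)
Cons:
- No transaction support
- Relatively memory-hungry
How Elasticsearch organizes data
To understand how Elasticsearch organizes data, we can look at it from two angles:
- Logical design: we can compare Elasticsearch with a relational database: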
Relational DB | Elasticsearch |
---|---|
Database | Index (indices) |
Table | Type (types) |
Row | Document (documents) |
Column | Field (fields) |
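An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type holds multiple documents (rows), and each document contains multiple fields (columns). Note that mapping types are deprecated in Elasticsearch 7.x and removed in 8.x, so the type level of this analogy applies to older versions.
- Physical design: how does Elasticsearch handle this data behind the scenes? It splits each index into multiple shards, and each shard can be moved between servers in the cluster.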
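Logical design: documents, types, indices
A type within an index contains multiple documents, for example document 1 and document 2.
When we index a document, it can be located by following this path: index ▷ type ▷ document ID. This combination identifies one specific document.
Note: the ID does not have to be an integer; it is actually a string.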
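Documents
Elasticsearch is document-oriented, which means the document is the smallest unit of data that is indexed and searched. Documents in Elasticsearch have several important properties:
- Self-contained: a document holds both the fields and their values, that is, key:value pairs
- Hierarchical: a document can contain sub-documents, which is how complex logical entities are represented
- Flexible structure: documents do not depend on a predefined schema. In a relational database, columns must be defined before they can be used; in Elasticsearch, fields are flexible, so a document may omit a field or add a new one dynamically.
- Schema-free: a document does not have to declare the types of its field values up front.
Although fields can be added or omitted freely, the type of each field matters; an age field, for example, could be a string or an integer. Elasticsearch stores the mapping between fields and types along with other settings. This mapping is specific to each type of each index, which is why types are sometimes called mapping types in Elasticsearch.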
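Types
A type is a logical container for documents, just as a table is a container for rows in a relational database.
The definition of the fields in a type is called a mapping; for example, name is mapped as a string type.
We say documents are schema-free: they do not need to have every field defined in the mapping. So what happens when a new field appears? Elasticsearch automatically adds the new field to the mapping, but since it does not know the field's type, it guesses: if the value is 18, Elasticsearch treats the field as an integer type.
Elasticsearch can guess wrong, though, so the safest approach is to define the mappings you need in advance. This ends up close to how relational databases work: define the fields first, then use them. Mappings are discussed in more detail later.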
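The guessing step described above can be approximated in a few lines of Python. This is only a rough sketch of the idea, not Elasticsearch's actual dynamic-mapping code: the function names `guess_field_type` and `infer_mapping` are hypothetical, and real dynamic mapping has more rules (date detection, arrays, keyword sub-fields) that are left out here.

```python
def guess_field_type(value):
    """Guess a mapping type for a JSON value, loosely mimicking dynamic mapping."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"no guess for value of type {type(value).__name__}")


def infer_mapping(document):
    """Return a field -> guessed type dictionary for one document."""
    return {field: guess_field_type(value) for field, value in document.items()}


# An age sent as the string "18" is guessed as text, not as a number,
# which is the kind of wrong guess that an explicit mapping avoids.
print(infer_mapping({"name": "zhangsan", "age": 18}))
print(infer_mapping({"name": "lisi", "age": "18"}))
```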
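Indices
An index is a container for mapping types; an Elasticsearch index is a very large collection of documents. The index stores the fields of its mapping types and other settings, and its data is then stored across the shards.
Let's look at how shards work.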
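Physical design: nodes and shards
A cluster contains at least one node, and a node is a single Elasticsearch process. A node can hold multiple indices.
Before Elasticsearch 7.0, a newly created index consisted of 5 primary shards by default, each with one replica shard, for 10 shards in total. Since 7.0 the default is 1 primary shard with 1 replica, which matches the number_of_shards value of 1 in the index settings shown later.
So how is such an index stored in the cluster?
In the three-node cluster shown in the figure, a primary shard and its replica are never placed on the same node, so data is not lost if a single node goes down.
In fact, a shard is a Lucene index: a directory of files containing an inverted index. The inverted index lets Elasticsearch tell you which documents contain a given keyword without scanning every document.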
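The placement rule above (a primary and its replica never share a node) can be illustrated with a toy allocator. This is a simplified round-robin sketch under the assumption of one replica per primary; it is not how Elasticsearch's real shard allocator decides, and `allocate_shards` and the node names are made up for the example.

```python
def allocate_shards(num_primaries, nodes):
    """Place each primary and its single replica on different nodes (round-robin)."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate a primary from its replica")
    placement = {node: [] for node in nodes}
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]
        # The replica goes to the next node, so it never lands next to its primary.
        replica_node = nodes[(shard + 1) % len(nodes)]
        placement[primary_node].append(f"P{shard}")
        placement[replica_node].append(f"R{shard}")
    return placement


# 5 primaries with 1 replica each, spread over 3 nodes: 10 shard copies in total.
for node, shards in allocate_shards(5, ["node1", "node2", "node3"]).items():
    print(node, shards)
```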
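Inverted index
Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying implementation. This structure is well suited to fast full-text search: the index consists of a list of all the unique terms that appear in the documents, and for each term, a list of the documents that contain it.
A posting list records the set of documents for a term and is made up of postings.
A posting mainly contains the following information:
- Document ID, used to fetch the original content.
- Term frequency (TF), the number of times the term occurs in the document, used later for relevance scoring.
- Position, the token position(s) of the term within the document, used for phrase queries.
- Offset, the start and end character offsets of the term in the document, used for highlighting.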
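To make the four posting fields concrete, here is a minimal Python sketch that builds an inverted index over two short texts. It is an illustration only: the tokenizer is a plain `\w+` regular expression rather than a real analyzer, and `build_inverted_index` and the sample documents are assumptions made for the example, not Lucene's data layout.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to its postings: doc id -> term frequency, positions, offsets."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # enumerate() gives the token position; the match gives character offsets.
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            posting = index[match.group()].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []}
            )
            posting["tf"] += 1
            posting["positions"].append(position)
            posting["offsets"].append((match.start(), match.end()))
    return index


docs = {1: "Elasticsearch is fast", 2: "Lucene is fast and Lucene is stable"}
index = build_inverted_index(docs)
print(sorted(index["fast"]))   # ids of the documents that contain "fast"
print(index["lucene"][2])      # tf, positions and offsets of "lucene" in document 2
```

Looking up a term is a dictionary access, which is why no document has to be scanned at query time.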
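Basic Elasticsearch operations
Adding a document
PUT is the command that creates a document. The verb can be written in lowercase, but uppercase is recommended. The request below indexes a document, and the REST-style response follows: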
PUT index/doc/1
{
"name": "zhangsan",
"sex": "男"
}
Response:
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
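In the response, result is the operation type. It is created here because the document was created for the first time. If we run the same command again, result becomes updated. Note also that _version starts at 1 and increases by one each time the command is run, indicating how many times the document has been changed.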
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
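The created/updated behaviour and the growing _version can be mimicked with a small in-memory model. This is a toy stand-in written for illustration, assuming only the two response fields discussed above; the class `TinyIndex` is hypothetical and does not talk to Elasticsearch.

```python
class TinyIndex:
    """In-memory stand-in that mimics the result and _version fields of an index response."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        previous = self._docs.get(doc_id)
        # First write starts at version 1; every later write to the same id adds 1.
        version = 1 if previous is None else previous["_version"] + 1
        self._docs[doc_id] = {"_version": version, "_source": source}
        return {
            "_id": doc_id,
            "_version": version,
            "result": "created" if previous is None else "updated",
        }


index = TinyIndex()
print(index.put("1", {"name": "zhangsan", "sex": "男"}))
print(index.put("1", {"name": "zhangsan", "sex": "男"}))
```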
Getting a specific index
GET index
Response:
{
"index" : {
"aliases" : { },
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"sex" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1597129469439",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "mUo_KvdRTouvu0Kze_RdzQ",
"version" : {
"created" : "7060299"
},
"provided_name" : "index"
}
}
}
}
Getting a document
GET index/doc/1
Response:
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "zhangsan",
"sex" : "男"
}
}
Using _search
GET index/doc/_search
Response:
{
"took" : 38,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "zhangsan",
"sex" : "男"
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "lisi",
"sex" : "男"
}
}
]
}
}
Structured queries with the Query DSL: using match
GET index/doc/_search
{
"query": {
"match": {
"age": "18"
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
}
]
}
}
Using match_all
GET index/doc/_search
{
"query": {
"match_all": {}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 19
}
}
]
}
}
Deleting documents and indices
Deleting a single document
DELETE index/doc/2
Response:
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_version" : 6,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1
}
Here result is "deleted", which indicates the document was removed.
Deleting the whole index
DELETE index
Response:
{
"acknowledged" : true
}
Updating specific fields with POST
POST index/doc/2/_update
{
"doc": {
"age": 19
}
}
Response:
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_version" : 4,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
Sorting
Descending
GET index/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "zhaoliu",
"sex" : "男",
"age" : 22
},
"sort" : [
22
]
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
},
"sort" : [
20
]
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "wangwu",
"sex" : "男",
"age" : 19
},
"sort" : [
19
]
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
},
"sort" : [
18
]
}
]
}
}
Ascending
GET index/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "asc"
}
}
]
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
},
"sort" : [
18
]
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "wangwu",
"sex" : "男",
"age" : 19
},
"sort" : [
19
]
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
},
"sort" : [
20
]
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "zhaoliu",
"sex" : "男",
"age" : 22
},
"sort" : [
22
]
}
]
}
}
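For comparison, the same ordering can be reproduced client-side in Python. This is a sketch over the four sample documents from the responses above; `sort_hits` is a hypothetical helper, and real sorting happens inside Elasticsearch across shards.

```python
DOCS = [
    {"_id": "1", "name": "zhangsan", "age": 18},
    {"_id": "2", "name": "lisi", "age": 20},
    {"_id": "3", "name": "wangwu", "age": 19},
    {"_id": "4", "name": "zhaoliu", "age": 22},
]


def sort_hits(docs, field, order="asc"):
    """Return docs ordered by one field, like a single-entry sort clause."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return sorted(docs, key=lambda doc: doc[field], reverse=(order == "desc"))


print([doc["_id"] for doc in sort_hits(DOCS, "age", "desc")])
print([doc["_id"] for doc in sort_hits(DOCS, "age", "asc")])
```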
Pagination
from: the offset of the first hit to return
size: the number of hits per page
GET index/doc/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 2
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
}
}
]
}
}
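Applications usually think in page numbers rather than offsets, so a small conversion helper is common. The sketch below assumes 1-based page numbers; `page_to_from_size` and `paginate` are hypothetical names, and the slicing only imitates what from and size do to an already ordered hit list.

```python
def page_to_from_size(page, size):
    """Convert a 1-based page number into the from/size pair used in a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def paginate(hits, from_, size):
    """Return the slice of hits that a given from/size pair would select."""
    return hits[from_:from_ + size]


names = ["zhangsan", "lisi", "wangwu", "zhaoliu"]
params = page_to_from_size(page=2, size=2)
print(params)
print(paginate(names, params["from"], params["size"]))
```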
Bool queries
should (or), must (and), must_not (not)
should query: find documents whose name is "zhangsan" or whose age is 20
GET index/doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "zhangsan"
}
},
{
"match": {
"age": 20
}
}
]
}
}
}
Response:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.2039728,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
}
}
]
}
}
must query: find documents whose name is "zhangsan" and whose age is 18
GET index/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "zhangsan"
}
},
{
"match": {
"age": 18
}
}
]
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.2039728,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 2.2039728,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
}
]
}
}
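The and/or/not semantics of must, should, and must_not can be sketched as plain boolean logic over in-memory documents. This is a simplified model: clauses compare by exact equality with no text analysis or scoring, and `term_matches` and `bool_matches` are hypothetical helpers. It follows the rule that should clauses are only required to match when no must clause is present.

```python
DOCS = [
    {"_id": "1", "name": "zhangsan", "age": 18},
    {"_id": "2", "name": "lisi", "age": 20},
    {"_id": "3", "name": "wangwu", "age": 19},
    {"_id": "4", "name": "zhaoliu", "age": 22},
]


def term_matches(doc, clause):
    """clause is a single {field: value} pair; compare by exact equality."""
    (field, value), = clause.items()
    return doc.get(field) == value


def bool_matches(doc, must=(), should=(), must_not=()):
    """Evaluate must (and), should (or) and must_not (not) against one document."""
    if not all(term_matches(doc, clause) for clause in must):
        return False
    if any(term_matches(doc, clause) for clause in must_not):
        return False
    # Without must clauses, at least one should clause has to match.
    if should and not must:
        return any(term_matches(doc, clause) for clause in should)
    return True


either = [d["_id"] for d in DOCS if bool_matches(d, should=[{"name": "zhangsan"}, {"age": 20}])]
both = [d["_id"] for d in DOCS if bool_matches(d, must=[{"name": "zhangsan"}, {"age": 18}])]
print(either, both)
```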
Using filter: find documents whose age is greater than 20
GET index/doc/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
}
],
"filter": [
{
"range": {
"age": {
"gt": 20
}
}
}
]
}
}
}
gt: greater than, gte: greater than or equal to, lt: less than, lte: less than or equal to
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "zhaoliu",
"sex" : "男",
"age" : 22
}
}
]
}
}
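The four range operators map directly onto comparison functions. The following sketch shows that mapping in Python using the sample ages from this document; `in_range` and `RANGE_OPS` are illustrative names, not part of any Elasticsearch client.

```python
import operator

# One comparison function per range keyword.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True when value satisfies every bound, e.g. {"gte": 19, "lt": 22}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


ages = [18, 20, 19, 22]
print([age for age in ages if in_range(age, {"gt": 20})])
print([age for age in ages if in_range(age, {"gte": 19, "lt": 22})])
```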
Highlighting
GET index/doc/_search
{
"query": {
"match": {
"name": "lisi"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
Response:
{
"took" : 46,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.2039728,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
},
"highlight" : {
"name" : [
"<em>lisi</em>"
]
}
}
]
}
}
Highlighting with custom tags
pre_tags: the opening tag
post_tags: the closing tag
GET index/doc/_search
{
"query": {
"match": {
"name": "lisi"
}
},
"highlight": {
"pre_tags": "<b style='color: red'>",
"post_tags": "</b>",
"fields": {
"name": {}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.2039728,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
},
"highlight" : {
"name" : [
"<b style='color: red'>lisi</b>"
]
}
}
]
}
}
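What the highlighter does with pre_tags and post_tags can be imitated with a regular expression. This is a rough sketch that wraps whole-word, case-insensitive matches; the `highlight` function is hypothetical and ignores analyzers, fragments, and the other options of the real highlighter.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap each whole-word occurrence of term in the given tags."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    # A function replacement keeps the tags literal, even if they contain backslashes.
    return pattern.sub(lambda m: f"{pre_tag}{m.group()}{post_tag}", text)


print(highlight("lisi", "lisi"))
print(highlight("lisi", "lisi", "<b style='color: red'>", "</b>"))
```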
Filtering returned fields
_source: can be a single key or a list of keys
GET index/doc/_search
{
"query": {
"match": {
"name": "lisi"
}
},
"_source": ["name"]
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.2039728,
"_source" : {
"name" : "lisi"
}
}
]
}
}
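The effect of _source on each hit is a projection of the stored document onto the requested keys. A minimal Python sketch of that projection, assuming flat documents and exact key names (no wildcards or nested paths), with `filter_source` as a made-up helper name:

```python
def filter_source(source, includes):
    """Keep only the requested keys; includes may be one key or a list of keys."""
    if isinstance(includes, str):
        includes = [includes]
    return {key: value for key, value in source.items() if key in includes}


doc = {"name": "lisi", "sex": "男", "age": 20}
print(filter_source(doc, ["name"]))
print(filter_source(doc, ["name", "age"]))
```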
Aggregations
sum, max, min, avg, and grouping (bucketing)
GET index/doc/_search
{
"aggs": {
"mysum": {
"sum": {
"field": "age"
}
}
}
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "wangwu",
"sex" : "男",
"age" : 19
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "zhaoliu",
"sex" : "男",
"age" : 22
}
}
]
},
"aggregations" : {
"mysum" : {
"value" : 79.0
}
}
}
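The metric aggregations listed above correspond to ordinary reductions over one field. The sketch below recomputes them in Python for the four sample ages; `metric_aggs` is a hypothetical helper, and values are returned as floats to mirror the 79.0 in the response.

```python
AGES = [18, 20, 19, 22]


def metric_aggs(values):
    """Compute sum, max, min and avg over a list of numeric field values."""
    if not values:
        raise ValueError("values must not be empty")
    total = float(sum(values))
    return {
        "sum": total,
        "max": float(max(values)),
        "min": float(min(values)),
        "avg": total / len(values),
    }


print(metric_aggs(AGES))
```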
Grouping
After grouping documents into buckets, sum the values within each bucket
GET index/doc/_search
{
"aggs": {
"group": {
"range": {
"field": "age",
"ranges": [
{
"from": 10,
"to": 20
},
{
"from": 20,
"to": 30
}
]
},
"aggs": {
"mysum": {
"sum": {
"field": "age"
}
}
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "zhangsan",
"sex" : "男",
"age" : 18
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "lisi",
"sex" : "男",
"age" : 20
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "wangwu",
"sex" : "男",
"age" : 19
}
},
{
"_index" : "index",
"_type" : "doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "zhaoliu",
"sex" : "男",
"age" : 22
}
}
]
},
"aggregations" : {
"group" : {
"buckets" : [
{
"key" : "10.0-20.0",
"from" : 10.0,
"to" : 20.0,
"doc_count" : 2,
"mysum" : {
"value" : 37.0
}
},
{
"key" : "20.0-30.0",
"from" : 20.0,
"to" : 30.0,
"doc_count" : 2,
"mysum" : {
"value" : 42.0
}
}
]
}
}
}
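The buckets in this response can be recomputed by hand, which also shows why the document with age 20 falls into the second bucket: each range includes its from value and excludes its to value. The following Python sketch assumes that half-open rule and uses a hypothetical `range_buckets` helper over the four sample documents.

```python
DOCS = [
    {"name": "zhangsan", "age": 18},
    {"name": "lisi", "age": 20},
    {"name": "wangwu", "age": 19},
    {"name": "zhaoliu", "age": 22},
]


def range_buckets(docs, field, ranges):
    """Group docs into [from, to) ranges and sum the field inside each bucket."""
    buckets = []
    for bounds in ranges:
        low, high = bounds["from"], bounds["to"]
        # from is inclusive, to is exclusive.
        values = [doc[field] for doc in docs if low <= doc[field] < high]
        buckets.append({
            "key": f"{float(low)}-{float(high)}",
            "doc_count": len(values),
            "mysum": float(sum(values)),
        })
    return buckets


for bucket in range_buckets(DOCS, "age", [{"from": 10, "to": 20}, {"from": 20, "to": 30}]):
    print(bucket)
```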