Elasticsearch Reading Notes (6.6 Official Documentation)

Getting Started

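Overview

Elasticsearch (ES) is a scalable, open-source full-text search and analytics engine. It stores, searches, and analyzes large volumes of data in near real time, and is widely used to power search in many kinds of applications.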

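Basic Concepts

NRT: near real time. There is a short delay (normally about one second) between indexing a document and the moment it becomes searchable.

Cluster: ES runs as a cluster identified by a name. Use a different cluster name for each environment so that nodes do not join the wrong cluster. A cluster may consist of a single node.

index: a collection of documents. Index names must be lowercase.

type: deprecated as of 6.0 and scheduled for removal; an index now holds a single type.

document: the basic unit of information that can be indexed. Physically, a document lives in a type within an index.

shard: each shard is a self-contained search engine (a Lucene index). A single Lucene index can hold only a limited number of documents; per LUCENE-5843 the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128).

replica: a copy of a shard, kept so that data survives a node failure.

An index is split across shards, and each shard has zero or more replicas. Both numbers are set when the index is created. The number of replicas can be changed dynamically afterwards; the number of primary shards cannot be changed in place (the _shrink and _split APIs create a new index with a different shard count). By default an index has five primary shards with one replica each, for a total of 10 shards per index. A replica is never allocated on the same node as its primary, so the replicas are only assigned once the cluster has at least two nodes.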
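The shard arithmetic above can be checked with a couple of lines. This is only a sketch; the function name total_shards is made up for illustration, and the defaults are the 6.x defaults quoted in the text.

```python
def total_shards(primaries: int = 5, replicas: int = 1) -> int:
    """Total shard copies for one index: every primary plus its replicas."""
    return primaries * (1 + replicas)


print(total_shards())      # 6.x defaults: 5 primaries, 1 replica each
print(total_shards(3, 2))  # 3 primaries, 2 replicas each
```

With the defaults this gives 10 shards per index, which is where the figure in the paragraph above comes from.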

The default ES port is 9200 (for the REST API).

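Installation

  1. Download the ES package
  2. Download the Kibana package; its version must match the ES version, otherwise the two may be incompatible
  3. Start ES
  4. Start Kibana
  5. Open http://localhost:5601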

Cluster

Cluster health

GET /_cat/health?v

green: all primary shards and replicas are allocated
yellow: all primary shards are allocated, but some replicas are not
red: some primary shards are not allocated

Cluster nodes

GET /_cat/nodes?v

List indices

GET /_cat/indices?v

Create an index

PUT /customer?pretty

Index a document

PUT /customer/_doc/1?pretty
{
  "name":"helios"
}

POST /customer/_doc?pretty
{
  "name": "Jane Doe"
}

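PUT takes an explicit document ID; use POST when ES should generate the ID. Repeating a PUT with the same ID replaces (reindexes) the existing document, while a new ID inserts a new document.

Note: indexing a document does not require the index to exist beforehand. If the index does not exist, ES creates it automatically when the document is indexed.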

Get a document

GET /customer/_doc/1?pretty

Delete an index

DELETE /customer?pretty

Update a document

POST /customer/_doc/1/_update?pretty 
{
  "doc": {
    "name":"helios-update",
    "age":27
  }
}

POST /customer/_doc/1/_update?pretty 
{
  "script": "ctx._source.age += 5"
}

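Note: an update does not modify fields in place. ES deletes the old document and indexes a new one with the changes merged in. The second request above is a scripted update.

Delete a document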

DELETE /customer/_doc/1?pretty

Batch processing

POST /customer/_doc/_bulk?pretty 
{"index":{"_id":"1"}}
{"name":"helios-bulk-1"}
{"index":{"_id":"2"}}
{"name":"helios-bulk-2"}

POST /customer/_doc/_bulk?pretty 
{"update":{"_id":"1"}}
{"doc":{"name":"bulk-update"}}
{"delete":{"_id":"2"}}
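The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the whole body has to end with a newline. A minimal sketch of how a client could assemble such a body; build_bulk_body is a hypothetical helper, not part of any ES client library.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, source) pairs into newline-delimited JSON.

    `source` is None for actions such as delete that have no body line.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


ops = [
    ({"update": {"_id": "1"}}, {"doc": {"name": "bulk-update"}}),
    ({"delete": {"_id": "2"}}, None),
]
body = build_bulk_body(ops)
print(body, end="")
```

When sending such a body over HTTP, the Content-Type header should be application/x-ndjson.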

Search

GET /bank/_search?q=*&sort=account_number:asc&pretty

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

Note: match_all matches every document in the index.

source

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}

Note: by default the full _source of each document is returned; the request can list which fields to return.

match

GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}

Note: unlike match_all, match targets a specific field, and the query text is analyzed (tokenized) before matching.

match_phrase

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

bool must

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Note: every clause in must has to match.

bool should

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Note: at least one clause in should has to match (as here, where the bool query has no must or filter clause).

bool must_not

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Note: none of the clauses in must_not may match.
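To make the three clause types concrete, here is a toy model of bool semantics over the tokens of a single field. It is a simplification, not how ES evaluates queries: scoring is ignored, should is treated as required only when there is no must clause, and the four sample addresses are invented for illustration.

```python
def matches(doc_tokens, must=(), should=(), must_not=()):
    """Toy bool-query check against the set of tokens of one field."""
    if any(term in doc_tokens for term in must_not):
        return False
    if not all(term in doc_tokens for term in must):
        return False
    if should and not must:
        return any(term in doc_tokens for term in should)
    return True


# Hypothetical analyzed address values, keyed by document ID.
addresses = {
    "a": {"990", "mill", "road"},
    "b": {"198", "mill", "lane"},
    "c": {"12", "oak", "lane"},
    "d": {"7", "elm", "street"},
}


def search(**clauses):
    """Return the IDs of all sample documents that satisfy the clauses."""
    return sorted(k for k, toks in addresses.items() if matches(toks, **clauses))


print(search(must=["mill", "lane"]))      # both terms required
print(search(should=["mill", "lane"]))    # either term is enough
print(search(must_not=["mill", "lane"]))  # neither term allowed
```

Only document b contains both terms, documents a, b, and c contain at least one, and only document d contains neither.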

Aggregations

terms:

request:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

response:

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}

Note: by default the terms aggregation returns the top 10 buckets; the size parameter changes this.
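The numbers in the response above fit together: sum_other_doc_count is the count of documents that fell into buckets that were not returned. A quick check with the values copied from the response, assuming each document has exactly one state value:

```python
# doc_count values of the ten returned buckets, copied from the response above.
bucket_counts = [27, 27, 25, 25, 23, 21, 21, 21, 20, 20]
total_hits = 1000  # hits.total from the response above

returned = sum(bucket_counts)
other = total_hits - returned
print(returned, other)
```

The ten buckets cover 230 documents, which leaves 770 for the remaining states, matching sum_other_doc_count in the response.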

avg

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

range:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

Filter

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

Note: filter clauses do not contribute to the relevance score.


Set Up ES

Installation

Check that ES is running

GET /

Run as a daemon

./bin/elasticsearch -d -p pid

Note: -p takes a file name; the process ID is written to that file.

Configure ES on the command line

./bin/elasticsearch -d -Ecluster.name=my_cluster -Enode.name=node_1

Aggregations

  • metrics: avg, max, min
  • pipeline: max_bucket, min_bucket
  • bucket: terms
  • matrix:

query dsl

Mapping

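Overview

Meta-fields: _index, _type, _id, _source

Fields:

Basic types: text, keyword, date, long, double, boolean, ip

Settings that prevent mapping explosion:

index.mapping.total_fields.limit: the maximum number of fields in an index; the default is 1000.

index.mapping.depth.limit: the maximum depth of a field; the default is 20.

index.mapping.nested_fields.limit: the maximum number of nested fields in an index; the default is 50. Each nested object is indexed as a separate hidden document.

Note: the mapping of an existing field cannot be changed after the index is created (new fields can still be added). To change a mapping, create a new index with the correct mapping, reindex the data into it, and switch an alias over to the new index.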

PUT my_index 
{
  "mappings": {
    "_doc": { 
      "properties": { 
        "title":    { "type": "text"  }, 
        "name":     { "type": "text"  }, 
        "age":      { "type": "integer" },  
        "created":  {
          "type":   "date", 
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}

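Removal of mapping types

Before 6.x an index could contain multiple mapping types. Indices created in 6.0 or later support only a single type.

Earlier multi-type implementation: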

PUT twitter
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    },
    "tweet": {
      "properties": {
        "content": { "type": "text" },
        "user_name": { "type": "keyword" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}

PUT twitter/user/kimchy
{
  "name": "Shay Banon",
  "user_name": "kimchy",
  "email": "shay@kimchy.com"
}

PUT twitter/tweet/1
{
  "user_name": "kimchy",
  "tweeted_at": "2017-10-24T09:00:00Z",
  "content": "Types are going away"
}

GET twitter/tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}

Single-type alternative:

PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "type": { "type": "keyword" }, 
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}

PUT twitter/_doc/user-kimchy
{
  "type": "user", 
  "name": "Shay Banon",
  "user_name": "kimchy",
  "email": "shay@kimchy.com"
}

PUT twitter/_doc/tweet-1
{
  "type": "tweet", 
  "user_name": "kimchy",
  "tweeted_at": "2017-10-24T09:00:00Z",
  "content": "Types are going away"
}

GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "user_name": "kimchy"
        }
      },
      "filter": {
        "match": {
          "type": "tweet" 
        }
      }
    }
  }
}

Migrating a multi-type index to single-type indices

PUT users
{
  "settings": {
    "index.mapping.single_type": true
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "user_name": {
          "type": "keyword"
        },
        "email": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT tweets
{
  "settings": {
    "index.mapping.single_type": true
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text"
        },
        "user_name": {
          "type": "keyword"
        },
        "tweeted_at": {
          "type": "date"
        }
      }
    }
  }
}

POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "user"
  },
  "dest": {
    "index": "users"
  }
}

POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "tweet"
  },
  "dest": {
    "index": "tweets"
  }
}

Custom type field

PUT new_twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "type": {
          "type": "keyword"
        },
        "name": {
          "type": "text"
        },
        "user_name": {
          "type": "keyword"
        },
        "email": {
          "type": "keyword"
        },
        "content": {
          "type": "text"
        },
        "tweeted_at": {
          "type": "date"
        }
      }
    }
  }
}


POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  },
  "script": {
    "source": """
      ctx._source.type = ctx._type;
      ctx._id = ctx._type + '-' + ctx._id;
      ctx._type = '_doc';
    """
  }
}

Field datatypes

String

text: analyzed (tokenized) for full-text search.
keyword: not analyzed; indexed as a single exact value.

Numeric datatypes

long, integer, short, byte, double, float, half_float, scaled_float

Date datatype

date

Boolean datatypes

boolean

Binary datatype

binary

Range datatypes

integer_range, float_range, long_range, double_range, date_range

Array datatype

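There is no dedicated array datatype. Any field can hold zero or more values, so an array needs no explicit declaration, as long as all values in it are of the same datatype. Note that arrays of objects are flattened when indexed: the fields of each object are stored as multi-value fields, so the association between fields of the same object is lost. To query each object in the array independently, use the nested datatype.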
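The difference between a plain object array and nested can be mimicked in a few lines. This is a toy model, not the actual inverted index: flatten is a made-up helper, lowercasing stands in for the analyzer, and the two user objects are the ones from the Nested example later in these notes.

```python
from collections import defaultdict


def flatten(field, objects):
    """Mimic how an array of plain objects is indexed: one multi-value
    field per sub-field, with no record of which values belonged together."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value.lower())
    return dict(flat)


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]

flat = flatten("user", users)
print(flat)

# Object-style matching: both values exist somewhere in the array.
object_match = "alice" in flat["user.first"] and "smith" in flat["user.last"]

# Nested-style matching: both values must come from the same object.
nested_match = any(
    u["first"].lower() == "alice" and u["last"].lower() == "smith" for u in users
)
print(object_match, nested_match)
```

A search for first = Alice and last = Smith matches the flattened form even though no such user exists, while the nested-style check correctly rejects it.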

Object datatype

PUT my_index
{
  "mappings": {
    "_doc": { 
      "properties": {
        "region": {
          "type": "keyword"
        },
        "manager": { 
          "properties": {
            "age":  { "type": "integer" },
            "name": { 
              "properties": {
                "first": { "type": "text" },
                "last":  { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}

Nested datatype

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "nested" 
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}

Geo-point datatype

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "text": "Geo-point as an object",
  "location": { 
    "lat": 41.12,
    "lon": -71.34
  }
}

PUT my_index/_doc/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" 
}

PUT my_index/_doc/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" 
}

PUT my_index/_doc/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] 
}

GET my_index/_search
{
  "query": {
    "geo_bounding_box": { 
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}

Geo-shape datatype

Details omitted; see the official documentation if needed.

Ip datatype

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "ip_addr": "192.168.1.1"
}

GET my_index/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
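The term query above uses CIDR notation, so it matches any address inside the 192.168.0.0/16 network. The same containment check can be reproduced with Python's standard ipaddress module; the second address is an extra example that is not in the original notes.

```python
import ipaddress

network = ipaddress.ip_network("192.168.0.0/16")

# The indexed address from the example above falls inside the range.
inside = ipaddress.ip_address("192.168.1.1") in network
# An address outside 192.168.x.x does not.
outside = ipaddress.ip_address("192.169.1.1") in network

print(inside, outside)
```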

Alias datatype

PUT trips
{
  "mappings": {
    "_doc": {
      "properties": {
        "distance": {
          "type": "long"
        },
        "route_length_miles": {
          "type": "alias",
          "path": "distance"
        },
        "transit_mode": {
          "type": "keyword"
        }
      }
    }
  }
}

GET _search
{
  "query": {
    "range" : {
      "route_length_miles" : {
        "gte" : 39
      }
    }
  }
}

Note: path must be the full path of the target field, including any parent objects.

Multi-fields

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

Analyzer

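An analyzer consists of character filters, a tokenizer, and token filters.

Character Filter: preprocesses the character stream by adding, removing, or changing characters.

Tokenizer: splits the character stream into tokens.

Token Filter: processes the token stream by removing, adding, or changing tokens, for example adding synonyms.

At index time, the analyzer is resolved in this order:

  1. the analyzer defined in the field mapping
  2. the analyzer named default in the index settings
  3. the standard analyzer

At search time, the analyzer is resolved in this order:

  1. the analyzer defined in the full-text query itself
  2. the search_analyzer defined in the field mapping
  3. the analyzer defined in the field mapping
  4. the analyzer named default_search in the index settings
  5. the analyzer named default in the index settings
  6. the standard analyzer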
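The search-time lookup order can be written down as a simple fallback chain. This is only a sketch of the precedence list above: resolve_search_analyzer and its parameters are hypothetical names, and search_quote_analyzer for phrase queries is left out.

```python
def resolve_search_analyzer(query_analyzer=None, field_mapping=None, settings_analyzers=()):
    """Return the analyzer name chosen at search time, following the listed order."""
    field_mapping = field_mapping or {}
    if query_analyzer:                          # 1. set on the query itself
        return query_analyzer
    if "search_analyzer" in field_mapping:      # 2. search_analyzer in the mapping
        return field_mapping["search_analyzer"]
    if "analyzer" in field_mapping:             # 3. analyzer in the mapping
        return field_mapping["analyzer"]
    if "default_search" in settings_analyzers:  # 4. default_search in the settings
        return "default_search"
    if "default" in settings_analyzers:         # 5. default in the settings
        return "default"
    return "standard"                           # 6. built-in fallback


title = {"type": "text", "analyzer": "my_analyzer", "search_analyzer": "my_stop_analyzer"}
print(resolve_search_analyzer(field_mapping=title))
print(resolve_search_analyzer(query_analyzer="whitespace", field_mapping=title))
print(resolve_search_analyzer(settings_analyzers={"default"}))
print(resolve_search_analyzer())
```

The index-time order is the same chain with steps 1, 2, and 4 removed.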

Example:

PUT my_index
{
   "settings":{
      "analysis":{
         "analyzer":{
            "my_analyzer":{ 
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase"
               ]
            },
            "my_stop_analyzer":{ 
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase",
                  "english_stop"
               ]
            }
         },
         "filter":{
            "english_stop":{
               "type":"stop",
               "stopwords":"_english_"
            }
         }
      }
   },
   "mappings":{
      "_doc":{
         "properties":{
            "title": {
               "type":"text",
               "analyzer":"my_analyzer", 
               "search_analyzer":"my_stop_analyzer", 
               "search_quote_analyzer":"my_analyzer" 
            }
         }
      }
   }
}

PUT my_index/_doc/1
{
   "title":"The Quick Brown Fox"
}

PUT my_index/_doc/2
{
   "title":"A Quick Brown Fox"
}

GET my_index/_search
{
   "query":{
      "query_string":{
         "query":"\"the quick brown fox\"" 
      }
   }
}

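Walkthrough of the example:

  1. filter lists the token filters of a custom analyzer.
  2. search_analyzer is used at search time for non-phrase queries.
  3. search_quote_analyzer is used at search time for phrase queries.
  4. The query_string above is wrapped in quotes, so it is a phrase query and is analyzed with my_analyzer, which produces [the, quick, brown, fox]. The stop word is kept, so only document 1 matches.
  5. A non-phrase query (the same text without the quotes) is analyzed with search_analyzer, here my_stop_analyzer, which removes the stop word the.
  6. The query terms are then [quick, brown, fox], which match both documents.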
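The behavior of the two analyzers in this example can be approximated in plain Python. This is a rough sketch under stated assumptions: a regular expression stands in for the standard tokenizer, STOPWORDS is a tiny subset of the _english_ list, and phrase matching is reduced to finding a contiguous run of tokens.

```python
import re

STOPWORDS = {"a", "an", "the"}  # assumption: small subset of the _english_ stop list


def my_analyzer(text):
    """Approximates standard tokenizer + lowercase filter."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def my_stop_analyzer(text):
    """Same as my_analyzer, followed by stop word removal."""
    return [t for t in my_analyzer(text) if t not in STOPWORDS]


docs = {1: "The Quick Brown Fox", 2: "A Quick Brown Fox"}
# Index time uses my_analyzer, so stop words are kept in the index.
index = {doc_id: my_analyzer(title) for doc_id, title in docs.items()}


def contains_phrase(tokens, terms):
    """True if terms occur in tokens as a contiguous run."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def phrase_hits(query):
    terms = my_analyzer(query)  # plays the role of search_quote_analyzer
    return [d for d, toks in index.items() if contains_phrase(toks, terms)]


def term_hits(query):
    terms = my_stop_analyzer(query)  # plays the role of search_analyzer
    return [d for d, toks in index.items() if any(t in toks for t in terms)]


print(phrase_hits("the quick brown fox"))
print(term_hits("the quick brown fox"))
```

The quoted phrase matches only document 1, while the unquoted query loses its stop word and matches both documents.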

Analysis

There is no need to read this part anymore.

Index aliases

PUT /my_index_v1 
PUT /my_index_v1/_alias/my_index 

GET /*/_alias/my_index
GET /my_index_v1/_alias/*

POST /_aliases
{
    "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" }},
        { "add":    { "index": "my_index_v2", "alias": "my_index" }}
    ]
}
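The remove and add actions above are submitted in one _aliases request, so the alias moves from the old index to the new one in a single step. A small sketch of a helper that builds that request body; swap_alias_actions is a hypothetical name used only for illustration.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints an alias."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_actions("my_index", "my_index_v1", "my_index_v2")
print(json.dumps(body, indent=2))
```

Clients keep querying my_index throughout, which is what makes the reindex-and-switch approach to mapping changes practical.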