es-1

kobexzf

Category: Search

Article link: https://blog.csdn.net/kobexzf/article/details/89681705

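An index contains multiple documents. Documents in the same index may have different structures, but a uniform structure makes searching easier. A document is a JSON object, i.e. one record.
The documents in an index can be grouped, and each group is a type. Documents within one type are similar; documents in different types are not.
Since 6.0, an index can no longer contain multiple types; each index has a single type.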
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/glossary.html

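Lucene: an analyzer consists mainly of a tokenizer (exactly one) and token filters (zero or more). It can also have char filters (zero or more), which run before tokenization, e.g. to strip HTML markup.
tokenizer: splits text into tokens, e.g. whitespace, or keyword (the entire input becomes a single token)
token filter: transforms tokens, e.g. lowercasing, reducing plurals to singular, removing stop words; the resulting terms go into the index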
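The analysis chain above can be sketched in Python as a toy model, not Lucene's API. Char filters run first, then exactly one tokenizer, then the token filters in order. The stop-word list and the crude `naive_singular` filter are hypothetical stand-ins for Lucene's stop and stemming filters.

```python
import re

# Hypothetical stop-word list, for this sketch only.
STOP_WORDS = {"a", "an", "the", "of"}


def html_strip(text):
    """Char filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split on runs of whitespace."""
    return text.split()


def keyword_tokenizer(text):
    """Tokenizer: emit the whole input as one single token."""
    return [text]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def naive_singular(tokens):
    """Crude stemmer stand-in: drop one trailing 's' from words longer than 3 chars."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze(text, char_filters=(html_strip,), tokenizer=whitespace_tokenizer,
            token_filters=(lowercase, stop, naive_singular)):
    for char_filter in char_filters:      # zero or more, run before tokenizing
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one tokenizer
    for token_filter in token_filters:    # zero or more, applied in order
        tokens = token_filter(tokens)
    return tokens                         # the terms that would go into the index


print(analyze("<p>The Quick Dogs of the North</p>"))  # ['quick', 'dog', 'north']
```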

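Index: settings, mappings, aliases
Aliases: an index can have multiple aliases, and an alias can point to several indices at once. is_write_index marks the write index, i.e. the index that receives writes sent to the alias.
An alias can also be a view of an index with a filter or routing applied.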
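The following is a minimal sketch of how an alias with several member indices could decide where reads and writes go. The `Alias` class is hypothetical and models only `is_write_index`. It assumes that reads fan out to all members, that writes need exactly one target, and that a lone member with no explicit flag takes the writes.

```python
class Alias:
    """Hypothetical in-memory model of one alias pointing at several indices."""

    def __init__(self, name):
        self.name = name
        self.members = {}  # index name -> is_write_index (True, False, or None if unset)

    def add(self, index, is_write_index=None):
        # At most one member may be flagged as the write index.
        if is_write_index and any(self.members.values()):
            raise ValueError(f"alias [{self.name}] already has a write index")
        self.members[index] = is_write_index

    def search_targets(self):
        """Reads through the alias fan out to every member index."""
        return sorted(self.members)

    def write_target(self):
        """Writes go to the flagged index, or to the only member if none is flagged."""
        flagged = [index for index, is_write in self.members.items() if is_write]
        if flagged:
            return flagged[0]
        if len(self.members) == 1:
            index, is_write = next(iter(self.members.items()))
            if is_write is None:
                return index
        raise ValueError(f"alias [{self.name}] has no write index")


logs = Alias("logs")
logs.add("logs-000001")
logs.add("logs-000002", is_write_index=True)
print(logs.search_targets(), logs.write_target())
# ['logs-000001', 'logs-000002'] logs-000002
```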

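PUT: for idempotent operations. For example, creating the same index once or 100 times should leave the same result. POST is not idempotent.
GET can also carry a request body, but not every client supports this, and browsers do not necessarily send the body.
HEAD usually returns no resource (no response body); the status code alone tells you, for example, whether an index or type exists.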

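Index template: settings, mappings and aliases can be extracted into a shared template. index_patterns (an array) decides which indices use the template when they are created. When several templates match, order decides the sequence in which they are applied: a lower order is applied first, so a higher order overrides it. The template API is at /_template.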
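The following sketch shows how matching legacy templates could be combined. It assumes shell-style wildcards in `index_patterns` and a recursive merge in which a higher `order` overrides a lower one. The function names and the `body` key are hypothetical; in reality the merge happens inside Elasticsearch when the index is created.

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Recursively merge dicts; values from `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_templates(index_name, templates):
    """Combine every template whose index_patterns match, lowest order first."""
    matching = [t for t in templates
                if any(fnmatchcase(index_name, p) for p in t["index_patterns"])]
    result = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        result = deep_merge(result, template["body"])  # higher order overrides lower
    return result


templates = [
    {"index_patterns": ["*"], "order": 0,
     "body": {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}},
    {"index_patterns": ["logs-*"], "order": 1,
     "body": {"settings": {"number_of_shards": 3}}},
]
print(resolve_templates("logs-2019", templates))
# {'settings': {'number_of_shards': 3, 'number_of_replicas': 1}}
```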

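Node: a running instance, i.e. a process, identified by ip+port. Usually one node runs per physical machine, but a machine can run several.
Cluster: multiple nodes
Shard: part of an index's data; each shard is a Lucene index
Replica: a copy of a shard's data; there can be zero or more
Primary shard and replica shard: if a primary fails, a replica can be promoted to primary. Read requests can be spread evenly across primaries and replicas.
The primary and the replicas of the same shard should not sit on the same node, and preferably not on the same host. If replicas that should exist are not allocated, check whether the cluster has only one node.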
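A toy allocator can illustrate the rule above: copies of one shard never share a node, so on a single node the replicas simply stay unassigned. This is a hypothetical sketch. The real allocator also weighs disk usage, allocation awareness and balancing.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: every copy of a shard must land on a different node.

    Returns (assigned, unassigned). Copy 0 is the primary; copies
    1..num_replicas are replicas.
    """
    assigned = {}     # (shard, copy) -> node name
    unassigned = []   # copies that found no eligible node
    for shard in range(num_primaries):
        used = set()  # nodes already holding a copy of this shard
        # Rotate the starting node so that different shards spread out.
        order = [nodes[(shard + i) % len(nodes)] for i in range(len(nodes))]
        for copy in range(1 + num_replicas):
            node = next((n for n in order if n not in used), None)
            if node is None:
                unassigned.append((shard, copy))
            else:
                used.add(node)
                assigned[(shard, copy)] = node
    return assigned, unassigned


# One node: primaries are placed, replicas stay unassigned (cluster health yellow).
print(allocate(2, 1, ["node-1"]))
```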

?pretty=true can be shortened to ?pretty
filter_path controls which fields the response returns; flat_settings returns settings in flat (dotted-key) form
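`flat_settings` and `filter_path` can be approximated with two small helpers. Both are hypothetical sketches: the real `filter_path` also supports wildcards and paths through arrays, which this version skips.

```python
def flatten(settings, prefix=""):
    """Nested settings -> dotted keys, the shape flat_settings=true returns."""
    flat = {}
    for key, value in settings.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{prefix}{key}."))
        else:
            flat[f"{prefix}{key}"] = value
    return flat


def filter_path(response, paths):
    """Keep only the given dotted paths (no wildcard support in this sketch)."""
    kept = {}
    for path in paths:
        keys = path.split(".")
        node = response
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break                      # path absent: skip it
            node = node[key]
        else:                              # every key was found
            target = kept
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return kept


print(flatten({"index": {"number_of_shards": "1", "number_of_replicas": "0"}}))
# {'index.number_of_shards': '1', 'index.number_of_replicas': '0'}
print(filter_path({"took": 3, "hits": {"total": 2, "hits": []}}, ["took", "hits.total"]))
# {'took': 3, 'hits': {'total': 2}}
```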

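Detailed explanation of the figure below: https://www.jianshu.com/p/15837be98ffd
The index buffer is Lucene's in-memory buffer. An ES write is a Lucene write, and an ES refresh makes Lucene create a segment and write it to the filesystem cache.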

[Figure: index buffer, refresh and segments (image not preserved)]
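The write path described above can be modeled as one shard with an in-memory index buffer, searchable segments and a translog. `ToyShard` is hypothetical and ignores deletes and fsync details. It shows only why a document stays invisible until a refresh, and why a flush can trim the translog.

```python
class ToyShard:
    """Hypothetical model of a shard's write path."""

    def __init__(self):
        self.buffer = {}     # in-memory index buffer: not searchable yet
        self.segments = []   # refreshed segments (filesystem cache): searchable
        self.committed = 0   # number of segments persisted by the last flush
        self.translog = []   # operations not yet covered by a Lucene commit

    def index(self, doc_id, source):
        self.buffer[doc_id] = source
        self.translog.append(("index", doc_id, source))  # kept for crash recovery

    def refresh(self):
        if self.buffer:
            self.segments.append(dict(self.buffer))  # new segment becomes visible
            self.buffer.clear()

    def flush(self):
        self.refresh()
        self.committed = len(self.segments)  # Lucene commit persists the segments
        self.translog.clear()                # those ops no longer need replaying

    def search(self, doc_id):
        for segment in reversed(self.segments):  # newest segment wins
            if doc_id in segment:
                return segment[doc_id]
        return None


shard = ToyShard()
shard.index("1", {"msg": "hello"})
print(shard.search("1"))               # None: still only in the index buffer
shard.refresh()
print(shard.search("1"))               # {'msg': 'hello'}: visible after refresh
shard.flush()
print(shard.committed, shard.translog)  # 1 []
```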

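store vs _source: if a field is both stored and present in _source, it is stored twice. To fetch several fields, extracting them from _source takes one disk I/O and is fast and efficient. With store, each field is read separately, which costs several disk I/Os. ES reads a field from store if it is stored, and otherwise extracts it from _source.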
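routing: shard id = hash(routing) % number_of_primary_shards. The routing value can be passed as a parameter or extracted through the mapping, e.g. "_routing": { "required": true, "path": "customerID" } (the path option only exists in old 1.x versions). When no routing is given, the document id is used. Lookups must supply the correct routing; a wrong or missing one (falling back to the default rule) can make the document unfindable.
Special case: if there is a parent document, the child is indexed with the parent document's routing value.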
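The routing formula can be sketched as follows. CRC32 is swapped in for the Murmur3 hash that Elasticsearch actually uses, and `route` with its `parent_routing` argument is a hypothetical helper. The sketch also shows why the primary shard count cannot simply be changed later: a different modulus sends the same routing value to a different shard.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    # Stand-in hash: Elasticsearch hashes _routing with Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def route(doc_id, number_of_primary_shards, routing=None, parent_routing=None):
    """Pick the routing value by the rules above, then map it to a shard."""
    if parent_routing is not None:
        value = parent_routing  # child documents follow their parent's routing
    elif routing is not None:
        value = routing         # explicit ?routing=... value
    else:
        value = doc_id          # default: the document _id
    return shard_for(value, number_of_primary_shards)


# Documents sharing a routing value always land on the same shard.
print(route("order-1", 5, routing="customer-42") == route("order-2", 5, routing="customer-42"))  # True
```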

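refresh parameter: true, false, wait_for. With wait_for, the request does not return immediately but waits until a refresh happens; if too many requests are waiting, a refresh is forced.
version: can be user-specified or generated by the system. When generated, each update increments the version by 1. An update creates a new document; the old version is not deleted immediately but marked as deleted and purged later, and it can no longer be accessed.
Optimistic concurrency control: version used to serve this purpose, but it is not strictly increasing, so _seq_no is now used as well
1 Two threads modify the same document at the same time. To avoid losing an update, a thread reads _seq_no and writes only if _seq_no has not changed since. The write increments _seq_no, and the other thread fails and retries.
2 The primary shard applies two updates and replicates them, but they may reach a replica out of order, with the later update arriving first. That is fine: each update carries its own _seq_no, so the replica can tell which one is newer, and the older update does not overwrite it.
_primary_term: incremented by 1 each time the primary shard is reassigned, e.g. after a restart
if_seq_no and if_primary_term perform conditional updates, which fail easily under concurrency. They ensure that an update is applied on top of a specific version.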
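The following is a minimal sketch of optimistic concurrency control with `if_seq_no`/`if_primary_term`. `ToyIndex` and `increment_with_retry` are hypothetical. The store keeps one shard-wide `_seq_no` counter and rejects a conditional write whose expected pair no longer matches; the losing writer then re-reads and retries.

```python
class VersionConflict(Exception):
    """Raised when if_seq_no/if_primary_term no longer match the stored document."""


class ToyIndex:
    """Hypothetical one-shard store that tracks _seq_no and _primary_term."""

    def __init__(self, primary_term=1):
        self.docs = {}          # id -> (source, seq_no, primary_term)
        self.next_seq_no = 0    # shard-wide counter, incremented on every write
        self.primary_term = primary_term

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None or if_primary_term is not None:
            current = self.docs.get(doc_id)
            if current is None or current[1:] != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no


def increment_with_retry(idx, doc_id, retries=3):
    """Read-modify-write on a counter; retry when another writer got there first."""
    for _ in range(retries):
        source, seq_no, term = idx.get(doc_id)
        try:
            return idx.index(doc_id, {"counter": source["counter"] + 1},
                             if_seq_no=seq_no, if_primary_term=term)
        except VersionConflict:
            continue
    raise VersionConflict(doc_id)


idx = ToyIndex()
idx.index("1", {"counter": 0})
source, seq_no, term = idx.get("1")   # both writers read the same state
idx.index("1", {"counter": 1}, if_seq_no=seq_no, if_primary_term=term)  # writer A wins
try:
    idx.index("1", {"counter": 1}, if_seq_no=seq_no, if_primary_term=term)  # writer B
except VersionConflict:
    increment_with_retry(idx, "1")    # B re-reads and retries
print(idx.get("1"))  # ({'counter': 2}, 2, 1)
```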

Document api：
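index: PUT + id. The index is created automatically if needed; the cluster setting action.auto_create_index can be set to false to turn this off, or can restrict which index names may be auto-created. op_type=create refuses to overwrite an existing document. POST is used without an id, and the id is generated.
get: real-time (a special case); get triggers a refresh when needed, and realtime=false turns this off. Parameters: _source, _source_includes, _source_excludes, stored_fields
delete:
delete by query: POST index_name/_delete_by_query, with a query in the request body
update: PUT + id (replaces the whole document), or the update API: POST test/_update/1. Its request body can be script-style or doc-style.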

{
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}
ctx._source.tags.add(params.tag)  // add a value to an array field
ctx._source.remove('new_field')   // remove a field
{   // doc-style: merged into the existing document
    "doc" : {
        "name" : "new_name"
    }
}
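The two update styles can be mimicked on plain dicts. This sketch assumes that doc-style updates merge objects recursively and replace everything else, including arrays. `apply_counter_script` is a hypothetical Python equivalent of the Painless `counter` script, not a Painless interpreter.

```python
import copy


def apply_partial_doc(source, doc):
    """Doc-style update: merge `doc` into `source`. Nested objects merge
    recursively; any other value (including arrays) replaces the old one."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_counter_script(source, params):
    """Python equivalent of: ctx._source.counter += params.count"""
    updated = copy.deepcopy(source)
    updated["counter"] += params["count"]
    return updated


doc = {"name": "old_name", "counter": 1}
print(apply_partial_doc(doc, {"name": "new_name"}))  # {'name': 'new_name', 'counter': 1}
print(apply_counter_script(doc, {"count": 4}))       # {'name': 'old_name', 'counter': 5}
```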

update by query: POST index_name/_update_by_query; the request body can contain a query and a script (which performs the update)
multi get: /_mget

GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2",
            "_source" : false
        }
    ]
}
GET /test/_mget
GET /test/_doc/_mget
{
"ids" : ["1", "2"]
}
Each entry in the JSON can include _source, stored_fields and routing to customize that individual get.
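The following sketch shows how the two mget forms above resolve against an in-memory store. The `store` layout and both functions are hypothetical. Only the boolean form of per-doc `_source` is modeled, not field lists, `stored_fields` or `routing`.

```python
def mget(store, docs, default_index=None):
    """Resolve each mget entry against an in-memory store {index: {id: source}}."""
    results = []
    for spec in docs:
        index = spec.get("_index", default_index)
        source = store.get(index, {}).get(spec["_id"])
        entry = {"_index": index, "_id": spec["_id"], "found": source is not None}
        # "_source": false drops the body but still reports the hit.
        if source is not None and spec.get("_source", True) is not False:
            entry["_source"] = source
        results.append(entry)
    return {"docs": results}


def mget_ids(store, index, ids):
    """Shorthand form: GET /<index>/_mget with {"ids": [...]}."""
    return mget(store, [{"_id": i} for i in ids], default_index=index)


store = {"test": {"1": {"field1": "value1"}, "2": {"field1": "value2"}}}
print(mget(store, [{"_index": "test", "_id": "1"},
                   {"_index": "test", "_id": "2", "_source": False}]))
```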

bulk：_bulk 或 {index}/_bulk

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
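A `_bulk` body is newline-delimited JSON. Each action line is followed by a source line, except for `delete`, and the body must end with a newline. The hypothetical `bulk_body` helper below builds the same body as the example above.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) triples into a _bulk NDJSON body.
    `source` is ignored for delete, which has no source line."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```

Such a body would be sent with `Content-Type: application/x-ndjson`.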

reindex: copies documents from a source index to a dest index

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
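The following is a toy version of reindex over in-memory dicts, and `reindex` here is hypothetical. The `query` argument is a Python predicate standing in for a source query. With `op_type="create"`, conflicts are counted and skipped, which corresponds to `"conflicts": "proceed"`; by default a real reindex aborts on the first conflict.

```python
def reindex(indices, source, dest, op_type="index", query=None):
    """Copy documents between in-memory indices ({name: {id: source}}).

    op_type="create" only writes ids missing from dest; existing ids are
    counted as version conflicts instead of being overwritten.
    """
    dest_docs = indices.setdefault(dest, {})
    counts = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, doc in indices[source].items():
        if query is not None and not query(doc):
            continue                      # filtered out by the source query
        if doc_id in dest_docs:
            if op_type == "create":
                counts["version_conflicts"] += 1
                continue
            counts["updated"] += 1
        else:
            counts["created"] += 1
        dest_docs[doc_id] = dict(doc)
    return counts


indices = {"twitter": {"1": {"user": "kimchy"}, "2": {"user": "other"}},
           "new_twitter": {"2": {"user": "stale"}}}
print(reindex(indices, "twitter", "new_twitter"))
# {'created': 1, 'updated': 1, 'version_conflicts': 0}
```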

_termvectors: GET /twitter/_termvectors/1?fields=message analyzes the terms in the specified fields of one document

{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
} // the request body can contain these parameters
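The per-term statistics that `_termvectors` reports (frequency, positions, offsets) can be computed for a single field value as shown below. The lowercase `\w+` tokenizer is an assumption standing in for the field's real analyzer, and `term_vector` is a hypothetical helper.

```python
import re
from collections import defaultdict


def term_vector(text):
    """Per-term frequency, positions and character offsets for one field value."""
    terms = defaultdict(lambda: {"term_freq": 0, "tokens": []})
    for position, match in enumerate(re.finditer(r"\w+", text)):
        info = terms[match.group().lower()]
        info["term_freq"] += 1
        info["tokens"].append({"position": position,
                               "start_offset": match.start(),
                               "end_offset": match.end()})
    return dict(terms)


print(term_vector("twitter test test")["test"])
# {'term_freq': 2, 'tokens': [{'position': 1, 'start_offset': 8, 'end_offset': 12},
#                             {'position': 2, 'start_offset': 13, 'end_offset': 17}]}
```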

_mtermvectors: fetches several term vectors in one request

POST /_mtermvectors
{
   "docs": [
      {
         "_index": "twitter",
         "_id": "2",
         "term_statistics": true
      },
      {
         "_index": "twitter",
         "_id": "1",
         "fields": [
            "message"
         ]
      }
   ]
}

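/_stats statistics API: /_stats, /index1/_stats, /index1/_stats/docs
These return all statistics, the statistics of one index, and only the docs section of one index's statistics; APIs of this kind share this common URL pattern.
/_segments
/_recovery
/_shard_stores (status parameter: green: fully healthy shards, yellow: shards whose replicas are unassigned, red: shards whose primary is unassigned)
/_cache/clear (query, request, fielddata, fields=foo,bar; clears specific caches such as the query cache)
/_flush ensures that data which is only in the transaction log is also persisted into Lucene
/_refresh makes all operations since the last refresh visible (available for search). ES is near real-time and refreshes periodically on its own, so an explicit refresh is rarely needed.
/_forcemerge: each shard (a Lucene index) contains multiple segments, and force merge reduces their number (parameters: max_num_segments, the maximum number of segments after merging; flush, whether to flush after merging)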
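The following toy force merge assumes that each segment is a dict of documents plus a set of deleted ids, similar to Lucene's live-docs bitmap. Merging copies only live documents, which is why force merge also purges deleted docs. In reality Lucene's merge policy decides which segments get merged; this hypothetical sketch just merges the oldest ones.

```python
def force_merge(segments, max_num_segments=1):
    """Toy force merge: each segment is {"docs": {id: source}, "deleted": set of ids}."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    if len(segments) <= max_num_segments:
        return segments
    split = len(segments) - (max_num_segments - 1)  # merge the oldest ones together
    merged_docs = {}
    for segment in segments[:split]:
        for doc_id, source in segment["docs"].items():
            if doc_id not in segment["deleted"]:    # deleted docs are not copied
                merged_docs[doc_id] = source
    return [{"docs": merged_docs, "deleted": set()}] + segments[split:]


segments = [
    {"docs": {"1": "a", "2": "b"}, "deleted": {"1"}},  # doc 1 was later updated
    {"docs": {"1": "a2"}, "deleted": set()},
    {"docs": {"3": "c"}, "deleted": set()},
]
print(force_merge(segments))
# [{'docs': {'2': 'b', '1': 'a2', '3': 'c'}, 'deleted': set()}]
```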

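Cat API: /_cat, provides handy human-readable data
Common parameters: v (verbose, adds a header row), help (lists the columns this API outputs), h (outputs only the given columns, e.g. h=ip,port), format (output format: text (default), json, smile, yaml, cbor), s (sort, e.g. s=col1:desc,col2)
Specific cat APIs:
/aliases
/allocation: how many shards each node holds and how much disk space they use
/count: how many documents the cluster holds; /count/twitter: how many documents the twitter index holds
/health
/indices (parameter health=yellow)
/master
/nodeattrs
/nodes
/pending_tasks
/plugins
/recovery (similar to the recovery API above): shard allocation or relocation, e.g. on node startup (store recovery, loading a shard from disk), node failure, a change in the number of replicas, or a snapshot restore (backing up an index and then restoring it)
/repositories: lists snapshot repositories
/thread_pool: columns active (number of active threads), queue (tasks in the queue), rejected (rejected tasks)
/shards
/segments
/snapshots/repo1: lists all snapshots in the repo1 repository
/templates: lists index templates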
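The common `_cat` parameters `v`, `h` and `s` can be imitated on a list of dicts. `cat` is a hypothetical helper. Real `_cat` output aligns columns a little differently and uses each column's type when sorting.

```python
def cat(rows, h=None, v=False, s=None):
    """Render rows (a list of dicts) as a space-aligned text table.

    h: comma-separated columns to show (default: all columns of the first row)
    v: prepend a header line
    s: comma-separated sort keys, each optionally suffixed with ":desc"
    """
    columns = h.split(",") if h else (list(rows[0]) if rows else [])
    if s:
        # Python's sort is stable, so sorting by the last key first
        # gives a multi-key sort.
        for key in reversed(s.split(",")):
            name, _, direction = key.partition(":")
            rows = sorted(rows, key=lambda row, n=name: row[n],
                          reverse=(direction == "desc"))
    table = [[str(row[c]) for c in columns] for row in rows]
    if v:
        table.insert(0, list(columns))
    if not table:
        return ""
    widths = [max(len(line[i]) for line in table) for i in range(len(columns))]
    return "\n".join(
        " ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in table)


indices = [{"index": "a", "docs.count": 5}, {"index": "b", "docs.count": 12}]
print(cat(indices, h="index,docs.count", v=True, s="docs.count:desc"))
```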

Cluster API: /_cluster
Node filters select specific nodes: _all, _local, _master (adds the elected master), node id, node name, IP, hostname, master:true (adds master-eligible nodes), master:false (removes master-eligible nodes), data:true, ingest:true, coordinating_only:true, attrname:attrvalue (matches a node attribute)
Specific cluster APIs:
/health
/state, also /state/{metrics} and /state/{metrics}/{indices}, e.g. /state/metadata,routing_table/foo,bar
/stats, also e.g. /stats/nodes/node1,node*,master:false
/pending_tasks
/reroute: moves shards and allocates or cancels allocations; common commands are move and allocate_replica
/settings: transient cluster settings take precedence over persistent cluster settings, which take precedence over the local elasticsearch.yml. To keep cluster settings consistent across nodes, better not put them in the local elasticsearch.yml. GET reads them (include_defaults also returns the defaults), PUT updates them, and setting a value to null resets it.
/allocation/explain: explains why a shard stays on its node instead of moving, or why a shard is unassigned. The request body {"index": "myindex", "shard": 0, "primary": true} selects the shard; with "primary": false, current_node identifies which replica.
/voting_config_exclusions: excludes master-eligible nodes from the voting configuration, via POST _cluster/voting_config_exclusions/<node_name> and DELETE _cluster/voting_config_exclusions
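Two rules from the cluster API notes can be sketched as hypothetical helpers. First, node filters are applied left to right, each adding to or removing from the selection, so `_all,master:false` and `master:false,_all` give different results. Second, a cluster setting resolves from transient, then persistent, then `elasticsearch.yml`, then the default, and a `null` value removes it. IP and hostname filters are omitted for brevity.

```python
from fnmatch import fnmatchcase


def resolve_node_filters(nodes, filters, local_node, elected_master):
    """Apply comma-separated node filters left to right.

    nodes: {node_id: {"name": str, "roles": set of str, "attrs": dict}}
    """
    if not filters:
        return set(nodes)                       # no filter selects every node
    selected = set()
    for f in filters.split(","):
        if f == "_all":
            selected |= set(nodes)
        elif f == "_local":
            selected.add(local_node)
        elif f == "_master":
            selected.add(elected_master)
        elif ":" in f:
            key, value = f.split(":", 1)
            if key in ("master", "data", "ingest", "coordinating_only"):
                if key == "coordinating_only":  # nodes with no roles at all
                    matching = {n for n, info in nodes.items() if not info["roles"]}
                else:
                    matching = {n for n, info in nodes.items() if key in info["roles"]}
                selected = (selected | matching) if value == "true" else (selected - matching)
            else:                               # attrname:attrvalue
                selected |= {n for n, info in nodes.items()
                             if fnmatchcase(info["attrs"].get(key, ""), value)}
        else:                                   # node id or node name, wildcards allowed
            selected |= {n for n, info in nodes.items()
                         if fnmatchcase(n, f) or fnmatchcase(info["name"], f)}
    return selected


def effective_setting(key, transient, persistent, yml, defaults):
    """Transient beats persistent, which beats elasticsearch.yml, then the default."""
    for layer in (transient, persistent, yml, defaults):
        if key in layer:
            return layer[key]
    return None


def put_settings(layer, updates):
    """Like PUT _cluster/settings on one layer: a null (None) value resets the key."""
    for key, value in updates.items():
        if value is None:
            layer.pop(key, None)
        else:
            layer[key] = value


nodes = {
    "n1": {"name": "master-1", "roles": {"master"}, "attrs": {}},
    "n2": {"name": "data-1", "roles": {"data", "ingest"}, "attrs": {"rack": "r1"}},
    "n3": {"name": "data-2", "roles": {"data"}, "attrs": {"rack": "r2"}},
}
print(sorted(resolve_node_filters(nodes, "_all,master:false", "n2", "n1")))  # ['n2', 'n3']
```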

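Node Info API: /_nodes, optionally followed by node filters (/_nodes/<node filters>) and then a specific info item, e.g. /_nodes/_all/process or /_nodes/plugins
Sub-APIs: /stats, e.g. /stats/os,process; /usage (counts how often each kind of action ran on a node, e.g. search ran 3 times); /hot_threads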

Remote Cluster Info API: GET /_remote/info

Task Management API: GET /_tasks (parameters: nodes, actions)