Sorted results after building an ES index: a summary of Elasticsearch index and search optimization, load testing, and monitoring

Latest recommended article published on 2024-05-27 11:45:09

weixin_39821035

Article tags: sorted results after building an ES index

Article link: https://blog.csdn.net/weixin_39821035/article/details/111690924

1. Overview
2. Index
    - Index optimization options
3. Search
    - Search optimization options
4. System configuration optimization options
5. Load testing: esrally
6. Monitoring: marvel
7. Caveats
8. Reference
9. More

Overview

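Let's start with the overall architecture diagram of ES, which shows several important modules. This post focuses on my experience tuning the index and search modules that sit on top of Lucene, and tries to state briefly what effect each configuration change has on an ES cluster.

The figure above shows how ES builds the inverted index when a doc index/write request arrives. Index optimization concentrates mainly on this posting-list building process. First, get to know four components (heap buffer, OS cache, translog, disk):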

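- In every refresh interval, the ES instance moves the docs held in the heap buffer into the OS cache used by Lucene, as a new segment (from this point on the docs are searchable).
- The translog syncs docs to disk according to its configured persistence policy (sequential disk writes).
- Translog cleanup: once the corresponding data has been committed to disk, old translog entries are removed.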
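The interaction of the four components can be sketched as a toy model. This is only an illustration of the steps above, not Elasticsearch code: the class name ToyShard and its methods are hypothetical, and the translog persistence policy (when the translog itself is synced to disk) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of the write path; a simplification, not Elasticsearch code."""
    heap_buffer: list = field(default_factory=list)  # indexed, not yet searchable
    os_cache: list = field(default_factory=list)     # refreshed segments, searchable
    translog: list = field(default_factory=list)     # operations not yet committed
    disk: list = field(default_factory=list)         # committed segments

    def index(self, doc):
        # A write goes into the in-memory buffer and is appended to the translog.
        self.heap_buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Once per refresh interval: buffered docs become a new segment in the OS cache.
        if self.heap_buffer:
            self.os_cache.append(list(self.heap_buffer))
            self.heap_buffer.clear()

    def flush(self):
        # Commit: segments are persisted, after which the translog is cleaned up.
        # (The model moves segments from cache to disk; this is a simplification.)
        self.refresh()
        self.disk.extend(self.os_cache)
        self.os_cache.clear()
        self.translog.clear()

    def searchable(self):
        # Only docs that live in a segment (cached or committed) can be found.
        return [doc for segment in self.disk + self.os_cache for doc in segment]


shard = ToyShard()
shard.index({"id": 1})
before_refresh = shard.searchable()          # doc is still only in the heap buffer
shard.refresh()
after_refresh = shard.searchable()           # doc is now in a searchable segment
translog_after_refresh = len(shard.translog)
shard.flush()
translog_after_flush = len(shard.translog)
```

The point of the model is that a longer refresh interval delays visibility but produces fewer, larger segments, while the translog keeps every operation until a commit makes it redundant.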

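Disable unneeded features in the mapping
- index: the inverted index. Decide whether each field needs to be analyzed (tokenized) or can be not_analyzed, keep the number of schema fields as small as possible, and do not build an inverted index for fields that will never be searched: .field("index", "no")
- doc values: the forward (columnar) index, used for aggregations and sorting
- norms: for analyzed fields, norms store several normalization factors that are used to compute relevance scores for docs; if relevance scoring is not needed on a field, norms can be disabled
- index_options: four levels, docs (whether a doc contains the term), freqs (adds term frequency, so docs in which a term repeats score higher), positions (includes the previous two plus position information, used for proximity queries), and offsets (everything, used for highlighting)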
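A minimal sketch of what such a trimmed mapping could look like, built as a plain Python dict and printed as JSON. The field names (title, status, raw_payload) are hypothetical, and the syntax assumes the older string-based mapping style implied by not_analyzed and "index": "no" in the list above; newer Elasticsearch releases use text/keyword types and boolean values instead.

```python
import json

# Hypothetical fields; older (pre-5.x style) mapping syntax is assumed.
properties = {
    # Full-text field: keeps positions for proximity queries, but drops norms.
    "title": {
        "type": "string",
        "index": "analyzed",
        "index_options": "positions",
        "norms": {"enabled": False},
    },
    # Exact-value field: not tokenized, doc values kept for sorting/aggregations.
    "status": {
        "type": "string",
        "index": "not_analyzed",
        "index_options": "docs",
        "doc_values": True,
    },
    # Never searched: no inverted index is built for it.
    "raw_payload": {
        "type": "string",
        "index": "no",
    },
}

mapping_body = json.dumps({"properties": properties}, indent=2)
print(mapping_body)
```

Each field only pays for the structures it actually uses: positions where proximity queries are needed, doc values where sorting or aggregation is needed, and nothing at all for fields that are only stored.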
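Disable _all so that queries match against a specific schema field, which reduces index size: index.query.default_field: your_schema_replace_all. The _all field makes search more convenient, but it increases indexing time and index size.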
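As a rough illustration, the two settings could be combined in an index-creation body like the one below. The type name doc and the field name title are hypothetical stand-ins (title plays the role of your_schema_replace_all), and the sketch assumes an older Elasticsearch release in which the _all field and mapping types still exist.

```python
import json

# Hypothetical index-creation body; "doc" and "title" are illustrative names.
create_index_body = {
    "settings": {
        # Queries that do not name a field fall back to this field instead of _all.
        "index.query.default_field": "title",
    },
    "mappings": {
        "doc": {
            # Do not build the catch-all field at index time.
            "_all": {"enabled": False},
            "properties": {
                "title": {"type": "string"},
            },
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```

With _all disabled, a query that omits the field name only works as expected if the default field is set, which is why the two changes are usually made together.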
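indices.memory: the size of the ES instance's memory buffer (the indices.memory.* settings, such as indices.memory.index_buffer_size). The buffer is flushed to the OS cache when it fills up or when a refresh interval elapses. If the refresh interval is long enough and the buffer is large enough, then with the system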
