How Elasticsearch Multi-Index Queries Work

烟火缠过客

Published 2024-08-12 16:50:35


Article link: https://blog.csdn.net/LuckFairyLuckBaby/article/details/141136065


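SearchRequest is the object used to build a search request, and it lets you target one or more indices. This article walks through how new SearchRequest("index1", "index2") ends up running a query across multiple indices.

1. Create the SearchRequest

When you create a SearchRequest, for example new SearchRequest("index1", "index2"), you pass the index names as constructor arguments.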

SearchRequest searchRequest = new SearchRequest("index1", "index2");

2. Set the SearchSourceBuilder

Once the SearchRequest exists, you set the query through its source() method, which usually takes a SearchSourceBuilder.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
searchRequest.source(searchSourceBuilder);

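3. Send the request

Here client is assumed to be a RestHighLevelClient, as in the example at the end of this article. Its search() call is synchronous and returns once the response has arrived.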

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

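4. Internal processing flow

4.1 Parse the request

When Elasticsearch receives the request, the receiving node parses it and extracts the target index names along with the query.

4.2 Resolve the target indices

The request is not executed index by index. Instead, the node resolves the index names to concrete indices and works out which shards belong to them. Over the REST API the names arrive as a comma-separated list in the URL (for example /index1,index2/_search), which is split into individual names; wildcards and aliases are expanded at this point as well.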
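To make the name-resolution step more concrete, here is a minimal sketch that splits a comma-separated index expression and expands a trailing * wildcard against a known list of index names. The class IndexExpressionResolver and its resolve method are hypothetical names used only for illustration. The real resolver in Elasticsearch also handles aliases, exclusions, date math, and more, all of which are left out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an Elasticsearch class.
final class IndexExpressionResolver {

    /** Resolves an expression such as "index1,logs-*" against the existing index names. */
    static List<String> resolve(String expression, List<String> existingIndices) {
        Set<String> resolved = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String part : expression.split(",")) {
            String name = part.trim();
            if (name.isEmpty()) {
                continue;
            }
            if (name.endsWith("*")) {
                // Simplified wildcard handling: only a trailing '*' is supported in this sketch.
                String prefix = name.substring(0, name.length() - 1);
                for (String index : existingIndices) {
                    if (index.startsWith(prefix)) {
                        resolved.add(index);
                    }
                }
            } else if (existingIndices.contains(name)) {
                resolved.add(name);
            } else {
                throw new IllegalArgumentException("no such index: " + name);
            }
        }
        return new ArrayList<>(resolved);
    }
}
```

Calling resolve("index1,index2", existing) returns the two names unchanged when both exist, which is the case that new SearchRequest("index1", "index2") corresponds to.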

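4.3 Build the per-shard requests

For every shard that has to be searched, Elasticsearch builds a per-shard request (ShardSearchRequest in the source code) that carries the shard ID and the query.

4.4 Dispatch to the shards

Each per-shard request is sent to a node that holds a copy of that shard, either the primary or a replica. An index with several shards therefore produces one request per shard, not one per shard copy.

4.5 Execute the query

Each shard executes the query locally and returns its results. This can involve reading the segment files, applying filters, and scoring the matching documents.

4.6 Collect the results

As shards finish, their responses travel back over the transport layer (SearchTransportService in the source code) to the node that sent the per-shard requests.

4.7 Merge the results

That node is the coordinating node, the one that received the original SearchRequest. It merges the results from all shards into a single, globally sorted result.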
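The merge step can be pictured with a small sketch: every shard reports its own best hits, and the coordinating node keeps the overall best ones. ShardResultMerger and Hit are hypothetical names, and the sketch simply sorts all hits by score. Elasticsearch itself relies on Lucene's top-docs merging and also handles custom sort orders and aggregations, which this does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of merging per-shard hits on the coordinating node.
final class ShardResultMerger {

    /** One hit reported by a shard: where it came from, which document, and its score. */
    static final class Hit {
        final String shard;
        final String docId;
        final double score;

        Hit(String shard, String docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns the global top `size` hits, highest score first. */
    static List<Hit> mergeTopHits(List<List<Hit>> perShardHits, int size) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        all.sort((a, b) -> {
            int byScore = Double.compare(b.score, a.score); // descending by score
            if (byScore != 0) {
                return byScore;
            }
            // Deterministic tie-break so equal scores always come out in the same order.
            int byShard = a.shard.compareTo(b.shard);
            return byShard != 0 ? byShard : a.docId.compareTo(b.docId);
        });
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }
}
```

Note that the hits may come from shards of different indices; at this stage the coordinating node does not treat index1 and index2 differently.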

4.8 Return the results

Finally, the coordinating node wraps the merged results in a SearchResponse and returns it to the client.

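5. Source code analysis

5.1 SearchRequest

In the source code, the SearchRequest constructor takes a varargs list of index names and stores them in the indices field. Simplified, it looks like this: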

public SearchRequest(String... indices) {
    this.indices = indices;
}

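5.2 TransportSearchAction

On the server side the request is handled by TransportSearchAction, the core class for processing search requests. It runs on the coordinating node.

5.3 ShardSearchRequest

The search action builds one ShardSearchRequest per target shard (not per index). Each one carries the shard ID and the query. The line below is pseudocode; the real constructor arguments differ between versions.

// Build a ShardSearchRequest (pseudocode, not the real constructor signature)
ShardSearchRequest shardRequest = new ShardSearchRequest(indices, routing, preference, searchType, searchSource, globalContext);

5.4 SearchTransportService

SearchTransportService sends each ShardSearchRequest to a node that holds the target shard and hands the response to a listener.

// Send the ShardSearchRequest to the shard for the query phase
// (pseudocode; the exact argument list varies by version)
searchTransportService.sendExecuteQuery(connection, shardRequest, task, listener);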

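5.5 Executing the ShardSearchRequest

On the data node, the ShardSearchRequest is executed against the local shard, which covers parsing the query, filtering, and scoring.

5.6 Per-shard results

When a shard finishes, it returns a per-shard result (a SearchPhaseResult in the source code) that contains that shard's part of the query result.

5.7 SearchPhaseController

SearchPhaseController runs on the coordinating node and merges what the shards return: it sorts the top hits across shards and reduces the aggregation results.

5.8 SearchPhase

The phases of a search (the query phase, the aggregation phase, and so on) are implemented as SearchPhase implementations, each responsible for one aspect of the query.

5.9 SearchResponse

In the end, the results from all shards are merged into a single SearchResponse that holds the result of the whole query.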

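Summary

This is how new SearchRequest("index1", "index2") queries multiple indices: Elasticsearch resolves the listed indices to their shards, runs the query on one copy of every shard, then merges the per-shard results and returns them to the client. Because the work is spread across shards, a multi-index query uses the cluster's distributed nature in the same way a single-index query does.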

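6. How many indices are appropriate

Impact of the number of indices

Metadata overhead: every index adds metadata (mappings, settings, state) that the cluster has to store and maintain.
Shard management: every index has its own primary and replica shards, which makes the cluster more complex to manage.
Resource consumption: more indices means more shards, and each shard consumes memory, CPU, and disk I/O.
Query cost: a query that spans many indices has to visit more shards, which increases query time. Querying through an alias does not change this, because the alias resolves to the same shards.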
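Since most of these costs scale with the number of shard copies rather than with the number of index names, a quick back-of-the-envelope calculation can help when comparing layouts. The sketch below is only arithmetic: ShardCountEstimator and IndexSpec are hypothetical names, and the shard and replica counts used with it are illustrative inputs, not recommendations.

```java
import java.util.List;

// Hypothetical helper for comparing index layouts by total shard copies.
final class ShardCountEstimator {

    /** Minimal description of an index: its name, primary shard count, and replicas per primary. */
    static final class IndexSpec {
        final String name;
        final int primaries;
        final int replicas;

        IndexSpec(String name, int primaries, int replicas) {
            this.name = name;
            this.primaries = primaries;
            this.replicas = replicas;
        }
    }

    /** Total shard copies = sum over indices of primaries * (1 + replicas). */
    static int totalShardCopies(List<IndexSpec> indices) {
        int total = 0;
        for (IndexSpec index : indices) {
            total += index.primaries * (1 + index.replicas);
        }
        return total;
    }
}
```

For example, twelve monthly indices with one primary and one replica each add up to 24 shard copies, while a single merged index with the same settings has 2.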

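Best practices for the number of indices

Avoid too many indices: keep the number of indices as low as is practical to reduce system overhead.
Use aliases: group related indices behind an alias so that queries reference a single name. This simplifies queries and index switching; it does not by itself make a query faster.
Consolidate indices periodically: where possible, merge old indices into a single new index from time to time to reduce the index count.
Lifecycle management: use Elasticsearch's Index Lifecycle Management (ILM) to manage index lifecycles automatically, including deleting indices that are no longer needed.

Index design

Design indices deliberately: to keep the index count down, consider storing similar kinds of data in the same index, and use aliases to access them.
Time partitioning: if the data is partitioned by time, create indices with a date pattern in the name and point an alias at the currently active index.

Monitoring and tuning

Monitor performance: regularly monitor cluster metrics such as CPU usage, memory usage, and disk I/O.
Adjust the index strategy: adjust based on what monitoring shows, for example by merging indices or optimizing queries.

Summary

The right number of indices depends on your requirements and on the actual state of your cluster.
Aliases and lifecycle management help you keep indices under control.
Monitoring and adjusting the strategy is key to keeping the cluster performant.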

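How to merge indices

Background

In Elasticsearch, every index consists of one or more shards, and every shard consists of one or more segments. New documents are first buffered in memory, and each refresh (by default about once per second) writes the buffered documents out as a new segment. As the index grows, the number of segments grows too, which slows queries down because each query has to visit more segments. Elasticsearch merges segments in the background to counter this.

Ways to merge quickly

2.1 Force merge

You can use the _forcemerge API to force the segments of an index to merge. This can reduce the segment count significantly and improve query performance. A force merge is resource-intensive, however, and can temporarily reduce the cluster's write performance, so it is best run on indices that no longer receive writes.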

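POST /your_index/_forcemerge?max_num_segments=1

The max_num_segments parameter sets the maximum number of segments per shard after the merge. It is passed as a query parameter, not in a request body. A value of 1 leaves a single segment in each shard.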
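If you want to trigger the force merge from Java without pulling in a client library, one option is the JDK's built-in HTTP client. The sketch below only builds the request; ForceMergeRequests is a hypothetical helper name, and the base URL you pass in (for example http://localhost:9200) is an assumption about where your cluster is reachable. Authentication and TLS are not covered.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds a force-merge request for the JDK HTTP client (Java 11+).
final class ForceMergeRequests {

    /** Builds POST {baseUrl}/{index}/_forcemerge?max_num_segments={n} with an empty body. */
    static HttpRequest forceMerge(String baseUrl, String index, int maxNumSegments) {
        URI uri = URI.create(baseUrl + "/" + index + "/_forcemerge?max_num_segments=" + maxNumSegments);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody()) // the parameter travels in the query string
                .build();
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). A force merge of a large index can take a long time, so expect the call to block for a while.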

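2.2 Configure the merge policy

Besides the _forcemerge API, you can tune the settings that control automatic background merging. Elasticsearch ships with defaults that suit most workloads. These are expert settings, so change them only when you have a reason to, and check the documentation for your version because the available settings vary.

PUT /your_index/_settings
{
  "index": {
    "merge.policy.max_merge_at_once": 2,
    "merge.policy.max_merged_segment": "5gb",
    "merge.scheduler.auto_throttle": false,
    "merge.scheduler.max_thread_count": 2
  }
}

The key settings here are:

max_merge_at_once: the maximum number of segments merged in a single merge.
max_merged_segment: the maximum size of a segment produced by a merge.
auto_throttle: whether merge I/O is throttled automatically.
max_thread_count: the maximum number of threads used for merging.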

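3. Manual merge

Manual merges are usually run while cluster load is low, to limit the impact on cluster performance.

POST /your_index/_forcemerge?only_expunge_deletes=true

The only_expunge_deletes parameter restricts the merge to segments that contain deleted documents, so that the space held by documents marked as deleted is reclaimed without merging everything else. In recent versions it cannot be combined with max_num_segments in the same request.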

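4. Avoid frequent merging

Frequent merging can cause performance problems. You can reduce it in the following ways:

Reduce write frequency: avoid many small write operations; consider buffering writes and sending them together.
Use bulk indexing: use the bulk API to write many documents in one request.
Adjust the refresh interval: refresh_interval controls how often newly indexed documents are written to a new segment and become searchable. A longer interval produces fewer, larger segments. Setting it to -1, as in the request below, disables automatic refresh entirely, which suits large bulk loads; restore it afterwards, otherwise new documents will not show up in searches.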

PUT /your_index/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}
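For the bulk-indexing advice above, it helps to see what a bulk request body looks like: it is newline-delimited JSON, with one action line followed by one document line per document, and a trailing newline at the end. The sketch below builds such a body for POST /_bulk. BulkBodyBuilder is a hypothetical helper name, and it assumes that every document is already serialized as single-line JSON and that the index name needs no JSON escaping.

```java
import java.util.List;

// Hypothetical helper that assembles a newline-delimited body for POST /_bulk.
final class BulkBodyBuilder {

    /** Builds a bulk body that indexes each single-line JSON document into the given index. */
    static String build(String index, List<String> jsonDocuments) {
        StringBuilder body = new StringBuilder();
        for (String document : jsonDocuments) {
            // Action line: tells Elasticsearch to index the next line into `index`.
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Source line: the document itself, followed by the mandatory newline.
            body.append(document).append('\n');
        }
        return body.toString();
    }
}
```

Accumulating documents in a list and sending them as one body like this replaces many small write requests with a single larger one.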

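Using aliases and reindexing

If you want to merge several indices into one, you can combine an alias with the reindex API.

5.1 Create an alias

Create an alias that points to all the indices to be merged.

POST /_aliases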
{
  "actions": [
    { "add": { "index": "index1", "alias": "combined_index" } },
    { "add": { "index": "index2", "alias": "combined_index" } }
  ]
}

5.2 Reindex

Use the reindex API to copy the data from the alias into a new index.

POST _reindex
{
  "source": {
    "index": "combined_index"
  },
  "dest": {
    "index": "merged_index"
  }
}

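5.3 Delete the old indices

Once reindexing has finished and you have checked that the document counts match, you can delete the old indices.

DELETE /index1,index2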

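5.4 Summary

Use the _forcemerge API to merge the segments inside an index quickly.
Tune the merge settings to control automatic merging.
Run manual merges during off-peak hours.
Avoid frequent merging to limit the performance impact.
Use an alias plus the reindex API to merge several indices into one.

Example

Question: my indices are created dynamically, one per month, and I want to merge the indices older than three months into a single index. How can this be done in code?

The steps are:

Determine which indices to merge
Create an alias
Reindex
Verify the data
Delete the old indices

1. Determine which indices to merge

First, determine which indices need to be merged. Assuming your indices are named logs-yyyy-MM, you can use the Elasticsearch API to list the matching indices and pick the ones older than the cutoff.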

import org.elasticsearch.action.admin.indices.alias.IndicesAliasesRequest;
import org.elasticsearch.action.admin.indices.alias.IndicesAliasesRequest.AliasActions;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.core.CountRequest;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.ReindexRequest;

import java.io.IOException;
import java.time.YearMonth;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class MergeOldIndices {

    private static final String INDEX_PREFIX = "logs-"; // index names follow logs-yyyy-MM
    private static final String ALIAS_NAME = "old_logs";
    private static final int MONTHS_TO_KEEP_SEPARATE = 3;

    public static void main(String[] args) throws IOException {
        // Client construction is elided here; pass a RestClientBuilder configured for your cluster.
        try (RestHighLevelClient client = new RestHighLevelClient(...)) {
            // Indices strictly older than this month get merged.
            YearMonth cutoff = YearMonth.now().minusMonths(MONTHS_TO_KEEP_SEPARATE);
            List<String> indicesToMerge = findIndicesToMerge(client, cutoff);
            if (indicesToMerge.isEmpty()) {
                System.out.println("Nothing to merge.");
                return;
            }
            mergeIndices(client, indicesToMerge, ALIAS_NAME);
        }
    }

    // Lists the existing logs-* indices and keeps those whose month is before the cutoff.
    // Parsing the month from the name avoids bugs around year boundaries.
    private static List<String> findIndicesToMerge(RestHighLevelClient client, YearMonth cutoff) throws IOException {
        GetIndexRequest request = new GetIndexRequest(INDEX_PREFIX + "*");
        String[] existing = client.indices().get(request, RequestOptions.DEFAULT).getIndices();

        List<String> indicesToMerge = new ArrayList<>();
        for (String name : existing) {
            try {
                YearMonth month = YearMonth.parse(name.substring(INDEX_PREFIX.length()));
                if (month.isBefore(cutoff)) {
                    indicesToMerge.add(name);
                }
            } catch (DateTimeParseException e) {
                // Not a monthly index (the suffix is not yyyy-MM); leave it alone.
            }
        }
        return indicesToMerge;
    }

    private static void mergeIndices(RestHighLevelClient client, List<String> indicesToMerge, String aliasName) throws IOException {
        String destIndex = "merged_logs_" + YearMonth.now(); // e.g. merged_logs_2024-08
        createAlias(client, indicesToMerge, aliasName);
        reindex(client, aliasName, destIndex);
        validateData(client, aliasName, destIndex); // throws if the counts differ, so nothing is deleted
        deleteOldIndices(client, indicesToMerge);
    }

    private static void createAlias(RestHighLevelClient client, List<String> indicesToMerge, String aliasName) throws IOException {
        // Point one alias at every index that is going to be merged.
        IndicesAliasesRequest request = new IndicesAliasesRequest();
        for (String index : indicesToMerge) {
            request.addAliasAction(AliasActions.add().index(index).alias(aliasName));
        }
        client.indices().updateAliases(request, RequestOptions.DEFAULT);
    }

    private static void reindex(RestHighLevelClient client, String aliasName, String destIndex) throws IOException {
        // Copy everything behind the alias into the new index.
        ReindexRequest reindexRequest = new ReindexRequest();
        reindexRequest.setSourceIndices(aliasName);
        reindexRequest.setDestIndex(destIndex);
        reindexRequest.setRefresh(true); // make the copied documents visible to the count check

        BulkByScrollResponse response = client.reindex(reindexRequest, RequestOptions.DEFAULT);
        System.out.println("Reindexed " + response.getTotal() + " documents.");
    }

    private static void validateData(RestHighLevelClient client, String aliasName, String destIndex) throws IOException {
        // Compare document counts between the source alias and the merged index.
        long sourceCount = client.count(new CountRequest(aliasName), RequestOptions.DEFAULT).getCount();
        long destCount = client.count(new CountRequest(destIndex), RequestOptions.DEFAULT).getCount();
        System.out.println("Source documents: " + sourceCount + ", merged documents: " + destCount);
        if (sourceCount != destCount) {
            throw new IllegalStateException("Document counts differ; keeping the old indices.");
        }
    }

    private static void deleteOldIndices(RestHighLevelClient client, List<String> indicesToMerge) throws IOException {
        // Delete the old indices; the alias entries pointing at them disappear with them.
        client.indices().delete(new DeleteIndexRequest(indicesToMerge.toArray(new String[0])), RequestOptions.DEFAULT);
    }
}
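The rule "older than three months" is easy to get wrong around year boundaries, so it can be worth isolating it as a pure function that needs no cluster to test. The sketch below is one way to do that with java.time.YearMonth. OldIndexSelector is a hypothetical name, and the interpretation of the cutoff (indices strictly before the current month minus the number of months to keep) is an assumption you may want to adjust.

```java
import java.time.YearMonth;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cluster-independent version of the index selection step.
final class OldIndexSelector {

    /**
     * Returns the names of the form prefix + "yyyy-MM" whose month lies strictly
     * before current.minusMonths(monthsToKeep). Other names are ignored.
     */
    static List<String> select(List<String> indexNames, String prefix, YearMonth current, int monthsToKeep) {
        YearMonth cutoff = current.minusMonths(monthsToKeep);
        List<String> old = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(prefix)) {
                continue;
            }
            try {
                YearMonth month = YearMonth.parse(name.substring(prefix.length()));
                if (month.isBefore(cutoff)) {
                    old.add(name);
                }
            } catch (DateTimeParseException e) {
                // The suffix is not a yyyy-MM month, so this is not a monthly index.
            }
        }
        return old;
    }
}
```

With the current month fixed at February 2024 and three months kept, the cutoff is November 2023, so logs-2023-10 is selected while logs-2023-11 and logs-2024-01 are not.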