Using Search After to Solve Deep Pagination in Elasticsearch

1. Common Pagination in Elasticsearch

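By default, Elasticsearch paginates with from + size. When the data set is small, or when from and size are small, this is quite efficient. For deep pagination, however, it becomes very inefficient, and under high concurrency it can even bring down the whole Elasticsearch cluster.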

So how does from + size pagination actually work?

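Elasticsearch is distributed, and documents are spread roughly evenly across the shards. Take a query with from=990 and size=10: Elasticsearch first collects the top 1000 documents on every shard, the coordinating node then gathers all of these results, sorts them, and selects the top 1000 overall, and finally it skips the first 990 and returns the remaining 10 documents to the client. Now imagine from=100000 and size=10 on three shards: each shard has to produce 100010 documents, so the coordinating node has to merge 3*100010 documents. That is a frightening amount of work, and it consumes a lot of memory.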
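The merge step described above can be imitated in plain Python. This is only a simplified model, not Elasticsearch code: the function name `from_size_search` and the in-memory "shards" (lists of unique sort values) are assumptions made for illustration.

```python
import heapq
import random


def from_size_search(shards, from_, size):
    """Simulate from + size over several shards; return (page, candidates_merged)."""
    window = from_ + size
    # Each shard sorts locally and returns its own top (from + size) hits.
    per_shard = [heapq.nsmallest(window, shard) for shard in shards]
    # The coordinating node holds every candidate, merges them in sorted
    # order, then throws away the first `from_` hits.
    merged = list(heapq.merge(*per_shard))
    candidates = sum(len(hits) for hits in per_shard)
    return merged[from_:from_ + size], candidates


random.seed(0)
values = random.sample(range(1_000_000), 30_000)  # unique sort values
shards = [values[i::3] for i in range(3)]         # three shards of 10,000 docs

page, merged_count = from_size_search(shards, from_=990, size=10)
print(len(page), merged_count)  # 10 hits returned, 3000 candidates merged
```

In this model the client receives 10 hits, yet the coordinating node had to hold 3 * (990 + 10) candidates, and that number grows linearly with from.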

2. The Deep Pagination Problem in Elasticsearch

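Suppose the index has 3 shards and we query with from=100000 and size=10. This is a typical case of deep pagination. As described above, every shard must produce 100010 documents, and the coordinating node must merge 3*100010 = 300030 of them, which consumes a lot of memory. An occasional query like this may be tolerable, but under heavy concurrency it can exhaust memory and bring down the whole Elasticsearch cluster. To limit this cost, Elasticsearch has a default setting (index.max_result_window) that only allows from + size to reach the first 10000 documents. So what should we do if the product really needs deep pagination? This is where search_after comes in.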
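The arithmetic in this paragraph can be written down as two small helpers. This is a minimal sketch: the function names are hypothetical, and the limit of 10000 is the default value of index.max_result_window, which an index may override.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default of index.max_result_window


def coordinator_candidates(num_shards, from_, size):
    """Upper bound on the hits the coordinating node must merge."""
    return num_shards * (from_ + size)


def exceeds_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """True if a from + size request would go past the allowed window."""
    return from_ + size > max_result_window


print(coordinator_candidates(3, 100_000, 10))  # 300030
print(exceeds_result_window(100_000, 10))      # True
print(exceeds_result_window(9_990, 10))        # False
```

A client-side check like `exceeds_result_window` is one possible place to switch from from + size to search_after before the request is ever sent.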

3. Solving Deep Pagination in Elasticsearch

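search_after avoids the drawbacks of scroll by using the sort values of the previous page as a live cursor, so it can be used for real-time requests and high-concurrency scenarios.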

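The sort should include a field that has a unique value for every document, which acts as a tiebreaker. Otherwise, the order of documents with identical sort values is undefined. The recommended approach is to use the _id field or a business ID that is guaranteed to be unique for every document.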
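Why the tiebreaker matters can be seen in a toy model. The sketch below is not Elasticsearch itself: `search_after_page` and `paginate` are hypothetical helpers that page through an in-memory list in ascending order, returning only documents whose sort key is strictly greater than the cursor.

```python
def search_after_page(docs, sort_key, size, after=None):
    """Return the next `size` docs whose sort key is strictly greater than `after`."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > after]
    return ordered[:size]


def paginate(docs, sort_key, size):
    """Walk through all pages and collect the order_id values that were seen."""
    seen, after = [], None
    while True:
        page = search_after_page(docs, sort_key, size, after)
        if not page:
            return seen
        seen.extend(d["order_id"] for d in page)
        after = sort_key(page[-1])  # cursor = sort values of the last hit


docs = [
    {"order_id": 1, "order_date": 100},
    {"order_id": 2, "order_date": 100},
    {"order_id": 3, "order_date": 100},
    {"order_id": 4, "order_date": 200},
]

by_date_only = paginate(docs, lambda d: (d["order_date"],), size=2)
with_tiebreaker = paginate(docs, lambda d: (d["order_date"], d["order_id"]), size=2)
print(by_date_only)     # [1, 2, 4]  (order 3 is lost)
print(with_tiebreaker)  # [1, 2, 3, 4]
```

Sorting on order_date alone loses order 3, because it shares its date with the last hit of the first page. Adding the unique order_id to the sort makes every cursor position unambiguous.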

POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "sort": [
    { "order_date": {"order": "desc"}},
    {"order_id": "asc"}
  ]      
}

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 135,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "KEtz4XcBG0yVqdDwEOv7",
        "_score" : null,
        "_source" : {
          "category" : [
            "Women's Shoes",
            "Women's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Brigitte",
          "customer_full_name" : "Brigitte Cross",
          "customer_gender" : "FEMALE",
          "customer_id" : 12,
          "customer_last_name" : "Cross",
          "customer_phone" : "",
          "day_of_week" : "Saturday",
          "day_of_week_i" : 5,
          "email" : "brigitte@cross-family.zzz",
          "manufacturer" : [
            "Tigress Enterprises",
            "Pyramidustries"
          ],
          "order_date" : "2021-03-13T23:22:34+00:00",
          "order_id" : 592088,
          "products" : [
            {
              "base_price" : 36.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Tigress Enterprises",
              "tax_amount" : 0,
              "product_id" : 14810,
              "category" : "Women's Shoes",
              "sku" : "ZO0026600266",
              "taxless_price" : 36.99,
              "unit_discount_amount" : 0,
              "min_price" : 19.23,
              "_id" : "sold_product_592088_14810",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T23:22:34+00:00",
              "product_name" : "Ankle boots - black",
              "price" : 36.99,
              "taxful_price" : 36.99,
              "base_unit_price" : 36.99
            },
            {
              "base_price" : 54.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Pyramidustries",
              "tax_amount" : 0,
              "product_id" : 9131,
              "category" : "Women's Clothing",
              "sku" : "ZO0184501845",
              "taxless_price" : 54.99,
              "unit_discount_amount" : 0,
              "min_price" : 25.3,
              "_id" : "sold_product_592088_9131",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T23:22:34+00:00",
              "product_name" : "Light jacket - peacoat",
              "price" : 54.99,
              "taxful_price" : 54.99,
              "base_unit_price" : 54.99
            }
          ],
          "sku" : [
            "ZO0026600266",
            "ZO0184501845"
          ],
          "taxful_total_price" : 91.98,
          "taxless_total_price" : 91.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "brigitte",
          "geoip" : {
            "country_iso_code" : "US",
            "location" : {
              "lon" : -74,
              "lat" : 40.8
            },
            "region_name" : "New York",
            "continent_name" : "North America",
            "city_name" : "New York"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        },
        "sort" : [
          1615677754000,
          "592088"
        ]
      }
    ]
  }
}

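The request above returns, for every document, an array of sort values. These sort values can be passed in the search_after parameter to fetch the next page. For example, we take the sort values of the last document and pass them to search_after: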

POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "search_after": [1615677754000, "592088"],
  "sort": [
    { "order_date": {"order": "desc"}},
    {"order_id": "asc"}
  ]      
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 135,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "w0ty4XcBG0yVqdDw29iP",
        "_score" : null,
        "_source" : {
          "category" : [
            "Women's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Brigitte",
          "customer_full_name" : "Brigitte Meyer",
          "customer_gender" : "FEMALE",
          "customer_id" : 12,
          "customer_last_name" : "Meyer",
          "customer_phone" : "",
          "day_of_week" : "Saturday",
          "day_of_week_i" : 5,
          "email" : "brigitte@meyer-family.zzz",
          "manufacturer" : [
            "Spherecords",
            "Tigress Enterprises"
          ],
          "order_date" : "2021-03-13T16:06:14+00:00",
          "order_id" : 591709,
          "products" : [
            {
              "base_price" : 7.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Spherecords",
              "tax_amount" : 0,
              "product_id" : 20734,
              "category" : "Women's Clothing",
              "sku" : "ZO0638206382",
              "taxless_price" : 7.99,
              "unit_discount_amount" : 0,
              "min_price" : 3.6,
              "_id" : "sold_product_591709_20734",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T16:06:14+00:00",
              "product_name" : "Basic T-shirt - dark blue",
              "price" : 7.99,
              "taxful_price" : 7.99,
              "base_unit_price" : 7.99
            },
            {
              "base_price" : 32.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Tigress Enterprises",
              "tax_amount" : 0,
              "product_id" : 7539,
              "category" : "Women's Clothing",
              "sku" : "ZO0038800388",
              "taxless_price" : 32.99,
              "unit_discount_amount" : 0,
              "min_price" : 17.48,
              "_id" : "sold_product_591709_7539",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T16:06:14+00:00",
              "product_name" : "Summer dress - scarab",
              "price" : 32.99,
              "taxful_price" : 32.99,
              "base_unit_price" : 32.99
            }
          ],
          "sku" : [
            "ZO0638206382",
            "ZO0038800388"
          ],
          "taxful_total_price" : 40.98,
          "taxless_total_price" : 40.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "brigitte",
          "geoip" : {
            "country_iso_code" : "US",
            "location" : {
              "lon" : -74,
              "lat" : 40.8
            },
            "region_name" : "New York",
            "continent_name" : "North America",
            "city_name" : "New York"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        },
        "sort" : [
          1615651574000,
          "591709"
        ]
      }
    ]
  }
}
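The two requests above follow one pattern: the second request body is the first one plus the sort array of the last hit. A minimal sketch of that step on plain dictionaries follows; the helper name `next_page_request` is hypothetical, and the response is trimmed to the fields the helper reads.

```python
import copy


def next_page_request(request_body, response):
    """Build the follow-up request from the last hit's sort values, or None when done."""
    hits = response["hits"]["hits"]
    if not hits:
        return None  # an empty page means there is nothing left to fetch
    body = copy.deepcopy(request_body)  # leave the caller's request untouched
    body["search_after"] = hits[-1]["sort"]
    return body


first_request = {
    "size": 1,
    "query": {"match": {"customer_first_name": "Brigitte"}},
    "sort": [{"order_date": {"order": "desc"}}, {"order_id": "asc"}],
}
# Trimmed copy of the first response shown earlier.
first_response = {
    "hits": {"hits": [{"_id": "KEtz4XcBG0yVqdDwEOv7", "sort": [1615677754000, "592088"]}]}
}

second_request = next_page_request(first_request, first_response)
print(second_request["search_after"])  # [1615677754000, '592088']
```

The query and sort clauses must stay identical between pages; only search_after changes.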

4. Notes on Using search_after in Elasticsearch

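  1. A sort must be specified, and the combination of sort values must be unique (add _id or a unique business field from the document body to guarantee this);
  2. For the next query, use the sort values of the last document in the previous response as the search_after value;
  3. Random page jumps are not possible; only the next page or jumps within a small range are supported (fetch several pages' worth of results in one query and use caching to serve the pages within that range).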
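The small-range jump from note 3 can be sketched as follows. Everything here is an assumption made for illustration: `fetch` stands in for whatever function sends the search_after request, `make_fake_fetch` replaces the cluster with an in-memory list, and `fetch_page_window` is a hypothetical helper that fetches several pages at once and splits them for a local cache.

```python
def make_fake_fetch(docs):
    """Stand-in for a real search call: ascending sort, strict search_after."""
    ordered = sorted(docs, key=lambda d: d["sort"])

    def fetch(size, search_after=None):
        remaining = [d for d in ordered
                     if search_after is None or d["sort"] > search_after]
        return remaining[:size]

    return fetch


def fetch_page_window(fetch, page_size, pages_ahead, after=None):
    """Fetch `pages_ahead` pages in one request and split them into pages."""
    hits = fetch(size=page_size * pages_ahead, search_after=after)
    pages = [hits[i:i + page_size] for i in range(0, len(hits), page_size)]
    next_after = hits[-1]["sort"] if hits else None  # cursor for the next window
    return pages, next_after


docs = [{"sort": [n, str(n)]} for n in range(1, 26)]  # 25 docs, unique sort values
fetch = make_fake_fetch(docs)

pages, cursor = fetch_page_window(fetch, page_size=5, pages_ahead=3)
more_pages, last_cursor = fetch_page_window(fetch, page_size=5, pages_ahead=3, after=cursor)
print(len(pages), cursor)            # 3 [15, '15']
print(len(more_pages), last_cursor)  # 2 [25, '25']
```

With the three pages of a window held in a cache, the user can jump between them freely, and moving past the window costs one more search_after request.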
