Solving Deep Pagination in Elasticsearch with Search After


泗水长流


Article link: https://blog.csdn.net/lvxinchun/article/details/114478340


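1. Common pagination in Elasticsearch

By default, Elasticsearch paginates with from + size. When the data set is small, or when from and size are small, this is quite efficient. With deep pagination, however, it becomes very inefficient, and under high concurrency it can even bring down the whole Elasticsearch cluster.

So how does from + size pagination work?

Elasticsearch is distributed, and documents are spread more or less evenly across the shards. For a query with from=990 and size=10, Elasticsearch first collects the top 1000 documents on every shard. The coordinating node then merges all of these results, sorts them, takes the top 1000, and returns the last 10 of those to the client. Now consider from=100000 and size=10 with three shards: every shard has to produce 100010 documents, so the coordinating node has to merge 3 × 100010 = 300030 documents, which consumes a large amount of memory.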
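The arithmetic above can be checked with a small simulation. This is only a sketch of the idea, not Elasticsearch's actual implementation: the function names `from_size_cost` and `from_size_page` are made up for illustration, and each shard is modeled as a plain list of sort values.

```python
import heapq


def from_size_cost(from_, size, num_shards):
    """Return (docs collected per shard, docs merged on the coordinating node)."""
    per_shard = from_ + size
    return per_shard, per_shard * num_shards


def from_size_page(shards, from_, size):
    """Simulate from + size over several shards, each a list of sort values."""
    window = from_ + size
    # Every shard returns its own top (from + size) entries.
    candidates = [heapq.nsmallest(window, shard) for shard in shards]
    # The coordinating node merges everything, then skips the first `from_` entries.
    merged = sorted(value for shard_top in candidates for value in shard_top)
    return merged[from_:from_ + size]


print(from_size_cost(990, 10, 3))      # (1000, 3000)
print(from_size_cost(100000, 10, 3))   # (100010, 300030)

# Three hypothetical shards holding the values 0..299, interleaved.
shards = [list(range(i, 300, 3)) for i in range(3)]
print(from_size_page(shards, 90, 10))  # [90, 91, ..., 99]
```

Only 10 values reach the client in the last call, yet 300 candidates were merged to find them. The work grows with from, not with size.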

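2. The deep pagination problem in Elasticsearch

Suppose the index has 3 shards and we query with from=100000 and size=10. This is a typical case of deep pagination. As described above, every shard has to produce 100010 documents and the coordinating node has to merge 3 × 100010 of them, which costs a great deal of memory. An occasional query of this kind may be tolerable, but under high concurrency it can exhaust memory and bring down the whole Elasticsearch cluster. To limit this memory cost, Elasticsearch has a default limit (the index.max_result_window setting, 10000 by default): from + size can only reach the first 10000 documents. So what should we do if the product really needs deep pagination? This is where search_after comes in.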
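To make the limit concrete, here is a minimal sketch of the check. The helper `within_result_window` is a hypothetical name used only for illustration; the value 10000 is the default of index.max_result_window, and a request is rejected when from + size exceeds it.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def within_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True when a from + size request stays inside the result window."""
    return from_ + size <= max_result_window


print(within_result_window(9990, 10))    # True: reaches exactly document 10000
print(within_result_window(9991, 10))    # False: would need document 10001
print(within_result_window(100000, 10))  # False: the deep page from the example
```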

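3. Solving deep pagination in Elasticsearch

search_after avoids the drawbacks of scroll (which keeps a search context open on the cluster and is not meant for real-time user requests) by maintaining a live cursor, so it can be used for real-time requests and high-concurrency scenarios.

A field that has a unique value for every document should be used as the tiebreaker in the sort specification. Otherwise, the order of documents with identical sort values is undefined. The recommended approach is to use the _id field or a business ID that is guaranteed to be unique for every document.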
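Why the tiebreaker matters can be seen in a simplified in-memory model. This is a sketch under assumptions, not how Elasticsearch is implemented: `page_after` and `walk` are hypothetical helpers, the documents are made up, and "after" is modeled as "sort key strictly greater than the cursor".

```python
def page_after(docs, sort_key, size, after=None):
    """Return the next `size` docs whose sort key is strictly greater than `after`."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > after]
    return ordered[:size]


def walk(docs, sort_key, size):
    """Page through all docs and return the order_ids in the order they were seen."""
    seen, after = [], None
    while True:
        page = page_after(docs, sort_key, size, after)
        if not page:
            return seen
        seen.extend(d["order_id"] for d in page)
        after = sort_key(page[-1])  # cursor = sort values of the last hit


# Three orders share the same order_date, so order_date alone is not unique.
docs = [
    {"order_id": 1, "order_date": 100},
    {"order_id": 2, "order_date": 100},
    {"order_id": 3, "order_date": 100},
    {"order_id": 4, "order_date": 200},
]

without_tiebreaker = walk(docs, lambda d: (d["order_date"],), size=2)
with_tiebreaker = walk(docs, lambda d: (d["order_date"], d["order_id"]), size=2)
print(without_tiebreaker)  # [1, 2, 4]  (order 3 is skipped)
print(with_tiebreaker)     # [1, 2, 3, 4]
```

Without the unique second sort field, the page boundary falls in the middle of a group of equal values and one document is never returned.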

POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "sort": [
    { "order_date": { "order": "desc" } },
    { "order_id": "asc" }
  ]
}

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 135,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "KEtz4XcBG0yVqdDwEOv7",
        "_score" : null,
        "_source" : {
          "category" : [
            "Women's Shoes",
            "Women's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Brigitte",
          "customer_full_name" : "Brigitte Cross",
          "customer_gender" : "FEMALE",
          "customer_id" : 12,
          "customer_last_name" : "Cross",
          "customer_phone" : "",
          "day_of_week" : "Saturday",
          "day_of_week_i" : 5,
          "email" : "brigitte@cross-family.zzz",
          "manufacturer" : [
            "Tigress Enterprises",
            "Pyramidustries"
          ],
          "order_date" : "2021-03-13T23:22:34+00:00",
          "order_id" : 592088,
          "products" : [
            {
              "base_price" : 36.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Tigress Enterprises",
              "tax_amount" : 0,
              "product_id" : 14810,
              "category" : "Women's Shoes",
              "sku" : "ZO0026600266",
              "taxless_price" : 36.99,
              "unit_discount_amount" : 0,
              "min_price" : 19.23,
              "_id" : "sold_product_592088_14810",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T23:22:34+00:00",
              "product_name" : "Ankle boots - black",
              "price" : 36.99,
              "taxful_price" : 36.99,
              "base_unit_price" : 36.99
            },
            {
              "base_price" : 54.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Pyramidustries",
              "tax_amount" : 0,
              "product_id" : 9131,
              "category" : "Women's Clothing",
              "sku" : "ZO0184501845",
              "taxless_price" : 54.99,
              "unit_discount_amount" : 0,
              "min_price" : 25.3,
              "_id" : "sold_product_592088_9131",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T23:22:34+00:00",
              "product_name" : "Light jacket - peacoat",
              "price" : 54.99,
              "taxful_price" : 54.99,
              "base_unit_price" : 54.99
            }
          ],
          "sku" : [
            "ZO0026600266",
            "ZO0184501845"
          ],
          "taxful_total_price" : 91.98,
          "taxless_total_price" : 91.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "brigitte",
          "geoip" : {
            "country_iso_code" : "US",
            "location" : {
              "lon" : -74,
              "lat" : 40.8
            },
            "region_name" : "New York",
            "continent_name" : "North America",
            "city_name" : "New York"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        },
        "sort" : [
          1615677754000,
          "592088"
        ]
      }
    ]
  }
}

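The request above returns, for every document, a sort array containing its sort values. These values can be passed in the search_after parameter to fetch the next page. For example, we can take the sort values of the last document and pass them to search_after: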

POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "search_after": [1615677754000, "592088"],
  "sort": [
    { "order_date": { "order": "desc" } },
    { "order_id": "asc" }
  ]
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 135,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "w0ty4XcBG0yVqdDw29iP",
        "_score" : null,
        "_source" : {
          "category" : [
            "Women's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Brigitte",
          "customer_full_name" : "Brigitte Meyer",
          "customer_gender" : "FEMALE",
          "customer_id" : 12,
          "customer_last_name" : "Meyer",
          "customer_phone" : "",
          "day_of_week" : "Saturday",
          "day_of_week_i" : 5,
          "email" : "brigitte@meyer-family.zzz",
          "manufacturer" : [
            "Spherecords",
            "Tigress Enterprises"
          ],
          "order_date" : "2021-03-13T16:06:14+00:00",
          "order_id" : 591709,
          "products" : [
            {
              "base_price" : 7.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Spherecords",
              "tax_amount" : 0,
              "product_id" : 20734,
              "category" : "Women's Clothing",
              "sku" : "ZO0638206382",
              "taxless_price" : 7.99,
              "unit_discount_amount" : 0,
              "min_price" : 3.6,
              "_id" : "sold_product_591709_20734",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T16:06:14+00:00",
              "product_name" : "Basic T-shirt - dark blue",
              "price" : 7.99,
              "taxful_price" : 7.99,
              "base_unit_price" : 7.99
            },
            {
              "base_price" : 32.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Tigress Enterprises",
              "tax_amount" : 0,
              "product_id" : 7539,
              "category" : "Women's Clothing",
              "sku" : "ZO0038800388",
              "taxless_price" : 32.99,
              "unit_discount_amount" : 0,
              "min_price" : 17.48,
              "_id" : "sold_product_591709_7539",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T16:06:14+00:00",
              "product_name" : "Summer dress - scarab",
              "price" : 32.99,
              "taxful_price" : 32.99,
              "base_unit_price" : 32.99
            }
          ],
          "sku" : [
            "ZO0638206382",
            "ZO0038800388"
          ],
          "taxful_total_price" : 40.98,
          "taxless_total_price" : 40.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "brigitte",
          "geoip" : {
            "country_iso_code" : "US",
            "location" : {
              "lon" : -74,
              "lat" : 40.8
            },
            "region_name" : "New York",
            "continent_name" : "North America",
            "city_name" : "New York"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        },
        "sort" : [
          1615651574000,
          "591709"
        ]
      }
    ]
  }
}
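The step from the first request to the second can also be done in client code. The following is a minimal sketch that works on plain dictionaries shaped like the request and response bodies shown above; `next_page_body` is a hypothetical helper, not part of any Elasticsearch client, and the response here is trimmed to the fields the helper reads.

```python
import copy


def next_page_body(body, response):
    """Build the request body for the following page, or None when there are no more hits."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    next_body = copy.deepcopy(body)  # leave the caller's body untouched
    # The cursor is the sort array of the last hit on the current page.
    next_body["search_after"] = hits[-1]["sort"]
    # search_after replaces offset-based paging, so drop any `from`.
    next_body.pop("from", None)
    return next_body


first_body = {
    "size": 1,
    "query": {"match": {"customer_first_name": "Brigitte"}},
    "sort": [{"order_date": {"order": "desc"}}, {"order_id": "asc"}],
}

# Trimmed copy of the first response shown above.
first_response = {
    "hits": {"hits": [{"_id": "KEtz4XcBG0yVqdDwEOv7", "sort": [1615677754000, "592088"]}]}
}

second_body = next_page_body(first_body, first_response)
print(second_body["search_after"])  # [1615677754000, '592088']
```

Repeating this until the helper returns None walks through the whole result set one page at a time.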

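4. Notes on using search_after in Elasticsearch

When searching, you must specify sort and make sure the combined sort values are unique (for example, by adding _id or a unique business value from the document body as a sort field);
For the next query, use the sort values of the last document in the previous response as the value of search_after;
Random page jumps are not possible; you can only go to the next page or jump within a small range (for instance, fetch a small range of pages in one query and use caching to serve paging within that range).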
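One possible way to support small-range jumps is to remember the cursor that starts each page already visited. The sketch below is an assumption about how such a cache could look, not a method from the article: `CursorCache`, `fetch_page`, and the in-memory `DATA` list are all hypothetical, and `fetch_page` stands in for a real search_after request.

```python
class CursorCache:
    """Remember the search_after cursor that starts each page already visited."""

    def __init__(self, fetch_page):
        # fetch_page(after) -> list of (sort_values, doc) for the page after the cursor
        self.fetch_page = fetch_page
        self.cursors = {1: None}  # page number -> cursor to pass as search_after

    def get_page(self, page_no):
        # Start from the nearest cached page at or before the requested one.
        start = max(p for p in self.cursors if p <= page_no)
        page = []
        for current in range(start, page_no + 1):
            page = self.fetch_page(self.cursors[current])
            if not page:
                return []  # ran past the end of the results
            self.cursors[current + 1] = page[-1][0]
        return [doc for _, doc in page]


# Stand-in for the index: ten docs, already sorted, with unique sort values.
DATA = [((i,), f"doc-{i}") for i in range(1, 11)]


def fetch_page(after, size=3):
    rows = [row for row in DATA if after is None or row[0] > after]
    return rows[:size]


cache = CursorCache(fetch_page)
print(cache.get_page(3))  # ['doc-7', 'doc-8', 'doc-9']
print(cache.get_page(2))  # ['doc-4', 'doc-5', 'doc-6']
```

The first call has to walk pages 1 to 3; the second reuses the cached cursor for page 2 and needs a single fetch. Jumping far ahead still costs one fetch per skipped page, which is why only small-range jumps are practical.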
