ElasticSearch【8】A Brief Look at Scroll


【Background】

  A typical DSL query for fetching all the data in an index looks like this:

POST /fcar_city/city/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": { }
                }
            ]
        }
    }
}

  My intent was to retrieve all the data in the index. The query above returned the following:

{
  "_shards": {
    "total": 5,
    "failed": 0,
    "successful": 5
  },
  "hits": {
    "hits": [
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "扬州",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "60",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "扬州",
          "t_b_city|en_name": "yz"
        },
        "_id": "60",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "通化",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "44",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "通化",
          "t_b_city|en_name": "th"
        },
        "_id": "44",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|modify_time": "2016-10-09 08:40:00",
          "t_b_city|center_lat": "28.656386",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "253",
          "t_b_city|name": "台州",
          "t_b_city|en_name": "tz",
          "t_b_city|administrative_name": "台州",
          "t_b_city|id": "48",
          "t_b_city|operate_range": "2",
          "t_b_city|channel_status": "2",
          "t_b_city|status": "2",
          "t_b_city|center_lon": "121.420757"
        },
        "_id": "48",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "咸阳",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "52",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "咸阳",
          "t_b_city|en_name": "xiy"
        },
        "_id": "52",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "烟台",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "29",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "烟台",
          "t_b_city|en_name": "yt"
        },
        "_id": "29",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "晋城",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "40",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "晋城",
          "t_b_city|en_name": "jc"
        },
        "_id": "40",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "聊城",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "41",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "聊城",
          "t_b_city|en_name": "lc"
        },
        "_id": "41",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "柳州",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "22",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "柳州",
          "t_b_city|en_name": "lz"
        },
        "_id": "22",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "萍乡",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "24",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "萍乡",
          "t_b_city|en_name": "px"
        },
        "_id": "24",
        "_score": 1
      },
      {
        "_index": "fcar_city",
        "_type": "city",
        "_source": {
          "t_b_city|administrative_name": "随州",
          "t_b_city|create_emp": "1",
          "t_b_city|create_time": "2016-06-28 11:59:58",
          "t_b_city|id": "25",
          "t_b_city|modify_time": "2016-06-28 11:59:58",
          "t_b_city|operate_range": "1",
          "t_b_city|channel_status": "2",
          "t_b_city|is_business": "1",
          "t_b_city|modify_emp": "1",
          "t_b_city|name": "随州",
          "t_b_city|en_name": "sz"
        },
        "_id": "25",
        "_score": 1
      }
    ],
    "total": 152,
    "max_score": 1
  },
  "took": 3,
  "timed_out": false
}

  It is easy to see that "total" reports 152 documents, yet only 10 were returned by default. This is the problem I ran into a few days ago.

  Following my previous post, I rewrote the DSL to use from and size together:

POST /fcar_city/city/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": { }
                }
            ]
        }
    },
    "from": 0,
    "size": 1000
}



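This time all 152 records come back. However, as I said in my previous post, querying with from and size can be expensive, and it fails with "Result window is too large" once from + size exceeds the index's result window. After consulting the official documentation, scroll caught my attention.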
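
To make the cost of deep pagination concrete, here is a small arithmetic sketch. It assumes the usual explanation that every shard has to produce its own top from + size hits before the coordinating node merges them, and it assumes the default index.max_result_window of 10,000. The function names are made up for illustration.

```python
def docs_merged_by_coordinator(from_: int, size: int, shards: int) -> int:
    """Upper bound on the entries the coordinating node merges for one page.

    Each shard returns its own top (from + size) hits, because any of them
    might belong to the requested page.
    """
    return shards * (from_ + size)


def exceeds_result_window(from_: int, size: int,
                          max_result_window: int = 10_000) -> bool:
    """True when from + size goes past the result window (assumed default)."""
    return from_ + size > max_result_window


# 5 shards, matching "_shards.total" in the response shown earlier.
print(docs_merged_by_coordinator(0, 10, 5))      # first page -> 50
print(docs_merged_by_coordinator(9_990, 10, 5))  # page 1000  -> 50000
print(exceeds_result_window(10_000, 10))         # -> True
```

The page size stays at 10 in both cases, but the amount of work grows with the offset, which is why deep pages get slow.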

【Scroll】

  The first sentence of the official Elasticsearch description of scroll is:

A scroll query is used to retrieve large numbers of documents from Elasticsearch efficiently, without paying the penalty of deep pagination.

  In other words, scroll is meant for retrieving large amounts of data without the cost of deep pagination.

  The basic form is:

GET /old_index/_search?scroll=1m 
{
    "query": { "match_all": {}},
    "sort" : ["_doc"], 
    "size":  1000
}

 Two things to note:

(1) scroll=1m keeps the scroll open for 1 minute;

(2) "_doc" is the most efficient sort order.
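
The request above only returns the first batch. A minimal sketch of the full loop might look like the following. It assumes the follow-up call is POST /_search/scroll with a JSON body carrying "scroll" and "scroll_id" (the form used by newer versions; older releases accepted the id differently). The send callable and the in-memory make_fake_send helper are hypothetical stand-ins so the loop can be tried without a cluster.

```python
from typing import Callable, Iterator, List

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, batch_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit of a match_all query, one scroll batch at a time."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}", {
        "query": {"match_all": {}},
        "sort": ["_doc"],
        "size": batch_size,
    })
    while resp["hits"]["hits"]:          # an empty batch means we are done
        yield from resp["hits"]["hits"]
        # Always pass on the most recent _scroll_id; it may change.
        resp = send("POST", "/_search/scroll", {
            "scroll": keep_alive,
            "scroll_id": resp["_scroll_id"],
        })


def make_fake_send(docs: List[dict]) -> Send:
    """In-memory stand-in for Elasticsearch, only for running the loop offline."""
    state = {"pos": 0, "size": 0}

    def send(method: str, path: str, body: dict) -> dict:
        if "size" in body:               # the initial search request
            state["pos"], state["size"] = 0, body["size"]
        start = state["pos"]
        state["pos"] = start + state["size"]
        return {"_scroll_id": "fake-id",
                "hits": {"total": len(docs),
                         "hits": docs[start:state["pos"]]}}

    return send


docs = [{"_id": str(i)} for i in range(152)]
hits = list(scroll_all(make_fake_send(docs), "fcar_city", batch_size=50))
print(len(hits))  # -> 152
```

With a real cluster, send would be replaced by whatever HTTP client you use; the loop itself stays the same.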

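 Once "scroll" is appended to "_search", the "Result window is too large" error did not appear in my own tests, even with a large "size", because the number of documents you can walk through is no longer capped by from + size. (On newer versions the "size" of a single scroll batch is still limited by the index's result window setting.) The excessive CPU usage did not show up either. The reason lies in how scroll works, and these two passages explain it: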

Scrolling allows us to do an initial search and to keep pulling batches of results from Elasticsearch until there are no more results left. It’s a bit like a cursor in a traditional database.
 
A scrolled search takes a snapshot in time. It doesn’t see any changes that are made to the index after the initial search request has been made. It does this by keeping the old data files around, so that it can preserve its “view” on what the index looked like at the time it started.

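  So what scroll queries is exactly a "snapshot" taken at one moment, similar to a view. Scenarios with strict real-time requirements are therefore not a good fit for scroll, and for ordinary list queries from and size are fine. For fetching every row of a "dictionary table" (lookup data), scroll is well worth using.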

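   To scroll through results, we execute a search request and set scroll to the length of time the scroll window should stay open. The expiry time is refreshed every time we run a scroll request, so it only needs to be long enough to process the current batch of results, not all the documents that match the query. The timeout matters because keeping the scroll window open consumes resources, and we want to free them as soon as they are no longer needed. Setting a timeout lets Elasticsearch release the resources automatically after a period of inactivity.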
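
The keep-alive behaviour can be pictured with a toy model. This is not Elasticsearch code, only a sketch of the rule described above: every scroll request pushes the expiry forward by the keep-alive, and the context dies once it sits idle for longer than that. The class and method names are invented for illustration. On a real cluster, an open scroll can also be released early with the clear-scroll API (DELETE /_search/scroll with the scroll id), which is what clear() stands in for here.

```python
class ScrollContext:
    """Toy model of a server-side scroll context and its keep-alive."""

    def __init__(self, keep_alive_s: float, now: float) -> None:
        self.keep_alive_s = keep_alive_s
        self.expires_at = now + keep_alive_s
        self.released = False

    def is_alive(self, now: float) -> bool:
        return not self.released and now < self.expires_at

    def fetch_next(self, now: float) -> None:
        if not self.is_alive(now):
            raise RuntimeError("scroll context expired or released")
        # Each scroll request refreshes the expiry time.
        self.expires_at = now + self.keep_alive_s

    def clear(self) -> None:
        """Release the context explicitly instead of waiting for the timeout."""
        self.released = True


ctx = ScrollContext(keep_alive_s=60, now=0)
ctx.fetch_next(now=50)         # expiry moves to 110
ctx.fetch_next(now=100)        # expiry moves to 160
print(ctx.is_alive(now=150))   # -> True
print(ctx.is_alive(now=161))   # -> False
ctx.clear()
```

The total run here lasts well over one keep-alive period, yet the context stays alive because no single gap between requests exceeds 60 seconds.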

 So, that's all. A follow-up post will share Java code that wraps scroll.

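Author: 暂7师师长常乃超
Source: CSDN
Original: https://blog.csdn.net/zzh920625/article/details/84556548
Copyright notice: This is an original article by the blogger. Please include a link to the original post when reposting.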
