Elasticsearch scrolling using the Python client

When scrolling in Elasticsearch, it is important to pass the latest scroll_id with each scroll request:

The initial search request and each subsequent scroll request returns

a new scroll_id — only the most recent scroll_id should be used.

The following example (taken from here) puzzles me. First, the scrolling initialization:

    rs = es.search(index=['tweets-2014-04-12','tweets-2014-04-13'],
                   scroll='10s',
                   search_type='scan',
                   size=100,
                   preference='_primary_first',
                   body={
                       "fields" : ["created_at", "entities.urls.expanded_url", "user.id_str"],
                       "query" : {
                           "wildcard" : { "entities.urls.expanded_url" : "*.ru" }
                       }
                   }
                   )
    sid = rs['_scroll_id']

and then the looping:

    tweets = []
    while (1):
        try:
            rs = es.scroll(scroll_id=sid, scroll='10s')
            tweets += rs['hits']['hits']
        except:
            break

It works, but I don't see where sid is updated. I assume it happens internally in the Python client, but I don't understand how.

Solution

In fact, the code has a bug: to use the scroll feature correctly, you are supposed to pass the new scroll_id returned by each call to the next call to scroll(), not reuse the first one:

Important

The initial search request and each subsequent scroll request returns

a new scroll_id — only the most recent scroll_id should be used.

It's working because Elasticsearch does not always change the scroll_id between calls and, for smaller result sets, can keep returning the same scroll_id that was originally returned for some time. A discussion from last year between two other users describes the same behavior, with the same scroll_id being returned for a while.

So while your code is working for a smaller result set it's not correct - you need to capture the scroll_id returned in each new call to scroll() and use that for the next call.
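A minimal sketch of the corrected loop, assuming `es` is a client object exposing `scroll(scroll_id=..., scroll=...)` as in the question and `rs` is the response from the initial `es.search(...)` call. The helper name `scroll_all` is hypothetical, not part of the client. It also stops on an empty page of hits rather than relying on an exception to end the loop:

```python
def scroll_all(es, first_response, scroll="10s"):
    """Collect every hit of a scrolled search, refreshing the scroll_id each round."""
    # With search_type='scan' the first response carries no hits; otherwise it may.
    hits = list(first_response["hits"]["hits"])
    sid = first_response["_scroll_id"]
    while True:
        rs = es.scroll(scroll_id=sid, scroll=scroll)
        sid = rs["_scroll_id"]  # always carry the newest scroll_id forward
        batch = rs["hits"]["hits"]
        if not batch:  # an empty page means the scroll is exhausted
            break
        hits.extend(batch)
    return hits
```

With the question's variables this would be called as `tweets = scroll_all(es, rs)`. The Python client also ships `elasticsearch.helpers.scan`, which does this scroll_id bookkeeping for you.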
