Here is the error output:
WARNING:elasticsearch:POST http://es-cn-09k1o69vj0006jcz9.public.elasticsearch.aliyuncs.com:9200/crawl_basis_pn/_update_by_query [status:500 request:0.015s]
DEBUG:elasticsearch:> {"query":{"term":{"_id":"bQlgboYBwWirVBbOLVBj"}},"script":{"source":"ctx._source.ProductUrl='https://www.bom2buy.com/partIntelligence/TL431AIYDT/';ctx._source.SubStatus=1"}}
DEBUG:elasticsearch:< {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting","bytes_wanted":0,"bytes_limit":0,"durability":"TRANSIENT"}],"type":"general_script_exception","reason":"Failed to compile inline script [ctx._source.ProductUrl='https://www.bom2buy.com/partIntelligence/TL431AIYDT/';ctx._source.SubStatus=1] using lang [painless]","caused_by":{"type":"circuit_breaking_exception","reason":"[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting","bytes_wanted":0,"bytes_limit":0,"durability":"TRANSIENT"}},"status":500}
ERROR:scrapy.core.engine:Error while obtaining start requests
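Elasticsearch compiled more than 75 scripts within 5 minutes (the limit reported as max: [75/5m]) and refused to compile any more. Script compilation is expensive, so this is a self-protection mechanism in ES.
The function below is called continuously and has to update about 2 million documents. Here is the source code: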
def update_producturl(self, item):
    time.sleep(0.5)
    productUrl = "https://www.bom2buy.com/partIntelligence/" + urllib.parse.quote(item['PN'], safe='') + "/"
    ubq = UpdateByQuery(using=esclient(), index=index_name) \
        .query("term", _id=item['Id']) \
        .script(source=f"ctx._source.ProductUrl='{productUrl}';ctx._source.SubStatus=1")
    res = ubq.execute()
    r = res
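To make the cause concrete, here is a minimal sketch that needs no Elasticsearch connection. It builds the script source the same way the function above does, and shows that every part number yields a different source string, which ES has to compile separately. The helper name inline_script_source and the last two part numbers are made up for illustration.

```python
import urllib.parse


def inline_script_source(pn: str) -> str:
    """Build the script source with the URL baked into the script text."""
    product_url = (
        "https://www.bom2buy.com/partIntelligence/"
        + urllib.parse.quote(pn, safe="")
        + "/"
    )
    return f"ctx._source.ProductUrl='{product_url}';ctx._source.SubStatus=1"


# Hypothetical part numbers (only the first appears in the log above).
part_numbers = ["TL431AIYDT", "LM358/NOPB", "NE555P"]

# One distinct source string per part number: each one is a new compilation.
sources = {inline_script_source(pn) for pn in part_numbers}
print(len(sources), "distinct script sources for", len(part_numbers), "items")
```

With time.sleep(0.5) the function can run up to roughly 2 times per second, so up to about 600 distinct scripts in 5 minutes, well above the limit of 75.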
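Attempted fix:
Pass the changing value through params. The script source then stays the same on every call, so it does not need to be recompiled.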
def update_producturl(self, item):
    time.sleep(0.5)
    productUrl = "https://www.bom2buy.com/partIntelligence/" + urllib.parse.quote(item['PN'], safe='') + "/"
    # The source is now a constant string; only params changes between calls.
    ubq = UpdateByQuery(using=esclient(), index=index_name) \
        .query("term", _id=item['Id']) \
        .script(source="ctx._source.ProductUrl=params.productUrl;ctx._source.SubStatus=1",
                params={
                    'productUrl': productUrl
                })
    res = ubq.execute()
    r = res
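As a rough check of the fix, the sketch below builds request bodies in the same shape as the one in the log, plus a params block, and confirms that the script source is identical for every item. It runs without an Elasticsearch connection. The names SCRIPT_SOURCE and build_update_body, and the second and third id/part-number pairs, are assumptions made for illustration.

```python
import urllib.parse

# Constant script text: the only thing ES needs to compile.
SCRIPT_SOURCE = "ctx._source.ProductUrl=params.productUrl;ctx._source.SubStatus=1"


def build_update_body(doc_id: str, pn: str) -> dict:
    """Build an _update_by_query body whose per-item data lives in params."""
    product_url = (
        "https://www.bom2buy.com/partIntelligence/"
        + urllib.parse.quote(pn, safe="")
        + "/"
    )
    return {
        "query": {"term": {"_id": doc_id}},
        "script": {
            "source": SCRIPT_SOURCE,
            "params": {"productUrl": product_url},
        },
    }


# First pair is taken from the log above; the other two are hypothetical.
items = [
    ("bQlgboYBwWirVBbOLVBj", "TL431AIYDT"),
    ("hypothetical-id-2", "LM358/NOPB"),
    ("hypothetical-id-3", "NE555P"),
]
bodies = [build_update_body(doc_id, pn) for doc_id, pn in items]

distinct_sources = {body["script"]["source"] for body in bodies}
distinct_urls = {body["script"]["params"]["productUrl"] for body in bodies}
print(len(distinct_sources), "script source,", len(distinct_urls), "parameter values")
```

Because the source text never changes, the 2 million updates should reuse one compiled script instead of asking for a new compilation each time.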