Problem description:
When bulk-importing data into Elasticsearch, if the documents have many columns to index, the following timeout error can occur.
success, errors = bulk(self.es, self.set_data())
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/helpers/__init__.py", line 257, in bulk
for ok, item in streaming_bulk(client, actions, *args, **kwargs):
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/helpers/__init__.py", line 192, in streaming_bulk
raise_on_error, *args, **kwargs)
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/helpers/__init__.py", line 99, in _process_bulk_chunk
raise e
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/helpers/__init__.py", line 95, in _process_bulk_chunk
resp = client.bulk('\n'.join(bulk_actions) + '\n', *args, **kwargs)
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/client/__init__.py", line 1155, in bulk
headers={'content-type': 'application/x-ndjson'})
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
File "/home/dxi/PycharmProjects/OEMDB/venv/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 180, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))
Solution
The client's default read timeout is 10 seconds (visible in the error: read timeout=10). Pass a request_timeout parameter to the bulk() call to raise it:
success, errors = bulk(self.es, self.set_data(), request_timeout=30)
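A minimal sketch of the fix, assuming a local cluster at localhost:9200; the index name "my_index" and the document fields are illustrative, not from the original code:

```python
def set_data():
    """Yield bulk action dicts (illustrative fields).

    In practice this would yield one dict per row of the source data.
    """
    for i in range(3):
        yield {"_index": "my_index", "_id": i, "value": i}


def bulk_index(hosts="http://localhost:9200", timeout=30):
    """Bulk-index the actions with an increased per-request read timeout."""
    # Imported lazily so the module can be inspected without a running cluster.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(hosts)
    # request_timeout is forwarded to the underlying client.bulk() request,
    # overriding the 10-second default that caused the ConnectionTimeout.
    return bulk(es, set_data(), request_timeout=timeout)
```

If the timeout persists even at 30 seconds, reducing the chunk_size argument of bulk() (default 500 actions per request) is another common lever, since wide documents make each chunk heavier.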
Reference links
https://discuss.elastic.co/t/bulk-indexing-raise-read-timeout-error/798
https://discuss.elastic.co/t/es-bulk-insert-time-out/20794/7