Inserting a pandas DataFrame Directly into Elasticsearch with Python

Originally published 2020-11-19 16:08:23

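I like to use the DataFrame from the pandas package for data preprocessing, and the processed data then has to be loaded into a database. To avoid converting the data format over and over, the following function inserts a DataFrame into Elasticsearch directly.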


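import json

from elasticsearch import Elasticsearch

# Placeholder connection settings (not part of the original post); use your own.
host = 'localhost'
user = 'elastic'
password = 'changeme'


def connect_es(frame, index_, type_):
    """Bulk-insert every row of a DataFrame into an Elasticsearch index."""
    try:
        # http_auth and port are keyword arguments of the pre-8.0 Python client.
        es = Elasticsearch(host, http_auth=(user, password), port=9200)
        # One JSON document per line, one line per DataFrame row.
        df_as_json = frame.to_json(orient='records', lines=True)
        bulk_data = []

        for json_document in df_as_json.split('\n'):
            if not json_document.strip():
                continue  # skip blank lines, e.g. after a trailing newline
            # Each document takes two entries: the action, then the source.
            bulk_data.append({"index": {
                '_index': index_,
                '_type': type_,  # mapping types are removed in Elasticsearch 8
            }})
            bulk_data.append(json.loads(json_document))

            # One bulk request carries 1000 documents (2000 entries).
            if len(bulk_data) >= 2000:
                es.bulk(body=bulk_data)
                bulk_data = []

        # Send the remaining documents; skip the call if nothing is left.
        if bulk_data:
            es.bulk(body=bulk_data)
        print('bulk insert finished')

    except Exception as e:
        print(e)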
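
The batching logic in the function above can be tried out without a running cluster. The following is only a minimal sketch under a few assumptions: the names `iter_bulk_bodies`, `failed_items` and `demo-index` are made up for illustration and do not come from the original post, the `_type` field is left out because mapping types are deprecated since Elasticsearch 7.0, and a hard-coded string stands in for the output of `frame.to_json(orient='records', lines=True)`.

```python
import json


def iter_bulk_bodies(records, index_name, docs_per_request=1000):
    """Yield bulk request bodies holding at most docs_per_request documents."""
    body = []
    for record in records:
        body.append({"index": {"_index": index_name}})  # action entry
        body.append(record)                             # document source
        if len(body) >= 2 * docs_per_request:
            yield body
            body = []
    if body:  # final, partially filled batch
        yield body


def failed_items(bulk_response):
    """Return the per-document results of a bulk response that carry an error."""
    if not bulk_response.get("errors"):
        return []
    return [
        item for item in bulk_response.get("items", [])
        if "error" in next(iter(item.values()))
    ]


# Stand-in for frame.to_json(orient='records', lines=True) on a 3-row frame.
lines = '{"a": 1}\n{"a": 2}\n{"a": 3}\n'
records = [json.loads(line) for line in lines.split('\n') if line.strip()]

bodies = list(iter_bulk_bodies(records, "demo-index", docs_per_request=2))
print([len(body) for body in bodies])  # prints [4, 2]
```

With a real client, each yielded body could presumably be passed to `es.bulk(body=...)` and the returned dictionary handed to `failed_items`. This check is worth having because a bulk request can succeed as a whole while individual documents are rejected, in which case no exception is raised and the `except` branch of the original function never runs.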