Adding Elasticsearch support to pyspider

paulluo0739

Categories: pyspider, python

Article link: https://blog.csdn.net/paulluo0739/article/details/93209529

Background

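For this project, the results crawled by pyspider need to be stored in Elasticsearch (ES) so they are easy to process later. That requires the following work: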

Install the elasticsearch library for Python
Write a basic ES helper class (library)
Put it on pyspider's library path so projects can import it

Implementation steps

Install the elasticsearch library for Python

pip install elasticsearch

At the time of writing, pip installs version 7.0.2 of the client by default, which is intended for Elasticsearch 7.x servers.
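
Since the client's major version is meant to match the server's, it can be worth checking both before going further. The following is a minimal sketch of such a check; `same_major` is a name I made up for illustration. It assumes the client version is a tuple such as `elasticsearch.__version__` and that the server info is the dict returned by `Elasticsearch().info()`.

```python
def same_major(client_version, server_info):
    """Return True when the client and server major versions match."""
    # The server reports its version as a string such as "7.1.0".
    server_major = int(server_info["version"]["number"].split(".")[0])
    return client_version[0] == server_major


# Illustrative values only; in practice they would come from
# elasticsearch.__version__ and Elasticsearch().info().
sample_info = {"version": {"number": "7.1.0"}}
print(same_major((7, 0, 2), sample_info))
print(same_major((6, 8, 0), sample_info))
```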

Write a basic ES helper class. The following is a simple example; richer functionality can be added later.

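from elasticsearch import Elasticsearch


class EsUtil:
    """Thin wrapper around the Elasticsearch client (written against client 7.x)."""

    def __init__(self, host='localhost', port=9200):
        # Connect to a single node; pass a different host/port as needed.
        self.es = Elasticsearch([{'host': host, 'port': port}])

    def info(self):
        """Return basic cluster information; handy as a connectivity check."""
        return self.es.info()

    def get(self, index, id, doc_type='_doc'):
        """Fetch one document by id."""
        return self.es.get(index=index, id=id, doc_type=doc_type)

    def create_index(self, index, body=None):
        """
        :arg index: The name of the index
        :arg body: The configuration for the index (`settings` and `mappings`)
        """
        # To skip creation when the index already exists, guard the call:
        # if not self.es.indices.exists(index):
        #    self.es.indices.create(index=index, body=body)
        return self.es.indices.create(index=index, body=body)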

Save this as es_util.py.
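
As an example of the richer functionality mentioned above, here is a rough sketch of three helpers: one that builds a `settings`/`mappings` body, one that creates an index only if it is missing, and one that stores a single document. The function names, the field names (`url`, `title`) and the shard settings are all assumptions for illustration. The calls are written against the 7.x client, where `indices.exists`, `indices.create` and `index` accept these keyword arguments; newer client versions may prefer different parameter names.

```python
def build_index_body(shards=1, replicas=0):
    """Build a settings/mappings body for a hypothetical crawl-result index."""
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            # ES 7.x mappings are typeless, so properties sit directly here.
            "properties": {
                "url": {"type": "keyword"},
                "title": {"type": "text"},
            }
        },
    }


def ensure_index(es, index, body=None):
    """Create the index only when it does not exist yet; return True if created."""
    if es.indices.exists(index=index):
        return False
    es.indices.create(index=index, body=body)
    return True


def save_doc(es, index, doc_id, doc):
    """Index (insert or overwrite) a single document under the given id."""
    return es.index(index=index, id=doc_id, body=doc)
```

Here `es` is the raw client, so with the class above a call might look like `ensure_index(EsUtil().es, 'pages', build_index_body())`, where `'pages'` is a placeholder index name.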

pyspider's library path

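I installed Python 3.6 separately, so packages do not live under the system default Python library path. After checking, the path on my machine is: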

/usr/local/python3/lib/python3.6/site-packages/pyspider/libs/

Place the es_util.py file created above in that directory.
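
The path will differ between machines. One way to look it up, sketched here with only the standard library, is to ask Python where a package is installed and append the subdirectory. `package_subdir` is a hypothetical helper name; it returns None when the package cannot be found.

```python
import importlib.util
import os


def package_subdir(package, subdir):
    """Return the path of `subdir` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    # A regular package has one search location: its install directory.
    return os.path.join(list(spec.submodule_search_locations)[0], subdir)


# Prints the libs directory when pyspider is installed, otherwise None.
print(package_subdir("pyspider", "libs"))
```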

Using it in a pyspider project

As pyspider's sample Handler shows, the import rule follows the library path: from pyspider.libs.xxxx. So add this at the top of the Handler script:

from pyspider.libs.es_util import EsUtil

Then initialize it inside the Handler class, after which it is ready to use:

    es_util = EsUtil(host='127.0.0.1')

    @every(minutes=24 * 60)
    def on_start(self):
        print(self.es_util.info())
        self.crawl('http://www.xxx.cn/', fetch_type='js', callback=self.index_page)
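
The snippet above only checks connectivity. To actually store crawl results in ES, one possible approach is to override the handler's `on_result` hook, which pyspider calls with whatever a callback returns. The sketch below is written as a mixin so it can be read on its own; `EsResultMixin` and the index name `pyspider_results` are assumptions of mine, as is reusing the task's `taskid` as the document id.

```python
class EsResultMixin(object):
    """Hypothetical mixin: index each callback result into Elasticsearch."""

    es_util = None                   # expected to be an EsUtil instance
    es_index = 'pyspider_results'    # hypothetical index name

    def on_result(self, result):
        # Callbacks that return nothing produce no document.
        if not result:
            return
        # self.task is set by pyspider while a callback runs; its 'taskid'
        # is reused as the document id so a re-crawl overwrites the old copy.
        self.es_util.es.index(index=self.es_index,
                              id=self.task['taskid'],
                              body=result)
        # Keep the base handler's default result handling as well.
        super(EsResultMixin, self).on_result(result)
```

In a project script this might be combined as `class Handler(EsResultMixin, BaseHandler)` with `es_util = EsUtil(host='127.0.0.1')` set on the class, as in the snippet above.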