Connecting Scrapy to MongoDB
0. Install and deploy MongoDB
1. Configure the MongoDB connection in the Scrapy project's settings.py
(The node must be writable; note: replica-set secondary nodes are readable but not writable.)
settings.py
Mongoip = '192.xxx.xx.xx'  # IP address of the MongoDB node
MongoPort = 27017          # port number
MongoDBname = 'datago306'  # database name
MongoItem = 'jobItem'      # collection the items are written to
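Before wiring up the pipeline, it can help to confirm that the configured node is actually reachable and writable. The following is a minimal standalone sketch (not part of the original project) using pymongo's ping and hello commands; on servers older than MongoDB 4.4 the command is 'isMaster' and the field is 'ismaster':

from pymongo import MongoClient

client = MongoClient(host='192.xxx.xx.xx', port=27017,
                     serverSelectionTimeoutMS=5000)
client.admin.command('ping')           # raises if the node is unreachable
hello = client.admin.command('hello')  # reports the node's role
print('writable:', hello.get('isWritablePrimary', False))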
2. Write items to MongoDB in pipelines.py
Connecting to MongoDB uses pymongo; install it with pip install pymongo.
Write an item pipeline that inserts each item into MongoDB.
pipelines.py
from pymongo import MongoClient  # MongoClient is used to connect to MongoDB
from XXX.settings import Mongoip, MongoDBname, MongoPort, MongoItem
# import the connection settings configured in step 1
# XXX is the Scrapy project name

class CrawldataToMongoPipeline(object):
    def __init__(self):
        host = Mongoip
        port = MongoPort
        dbName = MongoDBname
        client = MongoClient(host=host, port=port)  # create the connection object
        db = client[dbName]        # select the database, dbName='datago306'
        self.post = db[MongoItem]  # select the collection, MongoItem='jobItem'

    def process_item(self, item, spider):
        job_info = dict(item)           # convert the item to a plain dict
        self.post.insert_one(job_info)  # write the item to MongoDB (insert() was removed in pymongo 4)
        return item
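Importing settings from the project module works, but Scrapy can also pass the settings in for you. The sketch below shows the same pipeline rewritten with Scrapy's from_crawler hook and the open_spider/close_spider lifecycle methods; it reuses the setting names from step 1 and is an alternative to, not part of, the original code:

from pymongo import MongoClient

class CrawldataToMongoPipeline(object):
    def __init__(self, host, port, db_name, collection_name):
        self.host = host
        self.port = port
        self.db_name = db_name
        self.collection_name = collection_name

    @classmethod
    def from_crawler(cls, crawler):
        # read the same keys defined in settings.py in step 1
        return cls(
            host=crawler.settings.get('Mongoip'),
            port=crawler.settings.get('MongoPort'),
            db_name=crawler.settings.get('MongoDBname'),
            collection_name=crawler.settings.get('MongoItem'),
        )

    def open_spider(self, spider):
        # open one connection per crawl instead of at import time
        self.client = MongoClient(host=self.host, port=self.port)
        self.post = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        self.client.close()  # release the connection when the spider finishes

    def process_item(self, item, spider):
        self.post.insert_one(dict(item))  # write the item to MongoDB
        return item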
3. Enable the CrawldataToMongoPipeline item pipeline in settings.py
settings.py
ITEM_PIPELINES = {
    # 'crawlData.pipelines.CrawldataPipeline': 300,
    'crawlData.pipelines.CrawldataToMongoPipeline': 300,
}
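After running the spider (e.g. scrapy crawl with your spider's name), you can confirm that items landed in the collection. A quick check with pymongo, reusing the values from step 1:

from pymongo import MongoClient

client = MongoClient(host='192.xxx.xx.xx', port=27017)
collection = client['datago306']['jobItem']
print(collection.count_documents({}))  # number of items written so far
print(collection.find_one())           # inspect one stored item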