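This crawler picks up where the previous one left off. Here is the link to that post: https://blog.csdn.net/yao09605/article/details/94596341
Our target URL is
http://quotes.money.163.com/trade/lsjysj_<stock code>.html
The stock codes come from the stock list that the previous crawler saved to MongoDB.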
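As a minimal sketch of how a stock code fills into that URL pattern: the helper name `build_history_url` and the two sample codes below are hypothetical and only for illustration. In the actual spider the codes would come from the `stock_list` collection in MongoDB.

```python
# Hypothetical helper: fill a stock code into the target URL pattern.
URL_TEMPLATE = "http://quotes.money.163.com/trade/lsjysj_{code}.html"


def build_history_url(stock_code):
    """Return the historical-trading-data page URL for one stock code."""
    # Codes are handled as strings so leading zeros (e.g. "000001") are kept.
    return URL_TEMPLATE.format(code=str(stock_code).strip())


# Sample codes for illustration only (an assumption, not from the database).
sample_codes = ["600000", "000001"]
urls = [build_history_url(code) for code in sample_codes]
for url in urls:
    print(url)
```

Keeping the codes as strings matters here: storing them as integers would silently drop the leading zeros and produce a wrong URL.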
First, create a new project in the terminal:
scrapy startproject stock_history
As before, open the project in PyCharm.
Start by editing stock_history_spider.py.
Step one: connect to MongoDB during initialization and fetch the stock list.
import sys

import scrapy
from pymongo import MongoClient

# MONGO_URI and MONGO_DB are assumed to be defined elsewhere in the project
# (for example in settings.py) and imported here.


class StockHistorySpider(scrapy.Spider):
    collection = 'stock_list'
    name = 'stock_history_spider'
    headers = {
        'Referer': 'http://quotes.money.163.com/',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    }

    def __init__(self):
        scrapy.Spider.__init__(self)  # the parent class __init__ must be called explicitly
        self.log(sys.getdefaultencoding())
        self.current_stock_code = ''
        self.mongo_url = MONGO_URI
        self.mongo_db = MONGO_DB
        self.client = MongoClient(self.mongo_url)
        self.db = self.client[self.mongo_db]
        # Empty filter matches every document; the projection keeps only stock_id
        self.stock_list = self.db[self.collection].find({
        }, {
            'stock_id': 1, '_id': 0}