Isn't this just starting several spiders at the same time? Why do the experts in all those articles make it so complicated? Why turn something simple into something complex?
Step 1: set up several spider programs
Here is the first file, test_1.py:
import scrapy


class XiachufangSpider(scrapy.Spider):
    name = 'test_1'
    start_urls = ['http://www.qingnian8.com/']

    def parse(self, response, **kwargs):
        # Grab the link text of the 2nd list item and print it
        url = response.xpath('/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/ul/li[2]/a//text()').extract()
        print(url)
And here is the second file, test_2.py:
import scrapy


class XiachufangSpider(scrapy.Spider):
    name = 'test_2'
    start_urls = ['http://www.qingnian8.com/']

    def parse(self, response, **kwargs):
        # Grab the link text of the 12th list item and print it
        url = response.xpath('/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/ul/li[12]/a//text()').extract()
        print(url)
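Both spider files go in the project's spiders package so Scrapy can find them by their name attribute. A typical layout looks roughly like this (the project name myproject is just a placeholder; the launcher main.py shown next sits beside scrapy.cfg):

myproject/
    scrapy.cfg
    main.py          # launcher script, shown below
    myproject/
        settings.py
        spiders/
            test_1.py
            test_2.py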
Then put the following code in the main.py launcher file. It really is this simple, see for yourself:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Drive both spiders from inside the Scrapy framework
if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl("test_1")
    process.crawl("test_2")
    print('----- spiders starting -----')
    process.start()  # call start() only once; it blocks until every spider has finished
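If the script needs finer control over the Twisted reactor (for example, to run other asynchronous code alongside the crawls), the Scrapy docs also describe CrawlerRunner. Here is a minimal sketch, assuming the same test_1 and test_2 spiders defined above and that the script lives inside the project so get_project_settings() can locate them:

from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    configure_logging()                      # CrawlerRunner does not set up logging for you
    runner = CrawlerRunner(get_project_settings())
    runner.crawl("test_1")                   # both crawls are scheduled on the same reactor
    runner.crawl("test_2")
    d = runner.join()                        # Deferred that fires when all crawls are done
    d.addBoth(lambda _: reactor.stop())      # then shut the reactor down
    reactor.run()                            # blocks here until reactor.stop() is called

Either way, you launch everything with a plain python main.py from the project root.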