I have a Flask project that runs a Scrapy spider through a subprocess call:
import os
import subprocess
import uuid

class Utilities(object):
    @staticmethod
    def scrape(inputs):
        job_id = str(uuid.uuid4())
        project_folder = os.path.abspath(os.path.dirname(__file__))
        # Launch the spider as an external `scrapy` command from the scraper folder
        subprocess.call(['scrapy', 'crawl', "ExampleCrawler", "-a", "inputs=" + str(inputs), "-s", "JOB_ID=" + job_id],
                        cwd="%s/scraper" % project_folder)
        return job_id
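Even though I have enabled "Attach to subprocess automatically while debugging" in the project's Python debugger settings, breakpoints inside the spider are never hit. The first breakpoint that works again is the one on return job_id.
This is the part of the spider code where I would like breakpoints to work: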
from scrapy.http import FormRequest
from scrapy.spiders import Spider
from scrapy.loader import ItemLoader
from Handelsregister_Scraper.scraper.items import Product
import re

class ExampleCrawler(Spider):
    name = "ExampleCrawler"

    def __init__(self, inputs='', *args, **kwargs):
        super(ExampleCrawler, self).__init__(*args, **kwargs)
        self.start_urls = ['https://www.example-link.com']
        self.inputs = inputs

    def parse(self, response):
        yield FormRequest(self.start_urls[0], callback=self.parse_list_elements, formdata=self.inputs)
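Apart from enabling that option, which I have already done, I could not find any solution.
Any suggestions on how to get breakpoints working inside the spider?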
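Answer:
The debugger does not stop in the spider because the spider is not started as a subprocess in the sense the debugger expects; it is an external call to the scrapy command. See this answer for a possible workaround.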
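One workaround that is sometimes suggested for this situation (it may or may not be the one in the linked answer) is to start Scrapy through the same Python interpreter that runs Flask, using `python -m scrapy` instead of the bare `scrapy` command. The sketch below assumes the project layout from the question. The helper name `build_crawl_command` and the `project_folder` parameter are hypothetical and were added only for illustration. Whether PyCharm then attaches to the child process depends on the IDE version and settings, so treat this as something to try rather than a guaranteed fix.

```python
import os
import subprocess
import sys
import uuid


def build_crawl_command(inputs, job_id, spider_name="ExampleCrawler"):
    """Build an argv list that starts Scrapy via the current interpreter.

    Hypothetical helper: using sys.executable with "-m scrapy" makes the
    child an ordinary Python process instead of an external console script.
    """
    return [
        sys.executable, "-m", "scrapy", "crawl", spider_name,
        "-a", "inputs=" + str(inputs),
        "-s", "JOB_ID=" + job_id,
    ]


def scrape(inputs, project_folder=None):
    """Same flow as Utilities.scrape in the question, with the new command."""
    job_id = str(uuid.uuid4())
    if project_folder is None:
        # Assumption: the "scraper" folder sits next to this file
        project_folder = os.path.abspath(os.path.dirname(__file__))
    subprocess.call(build_crawl_command(inputs, job_id),
                    cwd=os.path.join(project_folder, "scraper"))
    return job_id
```

If this does not help, another option is to debug the spider on its own: create a small runner script inside the scraper folder that calls `scrapy.cmdline.execute` with the same `crawl` arguments, and start that script under the debugger instead of going through Flask.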
Tags: python, debugging, flask, scrapy, pycharm
Source: https://codeday.me/bug/20190710/1428907.html