I have a working Scrapy project, and I now want to add some custom middleware to it.
I enabled the spider middleware in settings.py by uncommenting the three lines below.
# Enable or disable spider middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
SPIDER_MIDDLEWARES = {
    'sweden.middlewares.SwedenSpiderMiddleware': 543,
}
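Even so, any code I add to middlewares.py seems to be ignored. For example, the input() call I added to the last method below is never executed, even though some pages are scraped successfully.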
^{pr2}$
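For reference, a minimal sketch of the kind of spider middleware involved, with a log call in place of input() so the run does not block. The method bodies are assumptions for illustration, not the exact contents of the middlewares.py in question; the hook names (`from_crawler`, `process_spider_input`, `process_spider_output`) are the standard Scrapy spider middleware hooks.

```python
import logging

logger = logging.getLogger(__name__)


class SwedenSpiderMiddleware:
    """Illustrative stand-in for the class referenced in SPIDER_MIDDLEWARES."""

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this factory, if present, to build the middleware.
        # A log line here shows whether the class is ever instantiated.
        logger.info("SwedenSpiderMiddleware instantiated")
        return cls()

    def process_spider_input(self, response, spider):
        # Called for each response before it reaches the spider callback.
        # Must return None (or raise an exception).
        logger.info("process_spider_input: %s", response.url)
        return None

    def process_spider_output(self, response, result, spider):
        # Called with whatever the spider callback produced.
        # This sketch passes every item and request through unchanged.
        for item_or_request in result:
            yield item_or_request
```

If none of these log lines appear during a crawl, the class is most likely never being loaded at all, as opposed to being loaded and failing.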
I have not modified the default folder structure. I can't get this to work, and there seem to be very few examples...
It does not show up in the startup log either:
2017-08-21 16:59:41 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-08-21 16:59:41 [scrapy.utils.log] INFO: Overridden settings: {'FEED_URI': 'result.jl', 'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-08-21 16:59:41 [scrapy.core.engine] INFO: Spider opened
2017-08-21 16:59:41 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
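Whether a class appears in the "Enabled spider middlewares" list can also be checked mechanically from a saved log. This is a small sketch using only the standard library; the helper name `enabled_spider_middlewares` is hypothetical, and the regular expressions assume the log layout shown above.

```python
import re


def enabled_spider_middlewares(log_text):
    """Return the class paths listed under 'Enabled spider middlewares:'."""
    # Grab the bracketed list that follows the header line.
    match = re.search(
        r"Enabled spider middlewares:\s*\[(.*?)\]", log_text, re.DOTALL
    )
    if match is None:
        return []
    # Each entry is a single-quoted dotted path.
    return re.findall(r"'([^']+)'", match.group(1))
```

Applied to the log above, `'sweden.middlewares.SwedenSpiderMiddleware' in enabled_spider_middlewares(log_text)` would be False, since only the five built-in middlewares are listed.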
Here is the file structure:
.
├── venv
├── tutorial
└── sweden
    ├── __pycache__
    ├── scrapy.cfg
    └── sweden
        ├── __init__.py
        ├── __pycache__
        ├── items.py
        ├── middlewares.py
        ├── pipelines.py
        ├── settings.py
        └── spiders
            ├── __init__.py
            ├── __pycache__
            └── sweden_spider.py