【pyinstaller】Notes on packaging a Scrapy project with PyInstaller

This post records the problems I ran into while packaging a Scrapy project with PyInstaller, including custom middlewares and pipelines failing to take effect. By reading the source and patching scrapy/utils/misc.py, the middleware and pipeline import failures caused by the import path inside the bundle were resolved.

For the packaging method, see: https://zhuanlan.zhihu.com/p/41875047

Below are the problems I ran into myself:

(1) In the module-import section, you also need to import your own custom Scrapy modules, such as custom middlewares and pipelines (a sketch of the entry script that hosts these imports follows the list):

import robotparser  # Python 2 only; on Python 3 this module lives at urllib.robotparser
 
import scrapy.spiderloader
import scrapy.statscollectors
import scrapy.logformatter
import scrapy.dupefilters
import scrapy.squeues
 
import scrapy.extensions.spiderstate
import scrapy.extensions.corestats
import scrapy.extensions.telnet
import scrapy.extensions.logstats
import scrapy.extensions.memusage
import scrapy.extensions.memdebug
import scrapy.extensions.feedexport
import scrapy.extensions.closespider
import scrapy.extensions.debug
import scrapy.extensions.httpcache
import scrapy.extensions.statsmailer
import scrapy.extensions.throttle
 
import scrapy.core.scheduler
import scrapy.core.engine
import scrapy.core.scraper
import scrapy.core.spidermw
import scrapy.core.downloader
 
import scrapy.downloadermiddlewares.stats
import scrapy.downloadermiddlewares.httpcache
import scrapy.downloadermiddlewares.cookies
import scrapy.downloadermiddlewares.useragent
import scrapy.downloadermiddlewares.httpproxy
import scrapy.downloadermiddlewares.ajaxcrawl
import scrapy.downloadermiddlewares.chunked
import scrapy.downloadermiddlewares.decompression
import scrapy.downloadermiddlewares.defaultheaders
import scrapy.downloadermiddlewares.downloadtimeout
import scrapy.downloadermiddlewares.httpauth
import scrapy.downloadermiddlewares.httpcompression
import scrapy.downloadermiddlewares.redirect
import scrapy.downloadermiddlewares.retry
import scrapy.downloadermiddlewares.robotstxt
 
import scrapy.spidermiddlewares.depth
import scrapy.spidermiddlewares.httperror
import scrapy.spidermiddlewares.offsite
import scrapy.spidermiddlewares.referer
import scrapy.spidermiddlewares.urllength
 
import scrapy.pipelines
 
import scrapy.core.downloader.handlers.http
import scrapy.core.downloader.contextfactory
 
import scrapy.pipelines.images  # the image pipeline is used
import openpyxl  # the openpyxl library is used


# custom middlewares and pipelines
import spiders.myscrapy.middlewares.agentmiddleware
import spiders.myscrapy.pipelines
import spiders.myscrapy.settings
import spiders.myscrapy.items
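
For reference, all of these imports typically sit at the top of the bundled entry script, so that PyInstaller's static analysis can find the modules. Below is a minimal sketch of such an entry script, following the approach in the zhihu post linked above; the file name crawl.py and the spider name spider1 are assumptions, not taken from the original setup. Alternatively, the same module list can go into hiddenimports in the PyInstaller .spec file instead of explicit import statements.

# crawl.py -- hypothetical entry script; build with: pyinstaller -F crawl.py
# (the hidden-import lines above go at the top of this file)
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == '__main__':
    # load the project's settings.py and run the spider by its name attribute;
    # 'spider1' is a placeholder
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider1')
    process.start()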

(2) When the script that launches the spider is not located under the scrapy project directory, the pipelines and middlewares stopped taking effect. Checking the log, none of the middlewares and pipelines configured in settings were recognized, and the list Scrapy retrieved was empty.

So I tried changing the configuration path in settings as follows:

ITEM_PIPELINES = {
    'scrapys.spider1.wallpaper.pipelines.myPipeline': 300,
}

After the change it still had no effect; the enabled pipelines shown in the log remained empty.
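
A quick way to test whether a dotted path from settings can be resolved at all is Scrapy's own load_object helper, which is what Scrapy uses internally to turn these strings into classes (the path below is the one from the snippet above):

# run this in the same environment that launches the spider
from scrapy.utils.misc import load_object

# raises ImportError (or NameError if the module lacks the attribute);
# a component whose path fails to resolve never shows up as enabled
cls = load_object('scrapys.spider1.wallpaper.pipelines.myPipeline')
print(cls)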

Reading the source, I found that middlewares and pipelines that fail to import get dropped, so I suspected the module search path was the problem. But once the project is packaged, the original directory hierarchy no longer exists, so this cannot be fixed by appending entries to sys.path. I could only trace the error stack and patch the Scrapy source file scrapy/utils/misc.py as follows:

# scrapy/utils/misc.py -- these imports are already present at the top of the file
from importlib import import_module
from pkgutil import iter_modules


def walk_modules(path):
    """Loads a module and all its submodules from the given module path and
    returns them. If *any* module throws an exception while importing, that
    exception is thrown back.

    For example: walk_modules('scrapy.utils')
    """
    mods = []
    try:
        mod = import_module(path)
    except ImportError:
        # patched: inside the PyInstaller bundle our project lives under the
        # 'spiders' package, so retry the import with that prefix
        mod = import_module('spiders.' + path)
    mods.append(mod)
    if hasattr(mod, '__path__'):
        for _, subpath, ispkg in iter_modules(mod.__path__):
            fullpath = path + '.' + subpath
            if ispkg:
                mods += walk_modules(fullpath)
            else:
                submod = import_module(fullpath)
                mods.append(submod)
    return mods
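
With the patch, a module path that used to fail inside the bundle is retried under the spiders package. A quick sanity check in the frozen app might look like this (myscrapy.pipelines is a hypothetical module path):

from scrapy.utils.misc import walk_modules

# before the patch this raised ImportError inside the bundle;
# now it falls back to importing 'spiders.myscrapy.pipelines'
mods = walk_modules('myscrapy.pipelines')
print([m.__name__ for m in mods])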

Problem solved.
