pyspider installation pitfalls:
Errors encountered when installing from cmd:
-
AttributeError: module 'fractions' has no attribute 'gcd'
Solution: fractions.gcd() was removed in Python 3.9. Add import math at the top of pyspider\libs\base_handler.py,
then change fractions.gcd() to math.gcd() further down in the same file.
Reference: https://blog.csdn.net/Hunter_Bug/article/details/111136047
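A minimal sketch of why the swap is safe, assuming Python 3.5 or later (where math.gcd exists). The helper name common_tick and the sample intervals are hypothetical and are not taken from pyspider; they only show math.gcd being used as a drop-in for the removed fractions.gcd on non-negative integers.

```python
import math
from functools import reduce


def common_tick(intervals):
    """Return the greatest common divisor of a list of intervals in seconds.

    common_tick is a hypothetical helper used only for this illustration.
    """
    # math.gcd(0, n) == n, so 0 is a neutral starting value for the reduction.
    return reduce(math.gcd, intervals, 0)


print(math.gcd(12, 18))
print(common_tick([60, 90, 150]))
```

The same two-argument call shape is why replacing fractions.gcd(a, b) with math.gcd(a, b) needs no other change in the file.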
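-
import pycurl # type: ignore
ImportError: pycurl: libcurl link-time ssl backends (schannel) do not include compile-time ssl backend (openssl)
Solution: pip uninstall pycurl
set PYCURL_SSL_LIBRARY=schannel
pip install pycurl
If that still fails, uninstall pycurl 7.44.1 and install pycurl 7.43.0 instead; the newer version is probably too new to be compatible.
On my machine, pycurl 7.43.0 only matched Python 3.6: I tried Python 3.9, 3.8 and 3.7 one after another without success, and only Python 3.6 worked.
To check whether a pycurl whl file matches your machine, use the commands in the wheel compatibility check further down (the eighth item in this list).
pycurl whl download site: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycurl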
-
pkg_resources.DistributionNotFound: The 'wsgidav>=2.0.0' distribution was not found and is required by pyspider
Solution: pip uninstall wsgidav
python -m pip install wsgidav==2.4.1
-
from werkzeug.wsgi import DispatcherMiddleware
ImportError: cannot import name 'DispatcherMiddleware'
Solution: python -m pip uninstall werkzeug
python -m pip install werkzeug==0.16.1
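-
Stuck at result_worker starting… with no further output
Solution: in Control Panel, open the firewall settings and add phantomjs.exe to the list of allowed apps.
The first run of pyspider all may also hang at result_worker starting ...; interrupt it with Ctrl + C, open a new cmd window, and run pyspider all again. It then starts successfully.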
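-
(async=True, get_object=False, no_input=False):
Solution: async is a reserved keyword from Python 3.7 onward, so this line no longer parses. Rename every occurrence of the async identifier in the files below to a non-keyword name, using the same name everywhere:
lib\site-packages\pyspider\run.py (4 places): lines 231, 245 (twice), 365
lib\site-packages\pyspider\fetcher\tornado_fetcher.py (5 places): lines 81, 89 (twice), 95, 117
lib\site-packages\pyspider\webui\app.py (1 place): line 95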
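The rename can be sketched as a whole-word substitution. This is only an illustration under stated assumptions: the replacement name async_mode is arbitrary (the original notes do not say which name to use), and a plain regex also touches the word async inside strings and comments, so the result should be reviewed by hand before saving the file.

```python
import keyword
import re


def rename_async(source, new_name="async_mode"):
    """Replace the bare identifier `async` with a non-keyword name.

    new_name is a hypothetical choice; any valid identifier works as long
    as the same one is used in every edited file.
    """
    # \b keeps longer words such as "asynchronous" untouched.
    return re.sub(r"\basync\b", new_name, source)


sample = "(async=True, get_object=False, no_input=False):"
print(keyword.iskeyword("async"))  # True on Python 3.7 and later
print(rename_async(sample))
```

Applying the same function to the text of each listed file, then checking the diff, covers all ten occurrences without editing line by line.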
-
Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
Solution: edit the file (domaincontroller)
lib\site-packages\pyspider\webui\webdav.py (1 place)
Change 'domaincontroller': NeedAuthController(app), to
'http_authenticator':{'HTTPAuthenticator':NeedAuthController(app)},
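-
Check whether the pycurl whl file matches your machine. Run [5] in cmd, or run one of [1][2][3][4] in a Python script; any one of them is enough. (If wheel is not installed: pip install wheel)
[1] 32bit: import pip; print(pip.pep425tags.get_supported())
[2] 64bit: import pip._internal; print(pip._internal.pep425tags.get_supported())
[3] 32bit: import wheel.pep425tags as w; print(w.get_supported("win_win32"))
[4] 64bit: import wheel.pep425tags as w; print(w.get_supported("win_amd64"))
[5] python -m pip debug --verbose
As far as I know, [1] to [4] rely on internal pep425tags modules that newer pip and wheel releases no longer ship; if they raise an error, use [5].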
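As a rough cross-check of the commands above, the Python tag and platform tag of the running interpreter can be compared against a wheel filename. This is a simplified sketch, not what pip actually does: it assumes a Windows interpreter, ignores the ABI tag, and the function names and the example filename are hypothetical. The output of python -m pip debug --verbose remains the authoritative list.

```python
import struct
import sys


def interpreter_tags():
    """Return (python_tag, platform_tag) for the running interpreter.

    Assumes Windows, as in the rest of these notes.
    """
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


def wheel_matches(filename, python_tag, platform_tag):
    """Compare the tags in a name-version-python-abi-platform.whl filename."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        return False
    # The last three fields are python tag, abi tag and platform tag.
    return parts[-3] == python_tag and parts[-1] == platform_tag


# Hypothetical filename, used only to show the expected shape.
example = "pycurl-7.43.0-cp36-cp36m-win_amd64.whl"
print(interpreter_tags())
print(wheel_matches(example, *interpreter_tags()))
```

A mismatch here (for example a cp36 wheel on a cp39 interpreter) is consistent with the experience above that only one Python version accepted the wheel.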
-
flask 2.0.0 requires Werkzeug>=2.0, but you have werkzeug 0.16.1 which is incompatible.
Solution: python -m pip uninstall flask
python -m pip install flask==0.11
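After all the pinning above, it can help to confirm what is actually installed. The following is a small sketch under stated assumptions: the pins are the ones quoted in these notes (wsgidav 2.4.1, werkzeug 0.16.1, flask 0.11), the function names are hypothetical, and on Python versions older than 3.8 it falls back to pkg_resources from setuptools.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    metadata = None  # older Python: fall back to pkg_resources below

# Versions taken from the fixes in these notes.
PINS = {"wsgidav": "2.4.1", "werkzeug": "0.16.1", "flask": "0.11"}


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # shipped with setuptools
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def check_pins(pins):
    """Map each package to (installed, wanted, matches)."""
    report = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        report[name] = (found, wanted, found == wanted)
    return report


for name, (found, wanted, ok) in check_pins(PINS).items():
    print(name, "installed:", found, "wanted:", wanted, "OK" if ok else "MISMATCH")
```

Running it after each pip install shows at a glance whether a later install has silently upgraded one of the pinned packages.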
pyspider runtime pitfalls:
- Exception: HTTP 599: SSL certificate problem: unable to get local issuer certificate
Cause: this error occurs when requesting URLs that start with https; SSL verification fails because the certificate cannot be validated.
Solution: use self.crawl(url, callback=self.index_page, validate_cert=False)
- ModuleNotFoundError: No module named 'bs4'
Solution: pip install bs4
- TypeError: __init__() takes 1 positional argument but 5 positional arguments (and 1 keyword-only argument) were given
Solution: this is a well-known pymysql issue; pass the connection parameters as keyword arguments instead of positional ones, as follows:
import pymysql
connection = pymysql.connect(host='localhost',
                             user='user',
                             password='passwd',
                             database='db',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
- pymysql.err.OperationalError: (2003,"Can't connect to MySQL server on 'localhost' ([WinError 10061] No connection could be made because the target machine actively refused it)")
Solution: MySQL is not running. Go to Computer Management --> Services and Applications --> Services, start the MySQL service, and check that its Status shows Running.
If the service is not listed, install MySQL first.
- Exception: HTTP 599: Failed to connect to search.51job.com port 80: Timed out
Solution: run setx HTTP_PROXY "" in cmd, then reboot the computer.
- Exception: HTTP 599: Failed to connect to 127.0.0.1 port 25555: Connection refused
Solution: nine times out of ten this means phantomjs has crashed; restarting phantomjs fixes it.
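Before restarting anything, a quick check of whether something is listening on the port from the error message can confirm the diagnosis. This is a minimal sketch using only the standard library; the function name is hypothetical, and the host and port are the ones quoted in the error above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


# 25555 is the port named in the error message above.
print(is_port_open("127.0.0.1", 25555))
```

If this prints False while pyspider is running, the phantomjs process is most likely down; the same check against 127.0.0.1 and MySQL's port can help with the OperationalError 2003 case.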