Errors when installing pyspider on Python 3.6

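I've recently been learning web scraping with Python, and installing the pyspider library threw a whole pile of errors. It took me a long time to sort them out, so I'm collecting them all in this post as a reference.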

First, the installation. I installed it with pip, using this command:

pip install pyspider -i https://pypi.tuna.tsinghua.edu.cn/simple

I haven't permanently switched my pip index, so the command uses the -i option to select a mirror for this one install.
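If several Python installations share one machine, it can help to call pip through the interpreter you actually want to install into. Below is a minimal sketch of building such a command with a per-install index. The function name pip_install_command and the constant TUNA_INDEX are made up for illustration; the only pip features used are the install subcommand and the -i option from the command above.

```python
import sys

# Mirror URL taken from the command above.
TUNA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(requirement, index_url=None):
    """Build a `pip install` command bound to the current interpreter.

    Hypothetical helper: it only builds the argument list, it does not run pip.
    """
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    if index_url:
        # -i overrides the package index for this one command only.
        cmd += ["-i", index_url]
    return cmd


if __name__ == "__main__":
    print(" ".join(pip_install_command("pyspider", TUNA_INDEX)))
```

The resulting list can be handed to subprocess.call to actually run the install.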
The command ran for a long time and then reported an error:

ERROR: flask 1.1.2 has requirement Jinja2>=2.10.1, but you'll have jinja2 2.10 which is incompatible.

It looks like the installed Jinja2 is too old, so let's upgrade it:

pip install Jinja2==2.10.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

That produced a whole string of errors:

ERROR: pandas-profiling 2.5.4 requires astropy>=4.0, which is not installed.
ERROR: pandas-profiling 2.5.4 requires confuse>=1.0.0, which is not installed.
ERROR: pandas-profiling 2.5.4 requires htmlmin>=0.1.12, which is not installed.
ERROR: pandas-profiling 2.5.4 requires phik>=0.9.10, which is not installed.
ERROR: pandas-profiling 2.5.4 requires statsmodels>=0.11.1, which is not installed.
ERROR: pandas-profiling 2.5.4 requires tangled-up-in-unicode>=0.0.4, which is not installed.
ERROR: pandas-profiling 2.5.4 requires tqdm>=4.43.0, which is not installed.
ERROR: pandas-profiling 2.5.4 requires visions[type_image_path]>=0.4.1, which is not installed.
ERROR: pandas-profiling 2.5.4 has requirement ipywidgets>=7.5.1, but you'll have ipywidgets 7.4.0 which is incompatible.
ERROR: pandas-profiling 2.5.4 has requirement jinja2>=2.11.1, but you'll have jinja2 2.10.2 which is incompatible.
ERROR: pandas-profiling 2.5.4 has requirement requests>=2.23.0, but you'll have requests 2.19.1 which is incompatible.

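Wow, it feels like I ran into every error there is. Fortunately they all seem to mean one of two things: a library is not installed, or its version is too old. (Going by the messages, they all come from pandas-profiling, which was already in my environment, rather than from pyspider itself.) Fine: install what's missing and upgrade what's outdated.
Run the following commands in order: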

pip install astropy==4.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install confuse==1.0.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install htmlmin==0.1.12 -i https://pypi.tuna.tsinghua.edu.cn/simple

Those commands all succeeded, but installing phik==0.9.10 failed with:

Traceback (most recent call last):
  File "e:\python\lib\site-packages\pip\_vendor\urllib3\response.py", line 425, in _error_catcher
    yield
  File "e:\python\lib\site-packages\pip\_vendor\urllib3\response.py", line 507, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "e:\python\lib\site-packages\pip\_vendor\cachecontrol\filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "e:\python\lib\http\client.py", line 449, in read
    n = self.readinto(b)
  File "e:\python\lib\http\client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "e:\python\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "e:\python\lib\ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
  File "e:\python\lib\ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
  File "e:\python\lib\ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\python\lib\site-packages\pip\_internal\cli\base_command.py", line 186, in _main
    status = self.run(options, args)
  File "e:\python\lib\site-packages\pip\_internal\commands\install.py", line 331, in run
    resolver.resolve(requirement_set)
  File "e:\python\lib\site-packages\pip\_internal\legacy_resolve.py", line 177, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "e:\python\lib\site-packages\pip\_internal\legacy_resolve.py", line 333, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "e:\python\lib\site-packages\pip\_internal\legacy_resolve.py", line 282, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "e:\python\lib\site-packages\pip\_internal\operations\prepare.py", line 482, in prepare_linked_requirement
    hashes=hashes,
  File "e:\python\lib\site-packages\pip\_internal\operations\prepare.py", line 287, in unpack_url
    hashes=hashes,
  File "e:\python\lib\site-packages\pip\_internal\operations\prepare.py", line 159, in unpack_http_url
    link, downloader, temp_dir.path, hashes
  File "e:\python\lib\site-packages\pip\_internal\operations\prepare.py", line 303, in _download_http_url
    for chunk in download.chunks:
  File "e:\python\lib\site-packages\pip\_internal\utils\ui.py", line 160, in iter
    for x in it:
  File "e:\python\lib\site-packages\pip\_internal\network\utils.py", line 39, in response_chunks
    decode_content=False,
  File "e:\python\lib\site-packages\pip\_vendor\urllib3\response.py", line 564, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "e:\python\lib\site-packages\pip\_vendor\urllib3\response.py", line 529, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "e:\python\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "e:\python\lib\site-packages\pip\_vendor\urllib3\response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.tuna.tsinghua.edu.cn', port=443): Read timed out.

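Wow, that's a really long one. The last line gives the actual cause: the download from the Tsinghua mirror timed out. I went back to Baidu, found a blog post, and following its advice ran the command below, which raises pip's timeout to 100 seconds and switches to the Douban mirror: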

pip --default-timeout=100 install phik==0.9.10 -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
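The same idea (longer timeout, then a different mirror if the first one fails) can be sketched as a small retry loop. This is only an illustration under a few assumptions: install_with_fallback and its runner parameter are names I made up, and the mirror list is whatever you pass in. The pip options it uses (--default-timeout, -i, --trusted-host) are the ones from the command above.

```python
import subprocess
import sys
from urllib.parse import urlparse


def install_with_fallback(requirement, index_urls, timeout=100,
                          runner=subprocess.call):
    """Try each index in turn; return the first index URL that works, else None.

    `runner` is a hypothetical hook so the loop can be exercised without
    touching the network; by default it really runs pip.
    """
    for url in index_urls:
        cmd = [sys.executable, "-m", "pip",
               "--default-timeout=%d" % timeout,
               "install", requirement, "-i", url]
        if url.startswith("http://"):
            # Plain-HTTP mirrors need to be marked as trusted explicitly.
            cmd += ["--trusted-host", urlparse(url).hostname]
        if runner(cmd) == 0:  # exit code 0 means pip succeeded
            return url
    return None
```

A call such as install_with_fallback("phik==0.9.10", [tsinghua_url, douban_url]) would try the Tsinghua mirror first and only fall back to Douban if pip exits with an error.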

The phik install then went through without errors. A few libraries still needed updating, so I carried on:

pip install statsmodels==0.11.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install tangled-up-in-unicode==0.0.4 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install tqdm==4.43.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install visions[type_image_path]==0.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install ipywidgets==7.5.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install requests==2.23.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

All of them ran successfully. At this point every library that needed installing or upgrading was done.
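Typing those installs one by one gets tedious. As a rough sketch, the requirement on each pip error line can be pulled out with two regular expressions. The patterns are written against the exact message wording shown in this post and may not match other pip versions; parse_requirements is a made-up name.

```python
import re

# Patterns match the two message shapes shown in the pip output above.
MISSING = re.compile(
    r"^ERROR: \S+ \S+ requires (?P<req>.+?), which is not installed\.$")
CONFLICT = re.compile(
    r"^ERROR: \S+ \S+ has requirement (?P<req>.+?), but you'll have ")


def parse_requirements(pip_output):
    """Return the requirement specifiers named in pip's error lines, in order."""
    found = []
    for line in pip_output.splitlines():
        line = line.strip()
        match = MISSING.match(line) or CONFLICT.match(line)
        if match and match.group("req") not in found:
            found.append(match.group("req"))
    return found
```

Note that this yields specifiers like astropy>=4.0 rather than the exact == pins I typed by hand, so pip would pick the newest version that satisfies each one. On a command line such specifiers need quoting, because > is otherwise treated as a redirect.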
OK, time to run pyspider. But it failed again:

ValueError: Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead

I searched online again. The suggestion was to open python\Lib\site-packages\pyspider\webui\webdav.py and change the following on line 209:

'domaincontroller': NeedAuthController(app),

to:

'http_authenticator':{
        'HTTPAuthenticator':NeedAuthController(app),
    },

OK, I made that change and ran pyspider again. This time there was no error, but it stayed stuck at this output:

PS E:\软件> pyspider all
e:\python\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
[W 200420 19:11:37 run:413] phantomjs not found, continue running without it.
[I 200420 19:11:40 result_worker:49] result_worker starting...

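It just hung there.
I searched again and found the fix: the installed wsgidav version was too new, and installing an older version solves it. The blog I got this from has since gone offline, so I can't link it; here is the fix directly: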

pip uninstall wsgidav

pip install wsgidav==2.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
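After a downgrade like this it is worth checking that the version you asked for is really the one installed. Here is a small sketch for that. The helper names installed_version and find_pin_mismatches are made up, and the pkg_resources branch is only there because importlib.metadata does not exist on an interpreter as old as 3.6.

```python
def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters such as 3.6
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_pin_mismatches(installed, pins):
    """Return {name: (installed_or_None, wanted)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        have = installed.get(name)
        if have != wanted:
            mismatches[name] = (have, wanted)
    return mismatches


if __name__ == "__main__":
    pins = {"wsgidav": "2.4.1"}  # the version pinned in the command above
    installed = {name: installed_version(name) for name in pins}
    print(find_pin_mismatches(installed, pins) or "all pins satisfied")
```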

I ran pyspider again, and got yet another error:

e:\python\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
[W 200420 19:06:07 run:413] phantomjs not found, continue running without it.
[I 200420 19:06:09 result_worker:49] result_worker starting...
[I 200420 19:06:17 processor:211] processor starting...
[I 200420 19:06:18 scheduler:647] scheduler starting...
[I 200420 19:06:18 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200420 19:06:23 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200420 19:06:24 tornado_fetcher:638] fetcher starting...
[I 200420 19:06:30 tornado_fetcher:671] fetcher exiting...
[I 200420 19:06:30 app:84] webui exiting...
[I 200420 19:06:30 scheduler:663] scheduler exiting...
[I 200420 19:06:30 result_worker:66] result_worker exiting...
[I 200420 19:06:31 processor:229] processor exiting...
Traceback (most recent call last):
  File "e:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "e:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\python\Scripts\pyspider.exe\__main__.py", line 7, in <module>
  File "e:\python\lib\site-packages\pyspider\run.py", line 754, in main
    cli()
  File "e:\python\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "e:\python\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "e:\python\lib\site-packages\click\core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "e:\python\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "e:\python\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "e:\python\lib\site-packages\click\decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "e:\python\lib\site-packages\pyspider\run.py", line 165, in cli
    ctx.invoke(all)
  File "e:\python\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "e:\python\lib\site-packages\click\decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "e:\python\lib\site-packages\pyspider\run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "e:\python\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "e:\python\lib\site-packages\click\decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "e:\python\lib\site-packages\pyspider\run.py", line 384, in webui
    app.run(host=host, port=port)
  File "e:\python\lib\site-packages\pyspider\webui\app.py", line 64, in run
    from werkzeug.wsgi import DispatcherMiddleware
ImportError: cannot import name 'DispatcherMiddleware'

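Another long traceback. The last line says DispatcherMiddleware can no longer be imported from werkzeug.wsgi, which means the installed werkzeug is too new, so the fix is once again to install an older version: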

pip uninstall werkzeug

pip install werkzeug==0.16.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
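For background: newer werkzeug releases keep DispatcherMiddleware in werkzeug.middleware.dispatcher instead of werkzeug.wsgi. So an alternative to downgrading, which I have not tried myself, would be to make the import in pyspider\webui\app.py try both locations. A minimal sketch of that fallback, with import_first being a made-up helper name:

```python
import importlib


def import_first(attr, module_names):
    """Return `attr` from the first module in `module_names` that provides it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue  # module missing, or it no longer exports the name
    raise ImportError(
        "cannot import name %r from any of %r" % (attr, module_names))


# Intended use in place of the failing import (requires werkzeug installed):
# DispatcherMiddleware = import_first(
#     "DispatcherMiddleware",
#     ["werkzeug.wsgi", "werkzeug.middleware.dispatcher"],
# )
```

Editing an installed package by hand is fragile, since the change is lost on reinstall, which is one reason pinning werkzeug as above is the simpler route.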

I closed the window where pyspider had been running, ran pyspider again, and it finally worked:

PS E:\软件> pyspider all
e:\python\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
[W 200420 19:11:37 run:413] phantomjs not found, continue running without it.
[I 200420 19:11:40 result_worker:49] result_worker starting...
[I 200420 19:11:40 processor:211] processor starting...
[I 200420 19:11:40 scheduler:647] scheduler starting...
[I 200420 19:11:40 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200420 19:11:43 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200420 19:11:45 tornado_fetcher:638] fetcher starting...
[I 200420 19:11:50 app:76] webui running on 0.0.0.0:5000
[I 200420 19:12:40 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200420 19:31:58 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200420 19:32:58 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200420 19:33:58 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200420 19:34:59 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0

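That was not easy. I hope my experience is a useful reference.
Also, while searching I came across fixes for several other problems. I didn't run into those myself, so I haven't tried them, but they may be worth keeping in mind.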


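Sigh. Yesterday, once everything was set up, pyspider started fine. This morning I got up, happily went to start it, and found it was broken again: stuck at the "result_worker starting..." step once more.

I searched online again. One answer said to download phantomjs, unpack it, find phantomjs.exe in the bin folder, and copy it into the same directory as the installed python.exe. OK, I did that, and it still didn't work.

I kept looking and found another answer: the phantomjs directory needs to be added to the system PATH variable, and when pyspider hangs at "result_worker starting...", wait a while, close the cmd window, and run pyspider all again. I had unpacked phantomjs on the E: drive, so I added its bin directory to the environment variables and ran pyspider again.

Finally no more problems. I hope it doesn't get stuck again tomorrow =.=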
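A quick way to check whether the PATH change took effect is to ask Python where it finds the executable. This is a minimal sketch using the standard library's shutil.which; find_phantomjs is a made-up helper name. Remember that a terminal opened before the PATH change will not see the new value.

```python
import shutil


def find_phantomjs(path=None):
    """Return the full path of phantomjs as found on PATH, or None.

    `path` optionally overrides the search path, mainly for testing.
    """
    return shutil.which("phantomjs", path=path)


if __name__ == "__main__":
    location = find_phantomjs()
    if location:
        print("phantomjs found at", location)
    else:
        print("phantomjs is not on PATH")
```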
