I have recently been teaching myself web scraping with Python.
After installing Scrapy and its dependencies:
- Create a new project
scrapy startproject ArticleSpider
- Scrapy then tells you that you can generate a spider from a template
scrapy genspider jobbole blog.jobbole.com
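The command takes the form: scrapy genspider <spider name> <domain of the site to crawl>
Here I picked a relatively simple site to experiment on (Jobbole, 伯乐网)
- Then run the spider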
scrapy crawl jobbole
At this point the following error appeared:
Traceback (most recent call last):
File "F:\python3.7\Lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "F:\python3.7\Lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "F:\Evns\article_spider\Scripts\scrapy.exe\__main__.py", line 9, in <module>
File "f:\evns\article_spider\lib\site-packages\scrapy\cmdline.py", line 150, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "f:\evns\article_spider\lib\site-packages\scrapy\cmdline.py", line 90, in _run_print_help
func(*a, **kw)
File "f:\evns\article_spider\lib\site-packages\scrapy\cmdline.py", line 157, in _run_command
cmd.run(args, opts)
File "f:\evns\article_spider\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "f:\evns\article_spider\lib\site-packages\scrapy\crawler.py", line 170, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "f:\evns\article_spider\lib\site-packages\scrapy\crawler.py", line 198, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "f:\evns\article_spider\lib\site-packages\scrapy\crawler.py", line 203, in _create_crawler
return Crawler(spidercls, self.settings)
File "f:\evns\article_spider\lib\site-packages\scrapy\crawler.py", line 55, in __init__
self.extensions = ExtensionManager.from_crawler(self)
File "f:\evns\article_spider\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "f:\evns\article_spider\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "f:\evns\article_spider\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
mod = import_module(module)
File "f:\evns\article_spider\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "f:\evns\article_spider\lib\site-packages\scrapy\extensions\telnet.py", line 12, in <module>
from twisted.conch import manhole, telnet
File "f:\evns\article_spider\lib\site-packages\twisted\conch\manhole.py", line 154
def write(self, data, async=False):
^
SyntaxError: invalid syntax
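SyntaxError: invalid syntax means the interpreter hit a syntax error, but the traceback never reaches any of my own .py files. The failing line is in Twisted's manhole.py, which uses async as a parameter name; async became a reserved keyword in Python 3.7, so that file no longer parses.
The fix I ended up with:
In the directory of the Python installation you are running (or the virtual environment's directory, if you use one), open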
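The failing line can be reproduced without Scrapy or Twisted. The following is a minimal sketch, assuming Python 3.7 or later; the `compiles` helper and the placeholder function body are my own additions for illustration, and only the `def write(self, data, async=False):` line comes from the traceback.

```python
import keyword

# The signature copied from the traceback, with a placeholder body added
# so that it forms a complete function definition.
SNIPPET = "def write(self, data, async=False):\n    pass\n"


def compiles(source):
    """Return True if the source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Is "async" a reserved keyword in the running interpreter?
print(keyword.iskeyword("async"))

# Does the original signature compile?
print(compiles(SNIPPET))

# Does it compile once the parameter is renamed?
print(compiles(SNIPPET.replace("async", "async1")))
```

On Python 3.7 and later this should report that `async` is a keyword, that the original signature does not compile, and that the renamed one does, which matches the traceback.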
\Lib\site-packages\twisted\conch\manhole.py
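Rename every use of async as an identifier to something else (for example async1).
There are about 5 of them. Save and close the file, rerun the spider, and it works: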
INFO: Spider closed (finished)
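The manual rename described above could also be scripted. This is only a rough sketch under stated assumptions: `rename_async` and `patch_file` are hypothetical helper names, not part of Scrapy or Twisted, and the regular expression replaces the whole word `async` wherever it appears, including inside comments and strings. Back up the file before trying anything like this.

```python
import re
from pathlib import Path

# Match "async" only as a whole word, so names such as "asynchronous"
# are left untouched.
ASYNC_NAME = re.compile(r"\basync\b")


def rename_async(source, new_name="async1"):
    """Return source with every whole-word 'async' replaced by new_name."""
    return ASYNC_NAME.sub(new_name, source)


def patch_file(path, new_name="async1"):
    """Rewrite the file at path in place (hypothetical helper; back up first)."""
    target = Path(path)
    text = target.read_text(encoding="utf-8")
    target.write_text(rename_async(text, new_name), encoding="utf-8")


# A small stand-in for the kind of code found in manhole.py.
sample = (
    "def write(self, data, async=False):\n"
    "    if async:\n"
    "        self.asynchronous = True\n"
)
patched = rename_async(sample)
print(patched)
```

To apply it to the real file, pass the full path of manhole.py inside your own installation or virtual environment to `patch_file`. Upgrading Twisted to a release that supports Python 3.7 is likely the cleaner long-term fix, since a hand-edited file is overwritten whenever the package is reinstalled.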