When running a Scrapy crawler you may hit an encoding error like this:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 235: illegal multibyte sequence

As the traceback below shows, the failure is not in downloading the page at all: it happens while Scrapy reads its project config file. configparser opens the file with the system default encoding, which is gbk (cp936) on a Chinese-locale Windows install, but the file itself is saved as utf-8, so some bytes cannot be decoded. A typical traceback:
Traceback (most recent call last):
  File "D:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\www\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "d:\www\lib\site-packages\scrapy\cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "d:\www\lib\site-packages\scrapy\utils\project.py", line 63, in get_project_settings
    init_env(project)
  File "d:\www\lib\site-packages\scrapy\utils\conf.py", line 87, in init_env
    cfg = get_config()
  File "d:\www\lib\site-packages\scrapy\utils\conf.py", line 101, in get_config
    cfg.read(sources)
  File "D:\python\lib\configparser.py", line 696, in read
    self._read(fp, filename)
  File "D:\python\lib\configparser.py", line 1014, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 235: illegal multibyte sequence
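The root cause can be reproduced without Scrapy at all: gbk is a two-byte codec, and a high byte such as 0xaf is treated as the lead byte of a pair, so it fails to decode when the byte that follows is not a valid continuation. A minimal sketch (the byte values are illustrative, not taken from the actual scrapy.cfg):

```python
# 0xaf is a gbk lead byte; the newline 0x0a after it is not a valid
# gbk trail byte, so decoding raises UnicodeDecodeError.
raw = b"\xaf\n"

try:
    raw.decode("gbk")
except UnicodeDecodeError as exc:
    print(exc)  # 'gbk' codec can't decode byte 0xaf in position 0: illegal multibyte sequence
```

This is exactly what happens inside configparser when open() falls back to the locale's default codec and then iterates over the file's lines.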
Following the traceback, I suspected a decoding problem around self._read(fp, filename). The surrounding code in configparser.py's read() method looks like this:

for filename in filenames:
    try:
        # The stock code passes encoding=encoding, which falls back to the
        # locale default (gbk here). I changed it to encoding="utf-8",
        # which made the error go away.
        with open(filename, encoding=encoding) as fp:
            self._read(fp, filename)
    except OSError:
        continue
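For reference, the same effect is available through the public API without touching the standard library: ConfigParser.read() accepts an encoding argument. A small self-contained sketch of what the patched call effectively does (the file name and contents are illustrative):

```python
import configparser

# Write a config file as utf-8, mirroring a scrapy.cfg that contains
# non-ascii bytes (the file name here is illustrative).
with open("demo_scrapy.cfg", "w", encoding="utf-8") as f:
    f.write("[settings]\n# 项目配置 (non-ascii comment)\ndefault = demo\n")

cfg = configparser.ConfigParser()
# Passing encoding="utf-8" explicitly bypasses the locale default
# (gbk on Chinese Windows) -- the same thing the source edit forces.
cfg.read("demo_scrapy.cfg", encoding="utf-8")
print(cfg["settings"]["default"])  # demo
```

Scrapy's get_config() calls cfg.read(sources) without an encoding, which is why the error surfaces there; the snippet only shows which knob the edit is turning.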
Note: when you modify the installed source file, your editor (PyCharm, for example) may pop up a prompt with three options; just confirm and you can edit the file.
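One caveat with editing configparser.py directly: the change lives inside the standard library, so it is lost whenever Python is reinstalled or upgraded. An alternative is to re-save the offending file in the codec the interpreter expects. A hypothetical helper (the function name and default codecs are my own, not from the original post); note it only works if every character in the file also exists in gbk:

```python
from pathlib import Path

def reencode(path, src="utf-8", dst="gbk"):
    """Re-save a text file from `src` encoding to `dst` encoding."""
    p = Path(path)
    # Decode with the real on-disk encoding, then write back in the
    # codec that configparser's locale default will use.
    p.write_bytes(p.read_text(encoding=src).encode(dst))
```

After converting scrapy.cfg this way, configparser's default gbk decode succeeds and no source edit is needed.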