It scrapes French pages, so I have to handle UTF-8 encoding to avoid errors. I put the following lines at the top of the script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
I also encode the scraped strings like this:
^{pr2}$
My first script ran fine on Python 2.7, but since I rewrote it for Python 3 (mainly to use urllib.request), the UTF-8 handling no longer works.
After scraping the first few elements, I get this error:
  File "scraper_monu_historiques_ge_py3.py", line 19, in <module>
    url = urllib.request.urlopen(url_ville).read() # et on ouvre chacune d'entre elles
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1217, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.4/urllib/request.py", line 1174, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1090, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1118, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 975, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 58: ordinal not in range(128)
I don't understand why, since it worked fine in Python 2.7... I have published a version of this WIP on GitHub.
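
For what it's worth, the traceback suggests the failure is in the URL itself rather than in the page content: http.client encodes the request line as ASCII, and '\xe9' ('é') cannot be encoded that way. A minimal sketch of one possible workaround is to percent-encode the path and query as UTF-8 before calling urlopen. The helper name to_ascii_url and the example.org URL are hypothetical and not part of the original script, and the sketch assumes the host name itself is plain ASCII.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper for illustration; assumes the host name is ASCII.
    """
    parts = urlsplit(url)
    # '%' is kept safe so that already-escaped URLs are not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Hypothetical URL containing accented characters.
url_ville = to_ascii_url("https://example.org/wiki/Genève?q=café")
print(url_ville)
```

The result can then be passed to urllib.request.urlopen(url_ville) in place of the raw string. Whether the target server expects UTF-8 percent-encoding is an assumption here; most modern sites do, but it is worth checking against a URL copied from the browser.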