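For the past few days I have been working on crawling news sites with the pyspider framework.
For various reasons, progress has been slow.
The problem I describe below puzzled me for a long time.
Here is the problematic code: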
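import datetime

# Inside the pyspider callback that builds the result:
raw_time = "YYYY年mm月dd日 HH:MM"  # placeholder for the date string scraped from the page
FORMAT = '%Y年%m月%d日 %H:%M'
raw_time = datetime.datetime.strptime(raw_time, FORMAT)
...
return {
    "content": "<p>" + content + "</p>",
    "title": title,
    "url": response.url,
    "time": raw_time,  # a datetime object, not a string
    "source": source,
}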
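To see what the parsing step produces on its own, here is a minimal sketch using a hypothetical sample string (the real value comes from the scraped page). The literal characters 年, 月 and 日 in the format string must appear in the input exactly as written, and the result is a `datetime` object rather than a string:

```python
import datetime

FORMAT = '%Y年%m月%d日 %H:%M'  # literal 年/月/日 are matched as-is

raw_time = "2017年05月17日 15:01"  # hypothetical sample string from a news page
parsed = datetime.datetime.strptime(raw_time, FORMAT)

print(type(parsed).__name__)  # datetime
print(parsed.isoformat())     # 2017-05-17T15:01:00
```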
However, at runtime the following error appears:
[E 170517 15:01:59 result_worker:63] Object of type 'datetime' is not JSON serializable
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pyspider/result/result_worker.py", line 54, in run
    self.on_result(task, result)
  File "/usr/local/lib/python3.6/site-packages/pyspider/result/result_worker.py", line 38, in on_result
    result=result
  File "/usr/local/lib/python3.6/site-packages/pyspider/database/sqlite/resultdb.py", line 58, in save
    return self._replace(tablename, **self._stringify(obj))
  File "/usr/local/lib/python3.6/site-packages/pyspider/database/sqlite/resultdb.py", line 44, in _stringify
    data['result'] = json.dumps(data['result'])
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'datetime' is not JSON serializable
In fact, similar code runs fine in my own environment, so the problem is not in the parsing code itself.
As the first frame of the traceback shows, the error is raised from:
  File "/usr/local/lib/python3.6/site-packages/pyspider/result/result_worker.py", line 54, in run
    self.on_result(task, result)
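result_worker.py is part of the framework, so the failure happens after my callback has already returned. Reading further down the traceback, the SQLite result database's _stringify method passes the returned dict to json.dumps, so every value in it must be JSON-serializable. Python's json module can encode str, int, float, bool, None, list and dict, but not datetime, which is why the "time" field triggers the error.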
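The failure can be reproduced without pyspider at all. This minimal sketch calls `json.dumps` on two hypothetical result dicts, the same call that `_stringify` makes in the traceback: one with only JSON-native values, and one with a `datetime` value. The exact wording of the error message differs slightly between Python versions, so only the type of the exception is relied on here.

```python
import datetime
import json

# Hypothetical result dicts; json.dumps is the call pyspider's SQLite
# resultdb makes in _stringify (see the traceback above).
ok_result = {"title": "example title", "views": 42, "tags": ["news"], "source": None}
bad_result = {"title": "example title", "time": datetime.datetime(2017, 5, 17, 15, 1)}

print(json.dumps(ok_result))  # str, int, list and None all serialize

serialization_error = None
try:
    json.dumps(bad_result)
except TypeError as exc:
    serialization_error = exc
    print("TypeError:", exc)  # wording varies slightly by Python version
```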
So try converting the datetime object to a string before returning it:
raw_time=raw_time.strftime('%Y-%m-%d %H:%M')
Run it again: no more errors.
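If a callback returns several datetime fields, converting each one by hand gets repetitive. Below is a minimal sketch of a hypothetical helper, `to_jsonable`, assuming the result is a flat dict; it formats every `datetime` value with the same pattern used above. Passing `default=str` to `json.dumps` would also work in plain Python, but according to the traceback pyspider's `_stringify` calls `json.dumps(data['result'])` with no extra arguments, so the conversion has to happen before the callback returns.

```python
import datetime
import json

OUTPUT_FORMAT = '%Y-%m-%d %H:%M'  # same format as the fix above


def to_jsonable(result, fmt=OUTPUT_FORMAT):
    """Hypothetical helper: return a shallow copy of a flat result dict
    with every datetime value formatted as a string."""
    return {
        key: value.strftime(fmt) if isinstance(value, datetime.datetime) else value
        for key, value in result.items()
    }


# Hypothetical result as a callback might build it.
result = {
    "title": "example title",
    "time": datetime.datetime(2017, 5, 17, 15, 1),
}
safe = to_jsonable(result)
print(json.dumps(safe, ensure_ascii=False))
# {"title": "example title", "time": "2017-05-17 15:01"}
```

Because the string keeps the full date and minute, it can be parsed back later with `datetime.datetime.strptime(value, OUTPUT_FORMAT)` when the stored results are read.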