I am trying to parse an HTML page and save it in a database. I build JSON from the page's tags.
Some of the tags contain JavaScript-like content.
Normal tag items serialize without any problem.
But with the JavaScript-like tags I get an error. The failing item looks like this:
{'text': 'IK F uu ph---------------------', 'tag': , 'unqid': '.....'}
Here is my code:
import json
import requests
from lxml import html

ac = requests.get(url)
html_text = ac.text
lx = html.fromstring(html_text)
# ... some parsing code that builds items ...
json.dumps(items).decode('utf-8')  # <-- this is where I get the error
The error is as follows:
Traceback (most recent call last):
File "main3.py", line 132, in <module>
PageRunner(url)
File "main3.py", line 122, in PageRunner
InsertPageTags(1, url)
File "main3.py", line 58, in InsertPageTags
parameter = (WebsiteID, Url, json.dumps(items).decode('utf-8'))
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "C:\Python27\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: is not JSON serializable
How can I dump HTML that contains comments, or remove the comments from the HTML?
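One possible direction, sketched under assumptions rather than taken from the code above: in lxml, a comment node's `.tag` is not a string but the `etree.Comment` factory function, and `json.dumps` cannot encode a function. The sketch below uses only the standard library, so `Comment` here is a hypothetical stand-in for `lxml.etree.Comment`, and `tag_name` and the sample `items` are made-up names for illustration.

```python
import json


def Comment(text):
    """Hypothetical stand-in for lxml.etree.Comment (keeps this sketch stdlib-only)."""
    return "<!--%s-->" % text


def tag_name(tag):
    """Return a JSON-friendly name for an element's tag."""
    if callable(tag):
        # Special nodes such as comments carry a factory function as their
        # tag; fall back to its name, e.g. "Comment".
        return getattr(tag, "__name__", repr(tag))
    return tag


# Made-up sample data shaped like the items in the question.
items = [
    {"text": "Hello", "tag": "div", "unqid": "a1"},
    {"text": " tracking script ", "tag": Comment, "unqid": "a2"},
]

# Option 1: keep comment items, but store the tag as a plain string.
with_comments = [dict(item, tag=tag_name(item["tag"])) for item in items]
payload_with_comments = json.dumps(with_comments)

# Option 2: drop every item whose tag is not an ordinary string tag.
without_comments = [item for item in items if not callable(item["tag"])]
payload_without_comments = json.dumps(without_comments)
```

With real lxml, it may also be possible to remove the comments before building `items`, for example with `etree.strip_elements(lx, etree.Comment, with_tail=False)`; that has not been checked against the page in question.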