'list' object has no attribute 'items' in Python BeautifulSoup renderContents

To strip unwanted or unsafe tags and attributes from input, I'm using the following code (taken almost entirely from http://djangosnippets.org/snippets/1655/):

    import re
    from bs4 import BeautifulSoup, Comment

    def html_filter(value, allowed_tags = 'p h1 h2 h3 div span a:href:title img:src:alt:title table:cellspacing:cellpadding th tr td:colspan:rowspan ol ul li br'):
        # Matches "javascript" even when whitespace or a few filler characters are mixed in
        js_regex = re.compile(r'[\s]*(.{1,7})?'.join(list('javascript')))
        # 'a:href:title' -> {'a': ['href', 'title']}
        allowed_tags = [tag.split(':') for tag in allowed_tags.split()]
        allowed_tags = dict((tag[0], tag[1:]) for tag in allowed_tags)

        soup = BeautifulSoup(value)
        # Drop HTML comments
        for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
            comment.extract()

        for tag in soup.findAll(True):
            if tag.name not in allowed_tags:
                # Hide the tag itself but keep its contents
                tag.hidden = True
            else:
                tag.attrs = [(attr, js_regex.sub('', val)) for attr, val in tag.attrs.items() if attr in allowed_tags[tag.name]]

        return soup.renderContents().decode('utf8')
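As a small illustration of how the `allowed_tags` string is interpreted, here is a minimal sketch of just the parsing step. The helper name `parse_whitelist` and the shortened whitelist string are made up for this example; the two parsing lines are the same as in the function above.

```python
def parse_whitelist(allowed_tags):
    """Turn 'a:href:title br' into {'a': ['href', 'title'], 'br': []}.

    parse_whitelist is a hypothetical helper name used only for this sketch.
    """
    pairs = [tag.split(':') for tag in allowed_tags.split()]
    return dict((tag[0], tag[1:]) for tag in pairs)


whitelist = parse_whitelist('p a:href:title img:src:alt:title br')
print(whitelist['a'])         # ['href', 'title']
print(whitelist['p'])         # []
print('script' in whitelist)  # False
```

Each tag maps to the list of attributes allowed on it, and a tag with no `:attr` suffix maps to an empty list, so all of its attributes are filtered out.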

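The function works for unwanted tags, for whitelisted tags, for attributes that are not whitelisted, and even for badly formed HTML. However, as soon as any whitelisted attribute is present, it raises

AttributeError: 'list' object has no attribute 'items'

on the last line, which doesn't help me much. type(soup) is the same whether or not the error is raised, so I don't know what the message is referring to. Traceback: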

[...]
File "C:\Users\Mark\Web\www\fnwidjango\src\base\functions\html_filter.py" in html_filter
  30. return soup.renderContents().decode('utf8')
File "C:\Python27\lib\site-packages\bs4\element.py" in renderContents
  1098. indent_level=indentLevel, encoding=encoding)
File "C:\Python27\lib\site-packages\bs4\element.py" in encode_contents
  1089. contents = self.decode_contents(indent_level, encoding, formatter)
File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
  1074. formatter))
File "C:\Python27\lib\site-packages\bs4\element.py" in decode
  1021. indent_contents, eventual_encoding, formatter)
File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
  1074. formatter))
File "C:\Python27\lib\site-packages\bs4\element.py" in decode
  1021. indent_contents, eventual_encoding, formatter)
File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
  1074. formatter))
File "C:\Python27\lib\site-packages\bs4\element.py" in decode
  1021. indent_contents, eventual_encoding, formatter)
File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
  1074. formatter))
File "C:\Python27\lib\site-packages\bs4\element.py" in decode
  983. for key, val in sorted(self.attrs.items()):

Exception Type: AttributeError at /"nieuws"/article/3-test/
Exception Value: 'list' object has no attribute 'items'
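The last frame of the traceback calls `self.attrs.items()`, which suggests that this bs4 version expects `tag.attrs` to be a dict, while the filter assigns it a list of tuples. A minimal sketch of one possible fix, assuming that reading is right, is to build a dict instead. The helper name `filter_attrs` is made up for this example, and the regex is copied from the function in the question.

```python
import re

# Same pattern as js_regex in html_filter
JS_REGEX = re.compile(r'[\s]*(.{1,7})?'.join(list('javascript')))


def filter_attrs(attrs, allowed):
    """Return a dict holding only the whitelisted attributes.

    filter_attrs is a hypothetical helper; attrs is assumed to be a dict
    mapping attribute names to string values.
    """
    return dict(
        (attr, JS_REGEX.sub('', val))
        for attr, val in attrs.items()
        if attr in allowed
    )


cleaned = filter_attrs(
    {'href': 'http://example.com/', 'onclick': 'x()', 'title': 'Hi'},
    ['href', 'title'],
)
print(sorted(cleaned.items()))
# [('href', 'http://example.com/'), ('title', 'Hi')]

# Inside html_filter, the else branch would then read (untested assumption):
#     tag.attrs = filter_attrs(tag.attrs, allowed_tags[tag.name])
```

Because the result is a dict, the later `sorted(self.attrs.items())` call during rendering has an `items` method to call.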
