Python whitelist validation: implementing XSS whitelist filtering in Python

weixin_37988176

Published 2020-11-01 12:58:37


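Many parts of a web application need to accept rich text from users while making sure that this input cannot trigger an XSS vulnerability. The most common technique for this is whitelisting.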

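The usual approach is to build a list of allowed tags and their allowed attributes, parse the HTML the user submitted, and look up every parsed tag and attribute in the whitelist. Anything that matches is kept; anything that does not is removed. A whitelist always has the same three-level structure: allowed tag -> allowed attribute -> allowed attribute value.

Take the img tag as an example. We want to allow images in blog posts, so the img tag is allowed. img has many attributes, such as src, alt, onerror and onload. src is required, and alt cannot execute any JavaScript, but onerror and onload can, so only src and alt are kept. The src attribute in turn accepts many kinds of values: links starting with http, bookmarklets starting with javascript:, and data URLs starting with data:. Both javascript: and data: can lead to XSS in some browsers, so these two kinds of values must be rejected and only image links starting with http:// are allowed, which means the attribute value has to be matched as well. A tag can only be considered safe once it has passed all three levels, and none of the three levels may be left out when designing the whitelist.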
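The three-level lookup described above can be sketched as a single check on one (tag, attribute, value) triple. This is only a minimal illustration: the names `WHITELIST` and `is_allowed` and the example.com URLs are assumptions made for this sketch, not part of the original code.

```python
import re

# Hypothetical whitelist: tag -> allowed attribute -> pattern the value must match
WHITELIST = {
    "img": {"src": r"^http://", "alt": r".*"},
}

def is_allowed(tag, attr, value):
    """Return True only if the tag, the attribute and the value all pass."""
    attrs = WHITELIST.get(tag)
    if attrs is None:        # level 1: the tag itself is not allowed
        return False
    pattern = attrs.get(attr)
    if pattern is None:      # level 2: the attribute is not allowed on this tag
        return False
    # level 3: the attribute value must match the allowed pattern
    return re.search(pattern, value) is not None

print(is_allowed("img", "src", "http://example.com/a.png"))  # True
print(is_allowed("img", "onerror", "alert(1)"))              # False
print(is_allowed("img", "src", "javascript:alert(1)"))       # False
print(is_allowed("script", "src", "http://example.com/x.js"))  # False
```

Each early return corresponds to one level of the whitelist, so skipping any of the three checks would let one of the rejected cases through.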

The implementation is as follows:

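    import re

    from bs4 import BeautifulSoup

    # Cache of compiled patterns, keyed by the pattern string
    regex_cache = {}

    def search(text, regex):
        regexcmp = regex_cache.get(regex)
        if not regexcmp:
            regexcmp = re.compile(regex)
            regex_cache[regex] = regexcmp
        return regexcmp.search(text)

    # XSS whitelist: tag -> allowed attribute -> regex the value must match
    VALID_TAGS = {'h1': {}, 'h2': {}, 'h3': {}, 'h4': {}, 'strong': {}, 'em': {},
                  'p': {}, 'ul': {}, 'li': {}, 'br': {},
                  'a': {'href': '^http://', 'title': '.*'},
                  'img': {'src': '^http://', 'alt': '.*'}}

    def parsehtml(html):
        soup = BeautifulSoup(html, 'html.parser')
        for tag in soup.find_all(True):
            if tag.name not in VALID_TAGS:
                # Not whitelisted: drop the tag's own markup but keep its children
                tag.hidden = True
            else:
                attr_rules = VALID_TAGS[tag.name]
                # Iterate over a copy, because attributes are deleted inside the loop
                for attr_name, attr_value in list(tag.attrs.items()):
                    # Check the attribute name
                    if attr_name not in attr_rules:
                        del tag[attr_name]
                        continue
                    # Check the attribute value format
                    if not search(attr_value, attr_rules[attr_name]):
                        del tag[attr_name]
        return soup.decode()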
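For readers who do not have BeautifulSoup installed, the same three-level filtering can be approximated with the standard library's `html.parser`. The following is a rough sketch under stated assumptions rather than a drop-in replacement: `ALLOWED`, `VOID_TAGS`, `WhitelistFilter` and `sanitize` are names invented here, the whitelist is a shortened copy of the one above, and the sample input is hypothetical. Like the code above, it drops the markup of tags that are not whitelisted but keeps their text content, which is emitted as escaped text.

```python
import re
from html import escape
from html.parser import HTMLParser

# Same three-level shape as the whitelist above (shortened for the sketch)
ALLOWED = {
    "p": {}, "strong": {}, "em": {}, "br": {},
    "a": {"href": r"^http://", "title": r".*"},
    "img": {"src": r"^http://", "alt": r".*"},
}
VOID_TAGS = {"br", "img"}  # tags that never get a closing tag

class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        rules = ALLOWED.get(tag)
        if rules is None:
            return  # tag not whitelisted: emit nothing for it
        kept = []
        for name, value in attrs:
            value = value or ""  # attributes without a value arrive as None
            pattern = rules.get(name)
            if pattern is not None and re.search(pattern, value):
                kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-escaped so it cannot reintroduce markup
        self.out.append(escape(data, quote=False))

def sanitize(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

sample = ('<p onclick="alert(1)">Hello!</p>'
          '<img src="http://example.com/a.png" alt="ok" onerror="alert(1)">'
          '<a href="javascript:alert(1)" title="t">Hello</a>'
          '<script>alert(1);</script>')
print(sanitize(sample))
```

On this sample the event-handler attributes and the javascript: link are removed, the http:// image survives with only src and alt, and the script element is reduced to its inert text.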

Let's test it with a piece of HTML:

    if __name__ == '__main__':
        text = '''
    Hello!
    I am a normal image
    Hello
    alert(1);' title='sddasdsadsd'/>
    '''
        print(parsehtml(text))

Filtered result:

    Hello!
    "I am a normal image
    Hello
    alert(1);' title='sddasdsadsd'/>

The result looks pretty good.
