Building web pages and HTML with Python: converting between Chinese text and HTML entities in Python

Latest recommended article published 2024-03-10 10:17:22

weixin_39948824


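Some web pages encode Chinese text as HTML entities, so a crawler has to convert those entities back into Chinese. This article shows how to convert in both directions with Python.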

HTML entities

python &#20013;&#25991;&#21644;html &#23454;&#20307;&#30456;&#20114;&#36716;&#25442;
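Each reference in the line above, such as `&#20013;`, is a decimal numeric character reference: the number is the Unicode code point of the character it stands for. A minimal Python 3 sketch of that correspondence follows; the helper name `to_entity` is hypothetical and exists only for illustration.

```python
def to_entity(ch):
    # Build the decimal numeric character reference for a single character.
    return '&#{};'.format(ord(ch))

print(ord('中'))          # code point of the character: 20013
print(to_entity('中'))    # &#20013;
print(chr(20013))         # back to the character: 中
```

`ord` and `chr` are inverses of each other, which is why the conversion can run in both directions.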

Converting in both directions

The following code converts HTML entities to Chinese text and back:

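# Python 3 version (the original used Python 2's print statement and unichr)
import re

s = 'python &#20013;&#25991;&#21644;html &#23454;&#20307;&#30456;&#20114;&#36716;&#25442;'
# s = 'python 中文和html 实体相互转换'
print(s)

def convert_callback(matches):
    # Replace one numeric character reference with the character it encodes.
    char_id = matches.group(1)
    try:
        return chr(int(char_id))
    except (ValueError, OverflowError):
        # Not a valid code point: leave the matched text unchanged.
        return matches.group(0)

# Match decimal references such as &#20013; (the ';' may be missing before whitespace).
s2 = re.sub(r"&#(\d+)(;|(?=\s))", convert_callback, s)
print(s2)

# Turn non-ASCII characters back into numeric character references.
print(s2.encode('ascii', 'xmlcharrefreplace').decode('ascii'))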

Output

python &#20013;&#25991;&#21644;html &#23454;&#20307;&#30456;&#20114;&#36716;&#25442;

python 中文和html 实体相互转换

python &#20013;&#25991;&#21644;html &#23454;&#20307;&#30456;&#20114;&#36716;&#25442;
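For comparison, Python 3's standard library can do the decoding step without a hand-written regular expression: `html.unescape` handles decimal, hexadecimal, and named references. The sketch below is an alternative to the regex approach in this article, not the article's own method, and the function names `entities_to_text` and `text_to_entities` are hypothetical.

```python
import html

def entities_to_text(s):
    # Decode decimal (&#20013;), hexadecimal (&#x4e2d;) and named (&amp;) references.
    return html.unescape(s)

def text_to_entities(s):
    # Replace every non-ASCII character with a decimal reference; ASCII stays as-is.
    return s.encode('ascii', 'xmlcharrefreplace').decode('ascii')

encoded = 'python &#20013;&#25991;&#21644;html &#23454;&#20307;'
decoded = entities_to_text(encoded)
print(decoded)                    # python 中文和html 实体
print(text_to_entities(decoded))  # same as the encoded input
```

Note that the `xmlcharrefreplace` error handler only touches characters that ASCII cannot represent, so `&`, `<`, and `>` pass through unchanged. If the result is meant to be embedded in HTML, calling `html.escape` on the text first is one way to cover those characters as well.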
