[python][Original] 3 ways to convert &# encodings into Chinese

FL1623863129

Last modified 2023-06-14 16:01:21

Tags: python, mathematical modeling, development language

First published 2023-06-14 14:25:05

Article link: https://blog.csdn.net/FL1623863129/article/details/131207133

Method 1: the html module (strongly recommended)

import html
print(html.unescape('&#20013;&#22269;'))
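
As a quick illustration of why the html module is the convenient choice, the sketch below passes decimal, hexadecimal, and named references to html.unescape. The sample strings are made up for this example.

```python
import html

samples = [
    '&#20013;&#22269;',    # decimal references
    '&#x4e2d;&#x56fd;',    # hexadecimal references
    'A &amp; B &lt; C',    # named references mixed with plain text
]

# html.unescape handles all three forms in one call.
decoded = [html.unescape(s) for s in samples]
for original, result in zip(samples, decoded):
    print(original, '->', result)
```

The first two samples should decode to the same two characters, since they spell the same code points in different bases.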

Method 2: a custom function

# Method 2
import re


def convert_unicode(text):
    # Replace each decimal reference such as &#20013; with its character.
    # chr() also handles code points above U+FFFF, and any text that is
    # not a reference is left untouched.
    return re.sub(r'&#(\d+);', lambda m: chr(int(m.group(1))), text)


a = '&#32541;&#22836;&#21360;'
print(convert_unicode(a))
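
The custom function above only understands decimal references. A possible variant that also accepts the hexadecimal form (&#x...;) is sketched below; the name decode_numeric_refs and the regular expression are assumptions made for this example, not part of the original post. It does not handle named references such as &amp;amp;, and chr() raises ValueError for numbers outside the Unicode range.

```python
import html
import re

# Matches &#NNN; (decimal) or &#xHHH; (hexadecimal).
_NUMERIC_REF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they name."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)
    return _NUMERIC_REF.sub(_replace, text)


print(decode_numeric_refs('&#32541;&#22836;&#21360;'))
print(decode_numeric_refs('&#x3010;&#x8BD5;&#x547C;&#x3011;'))
```

For ordinary input the result should agree with html.unescape, which is a handy way to cross-check a hand-written decoder.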

Method 3: the HTMLParser module

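# Method 3 (Python 2 only: in Python 3 the module is html.parser, and
# HTMLParser.unescape() was removed in Python 3.9)
import HTMLParser
s = '&#x3010;&#x8BD5;&#x547C;&#x3011;'
h = HTMLParser.HTMLParser()
print(h.unescape(s))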

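With the third method, however, running pip install HTMLParser is not enough: the code still raises an error when run. For a fix, see this blog post: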

A general method for handling Unicode-encoded strings that start with &# in python3 (zhaojiafu666's blog, CSDN)

Because it takes this extra effort, I do not recommend the third method.
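
If the same script has to run on both Python 2 and Python 3, one way to sidestep the import problem is to prefer html.unescape and fall back to the old HTMLParser class only when the html module is missing. This is a minimal sketch under that assumption; the local alias unescape in the fallback branch is a name chosen for this example.

```python
try:
    # Python 3.4 and later provide unescape() in the standard html module.
    from html import unescape
except ImportError:
    # Python 2 fallback: the method lives on an HTMLParser instance.
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

s = '&#x3010;&#x8BD5;&#x547C;&#x3011;'
print(unescape(s))
```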

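Finally, some people need to go the other way and convert Chinese into the &# format. I wrote a function for that here; give it a try.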

a = '&#32541;&#22836;&#21360;'
b = '缝头印'


def chinese2html(chars):
    # ord() returns each character's code point directly, so this also
    # works for characters outside the Basic Multilingual Plane.
    # Note that every character is converted, including ASCII ones.
    return ''.join('&#' + str(ord(ch)) + ';' for ch in chars)


res = chinese2html(b)
print(res)  # same string as a
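
For comparison, the standard library can produce the same decimal references without a hand-written loop, through the 'xmlcharrefreplace' error handler of str.encode. The sketch below is only an illustration: the helper name to_char_refs is made up for this example. Unlike a per-character conversion, it leaves plain ASCII text as it is.

```python
import html


def to_char_refs(text):
    # The 'xmlcharrefreplace' error handler turns every character that
    # ASCII cannot represent into a decimal &#NNN; reference.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


encoded = to_char_refs('缝头印')
print(encoded)
# Decoding the result with html.unescape gives back the original text.
print(html.unescape(encoded))
```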
