Handling HTML Entities and Printing Special Characters in Python

Latest recommended article published 2022-04-28 16:33:07

chuqiang9438

Tags: python java

Original article: https://my.oschina.net/u/2411067/blog/868179

Handling HTML entities in Python 3:

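import html  # html.unescape exists since Python 3.4; HTMLParser.unescape was removed in Python 3.9
text = html.unescape('Orange Blossom Body Cr&#232;me/5.9 oz.')
print(text)  # Orange Blossom Body Crème/5.9 oz.
text = html.unescape('&copy; 2010')
print(text)  # © 2010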
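
As a further sketch (the sample strings are made up for illustration), html.unescape decodes named, decimal and hexadecimal character references alike, and html.escape from the same standard-library module goes the other way:

```python
import html

# Named, decimal and hexadecimal references for the same letter (U+00E8, "è").
samples = ["Cr&egrave;me", "Cr&#232;me", "Cr&#xE8;me"]
decoded = [html.unescape(s) for s in samples]
assert decoded == ["Cr\u00e8me"] * 3  # all three become "Crème"

# html.escape turns &, <, > and (by default) quotes into entities.
markup = '<a href="x">R&D</a>'
escaped = html.escape(markup)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;R&amp;D&lt;/a&gt;

# Round trip: unescaping the escaped text restores the original string.
assert html.unescape(escaped) == markup
```

Note that html.escape also converts a single quote to &#x27; unless quote=False is passed.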

In Python 2:

import HTMLParser
html_cont = " asdfg>123<  &#62;"
html_parser = HTMLParser.HTMLParser()
new_cont = html_parser.unescape(html_cont)  # undocumented helper method in Python 2
print new_cont  # prints " asdfg>123<  >" because &#62; decodes to ">"
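
Code that has to run on both versions can pick whichever function exists. A minimal sketch, assuming only the two standard-library locations used above:

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:  # Python 2: fall back to the undocumented HTMLParser method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

print(unescape(" asdfg>123<  &#62;"))  # prints " asdfg>123<  >"
```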

HTML special-character escape table: http://www.cnblogs.com/lf6112/p/4952001.html
Reference: http://fredericiana.com/2010/10/08/decoding-html-entities-to-text-in-python/

In Java, you can use:

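org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(String input)  // returns the unescaped String
(StringEscapeUtils in commons-lang3 is deprecated since version 3.6; Apache Commons Text provides the same method as org.apache.commons.text.StringEscapeUtils.unescapeHtml4.)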

Printing special characters:

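# Method 1: reopen file descriptor 1 (stdout) as a UTF-8 text stream
import sys
sys.stdout.flush()  # write out anything still buffered in the original stdout
sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)  # closefd=False keeps fd 1 open when this object is closed
print("vadsэавфыаЭХÜÜÄ")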
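
On Python 3.7 and later, the existing sys.stdout object can also be switched to UTF-8 in place with TextIOWrapper.reconfigure, which avoids opening a second stream on file descriptor 1. In this sketch use_utf8_stdout is a hypothetical helper name; the guard covers environments where sys.stdout does not provide reconfigure:

```python
import sys


def use_utf8_stdout() -> bool:
    """Hypothetical helper: switch sys.stdout to UTF-8 in place (Python 3.7+).

    Returns False when sys.stdout has no reconfigure() method
    (older Python, or stdout replaced by some other stream object).
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")  # flushes pending output, then encodes as UTF-8
    return True


if use_utf8_stdout():
    print("vadsэавфыаЭХÜÜÄ")
```

Setting the environment variable PYTHONIOENCODING=utf-8 before starting the interpreter has a similar effect without any code change.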

# Method 2: print the UTF-8 bytes; this shows the bytes repr (b'\xd0\xb0...'), not the characters
print(bytes("аЭХÜ", "utf-8"))
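
The bytes printed by method 2 can be turned back into text with bytes.decode, and the built-in ascii() gives an escaped representation that is safe to print on any console. A small sketch (the string is written with escapes so the exact code points are unambiguous):

```python
text = "\u0430\u042d\u0425\u00dc"  # "аЭХÜ": Cyrillic а, Э, Х and Latin Ü
data = bytes(text, "utf-8")        # equivalent to text.encode("utf-8")
print(data)                        # b'\xd0\xb0\xd0\xad\xd0\xa5\xc3\x9c'

# Decoding with the same codec restores the original str.
assert data.decode("utf-8") == text

# ascii() escapes every non-ASCII character, so printing it never raises.
print(ascii(text))                 # '\u0430\u042d\u0425\xdc'
```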

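# Method 3: encode to UTF-8 yourself and write the raw bytes to the binary buffer
import sys
TestText = "Test - āĀēĒčČ..šŠūŪžŽ"  # NOT UTF-8: in Python 3.x this is a Unicode str
TestText2 = TestText.encode('utf8')  # THIS is "just bytes", encoded as UTF-8
print(TestText2)  # prints the bytes repr: b'Test - \xc4\x81...'
sys.stdout.flush()  # flush the text layer first so the output stays in order
sys.stdout.buffer.write(TestText2 + b'\n')  # the terminal must decode UTF-8 to show the characters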

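# Method 4: open a separate UTF-8 text stream on fd 1 and pass it to print() via file=
utf8stdout = open(1, 'w', encoding='utf-8', closefd=False)  # fd 1 is stdout
print('Test - āĀēĒčČ..šŠūŪžŽ33', file=utf8stdout)
utf8stdout.flush()  # this stream has its own buffer; flush it explicitly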
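
When changing the stream itself is not desirable, another option is to encode with the stream's own encoding and an error handler such as backslashreplace, so characters it cannot represent become escapes instead of raising UnicodeEncodeError. safe_print is a hypothetical helper name, and falling back to UTF-8 when a stream reports no encoding is an assumption of this sketch:

```python
import io
import sys


def safe_print(text, stream=None):
    """Hypothetical helper: print text, escaping characters the stream can't encode."""
    stream = sys.stdout if stream is None else stream
    # Assumption: a stream that reports no encoding (e.g. io.StringIO) accepts any str.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(safe, file=stream)


# Usage: an ASCII-only text stream, like a console that cannot show these letters.
raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
safe_print("Test - \u0101\u0100", ascii_out)  # "Test - āĀ"
ascii_out.flush()
print(raw.getvalue())  # b'Test - \\u0101\\u0100\n'
```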

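# Method 5: print the UTF-8 bytes repr, or drop characters the console encoding (here GBK) cannot represent
print('Test - āĀēĒčČ..šŠūŪžŽ33'.encode('utf8'))
text = '中文524μg/m³'.encode('gbk', 'ignore').decode('gbk')  # 'ignore' silently discards unencodable characters
print(text)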
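
The 'ignore' handler in method 5 silently drops characters. The other standard codec error handlers keep a visible trace, and 'xmlcharrefreplace' produces exactly the numeric HTML entities from the first part of this article, so html.unescape can restore the text. A sketch comparing them on a made-up sample string, encoding to ASCII for illustration:

```python
import html

text = "Cr\u00e8me \u00a9 2010"  # "Crème © 2010"
results = {}
for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    # Encode to ASCII; each handler deals differently with è and ©.
    results[handler] = text.encode("ascii", handler)
    print(handler, results[handler])

# Prints:
# ignore b'Crme  2010'
# replace b'Cr?me ? 2010'
# backslashreplace b'Cr\\xe8me \\xa9 2010'
# xmlcharrefreplace b'Cr&#232;me &#169; 2010'

# The entity form decodes back to the original string.
assert html.unescape(results["xmlcharrefreplace"].decode("ascii")) == text
```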

