Python error: 'gbk' codec can't encode character '\xa0' in position 389: illegal multibyte sequence


creepyzzz

Category: Python. Tags: python, unicode

Original link: https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python#:~:text=%5B%26xa0%26%5D%20is%20actually%20non-breaking%20space%20in%20Latin1%20%28ISO,could%20be%20represented%20by%201%20to%204%20bytes.

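Fix: '\xa0' is the non-breaking space (U+00A0), which the gbk codec cannot encode. Normalizing the string with NFKD maps it to an ordinary space:

unicodedata.normalize("NFKD", unicode_str)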



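import unicodedata
from bs4 import BeautifulSoup

# raw_html holds the HTML source obtained earlier
text_string = BeautifulSoup(raw_html, "lxml").text
# Normalize so that '\xa0' no longer reaches the gbk encoder
clean_text = unicodedata.normalize("NFKD", text_string)
print(clean_text)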
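
To check the fix without BeautifulSoup, here is a minimal self-contained sketch. The sample string and the variable names are made up for illustration; only unicodedata.normalize and str.encode come from the standard library. It reproduces the encode failure on the raw string and shows that the normalized string encodes cleanly.

```python
import unicodedata

# Hypothetical sample text containing a non-breaking space (U+00A0)
sample = "price:\xa0100"

# Encoding the raw text as GBK raises UnicodeEncodeError
try:
    sample.encode("gbk")
    raw_ok = True
except UnicodeEncodeError:
    raw_ok = False

# NFKD replaces U+00A0 with an ordinary space (U+0020)
clean = unicodedata.normalize("NFKD", sample)
encoded = clean.encode("gbk")

# Targeted alternative: replace only the non-breaking space
replaced = sample.replace("\xa0", " ")
```

Note that NFKD rewrites other compatibility characters as well (a fullwidth comma, for example, becomes an ASCII comma), so when only '\xa0' is the problem, the plain replace() call may be the safer choice.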

REF: https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python#:~:text=%5B%26xa0%26%5D%20is%20actually%20non-breaking%20space%20in%20Latin1%20%28ISO,could%20be%20represented%20by%201%20to%204%20bytes.
