unicodedata.normalize("NFKD", unicode_str)
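import unicodedata
from bs4 import BeautifulSoup

# raw_html is the HTML source string to clean
text_string = BeautifulSoup(raw_html, "lxml").text
# NFKD maps the non-breaking space (\xa0) to a regular space
clean_text = unicodedata.normalize("NFKD", text_string)
print(clean_text)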
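
To see the effect without BeautifulSoup, here is a minimal sketch on a made-up sample string (the string and the variable `replaced_text` are illustrative assumptions, not from the original post). It also shows a narrower alternative, since NFKD rewrites other compatibility characters too (for example, ligatures and full-width letters), which may be more than you want:

```python
import unicodedata

# Hypothetical sample: "\xa0" is U+00A0, NO-BREAK SPACE
text_string = "Dear\xa0Parent,\xa0hello"

# NFKD compatibility decomposition turns U+00A0 into a plain space (U+0020)
clean_text = unicodedata.normalize("NFKD", text_string)

# Narrower alternative: replace only the non-breaking space, leave the rest alone
replaced_text = text_string.replace("\xa0", " ")

print(repr(clean_text))
print(repr(replaced_text))
```

If the text should otherwise stay byte-for-byte as it was, the `str.replace` form is probably the safer choice.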
REF: https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python
Python error: 'gbk' codec can't encode character '\xa0' in position 389: illegal multibyte sequence
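
The error above can be reproduced in a few lines. This is a sketch under the assumption that the failure comes from encoding a string containing `\xa0` to GBK (as happens, for instance, when printing to a console whose encoding is gbk); the sample string and the names `failed` and `encoded` are made up for illustration:

```python
import unicodedata

# Hypothetical sample containing a non-breaking space
text = "price:\xa0100"

# GBK has no mapping for U+00A0, so encoding the raw string fails
try:
    text.encode("gbk")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# After NFKD normalization the \xa0 is a plain space, which GBK can encode
encoded = unicodedata.normalize("NFKD", text).encode("gbk")
print(encoded)
```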