Python byte encoding: detecting the byte encoding of a string

weixin_39836536

Tags: python, byte encoding

I have about 1000 filenames read by os.listdir(). Some of them are encoded as 'utf-8' and some as 'cp1252'.

I want to decode all of them to Unicode for further processing in my script. Is there a way to detect the source encoding so that each name is decoded correctly?

Example:

    import os

    # Python 2: os.listdir() returns byte strings (str) for a byte-string path
    for item in os.listdir(rootPath):
        # Convert to Unicode
        if isinstance(item, str):
            item = item.decode('cp1252')  # or item = item.decode('utf-8')
        print item

Solution

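If every filename is either cp1252 or UTF-8, there is an easy way. Try UTF-8 first, since it rejects most byte sequences that are not valid UTF-8, and fall back to cp1252 only when that fails: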

    import logging
    import os

    def force_decode(string, codecs=('utf8', 'cp1252')):
        # Return the text from the first codec that decodes without error.
        for i in codecs:
            try:
                return string.decode(i)
            except UnicodeDecodeError:
                pass
        # No codec matched: log the raw bytes and return None.
        logging.warning("cannot decode filename %r", string)
        return None

    for item in os.listdir(rootPath):
        # Convert to Unicode
        if isinstance(item, str):
            item = force_decode(item)
        print item
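The snippet above relies on Python 2, where os.listdir() returns byte strings. On Python 3 the same idea could be sketched roughly as follows, assuming the directory is listed with a bytes path so that the names come back undecoded. The names `CANDIDATE_CODECS` and `decode_listing` are illustrative and not part of the original answer.

```python
import logging
import os

# Order matters: UTF-8 is strict and rejects most cp1252 byte sequences,
# while cp1252 accepts almost any byte, so it has to come last.
CANDIDATE_CODECS = ("utf-8", "cp1252")


def force_decode(raw, codecs=CANDIDATE_CODECS):
    """Return raw decoded with the first codec that accepts it, else None."""
    for codec in codecs:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    logging.warning("cannot decode filename %r", raw)
    return None


def decode_listing(root_path):
    """Yield decoded names found in root_path (hypothetical helper)."""
    # Passing a bytes path makes os.listdir() return the names as raw bytes.
    for raw_name in os.listdir(os.fsencode(root_path)):
        name = force_decode(raw_name)
        if name is not None:
            yield name
```

With this sketch, `force_decode(b'caf\xc3\xa9')` and `force_decode(b'caf\xe9')` both give 'café'. A cp1252 name whose bytes happen to form valid UTF-8 would still be decoded as UTF-8, which is uncommon in practice but possible.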

Otherwise, use a charset detection library such as chardet to guess the encoding.
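As a rough sketch of the library route, the following assumes the third-party chardet package is installed and uses its `chardet.detect()` function, which returns a dict with a guessed `encoding` and a `confidence`. The helper name `detect_and_decode`, the `fallback` codec, and the `min_confidence` threshold are assumptions made for illustration. Detection on short inputs such as filenames is often unreliable, so the result is best treated as a guess.

```python
def detect_and_decode(raw, fallback="cp1252", min_confidence=0.5):
    """Decode raw bytes using chardet's guess, or fallback when unsure."""
    try:
        import chardet  # third-party; optional in this sketch
    except ImportError:
        chardet = None

    encoding = None
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        if guess["encoding"] and guess["confidence"] >= min_confidence:
            encoding = guess["encoding"]

    try:
        # errors="replace" keeps a wrong guess from raising on bad bytes.
        return raw.decode(encoding or fallback, errors="replace")
    except LookupError:
        # chardet named a codec this Python build does not know.
        return raw.decode(fallback, errors="replace")
```

The threshold and the fallback codec are arbitrary choices here. For the cp1252 and UTF-8 mix described in the question, the try-in-order approach is usually more predictable.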
