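When processing text with Python, I kept running into all sorts of odd encodings, so I decided to simply convert everything to UTF-8.
I defined a function, conv2utf8, that returns True when the conversion succeeds and signals a UnicodeError when the encoding is not recognized.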
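def conv2utf8(file):
    # Candidate encodings, tried in order when the file is not valid UTF-8.
    # 'utf-16' comes last because it decodes most even-length byte strings
    # without error and would otherwise shadow the legacy codecs.
    ENCODE = ('utf-32', 'gbk', 'big5', 'big5hkscs', 'cp950', 'gb2312', 'hz', 'utf-16')
    try:
        # Try to open the file as UTF-8; if it decodes cleanly, return True.
        with open(file, encoding='utf-8') as f:
            f.read()
        return True
    except UnicodeDecodeError:
        # The file is not encoded as UTF-8: try each codec in ENCODE and
        # rewrite the file as UTF-8 with the first one that decodes it.
        for i in ENCODE:
            try:
                with open(file, encoding=i) as f:
                    w = f.read()
                with open(file, 'w', encoding='utf-8') as nf:
                    nf.write(w)
                return True
            except UnicodeError:
                continue
        # None of the candidates could decode the file.
        raise UnicodeError('unknown encoding: ' + str(file))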
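
The fallback idea can be checked on a throwaway file. The sketch below is not the function above but a compact stand-in: `transcode_to_utf8`, `demo`, and the shortened `CANDIDATES` tuple are hypothetical names introduced here for illustration only. It reads the raw bytes once, tries each codec in order, and rewrites the file only after a decode has succeeded.

```python
import os
import tempfile

# Hypothetical, shortened candidate list (an assumption for this sketch).
# Order matters: the first codec that decodes without error wins.
CANDIDATES = ('utf-8', 'gbk', 'big5')


def transcode_to_utf8(path, candidates=CANDIDATES):
    """Rewrite path as UTF-8 and return the codec that decoded it."""
    with open(path, 'rb') as f:
        raw = f.read()
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # not this codec, try the next one
        if name != 'utf-8':
            # newline='' keeps the original line endings untouched.
            with open(path, 'w', encoding='utf-8', newline='') as f:
                f.write(text)
        return name
    raise UnicodeError('unknown encoding: ' + str(path))


def demo():
    """Write a GBK file, convert it, and return (codec, text after conversion)."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', encoding='gbk') as f:
            f.write('中文文本')
        codec = transcode_to_utf8(path)
        with open(path, encoding='utf-8') as f:
            return codec, f.read()
    finally:
        os.remove(path)
```

One caveat with any approach of this kind: a successful decode does not prove the guess was right, since the same bytes can be valid under several legacy codecs. It is worth keeping a backup of the original file before converting in place.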