Several encoding errors when reading and writing files in Python

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u010420283/article/details/84874455

1. UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 55: illegal multibyte sequence

Traceback (most recent call last):
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 151, in <module>
    
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 11, in originHandle
    with open('./renmin.txt','r') as inp,open('./renmin2.txt','w') as outp:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 55: illegal multibyte sequence

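Solution: add encoding='utf-8'. When open() is called without an encoding argument it falls back to the platform default, which is GBK in this traceback, while the file itself is apparently UTF-8. Passing the same argument for the output file keeps both sides consistent.

with open('./renmin.txt', 'r', encoding='utf-8') as inp, open('./renmin2.txt', 'w', encoding='utf-8') as outp: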
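To see why the explicit argument matters, here is a minimal sketch that writes a small UTF-8 file and reads it back with two different codecs. The file name `demo_utf8.txt` and the sample text are made up for illustration. Depending on the bytes involved, the wrong codec may raise UnicodeDecodeError or silently return garbled text, so the sketch handles both cases.

```python
import os
import tempfile

text = "人民日报\n"  # sample content, made up for this demo

# Write the sample as UTF-8 into a temporary file.
path = os.path.join(tempfile.mkdtemp(), "demo_utf8.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with the wrong codec either raises or yields garbled text.
try:
    with open(path, "r", encoding="gbk") as f:
        wrong = f.read()
except UnicodeDecodeError as exc:
    wrong = None
    print("gbk failed:", exc)

# Reading with the codec the file was written in round-trips cleanly.
with open(path, "r", encoding="utf-8") as f:
    right = f.read()

print(right == text)
```

Naming the encoding on every open() call makes the script behave the same way regardless of the machine's locale.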

2. AttributeError: 'str' object has no attribute 'decode'

Traceback (most recent call last):
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 154, in <module>
    sentence2split()
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 61, in sentence2split
    sentences = re.split('[,。!?、‘’“”:]/[O]'.decode('utf-8'), texts)
AttributeError: 'str' object has no attribute 'decode'

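Solution: this is a Python 2 versus Python 3 difference. In Python 3 a str is already Unicode text and has no decode method, so when texts is a str the simplest fix is to drop the .decode('utf-8') call. Changing 'decode' to 'encode' only works when texts is bytes: it turns the pattern into bytes, everything returned by re.split is then bytes as well (which is what leads to errors 3 and 4 below), and a character class made of multi-byte characters matches single bytes rather than whole characters.

sentences = re.split('[,。!?、‘’“”:]/[O]', texts)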
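The rule behind this error is that the pattern and the data passed to `re.split` must be of the same type. The following is a minimal sketch of that rule; the sample string and the shortened pattern are hypothetical stand-ins for the real corpus and character class.

```python
import re

# Hypothetical sample in the "char/tag" format that the pattern suggests.
texts = "今/O 天/O 。/O 明/O 天/O !/O"
pattern = "[。!?]/O"  # shortened version of the original character class

# Python 3: a str pattern works directly on str data, no decode() needed.
parts = re.split(pattern, texts)

# Mixing types is rejected: a bytes pattern cannot be applied to str data.
try:
    re.split(pattern.encode("utf-8"), texts)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# A bytes pattern is only valid when the data is bytes too,
# and then every piece that comes back is bytes as well.
bytes_parts = re.split("/O".encode("utf-8"), texts.encode("utf-8"))

print(len(parts), mixed_ok, type(bytes_parts[0]).__name__)
```

Staying in str throughout is usually the simpler choice, because the rest of the script can then write the results to a text-mode file without further conversion.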

3. TypeError: can't concat bytes to str

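Solution: encode returns bytes, and Python 3 will not concatenate bytes with str, so both operands have to be the same type. Prefixing '\n' with b makes the concatenation work, but the result is still bytes, and write() on a file opened in text mode ('w') only accepts str. Either open the output file in binary mode ('wb') and keep the b'\n', or decode to str before writing:

outp.write(sentence.strip().decode('utf-8') + '\n')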
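The two consistent ways of writing the line can be sketched as follows. To keep the example self-contained it uses in-memory streams: `io.BytesIO` stands in for a file opened with 'wb' and `io.StringIO` for a file opened with 'w'. The sample sentence is made up.

```python
import io

sentence = "今天 ".encode("utf-8")  # bytes, as produced by encode()

# bytes + str fails in Python 3; there is no implicit conversion.
try:
    sentence.strip() + "\n"
    concat_ok = True
except TypeError:
    concat_ok = False

# Option A: stay in bytes and write to a binary stream.
binary_out = io.BytesIO()
binary_out.write(sentence.strip() + b"\n")

# Option B: decode to str and write to a text stream.
text_out = io.StringIO()
text_out.write(sentence.strip().decode("utf-8") + "\n")

print(concat_ok, binary_out.getvalue() == text_out.getvalue().encode("utf-8"))
```

Both options produce the same bytes on disk when the text file is written as UTF-8; the point is only that the data type has to match the mode the file was opened in.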

4. Can't convert 'bytes' object to str implicitly

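This error also comes from mixing bytes and str: Python 3 never converts between them implicitly, so the bytes have to be decoded explicitly. Some byte sequences may not be valid UTF-8 and cannot be decoded; you can choose to skip them. Solution: pass 'ignore' as the second argument to decode. Note that the skipped bytes are dropped silently.

text = sentence.decode('utf-8', 'ignore')  # decode once and reuse the result
if text != " ":
    outp.write(text.strip() + '\n')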
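What the second argument of decode changes can be sketched with a small made-up byte string that contains one byte (0xFF) that is never valid in UTF-8. The sketch compares the default strict behaviour with the 'ignore' and 'replace' error handlers.

```python
raw = "中文".encode("utf-8") + b"\xff"  # valid UTF-8 followed by one invalid byte

# Strict decoding (the default) refuses the invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

ignored = raw.decode("utf-8", "ignore")    # invalid byte is silently dropped
replaced = raw.decode("utf-8", "replace")  # invalid byte becomes U+FFFD

print(strict_ok, len(ignored), len(replaced))
```

'ignore' is convenient for cleaning a corpus, but it hides data loss; 'replace' leaves a visible marker where something could not be decoded, which may be easier to audit afterwards.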

5. ImportError: No module named 'compiler.ast'

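The statement "from compiler.ast import flatten" no longer works in Python 3, because the compiler package was removed, so the import fails. Solution: write a small replacement that does what the function did.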

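from collections.abc import Iterable

def flatten(x):
    """Recursively flatten nested iterables into one flat list."""
    result = []
    for el in x:
        # Test the element, not the container, and treat strings as atoms;
        # otherwise each string would be split into characters forever.
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

print(flatten(["junk", ["nested stuff"], [], [[]]]))  # ['junk', 'nested stuff']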

 
