1. UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 55: illegal multibyte sequence
Traceback (most recent call last):
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 151, in <module>
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 11, in originHandle
    with open('./renmin.txt','r') as inp,open('./renmin2.txt','w') as outp:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 55: illegal multibyte sequence
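Solution: the file is UTF-8. When no encoding is given, open() uses the locale's default encoding, which is GBK on Simplified Chinese Windows. Pass encoding='utf-8' explicitly. Doing the same for the output file makes sure it is also written as UTF-8 rather than GBK:
with open('./renmin.txt', 'r', encoding='utf-8') as inp, open('./renmin2.txt', 'w', encoding='utf-8') as outp: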
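You can reproduce this outside the project with a short sketch. It writes a small UTF-8 file, a hypothetical stand-in for renmin.txt holding a single character. It then reads the file back twice: once as GBK, which is what open() typically falls back to on Simplified Chinese Windows, and once as UTF-8. read_text is a hypothetical helper, not part of the script.

```python
import os
import tempfile

def read_text(path, encoding):
    """Read a whole file in text mode with an explicit encoding (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()

# Hypothetical stand-in for renmin.txt: one Chinese character, saved as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("中\n")

# Decoding the UTF-8 bytes as GBK fails, just like the traceback above.
try:
    read_text(path, "gbk")
    gbk_error = None
except UnicodeDecodeError as exc:
    gbk_error = exc

# Decoding with the encoding the file was actually written in succeeds.
text = read_text(path, "utf-8")
print(type(gbk_error).__name__, text == "中\n")  # UnicodeDecodeError True
```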
2. AttributeError: 'str' object has no attribute 'decode'
Traceback (most recent call last):
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 154, in <module>
    sentence2split()
  File "E:/ice_experiment_lmh/github_code/ChineseNER-master/data/renMinRiBao/data_renmin_word.py", line 61, in sentence2split
    sentences = re.split('[,。!?、‘’“”:]/[O]'.decode('utf-8'), texts)
AttributeError: 'str' object has no attribute 'decode'
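Solution: this is a Python 2 vs. Python 3 difference. In Python 3 a string literal is already a Unicode str, so it has no decode() method. The workaround used here is to change 'decode' to 'encode':
sentences = re.split('[,。!?、‘’“”:]/[O]'.encode('utf-8'), texts)
This turns the pattern into bytes, so texts must be bytes as well, and the pieces returned by re.split are bytes too. That is where the bytes/str mismatches in items 3 and 4 come from. The more direct Python 3 fix is to drop the call and split the str as-is:
sentences = re.split('[,。!?、‘’“”:]/[O]', texts)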
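The two fixes can be compared on a small, hypothetical sample in char/TAG form (not taken from the real corpus). In a bytes pattern, every UTF-8 byte inside [...] becomes its own class member. The encode() version therefore matches only the last byte of 。 and leaves its first two bytes stuck to the end of each piece. That is one way undecodable fragments can appear.

```python
import re

# Split pattern from the script.
PATTERN = '[,。!?、‘’“”:]/[O]'

# Hypothetical sample text in "char/TAG" form (not from the real corpus).
texts = "我/O 爱/O 北/B_ns 京/E_ns 。/O 天/O 晴/O 。/O "

# Python 3: a str has no decode(), and the pattern works on str data as-is.
print(hasattr(PATTERN, "decode"))  # False
str_parts = re.split(PATTERN, texts)

# encode() workaround: a bytes pattern requires bytes text. Inside [...] each
# UTF-8 byte is a separate class member, so only the last byte of 。 (0x82)
# plus b"/O" is matched, and the first two bytes of 。 stay in the piece.
bytes_parts = re.split(PATTERN.encode("utf-8"), texts.encode("utf-8"))

print(len(str_parts), len(bytes_parts))  # 3 3
print(bytes_parts[0][-2:])  # b'\xe3\x80'
```

Decoding those pieces with 'ignore' happens to recover the same text here. The str version avoids the problem altogether.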
3. TypeError: can't concat bytes to str
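Solution: encode() returns bytes, which cannot be concatenated with a str. Putting b in front of '\n' makes it a bytes literal, so the concatenation works. However, write() on a file opened in text mode requires a str argument, so the result still has to be converted to str (see item 4):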
outp.write(sentence.strip()+b'\n')
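Both rules behind this error and the next one can be checked in isolation:

- bytes and str never concatenate implicitly;
- a file opened in text mode only accepts str (the exact TypeError message differs between Python versions).

The sketch also shows writing to a file opened in binary mode ('wb'), as one possible alternative when the data are already bytes. This is not what the author did, and the output path is a hypothetical temporary file.

```python
import os
import tempfile

line = "天/O 晴/O".encode("utf-8")  # bytes, as returned by encode()

# bytes + str raises TypeError (item 3); bytes + bytes works.
try:
    line + "\n"
    concat_error = None
except TypeError as exc:
    concat_error = exc
joined = line + b"\n"

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical output file

# Text mode ('w') only accepts str, so writing bytes raises TypeError (item 4).
try:
    with open(path, "w", encoding="utf-8") as outp:
        outp.write(joined)
    write_error = None
except TypeError as exc:
    write_error = exc

# Binary mode ('wb') accepts the bytes unchanged.
with open(path, "wb") as outp:
    outp.write(joined)

with open(path, "rb") as inp:
    print(inp.read() == joined)  # True
```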
4. TypeError: Can't convert 'bytes' object to str implicitly
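Some bytes cannot be converted back to text, and you can choose to ignore them. A likely cause: with a bytes pattern, each byte inside [...] is a separate class member, so re.split can cut a multi-byte UTF-8 character in half. Solution: decode each piece with 'ignore' as the errors argument, which silently drops the undecodable bytes, and write the resulting str: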
text = sentence.decode('utf-8', 'ignore')  # decode once; undecodable bytes are dropped
if text != " ":
    outp.write(text.strip() + '\n')
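Errors 2 to 4 all come from converting back and forth between bytes and str. An alternative is to keep everything as str: open both files with encoding='utf-8' and drop the decode()/encode() calls. Then no 'ignore' is needed.

The sketch below assumes that sentence2split reads the whole input, splits it with the pattern above, and writes one sentence per line, as the fragments in this post suggest. in_path and out_path are hypothetical parameters; the original script uses its own fixed file names.

```python
import os
import re
import tempfile

# Split pattern from the script; in Python 3 this str literal is already Unicode.
SPLIT_PATTERN = '[,。!?、‘’“”:]/[O]'

def sentence2split(in_path, out_path):
    """Write one sentence per line, working on str end to end (sketch).

    in_path/out_path are hypothetical parameters; the original script
    hard-codes its own file names.
    """
    with open(in_path, 'r', encoding='utf-8') as inp, \
            open(out_path, 'w', encoding='utf-8') as outp:
        texts = inp.read()
        for sentence in re.split(SPLIT_PATTERN, texts):
            # Same filter as the original: skip pieces that are a single space.
            if sentence != " ":
                outp.write(sentence.strip() + '\n')

# Usage with a hypothetical sample file.
tmp = tempfile.mkdtemp()
src, dst = os.path.join(tmp, "in.txt"), os.path.join(tmp, "out.txt")
with open(src, 'w', encoding='utf-8') as f:
    f.write("我/O 爱/O 北/B_ns 京/E_ns 。/O 天/O 晴/O 。/O ")
sentence2split(src, dst)
with open(dst, encoding='utf-8') as f:
    result = f.read()
```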
5. ImportError: No module named 'compiler.ast'
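The statement "from compiler.ast import flatten" no longer works in Python 3. The compiler package was deprecated in Python 2.6 and removed in Python 3.0, so the import fails. Solution: write a replacement function that does the same job, recursively flattening a nested list.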
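from collections.abc import Iterable  # the collections.Iterable alias was removed in Python 3.10

def flatten(x):
    """Recursively flatten nested iterables into one list; strings stay whole."""
    result = []
    for el in x:
        # Recurse into each iterable element; str/bytes are treated as single items
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

print(flatten(["junk", ["nested stuff"], [], [[]]]))  # ['junk', 'nested stuff']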
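The replacement above flattens any iterable except strings, so it also descends into dicts, sets and generators. Python 2's compiler.ast.flatten, as far as I can tell, only descended into lists and tuples. If staying closer to that behaviour matters, the following variant is a sketch of it; flatten_lists is a hypothetical name, and isinstance also accepts subclasses of list and tuple.

```python
def flatten_lists(seq):
    """Flatten nested lists/tuples into one list; every other object is kept as-is."""
    result = []
    for el in seq:
        if isinstance(el, (list, tuple)):
            result.extend(flatten_lists(el))  # recurse into nested list/tuple
        else:
            result.append(el)  # strings, dicts, sets, ... stay whole
    return result

print(flatten_lists(["junk", ["nested stuff"], [], [[]]]))  # ['junk', 'nested stuff']
print(flatten_lists([1, (2, [3]), {"k": 4}]))  # [1, 2, 3, {'k': 4}]
```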