1. "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 146: ordinal not in range(128)"
Fix: pass encoding="utf-8" when opening the file.
# coding=utf-8
import os
import codecs
import sys
# sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
input_file = '/cephfs/group/teg-pot-ai-algorithm/katezhou/gen_sim_query/bert-master/'
data_dir = input_file
file_path = os.path.join(data_dir, 'train.txt')
# Open with an explicit encoding instead of relying on the default codec
with open(file_path, 'r', encoding='utf-8') as f:
    reader = f.readlines()
examples = []
for index, line in enumerate(reader):
    split_line = line.strip().split(',')
    examples.append(split_line)
print('*************************examples', len(examples))
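2. "json.decoder.JSONDecodeError: Invalid control character at: line 1 column 11 (char 10)"
Cause: the string value contains a raw control character (\x08 here). Passing strict=False lets json.loads accept control characters inside strings.
import json
s = '{"input":"\x08静公子", "weight":"712"}'
line = json.loads(s, strict=False)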
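A minimal sketch of the behavior described in item 2, using only the standard library. It reuses the sample string from the note and shows that the default (strict) parser rejects it while strict=False accepts it. The variable names strict_ok and record are illustrative only.

```python
import json

# Raw backspace (\x08) inside a JSON string value, as in the note.
s = '{"input":"\x08静公子", "weight":"712"}'

try:
    json.loads(s)  # strict=True is the default
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# strict=False allows control characters inside JSON strings.
record = json.loads(s, strict=False)
print(strict_ok, repr(record["input"]))
```

If the control character is unwanted downstream, it may be simpler to strip it from the parsed value afterwards than to clean the raw text before parsing.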
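Dropping NaN:
Drop a row if any of the specified columns contains NaN.
# how='any' is the default; recent pandas versions raise a TypeError if both how and thresh are passed, so thresh is omitted
df_author.dropna(axis=0, how='any', subset=['sugg_word_pro', 'sugg_word'], inplace=True)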
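A small sketch of how the subset argument behaves, assuming pandas and NumPy are installed. The column names sugg_word_pro and sugg_word come from the note. The data and the extra column other are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the two subset columns decide which rows are dropped.
df_author = pd.DataFrame({
    "sugg_word_pro": ["a", np.nan, "c", "d"],
    "sugg_word": ["w", "x", np.nan, "z"],
    "other": [np.nan, 1.0, 2.0, 3.0],  # NaN here is ignored: column not in subset
})

# Returns a new frame; rows 1 and 2 each have a NaN in a subset column.
cleaned = df_author.dropna(subset=["sugg_word_pro", "sugg_word"])
print(cleaned.index.tolist())
```

Returning a new frame instead of using inplace=True keeps the original data available for comparison; either style drops the same rows.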
=========================================
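3. When writing an np.array to a txt file, convert the array to a list first. Otherwise the text read back from the file contains stray \n characters that need special handling.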
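A sketch of why item 3 happens, assuming the array was written through its string form (str(arr)). NumPy wraps the printed form of a long array across several lines, while a list serialized with json.dumps stays on one line and can be read back directly. The use of json here is an assumption, not necessarily what the original code did.

```python
import json
import numpy as np

arr = np.arange(40, dtype=float) / 3.0

# The printed form of a long array is wrapped, so it contains newlines.
as_array_text = str(arr)

# Converting to a list first gives a single-line, machine-readable form.
as_list_text = json.dumps(arr.tolist())

# Reading the list form back restores the values.
restored = np.array(json.loads(as_list_text))
print("\n" in as_array_text, "\n" in as_list_text)
```

For purely numeric arrays, np.savetxt and np.loadtxt are another option worth considering.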
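4. "ModuleNotFoundError: No module named 'pytest'"
Cause: the script is being run with the pytest runner; switch to the normal run mode.
In the run configuration settings, delete the pytest entry first, then edit the configurations:
add the script you want to run there and configure it.
Note: the same can be done by right-clicking the script and choosing Edit.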
================================
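5. Cannot uninstall 'certifi'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Fix: find the certifi egg-info file and delete it.
To locate it, run pip show <package name>; this prints the path where the package is installed, and the package's egg-info file is in the same place.
Alternatively:
sudo pip install --ignore-installed <module name>
This forces a reinstall over the existing version, which also resolves the error.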
6. ERROR: Cannot uninstall 'certifi'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Fix:
pip install certifi --ignore-installed
7. UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
Cause: the directory contains a .DS_Store file, which needs to be deleted. Files like this are usually hidden; list them with ls -a (or ll -a).
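As an alternative to deleting the file by hand, the loading code can skip hidden entries when it walks the directory. The sketch below uses only the standard library. The helper name list_visible_files and the demo directory are hypothetical.

```python
import os
import tempfile

def list_visible_files(directory):
    """Return sorted regular file names, skipping hidden entries such as .DS_Store."""
    return sorted(
        name for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )

# Self-contained demo: one text file plus a fake .DS_Store with non-UTF-8 bytes.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "train.txt"), "w", encoding="utf-8") as f:
    f.write("hello\n")
with open(os.path.join(demo_dir, ".DS_Store"), "wb") as f:
    f.write(b"\x00\x00\x00\x01Bud1\x80")

visible = list_visible_files(demo_dir)
print(visible)
```

Filtering by an expected extension (for example .txt) would be a stricter variant of the same idea.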
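[torch] 8. An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
ner_loader_train = DataLoader(ner_train, batch_size=BATCH_SIZE, collate_fn=ner_train.collate, shuffle=True, num_workers=0)
# Originally num_workers=16. num_workers is the number of worker subprocesses (not threads) used to load data: num_workers=0 loads everything in the main process, num_workers=2 uses two workers. With num_workers > 0, running the script directly may raise this error, typically on platforms that start workers with the spawn method (such as Windows) when the entry point is not wrapped in if __name__ == '__main__':. Setting num_workers=0 avoids it.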
[torch] 9. TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
Cause: the samples contain null (empty) values. Check the samples.
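A minimal sketch of the check suggested in item 9, in plain Python. It assumes the samples are collected in a list before being handed to the tokenizer. The helper name drop_empty_samples and the sample values are hypothetical.

```python
def drop_empty_samples(texts):
    """Keep only samples that are non-empty strings after stripping whitespace."""
    return [t for t in texts if isinstance(t, str) and t.strip()]

# None, empty strings, whitespace-only strings and NaN (a float, as pandas
# produces for missing cells) are all filtered out.
samples = ["hello world", None, "", "   ", float("nan"), "静公子"]
clean_samples = drop_empty_samples(samples)
print(clean_samples)
```

If labels are stored in a parallel list, they would need to be filtered with the same mask so that texts and labels stay aligned.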
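[torch] 10. size mismatch for classifier.weight: copying a param with shape torch.Size([9, 768]) from checkpoint. Current model is torch.Size([2, 768])
Cause: the number of classes in the checkpoint used for initialization differs from the number of classes the model was built with. According to the message, the checkpoint's classifier has 9 classes while the current model has 2.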
11. "'utf-8' codec can't decode byte 0xd1 in position 31: invalid continuation byte"
df_data = pd.read_csv(file, sep='\t', encoding='utf-8')    # raises the error above
df_data = pd.read_csv(file, sep='\t', encoding='gbk')      # also raises an error
df_data = pd.read_csv(file, sep='\t', encoding='gb18030')  # works
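The trial-and-error above can be sketched as a small helper that tries candidate encodings in order. It uses only the standard library and works on raw bytes. The name detect_encoding and the candidate order are assumptions for illustration. A successful decode does not prove the encoding is correct, so the strictest candidate (utf-8) is tried first.

```python
def detect_encoding(raw, candidates=("utf-8", "gbk", "gb18030")):
    """Return the first candidate that decodes raw without error, else None."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

gbk_bytes = "中文".encode("gbk")     # not valid UTF-8
utf8_bytes = "中文".encode("utf-8")  # valid UTF-8
print(detect_encoding(gbk_bytes), detect_encoding(utf8_bytes))
```

In practice the bytes could come from reading the head of the file in binary mode, and the returned name could then be passed as encoding= to pd.read_csv.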
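12. Importing Python modules across directories
Suppose father.py is the script being run; the import then looks like this:
'''
--path_test (directory)
    |--father.py
    |--children (directory)
        |--son.py
'''
import os
import sys
# son.py is in the children subdirectory next to father.py, so that directory
# (rather than '..') has to be added to the module search path
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'children'))
import son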