Converting a text file to UTF-8 without BOM in Python

Latest recommended article published 2024-05-14 10:44:29

老大的小跟班999

Tags: niutrans, bilingual training, text format, utf-8withoutBom

Article link: https://blog.csdn.net/qq_41949211/article/details/94132831

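While training a model on my own data with niutrans, translation failed. The cause turned out to be the encoding of the bilingual files: the corpus files must first be converted to UTF-8 without BOM, otherwise training fails.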
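import codecs

# Read the file as raw bytes so that a BOM, if present, is visible.
with open("sourcedata/english.raw.sample.txt", "rb") as f:
    s = f.read()
# Drop the leading UTF-8 BOM (the bytes EF BB BF) if the data starts with one.
if s.startswith(codecs.BOM_UTF8):
    s = s[len(codecs.BOM_UTF8):]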
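The snippet above only strips the BOM from the bytes held in memory. To change the file on disk, the result also has to be written back. A minimal sketch of that step follows; the helper name `strip_utf8_bom` is hypothetical and not part of the original post, and it assumes the file is small enough to read in one go.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite the file at `path` without a leading UTF-8 BOM.

    Hypothetical helper for illustration.
    Returns True if a BOM was found and removed, False otherwise.
    """
    p = Path(path)
    raw = p.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        # Nothing to do: the file does not start with a UTF-8 BOM.
        return False
    # Overwrite the file with everything after the 3-byte BOM.
    p.write_bytes(raw[len(codecs.BOM_UTF8):])
    return True
```

Calling `strip_utf8_bom("sourcedata/english.raw.sample.txt")` would then rewrite the sample file in place. Working on raw bytes keeps the rest of the content untouched, so the file is never decoded and re-encoded.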
Detecting the encoding of a text file
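import chardet

# Read raw bytes; chardet works on bytes, not on decoded text.
with open("sourcedata/chinese.raw.sample.txt", "rb") as f:
    data = f.read()
# chardet.detect returns a dict with the guessed encoding and a confidence score.
print(chardet.detect(data))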
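chardet gives a statistical guess. When the only question is whether a file carries a BOM, the leading bytes can be compared against the BOM constants in the standard `codecs` module instead. The sketch below is one way to do that; the names `sniff_bom` and `_BOMS` are hypothetical, and the labels it returns are just Python codec names chosen for illustration.

```python
import codecs

# Known BOM signatures. UTF-32 is checked before UTF-16 because the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return a codec name if `data` starts with a known BOM, else None.

    Hypothetical helper for illustration; it says nothing about files
    that have no BOM, which is where a detector like chardet is needed.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A return value of `"utf-8-sig"` for a corpus file would indicate that it still needs its BOM removed before training.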
