CpmTokenizer requires the SentencePiece library but it was not found in your environment.

1. Analyzing the error message

Full error message:

ImportError: 
CpmTokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones that match your environment. Please note that you may need to restart your runtime after installation.

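ImportError is the error the Python interpreter raises when importing a module fails, in other words, the import did not succeed. Seeing this error tells you the problem is not complicated; at most you need to sort out the dependencies between packages.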

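The key sentence is "CpmTokenizer requires the SentencePiece library but it was not found in your environment.": a class is missing a library it depends on. For a missing package, either run pip install directly, or download it from the official site and install it yourself along with its dependencies.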

"Checkout the instructions on the installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones that match your environment." This sentence means you can find the installation instructions that fit your environment in the GitHub repository.

"Please note that you may need to restart your runtime after installation." This sentence reminds you to restart the runtime once the package is installed.
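To make the mechanism behind this message concrete, here is a minimal sketch of how a library can guard an optional dependency. It is not the actual transformers code: `is_available`, `require_backend`, and `DemoTokenizer` are hypothetical names made up for illustration. Only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_backend(owner: str, module_name: str, install_hint: str) -> None:
    """Raise ImportError with a readable hint when an optional module is missing."""
    if not is_available(module_name):
        raise ImportError(
            f"{owner} requires the {module_name} library but it was not found "
            f"in your environment. Try: {install_hint}"
        )


class DemoTokenizer:
    """Hypothetical stand-in for a tokenizer class with an optional dependency."""

    def __init__(self, backend_module: str = "sentencepiece"):
        # The check runs at construction time, so the error only appears
        # when the class is actually used, not when the package is imported.
        require_backend(
            type(self).__name__, backend_module, f"pip install {backend_module}"
        )
        self.backend_module = backend_module
```

With a guard like this, constructing `DemoTokenizer()` in an environment without sentencepiece fails with a message of the same shape as the one above, which is why the fix is simply to install the missing package.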

2. Installing the library

Installing SentencePiece resolves the problem:

pip install SentencePiece

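In practice, PyCharm refreshes installed packages automatically, so there is no need to restart the environment by hand. In a long-running session such as a Jupyter or Colab notebook, restart the kernel as the message suggests.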

3. Dependency analysis

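SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural-network-based text generation systems in which the vocabulary size is fixed before the neural model is trained. SentencePiece implements subword units (for example, byte-pair encoding (BPE) and the unigram language model) and can train subword models directly from raw sentences. This makes it possible to build a purely end-to-end system that does not depend on language-specific pre-processing and post-processing.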
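For readers unfamiliar with subword units, the following is a toy sketch of the byte-pair-encoding idea in pure Python. It is not SentencePiece's implementation and does not use its API; the function names and the tiny corpus are assumptions chosen for illustration. The idea is to start from single characters and repeatedly merge the most frequent adjacent pair.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(word) for word in corpus))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # nothing left to merge
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(merges)
    print(words)
```

After two merges on this corpus, the frequent stem "low" has become a single symbol, while the rarer suffixes are still split into characters. That is the intuition behind a subword vocabulary: common strings get their own token and rare ones are composed from smaller pieces.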

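Many tokenizer implementations call SentencePiece to process text, mainly for tokenization. The class docstring in the code says as much: the pretrained CPM tokenizer needs both jieba and SentencePiece.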

Construct a CPM tokenizer. Based on [Jieba](https://pypi.org/project/jieba/) and
[SentencePiece](https://github.com/google/sentencepiece).

This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should
refer to this superclass for more information regarding those methods.
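Since the docstring names two third-party packages, it can help to check both before loading the tokenizer. The snippet below is a small sketch using only the standard library. The helper names `installed_version` and `report` are hypothetical, and it assumes the PyPI distribution names are `sentencepiece` and `jieba`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed distribution names for the two dependencies named in the docstring.
    for name, version in report(["sentencepiece", "jieba"]).items():
        status = version if version is not None else "MISSING (pip install " + name + ")"
        print(f"{name}: {status}")
```

Any package reported as missing can then be installed with pip in the same way as SentencePiece.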

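4. Summary

Read the English message carefully, classify the problem, dig a little deeper, and you will have traced part of how the pieces fit together.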
