This problem is discussed in https://github.com/pytorch/text/issues/481; my workaround was to switch the spaCy language model to a smaller one.
import torch
from torchtext import data

# Use the small (sm) model for tokenizer_language; otherwise loading IMDB takes far too long!
TEXT = data.Field(tokenize = 'spacy',
                  tokenizer_language = 'en_core_web_sm',
                  include_lengths = True)
LABEL = data.LabelField(dtype = torch.float)
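As a minimal sketch of how these fields are then used to load IMDB (assuming a torchtext version where `data.Field` is still available; in torchtext 0.9+ these classes moved to `torchtext.legacy.data`). The dataset is downloaded and tokenized on the first call, which is the slow step that the small model speeds up:

```python
# Sketch, assuming torchtext < 0.9 (otherwise import from torchtext.legacy).
import torch
from torchtext import data, datasets

TEXT = data.Field(tokenize='spacy',
                  tokenizer_language='en_core_web_sm',
                  include_lengths=True)
LABEL = data.LabelField(dtype=torch.float)

# Downloads IMDB and tokenizes every review with the small spaCy pipeline;
# with a larger pipeline this step is dramatically slower.
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
```

`include_lengths=True` makes each batch return `(padded_tensor, lengths)`, which is what you need later for `nn.utils.rnn.pack_padded_sequence`.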
en_core_web_sm is a Python package; for installation instructions see my blog post: https://blog.csdn.net/weixin_43390599/article/details/116887184
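For quick reference, the standard way to install it uses spaCy's own download command (this is spaCy's documented CLI, offered here as an alternative to the blog post above):

```shell
# Install spaCy, then download the small English pipeline used above.
pip install spacy
python -m spacy download en_core_web_sm
```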