Tokenizer, NLTK, spaCy

c橙橙橙橙橙

Last modified 2023-10-09 23:16:23



Tags: deep learning

First published 2023-10-02 23:35:09

Article link: https://blog.csdn.net/weixin_51207423/article/details/133326284


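padding: whether to pad shorter sequences to a common length (with the pad token, whose ID is 0 in BERT-style vocabularies); padded positions get 0 in attention_mask and are excluded from the attention computation
max_length: the maximum sequence length, in tokens
truncation: whether to truncate; tokens beyond max_length are dropped
return_tensors: the tensor type to return, e.g. "pt" for PyTorch tensors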
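To make the interplay of these arguments concrete, here is a minimal pure-Python sketch of what padding to max_length combined with truncation produces. It assumes a pad ID of 0 (as in BERT-style vocabularies) and uses made-up token IDs; the helper `pad_and_truncate` is hypothetical, not a library function. With a Hugging Face transformers tokenizer (an assumption about which library the list above describes), the corresponding call would look like `tokenizer(texts, padding="max_length", truncation=True, max_length=5, return_tensors="pt")`, which returns PyTorch tensors instead of Python lists.

```python
from typing import Dict, List

PAD_ID = 0  # assumption: pad token ID is 0, as in BERT-style vocabularies


def pad_and_truncate(batch_ids: List[List[int]], max_length: int) -> Dict[str, List[List[int]]]:
    """Hypothetical helper mimicking padding="max_length" plus truncation=True."""
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        ids = ids[:max_length]          # truncation: drop tokens past max_length
        n_pad = max_length - len(ids)   # number of pad slots still needed
        input_ids.append(ids + [PAD_ID] * n_pad)
        # 1 marks a real token, 0 marks padding that attention should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Illustrative token IDs for one short and one long sequence
batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 2936, 6251, 102]]
out = pad_and_truncate(batch, max_length=5)
print(out["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 2023, 2003, 1037, 2936]]
print(out["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Unlike a real tokenizer, this sketch does not keep special tokens such as [SEP] at the end of a truncated sequence; it only shows how the mask separates real tokens (1) from padding (0).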

NLTK package (English word tokenization)
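The usual entry point is `nltk.word_tokenize`, but it relies on the Punkt sentence model, which has to be downloaded with `nltk.download` first. As an offline sketch, the example below uses two NLTK tokenizers that need no extra data: the rule-based Penn Treebank tokenizer and the regex-based `wordpunct_tokenize`. The sample sentence is made up for illustration.

```python
from nltk.tokenize import TreebankWordTokenizer, wordpunct_tokenize

text = "Jack doesn't go to China."

# Penn Treebank rules: split the contraction "doesn't" into "does" + "n't"
# and separate the sentence-final period. No data download is needed.
treebank_tokens = TreebankWordTokenizer().tokenize(text)
print(treebank_tokens)  # ['Jack', 'does', "n't", 'go', 'to', 'China', '.']

# Regex \w+|[^\w\s]+ : splits at every run of punctuation,
# so the apostrophe becomes a token of its own.
punct_tokens = wordpunct_tokenize(text)
print(punct_tokens)  # ['Jack', 'doesn', "'", 't', 'go', 'to', 'China', '.']
```

The contrast shows why the Treebank-style tokenizer is usually preferred for English: it keeps contractions linguistically meaningful instead of cutting at every punctuation mark.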

Stopword filtering
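Stopword filtering removes very frequent function words before further processing. In NLTK the standard English list comes from `nltk.corpus.stopwords.words("english")` after running `nltk.download("stopwords")`. To keep the sketch runnable offline, it assumes a tiny hand-picked stopword set instead; the set and the helper `remove_stopwords` are hypothetical.

```python
from typing import List

# Assumption: a tiny illustrative stopword set, not NLTK's real list
STOPWORDS = {"this", "is", "a", "to", "the", "i", "you"}


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Keep tokens whose lowercase form is not in STOPWORDS."""
    # Lowercase before the lookup because stopword lists are stored in lowercase
    return [t for t in tokens if t.lower() not in STOPWORDS]


tokens = ["This", "is", "a", "sentence", ".", "Jack", "goes", "to", "China"]
filtered = remove_stopwords(tokens)
print(filtered)  # ['sentence', '.', 'Jack', 'goes', 'China']
```

Punctuation survives here because it is not in the stopword set; dropping it is a separate step if the task needs it.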

Part-of-speech tagging
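In NLTK, part-of-speech tagging is normally done with `nltk.pos_tag(tokens)`, which needs a pretrained tagger resource downloaded via `nltk.download`. As an offline stand-in, the sketch below swaps in NLTK's rule-based `RegexpTagger` with a few hypothetical suffix rules (Penn Treebank-style tags), just to show the input and output shape of a tagger: a token list in, a list of (token, tag) pairs out.

```python
from nltk.tag import RegexpTagger

# Hypothetical suffix rules; the first matching pattern wins
patterns = [
    (r"^(the|a|an)$", "DT"),  # articles
    (r".*ing$", "VBG"),       # gerunds / present participles
    (r".*ed$", "VBD"),        # simple past
    (r".*s$", "NNS"),         # plural nouns
    (r".*", "NN"),            # fallback: singular noun
]
tagger = RegexpTagger(patterns)

tagged = tagger.tag(["the", "dogs", "barked", "loudly"])
print(tagged)  # [('the', 'DT'), ('dogs', 'NNS'), ('barked', 'VBD'), ('loudly', 'NN')]
```

"loudly" falls through to the NN fallback, which illustrates why trained statistical taggers such as the one behind `nltk.pos_tag` are used in practice.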
spaCy library

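# Import the spaCy library
# (the model must be installed first: python -m spacy download en_core_web_sm)
import spacy
# Load the small English pipeline
nlp = spacy.load("en_core_web_sm")
# Create a Doc; note that "l" in "l love you" is a lowercase L, not the pronoun "I"
doc = nlp("This is a sentence. l love you. hello bob! jack go to china")

# Print each token's text and its coarse-grained part-of-speech tag
print([(w.text, w.pos_) for w in doc])
# Output (exact tags can differ across spaCy/model versions):
#[('This', 'PRON'), ('is', 'AUX'), ('a', 'DET'), ('sentence', 'NOUN'), ('.', 'PUNCT'), ('l', 'NOUN'), ('love', 'VERB'), ('you', 'PRON'), ('.', 'PUNCT'), ('hello', 'PROPN'), ('bob', 'PROPN'), ('!', 'PROPN'), ('jack', 'PROPN'), ('go', 'VERB'), ('to', 'ADP'), ('china', 'PROPN')]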
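spaCy also covers the stopword-filtering step from the NLTK section: every token carries lexical flags such as `is_stop` and `is_punct`. The sketch below uses `spacy.blank("en")`, a tokenizer-only English pipeline that needs no model download (so it gives no POS tags); the sample sentence is made up for illustration.

```python
import spacy

# Tokenizer-only English pipeline: no trained model download needed
nlp = spacy.blank("en")
doc = nlp("This is a sentence. I love you.")

# is_stop checks spaCy's built-in English stop-word list (case-insensitive);
# is_punct flags punctuation tokens such as "."
content_words = [t.text for t in doc if not t.is_stop and not t.is_punct]
print(content_words)  # ['sentence', 'love']
```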
