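NLTK is commonly used for processing corpora, classifying text, and analyzing linguistic structure.
https://www.nltk.org/ # the official NLTK site, which includes tutorials
NLTK supports Python 3.7 and later.
Installation takes two steps:
(1) pip install nltk
(2) Download the NLTK data package from Gitee
nltk.find('.') # shows which directory NLTK uses when it looks up resources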
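If the data package from step (2) was unpacked somewhere NLTK does not search by default, one option is to add that directory to the search path. The sketch below is only an illustration: the directory name `~/my_nltk_data` is a made-up placeholder, not a location the original post specifies.

```python
import os
import nltk

# Hypothetical directory where the manually downloaded data package was unpacked.
custom_dir = os.path.expanduser("~/my_nltk_data")

# nltk.data.path is the list of directories NLTK searches, in order.
if custom_dir not in nltk.data.path:
    nltk.data.path.append(custom_dir)

# Print the full search path to confirm the new entry is present.
for directory in nltk.data.path:
    print(directory)
```

Appending puts the custom directory last, so any copy of the data in a default location still takes precedence.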
1. Sentence segmentation:
import nltk
from nltk.tokenize import sent_tokenize # sentence tokenizer for English
# Text to split into sentences
paragraph = 'You must follow me carefully. I shall have to controvert one or two ideas that are almost universally accepted. The geometry, for instance, they taught you at school is founded on a misconception.'
tokenized_text = sent_tokenize(paragraph)
print(tokenized_text)
Output of tokenized_text: ['You must follow me carefully.', 'I shall have to controvert one or two ideas that are almost universally accepted.', 'The geometry, for instance, they taught you at school is founded on a misconception.']
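sent_tokenize depends on tokenizer data from the NLTK data package, so it fails with a LookupError when that data cannot be found. A minimal sketch of handling that case follows; the helper name `split_sentences` is hypothetical and not part of NLTK.

```python
from nltk.tokenize import sent_tokenize

def split_sentences(text):
    """Return a list of sentences, or None if the tokenizer data is missing."""
    try:
        return sent_tokenize(text)
    except LookupError:
        # NLTK raises LookupError when a required resource is not found
        # in any directory on its search path.
        print("Sentence tokenizer data not found; check where the data package was unpacked.")
        return None

sentences = split_sentences("Hello there. How are you?")
print(sentences)
```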
2. Word tokenization:
from nltk import word_tokenize # import the word tokenizer
text = 'You must follow me carefully.'
tokenized_word = word_tokenize(text)
print(tokenized_word)
Output of tokenized_word:
['You', 'must', 'follow', 'me', 'carefully', '.']
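word_tokenize relies on the downloaded tokenizer data. Where that data is not available, wordpunct_tokenize from nltk.tokenize is a purely regex-based alternative that needs no data files. It is a different tokenizer, not a drop-in equivalent: it splits on every run of punctuation, so contractions come out differently. A short sketch:

```python
from nltk.tokenize import wordpunct_tokenize

text = "You must follow me carefully."

# wordpunct_tokenize splits text into runs of word characters
# and runs of punctuation, using a regular expression only.
tokens = wordpunct_tokenize(text)
print(tokens)  # ['You', 'must', 'follow', 'me', 'carefully', '.']

# Contractions are broken at the apostrophe.
print(wordpunct_tokenize("Don't"))  # ['Don', "'", 't']
```

For plain sentences like the one above, the result matches the word_tokenize output shown earlier.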
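3. Removing punctuation from text:
import string # Python's built-in module, which includes the English punctuation marks
punctuation = string.punctuation # English punctuation characters
text = 'You must follow me carefully.' # text to process
# Build the mapping: replacing each punctuation mark with a space effectively removes it
# translate() replaces every character according to the mapping table
text_1 = text.translate(str.maketrans(punctuation, ' ' * len(punctuation)))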
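Mapping punctuation to spaces leaves a space wherever a mark used to be. As a comparison, the sketch below shows that approach next to the three-argument form of str.maketrans, which deletes the characters outright. The variable names are illustrative only.

```python
import string

text = "You must follow me carefully."

# Variant 1: map each punctuation mark to a space.
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
spaced = text.translate(to_space)

# Variant 2: the third argument of str.maketrans lists characters to delete.
to_delete = str.maketrans("", "", string.punctuation)
deleted = text.translate(to_delete)

print(repr(spaced))   # 'You must follow me carefully '
print(repr(deleted))  # 'You must follow me carefully'
```

The space variant keeps words apart when punctuation sits between them without whitespace (for example "a,b"), while the delete variant avoids stray spaces; which one fits depends on the text.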






