A First Look at NLTK

刹那永恒HB

Article link: https://blog.csdn.net/qq_43165081/article/details/108845266

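NLTK is a solid natural language processing toolkit and one of the more important tools our chatbot needs. This section covers how to install it and the basics of using it.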

Installing NLTK

pip install nltk

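Start Python and download the book data (the session below was recorded under Python 2.7; current NLTK releases require Python 3):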

[root@centos #] python
Python 2.7.11 (default, Jan 22 2016, 08:29:18)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()

In the downloader window that opens, select book and click Download to start the download.
Once the download has finished, enter:

>>> from nltk.book import *

The books should then load as follows:

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
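
On a server without a display the downloader window cannot open. A minimal sketch of fetching the same data from code follows; it assumes the collection identifier is "book" (the entry selected in the window above), and the function name download_book_collection is made up for this example.

```python
def download_book_collection(quiet=True):
    """Download the 'book' collection without opening the downloader window."""
    # Imported inside the function so this file can be loaded before nltk is installed.
    import nltk

    # nltk.download() returns True on success and False on failure.
    return nltk.download("book", quiet=quiet)
```

Calling download_book_collection() once should be enough; after that, from nltk.book import * should load the texts as shown above.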

Each text* name here refers to one book or corpus loaded as a Text object. Typing text1 on its own prints the title:

>>> text1
<Text: Moby Dick by Herman Melville 1851>

Searching Text

Run:

>>> text1.concordance("former")

This prints the occurrences of "former" together with their surrounding context (25 lines by default).
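
To make the idea of a concordance concrete, here is a rough standard-library sketch of a keyword-in-context listing. It is not NLTK's implementation: the function name concordance_lines and the sample sentence are invented for illustration, and the context width is counted in tokens here, whereas NLTK counts characters.

```python
def concordance_lines(tokens, word, width=3, limit=25):
    """Return up to `limit` keyword-in-context lines for `word` (case-insensitive)."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Take up to `width` tokens on each side of the match.
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines.append(f"{left} [{token}] {right}".strip())
        if len(lines) == limit:
            break
    return lines


# Made-up sample text, only for demonstration.
sample = "the former captain met the former mate on the deck".split()
for line in concordance_lines(sample, "former", width=2):
    print(line)
```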

>>> text1.similar("ship")
whale boat sea captain world way head time crew man other pequod line
deck body fishery air boats side voyage

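Given "ship", the results include "boat". similar() returns words that appear in the same contexts as the query word, so the results are often, though not always, near-synonyms.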
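
The "same contexts" idea can be sketched with the standard library alone. The version below is a simplification and not NLTK's actual scoring: it treats the (previous word, next word) pair as the context and ranks other words by how many such pairs they share with the query. The name similar_words and the sample sentence are assumptions made for this example.

```python
from collections import Counter, defaultdict


def similar_words(tokens, word, num=20):
    """Rank other words by how many (previous, next) contexts they share with `word`."""
    tokens = [t.lower() for t in tokens]
    query = word.lower()

    # word -> set of (previous token, next token) pairs it was seen in
    contexts = defaultdict(set)
    for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[mid].add((prev, nxt))

    target = contexts.get(query, set())
    scores = Counter()
    for other, ctxs in contexts.items():
        if other == query:
            continue
        shared = len(ctxs & target)
        if shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common(num)]


# Made-up sample text, only for demonstration.
sample = "the ship sailed away and the boat sailed away while the crew slept".split()
print(similar_words(sample, "ship"))
```

Here "boat" is the only word that also occurs between "the" and "sailed", so it is the only result.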
We can also plot where given words occur across a text:

>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
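
A dispersion plot draws one row per word and marks each occurrence at its offset from the start of the text. The data behind such a plot can be sketched without any plotting library; word_offsets and the sample sentence below are hypothetical, and the matching is exact (case-sensitive) in this sketch.

```python
def word_offsets(tokens, words):
    """Map each word to the list of token positions where it occurs."""
    wanted = set(words)
    offsets = {w: [] for w in words}
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[token].append(position)
    return offsets


# Made-up sample text, only for demonstration.
sample = "freedom and duties of citizens : freedom for all citizens".split()
print(word_offsets(sample, ["citizens", "freedom", "democracy"]))
```

A word that never occurs, such as "democracy" in this sample, simply gets an empty list, which corresponds to an empty row in the plot.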

Word Statistics

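len(text1): returns the total number of tokens in the text (words and punctuation marks).

set(text1): returns the set of distinct tokens in the text.

len(set(text4)): returns the number of distinct tokens, that is, the vocabulary size, not the total word count.

text4.count("is"): returns how many times "is" occurs in the text.

FreqDist(text1): counts how often each token occurs. The result is a dictionary-like object mapping tokens to counts rather than a sorted list; call most_common() on it for a list sorted from most to least frequent.

fdist1 = FreqDist(text1); fdist1.plot(50, cumulative=True): computes the frequencies and plots the cumulative counts of the 50 most frequent tokens.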
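
FreqDist is built on Python's collections.Counter, so the same statistics can be tried on a plain token list without downloading any corpus. The following is a small sketch with an invented sample sentence; the comments name the NLTK call each line stands in for.

```python
from collections import Counter
from itertools import accumulate

# Made-up sample text, only for demonstration.
tokens = "the whale and the ship and the sea".split()

total_tokens = len(tokens)        # like len(text1)
vocabulary = set(tokens)          # like set(text1)
vocab_size = len(vocabulary)      # like len(set(text4))
the_count = tokens.count("the")   # like text4.count("is")

freq = Counter(tokens)            # stands in for FreqDist(text1)
top = freq.most_common(2)         # the two most frequent tokens with their counts

# Running totals over tokens sorted by frequency: the values a cumulative plot would show.
cumulative = list(accumulate(count for _, count in freq.most_common()))

print(total_tokens, vocab_size, the_count)
print(top)
print(cumulative)
```

The last value in cumulative always equals the total token count, which is a quick sanity check on a cumulative frequency plot.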
