Python Natural Language Processing Notes (1)

Latest recommended article published 2021-11-23 16:51:06

是小李呀~

Categories: python, machine learning algorithms, natural language processing

Article link: https://blog.csdn.net/qq_44543774/article/details/119767418

I. Some commonly used NLTK functions

Concordance

Example:

>>> text1.concordance("monstrous")
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
>>>

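This function searches the text for every occurrence of the given word and prints each one together with the line it appears in, putting the emphasis on the surrounding context. As the output shows, concordance lines up the queried word in roughly the same column on every line, which makes the contexts easy to compare.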
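
To make the column alignment concrete, here is a minimal pure-Python sketch of a concordance view. It is not NLTK's implementation; the helper name `concordance_lines` and the `width` parameter are assumptions made for illustration, and the sample sentence is invented.

```python
def concordance_lines(tokens, word, width=30):
    """Return one display line per occurrence of `word`.

    Hypothetical helper: the left context is right-aligned to `width`
    characters so that the matched word always starts in the same column.
    """
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != target:
            continue
        left = " ".join(tokens[:i])[-width:]       # last `width` chars before the match
        right = " ".join(tokens[i + 1:])[:width]   # first `width` chars after the match
        lines.append(f"{left:>{width}} {tok} {right}")
    return lines


# Invented sample text, only for demonstration.
sample = "the monstrous whale swam past a most monstrous ship near the shore".split()
for line in concordance_lines(sample, "monstrous", width=20):
    print(line)
```

Each printed line carries the match at character offset `width + 1`, which is what produces the single-column effect seen in the output above.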

Similar

Example:

>>> text1.similar("monstrous")
modifies horrible singular mouldy contemptible determined tyrannical
candid wise lamentable pitiable fearless loving maddens domineering
careful true mystifying part passing
>>>

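This function looks at the words surrounding the given word and finds other words that occur in similar contexts. For example, the concordance output above shows monstrous used like this: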

most monstrous size 
the monstrous pictures 
this monstrous cabinet

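and so on. similar() searches the text for other words that appear in the same kind of surroundings. The function appears to rely only on a simple measure of similarity, namely how many contexts (the immediately neighbouring words) two words share; it does not look at part of speech or meaning.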
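
The idea of "similar contexts" can be sketched in plain Python. This is a simplified illustration, assuming a word's context is just the pair (previous word, next word); it is not NLTK's code, and `context_of` and `similar_words` are hypothetical names.

```python
from collections import Counter, defaultdict


def context_of(tokens, i):
    """Context of position i: the (previous word, next word) pair, lowercased."""
    left = tokens[i - 1].lower() if i > 0 else "*START*"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "*END*"
    return (left, right)


def similar_words(tokens, word, num=5):
    """Rank other words by how many distinct contexts they share with `word`."""
    contexts_by_word = defaultdict(set)
    for i, tok in enumerate(tokens):
        contexts_by_word[tok.lower()].add(context_of(tokens, i))

    target = word.lower()
    target_contexts = contexts_by_word[target]
    scores = Counter()
    for other, contexts in contexts_by_word.items():
        shared = len(contexts & target_contexts)
        if other != target and shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common(num)]


# Invented sample text, only for demonstration.
sample = ("a monstrous size and a huge size and a monstrous whale "
          "and a huge whale and the big whale").split()
print(similar_words(sample, "monstrous"))  # → ['huge']
```

Here "huge" is returned because it occurs in both `a _ size` and `a _ whale`, exactly like "monstrous", while "big" only occurs in `the _ whale`. Nothing about meaning is involved, which matches the observation above.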

Common_contexts

Example:

>>> text1.common_contexts(["monstrous", "very"])
No common contexts were found
>>> text2.common_contexts(["monstrous", "very"])
a_pretty a_lucky am_glad be_glad is_pretty

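This function is somewhat like similar(), in that it also searches by context.
The difference is that it takes a list of words and reports the contexts shared by all of them, i.e. the contexts in which both word1 and word2 occur. Each result has the form left_right, where the underscore marks the position of the queried word.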
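
A minimal sketch of the same idea, again assuming a context is the (previous word, next word) pair. The helpers `contexts` and `shared_contexts` are hypothetical and only imitate the output format shown above; they are not NLTK functions.

```python
def contexts(tokens, word):
    """Set of (previous word, next word) pairs around each occurrence of `word`."""
    target = word.lower()
    found = set()
    for i in range(1, len(tokens) - 1):  # skip the edges, which lack one neighbour
        if tokens[i].lower() == target:
            found.add((tokens[i - 1].lower(), tokens[i + 1].lower()))
    return found


def shared_contexts(tokens, words):
    """Contexts common to every word in `words`, formatted as left_right."""
    shared = set.intersection(*(contexts(tokens, w) for w in words))
    return sorted(f"{left}_{right}" for left, right in shared)


# Invented sample text, only for demonstration.
sample = ("she is very pretty and he is monstrous pretty "
          "but I am very glad and am monstrous glad too").split()
print(shared_contexts(sample, ["monstrous", "very"]))  # → ['am_glad', 'is_pretty']
```

An empty result corresponds to the "No common contexts were found" case: the two words never appear between the same pair of neighbours.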

Dispersion_plot

Example:

>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

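This function draws a dispersion plot showing where each word occurs in the corpus. The result looks like this:
[Image: dispersion plot of the five queried words]
The horizontal axis is the word offset within the text and the vertical axis lists the queried words. Each mark in the plot is one occurrence of a word at that offset, so the plot shows how each word is distributed across the text.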
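
The data behind such a plot is simply the list of offsets for each word. The sketch below computes those offsets and renders a rough text-only version of the plot; it is an illustration under stated assumptions (case-sensitive matching, hypothetical helper names `word_offsets` and `text_dispersion`, an invented token sequence), not what NLTK does internally.

```python
def word_offsets(tokens, words):
    """Map each queried word to the list of token offsets where it occurs."""
    offsets = {w: [] for w in words}
    for i, tok in enumerate(tokens):
        if tok in offsets:  # case-sensitive match, chosen for simplicity
            offsets[tok].append(i)
    return offsets


def text_dispersion(tokens, words, columns=40):
    """Render one row per word; '|' marks a region of the text containing the word."""
    rows = []
    for word, positions in word_offsets(tokens, words).items():
        row = ["."] * columns
        for pos in positions:
            row[pos * columns // len(tokens)] = "|"  # scale the offset to a column
        rows.append(f"{word:>10} {''.join(row)}")
    return rows


# Invented token sequence: "freedom" at offsets 0, 10, 30 and "duties" at 20.
sample = ("freedom " + "and " * 9) * 2 + "duties " + "and " * 9 + "freedom"
tokens = sample.split()
for row in text_dispersion(tokens, ["freedom", "duties"], columns=31):
    print(row)
```

Reading a row from left to right corresponds to moving from the start of the text to its end, which is the same interpretation as the horizontal axis of the real plot.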

generate

Example:

>>> text3.generate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: generate() missing 1 required positional argument: 'words'
>>>