Natural Language Processing with Python: Chapters 1 and 2

Latest recommended article published 2021-02-19 23:45:09

qq_34505594


Category: Python

Article link: https://blog.csdn.net/qq_34505594/article/details/79496018


Informal notes, still to be refined.
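1. text1.concordance("word")  Search text1 for every occurrence of "word", shown with its surrounding context
text2.similar("word")  Find words in text2 that appear in contexts similar to those of "word"
text2.common_contexts(["word1", "word2"])  Find the contexts in text2 that are shared by two or more words
text2.dispersion_plot(["word1", "word2"])  Lexical dispersion plot
text3.generate()  Generate random text
text.index("word")  Position of the first occurrence of the word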
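
To make the searches in item 1 more concrete, here is a minimal plain-Python sketch of what a concordance and an index lookup do on a list of tokens. It does not call NLTK; the `concordance` helper, its `width` parameter, and the sample tokens are assumptions made for illustration, and NLTK formats its own output differently.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context on each side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Hypothetical sample text, already split into tokens
tokens = "the whale is a monstrous fish and a monstrous size it has".split()

hits = concordance(tokens, "monstrous")
for left, tok, right in hits:
    print(f"{left:>20} [{tok}] {right}")

# list.index returns the position of the first occurrence only
first_pos = tokens.index("monstrous")
print(first_pos)
```

Like text.index("word"), the last call reports only the first match, so later occurrences need a loop such as the one inside the helper.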
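2. set(text3)  Vocabulary of text3 (the set of distinct tokens)
sorted(text2)  Sorted list of the tokens in text2
sent.append("some")  Append a word to sent
' '.join(['Monty', 'Python'])  Join a list of words into a single string
'Monty Python'.split()  Split a string into a list of words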
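
The operations in item 2 are ordinary Python and can be tried on any list of strings. The short sketch below uses a made-up token list in place of an NLTK text.

```python
# Hypothetical token list standing in for an NLTK text
text = ["the", "cat", "and", "the", "hat"]

vocab = sorted(set(text))   # distinct tokens, in alphabetical order

sent = ["Monty"]
sent.append("Python")       # append changes the list in place

joined = " ".join(sent)     # list of words -> one string
words = joined.split()      # string -> list of words

print(vocab, joined, words)
```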
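3. fdist1 = FreqDist(text1)  Frequency distribution of the words in text1
fdist1.plot(50, cumulative=True)  Cumulative frequency plot of the 50 most frequent words
fdist1.hapaxes()  Words that occur only once
fdist1[w] > 3  True if w occurs more than 3 times
fdist1.inc(sample)  Increment the count of a sample (NLTK 2 only; in NLTK 3 write fdist1[sample] += 1)
fdist1.freq('monstrous')  Relative frequency of the sample
text1.collocations()  Collocations (frequent bigrams); this is a method of the text, not of FreqDist
fdist1.items()  All (sample, count) pairs
See p. 23 for more.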
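
The counts behind item 3 can be reproduced with `collections.Counter`, which `FreqDist` subclasses in NLTK 3. This is only a sketch on a made-up sentence; the variable names are illustrative, and the hapax and frequency computations are written out by hand rather than taken from NLTK.

```python
from collections import Counter

# Hypothetical sample text, already split into tokens
tokens = "the whale saw the ship and the ship saw a monstrous whale".split()
fdist = Counter(tokens)

# Words that occur exactly once (what hapaxes() reports)
hapaxes = [w for w, n in fdist.items() if n == 1]

# Relative frequency: count divided by the total number of tokens
freq_the = fdist["the"] / sum(fdist.values())

# NLTK 3 style increment, replacing the NLTK 2 call fdist.inc("whale")
fdist["whale"] += 1

# Words that now occur more than twice
frequent = sorted(w for w in fdist if fdist[w] > 2)

print(hapaxes, freq_the, frequent)
```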
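4. [w for w in V if len(w) > 15]  Fine-grained selection of words (here, vocabulary items longer than 15 characters)
5. p. 25: word comparison operators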
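
Items 4 and 5 combine naturally: a list comprehension filters the vocabulary, and a string method supplies the test. The sketch below uses a small invented vocabulary `V`; the particular string methods shown are standard Python and are assumed to be representative of the operators listed on p. 25.

```python
# Hypothetical vocabulary standing in for sorted(set(text1))
V = sorted({"Whale", "the", "uncomfortableness", "1851", "characteristically", "sea"})

long_words = [w for w in V if len(w) > 15]        # length test
lowercase = [w for w in V if w.islower()]         # all cased characters are lowercase
titlecase = [w for w in V if w.istitle()]         # initial capital
digits = [w for w in V if w.isdigit()]            # digits only
ends_ly = [w for w in V if w.endswith("ly")]      # suffix test

print(long_words, lowercase, titlecase, digits, ends_ly)
```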

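Chapter 2: Accessing Text Corpora and Lexical Resources
1. The raw() function returns the contents of a file without any linguistic processing. len(gutenberg.raw('blake-poems.txt')) therefore gives the number of characters in the text, including the spaces between words, not the number of words.
The sents() function divides the text into sentences, where each sentence is a list of words.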
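
The difference between the raw, word, and sentence views can be illustrated without the corpus itself. The sketch below uses an invented two-sentence string and a crude regular-expression tokenizer and sentence splitter; both are assumptions for illustration and are much simpler than what NLTK's corpus readers do.

```python
import re

# Hypothetical raw text standing in for gutenberg.raw(...)
raw = "The sun does arise. And make happy the skies."

# Length of the raw string counts characters, spaces and punctuation included
n_chars = len(raw)

# Crude tokenizer: runs of word characters, or single punctuation marks
words = re.findall(r"\w+|[^\w\s]", raw)

# Crude sentence splitter: break after ., ! or ? when followed by whitespace
sents = [re.findall(r"\w+|[^\w\s]", s) for s in re.split(r"(?<=[.!?])\s+", raw)]

print(n_chars, len(words), len(sents))
```

The character count is far larger than the token count, which is why len() of the raw text should not be read as a word count.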
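2. p. 45: example documents from each section of the Brown Corpus
p. 49: some of the corpora and corpus samples distributed with NLTK
p. 53: basic corpus functions defined in NLTK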
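3. Conditional frequency distribution (requires import nltk and from nltk.corpus import brown):
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()    # corpus categories, used as the conditions
    for word in brown.words(categories=genre))    # every word in that category
cfd.tabulate(conditions=genres, samples=modals)  Tabulate the counts for the chosen genres (conditions) and modals (samples)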
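
A conditional frequency distribution is essentially one counter per condition. The following plain-Python sketch builds that structure from a handful of invented (genre, word) pairs and prints a small table; it does not use NLTK or the Brown Corpus, and the table layout is only an approximation of what tabulate produces.

```python
from collections import Counter, defaultdict

# Hypothetical (condition, sample) pairs standing in for the Brown Corpus
pairs = [
    ("news", "can"), ("news", "will"), ("news", "will"),
    ("romance", "could"), ("romance", "can"), ("romance", "could"),
]

# One Counter per condition
cfd = defaultdict(Counter)
for genre, word in pairs:
    cfd[genre][word] += 1

genres = ["news", "romance"]
modals = ["can", "could", "will"]

# One row per condition, one column per sample; missing samples count as 0
print(f"{'':>8}" + "".join(f"{m:>7}" for m in modals))
for g in genres:
    print(f"{g:>8}" + "".join(f"{cfd[g][m]:>7}" for m in modals))
```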
4. nltk.bigrams(sent)  Generate bigrams (pairs of adjacent words), which can then be used to produce random text
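
As a rough illustration of item 4, bigrams can be built in plain Python by pairing each word with its successor, and a simple generator can then follow the most frequent successor at each step. The `generate` helper, its parameters, and the sample sentence are assumptions for this sketch, not NLTK's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical sample sentence, already split into tokens
sent = "the cat saw the dog and the cat saw the bird".split()

# Pair each word with the word that follows it
bigrams = list(zip(sent, sent[1:]))

# For each word, count which words follow it
follows = defaultdict(Counter)
for w1, w2 in bigrams:
    follows[w1][w2] += 1

def generate(start, length=6):
    """Emit up to `length` words, always moving to the most frequent successor."""
    word, out = start, []
    for _ in range(length):
        out.append(word)
        if not follows[word]:
            break  # no known successor, stop early
        word = follows[word].most_common(1)[0][0]
    return out

generated = generate("dog")
print(" ".join(generated))
```

Always choosing the most frequent successor makes the output deterministic and quickly repetitive; sampling successors in proportion to their counts would give more varied text.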