Python Natural Language Processing Study Notes, Part 1

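This post is the first part of a series of study notes on natural language processing with Python. It covers the basics of working with the NLTK library: installation, downloading data, searching text, vocabulary analysis, word-frequency counts, simple statistics, and conditional control. Worked examples show how to do language processing in Python with nltk, such as vocabulary analysis, statistical analysis, and word collocations, which makes it useful introductory material for beginners.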

Chapter 1: Language Processing and Python

1 Computing with Language: Texts and Words

Getting Started with NLTK

Download and install NLTK

http://www.nltk.org

Download the data

>>> import nltk

>>> nltk.download()
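Called without arguments, `nltk.download()` opens an interactive downloader, which can be awkward on a machine without a display. Below is a minimal sketch of a non-interactive alternative. It assumes network access and that the `book` collection identifier covers the data loaded by `nltk.book`; `download_book_data` is a hypothetical helper name, not part of NLTK.

```python
def download_book_data(quiet=True):
    """Fetch the data used by `from nltk.book import *` without the interactive downloader.

    Hypothetical helper: returns False if NLTK is not installed or the download fails.
    """
    try:
        import nltk  # third-party package: pip install nltk
    except ImportError:
        return False
    # "book" is assumed to be the collection of corpora used by the NLTK book examples.
    return bool(nltk.download("book", quiet=quiet))

# Usage (needs network access):
#     download_book_data()
```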

 

Once the download finishes, load the texts

>>> from nltk.book import *

*** Introductory Examples for the NLTK Book ***

Loading text1, ..., text9 and sent1, ..., sent9

Type the name of the text or sentence to view it.

Type: 'texts()' or 'sents()' to list the materials.

text1: Moby Dick by Herman Melville 1851

text2: Sense and Sensibility by Jane Austen 1811

text3: The Book of Genesis

text4: Inaugural Address Corpus

text5: Chat Corpus

text6: Monty Python and the Holy Grail

text7: Wall Street Journal

text8: Personals Corpus

text9: The Man Who Was Thursday by G . K . Chesterton 1908

 

Type a text's name to look it up

>>> text1

<Text: Moby Dick by Herman Melville 1851>

>>> text2

<Text: Sense and Sensibility by Jane Austen 1811>

 

Searching text

>>> text1.concordance('monstrous')

Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
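To make the concordance view less of a black box, here is a minimal pure-Python sketch of the idea: find every occurrence of a word in a token list and print it with a fixed amount of context on each side, so the keyword lines up in one column. This is not NLTK's implementation; the function name `concordance_lines` and the default `width` are assumptions made for illustration. The match ignores case, as in the output above, where both "monstrous" and "Monstrous" are found.

```python
def concordance_lines(tokens, word, width=79):
    """Return one line per occurrence of `word`, with the keyword in a fixed column."""
    half = (width - len(word) - 2) // 2  # characters of context on each side
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != target:
            continue
        # A window of `half` tokens always yields at least `half` characters.
        left = " ".join(tokens[max(0, i - half):i])[-half:]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        # Right-align the left context so every keyword starts at the same column.
        lines.append(f"{left:>{half}} {tok} {right}")
    return lines


sample = "a most monstrous size and a Monstrous whale".split()
for line in concordance_lines(sample, "monstrous", width=31):
    print(line)
```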

 

# Which other words appear in the same contexts?

>>> text1.similar('monstrous')

impalpable puzzled part mystifying gamesome horrible maddens
domineering curious exasperate untoward contemptible careful
trustworthy delightfully christian mean uncommon abundant wise

>>> text2.similar('monstrous')

very exceedingly so heartily as vast amazingly extremely great sweet a

remarkably good

 

 

>>> text2.common_contexts(['monstrous', 'very'])

be_glad a_lucky am_glad is_pretty a_pretty
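Both `similar` and `common_contexts` rest on the same idea: a word's context is the pair of words immediately before and after it, and two words are "similar" when they occur in the same contexts. The sketch below illustrates that idea in plain Python. It is a simplification under stated assumptions, not NLTK's actual scoring: words are ranked only by the number of distinct contexts they share, and the names `context_index`, `similar_words`, and the `*START*`/`*END*` padding markers are hypothetical.

```python
from collections import Counter, defaultdict


def context_index(tokens):
    """Map each lowercased word to a Counter of its (previous, next) neighbour pairs."""
    padded = ["*START*"] + [t.lower() for t in tokens] + ["*END*"]
    index = defaultdict(Counter)
    for prev, word, nxt in zip(padded, padded[1:], padded[2:]):
        index[word][(prev, nxt)] += 1
    return index


def similar_words(tokens, word, num=20):
    """Rank other words by how many distinct contexts they share with `word`."""
    index = context_index(tokens)
    target = word.lower()
    contexts = set(index.get(target, ()))
    scores = Counter()
    for other, other_contexts in index.items():
        shared = contexts & set(other_contexts)
        if other != target and shared:
            scores[other] = len(shared)
    return [w for w, _ in scores.most_common(num)]


def common_contexts(tokens, words):
    """Return the contexts shared by all `words`, formatted as 'previous_next'."""
    index = context_index(tokens)
    shared = set.intersection(*(set(index.get(w.lower(), ())) for w in words))
    return sorted(f"{prev}_{nxt}" for prev, nxt in shared)


sample = "I am very glad . I am monstrous glad . a very lucky day . a monstrous lucky day".split()
print(similar_words(sample, "monstrous"))
print(common_contexts(sample, ["monstrous", "very"]))
```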

 

# Lexical dispersion plot for the US presidential inaugural addresses

>>> text4.dispersion_plot(['citizens','democracy','freedom','duties','America'])
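A dispersion plot shows where in the text each word occurs, measured in tokens from the beginning. Drawing it requires a plotting backend, but the underlying data is just a list of token offsets per word. A minimal sketch of that computation follows; it assumes case-sensitive matching, and `dispersion_offsets` is a hypothetical helper, not an NLTK function.

```python
def dispersion_offsets(tokens, words):
    """Map each target word to the token offsets at which it occurs (case-sensitive)."""
    offsets = {w: [] for w in words}
    for position, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(position)
    return offsets


sample = "freedom for citizens and freedom for America".split()
for word, positions in dispersion_offsets(sample, ["citizens", "democracy", "freedom", "America"]).items():
    print(word, positions)
```

Each list of offsets corresponds to one row of marks in the plot; an empty list means the word never occurs.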

 

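# Generate random text (the Text object in the NLTK version used here has no generate() method, so the call fails)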

>>> text3.generate()

Traceback (most recent call last):

 File "<stdin>", line 1, in <module>

AttributeError: 'Text' object has no attribute 'generate'
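The traceback shows that `generate()` is unavailable in the installed NLTK version. To still experiment with random text, one option is a small bigram chain: record which words follow each word, then repeatedly pick a random successor. The sketch below is an illustrative stand-in, not NLTK's implementation; `build_bigram_table`, `generate_text`, and their parameters are hypothetical names.

```python
import random
from collections import defaultdict


def build_bigram_table(tokens):
    """Map each token to the list of tokens that follow it in the text."""
    table = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        table[current].append(following)
    return table


def generate_text(tokens, length=20, seed=None):
    """Produce `length` tokens by walking the bigram table at random."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    table = build_bigram_table(tokens)
    word = rng.choice(tokens)
    output = [word]
    while len(output) < length:
        followers = table.get(word)
        # Dead end (a token that is never followed by anything): restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(tokens)
        output.append(word)
    return " ".join(output)


sample = "In the beginning God created the heaven and the earth .".split()
print(generate_text(sample, length=12, seed=0))
```

With a real corpus the token list could be any sequence of strings, for example `list(text3)` once the NLTK data is loaded.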

 

# Counting vocabulary

>>> len(text3)  # Genesis contains 44,764 tokens (words and punctuation symbols)

44764

 

 

# Get the sorted list of vocabulary items

>>> sorted(set(text3))

['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel',
'Abelmizraim', 'Abidah', 'Abide', 'Abimael', 'Abimelech', 'Abr', 'Abrah', 'Abraham',
'Abram', 'Accad', 'Achbor', 'Adah', 'Adam', 'Adbeel', 'Admah', 'Adullamite', 'After',
'Aholibamah', 'Ahuzzath', 'Ajah', 'Akan', 'All', 'Allonbachuth', 'Almighty', 'Almodad',
'Also', 'Alvah', 'Alvan', 'Am', 'Amal', 'Amalek', 'Amalekites', 'Ammon', 'Amorite',
'Amorites', 'Amraphel', ...]
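The two calls above measure different things: `len(text3)` counts every token, repeats included, while `set(text3)` keeps each distinct item once. `sorted` then orders the items by character code, which is why punctuation comes first, followed by capitalized words, then lowercase words. A small self-contained sketch on a toy token list (the helper name `vocabulary_summary` is hypothetical):

```python
def vocabulary_summary(tokens):
    """Return (number of tokens, number of distinct items, sorted distinct items)."""
    distinct = sorted(set(tokens))  # set() drops repeats, sorted() orders by code point
    return len(tokens), len(distinct), distinct


sample = ["In", "the", "beginning", "God", "created", "the",
          "heaven", "and", "the", "earth", "."]
n_tokens, n_distinct, items = vocabulary_summary(sample)
print(n_tokens)    # 11
print(n_distinct)  # 9
print(items)       # ['.', 'God', 'In', 'and', 'beginning', 'created', 'earth', 'heaven', 'the']
```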
