What can Python's NLTK do? What does generate() do when using NLTK in Python?

Latest recommended article published on 2022-11-07 20:31:03

weixin_39856630


I've been working with NLTK for the past three days to get familiar with it, and reading the "Natural Language Processing" book to understand what's going on. Could someone clarify the following passage for me?

Note that the first time you run this command, it is slow because it gathers statistics about word sequences. Each time you run it, you will get different output text. Now try generating random text in the style of an inaugural address or an Internet chat room. Although the text is random, it re-uses common words and phrases from the source text and gives us a sense of its style and content. (What is lacking in this randomly generated text?)

This passage, from chapter 1, says only that generate() "gathers statistics" and that you will get "different output text".

What specifically does generate() do, and how does it work?

This example of generate() uses text3, which is the Bible's Genesis:

In the beginning , between me and thee and in the garden thou mayest
come in unto Noah into the ark , and Mibsam , And said , Is there yet
any portion or inheritance for us , and make thee as Ephraim and as
the sand of the dukes that came with her ; and they were come . Also
he sent forth the dove out of thee , with tabret , and wept upon them
greatly ; and she conceived , and called their names , by their names
after the end of the womb ? And he

Here, generate() seems to simply output phrases created by cutting the text at punctuation marks and randomly reassembling the pieces, yet the result is somewhat readable.

Solution

type(text3) will tell you that text3 is of type nltk.text.Text.

To cite the documentation of Text.generate():

Print random text, generated using a trigram language model.

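That means NLTK has built an n-gram model of the Genesis text: it counts every occurrence of each three-word sequence so that, given any two consecutive words, it can estimate which words are likely to follow them in this text. The next word is sampled at random, weighted by those statistics, rather than always being the single most likely one, which is why the book says each run gives different output. N-gram models are explained in more detail in chapter 5 of the NLTK book.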
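To make the idea concrete, here is a minimal sketch of trigram-based generation in plain Python. It is not NLTK's actual implementation: the names `build_trigram_counts` and `generate`, and the toy sentence standing in for a tokenized corpus, are made up for illustration, and the sketch uses raw counts with no smoothing.

```python
import random
from collections import Counter, defaultdict


def build_trigram_counts(tokens):
    """Map each pair of consecutive words to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
        followers[(first, second)][third] += 1
    return followers


def generate(followers, seed_pair, length=20, rng=None):
    """Extend seed_pair one word at a time, sampling each word by its trigram count."""
    if rng is None:
        rng = random.Random()  # unseeded, so each run can differ
    words = list(seed_pair)
    while len(words) < length:
        candidates = followers.get((words[-2], words[-1]))
        if not candidates:  # this pair was never followed by anything in the text
            break
        choices, weights = zip(*candidates.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return words


# Toy stand-in for a tokenized corpus (hypothetical text, not Genesis itself).
tokens = (
    "and he said unto them , and he went unto the land , "
    "and he said unto her , and she went unto the well ."
).split()

followers = build_trigram_counts(tokens)
print(dict(followers[("and", "he")]))  # {'said': 2, 'went': 1}
print(" ".join(generate(followers, ("and", "he"), length=15)))
```

Every three-word window in the output occurs somewhere in the source, which is probably why the result looks like phrases stitched together at punctuation: the model only ever looks two words back, so local word order is plausible while the sentence as a whole wanders.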