1. Load the movie reviews, each labeled as positive (pos) or negative (neg).
The code is as follows:
```python
>>> import nltk
>>> import random
>>> from nltk.corpus import movie_reviews
>>> # First run only: nltk.download('movie_reviews')
>>> documents = [(list(movie_reviews.words(fileid)), category)
...              for category in movie_reviews.categories()
...              for fileid in movie_reviews.fileids(category)]
>>> random.shuffle(documents)
```
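Each entry of `documents` pairs a review's word list with its label. The stdlib sketch below builds a list of the same shape from a tiny hypothetical corpus (`toy_corpus` stands in for `movie_reviews`). It also seeds the shuffle so the later train/test split is reproducible. The fixed seed is an assumption on my part; the original code shuffles without one.

```python
import random

# Hypothetical stand-in for movie_reviews: {category: [token lists]}
toy_corpus = {
    "pos": [["a", "wonderful", "film"], ["great", "acting"]],
    "neg": [["a", "dull", "plot"], ["bad", "acting"]],
}

# Same shape as the NLTK version: a list of (list_of_words, category) pairs
documents = [(list(words), category)
             for category in sorted(toy_corpus)
             for words in toy_corpus[category]]

# A seeded Random instance makes the shuffle (and so the split) repeatable
rng = random.Random(42)
rng.shuffle(documents)

for words, category in documents:
    print(category, words)
```

Shuffling matters because the corpus is grouped by category. Without it, slicing off the first 100 items for testing would give a test set containing only one label.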
2. Extract features from each document: for each of the 2000 most frequent words in the corpus, record whether that word appears in the document.
The code is as follows:
```python
>>> all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
>>> # In NLTK 3 / Python 3, keys() is an unsliceable view not ordered by frequency,
>>> # so take the 2000 most frequent words explicitly
>>> word_features = [w for (w, _) in all_words.most_common(2000)]
>>> def document_features(document):
...     document_words = set(document)
...     features = {}
...     for word in word_features:
...         features['contains(%s)' % word] = (word in document_words)
...     return features
...
```
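The feature extractor turns a word list into a dictionary of binary `contains(word)` features over a fixed vocabulary. Below is a minimal stdlib sketch of the same idea, using `collections.Counter.most_common` to pick the vocabulary. The names `top_words` and `docs` are hypothetical. Unlike the original, this sketch lowercases the document words as well. The NLTK movie review corpus is already lowercase, so the original does not need that step.

```python
from collections import Counter

def top_words(documents, n):
    """Return the n most frequent lowercased words across all documents."""
    counts = Counter(w.lower() for words, _ in documents for w in words)
    return [w for w, _ in counts.most_common(n)]

def document_features(document, word_features):
    """Map 'contains(word)' -> True/False for each selected feature word."""
    document_words = set(w.lower() for w in document)
    return {'contains(%s)' % word: (word in document_words)
            for word in word_features}

# Hypothetical toy data in the same (words, label) shape as `documents`
docs = [(["Great", "acting", "great", "plot"], "pos"),
        (["dull", "plot"], "neg")]
vocab = top_words(docs, 3)
print(vocab)
print(document_features(["A", "dull", "plot"], vocab))
```

Converting the document to a `set` once makes each `word in document_words` check constant time on average. That matters when 2000 feature words are tested against every review.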
3. Train a Naive Bayes classifier on all but the first 100 documents, then measure its accuracy on those 100 held-out documents.
The code is as follows:
```python
>>> featuresets = [(document_features(d), c) for (d, c) in documents]
>>> train_set, test_set = featuresets[100:], featuresets[:100]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print(nltk.classify.accuracy(classifier, test_set))  # varies between runs because of the shuffle
0.73
```
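Roughly, a Naive Bayes classifier estimates P(label) and P(feature value | label) from training counts. It then picks the label with the highest product of those probabilities. The from-scratch sketch below does this for binary `contains(...)` features, as an illustration only. It is not NLTK's code. It uses add-one (Laplace) smoothing, whereas `NaiveBayesClassifier.train` uses `ELEProbDist` by default, which adds 0.5. The helpers `train_nb`, `classify_nb` and `accuracy` and the toy feature dicts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(train_set):
    """Count labels and True-valued features from (feature_dict, label) pairs."""
    label_counts = Counter(label for _, label in train_set)
    # true_counts[label][fname] = training docs with this label where fname is True
    true_counts = defaultdict(Counter)
    fnames = set()
    for feats, label in train_set:
        fnames.update(feats)
        for fname, value in feats.items():
            if value:
                true_counts[label][fname] += 1
    return label_counts, true_counts, fnames

def classify_nb(model, feats):
    """Return the label with the highest log prior + sum of log likelihoods."""
    label_counts, true_counts, fnames = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log P(label)
        for fname in fnames:
            # Add-one smoothed P(fname=True | label); binary feature => n + 2
            p_true = (true_counts[label][fname] + 1) / (n + 2)
            score += math.log(p_true if feats.get(fname, False) else 1 - p_true)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, test_set):
    """Fraction of test items whose predicted label matches the gold label."""
    correct = sum(classify_nb(model, feats) == label for feats, label in test_set)
    return correct / len(test_set)

# Hypothetical toy feature sets in the same (dict, label) shape as featuresets
train = [
    ({"contains(great)": True, "contains(dull)": False}, "pos"),
    ({"contains(great)": True, "contains(dull)": False}, "pos"),
    ({"contains(great)": False, "contains(dull)": True}, "neg"),
    ({"contains(great)": False, "contains(dull)": True}, "neg"),
]
test = [({"contains(great)": True, "contains(dull)": False}, "pos"),
        ({"contains(great)": False, "contains(dull)": True}, "neg")]

model = train_nb(train)
print(accuracy(model, test))
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow. This matters with 2000 features per document. The smoothing keeps every probability strictly between 0 and 1, so `math.log` never sees zero.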