Example: document classification with NLTK and Python

1. Load the movie reviews and label each one as pos or neg.
    The code is below:


    >>> import nltk
    >>> import random
    >>> from nltk.corpus import movie_reviews  # requires nltk.download('movie_reviews')
    >>> # Pair each review's word list with its category label ('pos' or 'neg')
    >>> documents = [(list(movie_reviews.words(fileid)), category)
    ...              for category in movie_reviews.categories()
    ...              for fileid in movie_reviews.fileids(category)]
    >>> random.shuffle(documents)  # mix pos and neg reviews before splitting
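
To make the shape of `documents` concrete, here is a minimal sketch on made-up data. The `toy_documents` list and the seed value are assumptions for illustration and are not part of the movie_reviews corpus. Seeding a `random.Random` instance is one way to make the shuffle repeatable.

```python
import random

# Hypothetical stand-in for the corpus: each item is (list_of_words, label)
toy_documents = [
    (["a", "great", "film"], "pos"),
    (["a", "dull", "film"], "neg"),
    (["great", "acting"], "pos"),
    (["dull", "plot"], "neg"),
]

# A seeded generator gives the same order on every run (the seed 42 is arbitrary)
rng = random.Random(42)
shuffled = list(toy_documents)
rng.shuffle(shuffled)

# Shuffling reorders the pairs but never separates a word list from its label
labels = sorted(label for (_, label) in shuffled)
print(labels)
```

A fixed seed is useful when you want to compare two runs on the same train/test split.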

2. Extract features from each document: for each of the 2000 most frequent words in the corpus, record whether the document contains it.
    The code is below:


    >>> all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
    >>> # In Python 3 / NLTK 3, keys() cannot be sliced and is not sorted by frequency
    >>> word_features = [w for (w, count) in all_words.most_common(2000)]
    >>> def document_features(document):
    ...     document_words = set(document)  # set lookup is faster than list lookup
    ...     features = {}
    ...     for word in word_features:
    ...         features['contains(%s)' % word] = (word in document_words)
    ...     return features
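
The feature extractor can be tried without downloading the corpus. The sketch below runs the same idea on a tiny made-up word list. `toy_words`, `toy_word_features`, and `toy_document_features` are hypothetical names, and `collections.Counter` stands in for `nltk.FreqDist` as an assumption of this example.

```python
from collections import Counter

# Hypothetical toy corpus standing in for movie_reviews.words()
toy_words = ["great", "film", "great", "plot", "dull", "film", "great"]

# Counter plays the role of nltk.FreqDist here (an assumption for this sketch)
freq = Counter(w.lower() for w in toy_words)
toy_word_features = [w for (w, _count) in freq.most_common(2)]

def toy_document_features(document, vocabulary):
    """Map each vocabulary word to True/False: does the document contain it?"""
    document_words = set(document)
    return {"contains(%s)" % word: (word in document_words) for word in vocabulary}

features = toy_document_features(["a", "great", "movie"], toy_word_features)
print(features)  # {'contains(great)': True, 'contains(film)': False}
```

Each document becomes a dictionary with one boolean per vocabulary word, which is the featureset shape the classifier is trained on in the next step.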

3. Train the classifier and test it on held-out documents. The exact accuracy varies from run to run because the documents are shuffled.
    The code is below:


    >>> featuresets = [(document_features(d), c) for (d, c) in documents]
    >>> # Hold out the first 100 reviews for testing; train on the rest
    >>> train_set, test_set = featuresets[100:], featuresets[:100]
    >>> classifier = nltk.NaiveBayesClassifier.train(train_set)
    >>> print(nltk.classify.accuracy(classifier, test_set))
    0.73
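
The accuracy figure is the fraction of test items whose predicted label matches the true label. The sketch below re-implements that calculation in plain Python. The `KeywordClassifier` class, the local `accuracy` helper, and the toy test set are hypothetical illustrations, not NLTK code.

```python
class KeywordClassifier:
    """Hypothetical toy classifier: predicts 'pos' if the 'great' feature is set."""

    def classify(self, featureset):
        return "pos" if featureset.get("contains(great)") else "neg"

def accuracy(classifier, gold):
    """Fraction of (featureset, label) pairs whose label the classifier predicts."""
    correct = sum(1 for (fs, label) in gold if classifier.classify(fs) == label)
    return correct / len(gold)

toy_test_set = [
    ({"contains(great)": True}, "pos"),
    ({"contains(great)": False}, "neg"),
    ({"contains(great)": True}, "neg"),   # misclassified by the toy rule
    ({"contains(great)": False}, "neg"),
]

score = accuracy(KeywordClassifier(), toy_test_set)
print(score)  # 0.75, since 3 of the 4 items are labeled correctly
```

With only 100 test reviews, as in the split above, each single misclassification moves the reported accuracy by 0.01.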


