Movie Recommendation with NLP in Python

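This code example shows how to use the movie_reviews corpus from the NLTK library to train a naive Bayes classifier on positive and negative reviews for sentiment analysis. The data is first split into a training set and a test set, with 80% used for training; predictions on the test set then reach an accuracy of 0.735. The example also applies the model to a simulated business scenario, classifying the sentiment of four movie reviews.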
Sentiment analysis

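We analyze the documents in the movie_reviews corpus and train a natural-language model on its positive and negative reviews to perform sentiment analysis.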
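To make the idea concrete before the NLTK script, here is a minimal pure-Python sketch of how a naive Bayes classifier can score word-presence features. It is not NLTK's implementation: the helper names `train_toy_naive_bayes` and `classify_toy`, the add-one smoothing, and the four-document toy training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_toy_naive_bayes(labeled_samples):
    """Count labels and per-label word occurrences.

    labeled_samples is a list of (feature_dict, label) pairs, where the
    feature dict maps each word present in the document to True.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> documents containing it
    vocabulary = set()
    for features, label in labeled_samples:
        label_counts[label] += 1
        for word in features:
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify_toy(model, features):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Start from the log prior of the label.
        score = math.log(n_docs / total_docs)
        for word in features:
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothed estimate of P(word present | label)
            # (an assumption of this sketch, not necessarily NLTK's estimator).
            p = (word_counts[label][word] + 1) / (n_docs + 2)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data in the same (features, label) shape the script uses.
toy_train = [
    ({'great': True, 'fun': True}, 'POSITIVE'),
    ({'great': True, 'moving': True}, 'POSITIVE'),
    ({'dull': True, 'boring': True}, 'NEGATIVE'),
    ({'dull': True, 'mess': True}, 'NEGATIVE'),
]
toy_model = train_toy_naive_bayes(toy_train)
print(classify_toy(toy_model, {'great': True, 'movie': True}))
print(classify_toy(toy_model, {'dull': True}))
```

The point of the sketch is the data shape: each document becomes a dict of `word -> True`, paired with its label, which is the same shape the script passes to `NaiveBayesClassifier.train`.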

"""
demo08_movie_reviews.py  movie recommendation
"""
import nltk.corpus as nc
import nltk.classify as cf
import nltk.classify.util as cu
import numpy as np

# Holds the positive samples
pdata = []
# Read the pos folder inside the corpus's movie_reviews folder
# and return the file id of every file in it
fileids = nc.movie_reviews.fileids('pos')
# Walk through each file and store its contents in pdata
for fileid in fileids:
	sample = {}
	# Tokenize the full text to get the list of words
	words = nc.movie_reviews.words(fileid)
	for word in words:
		sample[word] = True
	pdata.append((sample, 'POSITIVE'))

# Holds the negative samples
ndata = []
# Read the neg folder inside the corpus's movie_reviews folder
# and return the file id of every file in it
fileids = nc.movie_reviews.fileids('neg')
# Walk through each file and store its contents in ndata
for fileid in fileids:
	sample = {}
	# Tokenize the full text to get the list of words
	words = nc.movie_reviews.words(fileid)
	for word in words:
		sample[word] = True
	ndata.append((sample, 'NEGATIVE'))

# Split into training and test sets (first 80% of each class for training)
pnumb, nnumb = \
	int(len(pdata)*0.8), int(len(ndata)*0.8)
train_data = pdata[:pnumb] + ndata[:nnumb]
test_data = pdata[pnumb:] + ndata[nnumb:]
print(np.array(train_data).shape)
print(np.array(test_data).shape)

# Train a naive Bayes model on the training data, then measure accuracy on the test data
model=cf.NaiveBayesClassifier.train(train_data)
ac = cu.accuracy(model, test_data)
print(ac)

# Simulate a business scenario
reviews = [
 'It is an amazing movie. ',
 'This is a dull movie, I would never \
  recommend it to anyone. ', 
 'The cinematography is pretty great \
  in this movie. ', 
 'The direction was terrible and the story \
  was all over the place. ']

for review in reviews: 
	sample = {}
	words = review.split()  # crude tokenization: split on whitespace only
	for word in words:
		sample[word] = True
	# classify works like a predict method: it predicts the class of a sample
	pred_y = model.classify(sample)
	print(review, '->', pred_y)

Output:

(1600, 2)
(400, 2)
0.735
It is an amazing movie.  -> POSITIVE
This is a dull movie, I would never   recommend it to anyone.  -> NEGATIVE
The cinematography is pretty great   in this movie.  -> POSITIVE
The direction was terrible and the story   was all over the place.  -> NEGATIVE
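One detail in the script is worth a closer look. `review.split()` keeps capitalization and leaves punctuation attached to words, so it produces tokens such as `It` and `movie.`. To my understanding, the movie_reviews text is lowercased and NLTK's corpus reader splits punctuation into separate tokens, so such features may not line up with anything the classifier saw in training. The sketch below is one possible normalization, not part of the original script: `review_to_features` is a hypothetical helper, and the regular expression is an assumption chosen to mimic word/punctuation splitting.

```python
import re


def review_to_features(text):
    """Lowercase the text, split it into word and punctuation tokens,
    and return the word -> True feature dict used by the classifier."""
    # Runs of word characters, or runs of non-space punctuation.
    tokens = re.findall(r"\w+|[^\w\s]+", text.lower())
    return {token: True for token in tokens}


features = review_to_features('It is an amazing movie. ')
print(sorted(features))
```

With a helper like this, the prediction loop could call `model.classify(review_to_features(review))` instead of building `sample` by hand. Whether this changes any of the four predictions above has not been checked here.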