Python review sentiment analysis with NLTK (nltk31): Twitter sentiment analysis

Four pickle files have already been generated: documents, word_features, originalnaivebayes5k, and featuresets.

Of these, featuresets is the largest, at over 300 MB. If the feature set is expanded beyond 5,000 features, the file grows even larger, but accuracy also improves.
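The size tradeoff is easy to see: each entry in featuresets is a dict with one boolean per feature word, so thousands of documents times 5,000 features adds up quickly. A minimal sketch of the dump/load pattern used for all four files, with toy stand-in data (the names below mirror the real objects but hold only a few items):

```python
import pickle

# Toy stand-ins; the real word_features list holds ~5,000 words
word_features = ["good", "bad", "great", "boring"]
featuresets = [({w: (w in "a good movie".split()) for w in word_features}, "pos")]

# Dump to disk, then reload -- the same pattern the module uses for all four pickles
with open("featuresets_demo.pickle", "wb") as f:
    pickle.dump(featuresets, f)

with open("featuresets_demo.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored[0][1])  # label of the first (features, label) pair
```

Because every document carries a full 5,000-key dict, trimming the feature list is the most direct way to shrink the pickle.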

https://www.pythonprogramming.net/sentiment-analysis-module-nltk-tutorial/

Creating a module for Sentiment Analysis with NLTK

```python
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 14 09:59:09 2017

@author: daxiong
"""
# File: sentiment_mod.py
import nltk
import random
import pickle
from nltk.tokenize import word_tokenize

# Load the pickled document list
documents_f = open("documents.pickle", "rb")
documents = pickle.load(documents_f)
documents_f.close()

# Load the 5,000-word feature list
word_features5k_f = open("word_features5k.pickle", "rb")
word_features = pickle.load(word_features5k_f)
word_features5k_f.close()

def find_features(document):
    words = word_tokenize(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

# Load the precomputed feature sets
featuresets_f = open("featuresets.pickle", "rb")
featuresets = pickle.load(featuresets_f)
featuresets_f.close()

random.shuffle(featuresets)
print(len(featuresets))

testing_set = featuresets[10000:]
training_set = featuresets[:10000]

# Load the trained Naive Bayes classifier
open_file = open("originalnaivebayes5k.pickle", "rb")
classifier = pickle.load(open_file)
open_file.close()

def sentiment(text):
    feats = find_features(text)
    return classifier.classify(feats)

def sentiment_test(text):
    feats = find_features(text)
    value = classifier.classify(feats)
    if value == "pos":
        print("Positive review")
    else:
        print("Negative review")

def sentiment_inputTest():
    text = input("Please enter a comment: ")
    feats = find_features(text)
    value = classifier.classify(feats)
    if value == "pos":
        print("Positive review")
    else:
        print("Negative review")

print(sentiment("This movie was awesome! The acting was great, plot was wonderful, and there were pythons...so yea!"))
print(sentiment("This movie was utter junk. There were absolutely 0 pythons. I don't see what the point was at all. Horrible movie, 0/10"))
```

Test results

Fairly accurate overall, but "the movie is good" is misclassified. The algorithm needs improvement; consider frequency analysis and filtering out junk words to raise accuracy.
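One way to act on that idea is to rank candidate words by frequency and drop the junk words before fixing the feature list. A minimal sketch using nltk's FreqDist with a hand-picked junk-word list (toy data; a real pipeline would feed in the movie-review word stream and could use nltk.corpus.stopwords instead):

```python
from nltk import FreqDist

# Toy word stream standing in for the movie-review corpus
all_words = "the movie was good the plot was good the acting was bad".split()

# Hand-picked junk words; a real pipeline could use nltk.corpus.stopwords instead
junk_words = {"the", "was", "a", "an", "and"}

# Count only the informative words, then keep the most frequent ones as features
freq = FreqDist(w for w in all_words if w not in junk_words)
word_features = [w for w, _ in freq.most_common(3)]
print(word_features)  # most frequent non-junk words, 'good' first
```

Swapping this filtered list in for the raw top-5,000 words keeps the feature set small while discarding words that carry no sentiment signal.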

Below is a simple example of sentiment analysis with nltk. First, import nltk and the required corpora:

```python
import nltk
from nltk.corpus import twitter_samples
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.sentiment import SentimentIntensityAnalyzer
```

Next, load the training data, here using nltk's built-in Twitter corpus:

```python
nltk.download('twitter_samples')
positive_tweets = twitter_samples.strings('positive_tweets.json')
negative_tweets = twitter_samples.strings('negative_tweets.json')
```

Then preprocess the text: tokenization, stopword removal, and stemming:

```python
nltk.download('stopwords')
nltk.download('punkt')  # required by word_tokenize
stop_words = stopwords.words('english')
stemmer = PorterStemmer()

def preprocess(tweet):
    tweet = tweet.lower()  # lowercase
    tweet_tokens = word_tokenize(tweet)  # tokenize
    tweet_tokens = [token for token in tweet_tokens if token.isalpha()]  # drop non-alphabetic tokens
    tweet_tokens = [token for token in tweet_tokens if token not in stop_words]  # drop stopwords
    tweet_tokens = [stemmer.stem(token) for token in tweet_tokens]  # stem
    return tweet_tokens

positive_tweets = [preprocess(tweet) for tweet in positive_tweets]
negative_tweets = [preprocess(tweet) for tweet in negative_tweets]
```

Next, use the sentiment analyzer to judge a text's polarity:

```python
nltk.download('vader_lexicon')  # required by SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

def analyze_sentiment(tweet):
    neg_score = sia.polarity_scores(tweet)['neg']  # negative sentiment score
    pos_score = sia.polarity_scores(tweet)['pos']  # positive sentiment score
    if neg_score > pos_score:
        return "Negative"
    elif pos_score > neg_score:
        return "Positive"
    else:
        return "Neutral"

tweet = "I'm so happy today!"
sentiment = analyze_sentiment(tweet)
print(sentiment)
```

Finally, decisions can be made based on the sentiment result. Note that this is only a simple example; real applications typically need more data preprocessing and tuning.