Python Text Sentiment Classification

A simple text sentiment classification task is really just binary classification. This post walks through text sentiment classification with scikit-learn. Classification proceeds in two steps: 1) training, where the model's rules are learned from the training set; 2) classification, where the model is first evaluated on a labeled test set (accuracy and related metrics), and, if the results are acceptable, is then used to predict unlabeled samples.
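The two phases above can be sketched on a toy numeric data set (the article itself works on text; the data and names here are illustrative only):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# toy binary data: the feature is the number itself, the label is "greater than 4"
X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# step 1: learn the model's rules from the training split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# step 2: evaluate on the held-out test split before predicting unlabeled samples
print('test accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```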

First, a look at the sample set. The samples are hotel reviews that have already been word-segmented: the first column is the label and the second column is the review text, with positive reviews in the first half of the file and negative reviews in the second half. The format looks like this:

(image: excerpt of the labeled, word-segmented hotel-review training data)
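Concretely, each line starts with a one-character label followed by the segmented review. A small parsing sketch (the two review lines are invented, and 1 = positive / 0 = negative is assumed from the reporting code later in the post):

```python
# two invented sample lines in the described format
sample = "1 房間 乾淨 服務 周到\n0 設施 陳舊 隔音 很差\n"

labels, data = [], []
for line in sample.splitlines():
    labels.append(int(line[0:1]))   # first character is the label
    data.append(line[1:].strip())   # the rest is the segmented review

print(labels)  # [1, 0]
```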

Below, several classifiers are implemented: SVM, naive Bayes, logistic regression, decision tree, random forest, and KNN. The main code is as follows:

#coding:utf-8
import numpy as np
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

#======== SVM ========#
def SvmClass(x_train, y_train):
    from sklearn.svm import SVC
    # linear kernel (default is 'rbf'); probability=True enables predict_proba
    clf = SVC(kernel='linear', probability=True)
    clf.fit(x_train, y_train)  # supervised models use fit(X, y); unsupervised use fit(X)
    return clf

#======== Naive Bayes ========#
def NbClass(x_train, y_train):
    from sklearn.naive_bayes import MultinomialNB
    clf = MultinomialNB(alpha=0.01).fit(x_train, y_train)
    return clf

#======== Logistic Regression ========#
def LogisticClass(x_train, y_train):
    from sklearn.linear_model import LogisticRegression
    clf = LogisticRegression(penalty='l2')
    clf.fit(x_train, y_train)
    return clf

#======== KNN ========#
def KnnClass(x_train, y_train):
    from sklearn.neighbors import KNeighborsClassifier
    clf = KNeighborsClassifier()
    clf.fit(x_train, y_train)
    return clf

#======== Decision Tree ========#
def DecisionClass(x_train, y_train):
    from sklearn import tree
    clf = tree.DecisionTreeClassifier()
    clf.fit(x_train, y_train)
    return clf

#======== Random Forest ========#
def random_forest_class(x_train, y_train):
    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=8)  # n_estimators sets the number of trees
    clf.fit(x_train, y_train)
    return clf

#======== Precision / recall report ========#
def Precision(clf, x_test, y_test):
    doc_class_predicted = clf.predict(x_test)
    print(np.mean(doc_class_predicted == y_test))  # fraction of predictions matching the true labels
    # precision and recall across thresholds, using the positive-class probability
    answer = clf.predict_proba(x_test)[:, 1]
    precision, recall, thresholds = precision_recall_curve(y_test, answer)
    report = answer > 0.5
    print(classification_report(y_test, report, target_names=['neg', 'pos']))
    print("--------------------")
    print('Accuracy: %.2f' % accuracy_score(y_test, doc_class_predicted))

if __name__ == '__main__':
    data = []
    labels = []
    # each line: the first character is the label, the rest is the segmented review
    with open("train2.txt", "r") as file:
        for line in file:
            labels.append(int(line[0:1]))
            data.append(line[1:])
    x = np.array(data)
    movie_target = np.array(labels)
    # convert the text into TF-IDF feature vectors
    count_vec = TfidfVectorizer(binary=False)
    # split the data set: 80% training, 20% testing
    x_train, x_test, y_train, y_test = train_test_split(x, movie_target, test_size=0.2)
    x_train = count_vec.fit_transform(x_train)
    x_test = count_vec.transform(x_test)
    print('************** SVM ************')
    Precision(SvmClass(x_train, y_train), x_test, y_test)
    print('************** Naive Bayes ************')
    Precision(NbClass(x_train, y_train), x_test, y_test)
    print('************** KNN ************')
    Precision(KnnClass(x_train, y_train), x_test, y_test)
    print('************** Logistic Regression ************')
    Precision(LogisticClass(x_train, y_train), x_test, y_test)
    print('************** Decision Tree ************')
    Precision(DecisionClass(x_train, y_train), x_test, y_test)
    print('************** Random Forest ************')
    Precision(random_forest_class(x_train, y_train), x_test, y_test)

The results are as follows:

(image: console output showing accuracy, precision, and recall for each classifier)

The complete code and corpus are available for download.
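One detail in the script worth noting: the vectorizer is fit only on the training split (fit_transform) and merely applied to the test split (transform), so the test vocabulary never leaks into training. A minimal sketch of that split, with invented documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["房間 乾淨 舒適", "設施 陳舊 隔音 很差"]
test_docs = ["乾淨 安靜"]  # "安靜" never appears in the training data

vec = TfidfVectorizer(binary=False)
X_train = vec.fit_transform(train_docs)  # learns the vocabulary from training data only
X_test = vec.transform(test_docs)        # reuses that vocabulary; unseen words are dropped

# both matrices therefore share the same column space
print(X_train.shape[1] == X_test.shape[1])  # True
```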
