Python TextBlob: TextBlob and text classification

I'm trying to build a text classification model with Python and TextBlob. The script is running on my server, and the idea is that in the future users will be able to submit their text and have it classified.

I'm loading the training set from a CSV file:

# -*- coding: utf-8 -*-
import sys

from textblob.classifiers import NaiveBayesClassifier

# Send everything printed below to a file instead of the console.
sys.stdout = open('yyyyyyyyy.txt', 'w')

# Train on the CSV file; each row holds the text followed by its label.
with open('file.csv', 'r', encoding='latin-1') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

print(cl.classify("some text"))

The CSV is about 500 lines long (each string is between 10 and 100 characters), and NaiveBayesClassifier needs about 2 minutes to train before it can classify my text. I'm not sure whether it is normal for it to take that long; maybe my server, which has only 512 MB of RAM, is just slow.

Example of a CSV line:

"Oggi alla Camera con la Fondazione Italia-Usa abbiamo consegnato a 140 studenti laureati con 110 e 110 lode i diplomi del Master in Marketing Comunicazione e Made in Italy.",FI-PDL

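To make the expected layout concrete, here is a minimal sketch that parses the sample line with the standard `csv` module, assuming every row has exactly two columns (the text, then the label). `SAMPLE` and `read_rows` are hypothetical names used only for illustration; they are not part of TextBlob.

```python
import csv
import io

# The sample row from the question: quoted text first, label second.
SAMPLE = (
    '"Oggi alla Camera con la Fondazione Italia-Usa abbiamo consegnato a 140 '
    'studenti laureati con 110 e 110 lode i diplomi del Master in Marketing '
    'Comunicazione e Made in Italy.",FI-PDL\n'
)


def read_rows(fp):
    """Return a list of (text, label) tuples from a two-column CSV file object."""
    # Skip completely empty lines; every other row must have two columns.
    return [(row[0], row[1]) for row in csv.reader(fp) if row]


rows = read_rows(io.StringIO(SAMPLE))
text, label = rows[0]
print(label)  # FI-PDL
```

Running the same function over the real file is a cheap way to spot rows with a missing label or an unquoted comma before spending time on training.
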
What is not clear to me, and I can't find an answer in the TextBlob documentation, is whether there is a way to save my trained classifier (and so save a lot of time), because right now the classifier is trained again every time I run the script.

I'm new to text classification and machine learning, so my apologies if this is a dumb question.

Thanks in advance.

Solution

OK, I found that the pickle module is what I need :)

Training:

# -*- coding: utf-8 -*-
import pickle

from textblob.classifiers import NaiveBayesClassifier

# Train once on the CSV file.
with open('file.csv', 'r', encoding='latin-1') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

# Save the trained classifier to disk.
with open('classifier.pickle', 'wb') as f:
    pickle.dump(cl, f)

Loading:

import pickle
import sys

# Send everything printed below to a file instead of the console.
sys.stdout = open('demo.txt', 'w')

# Load the saved classifier; textblob must still be installed for unpickling.
with open('classifier.pickle', 'rb') as f:
    cl = pickle.load(f)

print(cl.classify("text to classify"))
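
The two scripts above can also be folded into one helper that trains only when no pickle file exists yet. This is a minimal sketch, not part of TextBlob: `load_or_train` and its `train` argument are hypothetical names, and `train` is assumed to be any function you supply that returns a picklable model. Only the standard `os` and `pickle` modules are used.

```python
import os
import pickle


def load_or_train(path, train):
    """Return the object pickled at `path`, or build it with `train()` and save it.

    `load_or_train` and `train` are hypothetical names used for illustration.
    """
    if os.path.exists(path):
        # A saved model exists: load it instead of training again.
        with open(path, "rb") as fh:
            return pickle.load(fh)
    model = train()
    # Save the freshly trained model for the next run.
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return model
```

With TextBlob, `train` could be a function that opens file.csv and returns NaiveBayesClassifier(fp, format="csv"), so the slow step runs only on the first call. Delete the pickle file whenever the training data changes, and only unpickle files you created yourself, because pickle.load can run arbitrary code from an untrusted file.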
