Saving a model in Python: how to use a saved model for prediction in Python

I am doing text classification in Python and want to use the model in a production environment to make predictions on new documents. I am using TfidfVectorizer to build the bag-of-words representation.

I am doing:

X_train = vectorizer.fit_transform(clean_documents_for_train, classLabel).toarray()

Then I run cross-validation and build the model using an SVM. After that, I save the model.

To make predictions on my test data, I load that model in another script, where I have the same TfidfVectorizer. I know I can't call fit_transform on my testing data; I have to do:

X_test = vectorizer.transform(clean_test_documents).toarray()

But this is not possible, because the vectorizer has to be fitted first. I know one way around it: I can load my training data and call fit_transform as I did when building the model. But my training data is very large, and I can't do that every time I want to predict. So my questions are:

Is there a way I can use TfidfVectorizer on my test data and perform prediction?

Is there any other way to perform prediction?

Solution

The vectorizer is part of your model. When you save your trained SVM model, you need to also save the corresponding vectorizer.
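As a minimal sketch of that idea, the fitted vectorizer and the fitted classifier can each be dumped to disk and reloaded in the prediction script, where only transform is called. The toy documents, labels, and file names below are made up for illustration, and the standalone joblib package is assumed to be installed.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Toy training data, invented for this example.
train_docs = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project status meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Training side: fit the vectorizer and the classifier, then save both.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # sparse matrix; SVC accepts it
clf = SVC()
clf.fit(X_train, train_labels)

model_dir = tempfile.mkdtemp()  # hypothetical location for the saved files
vectorizer_path = os.path.join(model_dir, "vectorizer.pkl")
clf_path = os.path.join(model_dir, "svm.pkl")
joblib.dump(vectorizer, vectorizer_path)
joblib.dump(clf, clf_path)

# Prediction side: load both objects and call transform, never fit_transform.
loaded_vectorizer = joblib.load(vectorizer_path)
loaded_clf = joblib.load(clf_path)
X_new = loaded_vectorizer.transform(["buy cheap pills"])
prediction = loaded_clf.predict(X_new)
print(prediction)
```

The loaded vectorizer carries the vocabulary and IDF weights learned during training, so the new document is mapped into the same feature space without touching the training data again.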

To make this more convenient, you can use Pipeline to construct a single "fittable" object that represents the steps needed to transform raw input to prediction output. In this case, the pipeline consists of a Tf-Idf extractor and an SVM classifier:

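from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm
from sklearn.pipeline import Pipeline

# Chain feature extraction and classification into one estimator.
vectorizer = TfidfVectorizer()
clf = svm.SVC()
tfidf_svm = Pipeline([('tfidf', vectorizer), ('svc', clf)])

# load_training_data() stands in for your own loader; it returns raw
# documents and their labels.
documents, y = load_training_data()

# Fits the vectorizer and the SVM in one call.
tfidf_svm.fit(documents, y)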

This way, only a single object needs to be persisted:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

joblib.dump(tfidf_svm, 'model.pkl')

To apply the model on your testing document, load the trained pipeline and simply use its predict function as usual with raw document(s) as input.
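A minimal sketch of the prediction side follows. So that it runs on its own, it first trains and saves a tiny pipeline on made-up data; in practice that part lives in the training script, and the prediction script starts at joblib.load. The documents, labels, and temporary file path are illustrative assumptions, and the standalone joblib package is assumed to be installed.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Setup (normally done once in the training script): toy data, invented here.
train_docs = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project status meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", SVC())])
pipeline.fit(train_docs, train_labels)

model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical path
joblib.dump(pipeline, model_path)

# Prediction script: load the pipeline and pass raw text straight to predict.
loaded_pipeline = joblib.load(model_path)
new_docs = ["buy cheap pills today", "notes from the status meeting"]
predictions = loaded_pipeline.predict(new_docs)
print(predictions)
```

Because the pipeline applies the fitted TF-IDF step internally, the prediction script never calls fit or fit_transform and never needs the training data.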
