Datawhale | Natural Language Processing (6): SVM

orient928

Category: Datawhale | Natural Language Processing

Article link: https://blog.csdn.net/orient928/article/details/89374099

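Preface:

I already summarized SVM in an earlier blog post, so I won't repeat it here and will simply link to it. This post only contains the code I ran; thanks for your understanding.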

Contents

I. The SVM algorithm
II. Text classification with SVM and TF-IDF

I. The SVM algorithm

https://blog.csdn.net/orient928/article/details/89220862

II. Text classification with SVM and TF-IDF

1. Load the data

from sklearn.datasets import fetch_20newsgroups
import numpy as np
import pandas as pd

# The first call downloads the dataset and caches it locally
data = fetch_20newsgroups()

# Restrict the task to four of the twenty newsgroups
categories = [
    "sci.space",              # science: space
    "rec.sport.hockey",       # sports: hockey
    "talk.politics.guns",     # politics: guns
    "talk.politics.mideast",  # politics: Middle East
]
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

2. Encode the text with TF-IDF

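from sklearn.feature_extraction.text import TfidfVectorizer as TFIDF

Xtrain = train.data
Xtest = test.data
Ytrain = train.target
Ytest = test.target

# Learn the vocabulary and IDF weights from the training set only,
# then apply the same mapping to the test set
tfidf = TFIDF().fit(Xtrain)
Xtrain_ = tfidf.transform(Xtrain)
Xtest_ = tfidf.transform(Xtest)
Xtrain_  # sparse matrix: one row per document, one column per term

# Dense view for inspection only.
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
tosee = pd.DataFrame(Xtrain_.toarray(), columns=tfidf.get_feature_names_out())
tosee.head()
tosee.shape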
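
To make the encoding step less of a black box, here is a minimal sketch that recomputes the TF-IDF weights by hand and compares them with TfidfVectorizer. It assumes the vectorizer's default settings (smoothed IDF and L2 row normalization); the three-sentence corpus `docs` is a made-up stand-in for the newsgroup posts, not part of the original code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, used only for illustration
docs = [
    "the shuttle reached orbit",
    "the team won the hockey game",
    "the orbit of the moon",
]

# Raw term counts: one row per document, one column per vocabulary term
tf = CountVectorizer().fit_transform(docs).toarray().astype(float)

# Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
n_docs = tf.shape[0]
df = (tf > 0).sum(axis=0)
idf = np.log((1 + n_docs) / (1 + df)) + 1.0

# Weight the counts by IDF, then scale every row to unit L2 norm
weighted = tf * idf
manual = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

# Compare against the library result computed with default settings
reference = TfidfVectorizer().fit_transform(docs).toarray()
matches = np.allclose(manual, reference)
print("Manual TF-IDF matches TfidfVectorizer:", matches)
```

Terms that occur in every document, such as "the", get the smallest IDF, so they contribute less to each row than rarer terms like "hockey" or "moon".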

3. Build the SVM model

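from sklearn.svm import SVC

# predict_proba is only available when probability=True;
# this runs an internal cross-validation, so fitting is slower
clf = SVC(probability=True)
clf.fit(Xtrain_, Ytrain)
y_pred = clf.predict(Xtest_)        # predicted class index for each test document
proba = clf.predict_proba(Xtest_)   # per-class probability estimates
score = clf.score(Xtest_, Ytest)    # mean accuracy on the test set

print("\tAccuracy:{:.3f}".format(score))
print("\n")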
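
The three steps above can also be chained into a single estimator. The following is a minimal sketch, not the setup used in this post: it assumes a tiny made-up corpus (`train_docs`, `train_labels`) in place of the newsgroup data and swaps in a linear kernel, a common choice for sparse TF-IDF features, whereas the code above uses SVC's default RBF kernel.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the newsgroup posts
# (label 0 = space, label 1 = hockey)
train_docs = [
    "the shuttle reached orbit around the moon",
    "nasa launched a rocket into orbit",
    "the rocket carried a satellite to the moon",
    "the team scored a goal in the hockey game",
    "the goalie stopped the puck on the ice",
    "hockey players skate on ice and shoot the puck",
]
train_labels = [0, 0, 0, 1, 1, 1]

# The pipeline fits the vectorizer and the classifier together, so the
# vocabulary learned during fit is reused automatically at predict time
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),  # assumption: linear kernel, not the post's default
])
model.fit(train_docs, train_labels)

# Evaluate on the training documents only, as a sanity check
train_pred = model.predict(train_docs)
train_accuracy = accuracy_score(train_labels, train_pred)
matrix = confusion_matrix(train_labels, train_pred)

print("Training accuracy: {:.3f}".format(train_accuracy))
print(matrix)
```

With the real data, the same object could be fitted with model.fit(train.data, train.target) and scored with model.score(test.data, test.target), which removes the need to call transform by hand.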
