Author: Irain
QQ: 2573396010
WeChat: 18802080892
Baidu Cloud files: (link: https://pan.baidu.com/s/1Ym_1iLYSzTIZ-ajNFad_kA
extraction code: hlyo)
Video link: Text Topics and Classification: Chinese Text Classification
1 Multinomial Naive Bayes
1.1 Loading the Chinese text
import jieba
import pandas as pd

# Read each news category and drop rows with missing values
df_technology = pd.read_csv("./data/technology_news.csv", encoding='utf-8')
df_technology = df_technology.dropna()
df_car = pd.read_csv("./data/car_news.csv", encoding='utf-8')
df_car = df_car.dropna()
df_entertainment = pd.read_csv("./data/entertainment_news.csv", encoding='utf-8')
df_entertainment = df_entertainment.dropna()

# Take up to 10,000 articles from the content column of each category
technology = df_technology.content.values.tolist()[1000:11000]
car = df_car.content.values.tolist()[1000:11000]
entertainment = df_entertainment.content.values.tolist()[:10000]
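The three loads above repeat the same read, dropna, slice pattern, so it could be wrapped in a small helper. The following is only a sketch: the function name load_contents, its parameters, and the in-memory sample rows (including the category column) are hypothetical and used purely for illustration; only the content column is taken from the code above.

```python
import io

import pandas as pd


def load_contents(source, start=0, stop=None, column="content"):
    """Read a news CSV, drop rows with missing values, and return
    the [start:stop] slice of one column as a plain Python list."""
    df = pd.read_csv(source, encoding="utf-8")
    df = df.dropna()
    return df[column].values.tolist()[start:stop]


# Tiny in-memory stand-in for a file such as ./data/technology_news.csv.
# These rows are made up; the second one has an empty content field.
sample_csv = io.StringIO(
    "category,content\n"
    "technology,first article\n"
    "technology,\n"
    "technology,third article\n"
    "technology,fourth article\n"
)

# The row with missing content is dropped before slicing.
sample = load_contents(sample_csv, start=1, stop=3)
print(sample)
```

With the real files, a call such as load_contents("./data/technology_news.csv", 1000, 11000) would play the role of the technology lines above, assuming the file has a content column.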
1.2 Loading the stopwords
stopwords=pd.read_csv("data/stopwords.txt",index_col=False,quoting=3,sep="\t",names=['stopword'