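Text classification: text preprocessing + feature engineering + training a classification model + calling the classification model + Flask API
Dataset: https://pan.baidu.com/s/10Na-pH5YBGs51TFnZoExdA
Extraction code: mic3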
References: https://blog.csdn.net/u013421629/article/details/87878580
https://blog.csdn.net/qq_33493180/article/details/90238654
https://blog.csdn.net/freeking101/article/details/100174215
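1. Read the news data + text preprocessing + feature engineering + train the classification model + call the classification model + Flask:
Raw data format: 'URL', 'theme', 'content', 'category'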
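To make the column layout concrete, here is a minimal sketch of loading such data with pandas. The inline sample rows, the tab delimiter, and the absence of a header row are assumptions for illustration only; the actual dataset file may be laid out differently.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for the real news file.
# Assumed layout: tab-separated, no header row.
sample = (
    "http://example.com/a\tTitle A\tBody text of article A\tsports\n"
    "http://example.com/b\tTitle B\tBody text of article B\tfinance\n"
)

columns = ["URL", "theme", "content", "category"]
df_news = pd.read_csv(io.StringIO(sample), sep="\t", names=columns)

# Drop rows with missing fields before any further processing
df_news = df_news.dropna()

contents = df_news["content"].tolist()   # texts to tokenize
labels = df_news["category"].tolist()    # class labels
```

For a file on disk, the `io.StringIO(sample)` argument would be replaced by the file path.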
2. Detailed code
Required packages
First, a small example of basic Flask usage
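As a minimal sketch of basic Flask usage (not the original example; the route `/hello`, the function name `hello`, and the `name` query parameter are all hypothetical), a single JSON endpoint can be defined and exercised in-process like this:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/hello", methods=["GET"])
def hello():
    # Read an optional query parameter, e.g. /hello?name=Flask
    name = request.args.get("name", "world")
    return jsonify({"message": f"hello, {name}"})

# Exercise the route in-process with Flask's built-in test client,
# so no server needs to be started for this demonstration
client = app.test_client()
response = client.get("/hello?name=Flask")
payload = response.get_json()
```

To serve the endpoint over HTTP instead, the script would call `app.run()` or be started with the `flask run` command.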
The detailed code follows:
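# Tokenize each document with jieba
import jieba
import pandas as pd

def data_Preprocessing(content, content_S=None):
    # content_S is kept only for compatibility with existing callers; it is rebuilt here
    content_S = []
    for line in content:
        # lcut returns a list; jieba.cut returns a generator that can be consumed only once
        current_segment = jieba.lcut(line)
        content_S.append(current_segment)
    df_content = pd.DataFrame({'content_S': content_S})
    return df_content, content_S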
# Remove stop words
def drop_stopwords(contents, stopwords):
    contents_clean = []  # per-document token lists with stop words removed
    all_words = []       # flat list of every token that was kept
    for line in contents:
        line_clean = []
        for word in line:
            if word in stopwords:
                continue
            line_clean.append(word)
            all_words.append(str(word))
        contents_clean.append(line_clean)
    return contents_clean, all_words
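Because the inner loop tests `word in stopwords` for every token, the container type matters: membership in a `set` is a hash lookup, while a `list` is scanned linearly. A minimal sketch of the same filtering with a set, using made-up tokens and a made-up stop word list:

```python
# Hypothetical tokenized documents and stop words, for illustration only
contents = [["the", "match", "was", "great"], ["the", "market", "fell"]]
stopwords = set(["the", "was"])  # e.g. built from a stop word file, one word per line

contents_clean = []  # per-document token lists with stop words removed
all_words = []       # flat list of every token that was kept
for line in contents:
    kept = [word for word in line if word not in stopwords]
    contents_clean.append(kept)
    all_words.extend(kept)
```

On a large news corpus with a stop word list of a few thousand entries, converting the list to a set once before the loop is usually the cheapest speedup available.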
# Split the dataset into training and test sets
def data_split_train_test(c,l):
x_train,x_test,y_train