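Using the Naive Bayes Algorithm
Tools: PyCharm, Windows 10, Python 3.6.4
1. Problem statement
Use the Naive Bayes algorithm to predict the category of the unlabeled documents in the data below.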
Document  Content                          Category
d1        ball goal cart goal              Sports
d2        theater cart drama               Culture
d3        drama strategy decision drama    Politics
d4        theater ball                     Culture
d5        ball goal player strategy        Sports
d6        theater cart opera               Culture
d7        ball player strategy             ?
d8        theater cart decision            ?
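To make the task concrete, here is a minimal sketch of how the two unlabeled documents could be classified by hand-rolled Naive Bayes. It assumes a multinomial model with Laplace (add-one) smoothing, which is one common variant and not necessarily the one used in the rest of this post. The names `train`, `fit`, and `predict` are illustrative only.

```python
import math
from collections import Counter, defaultdict

# Training documents d1..d6 from the table above: (content, category)
train = [
    ("ball goal cart goal", "Sports"),
    ("theater cart drama", "Culture"),
    ("drama strategy decision drama", "Politics"),
    ("theater ball", "Culture"),
    ("ball goal player strategy", "Sports"),
    ("theater cart opera", "Culture"),
]


def fit(samples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()              # number of documents in each class
    word_counts = defaultdict(Counter)  # word frequencies within each class
    vocab = set()                       # all distinct training words
    for text, label in samples:
        words = text.split()
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def predict(text, class_docs, word_counts, vocab):
    """Return the class with the highest log posterior (assumed multinomial model)."""
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        # log prior: share of training documents that belong to this class
        score = math.log(class_docs[label] / n_docs)
        for w in text.split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing keeps unseen word/class pairs non-zero
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = fit(train)
pred_d7 = predict("ball player strategy", *model)
pred_d8 = predict("theater cart decision", *model)
print("d7:", pred_d7)
print("d8:", pred_d8)
```

Under these assumptions the sketch assigns d7 to Sports and d8 to Culture. Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on longer documents.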
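2. Python code
There are three categories: Culture, Politics, and Sports. We create one folder per category and store each document's Content in a file inside the matching folder, which makes it easy to label the data while iterating over the files. The first step is to build the vocabulary; the code and results are as follows.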
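import re
import numpy as np
import os


def textParse(String):
    # Split on runs of non-word characters and drop empty tokens.
    # A pattern such as r'\W*' can match the empty string, which on
    # Python 3.7+ splits the text into single characters, so r'\W+' is used.
    list_String = [tok for tok in re.split(r'\W+', String) if tok]
    return list_String


def readfiles():
    doc_list = []    # one token list per document
    class_list = []  # class label for each document
    file_lists = ['culture', 'politics', 'sports']  # one folder per category
    for i in range(3):
        for txtfile in os.listdir(file_lists[i] + '/'):
            with open(file_lists[i] + '/' + txtfile, 'r') as f:
                word_list = textParse(f.r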