Machine Learning in Action (3): Naive Bayes 1.0

  • The book is written by Peter Harrington.

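Preface

I am reading the English edition of this book and mostly think in English while reading it, so for convenience I write these notes in English too.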

From Machine Learning in Action, Chapter 4.
This chapter shows how probability theory can be used to classify things. It starts with the simplest probabilistic classifier, adds some simplifying assumptions, and ends with the naive Bayes classifier.

First, remember that probability theory is the basis of many ML algorithms.
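For reference, the rule behind this chapter is Bayes' rule, combined with the "naive" assumption that features are conditionally independent given the class. The notation is my own choice: $c_i$ for class $i$ and $\mathbf{w} = (w_1, \dots, w_n)$ for a document's feature (word) vector.

```latex
p(c_i \mid \mathbf{w}) = \frac{p(\mathbf{w} \mid c_i)\, p(c_i)}{p(\mathbf{w})}
\qquad
p(\mathbf{w} \mid c_i) = \prod_{j=1}^{n} p(w_j \mid c_i) \quad \text{(under the naive independence assumption)}
```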

Pros and Cons

Naive Bayes
  • Pros: works with a small amount of data; handles multiple classes.
  • Cons: sensitive to how the input data is prepared.
  • Key idea: choose the class with the higher probability.
  • Popular in: document-classification problems.
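The key idea fits in a one-line decision rule. In this sketch, `choose_class` is a hypothetical helper and the probabilities are made-up values. In practice they would come from Bayes' rule. The evidence term p(data) is the same for every class, so comparing p(data | class) · p(class) across classes picks the same winner.

```python
def choose_class(posteriors):
    """Return the class label with the highest probability.

    posteriors: dict mapping class label -> p(class | data).
    """
    return max(posteriors, key=posteriors.get)

# Hypothetical probabilities for a single document (not from the book).
example = {0: 0.3, 1: 0.7}
print(choose_class(example))  # prints 1
```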

General Approach to Naive Bayes

  1. Collect: this chapter uses RSS feeds.
  2. Prepare: numeric or Boolean values.
  3. Analyze
  4. Train: calculate the conditional probabilities of the features.
  5. Test: calculate the error rate.
  6. Use: for example, document classification.
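The Train and Test steps can be sketched as follows. This is a minimal Bernoulli-style naive Bayes on 0/1 word vectors, using add-one (Laplace) smoothing and log probabilities. It is my own illustration, not necessarily how the book implements it. The names `train`, `classify` and `error_rate` and the toy data are hypothetical. To stay short, it measures the error rate on its own training data. A real test should use held-out documents.

```python
import math

def train(vectors, labels):
    """Estimate p(c) and p(w_j = 1 | c) from binary word vectors.

    Add-one smoothing, (count + 1) / (n_c + 2), keeps every probability
    strictly between 0 and 1 so the logs below are always finite.
    """
    n_features = len(vectors[0])
    priors, cond = {}, {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        priors[c] = len(rows) / len(vectors)
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(n_features)]
    return priors, cond

def classify(vec, priors, cond):
    """Pick the class maximizing log p(c) + sum_j log p(w_j | c)."""
    def score(c):
        s = math.log(priors[c])
        for x, p in zip(vec, cond[c]):
            s += math.log(p if x else 1 - p)
        return s
    return max(priors, key=score)

def error_rate(vectors, labels, priors, cond):
    """Fraction of vectors whose predicted class differs from the label."""
    wrong = sum(classify(v, priors, cond) != y for v, y in zip(vectors, labels))
    return wrong / len(vectors)

# Hypothetical toy data: 4 documents over a 3-word vocabulary.
vectors = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
labels = [0, 1, 0, 1]
priors, cond = train(vectors, labels)
print(classify([1, 0, 0], priors, cond))          # prints 0
print(error_rate(vectors, labels, priors, cond))  # prints 0.0
```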

· Prepare

Making word vectors from text

def loadDataSet():
    """Return a small sample of posts and one class label per post."""
    postingList = [['my', 'dog', 'has', 'flea',
                    'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him',
                    'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute',
                    'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless',
                    'garbage']]
    classVec = [0, 1, 0, 1]    # class label of each post, in order
    return postingList, classVec

def createVocabList(dataset):
    """Return every distinct word that appears in any document."""
    vocabSet = set([])    # start with an empty set
    for document in dataset:
        vocabSet = vocabSet | set(document)    # | is set union
    return list(vocabSet)

def setOfWords2Vec(vocabList, inputSet):
    """Return a 0/1 vector: 1 where a vocabulary word occurs in inputSet."""
    returnVec = [0] * len(vocabList)    # one slot per vocabulary word
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1
        else:
            print("the word: %s is not in my Vocabulary!" % word)
    return returnVec
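One detail worth noting: `list(vocabSet)` returns words in set iteration order. Python randomizes string hashing per process by default, so that order can change between runs, and with it each word's position in the vector. A minimal variant sorts the vocabulary for reproducible positions. It also uses a dict for constant-time index lookups instead of `list.index`. The names `create_vocab_list_sorted` and `words_to_vec` are hypothetical.

```python
def create_vocab_list_sorted(dataset):
    """Return every distinct word in dataset, in sorted (reproducible) order."""
    vocab = set()
    for document in dataset:
        vocab |= set(document)  # set union, same idea as createVocabList
    return sorted(vocab)

def words_to_vec(vocab_list, input_set):
    """Set-of-words vector: 1 if a vocabulary word occurs in input_set, else 0."""
    index = {word: i for i, word in enumerate(vocab_list)}  # word -> position
    vec = [0] * len(vocab_list)
    for word in input_set:
        if word in index:
            vec[index[word]] = 1
        else:
            print("the word: %s is not in my Vocabulary!" % word)
    return vec

docs = [['my', 'dog', 'has', 'flea'], ['stupid', 'dog']]
vocab = create_vocab_list_sorted(docs)
print(vocab)                                   # ['dog', 'flea', 'has', 'my', 'stupid']
print(words_to_vec(vocab, ['stupid', 'dog']))  # [1, 0, 0, 0, 1]
```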

Python concepts used: the | (union) operator on sets, and lists.
I tested the code and it runs correctly in Spyder (Python 3.7).
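The two Python features mentioned above behave as follows. The word lists here are toy examples.

```python
# Set union with |: duplicates collapse, so each word is kept only once.
a = set(['my', 'dog', 'has'])
b = set(['stupid', 'dog'])
union = a | b
print(sorted(union))  # ['dog', 'has', 'my', 'stupid']

# list.index returns the position of the first match (a linear scan).
# It raises ValueError when the item is missing, which is why
# setOfWords2Vec checks "word in vocabList" before calling it.
vocab = ['dog', 'has', 'my', 'stupid']
print(vocab.index('my'))  # 2
```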

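In short, the first function creates a small data set to experiment with. The second builds the vocabulary: a list of every distinct word that appears in any document, with each word listed once no matter how many documents contain it. The third checks which vocabulary words appear in the current document and returns a vector of 0s and 1s. The three functions are summarized below: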

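  • loadDataSet: load the sample data.
  • createVocabList: take the union of the words in all documents.
  • setOfWords2Vec: compare an input document against that union and return 1 for each vocabulary word that appears (0 otherwise).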
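`setOfWords2Vec` only records whether each word appears (1) or not (0), which is known as a set-of-words model. A common alternative is a bag-of-words model, which counts how many times each word appears. A minimal sketch of that counting variant follows, using the hypothetical name `bag_of_words_to_vec`. Unlike the original, it silently ignores words outside the vocabulary.

```python
def bag_of_words_to_vec(vocab_list, input_set):
    """Count occurrences of each vocabulary word in input_set (bag-of-words)."""
    vec = [0] * len(vocab_list)
    for word in input_set:
        if word in vocab_list:  # out-of-vocabulary words are skipped
            vec[vocab_list.index(word)] += 1
    return vec

vocab = ['dog', 'my', 'stupid']
doc = ['my', 'dog', 'my', 'cat']
print(bag_of_words_to_vec(vocab, doc))  # [1, 2, 0]
```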

To be continued.
