Andrew Ng Machine Learning Course Notes -- Naive Bayes Algorithm

Naive Bayes: in this model, each of our features is zero or one, indicating whether a particular word appears, and the length n of the feature vector is the number of words in the dictionary.

The Multivariate Bernoulli Event Model: the name refers to the fact that there are multiple Bernoulli random variables, one for each word in the dictionary.
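As a minimal sketch (the toy dictionary and email below are invented for illustration, not taken from the lecture), this is how such a 0/1 feature vector might be constructed:

```python
# Minimal sketch (assumed toy data): build a multivariate-Bernoulli feature
# vector for one email, with x[j] = 1 iff dictionary word j appears in it.
dictionary = ["a", "buy", "cheap", "hello", "meeting", "now"]  # toy dictionary

def bernoulli_features(email_words, dictionary):
    words = set(email_words)
    return [1 if w in words else 0 for w in dictionary]

email = ["buy", "cheap", "buy", "now"]          # toy email
print(bernoulli_features(email, dictionary))    # [0, 1, 1, 0, 0, 1]
```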

The Multinomial Event Model: my i-th training example x(i) will be a feature vector (x(i)_1, x(i)_2, ..., x(i)_{n_i}), where n_i is equal to the number of words in that email, and each element of the feature vector is an index into my dictionary. So in this second model, n_i is the number of words in a given email, which means n will be different for different training examples, and each x(i)_j takes a value from 1 to 50,000: x(i)_j is essentially the identity of the j-th word in that piece of email. It turns out that for text classification, the Naive Bayes algorithm with this second event model almost always does better than the first Naive Bayes model I talked about, when you apply it to the specific case of text classification. It does not care about the ordering of the words: you can shuffle all the words in the email and it does exactly the same thing, so in natural language processing it is called a unigram model. There are many other models in natural language processing, like higher-order Markov models (for example bigram or trigram models) that take into account some of the ordering of the words, but for this task, I believe, they do only very slightly better.
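A minimal sketch (the vocabulary size, emails, and labels below are toy values assumed for illustration) of the multinomial representation and the maximum-likelihood word probabilities per class:

```python
# Minimal sketch (assumed toy data): in the multinomial event model an email is
# a sequence of word indices into the dictionary, so its length n_i varies.
import numpy as np

V = 6  # toy vocabulary size (the lecture uses 50,000)

# Each email is a list of word indices in [0, V); labels: 1 = spam, 0 = not spam.
emails = [[1, 2, 1, 5], [3, 4, 0], [2, 2, 5], [3, 0, 4, 4]]
labels = [1, 0, 1, 0]

def word_counts(email, V):
    counts = np.zeros(V)
    for idx in email:
        counts[idx] += 1
    return counts

# Maximum-likelihood estimate of p(word = k | class = c): occurrences of word k
# in class-c emails divided by the total number of words in class-c emails.
phi = np.zeros((2, V))
for email, y in zip(emails, labels):
    phi[y] += word_counts(email, V)
phi /= phi.sum(axis=1, keepdims=True)
print(phi)
```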

Laplace smoothing: a method that gives you better estimates of the probability distribution over a multinomial; by adding one to every count, it avoids estimating an outcome you have never seen as having probability zero.
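A minimal sketch (the counts below are invented) of add-one smoothing for a multinomial with k outcomes:

```python
# Minimal sketch (assumed toy counts): Laplace (add-one) smoothing for a
# multinomial over k outcomes: p_hat[j] = (count[j] + 1) / (total + k).
def laplace_estimate(counts):
    k = len(counts)
    total = sum(counts)
    return [(c + 1) / (total + k) for c in counts]

counts = [3, 0, 2, 0]            # the second and fourth outcomes were never observed
print(laplace_estimate(counts))  # [0.444..., 0.111..., 0.333..., 0.111...]
```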

Non-linear classifiers:

Neural network: take my features here and feed them to, say, a few of these little sigmoid units, and these together feed into yet another sigmoid unit, which outputs my final hypothesis h_theta(x). Just to give these names, let me call the values output by these three intermediate sigmoid units a1, a2, a3. The value a1 will be computed as g(x^T theta1) for some set of parameters theta1, and similarly a2 will be computed as g(x^T theta2) (and a3 as g(x^T theta3)), where g is the sigmoid function; the final hypothesis will output g(a^T theta4). One way to learn the parameters of an algorithm like this is to just use gradient descent to minimize J(theta) as a function of theta. A nice property of the neural network is that you can look at what these intermediate nodes are computing; this neural network has what is called a hidden layer before you then have the output layer.
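A minimal sketch (the feature dimension and parameters are random placeholders, and the output parameter is written theta_out rather than the lecture's theta4) of this forward computation:

```python
# Minimal sketch (assumed shapes, random parameters): one hidden layer of three
# sigmoid units a1, a2, a3, followed by a single sigmoid output unit.
import numpy as np

def g(z):
    # Sigmoid function g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta_hidden, theta_out):
    a = g(Theta_hidden @ x)   # hidden activations a_j = g(x^T theta_j)
    return g(theta_out @ a)   # final output h_theta(x) = g(a^T theta_out)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                  # toy 4-dimensional feature vector
Theta_hidden = rng.normal(size=(3, 4))  # rows are theta1, theta2, theta3
theta_out = rng.normal(size=3)          # parameters of the output unit
print(forward(x, Theta_hidden, theta_out))
```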

Other notes: it turns out that a quadratic cost function like the one I wrote down on the chalkboard just now, unlike logistic regression, almost always corresponds to a non-convex optimization problem. Whereas for logistic regression, if you run gradient descent or Newton's method or whatever, you converge to the global optimum, this is not true for neural networks: in general there are lots of local optima, and it is, sort of, a much harder optimization problem.

LeNet: a notable property of the network is its robustness to noise.

Support vector machine: I'm going to say that the functional margin of a hyperplane (w, b) with respect to a specific training example (x(i), y(i)) is defined as gamma-hat_i = y(i) (w^T x(i) + b). The hyperplane defines a linear separating boundary, so when I say hyperplane, I just mean the decision boundary that is defined by the parameters w, b. The functional margin of the whole training set, gamma-hat, is equal to the minimum over all your training examples of gamma-hat_i. If you add a normalization condition that the norm of the parameter w is equal to one, then the functional margin is equal to the geometric margin; in general, the geometric margin is just equal to the functional margin divided by the norm of w.
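A minimal sketch (the parameters and the three training examples are toy values assumed for illustration) of computing these margins:

```python
# Minimal sketch (assumed toy data): functional and geometric margins for a
# hyperplane (w, b) and labeled examples (x_i, y_i) with y_i in {-1, +1}.
import numpy as np

def functional_margins(w, b, X, y):
    # gamma_hat_i = y_i * (w^T x_i + b) for every training example
    return y * (X @ w + b)

w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
y = np.array([1, -1, -1])

gamma_hat_i = functional_margins(w, b, X, y)
gamma_hat = gamma_hat_i.min()          # functional margin of the training set
gamma = gamma_hat / np.linalg.norm(w)  # geometric margin = gamma_hat / ||w||
print(gamma_hat_i, gamma_hat, gamma)
```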

 
