Machine Learning: Classification: Probabilistic Generative Model

Classification

An input x is fed into a function, and the output is one of several classes. For example, in credit scoring the inputs are income, savings, profession, and so on, and the output is accept or refuse. Other examples include medical diagnosis, handwritten character recognition, and face recognition.

How to do Classification

An ideal alternative: define the loss as the number of training examples that f misclassifies:
$L(f)=\sum_{n}\delta(f(x^n)\neq \hat{y}^n)$
This time we cannot use gradient descent, because this loss is not differentiable: it is not continuous.
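As a concrete illustration (a minimal sketch; the predictions and labels here are made up), the 0-1 loss simply counts the misclassified examples:

```python
# 0-1 loss: count how many predictions disagree with the labels.
def zero_one_loss(predictions, labels):
    return sum(1 for p, y in zip(predictions, labels) if p != y)

# Hypothetical predictions f(x^n) and ground-truth labels y^n.
preds  = ["C1", "C2", "C1", "C1"]
labels = ["C1", "C1", "C1", "C2"]
print(zero_one_loss(preds, labels))  # → 2
```

Because this function jumps in integer steps as f changes, it has no useful gradient, which is why the generative approach below takes a probabilistic route instead.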

Two Classes

By Bayes' rule:
$P(C_1|x)=\frac{P(x|C_1)P(C_1)}{P(x)}$
We need to use the training data to estimate these probabilities.

Generative Model

$P(x)=P(x|C_1)P(C_1)+P(x|C_2)P(C_2)$

Prior

$P(C_1)$ and $P(C_2)$ are the priors.

Feature

Within each class, we model the distribution of the training examples' features.

Gaussian Distribution
The multivariate Gaussian density:
$f_{\mu,\Sigma}(x)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$
The shape of this function is determined by the mean $\mu$ and the covariance matrix $\Sigma$: $\mu$ is the center, and $\Sigma$ controls the spread. We assume the feature vectors of the training data were sampled from this Gaussian: data close to $\mu$ is sampled easily, while data far away is sampled rarely. How do we get $\mu$ and $\Sigma$?
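The density above can be evaluated directly with NumPy (a minimal sketch; the example mean and covariance are invented):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density f_{mu,Sigma}(x)."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([0.0, 0.0])
sigma = np.eye(2)            # identity covariance
x = np.array([0.0, 0.0])     # density peaks at the mean
print(gaussian_pdf(x, mu, sigma))  # → 1/(2*pi) ≈ 0.1592
```

Moving x away from `mu` shrinks the density, matching the intuition that points far from the center are hard to sample.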

Maximum Likelihood

The key is maximum likelihood. A Gaussian with any mean $\mu$ and covariance matrix $\Sigma$ could have generated the observed points $x^1,x^2,...,x^{79}$, just with different likelihoods. The likelihood of a Gaussian with mean $\mu$ and covariance matrix $\Sigma$ is the probability of that Gaussian sampling $x^1,x^2,...,x^{79}$:
$L(\mu,\Sigma)=f_{\mu,\Sigma}(x^1)f_{\mu,\Sigma}(x^2)\cdots f_{\mu,\Sigma}(x^n)$
We assume $x^1,x^2,...,x^n$ were generated from the Gaussian $(\mu^*,\Sigma^*)$ with the maximum likelihood:
$\mu^*,\Sigma^*=\arg\max_{\mu,\Sigma}L(\mu,\Sigma)$

How to quickly calculate the parameters?

Fortunately, the maximum-likelihood solution has a closed form: the sample mean and the sample covariance.
$\mu^*=\frac{1}{n}\sum_{i=1}^{n}x^i$
$\Sigma^*=\frac{1}{n}\sum_{i=1}^{n}(x^i-\mu^*)(x^i-\mu^*)^T$
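Both estimates are one line each in NumPy (a sketch with made-up feature vectors):

```python
import numpy as np

# Hypothetical feature vectors x^1..x^n, one row per example.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

mu_star = X.mean(axis=0)             # sample mean
diff = X - mu_star
sigma_star = diff.T @ diff / len(X)  # MLE covariance (divide by n, not n-1)
print(mu_star)  # → [3. 4.]
```

Note the division by n: the maximum-likelihood covariance uses $1/n$, unlike the unbiased estimator's $1/(n-1)$ (NumPy's `np.cov` defaults to the latter).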

Now we can do classification!

If $P(C_1|x)>0.5$, then $x$ belongs to class 1. We already know each class's $\mu$ and $\Sigma$, so we can evaluate:
$P(C_1|x)=\frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1)+P(x|C_2)P(C_2)}$
We know the priors $P(C_1)$ and $P(C_2)$ from the class counts, and $P(x|C_1)$ and $P(x|C_2)$ come from the fitted Gaussians.
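Putting the pieces together (a minimal sketch; the two toy classes below are invented): fit one Gaussian per class, take the priors from the class counts, and classify by the posterior.

```python
import numpy as np

def fit_gaussian(X):
    """MLE mean and covariance for one class."""
    mu = X.mean(axis=0)
    diff = X - mu
    return mu, diff.T @ diff / len(X)

def pdf(x, mu, sigma):
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Hypothetical training data for two classes.
X1 = np.array([[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [1.0, 0.0]])
X2 = np.array([[4.0, 4.0], [4.5, 3.5], [3.5, 4.5], [4.0, 5.0]])

mu1, s1 = fit_gaussian(X1)
mu2, s2 = fit_gaussian(X2)
p1 = len(X1) / (len(X1) + len(X2))   # prior P(C1) from class counts
p2 = 1.0 - p1                        # prior P(C2)

def posterior_c1(x):
    """P(C1|x) by Bayes' rule."""
    a = pdf(x, mu1, s1) * p1
    b = pdf(x, mu2, s2) * p2
    return a / (a + b)

x = np.array([1.2, 0.8])
print(posterior_c1(x) > 0.5)  # → True: x is assigned to class 1
```

A point near the class-2 cluster, e.g. `[4.0, 4.0]`, gives a posterior below 0.5 and is assigned to class 2.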

Modifying Model

We can give different classes the same covariance matrix to reduce the number of parameters (the covariance matrix grows quadratically with the feature dimension, so sharing it helps avoid overfitting). How do we write the likelihood function now?
$L(\mu^1,\mu^2,\Sigma)=f_{\mu^1,\Sigma}(x^1)\cdots f_{\mu^1,\Sigma}(x^n)\,f_{\mu^2,\Sigma}(x^{n+1})\cdots f_{\mu^2,\Sigma}(x^m)$
How do we get the shared $\Sigma$? Suppose the two classes have 60 and 80 training examples respectively. We combine the two per-class covariances as a count-weighted average:
$\Sigma=\frac{60}{140}\Sigma^1+\frac{80}{140}\Sigma^2$
With a shared covariance, the decision boundary becomes linear rather than curved, so this is called a linear model.
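The shared covariance is just that weighted average in code (a sketch reusing the 60/140 and 80/140 weights from above, with invented per-class matrices):

```python
import numpy as np

n1, n2 = 60, 80  # class sizes, as in the example above

# Hypothetical per-class MLE covariances.
sigma1 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
sigma2 = np.array([[1.0, 0.0],
                   [0.0, 3.0]])

# Count-weighted average: Sigma = (60/140) Sigma^1 + (80/140) Sigma^2
shared = (n1 / (n1 + n2)) * sigma1 + (n2 / (n1 + n2)) * sigma2
print(shared)
```

The per-class means are estimated exactly as before; only the covariance is pooled.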

Three Steps

Step 1, function set (model):
$P(C_1|x)=\frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1)+P(x|C_2)P(C_2)}$
(If $P(C_1|x)>0.5$, output class 1; otherwise, output class 2.)
Step 2, goodness of a function: the mean $\mu$ and covariance $\Sigma$ that maximize the likelihood. Step 3, find the best function.
