Generative Learning Algorithm
discriminative learning algorithm: algorithms that try to learn p(y|x) directly, or learn a mapping f(x) directly from the space of inputs to the labels y
generative learning algorithm: algorithms that instead model p(x|y) and the prior distribution p(y), then use Bayes' rule to derive the posterior distribution p(y|x):
$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)} \propto p(x|y)\,p(y)$$
and predict y as:
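$$\hat{y} = \arg\max_y p(y|x) = \arg\max_y p(x|y)\,p(y)$$
Since p(x) does not depend on y, it can be dropped inside the argmax.
Bayes' rule has many applications in statistical machine learning. Besides its role in the generative learning models discussed here, Bayesians also use it for estimating model parameters and for model selection (see the study notes on Chapter 3 of PRML).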
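A minimal sketch of this prediction rule, assuming a toy problem with a single discrete feature x in {0, 1, 2} and two classes. The likelihood table `P_X_GIVEN_Y` and prior `P_Y` are made-up illustrative numbers, and `posterior`/`predict` are hypothetical helper names.

```python
import numpy as np

# Hypothetical class-conditional likelihoods p(x | y) for a feature x in {0, 1, 2}.
# Row index = class y, column index = value of x; each row sums to 1.
P_X_GIVEN_Y = np.array([
    [0.7, 0.2, 0.1],  # y = 0
    [0.1, 0.3, 0.6],  # y = 1
])
# Hypothetical prior p(y).
P_Y = np.array([0.8, 0.2])


def posterior(x):
    """p(y | x) for every y, via Bayes' rule p(x|y) p(y) / p(x)."""
    joint = P_X_GIVEN_Y[:, x] * P_Y  # p(x, y) for each y
    return joint / joint.sum()        # p(x) = sum over y of p(x, y)


def predict(x):
    """argmax_y p(x|y) p(y); p(x) is shared by all y, so it is never computed."""
    return int(np.argmax(P_X_GIVEN_Y[:, x] * P_Y))


print([predict(x) for x in range(3)])  # → [0, 0, 1]
```

For x = 2 the prior favors y = 0, but the likelihood ratio 0.6 / 0.1 outweighs it (joint values 0.08 vs 0.12), so the prediction is y = 1.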
Gaussian Discriminant Analysis
multivariate normal distribution
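- probability density function:
$$p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$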
where:
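- $\mu \in \mathbb{R}^n$ is the mean vector
- $\Sigma \in \mathbb{R}^{n \times n}$ is the covariance matrix
- $E[X] = \int_x x\, p(x;\mu,\Sigma)\, dx = \mu$
- $\mathrm{Cov}[X] = E[XX^T] - E[X]E[X]^T = \Sigma$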
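The density formula and the two moment identities can be checked numerically. This is a sketch using NumPy only; `gaussian_pdf` and `empirical_moments` are hypothetical helper names, and the example μ and Σ are arbitrary values chosen for illustration.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate normal density p(x; mu, sigma) from the formula above."""
    n = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff rather than forming sigma^{-1} explicitly (more stable).
    z = np.linalg.solve(sigma, diff)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ z) / norm_const)


def empirical_moments(mu, sigma, num_samples=200_000, seed=0):
    """Draw samples from N(mu, sigma) and return their sample mean and covariance."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mu, sigma, size=num_samples)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mean_hat, cov_hat = empirical_moments(mu, sigma)
```

With 200,000 samples, `mean_hat` and `cov_hat` should land within a few hundredths of μ and Σ, matching E[X] = μ and Cov[X] = Σ.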
GDA models
- used for classification problems whose input features x are continuous-valued
model assumptions:
- $y \sim \mathrm{Bernoulli}(\phi)$
- $x \mid y=k \sim \mathcal{N}(\mu_k, \Sigma),\ k \in \{0, 1\}$
Note: the model assumes that the features under the two labels share the same covariance matrix Σ, even though they have different means.
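To make the generative story concrete, here is a sketch that draws a labeled dataset by ancestral sampling: first y from the Bernoulli prior, then x from the Gaussian of that class with the shared Σ. `sample_gda` is a hypothetical helper, and the parameter values are arbitrary choices for illustration.

```python
import numpy as np


def sample_gda(m, phi, mu0, mu1, sigma, seed=0):
    """Draw m pairs (x, y): y ~ Bernoulli(phi), then x | y=k ~ N(mu_k, sigma)."""
    rng = np.random.default_rng(seed)
    y = (rng.random(m) < phi).astype(int)        # labels from the Bernoulli prior
    means = np.where(y[:, None] == 1, mu1, mu0)  # each sample's class mean, shape (m, n)
    # Both classes share the same covariance, so one batch of zero-mean noise suffices.
    noise = rng.multivariate_normal(np.zeros(len(mu0)), sigma, size=m)
    return means + noise, y


X, y = sample_gda(m=10_000, phi=0.3,
                  mu0=np.array([0.0, 0.0]), mu1=np.array([2.0, 1.0]),
                  sigma=np.array([[1.0, 0.3], [0.3, 0.5]]))
```

The two clouds differ only by a shift of their centers; their shape and orientation are identical because Σ is shared.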
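log-likelihood function:
$$\ell(\phi, \{\mu_k\}, \Sigma) = \log\prod_{i=1}^m p(x^{(i)}, y^{(i)}; \phi, \{\mu_k\}, \Sigma) = \log\prod_{i=1}^m p(x^{(i)} \mid y^{(i)}; \{\mu_k\}, \Sigma)\, p(y^{(i)}; \phi)$$
Unlike a discriminative model, whose likelihood is built from the conditional p(y|x), the likelihood of a generative model is computed from the joint distribution p(x, y) over the whole dataset.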
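The log-likelihood above can be evaluated as a sum of per-sample terms log p(x | y) + log p(y). A sketch with NumPy, assuming binary labels y in {0, 1} stored as an integer array; `gda_log_likelihood` is a hypothetical name.

```python
import numpy as np


def gda_log_likelihood(X, y, phi, mu0, mu1, sigma):
    """l(phi, {mu_k}, Sigma) = sum_i [log p(x_i | y_i) + log p(y_i)]."""
    n = X.shape[1]
    means = np.where(y[:, None] == 1, mu1, mu0)  # mu_{y^(i)} for every sample
    diff = X - means
    # Quadratic forms (x_i - mu)^T Sigma^{-1} (x_i - mu), one per row.
    quad = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    log_p_x_given_y = -0.5 * (quad + logdet + n * np.log(2 * np.pi))
    log_p_y = np.where(y == 1, np.log(phi), np.log(1 - phi))
    return float(np.sum(log_p_x_given_y + log_p_y))
```

Summing log-probabilities instead of taking the log of a product avoids numerical underflow when m is large.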
maximum likelihood estimation:
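- $\phi = \frac{1}{m}\sum_{i=1}^m 1\{y^{(i)} = 1\}$
- $\mu_k = \mathrm{mean}(x \mid y=k) = \frac{\sum_{i=1}^m 1\{y^{(i)} = k\}\, x^{(i)}}{\sum_{i=1}^m 1\{y^{(i)} = k\}}$
- $\Sigma = \frac{1}{m}\sum_{i=1}^m (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T$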
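These closed-form estimates translate directly into a few lines of NumPy. A sketch, assuming binary labels y in {0, 1}; `fit_gda` is a hypothetical name, and the six-point dataset is made up so the expected values can be checked by hand.

```python
import numpy as np


def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates for GDA with a shared covariance."""
    m = X.shape[0]
    phi = np.mean(y == 1)              # fraction of positive labels
    mu0 = X[y == 0].mean(axis=0)       # class-0 mean
    mu1 = X[y == 1].mean(axis=0)       # class-1 mean
    # Center each sample on the mean of its own class, then average the outer products.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / m
    return phi, mu0, mu1, sigma


X = np.array([[1.0, 2.0], [3.0, 2.0], [2.0, 5.0],   # class 1, mean (2, 3)
              [0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])  # class 0, mean (1, 1)
y = np.array([1, 1, 1, 0, 0, 0])
phi, mu0, mu1, sigma = fit_gda(X, y)
```

Σ divides by m, exactly as in the MLE formula, not by a bias-corrected denominator; here it comes out as [[2/3, 0], [0, 2]].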
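Intuitively, the maximum-likelihood estimates of the GDA parameters can be read as simple frequency statistics. For example, ϕ is the fraction of positive samples ($y^{(i)} = 1$) in the whole dataset, and μ_k is the mean of x over the samples with label $y^{(i)} = k$.
decision boundary: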
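$$p(y=1|x) = \frac{p(x, y=1)}{p(x, y=0) + p(x, y=1)} = 0.5$$
which is equivalent to:
$$p(x, y=0) = p(x, y=1)$$
The decision boundary can therefore also be read as the surface where the two joint densities p(x, y=0) and p(x, y=1) intersect; with a shared Σ this surface is a hyperplane.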
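As a worked check in one dimension (n = 1, shared variance σ²), setting log p(x, y=1) = log p(x, y=0) gives a single threshold x* = (μ0 + μ1)/2 + σ² log((1 − ϕ)/ϕ) / (μ1 − μ0). The sketch below, with arbitrary parameter values and hypothetical helper names, evaluates the posterior at that threshold.

```python
import math


def normal_pdf(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_y1(x, phi, mu0, mu1, var):
    """p(y=1 | x) = p(x, y=1) / (p(x, y=0) + p(x, y=1))."""
    joint1 = normal_pdf(x, mu1, var) * phi
    joint0 = normal_pdf(x, mu0, var) * (1 - phi)
    return joint1 / (joint0 + joint1)


def boundary_1d(phi, mu0, mu1, var):
    """Threshold x* where p(x, y=0) = p(x, y=1); assumes mu1 != mu0."""
    return (mu0 + mu1) / 2 + var * math.log((1 - phi) / phi) / (mu1 - mu0)


phi, mu0, mu1, var = 0.25, -1.0, 2.0, 1.5
x_star = boundary_1d(phi, mu0, mu1, var)
print(x_star, posterior_y1(x_star, phi, mu0, mu1, var))
```

Because ϕ < 0.5 here, the threshold moves from the midpoint 0.5 toward μ1, so a point has to sit closer to class 1 before it is labeled y = 1.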
GDA model vs. logistic regression model
- the GDA model's posterior p(y=1|x) can be written in the form of a logistic regression model, where:
  - θ is parameterized by ϕ, {μk}, and Σ
  - ϕ, {μk}, and Σ are determined by the GDA assumptions
- GDA model:
  - stronger assumptions: p(x|y) is multivariate Gaussian with a shared Σ
  - more efficient (needs less training data) when these assumptions are correct
- logistic regression:
  - weaker assumptions
  - more robust when the data violate the GDA assumptions
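The standard derivation behind the first bullet: take the log-ratio of the two joint densities. Because Σ is shared, the normalizing constants and the quadratic terms $x^T\Sigma^{-1}x$ cancel, leaving a function that is linear in x (here $\theta_0$ denotes the intercept term):

$$
\begin{aligned}
\log\frac{p(x, y=1)}{p(x, y=0)} &= \log\frac{\phi}{1-\phi} - \tfrac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1) + \tfrac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0) \\
&= (\mu_1-\mu_0)^T\Sigma^{-1}x + \tfrac{1}{2}\left(\mu_0^T\Sigma^{-1}\mu_0 - \mu_1^T\Sigma^{-1}\mu_1\right) + \log\frac{\phi}{1-\phi} \\
&= \theta^T x + \theta_0
\end{aligned}
$$

$$p(y=1 \mid x) = \frac{p(x, y=1)}{p(x, y=0) + p(x, y=1)} = \frac{1}{1 + \exp\left(-(\theta^T x + \theta_0)\right)}, \qquad \theta = \Sigma^{-1}(\mu_1 - \mu_0)$$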
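The p(y|x) given by GDA is essentially a logistic function obtained under stronger assumptions. The converse does not hold: a logistic form of p(y|x) does not imply the GDA assumptions. In fact, if we assume that p(x|y) follows a Poisson distribution, we also obtain a p(y|x) of logistic form. Logistic regression is therefore the more general algorithm in practice.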
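The Poisson claim can be checked numerically. With x | y=k ∼ Poisson(λ_k), the x! terms cancel in the ratio p(x, y=1)/p(x, y=0), leaving p(y=1|x) = sigmoid(x log(λ1/λ0) − (λ1 − λ0) + log(ϕ/(1 − ϕ))). The sketch below compares this form against Bayes' rule evaluated directly; the values ϕ = 0.4, λ0 = 2, λ1 = 5 are arbitrary and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), x a non-negative integer."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def posterior_bayes(x, phi, lam0, lam1):
    """p(y=1 | x) computed directly from Bayes' rule."""
    joint1 = phi * poisson_pmf(x, lam1)
    joint0 = (1 - phi) * poisson_pmf(x, lam0)
    return joint1 / (joint0 + joint1)


def posterior_logistic(x, phi, lam0, lam1):
    """Same posterior as a sigmoid of a function linear in x (x! has cancelled)."""
    theta = math.log(lam1 / lam0)
    theta0 = -(lam1 - lam0) + math.log(phi / (1 - phi))
    return sigmoid(theta * x + theta0)


phi, lam0, lam1 = 0.4, 2.0, 5.0
for x in range(6):
    print(x, round(posterior_bayes(x, phi, lam0, lam1), 4),
          round(posterior_logistic(x, phi, lam0, lam1), 4))
```

The class-1 posterior rises monotonically with the count x, just as a one-feature logistic regression would, even though p(x|y) is not Gaussian at all.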