CS229 Lecture Notes(4): Generative Learning Algorithm

weitian_bnu

Category: Shallow Learning. Tags: machine learning, shallow learning, CS229, study notes

Article link: https://blog.csdn.net/weitian_bnu/article/details/51179366

Generative Learning Algorithm

discriminative learning algorithm: an algorithm that learns $p(y|x)$ directly, or learns a mapping $f(x)$ directly from the input space $\mathcal{X}$ to the labels $y$.
generative learning algorithm: an algorithm that models the class-conditional distribution $p(x|y)$ and the prior $p(y)$, then uses Bayes' rule to derive the posterior distribution $p(y|x)$:

$p(y|x)=\frac{p(x|y)p(y)}{p(x)}\propto p(x|y)p(y)$
and predict $y$ as:
$y=\arg\max_y{p(y|x)}=\arg\max_y{p(x|y)p(y)}$
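
The prediction rule above can be illustrated with a minimal sketch. The toy model here (one binary feature, two classes, and every probability value) is hypothetical and chosen only for illustration; the names `prior`, `likelihood`, `posterior`, and `predict` are not from the notes.

```python
import numpy as np

# Hypothetical toy model: one binary feature x and two classes y in {0, 1}.
prior = np.array([0.7, 0.3])            # p(y=0), p(y=1)
likelihood = np.array([[0.9, 0.1],      # p(x=0|y=0), p(x=1|y=0)
                       [0.2, 0.8]])     # p(x=0|y=1), p(x=1|y=1)

def posterior(x):
    """Return [p(y=0|x), p(y=1|x)] via Bayes' rule."""
    joint = likelihood[:, x] * prior    # p(x|y) p(y) for each class
    return joint / joint.sum()          # dividing by p(x) normalizes

def predict(x):
    """arg max over y of p(x|y) p(y); the denominator p(x) is not needed."""
    return int(np.argmax(likelihood[:, x] * prior))

print(posterior(1), predict(1))
```

Because $p(x)$ does not depend on $y$, `predict` skips the normalization that `posterior` performs and still returns the same class.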

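Bayes' rule has many uses in statistical machine learning. Besides the generative learning models discussed here, the Bayesian school also uses it for parameter estimation and for model selection. (See the study notes on Chapter 3 of PRML.)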

Gaussian Discriminant Analysis

multivariate normal distribution
- $p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$
  where:
  - $\mu \in \mathbb{R}^n$ is the mean vector
  - $\Sigma \in \mathbb{R}^{n\times n}$ is the covariance matrix (symmetric and positive definite)
- $E[X]=\int_x{xp(x;\mu,\Sigma)dx}=\mu$
- $Cov[X]=E[XX^T]-E[X]E[X]^T=\Sigma$
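
As a rough numerical check of the density and the two moment identities above, the following sketch evaluates the formula directly and compares sample statistics against $\mu$ and $\Sigma$. The function name `mvn_pdf` and the example values of `mu` and `sigma` are assumptions made for illustration.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at a single point x (1-D array of length n)."""
    n = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff rather than forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

# Hypothetical parameters for a 2-D Gaussian.
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw samples and compare empirical moments with mu and sigma.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, sigma, size=200_000)
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)

print(mvn_pdf(mu, mu, sigma))
print(sample_mean)
print(sample_cov)
```

With this many samples, `sample_mean` and `sample_cov` should land close to `mu` and `sigma`, which is what $E[X]=\mu$ and $Cov[X]=\Sigma$ predict.
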
GDA models
- solves classification problems with continuous-valued features $x$
- model assumption:
  - $y\sim \mathrm{Bernoulli}(\phi)$
  - $x|y=k\sim \mathcal{N}(\mu_k, \Sigma),k\in\{0,1\}$
  Note: the model assumes that the features under different labels share the same covariance matrix, even though their means differ.
- log-likelihood function:
  
  $\begin{aligned} l(\phi,\{\mu_k\},\Sigma) & = \log{\prod_{i=1}^m{p(x^{(i)},y^{(i)};\phi,\{\mu_k\},\Sigma)}} \\ & = \log{\prod_{i=1}^m{p(x^{(i)}|y^{(i)};\{\mu_k\},\Sigma)\,p(y^{(i)};\phi)}} \end{aligned}$
  
  Unlike a discriminative model, a generative model's likelihood is computed from the joint distribution $p(x,y)$ over the whole dataset.
- maximum likelihood estimation:
  - $\phi=\frac{1}{m}\sum_{i=1}^m{1\{y^{(i)}=1\}}$
  - $\mu_k=mean(x|y=k)=\frac{\sum_{i=1}^m1\{y^{(i)}=k\}x^{(i)}}{\sum_{i=1}^m1\{y^{(i)}=k\}}$
  - $\Sigma=\frac{1}{m}\sum_{i=1}^m{(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T}$
  Intuitively, the maximum likelihood estimates of the GDA parameters are simple frequency statistics. For example, $\phi$ is the fraction of positive examples ($y^{(i)}=1$) in the training set, and $\mu_k$ is the mean of $x$ over the examples with label $y^{(i)}=k$.
- decision boundary:
  
  $p(y=1|x)=\frac{p(x,y=1)}{p(x,y=0)+p(x,y=1)}=0.5$
  which is equivalent to:
  $p(x,y=0)=p(x,y=1)$
  
  The decision boundary can also be understood as the surface on which the two joint densities $p(x,y=0)$ and $p(x,y=1)$ intersect; with a shared $\Sigma$ this surface is a hyperplane.
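
The estimation and prediction steps listed above can be sketched as follows. This is a minimal illustration on synthetic data, not a reference implementation; the function names (`fit_gda`, `log_joint`, `predict`) and the "true" parameters used to generate the data are assumptions.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum likelihood estimates for GDA with a shared covariance."""
    m = X.shape[0]
    phi = np.mean(y == 1)                                    # fraction of positives
    mu = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])  # shape (2, n)
    centered = X - mu[y]                 # subtract each sample's own class mean
    sigma = centered.T @ centered / m
    return phi, mu, sigma

def log_joint(X, phi, mu, sigma):
    """log p(x, y=k) for k in {0, 1}, up to a constant shared by both classes."""
    log_prior = np.log(np.array([1 - phi, phi]))
    sigma_inv = np.linalg.inv(sigma)
    out = np.empty((X.shape[0], 2))
    for k in (0, 1):
        diff = X - mu[k]
        quad = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)
        out[:, k] = -0.5 * quad + log_prior[k]
    return out

def predict(X, phi, mu, sigma):
    """Pick the class with the larger joint density p(x, y)."""
    return np.argmax(log_joint(X, phi, mu, sigma), axis=1)

# Synthetic data drawn from the GDA model itself (hypothetical parameters).
rng = np.random.default_rng(0)
mu_true = np.array([[0.0, 0.0], [3.0, 3.0]])
sigma_true = np.array([[1.0, 0.3], [0.3, 1.0]])
y = (rng.random(2000) < 0.4).astype(int)
X = mu_true[y] + rng.multivariate_normal(np.zeros(2), sigma_true, size=y.size)

phi, mu, sigma = fit_gda(X, y)
accuracy = np.mean(predict(X, phi, mu, sigma) == y)
print(phi, mu, sigma, accuracy)
```

Comparing the two columns of `log_joint` is the same test as $p(x,y=0)=p(x,y=1)$: points where the columns are equal lie on the decision boundary.
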
GDA model vs. logistic regression model
- Under GDA, the posterior takes the logistic form $p(y=1|x)=\frac{1}{1+\exp(-\theta^Tx)}$ (with the convention $x_0=1$), where:
  - $\theta$ is a function of $\phi$ , $\{\mu_k\}$ , and $\Sigma$
  - $\phi$ , $\{\mu_k\}$ , and $\Sigma$ are estimated under the GDA assumptions
- GDA model:
  1. stronger assumptions: $p(x|y)$ is multivariate Gaussian with shared $\Sigma$
  2. more data-efficient (needs less training data to learn well) when these assumptions hold
- logistic regression:
  1. weaker assumptions
  2. more robust to incorrect modeling assumptions
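The $p(y|x)$ given by GDA is, in essence, a logistic function obtained under stronger assumptions. The converse does not hold: a logistic $p(y|x)$ does not imply the GDA assumptions. In fact, if we assume that $p(x|y)$ follows a Poisson distribution, we also obtain a $p(y|x)$ of logistic form. Logistic regression is therefore the more general algorithm in practice.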
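
To make the link to logistic regression concrete, expanding the log-odds $\log\frac{p(x|y=1)\phi}{p(x|y=0)(1-\phi)}$ under the shared-$\Sigma$ assumption cancels the quadratic term in $x$ and leaves a linear function $w^Tx+b$. The sketch below checks this numerically. The function names and parameter values are hypothetical, and $\theta$ corresponds to $(b, w)$ under the $x_0=1$ convention.

```python
import numpy as np

def gda_to_logistic(phi, mu0, mu1, sigma):
    """Weights w and intercept b such that p(y=1|x) = sigmoid(w @ x + b)."""
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu1 - mu0)
    b = (0.5 * (mu0 @ sigma_inv @ mu0 - mu1 @ sigma_inv @ mu1)
         + np.log(phi / (1 - phi)))
    return w, b

def gda_posterior(x, phi, mu0, mu1, sigma):
    """p(y=1|x) computed directly from Bayes' rule."""
    sigma_inv = np.linalg.inv(sigma)

    def unnormalized(mu, prior):
        d = x - mu
        # The Gaussian normalizing constant is shared by both classes and cancels.
        return prior * np.exp(-0.5 * d @ sigma_inv @ d)

    p1 = unnormalized(mu1, phi)
    p0 = unnormalized(mu0, 1 - phi)
    return p1 / (p0 + p1)

# Hypothetical GDA parameters.
phi = 0.3
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
sigma = np.array([[1.0, 0.2],
                  [0.2, 1.5]])

w, b = gda_to_logistic(phi, mu0, mu1, sigma)
x = np.array([0.5, 1.5])
via_sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + b)))
via_bayes = gda_posterior(x, phi, mu0, mu1, sigma)
print(via_sigmoid, via_bayes)
```

The two printed values should agree up to floating-point error. Fitting logistic regression directly would search over all $(w, b)$, whereas GDA restricts them to the values implied by $\phi$, $\{\mu_k\}$, and $\Sigma$.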
