The exponential family
A class of distributions is in the exponential family if it can be written in the form
p(y;η) = b(y) exp(η^T T(y) − a(η))
where:
- η : the natural parameter (also called the canonical parameter)
- T(y) : the sufficient statistic (it is often the case that T(y) = y)
- a(η) : the log partition function (e^(−a(η)) plays the role of a normalization constant)
In the exponential family, fixing the functional forms of T, a, and b determines a family of distributions parameterized by η.
Bernoulli distribution family:
p(y;ϕ) = ϕ^y (1−ϕ)^(1−y) = exp(y log ϕ + (1−y) log(1−ϕ)) = exp(log(ϕ/(1−ϕ)) · y + log(1−ϕ))
thus we have:
- η = log(ϕ/(1−ϕ))
- ϕ = 1/(1+e^(−η)) (the Sigmoid function!)
- T(y) = y
- a(η) = −log(1−ϕ) = log(1+e^η)
- b(y) = 1
The Bernoulli distribution is thus an example of the exponential family. Notably, when we write the Bernoulli distribution in exponential-family form and express the probability ϕ of y=1 in terms of the parameter η, the logistic function arises naturally: ϕ = 1/(1+e^(−η)). We will elaborate on this result when we study GLMs below.
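As a quick numerical check (a minimal sketch, not part of the original notes), the standard Bernoulli pmf and its exponential-family form above agree for any ϕ:

```python
import math

def bernoulli_pmf(y, phi):
    """Standard Bernoulli pmf: phi^y * (1-phi)^(1-y)."""
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, phi):
    """Same pmf written as b(y) * exp(eta*T(y) - a(eta))."""
    eta = math.log(phi / (1 - phi))   # natural parameter eta = log(phi/(1-phi))
    a = math.log(1 + math.exp(eta))   # log partition function a(eta)
    b = 1.0                           # b(y) = 1
    T = y                             # T(y) = y
    return b * math.exp(eta * T - a)

phi = 0.3  # arbitrary example value
for y in (0, 1):
    assert abs(bernoulli_pmf(y, phi) - bernoulli_exp_family(y, phi)) < 1e-12
```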
Gaussian distribution family (for simplicity we set σ^2 = 1):
p(y;μ) = (1/√(2π)) exp(−(1/2)(y−μ)^2) = (1/√(2π)) exp(−(1/2)y^2) · exp(μy − (1/2)μ^2)
thus we have:
- η = μ
- T(y) = y
- a(η) = μ^2/2 = η^2/2
- b(y) = (1/√(2π)) exp(−y^2/2)
The Gaussian distribution is also an example of the exponential family. For the Gaussian, however, the mean μ (which is also the y we wish to predict) is exactly the natural parameter η of the corresponding exponential-family form. We will see below why it is useful to write these distributions as exponential-family distributions parameterized by η.
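The Gaussian factorization above can likewise be checked numerically (a minimal sketch with arbitrary example values):

```python
import math

def gaussian_pdf(y, mu):
    """N(mu, 1) density in its standard form."""
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * (y - mu)**2)

def gaussian_exp_family(y, mu):
    """Same density written as b(y) * exp(eta*T(y) - a(eta))."""
    eta = mu                                                # eta = mu
    a = eta**2 / 2                                          # a(eta) = eta^2/2
    b = (1 / math.sqrt(2 * math.pi)) * math.exp(-y**2 / 2)  # b(y)
    return b * math.exp(eta * y - a)                        # T(y) = y

for y, mu in [(0.5, 1.0), (-1.2, 0.3)]:  # arbitrary test points
    assert abs(gaussian_pdf(y, mu) - gaussian_exp_family(y, mu)) < 1e-12
```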
Constructing GLMs
Motivation: given the distribution family of the response variable (such as the Bernoulli or Gaussian distribution), how can we construct a regression/classification hypothesis?
Three assumptions for constructing a Generalized Linear Model:
- p(y|x;θ)∼ExponentialFamily(η)
- h(x)=E[T(y)|x] (in most cases T(y)=y, which gives h(x)=E[y|x])
- η=θTx (design choice)
A model h(x) obtained from these three assumptions is called a Generalized Linear Model. As we will see, GLMs constructed this way have many elegant properties that make learning simpler and more efficient.
Derivation of Ordinary Least Squares (OLS):
- probabilistic assumption: p(y|x) ∼ N(μ, σ^2) ∼ ExponentialFamily(η)
- canonical response function: g(η) = E[T(y)|x;η] = μ = η
- hypothesis: h_θ(x) = g(θ^T x) = θ^T x
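Since the canonical response function for the Gaussian case is the identity, the OLS hypothesis reduces to a dot product. A minimal sketch with made-up example parameters:

```python
def h_linear(theta, x):
    """GLM hypothesis for a Gaussian response: g(eta) = eta,
    so h(x) = g(theta^T x) = theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [2.0, -1.0, 0.5]  # hypothetical example parameters
x = [1.0, 3.0, 4.0]       # x[0] = 1 serves as the intercept term
assert h_linear(theta, x) == 2.0 - 3.0 + 2.0  # = 1.0
```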
Derivation of Logistic Regression:
- probabilistic assumption: p(y|x) ∼ Bernoulli(ϕ) ∼ ExponentialFamily(η)
- canonical response function: g(η) = E[T(y)|x;η] = ϕ = 1/(1+e^(−η))
- hypothesis: h_θ(x) = g(θ^T x) = 1/(1+e^(−θ^T x))
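For the Bernoulli case, the canonical response function is the sigmoid, so the hypothesis is the sigmoid of a dot product. A minimal sketch with hypothetical parameters:

```python
import math

def h_logistic(theta, x):
    """GLM hypothesis for a Bernoulli response:
    h(x) = g(theta^T x) = 1 / (1 + e^(-theta^T x))."""
    eta = sum(t * xi for t, xi in zip(theta, x))
    return 1 / (1 + math.exp(-eta))

theta = [0.0, 1.0]  # hypothetical example parameters
# eta = 0 at this input, so the predicted probability is exactly 0.5
assert h_logistic(theta, [1.0, 0.0]) == 0.5
```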
Both linear regression and logistic regression are special cases of the generalized linear model, which also implies a deep connection between their learning algorithms.
Derivation of Softmax Regression:
- multi-class classification problem
- probabilistic assumption:
p(y|x) ∼ Multinomial(ϕ_1, ..., ϕ_(k−1)) ∼ ExponentialFamily(η)
with:
- T(y) ∈ R^(k−1), where T(y)_i = 1{y = i} (that is, 1 if y = i and 0 if y ≠ i)
- η ∈ R^(k−1), where η_i = log(ϕ_i/ϕ_k)
- a(η) = −log(ϕ_k) = −log(1 − Σ_(i=1)^(k−1) ϕ_i)
- b(y) = 1
- canonical response function:
g(η)_i = E[T(y)_i|x;η] = ϕ_i = e^(η_i) / (1 + Σ_(j=1)^(k−1) e^(η_j))
which is called the softmax function
- hypothesis:
[h_θ(x)]_i = g(θ^T x)_i = e^(θ_i^T x) / (1 + Σ_(j=1)^(k−1) e^(θ_j^T x))
which is called softmax regression
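The softmax hypothesis can be sketched directly from the formula above; note that only k−1 parameter vectors are needed, since class k receives the remaining probability mass ϕ_k = 1/(1 + Σ_j e^(η_j)). A minimal sketch with hypothetical parameters for a 3-class problem:

```python
import math

def softmax_hypothesis(thetas, x):
    """[h(x)]_i = e^(theta_i^T x) / (1 + sum_j e^(theta_j^T x))
    for i = 1..k-1; class k gets the remaining probability mass."""
    etas = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    denom = 1 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1 / denom)  # phi_k = 1 - sum of the other phi_i
    return probs

# hypothetical parameters: k = 3 classes, so k-1 = 2 parameter vectors
thetas = [[1.0, 0.0], [0.0, 1.0]]
x = [0.2, 0.5]
p = softmax_hypothesis(thetas, x)
assert abs(sum(p) - 1.0) < 1e-12 and all(q > 0 for q in p)
```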