
# Stanford Open Course Machine Learning Notes (3) – Generalized Linear Models

### 1. The exponential family

Ng gives the following definition of the exponential family of distributions:

$P(y;\eta)=b(y)\exp(\eta^TT(y)-a(\eta))$

Here $\eta$ is the natural parameter, $T(y)$ is the sufficient statistic, and $a(\eta)$ is the log partition function.

Take the Bernoulli distribution with mean $\phi$ as an example:

$P(y=1;\phi)=\phi$

$P(y=0;\phi)=1-\phi$

$$\begin{align} \therefore P(y;\phi)&=\phi^y(1-\phi)^{1-y}\\ &=\exp(y\log{\phi}+(1-y)\log{(1-\phi)})\\ &=\exp\left(\log{(1-\phi)}+y\log{\frac{\phi}{1-\phi}}\right) \end{align}$$

$\therefore \eta=\log{\frac{\phi}{1-\phi}},\ \phi=\frac{1}{1+e^{-\eta}}$ (this is exactly the sigmoid function)
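A quick numerical check that the two maps above are inverses of each other (a minimal Python sketch; the function names `logit` and `sigmoid` are my own labels):

```python
import math

def logit(phi):
    # natural parameter: eta = log(phi / (1 - phi))
    return math.log(phi / (1 - phi))

def sigmoid(eta):
    # inverse map: phi = 1 / (1 + e^(-eta))
    return 1 / (1 + math.exp(-eta))

phi = 0.3
eta = logit(phi)
assert abs(sigmoid(eta) - phi) < 1e-12   # round-trip recovers phi
```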

$T(y)=y$

$a(\eta)=-\log{(1-\phi)}=\log{(1+e^\eta)}$

$b(y)=1$

$\therefore$ The Bernoulli distribution can be written in exponential-family form, so it is a member of the family.
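The identity can be verified numerically: the direct Bernoulli pmf and the exponential-family form with the $\eta$, $T(y)$, $a(\eta)$, $b(y)$ derived above agree for any $\phi$. A minimal sketch (function names are mine):

```python
import math

def bernoulli_pmf(y, phi):
    # direct form: phi^y * (1 - phi)^(1 - y)
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, phi):
    # b(y) * exp(eta * T(y) - a(eta)), with T(y) = y and b(y) = 1
    eta = math.log(phi / (1 - phi))
    a = math.log(1 + math.exp(eta))   # a(eta) = -log(1 - phi)
    return math.exp(eta * y - a)

for y in (0, 1):
    assert abs(bernoulli_pmf(y, 0.3) - bernoulli_exp_family(y, 0.3)) < 1e-12
```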

Similarly, consider the Gaussian distribution with unit variance, $\mathcal{N}(\mu,1)$:

$$\begin{align} P(y;\mu)&=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(y-\mu)^2\right)\\ &=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right)\cdot \exp\left(\mu y-\frac{1}{2}\mu^2\right) \end{align}$$

$\therefore \eta=\mu$

$T(y)=y$

$a(\eta)=\frac{1}{2}\mu^2=\frac{1}{2}\eta^2$

$b(y)=\exp(-\frac{1}{2}y^2)\cdot \frac{1}{\sqrt{2\pi}}$

$\therefore$ The Gaussian distribution is also a member of the exponential family.
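As with the Bernoulli case, the factorization can be checked numerically: the $\mathcal{N}(\mu,1)$ density and the form $b(y)\exp(\eta T(y)-a(\eta))$ coincide. A minimal sketch (function names are mine):

```python
import math

def gaussian_pdf(y, mu):
    # N(mu, 1) density
    return math.exp(-0.5 * (y - mu)**2) / math.sqrt(2 * math.pi)

def gaussian_exp_family(y, mu):
    eta = mu                  # eta = mu
    a = 0.5 * eta**2          # a(eta) = eta^2 / 2
    b = math.exp(-0.5 * y**2) / math.sqrt(2 * math.pi)
    return b * math.exp(eta * y - a)   # b(y) * exp(eta * T(y) - a(eta))

assert abs(gaussian_pdf(1.7, 0.4) - gaussian_exp_family(1.7, 0.4)) < 1e-12
```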

### 2. Constructing GLMs

To construct a GLM for a prediction problem, Ng makes three assumptions:

1. $y|x;\theta \sim \text{ExponentialFamily}(\eta)$
2. $h_\theta(x)=E[y|x]$
3. $\eta=\theta^Tx$ ($\eta$ depends linearly on $x$)

#### 2.1 Ordinary least squares

For least squares, $y|x;\theta \sim \mathcal{N}(\mu,1)$, so

$$\begin{align} h_\theta(x)&=E[y|x;\theta]\\ &=\mu\\ &=\eta\\ &=\theta^Tx \end{align}$$
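Under the Gaussian assumption, the GLM hypothesis reduces to the familiar linear predictor. A minimal sketch (the helper `h` is my own illustration):

```python
def h(theta, x):
    # GLM hypothesis under the Gaussian assumption: h_theta(x) = theta^T x
    return sum(t * xi for t, xi in zip(theta, x))

print(h([0.5, -1.0, 2.0], [1.0, 2.0, 3.0]))  # 0.5 - 2.0 + 6.0 = 4.5
```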

#### 2.2 Softmax regression

For multiclass classification, $y\in\{1,2,\dots,k\}$ with class probabilities $\phi_1,\dots,\phi_k$, where $\phi_k=1-\sum_{i=1}^{k-1}{\phi_i}$. Define $T(y)\in \mathbb{R}^{k-1}$:

$T(1)=\begin{bmatrix} 1 \\ 0\\\vdots\\0 \end{bmatrix},T(2)=\begin{bmatrix} 0 \\ 1\\\vdots\\0 \end{bmatrix},\dots,T(k-1)=\begin{bmatrix} 0 \\ 0\\\vdots\\1 \end{bmatrix},T(k)=\begin{bmatrix} 0 \\ 0\\\vdots\\0 \end{bmatrix}$
Ng first defines the indicator notation:

$1\{\text{True}\}=1,\ 1\{\text{False}\}=0$

$\because E[(T(y))_i]=P(y=i)=\phi_i$

$$\begin{align} \therefore P(y;\phi)&=\phi_1^{1\{y=1\}}\phi_2^{1\{y=2\}}\cdots\phi_k^{1\{y=k\}}\\ &=\phi_1^{1\{y=1\}}\phi_2^{1\{y=2\}}\cdots\phi_k^{1-\sum_{i=1}^{k-1}{1\{y=i\}}}\\ &=\phi_1^{(T(y))_1}\phi_2^{(T(y))_2}\cdots\phi_k^{1-\sum_{i=1}^{k-1}{(T(y))_i}}\\ &=\exp\left((T(y))_1\log{\phi_1}+(T(y))_2\log{\phi_2}+\cdots+\left(1-\sum_{i=1}^{k-1}{(T(y))_i}\right)\log{\phi_k}\right)\\ &=\exp\left((T(y))_1\log{\frac{\phi_1}{\phi_k}}+(T(y))_2\log{\frac{\phi_2}{\phi_k}}+\cdots+(T(y))_{k-1}\log{\frac{\phi_{k-1}}{\phi_k}}+\log{\phi_k}\right)\\ &=b(y)\exp(\eta^TT(y)-a(\eta)) \end{align}$$

$\therefore \eta=\begin{bmatrix} \log{\frac{\phi_{1}}{\phi_k}} \\ \log{\frac{\phi_{2}}{\phi_k}} \\ \vdots\\ \log{\frac{\phi_{k-1}}{\phi_k}} \end{bmatrix}$

$a(\eta)=-\log{\phi_k}$

$b(y)=1$

$\therefore \eta_i=\log{\frac{\phi_i}{\phi_k}}$ (for convenience, also define $\eta_k=\log{\frac{\phi_k}{\phi_k}}=0$)

$\because e^{\eta_i}=\frac{\phi_i}{\phi_k}$

$\therefore \phi_ke^{\eta_i}=\phi_i \quad (*)$

$\because \phi_k\sum_{i=1}^k{e^{\eta_i}}=\sum_{i=1}^k{\phi_i}=1$

$\therefore \phi_k=\frac{1}{\sum_{i=1}^k{e^{\eta_i}}}$

Substituting back into $(*)$: $\phi_i=\frac{e^{\eta_i}}{\sum_{j=1}^k{e^{\eta_j}}}$

By convention, take $\theta_k=0$, so that $\eta_k=\theta_k^Tx=0$.

$$\begin{align} \therefore P(y=i|x;\theta)&=\phi_i\\ &=\frac{e^{\eta_i}}{\sum_{j=1}^k{e^{\eta_j}}}\\ &=\frac{e^{\theta_i^Tx}}{\sum_{j=1}^k{e^{\theta_j^Tx}}} \end{align}$$

$$\begin{align} \therefore h_\theta(x)&=E[T(y)|x;\theta]\\ &=E\left[\begin{bmatrix} 1\{y=1\} \\ 1\{y=2\}\\ \vdots\\ 1\{y=k-1\} \end{bmatrix}\middle|\ x;\theta\right]\\ &=\begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots\\ \phi_{k-1} \end{bmatrix}\\ &=\begin{bmatrix} \frac{e^{\eta_1}}{\sum_{j=1}^k{e^{\eta_j}}} \\ \frac{e^{\eta_2}}{\sum_{j=1}^k{e^{\eta_j}}} \\ \vdots\\ \frac{e^{\eta_{k-1}}}{\sum_{j=1}^k{e^{\eta_j}}} \end{bmatrix} \end{align}$$
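The derivation above can be sketched in code: given parameter vectors $\theta_1,\dots,\theta_k$ (with $\theta_k=0$ by convention), compute each $\eta_i=\theta_i^Tx$ and normalize $e^{\eta_i}$ over the classes. A minimal sketch (the function name `softmax_probs` is mine; subtracting the max is a standard numerical-stability trick, not part of the derivation):

```python
import math

def softmax_probs(thetas, x):
    # thetas: list of k parameter vectors; by convention theta_k is the
    # zero vector, so eta_k = 0
    etas = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    m = max(etas)                       # subtract max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    z = sum(exps)
    return [e / z for e in exps]        # phi_i = e^{eta_i} / sum_j e^{eta_j}

# three classes, two features; the last theta is zero as in the text
probs = softmax_probs([[1.0, 0.5], [-0.5, 2.0], [0.0, 0.0]], [1.0, 1.0])
assert abs(sum(probs) - 1.0) < 1e-12    # probabilities sum to one
```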
