These are some notes I took while watching the Stanford open course on machine learning (taught by Professor Andrew Ng), written up here in case they are useful later. If you find any mistakes, please let me know.
Other notes in this series:
Linear Regression
Classification and logistic regression
Generalized Linear Models
Generative Learning algorithms
Generalized Linear Models
The two algorithms we have studied so far both model $p(y \mid x; \theta)$:
$$y \in \mathbb{R}, \ \text{Gaussian distribution} \ \rightarrow \ \text{least squares of linear regression}$$
$$y \in \{0, 1\}, \ \text{Bernoulli distribution} \ \rightarrow \ \text{logistic regression}$$
1 The exponential family
A distribution belongs to the exponential family if it can be written in the form:
$$p(y; \eta) = b(y)\exp(\eta^{T}T(y) - a(\eta))$$

where $\eta$ is the natural parameter of the distribution, and $T(y)$ is the sufficient statistic (usually $T(y) = y$).
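The general form translates directly into code. A minimal sketch (the function name `exp_family_pdf` is my own, not from the notes; $\eta$ and $T(y)$ are taken as scalars so $\eta^T T(y)$ reduces to an ordinary product):

```python
import math

def exp_family_pdf(y, eta, T, a, b):
    """Evaluate p(y; eta) = b(y) * exp(eta * T(y) - a(eta)) for scalar eta, T(y)."""
    return b(y) * math.exp(eta * T(y) - a(eta))

# Sanity check with the Bernoulli parameterization derived later in the notes:
# T(y) = y, b(y) = 1, a(eta) = log(1 + e^eta), eta = log(phi / (1 - phi)).
phi = 0.3
eta = math.log(phi / (1 - phi))
p1 = exp_family_pdf(1, eta,
                    T=lambda y: y,
                    a=lambda e: math.log(1 + math.exp(e)),
                    b=lambda y: 1.0)
assert abs(p1 - phi) < 1e-12   # recovers p(y = 1) = phi
```

Different choices of $T$, $a$, and $b$ plug different member distributions into the same skeleton.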
For the Bernoulli distribution:
$$\mathrm{Ber}(\phi) = \begin{cases} p(y = 1 \mid \phi) = \phi \\ p(y = 0 \mid \phi) = 1 - \phi \end{cases}$$
$$\begin{aligned} p(y \mid \phi) &= \phi^{y}(1-\phi)^{1-y} \\ &= \exp(\log(\phi^{y}(1-\phi)^{1-y})) \\ &= \exp(\log(\phi^{y}) + \log((1-\phi)^{1-y})) \\ &= \exp(y\log\phi + (1-y)\log(1-\phi)) \\ &= \exp\left(y\log\frac{\phi}{1-\phi} + \log(1-\phi)\right) \end{aligned}$$
Setting $T(y) = y$, $b(y) = 1$, $\eta = \log\frac{\phi}{1-\phi}$, we get $\phi = \frac{1}{1+e^{-\eta}}$ and $a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})$.
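The identification above can be checked numerically: with $b(y) = 1$, $T(y) = y$, $\eta = \log\frac{\phi}{1-\phi}$, and $a(\eta) = \log(1+e^{\eta})$, the exponential-family form should reproduce $\phi^{y}(1-\phi)^{1-y}$ exactly (the value of $\phi$ below is an arbitrary test value):

```python
import math

phi = 0.7                                  # Bernoulli parameter (arbitrary test value)
eta = math.log(phi / (1 - phi))            # natural parameter eta = log(phi / (1 - phi))
a = math.log(1 + math.exp(eta))            # log partition a(eta) = log(1 + e^eta)

for y in (0, 1):
    direct = phi**y * (1 - phi)**(1 - y)   # p(y | phi) in its usual form
    expfam = math.exp(y * eta - a)         # b(y) * exp(eta * T(y) - a(eta)) with b(y) = 1
    assert abs(direct - expfam) < 1e-12

# Inverting the natural parameter recovers phi via the logistic sigmoid:
sigmoid_of_eta = 1 / (1 + math.exp(-eta))
assert abs(sigmoid_of_eta - phi) < 1e-12
```

The last assertion is exactly why the sigmoid appears in logistic regression: it is the inverse of the Bernoulli natural-parameter map.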
For the Gaussian distribution:
$$\begin{aligned} p(y \mid \mu; \sigma^2) &= \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) \\ &= \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{y^2 - 2y\mu + \mu^2}{2\sigma^2}\right) \\ &= \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{y^2}{2\sigma^2}\right)\exp\left(\frac{2y\mu - \mu^2}{2\sigma^2}\right) \end{aligned}$$
Setting $T(y) = y$, $b(y) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{y^2}{2\sigma^2}\right)$, $\eta = \frac{\mu}{\sigma^2}$, the remaining factor $\exp\left(\frac{2y\mu - \mu^2}{2\sigma^2}\right) = \exp\left(\eta y - \frac{\mu^2}{2\sigma^2}\right)$ gives $a(\eta) = \frac{\mu^2}{2\sigma^2}$.
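As with the Bernoulli case, this identification can be verified numerically: with $b(y) = \frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{y^2}{2\sigma^2})$, $\eta = \frac{\mu}{\sigma^2}$, and $a(\eta) = \frac{\mu^2}{2\sigma^2}$, the product $b(y)\exp(\eta y - a(\eta))$ should equal the Gaussian density for any fixed $\sigma$ (the test values of $\mu$, $\sigma$, and $y$ below are arbitrary):

```python
import math

mu, sigma = 1.5, 2.0                     # arbitrary test values for the parameters
eta = mu / sigma**2                      # natural parameter eta = mu / sigma^2
a = mu**2 / (2 * sigma**2)               # a(eta) = mu^2 / (2 sigma^2)

ok = True
for y in (-1.0, 0.0, 2.3):
    b = math.exp(-y**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    expfam = b * math.exp(eta * y - a)   # b(y) * exp(eta * T(y) - a(eta))
    direct = (math.exp(-(y - mu)**2 / (2 * sigma**2))
              / (math.sqrt(2 * math.pi) * sigma))
    ok = ok and abs(direct - expfam) < 1e-12
assert ok
```

(The lecture notes typically go on to set $\sigma = 1$, in which case $\eta = \mu$ and $a(\eta) = \eta^2/2$; the check above works for general $\sigma$ because $a(\eta)$ absorbs the $\mu^2$ term.)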