Gaussian Discriminant Analysis (GDA)

Multivariate Gaussian Model

$z \sim N(\vec\mu, \Sigma)$
$z \in R^n, \quad \vec\mu \in R^n, \quad \Sigma \in R^{n \times n}$
$z$ – the random vector
$\vec\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$ – the mean vector
$\Sigma$ – the covariance matrix
In GDA, both class-conditional Gaussians share a single covariance matrix $\Sigma$.

$E(z) = \vec\mu, \quad \mathrm{Cov}(z) = E[(z - \vec\mu)(z - \vec\mu)^T] = E(zz^T) - (E(z))(E(z))^T$
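As a quick sanity check, here is a minimal NumPy sketch (the values of $\vec\mu$ and $\Sigma$ are arbitrary choices of mine, not from the text) that samples from a 2-D Gaussian and verifies both expressions for $\mathrm{Cov}(z)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw many samples z ~ N(mu, Sigma); shape (m, n)
z = rng.multivariate_normal(mu, Sigma, size=100_000)

print(z.mean(axis=0))              # ~ mu, i.e. E(z)
print(np.cov(z, rowvar=False))     # ~ Sigma, the direct estimate
# Same thing via the identity Cov(z) = E(zz^T) - E(z)E(z)^T:
Ezz = z.T @ z / len(z)
print(Ezz - np.outer(z.mean(axis=0), z.mean(axis=0)))
```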

Intro

GDA assumes:
$x \mid y = 0 \sim N(\mu_0, \Sigma)$
$x \mid y = 1 \sim N(\mu_1, \Sigma)$
$y \sim \mathrm{Ber}(\phi), \quad \phi = P(y = 1)$
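To make the generative story concrete, here is a toy sketch that samples a dataset exactly as GDA assumes; all parameter values are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
phi = 0.4                                   # P(y = 1)
mu0 = np.array([0.0, 0.0])                  # mean of class 0
mu1 = np.array([3.0, 3.0])                  # mean of class 1
Sigma = np.array([[1.0, 0.3],               # shared covariance
                  [0.3, 1.0]])

m = 500
y = rng.binomial(1, phi, size=m)            # y ~ Ber(phi)
mus = np.where(y[:, None] == 1, mu1, mu0)   # mu_{y^(i)} for each example
# x | y ~ N(mu_y, Sigma): add zero-mean Gaussian noise to the class means
x = mus + rng.multivariate_normal(np.zeros(2), Sigma, size=m)
```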

GDA model (binary classification)

Multivariate Gaussian distribution:
$P(x) = \frac{1}{(2\pi)^{\frac d2} |\Sigma|^{\frac12}} \exp\left(-\frac12 (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
$|\Sigma|$ denotes the determinant of $\Sigma$.
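The formula can be cross-checked against scipy.stats.multivariate_normal. A small sketch (the parameter values are made up for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])
d = len(x)                                  # dimension of x

# Density computed directly from the formula above
diff = x - mu
manual = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) \
         / ((2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5)

print(manual)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value
```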


Parameters: $\mu_0, \mu_1, \Sigma, \phi$
$P(y) = \phi^y (1 - \phi)^{1-y}$
$\phi$ is the prior probability of $y = 1$; it reflects the proportion of the two classes.


Joint likelihood:
$L(\phi, \mu_0, \mu_1, \Sigma) = \prod\limits_{i=1}^m P(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma) = \prod\limits_{i=1}^m P(x^{(i)} \mid y^{(i)}) P(y^{(i)})$
MLE: $\arg\max\limits_{\phi, \mu_0, \mu_1, \Sigma} \ell(\phi, \mu_0, \mu_1, \Sigma)$, where $\ell = \log L$. Setting its derivatives to zero yields closed-form solutions:
$\phi = \frac{\sum\limits_{i=1}^m y^{(i)}}{m} = \frac{\sum\limits_{i=1}^m 1\{y^{(i)} = 1\}}{m}$
$\mu_k = \frac{\sum\limits_{i=1}^m 1\{y^{(i)} = k\} x^{(i)}}{\sum\limits_{i=1}^m 1\{y^{(i)} = k\}}, \quad k \in \{0, 1\}$
$\Sigma = \frac1m \sum\limits_{i=1}^m (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T$
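These closed forms translate directly into code. A minimal sketch, assuming x is an (m, n) NumPy array and y an (m,) array of 0/1 labels; the function name fit_gda is my own:

```python
import numpy as np

def fit_gda(x, y):
    """Closed-form MLE for the GDA parameters (phi, mu0, mu1, Sigma)."""
    m = len(y)
    phi = y.mean()                              # fraction of y = 1
    mu0 = x[y == 0].mean(axis=0)                # mean of class-0 examples
    mu1 = x[y == 1].mean(axis=0)                # mean of class-1 examples
    mus = np.where(y[:, None] == 1, mu1, mu0)   # mu_{y^(i)} per example
    diff = x - mus
    Sigma = diff.T @ diff / m                   # shared covariance
    return phi, mu0, mu1, Sigma
```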

Based on the two fitted Gaussians, we can draw the decision boundary: the set of points where the two class posteriors are equal. Because both classes share $\Sigma$, this boundary is a straight line (a hyperplane in higher dimensions).

Prediction

$\arg\max\limits_y P(y \mid x) = \arg\max\limits_y \frac{P(x \mid y) P(y)}{P(x)} = \arg\max\limits_y P(x \mid y) P(y)$
($P(x)$ does not depend on $y$, so it can be dropped.)
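In code, prediction is just a comparison of the two unnormalized posteriors. A sketch building on the hypothetical fit_gda above, done in log space for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_gda(x, phi, mu0, mu1, Sigma):
    """Return arg max_y P(x|y) P(y) for each row of x."""
    log_p0 = multivariate_normal(mu0, Sigma).logpdf(x) + np.log(1 - phi)
    log_p1 = multivariate_normal(mu1, Sigma).logpdf(x) + np.log(phi)
    return (log_p1 > log_p0).astype(int)        # P(x) cancels out
```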

GDA & Logistic Regression

[Figure: my own notes, showing the posterior $P(y = 1 \mid x)$ for 1-D data]
The figure shows that when the data is 1-D, the posterior $P(y = 1 \mid x)$ has the shape of the sigmoid function. In fact it is exactly a sigmoid of a linear function of $x$, and the same holds in higher dimensions; I won't prove it here.
GDA is a stricter model than logistic regression, because the class-conditional data has to follow a Gaussian distribution: GDA's assumptions imply the logistic form of the posterior, but not the other way around.
When the data really does follow a Gaussian distribution, or is approximately Gaussian because it aggregates many small independent effects (the central limit theorem), GDA tends to work better than logistic regression.
Also, since the MLE has the closed-form solutions above, no iterative optimization is needed, so there is no problem of local optima.
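Instead of a proof, here is a numeric check for the 1-D case (parameters are illustrative): if the posterior is a sigmoid of a linear function of $x$, then the log-odds evaluated on an evenly spaced grid must themselves be evenly spaced.

```python
import numpy as np
from scipy.stats import norm

phi, mu0, mu1, sigma = 0.5, 0.0, 2.0, 1.0
xs = np.linspace(-4, 6, 5)                  # evenly spaced grid

# Posterior P(y=1|x) by Bayes' rule
p1 = norm(mu1, sigma).pdf(xs) * phi
p0 = norm(mu0, sigma).pdf(xs) * (1 - phi)
posterior = p1 / (p0 + p1)

# Log-odds: evenly spaced values => linear in x => posterior is a sigmoid
print(np.log(posterior / (1 - posterior)))  # ≈ [-10, -5, 0, 5, 10]
```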
