Machine Learning Notes 1 - Supervised Learning

1.1 Generalized Linear Models
Both of the methods discussed previously are special cases of a broader family of models, called Generalized Linear Models (GLMs). We will also show how other models in the GLM family can be derived and applied to other classification and regression problems.
The exponential family
We say that a class of distributions is in the exponential family if it can be written in the form :

p(y; η) = b(y) exp(η^T T(y) − a(η))

η is called the natural parameter(also called the canonical parameter);
T(y) is the sufficient statistic ;
a(η) is the log partition function;
The quantity e^(−a(η)) essentially plays the role of a normalization constant, making sure that the distribution p(y; η) sums/integrates over y to 1.
A fixed choice of T, a and b defines a family (or set) of distributions that is parameterized by η ; as we vary η , we then get different distributions within this family.
We now show that the Bernoulli and the Gaussian distributions are examples of exponential family distributions:
Bernoulli(ϕ) :
the Bernoulli distribution with mean ϕ:
p(y = 1; ϕ) = ϕ;  p(y = 0; ϕ) = 1 − ϕ
We write the Bernoulli distribution as:
p(y; ϕ) = ϕ^y (1 − ϕ)^(1 − y)

= exp(y log ϕ + (1 − y) log(1 − ϕ))

= exp((log(ϕ/(1 − ϕ))) · y + log(1 − ϕ))

Thus:

η = log(ϕ/(1 − ϕ)), which inverts to give ϕ = 1/(1 + e^(−η))

b(y) = 1

T(y) = y

a(η) = −log(1 − ϕ) = log(1 + e^η)
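As a quick sanity check (not part of the original notes), a minimal Python sketch can confirm numerically that this exponential-family form reproduces the Bernoulli probabilities; the function names here are introduced for illustration:

```python
import math

# Sketch: check that p(y; phi) = b(y) * exp(eta * T(y) - a(eta))
# matches the Bernoulli pmf, with eta = log(phi / (1 - phi)),
# T(y) = y, a(eta) = log(1 + e^eta), and b(y) = 1.
def bernoulli_pmf(y, phi):
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, phi):
    eta = math.log(phi / (1 - phi))    # natural parameter
    a = math.log(1 + math.exp(eta))    # log partition function
    return 1 * math.exp(eta * y - a)   # b(y) = 1, T(y) = y

for phi in (0.2, 0.5, 0.9):
    for y in (0, 1):
        assert abs(bernoulli_pmf(y, phi) - bernoulli_exp_family(y, phi)) < 1e-12
```

The two expressions agree for every ϕ ∈ (0, 1) and y ∈ {0, 1}, as the algebra above predicts.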

Gaussian distribution
Recall that, when deriving linear regression, the value of σ² had no effect on our final choice of θ and hθ(x). Thus, we can choose an arbitrary value for σ² without changing anything. To simplify the derivation below, let's set σ² = 1. We then have:
p(y; μ) = (1/√(2π)) exp(−(1/2)(y − μ)²)

= (1/√(2π)) exp(−(1/2) y²) · exp(μy − (1/2) μ²)

Thus, we see that the Gaussian is in the exponential family, with:
η = μ

T(y) = y

a(η) = μ²/2 = η²/2

b(y) = (1/√(2π)) exp(−y²/2)
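The same numerical check works for the unit-variance Gaussian (a sketch, not from the original notes; function names are illustrative):

```python
import math

# Sketch: check that b(y) * exp(eta * y - a(eta)) matches the
# N(mu, 1) density, with eta = mu, a(eta) = eta^2 / 2, and
# b(y) = (1 / sqrt(2*pi)) * exp(-y^2 / 2).
def gaussian_pdf(y, mu):
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * (y - mu)**2)

def gaussian_exp_family(y, mu):
    eta = mu
    a = eta**2 / 2
    b = (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * y**2)
    return b * math.exp(eta * y - a)

for mu in (-1.0, 0.0, 2.5):
    for y in (-0.3, 0.0, 1.7):
        assert abs(gaussian_pdf(y, mu) - gaussian_exp_family(y, mu)) < 1e-12
```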

There are many other distributions that are members of the exponential family: the multinomial (which we'll see later); the Poisson (for modelling count data; also see the problem set); the gamma and the exponential (for modelling continuous, non-negative random variables, such as time intervals); the beta and the Dirichlet (for distributions over probabilities); and many more. In the next section, we will describe a general "recipe" for constructing models in which y (given x and θ) comes from any of these distributions.
Constructing GLMs
More generally, consider a classification or regression problem where we would like to predict the value of some random variable y as a function of x. To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of y given x and about our model:
1. y|x;θ ∼ ExponentialFamily( η ). I.e., given x and θ, the distribution of y follows some exponential family distribution, with parameter η .
2. Given x, our goal is to predict the expected value of T(y) given x. In most of our examples, we will have T(y) = y, so this means we would like the prediction h(x) output by our learned hypothesis h to satisfy h(x) = E[y|x]. (Note that this assumption is satisfied in the choices for hθ(x) for both logistic regression and linear regression. For instance, in logistic regression, we had hθ(x) = p(y = 1|x; θ) = 0 · p(y = 0|x; θ) + 1 · p(y = 1|x; θ) = E[y|x; θ].)
3. The natural parameter η and the inputs x are related linearly: η = θ^T x. (Or, if η is vector-valued, then η_i = θ_i^T x.)
Ordinary Least Squares
To show that ordinary least squares is a special case of the GLM family of models, consider the setting where the target variable y (also called the response variable in GLM terminology) is continuous, and we model the conditional distribution of y given x as a Gaussian N(μ, σ²). (Here, μ may depend on x.) So, we let the ExponentialFamily(η) distribution above be the Gaussian distribution. As we saw previously, in the formulation of the Gaussian as an exponential family distribution, we had μ = η. So, we have:
hθ(x) = E[y|x; θ]

= μ

= η

= θ^T x
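The resulting hypothesis is just a linear function of x. A minimal sketch (with hypothetical θ and x chosen for illustration):

```python
# Sketch: for the Gaussian GLM the canonical response function is the
# identity, so the hypothesis is simply h(x) = theta^T x.
def h_ols(theta, x):
    # eta = theta^T x, and for the Gaussian, mu = eta = E[y|x]
    return sum(t_j * x_j for t_j, x_j in zip(theta, x))

theta = [0.5, -2.0]   # hypothetical parameters
x = [1.0, 3.0]        # x[0] = 1 acts as the intercept term
assert h_ols(theta, x) == 0.5 * 1.0 + (-2.0) * 3.0
```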

Logistic Regression
We now consider logistic regression. Here we are interested in binary classification, so y ∈ {0, 1}. Given that y is binary-valued, it therefore seems natural to choose the Bernoulli family of distributions to model the conditional distribution of y given x. In our formulation of the Bernoulli distribution as an exponential family distribution, we had ϕ = 1/(1 + e^(−η)). Furthermore, note that if y|x; θ ∼ Bernoulli(ϕ), then E[y|x; θ] = ϕ. So, following a similar derivation as the one for ordinary least squares, we get:
hθ(x) = E[y|x; θ]

= ϕ

= 1/(1 + e^(−η))

= 1/(1 + e^(−θ^T x))

So, this gives us hypothesis functions of the form hθ(x) = 1/(1 + e^(−θ^T x)). If you were previously wondering how we came up with the form of the logistic function 1/(1 + e^(−z)), this gives one answer: once we assume that y conditioned on x is Bernoulli, it arises as a consequence of the definition of GLMs and exponential family distributions. To introduce a little more terminology, the function g giving the distribution's mean as a function of the natural parameter (g(η) = E[T(y); η]) is called the canonical response function. Its inverse, g⁻¹, is called the canonical link function. Thus, the canonical response function for the Gaussian family is just the identity function, and the canonical response function for the Bernoulli is the logistic function.
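A small sketch (not from the original notes; names are illustrative) can make the response/link pairing for the Bernoulli case concrete — the logistic function and the log-odds are inverses of each other, and the hypothesis always outputs a valid probability:

```python
import math

# Sketch: canonical response function g and canonical link g^{-1}
# for the Bernoulli family.
def g(eta):
    # canonical response: natural parameter -> mean, the logistic function
    return 1 / (1 + math.exp(-eta))

def g_inv(phi):
    # canonical link: mean -> natural parameter, the log-odds
    return math.log(phi / (1 - phi))

def h(theta, x):
    # logistic-regression hypothesis: h_theta(x) = g(theta^T x)
    return g(sum(t_j * x_j for t_j, x_j in zip(theta, x)))

assert abs(g_inv(g(1.7)) - 1.7) < 1e-12     # g and g^{-1} are inverses
assert 0 < h([0.5, -2.0], [1.0, 3.0]) < 1   # output is a valid probability
```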
