Probabilistic interpretation

weixin_30535913

Published 2013-04-13 09:44:00

Original link: http://www.cnblogs.com/ustccjw/archive/2013/04/13/3017741.html

1. Guide

When faced with a regression problem, why might linear regression, and specifically why might the least-squares cost function J, be a reasonable choice? In this section, we will give a set of probabilistic assumptions, under which least-squares regression is derived as a very natural algorithm.

2. Let us assume that the target variables and the inputs are related via the equation:

$$ y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}, $$

where ε⁽ⁱ⁾ is an error term that captures either unmodeled effects (such as features that are very pertinent to predicting housing price but that we left out of the regression) or random noise.

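We assume that the ε⁽ⁱ⁾ are IID (independently and identically distributed) according to a Gaussian distribution with mean zero and variance σ², written ε⁽ⁱ⁾ ∼ N(0, σ²). The density of ε⁽ⁱ⁾ is then given by

$$ p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(\epsilon^{(i)})^2}{2\sigma^2} \right). $$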

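Since ε⁽ⁱ⁾ = y⁽ⁱ⁾ − θ^T x⁽ⁱ⁾, this implies that

$$ p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right). $$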

The notation “p(y⁽ⁱ⁾|x⁽ⁱ⁾; θ)” indicates that this is the distribution of y⁽ⁱ⁾ given x⁽ⁱ⁾ and parameterized by θ. Note that we should not condition on θ (that is, we should not write “p(y⁽ⁱ⁾|x⁽ⁱ⁾, θ)”), since θ is not a random variable. We can also write the distribution of y⁽ⁱ⁾ as y⁽ⁱ⁾ | x⁽ⁱ⁾; θ ∼ N(θ^T x⁽ⁱ⁾, σ²).
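As a small illustration, the conditional distribution y⁽ⁱ⁾ | x⁽ⁱ⁾; θ ∼ N(θ^T x⁽ⁱ⁾, σ²) can be evaluated directly. The sketch below is not from the original post; the function name `conditional_density` and the numeric values of θ, x, and σ are assumptions chosen only for the example.

```python
import math

def conditional_density(y, x, theta, sigma):
    """Density of N(theta^T x, sigma^2) evaluated at y, i.e. p(y | x; theta)."""
    mean = sum(t * xi for t, xi in zip(theta, x))  # theta^T x
    residual = y - mean                            # plays the role of epsilon
    return math.exp(-residual ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical values, chosen only for illustration.
theta = [1.0, 2.0]   # (intercept, slope)
x = [1.0, 3.0]       # leading 1.0 pairs with the intercept
sigma = 0.5

peak = conditional_density(7.0, x, theta, sigma)  # y equals the mean theta^T x = 7
off = conditional_density(8.0, x, theta, sigma)   # y is two standard deviations above the mean
print(peak, off)
```

The density is largest when y⁽ⁱ⁾ equals θ^T x⁽ⁱ⁾ and falls off with the squared residual, which is where the squared error in the cost function will eventually come from.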

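Let X be the matrix containing all the x⁽ⁱ⁾, and let ~y (that is, $\vec{y}$) be the vector containing all the y⁽ⁱ⁾. For a fixed θ, the probability of the data is p(~y | X; θ). When we instead view this quantity as a function of θ (θ is still a parameter, not a random variable), we call it the likelihood function:

$$ L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y} \mid X; \theta). $$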

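Because the ε⁽ⁱ⁾ are independent, the y⁽ⁱ⁾ are independent given the x⁽ⁱ⁾, and each satisfies y⁽ⁱ⁾ | x⁽ⁱ⁾; θ ∼ N(θ^T x⁽ⁱ⁾, σ²). The likelihood therefore factors as

$$ L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right), $$

where m is the number of training examples.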

The principle of maximum likelihood says that we should choose θ so as to make the data as probable as possible. That is, we should choose θ to maximize L(θ).

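Instead of maximizing L(θ), we can maximize any strictly increasing function of L(θ). In particular, the derivations will be a bit simpler if we instead maximize the log likelihood l(θ) over the m training examples:

$$
\begin{aligned}
l(\theta) &= \log L(\theta) \\
&= \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right) \\
&= \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right) \\
&= m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2.
\end{aligned}
$$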

Hence, maximizing l(θ) gives the same answer as minimizing

$$ \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2, $$

which we recognize to be J(θ), our original least-squares cost function.
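This equivalence can be checked numerically. The following is a minimal sketch, not part of the original post, and it makes simplifying assumptions: θ is a single scalar with no intercept, the data are synthetic, and the helper names `log_likelihood` and `least_squares_cost` are hypothetical. It compares the log likelihood with m·log(1/(√(2π)σ)) − J(θ)/σ² at the least-squares solution.

```python
import math
import random

def log_likelihood(theta, xs, ys, sigma):
    """l(theta) for scalar inputs under y = theta * x + eps, eps ~ N(0, sigma^2)."""
    const = math.log(1.0 / (math.sqrt(2 * math.pi) * sigma))
    return sum(const - (y - theta * x) ** 2 / (2 * sigma ** 2) for x, y in zip(xs, ys))

def least_squares_cost(theta, xs, ys):
    """J(theta) = 1/2 * sum of squared residuals."""
    return 0.5 * sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

# Synthetic data; the true parameter and noise level are assumptions for the demo.
rng = random.Random(0)
true_theta, sigma = 2.0, 0.7
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [true_theta * x + rng.gauss(0, sigma) for x in xs]

# Closed-form minimizer of J for a scalar theta: sum(x*y) / sum(x*x).
theta_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# The log likelihood should equal a theta-independent constant minus J(theta) / sigma^2.
m = len(xs)
lhs = log_likelihood(theta_ls, xs, ys, sigma)
rhs = m * math.log(1.0 / (math.sqrt(2 * math.pi) * sigma)) - least_squares_cost(theta_ls, xs, ys) / sigma ** 2
print(theta_ls, lhs, rhs)
```

Moving θ away from `theta_ls` in either direction raises J(θ) and lowers l(θ) by a proportional amount, so the minimizer of one is the maximizer of the other.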

3. Summary

Under the previous probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum likelihood estimate of θ. This is thus one set of assumptions under which least-squares regression can be justified as a very natural method that’s just doing maximum likelihood estimation. (Note however that the probabilistic assumptions are by no means necessary for least-squares to be a perfectly good and rational procedure, and there may—and indeed there are—other natural assumptions that can also be used to justify it.)

Note also that, in our previous discussion, our final choice of θ did not depend on the value of σ², and indeed we’d have arrived at the same result even if σ² were unknown. We will use this fact again later, when we talk about the exponential family and generalized linear models.
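The independence from σ² can also be seen with a quick experiment. The sketch below is an illustration under stated assumptions rather than anything from the original post: θ is a scalar, the data are synthetic, the candidate grid for θ is arbitrary, and `log_likelihood` and `argmax_theta` are hypothetical helper names. It maximizes the log likelihood over the same grid for three different values of σ.

```python
import math
import random

def log_likelihood(theta, xs, ys, sigma):
    """l(theta) for scalar inputs under y = theta * x + eps, eps ~ N(0, sigma^2)."""
    const = math.log(1.0 / (math.sqrt(2 * math.pi) * sigma))
    return sum(const - (y - theta * x) ** 2 / (2 * sigma ** 2) for x, y in zip(xs, ys))

# Synthetic data; the slope 1.5 and unit noise are assumptions for the demo.
rng = random.Random(1)
xs = [rng.uniform(0, 5) for _ in range(100)]
ys = [1.5 * x + rng.gauss(0, 1.0) for x in xs]

# Candidate values of theta from 0.00 to 3.00 in steps of 0.01.
grid = [i / 100 for i in range(301)]

def argmax_theta(sigma):
    """Grid point with the highest log likelihood for the given sigma."""
    return max(grid, key=lambda t: log_likelihood(t, xs, ys, sigma))

# The sigma plugged into the likelihood changes its value but not where the maximum lies.
best = {s: argmax_theta(s) for s in (0.1, 1.0, 10.0)}
print(best)
```

Changing σ rescales the squared-error term and shifts the log likelihood by a constant, so the maximizing θ is the same in every case.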
