CS229 Lecture Note(1): Linear Regression

1. LMS Algorithm

  • The Ordinary Least Squares Regression Model:

    h_\theta(x) = \theta^T x

  • Cost Function:

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
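
    For concreteness, a minimal NumPy sketch of the hypothesis and cost function might look like the following (the function names and array shapes are illustrative assumptions, not part of the original notes):

    import numpy as np

    def hypothesis(theta, X):
        # h_theta(x) = theta^T x, evaluated for every row x of the (m, n) design matrix X
        return X @ theta

    def cost(theta, X, y):
        # J(theta) = 1/2 * sum of squared residuals over the m training examples
        residuals = hypothesis(theta, X) - y
        return 0.5 * np.sum(residuals ** 2)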

  • Gradient Descent Algorithm:

    \theta := \theta - \alpha \nabla_\theta J(\theta)

  • LMS (least mean squares) update rule (also called the Widrow-Hoff learning rule):

    \theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

  • Batch Gradient Descent vs. Stochastic Gradient Descent

    
    # Runnable NumPy version of the two update schemes.
    import numpy as np

    # BGD: each update sums the LMS term over all m training examples.
    def batch_gradient_descent(X, y, alpha, n_iters):
        theta = np.zeros(X.shape[1])
        for _ in range(n_iters):                 # repeat until convergence
            theta = theta + alpha * X.T @ (y - X @ theta)
        return theta

    # SGD: each update uses a single training example at a time.
    def stochastic_gradient_descent(X, y, alpha, n_epochs):
        theta = np.zeros(X.shape[1])
        for _ in range(n_epochs):                # loop
            for i in range(X.shape[0]):          # for i = 1 to m
                theta = theta + alpha * (y[i] - X[i] @ theta) * X[i]
        return theta
  • Normal Equation Solution:

    \theta = (X^T X)^{-1} X^T \vec{y}
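
    A minimal NumPy sketch of this closed-form solution (the function name is an illustrative assumption; in practice np.linalg.lstsq is the more numerically robust choice):

    import numpy as np

    def normal_equation(X, y):
        # Solve (X^T X) theta = X^T y directly rather than forming the matrix inverse
        return np.linalg.solve(X.T @ X, X.T @ y)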

2. Probabilistic Interpretation

  • Predictive Probability Assumption: a Gaussian Distribution

    p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - \theta^T x)^2}{2\sigma^2} \right), \quad \text{i.e.}\ \ y \mid x; \theta \sim \mathcal{N}(\theta^T x, \sigma^2)

  • Likelihood Function of θ: the probability of the observed data y (under the i.i.d. assumption)

    L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)

  • Maximum Likelihood Method: choose θ to maximize L(θ), or equivalently the log likelihood ℓ(θ):

    \ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2
    \theta = \arg\max_\theta \ell(\theta)

The least-squares regression model therefore corresponds to maximum likelihood estimation of θ under a Gaussian noise assumption on the data.
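
As a quick numerical illustration of this correspondence: since the negative log-likelihood equals J(θ)/σ² plus a constant, the θ that maximizes the likelihood coincides with the least-squares solution. The toy data, seed, step size, and variable names below are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
sigma = 0.1

# Least-squares solution via the normal equation
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Maximum-likelihood solution: gradient descent on the negative log-likelihood
theta_ml = np.zeros(3)
for _ in range(5000):
    grad_nll = X.T @ (X @ theta_ml - y) / sigma**2
    theta_ml -= 1e-5 * grad_nll

print(np.allclose(theta_ls, theta_ml, atol=1e-4))  # expected: True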

3. Locally Weighted Linear Regression

  • Motivation: remove the need for careful feature selection (a poor choice of features leads to underfitting or overfitting)

  • Parametric vs. Non-parametric learning algorithm

  • LWR algorithm:
    To make a prediction at a query point x:

    1. Fit θ to minimize $\sum_i w^{(i)} (y^{(i)} - \theta^T x^{(i)})^2$, where $w^{(i)} = \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)$

    2. Output $\theta^T x$.
    3. Hence, the (errors on the) training examples close to the query point x are given a much higher weight when fitting θ (local linearity); see the sketch below.
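
    A minimal NumPy sketch of LWR for a single query point, assuming an (m, n) design matrix X; the function name and the default bandwidth τ are illustrative assumptions:

    import numpy as np

    def lwr_predict(X, y, x_query, tau=0.5):
        # Gaussian weights: training points near the query dominate the local fit
        diffs = X - x_query
        w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * tau ** 2))
        W = np.diag(w)
        # Weighted normal equation: (X^T W X) theta = X^T W y
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        # Output theta^T x at the query point
        return x_query @ theta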
