2 Linear regression with one variable
2-1 Model representation
Training set
m = Number of training examples
x’s = “input” variable/features
y’s = “output” variable/“target” variable
(x,y) = one training example
Hypothesis:
$h_\theta(x)=\theta_0+\theta_1 x$
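The hypothesis is just a straight line in one variable. A minimal sketch in pure Python (the function and parameter names are illustrative, not from the course):

```python
def h(theta0, theta1, x):
    """Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 1 and theta1 = 2, the input x = 3 maps to 1 + 2*3 = 7.
print(h(1.0, 2.0, 3.0))  # 7.0
```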
2-2 Cost Function
Goal: minimize
$\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$
Dividing by m averages the error over the training examples; dividing by 2 is a calculus convenience: it cancels the factor of 2 that appears when taking the partial derivatives.
Cost Function (squared error function)
$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$
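The cost function translates directly into code. A minimal sketch in pure Python, using lists for the training set (names are illustrative):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) over m training examples."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated from y = 2x
print(compute_cost(0.0, 2.0, xs, ys))  # perfect fit: 0.0
print(compute_cost(0.0, 0.0, xs, ys))  # (4 + 16 + 36) / (2*3) ≈ 9.33
```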
2-3 Cost Function Intuition I
Simplified: fix $\theta_0=0$, so $h_\theta(x)=\theta_1 x$ and $J$ becomes a function of $\theta_1$ alone.
2-4 Cost Function Intuition II
Contour plot/figure
2-5 Gradient descent
Outline
- Start with some $\theta_0,\theta_1$
- Keep changing $\theta_0,\theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum
Gradient descent algorithm
repeat until convergence{
$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ (simultaneously for j=0 and j=1)
}
$\alpha$: the learning rate
Correct: simultaneous update
temp0 := $\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$
temp1 := $\theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1
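The point of the temp variables is that both partial derivatives are evaluated at the old parameter values before either parameter is overwritten. A minimal sketch in Python, taking the partial derivatives as callables (all names are illustrative):

```python
def grad_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One simultaneous gradient-descent step: both partials are
    evaluated at the OLD (theta0, theta1) before either is updated."""
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1

# Toy cost J = theta0**2 + theta1**2, so the partials are 2*theta0 and 2*theta1.
d0 = lambda t0, t1: 2 * t0
d1 = lambda t0, t1: 2 * t1
print(grad_step(1.0, 1.0, 0.1, d0, d1))  # both parameters move from 1.0 to about 0.8
```

Updating `theta0` in place first and then computing `dJ_dtheta1` would use the new `theta0`, which is the incorrect sequential update the lecture warns against.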
2-6 Gradient descent Intuition
Learning rate $\alpha$
- If $\alpha$ is too small, gradient descent can be slow.
- If $\alpha$ is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
Even with the learning rate $\alpha$ held fixed, gradient descent can converge to a local minimum: as we approach the minimum, the derivative shrinks, so the steps automatically become smaller. There is no need to decrease $\alpha$ over time.
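These three regimes are easy to see on the toy cost $J(\theta)=\theta^2$, whose gradient is $2\theta$, so each step multiplies $\theta$ by $(1-2\alpha)$. A small sketch (the example cost and values are illustrative, not from the course):

```python
def descend(theta, alpha, steps):
    """Run gradient descent on J(theta) = theta**2 (gradient is 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(abs(descend(1.0, 0.01, 10)))  # too small: ≈ 0.82, barely moved (slow)
print(abs(descend(1.0, 0.4, 10)))   # reasonable: ≈ 1e-07, near the minimum
print(abs(descend(1.0, 1.1, 10)))   # too large: ≈ 6.2, diverging
```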
2-7 Gradient descent for linear regression
Gradient descent algorithm
repeat until convergence{
$\begin{cases}\theta_0 := \theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)}) \\ \theta_1 := \theta_1-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x^{(i)} \end{cases}$
}
update $\theta_0$ and $\theta_1$ simultaneously
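Putting the pieces together, the whole algorithm can be sketched in pure Python (function name, data, and hyperparameter values are illustrative):

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for univariate linear regression.
    Each iteration uses all m examples, and theta0/theta1 are
    updated simultaneously via tuple assignment."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Data generated from y = 1 + 2x; gradient descent should recover theta0 ≈ 1, theta1 ≈ 2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
t0, t1 = batch_gradient_descent(xs, ys)
print(round(t0, 3), round(t1, 3))  # 1.0 2.0
```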
Convex function: for linear regression, $J(\theta_0,\theta_1)$ is a convex (bowl-shaped) function, so it has a single global minimum and gradient descent cannot get stuck in a local optimum.
"Batch" Gradient Descent
"Batch": each step of gradient descent uses all the training examples.