[ML of Andrew Ng]Week 1 : Linear Regression with One Variable

大庆csdn

Article link: https://blog.csdn.net/mrliudq/article/details/50823130

Week 1 Linear Regression with One Variable
- Introduction
- Linear Regression with One Variable

Introduction

Two definitions of ML

The field of study that gives computers the ability to learn without being explicitly programmed.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Two types of ML

Supervised Learning and Unsupervised Learning.
The key to supervised learning is that every training example is labeled with the correct output.
In unsupervised learning, by contrast, the data has no labels: all examples are treated alike, and we are not told in advance what the right answer is.

Prerequisites for this course

Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program.
Familiarity with basic probability theory. (CS109 or Stat116 is sufficient but not necessary.)
Familiarity with basic linear algebra. (Any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary.)

Linear Regression with One Variable

The Hypothesis Function

$h_\theta (x) = \theta_0 + \theta_1 x$

or, equivalently:

$h_\theta (x) = \theta_0 x_0 + \theta_1 x_1 \quad \text{where } x_0 = 1 \text{ and } x_1 = x$
Note: $x_j^{(i)}$ denotes the $j$-th feature of the $i$-th training example.

We can collect the parameters into a vector $\boldsymbol{\theta}$ and the inputs into a matrix $\boldsymbol{X}$:

$\boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} \qquad (2 \times 1)$

and:

$\boldsymbol{X} = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix} \qquad (m \times 2)$

All $m$ predictions can then be computed at once as $\boldsymbol{H} = \boldsymbol{X}\boldsymbol{\theta}$, with dimensions $(m \times 2) \times (2 \times 1) = (m \times 1)$:

$\boldsymbol{H} = \begin{bmatrix} h_{\theta}(x^{(1)}) \\ h_{\theta}(x^{(2)}) \\ \vdots \\ h_{\theta}(x^{(m)}) \end{bmatrix} \qquad (m \times 1)$

In MATLAB:

h = X*theta;
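
For readers without MATLAB, the same matrix-vector product can be sketched in plain Python. This is only an illustration: the function name `predict`, the sample inputs, and the values of `theta` are assumptions made up for the example, not part of the course material.

```python
def predict(X, theta):
    """Return h = X * theta, where X is a list of rows and theta a list of parameters."""
    return [sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) for row in X]

# Hypothetical data: three training examples with inputs x = 1, 2, 3.
xs = [1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]   # prepend x_0 = 1 to every example
theta = [0.5, 2.0]           # theta_0 = 0.5, theta_1 = 2.0 (arbitrary values)

h = predict(X, theta)
print(h)  # [2.5, 4.5, 6.5]
```

Each entry of `h` is $\theta_0 + \theta_1 x^{(i)}$ for one example, which is exactly one row of $\boldsymbol{H}$.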

Cost Function

$J(\theta_0 , \theta_1) = \frac{1}{2m} \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2$

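Note: $J(\theta_0 , \theta_1)$ is a scalar, just a single number.
In MATLAB, it can be computed as:

J = 1/(2*m) * sum((X*theta - y).^2);
% m is the number of training examples, y is the m-by-1 vector of targets
% .^ is the element-wise power operator: it squares each entry of the error vector
% sum adds up the elements of that m-by-1 vector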
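
As a quick sanity check on the formula, here is a small Python sketch of the same computation. The function name `compute_cost` and the toy data set are assumptions chosen so the result can be verified by hand.

```python
def compute_cost(X, y, theta):
    """Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    squared_errors = [
        (sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) - y_i) ** 2
        for row, y_i in zip(X, y)
    ]
    return sum(squared_errors) / (2 * m)

# Hypothetical data lying exactly on the line y = x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

print(compute_cost(X, y, [0.0, 1.0]))  # 0.0, a perfect fit
print(compute_cost(X, y, [0.0, 0.0]))  # (1 + 4 + 9) / 6, about 2.333
```

With $\theta = (0, 1)$ every prediction matches its target, so the cost is zero. With $\theta = (0, 0)$ the errors are 1, 2 and 3, giving $J = 14/6$.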

Gradient Descent

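$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0 , \theta_1)$

Repeat until convergence, updating $\theta_0$ and $\theta_1$ simultaneously.

$\alpha$: the learning rate, which sets the size of each step.

$\frac{\partial}{\partial \theta_j} J(\theta_0 , \theta_1)$: the partial derivative of the cost with respect to $\theta_j$, i.e. the slope of the tangent in that direction.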
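
To see how the learning rate affects the update, here is a minimal Python sketch on a one-parameter toy cost, $J(\theta) = (\theta - 3)^2$. The toy function, the name `gradient_descent_1d`, and the two values of `alpha` are assumptions picked purely for illustration.

```python
def gradient_descent_1d(grad, theta, alpha, num_iters):
    """Apply theta = theta - alpha * dJ/dtheta a fixed number of times."""
    for _ in range(num_iters):
        theta = theta - alpha * grad(theta)
    return theta

def grad(theta):
    # Derivative of the toy cost J(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)

theta_small = gradient_descent_1d(grad, theta=0.0, alpha=0.1, num_iters=100)
theta_large = gradient_descent_1d(grad, theta=0.0, alpha=1.1, num_iters=100)

print(theta_small)  # very close to the minimum at 3
print(theta_large)  # far from 3: the steps overshoot and diverge
```

With the small step the distance to the minimum shrinks on every iteration. With the large step each update overshoots by more than it corrects, so the iterates move further and further away.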

Gradient Descent for Linear Regression

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived.
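
The derivation can be sketched as follows, assuming the squared-error cost and the hypothesis $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$ defined earlier. Applying the chain rule to the cost:

$\frac{\partial}{\partial \theta_j} J(\theta_0 , \theta_1) = \frac{1}{2m} \sum_{i=1}^m 2 \, (h_\theta (x^{(i)}) - y^{(i)}) \, \frac{\partial}{\partial \theta_j} h_\theta (x^{(i)})$

Because $h_\theta$ is linear in the parameters, $\frac{\partial}{\partial \theta_j} h_\theta (x^{(i)}) = x_j^{(i)}$, so:

$\frac{\partial}{\partial \theta_j} J(\theta_0 , \theta_1) = \frac{1}{m} \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)}) \, x_j^{(i)}$

Substituting this into the general update rule gives the form below.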

$\theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m ((h_\theta (x^{(i)}) - y^{(i)}) x_j^{(i)})$

applied simultaneously for $j = 0, 1$.
In MATLAB, this can be written as:

theta = theta - (alpha/m * X' * (X*theta - y));
% The matrix product X' * (X*theta - y) already sums over all m examples,
% so no explicit sum is needed, and every theta_j is updated at once.
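
Putting the pieces together, a complete batch gradient descent loop might look like the Python sketch below. The function name `gradient_descent`, the toy data (generated from $y = 1 + 2x$), the learning rate and the iteration count are all assumptions chosen for illustration.

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression with simultaneous updates."""
    m = len(y)
    theta = list(theta)
    for _ in range(num_iters):
        # Error h(x_i) - y_i for every example, computed with the current theta.
        errors = [
            sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) - y_i
            for row, y_i in zip(X, y)
        ]
        # Gradient component j is (1/m) * sum_i(error_i * x_ij).
        gradient = [
            sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))
        ]
        # All parameters are updated from the same errors (simultaneous update).
        theta = [t_j - alpha * g_j for t_j, g_j in zip(theta, gradient)]
    return theta

# Hypothetical data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
X = [[1.0, x] for x in xs]
y = [3.0, 5.0, 7.0, 9.0]

theta = gradient_descent(X, y, [0.0, 0.0], alpha=0.05, num_iters=5000)
print(theta)  # close to [1.0, 2.0]
```

The errors are computed once per iteration before any parameter changes, which is what makes the update simultaneous.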
