Notation
$x_j^{(i)}$ denotes the value of feature $j$ in the $i$th training example
$m$ denotes the number of training examples
$n$ denotes the number of features
The multivariable form of the hypothesis function is

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

or

$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j$$
where we assume $x_0^{(i)} = 1$
we also write
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$
and
$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$
so we can rewrite h(x) as
$$h_\theta(x) = \theta^T x$$
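As a quick illustration, here is that inner product in NumPy (a minimal sketch; the concrete numbers are made up):

```python
import numpy as np

# Feature vector with the bias term x0 = 1 prepended (two real features here).
x = np.array([1.0, 2104.0, 3.0])       # [x0, x1, x2]
theta = np.array([340.0, 0.1, -20.0])  # [theta0, theta1, theta2]

# h_theta(x) = theta^T x
h = theta @ x
print(h)  # 340 + 0.1*2104 + (-20)*3 = 490.4
```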
Algorithm
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

(repeat until convergence, updating $\theta_j$ simultaneously for $j = 0, \dots, n$)

where $x_0^{(i)} = 1$
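A minimal vectorized sketch of this update in NumPy, assuming `X` is the $m \times (n+1)$ design matrix whose first column is all ones and `y` is the target vector (function name and defaults are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix, first column all ones (x0 = 1).
    y: (m,) vector of targets.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        # Simultaneous update of every theta_j:
        # theta_j -= alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta
```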
Algorithm in practice
Feature scaling
We can speed up gradient descent by having each of our input values in roughly the same range.
$$x_i := \frac{x_i - \mu_i}{s_i}$$
where $\mu_i$ is the average of all values of feature $i$,
and $s_i$ is either the range of feature $i$ (max minus min) or its standard deviation
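A small sketch of mean normalization, using the standard deviation for $s_i$ (names are illustrative):

```python
import numpy as np

def scale_features(X):
    """Mean-normalize each feature column: (x - mu) / s, with s = std.

    Apply this before prepending the x0 = 1 column: a constant column
    has zero standard deviation and would cause a division by zero.
    """
    mu = X.mean(axis=0)  # per-feature average
    s = X.std(axis=0)    # per-feature std; use X.max(0) - X.min(0) for the range
    return (X - mu) / s, mu, s
```

The same `mu` and `s` must be reused to scale any new example at prediction time.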
Learning rate
Debugging gradient descent.
Make a plot with the number of iterations on the x-axis and the cost function $J(\theta)$ on the y-axis. If $J(\theta)$ ever increases, $\alpha$ is too large.
Automatic convergence test.
Declare convergence if $J(\theta)$ decreases by less than $E$ in one iteration, where $E$ is some small value such as $10^{-3}$. However, in practice it is difficult to choose this threshold value.
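A sketch combining both checks: record $J(\theta)$ each iteration so it can be plotted, and stop early once the per-iteration decrease falls below the threshold (`epsilon = 1e-3` mirrors the example value above; all names are illustrative):

```python
import numpy as np

def cost(X, y, theta):
    """Squared-error cost J(theta) = (1/(2m)) * sum((h - y)^2)."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def descend_until_converged(X, y, alpha=0.01, epsilon=1e-3, max_iters=10000):
    theta = np.zeros(X.shape[1])
    history = [cost(X, y, theta)]  # plot this against iteration count
    for _ in range(max_iters):
        theta -= alpha * X.T @ (X @ theta - y) / len(y)
        history.append(cost(X, y, theta))
        if history[-2] - history[-1] < epsilon:  # decreased by less than epsilon
            break
    return theta, history
```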
Features and Polynomial Regression
We can combine multiple features into one, such as $x_3 := x_1 \cdot x_2$.
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic, or square-root function (or any other form), for example:
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$$
or
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}$$
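Polynomial terms like these are just extra columns in the design matrix. A sketch (note that feature scaling becomes very important here: if $x_1$ ranges over 1–1000, then $x_1^3$ ranges over 1–$10^9$):

```python
import numpy as np

def cubic_features(x1):
    """Map a single feature x1 to rows [1, x1, x1^2, x1^3]."""
    x1 = np.asarray(x1, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x1**2, x1**3])

X_poly = cubic_features([1.0, 2.0, 3.0])
# Scale the non-constant columns before running gradient descent.
```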
Normal Equation
$$\theta = (X^T X)^{-1} X^T y$$
where
$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}$$
and
$$y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
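A direct NumPy sketch of this formula. Using the pseudo-inverse (or a linear solver) rather than an explicit inverse is more numerically robust, and the pseudo-inverse also works when $X^T X$ is non-invertible:

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y, via the pseudo-inverse for stability."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)

# Equivalent and usually preferred: solve the linear system directly.
# theta = np.linalg.solve(X.T @ X, X.T @ y)
```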
Differences
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose $\alpha$ | No need to choose $\alpha$ |
| Needs many iterations | No need to iterate |
| $O(kn^2)$ | $O(n^3)$, need to calculate $(X^T X)^{-1}$ |
| Works well when $n$ is large | Slow if $n$ is very large |