Multivariate Linear Regression

NZOGGY_

Last modified on 2022-01-25 21:56:34

Tags: linear regression, algorithms, machine learning

First published on 2022-01-25 21:55:20

Article link: https://blog.csdn.net/NZOGGY_/article/details/122692758

Multiple Features

Multivariate linear regression: linear regression with multiple variables.

Notation

$n$ = number of features

$x^{(i)}$ = input (features) of the $i^{th}$ training example.

$x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example.

$h_\theta(x) = \theta_0+\theta_1x_1+\cdots+\theta_nx_n$

For convenience of notation, define $x_0 = 1$ (that is, $x_0^{(i)} = 1$ for every training example).

$x=\begin{bmatrix} x_0\\x_1\\\vdots\\x_n \end{bmatrix}\in\mathbb{R}^{n+1}$

$\theta=\begin{bmatrix} \theta_0\\\theta_1\\\vdots\\\theta_n\end{bmatrix}\in\mathbb{R}^{n+1}$

$h_\theta(x)$ = $\theta^Tx$
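
The vectorized hypothesis can be sketched in a few lines of plain Python. This is only an illustration: the function name `hypothesis` and the sample numbers are assumptions, not part of the original notes.

```python
def hypothesis(theta, features):
    """Return h_theta(x) = theta^T x for a single example.

    `features` holds x_1..x_n; the constant x_0 = 1 is prepended here,
    so `theta` must have n + 1 entries.
    """
    x = [1.0] + list(features)
    if len(theta) != len(x):
        raise ValueError("theta must have n + 1 entries")
    # Inner product theta_0 * x_0 + theta_1 * x_1 + ... + theta_n * x_n
    return sum(t * xi for t, xi in zip(theta, x))


# Made-up values: theta = (1, 2, 3) and features (x_1, x_2) = (4, 5)
prediction = hypothesis([1.0, 2.0, 3.0], [4.0, 5.0])
```

Prepending $x_0 = 1$ inside the function keeps the intercept $\theta_0$ in the same inner product as the other parameters, which is the point of the convention above.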

Gradient Descent for Multiple Variables

The gradient descent update has the same form as in the single-variable case; it is simply repeated for every parameter $\theta_j$, $j = 0, \dots, n$:

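Repeat until convergence: {

$\theta_j := \theta_j - \alpha\frac{1}{m}\sum^m_{i=1}{(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)}}$ (simultaneously update $\theta_j$ for $j = 0, \dots, n$)

}

Here $m$ is the number of training examples and $\alpha$ is the learning rate.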
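
The update rule can be sketched in plain Python as follows. The function names, the toy data, and the fixed iteration count (used here in place of a real convergence check) are all assumptions made for illustration.

```python
def gradient_descent_step(theta, X, y, alpha):
    """Perform one simultaneous update of every theta_j.

    X is a list of rows, each already starting with x_0 = 1.
    """
    m = len(X)
    # Errors h_theta(x^(i)) - y^(i), all computed with the old theta
    errors = [
        sum(t * xj for t, xj in zip(theta, row)) - yi
        for row, yi in zip(X, y)
    ]
    # Building a new list guarantees the update is simultaneous
    return [
        t - alpha / m * sum(e * row[j] for e, row in zip(errors, X))
        for j, t in enumerate(theta)
    ]


def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Run a fixed number of steps, starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_descent_step(theta, X, y, alpha)
    return theta


# Toy data generated from y = 1 + 2*x_1 + 3*x_2 (illustrative only)
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
```

On this toy data the result approaches $\theta = (1, 2, 3)$. Computing all errors before touching any $\theta_j$ is what makes the update simultaneous; updating the parameters one at a time in place would mix old and new values.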

Gradient Descent in Practice I - Feature Scaling

Idea: Make sure features are on a similar scale.

Modify the ranges of the input variables: gradient descent speeds up when every input value lies in roughly the same range, because the contours of the cost function $J$ become less skewed.

This is because $\theta$ descends quickly on small ranges and slowly on large ranges, so it oscillates inefficiently on its way down to the optimum when the variables are very uneven.

There are no exact requirements on the target range; the features only need to end up on roughly comparable scales.

Two techniques:

Feature scaling: divide the input values by the range of the input variable (maximum minus minimum).

Get every feature into approximately a $-1\leq x_i\leq1$ range.

Mean normalization: subtract the average value of an input variable from the values of that input variable.

Replace $x_i$ with $x_i-\mu_i$ so that the features have approximately zero mean (do not apply this to $x_0=1$).

Both techniques can be combined by adjusting the input values with the formula:

$x_i :=\frac{x_i-\mu_i}{s_i}$

Here $\mu_i$ is the average of all the values of feature $i$, and $s_i$ is either the range of values (max - min) or the standard deviation.

Note that dividing by the range and dividing by the standard deviation give different results.
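
A minimal sketch of the rescaling formula for a single feature column, in plain Python. The function name `mean_normalize`, the `use_std` flag, the choice of the population standard deviation, and the sample values are assumptions for illustration.

```python
def mean_normalize(column, use_std=False):
    """Rescale one feature column to (x - mu) / s.

    s is the range (max - min) by default, or the population
    standard deviation when use_std is True.
    Returns the scaled values together with mu and s.
    """
    m = len(column)
    mu = sum(column) / m
    if use_std:
        s = (sum((v - mu) ** 2 for v in column) / m) ** 0.5
    else:
        s = max(column) - min(column)
    if s == 0:
        raise ValueError("constant feature: cannot be scaled")
    return [(v - mu) / s for v in column], mu, s


# Made-up values for a single feature, e.g. house sizes
sizes = [1000.0, 1500.0, 2000.0]
scaled_by_range, mu, s = mean_normalize(sizes)
scaled_by_std, _, std = mean_normalize(sizes, use_std=True)
```

The two calls produce different scaled values for the same column, which is the difference noted above. Returning `mu` and `s` makes it possible to apply the same transformation to new inputs later.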

Gradient Descent in Practice II - Learning Rate

Debugging gradient descent

Make a plot with the number of iterations on the x-axis and the cost function $J(\theta)$ on the y-axis, recording the cost after each iteration of gradient descent. If $J(\theta)$ ever increases, then $\alpha$ is probably too large and we need to decrease it.

($J(\theta)$ should decrease after every iteration.)
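
The same check can be done numerically instead of by eye. The sketch below assumes the usual squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2$, which these notes do not restate; the function names and the toy data are also assumptions.

```python
def predict(theta, row):
    """h_theta(x) for one row that already starts with x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, row))


def cost(theta, X, y):
    """Assumed squared-error cost J(theta) = 1/(2m) * sum of squared errors."""
    m = len(X)
    return sum((predict(theta, row) - yi) ** 2 for row, yi in zip(X, y)) / (2 * m)


def cost_history(X, y, alpha, iterations):
    """Run gradient descent from theta = 0 and record J after every step."""
    m = len(X)
    theta = [0.0] * len(X[0])
    history = [cost(theta, X, y)]
    for _ in range(iterations):
        errors = [predict(theta, row) - yi for row, yi in zip(X, y)]
        theta = [
            t - alpha / m * sum(e * row[j] for e, row in zip(errors, X))
            for j, t in enumerate(theta)
        ]
        history.append(cost(theta, X, y))
    return history


def ever_increases(history):
    """True if J went up between any two consecutive iterations."""
    return any(later > earlier for earlier, later in zip(history, history[1:]))


# Toy data from y = 2 * x_1 (illustrative only)
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
small_alpha_ok = not ever_increases(cost_history(X, y, alpha=0.1, iterations=50))
large_alpha_bad = ever_increases(cost_history(X, y, alpha=1.0, iterations=5))
```

On this toy data, $\alpha = 0.1$ gives a cost that falls at every step, while $\alpha = 1.0$ overshoots and the cost grows from the first iteration. Plotting `history` against the iteration index gives the curve described above.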