Gradient Descent in Practice I - Feature Scaling
goal: speed up gradient descent by getting each of our input values into roughly the same range
$x_i := \dfrac{x_i - \mu_i}{s_i}$
where $\mu_i$ is the average of all the values for feature $i$, and $s_i$ is the range of values ($\max - \min$) or, alternatively, the standard deviation.
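A minimal NumPy sketch of both choices of $s_i$; the function names and the toy housing data are my own illustration, not from the notes:

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature using its range (max - min) as s_i."""
    mu = X.mean(axis=0)                # mu_i: average of each feature
    s = X.max(axis=0) - X.min(axis=0)  # s_i: range of each feature
    return (X - mu) / s

def standardize(X):
    """Scale each feature using its standard deviation as s_i."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# toy data: house size (sq ft) and bedroom count have very different ranges
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
print(mean_normalize(X))  # every column now lies in roughly [-0.5, 0.5]
```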
Gradient Descent in Practice II - Learning Rate
goal: find a learning rate α such that J(θ) decreases on every iteration.
summary:
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and thus may not converge.
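A standard way to check α is to record J(θ) after every iteration and confirm it keeps decreasing. A minimal linear regression sketch; the toy data and the three α values are assumptions chosen for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for linear regression, recording J(theta) each iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y
        J_history.append((error @ error) / (2 * m))  # J(theta) = (1/2m) * sum of squared errors
        theta -= (alpha / m) * (X.T @ error)         # simultaneous update of every theta_j
    return theta, J_history

# toy data with a leading column of ones for the intercept term theta_0
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

for alpha in (0.001, 0.1, 1.1):  # too small / reasonable / too large
    _, J = gradient_descent(X, y, alpha, 50)
    print(f"alpha={alpha}: J[0]={J[0]:.4f}, J[-1]={J[-1]:.4g}")
```

With α = 0.001 the cost creeps down slowly, with α = 0.1 it converges quickly, and with α = 1.1 it blows up, matching the summary above.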
Polynomial Regression
goal: improve the form of our hypothesis function, e.g. by combining multiple features into one or by creating new polynomial features from an existing one.
For example, if our hypothesis function is $h_\theta(x) = \theta_0 + \theta_1 x_1$,
then we can create additional features based on $x_1$ to get the quadratic function $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$
or the cubic function $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$.
In the cubic version, we have created new features $x_2$ and $x_3$ where $x_2 = x_1^2$ and $x_3 = x_1^3$.
To make it a square root function, we could do: $h_\theta(x) = \theta_0 + \theta_1 \sqrt{x_1}$
One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important. For example, if $x_1$ has range $1$–$1000$, then the range of $x_1^2$ becomes $1$–$10^6$ and that of $x_1^3$ becomes $1$–$10^9$.
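A short sketch of building the cubic features and scaling them before running gradient descent; the function names are mine, and scikit-learn's PolynomialFeatures offers a similar transformation:

```python
import numpy as np

def cubic_features(x1):
    """Build the new features x1, x1^2, x1^3 for the cubic hypothesis."""
    return np.column_stack([x1, x1**2, x1**3])

def mean_normalize(X):
    """Rescale each column by its mean and range, as in the feature-scaling section."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

x1 = np.linspace(1.0, 1000.0, 5)          # x1 spans 1 .. 1000
X = cubic_features(x1)                    # x1^2 spans up to ~1e6, x1^3 up to ~1e9
print(X.max(axis=0))                      # wildly different ranges before scaling
print(np.ptp(mean_normalize(X), axis=0))  # each column spans exactly 1 after scaling
```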