Gradient Descent in Practice: Learning Rate
Plot the cost function $J$ against the number of iterations to check that gradient descent is working.
The number of iterations needed varies from problem to problem: 30, 3,000, or even 3,000,000.
A method that works well in practice: try a range of values for $\alpha$.
To choose $\alpha$, try:
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
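The sweep above can be sketched in a few lines (a minimal example with made-up data; the feature values are assumptions for illustration). Each candidate $\alpha$ is run for a fixed number of iterations while the cost $J$ is recorded, which is exactly what you would plot:

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for linear regression.

    Returns theta and the history of J(theta), so J can be
    plotted against the iteration number.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y           # h(x) - y for all m examples
        theta -= (alpha / m) * (X.T @ error)
        J_history.append((error @ error) / (2 * m))
    return theta, J_history

# Toy data (assumed): first column is x0 = 1, one real feature.
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 4]])
y = np.array([2.0, 4, 6, 8])

# Candidate learning rates roughly 3x apart, as in the list above.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1]:
    _, J = gradient_descent(X, y, alpha, 50)
    print(f"alpha={alpha}: J fell from {J[0]:.3f} to {J[-1]:.3f}")
```

If $J$ increases instead of falling, $\alpha$ is too large; if it falls very slowly, $\alpha$ is too small.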
Polynomial Regression
Polynomial regression can fit very complex, even nonlinear, functions.
Two features can also be combined and expressed as a single feature (for example, frontage and depth of a lot can be replaced by their product, the area).
A quadratic model looks like one option, but a quadratic eventually comes back down, which may not match the data. We can instead choose a cubic function, and there are other choices as well, such as a square root function.
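Building these candidate models is just a matter of constructing new feature columns from the original feature (a sketch with an assumed single feature `x`; the data is hypothetical):

```python
import numpy as np

# Hypothetical single feature x (e.g. house size).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Candidate feature sets from the discussion above:
X_quad  = np.column_stack([np.ones_like(x), x, x**2])        # quadratic
X_cubic = np.column_stack([np.ones_like(x), x, x**2, x**3])  # cubic
X_sqrt  = np.column_stack([np.ones_like(x), x, np.sqrt(x)])  # square root

# Note: x, x^2, x^3 span very different ranges, so feature scaling
# becomes important when these are fed to gradient descent.
print(X_cubic.max(axis=0))  # max of each column: 1, x, x^2, x^3
```

Once the columns are built, each model is fitted exactly like ordinary linear regression on the expanded design matrix.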
The Normal Equation
$X$: an $m \times (n+1)$ matrix
$y$: an $m$-dimensional vector
$m$: number of training examples
$n$: number of features
$\theta=(X^TX)^{-1}X^Ty$
$$x^{(i)}=\begin{bmatrix} x_0^{(i)}\\ x_1^{(i)}\\ x_2^{(i)}\\ \vdots\\ x_n^{(i)} \end{bmatrix}\in \mathbb{R}^{n+1}$$
$$X \ (\text{design matrix})=\begin{bmatrix} ----(x^{(1)})^T----\\ ----(x^{(2)})^T----\\ ----(x^{(3)})^T----\\ \vdots\\ ----(x^{(m)})^T---- \end{bmatrix}$$
Example
If $x^{(i)}=\begin{bmatrix} 1 \\ x_1^{(i)} \end{bmatrix}$, then
$$X=\begin{bmatrix} 1 & x_1^{(1)} \\ 1 & x_1^{(2)} \\ 1 & x_1^{(3)} \\ 1 & x_1^{(4)} \end{bmatrix},\quad y=\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ y^{(4)} \end{bmatrix}$$
$\theta=(X^TX)^{-1}X^Ty$
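This formula translates directly into NumPy (toy data assumed; in production code `np.linalg.lstsq` or `np.linalg.pinv` is numerically safer than an explicit inverse, but the direct form is shown here to mirror the math):

```python
import numpy as np

# Design matrix with x0 = 1, as in the example above (m = 4, n = 1).
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 4]])
y = np.array([1.0, 3, 5, 7])  # generated from y = 2x - 1

# theta = (X^T X)^{-1} X^T y -- no learning rate, no iterations.
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # recovers approximately [-1, 2]
```

Since the toy data is exactly linear, $\theta$ comes out as the generating coefficients $(-1, 2)$.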
Comparing Gradient Descent and the Normal Equation
| Gradient Descent | Normal Equation |
| --- | --- |
| need to choose $\alpha$ | no need to choose $\alpha$ |
| needs many iterations | no need to iterate |
| works well even when $n$ is large | need to compute $(X^TX)^{-1}$ |
| | slow if $n$ is very large |
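As a sanity check on this comparison (synthetic data and settings are assumptions for illustration): both methods should reach essentially the same $\theta$, but gradient descent needs an $\alpha$ and many iterations while the normal equation is a single computation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 2
X = np.column_stack([np.ones(m), rng.uniform(0, 1, size=(m, n))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.01 * rng.standard_normal(m)

# Normal equation: one shot, no alpha, no iterations.
theta_ne = np.linalg.inv(X.T @ X) @ X.T @ y

# Gradient descent: needs alpha and many iterations.
theta_gd = np.zeros(n + 1)
alpha = 0.5
for _ in range(20000):
    theta_gd -= (alpha / m) * (X.T @ (X @ theta_gd - y))

print(np.max(np.abs(theta_ne - theta_gd)))  # should be tiny
```

For small $n$ this tie goes to the normal equation; for very large $n$, inverting the $(n+1)\times(n+1)$ matrix $X^TX$ becomes the bottleneck and gradient descent wins.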