Now we need to estimate the parameters of the cost function, and that's where Gradient Descent comes in.
The way we do this is by taking the derivative of the cost function.
The gradient descent algorithm is:

repeat until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (simultaneously for $j = 0$ and $j = 1$)

where α is the learning rate, which controls the size of each update step.
As the cost function gets closer and closer to a local minimum, its derivative also gets closer and closer to 0, so each update step automatically becomes smaller. Even with a fixed learning rate, gradient descent slows down as it converges, so there is no need to decrease the learning rate over time.
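To make the update rule concrete, here is a minimal Python sketch, assuming the single-variable linear-regression setup (hypothesis $h(x) = \theta_0 + \theta_1 x$ and squared-error cost); the function name, toy data, and learning-rate value are illustrative, not from the original notes.

```python
# Minimal gradient-descent sketch for single-variable linear regression.
# Hypothesis: h(x) = theta0 + theta1 * x
# Cost:       J = (1 / (2m)) * sum((h(x_i) - y_i)^2)

def gradient_descent(xs, ys, alpha=0.1, num_iters=2000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        # Partial derivatives of J w.r.t. theta0 and theta1,
        # each summed over ALL m training examples.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients are computed from the
        # OLD parameter values before either parameter changes.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data lying on y = 1 + 2x; the fit should approach theta0=1, theta1=2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(gradient_descent(xs, ys))
```

Near the minimum the `errors` shrink toward 0, so `grad0` and `grad1` shrink as well, and the steps get smaller even though `alpha` stays fixed, matching the point above.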
Batch Gradient Descent
"Batch": each step of gradient descent uses all the training examples.
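The sketch above is already "batch" gradient descent in this sense: every iteration sums the error over all m examples before updating. As a hedged illustration, here is the same batch step written in vectorized NumPy form; the function name and arrays are illustrative.

```python
import numpy as np

def batch_gradient_step(theta, X, y, alpha):
    """One batch gradient-descent step: the gradient is computed
    from ALL m training examples before the parameters move."""
    m = len(y)
    errors = X @ theta - y          # residuals for every example
    grad = (X.T @ errors) / m       # average gradient over the whole batch
    return theta - alpha * grad

# X has a leading column of ones so theta[0] acts as the intercept theta0.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = batch_gradient_step(theta, X, y, alpha=0.1)
print(theta)  # should approach [1.0, 2.0]
```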