Gradient Descent
In mathematics, gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.
A differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non-vertical tangent line at each interior point of its domain.
Cost Function
A Cost Function is a function that measures the performance of a model for any given data. It quantifies the error between predicted values and expected values and presents it in the form of a single real number.
First, we form a hypothesis function with some initial parameters and compute its cost. Then, with the goal of reducing that cost, we repeatedly modify the parameters of the hypothesis function by running the gradient descent algorithm over the given data.
So there are two functions at play: the hypothesis and the cost function. In machine learning, the cost function is the function to which we apply gradient descent, so that we can find its minimum value; in other words, we use gradient descent as a tool to minimize our cost function.
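As a concrete illustration, here is a minimal Python sketch, assuming a simple linear hypothesis h(x) = θ₀ + θ₁x and the mean squared error as the cost function; these choices (and the toy data) are illustrative, not part of the definitions above:

```python
import numpy as np

def hypothesis(theta, X):
    # Simple linear hypothesis: h(x) = theta0 + theta1 * x
    return theta[0] + theta[1] * X

def cost(theta, X, y):
    # Mean squared error: averages the squared gap between predicted
    # and expected values into a single real number.
    errors = hypothesis(theta, X) - y
    return np.mean(errors ** 2)

# Toy data: y is roughly 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])

print(cost(np.array([0.0, 0.0]), X, y))  # poor initial parameters -> high cost
print(cost(np.array([1.0, 2.0]), X, y))  # near the true parameters -> low cost
```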
Derivatives are used to decide whether to increase or decrease the weights in order to decrease the objective function. If we can compute the derivative of a function, we know in which direction to proceed to minimize it: opposite to the sign of the derivative.
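A quick sketch of that idea, using a central finite difference to estimate the derivative numerically; the function f below is just an illustrative example:

```python
def derivative(f, x, h=1e-6):
    # Central finite-difference estimate of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: (x - 3) ** 2   # minimized at x = 3

x = 0.0
slope = derivative(f, x)     # approx -6.0: negative, so increase x to decrease f
# Moving opposite to the sign of the derivative decreases f:
print(f(x), ">", f(x - 0.1 * slope))   # 9.0 > 5.76
```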
Convergence
The concept of convergence is a well-defined mathematical term. It means that “eventually” a sequence of elements gets closer and closer to a single value.
e.g., an algorithm starts printing numbers like:
x0 = 3.1
x1 = 3.14
x2 = 3.141
x3 = 3.1415
x4 = 3.14159
...
As we can see, the algorithm prints numbers increasingly close to pi; we say the algorithm converges to pi. For gradient descent, convergence to the minimum is guaranteed when the function being minimized is convex (shaped like a bowl), because a convex function has only a single minimum.
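In code, convergence is typically detected the same way: iterate until successive values stop changing by more than some tolerance. A minimal sketch, using the Babylonian iteration for √2 as a stand-in for any converging algorithm:

```python
def converged_value(step, x0, tol=1e-10, max_iters=1000):
    # Iterate until successive values differ by less than tol.
    x = x0
    for _ in range(max_iters):
        x_next = step(x)
        if abs(x_next - x) < tol:
            return x_next   # the sequence has converged
        x = x_next
    return x  # gave up: no convergence within max_iters

# Babylonian iteration for sqrt(2): x -> (x + 2/x) / 2
print(converged_value(lambda x: (x + 2 / x) / 2, x0=1.0))  # ~1.41421356...
```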
Gradient Descent
The goal of the gradient descent algorithm is to minimize the given function (say cost function). To achieve this goal, it performs two steps iteratively:
- Compute the gradient (slope), i.e. the first-order derivative of the function at the current point.
- Take a step (move) in the direction opposite to the gradient, i.e. against the direction in which the slope increases, moving from the current point by alpha times the gradient at that point.
Gradient Descent Algorithm:
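In symbols, each iteration applies the standard update rule (writing θ for the parameters being optimized and J for the cost function):

\theta_{\text{new}} = \theta_{\text{old}} - \alpha \, \nabla J(\theta_{\text{old}})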
Here, α is called the Learning rate – a tuning parameter in the optimization process. It decides the length of the steps.
To reach a local minimum efficiently, we have to set the learning-rate parameter α appropriately, neither too high nor too low: if α is too high, the steps may overshoot the minimum and the iterates can diverge; if it is too low, progress is so small that convergence takes very long.
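Putting the pieces together, here is a minimal gradient descent sketch on the toy function f(x) = x², whose gradient is 2x; the function and the specific α values are illustrative assumptions chosen to show the effect of the learning rate:

```python
def gradient_descent(grad, x0, alpha, iters=25):
    # Repeatedly step opposite the gradient, scaled by the learning rate.
    x = x0
    for _ in range(iters):
        x = x - alpha * grad(x)
    return x

grad = lambda x: 2 * x  # gradient of f(x) = x^2, minimized at x = 0

print(gradient_descent(grad, x0=5.0, alpha=0.1))    # ~0.02: converges nicely
print(gradient_descent(grad, x0=5.0, alpha=0.001))  # ~4.76: too low, barely moved
print(gradient_descent(grad, x0=5.0, alpha=1.1))    # huge magnitude: too high, diverges
```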
Credit To:
https://towardsdatascience.com/minimizing-the-cost-function-gradient-descent-a5dd6b5350e1
https://www.analyticsvidhya.com/blog/2020/10/how-does-the-gradient-descent-algorithm-work-in-machine-learning/