Data normalization is generally performed during the data pre-processing step.
1. Why we need normalization
There are two major reasons why data normalization is so essential for machine learning algorithms.
- Data normalization can improve the performance of many common machine learning algorithms.
- Data normalization can speed up the convergence of the gradient descent algorithm.
Andrew Ng's machine learning course illustrates the second point with contour plots of the cost function: with unnormalized features the contours are elongated ellipses and gradient descent zig-zags slowly toward the minimum, while with normalized features the contours are close to circles and gradient descent takes a much more direct path.
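The convergence effect can be demonstrated numerically. The sketch below (a hypothetical demo, not from the original article) runs gradient descent on a least-squares problem with two features of very different scales: the raw problem needs a tiny learning rate to stay stable and crawls, while the min-max-rescaled problem tolerates a much larger step and converges quickly.

```python
import numpy as np

# Hypothetical demo: gradient descent on least squares with two features of
# very different scales, before vs. after min-max rescaling.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 200),        # feature in [0, 1]
                     rng.uniform(0, 1000, 200)])    # feature in [0, 1000]
w_true = np.array([2.0, 0.5])
y = X @ w_true

def gd_iterations(X, y, lr, tol=1e-6, max_iter=100_000):
    """Run gradient descent on mean squared error; return iterations used."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        if np.linalg.norm(grad) < tol:
            return i + 1
    return max_iter

X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
y_norm = X_norm @ w_true  # same target function on rescaled inputs

# The raw problem forces a tiny learning rate (anything larger diverges) and
# still fails to converge within the cap; the rescaled one converges fast.
raw_iters = gd_iterations(X, y, lr=1e-7)
norm_iters = gd_iterations(X_norm, y_norm, lr=0.5)
print(raw_iters, norm_iters)
```

The learning rates here were chosen by hand for each problem; the point is the gap between them, not the specific values.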
2. How to normalize data
Three common methods are used to perform feature normalization in machine learning algorithms.
- Rescaling
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{1}$$

$$x' = 2\,\frac{x - \min(x)}{\max(x) - \min(x)} - 1 \tag{2}$$

where $x$ is the original value and $x'$ is the normalized value. Equation (1) rescales data into $[0,1]$, and equation (2) rescales data into $[-1,1]$.
Note: the parameters $\min(x)$ and $\max(x)$ should be computed on the training data only, but are then applied to the training, validation, and test data alike.
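A minimal sketch of this fit-on-train, apply-everywhere pattern (the array names are hypothetical):

```python
import numpy as np

# Min-max rescaling where min/max come from the training data only and are
# reused on validation/test data.
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_test = np.array([[1.5, 500.0], [4.0, 100.0]])  # may fall outside the train range

col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

def rescale_01(X):
    """Equation (1): rescale to [0, 1] using training-set min/max."""
    return (X - col_min) / (col_max - col_min)

def rescale_pm1(X):
    """Equation (2): rescale to [-1, 1] using training-set min/max."""
    return 2 * (X - col_min) / (col_max - col_min) - 1

print(rescale_01(X_train))  # training columns span exactly [0, 1]
print(rescale_01(X_test))   # test values can land outside [0, 1]; do not refit
```

Note that rescaled test values may fall outside $[0,1]$; that is expected and preferable to recomputing the statistics on test data.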
There are also methods that normalize features using a non-linear function, such as the
logarithmic function: $x' = \dfrac{\log_{10}(x)}{\log_{10}(\max(x))}$
inverse tangent function: $x' = \dfrac{2}{\pi}\arctan(x)$
sigmoid function: $x' = \dfrac{1}{1 + e^{-x}}$
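The three non-linear normalizers above can be written in a few lines of NumPy (a sketch; the logarithmic variant assumes strictly positive inputs):

```python
import numpy as np

# Non-linear feature normalizers, applied to a positive, wide-range feature.
x = np.array([1.0, 10.0, 100.0, 1000.0])

log_norm = np.log10(x) / np.log10(x.max())  # logarithmic: maps this x into [0, 1]
atan_norm = np.arctan(x) * 2 / np.pi        # inverse tangent: maps reals into (-1, 1)
sigmoid_norm = 1 / (1 + np.exp(-x))         # sigmoid: maps reals into (0, 1)

print(log_norm)  # evenly spaced because x is exactly logarithmic here
```

These are useful when a feature spans several orders of magnitude and a linear rescaling would squash most values near zero.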
- Standardization
Feature standardization makes the values of each feature in the data have zero mean and unit variance. This method is widely used for normalization in many machine learning algorithms (e.g., support vector machines, logistic regression, and neural networks). The general formula is:

$$x' = \frac{x - \bar{x}}{\sigma}$$

where $\bar{x}$ is the mean and $\sigma$ is the standard deviation of the feature $x$.
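In code, standardization is a per-column mean subtraction and division by the standard deviation (again with statistics taken from training data; the array name is illustrative):

```python
import numpy as np

# Z-score standardization: subtract each feature's mean, divide by its
# standard deviation, so every column ends up zero-mean and unit-variance.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_std = (X_train - mu) / sigma

print(X_std.mean(axis=0))  # ~ [0, 0]
print(X_std.std(axis=0))   # ~ [1, 1]
```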
- Scaling to unit length
To scale a feature vector to unit length, divide each of its components by the Euclidean length of the vector:

$$x' = \frac{x}{\|x\|}$$

This is especially important if a scalar metric (e.g., the Euclidean distance) is used as a distance measure in the following learning steps.
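A sketch of unit-length scaling applied row-wise, so each sample vector ends up with Euclidean norm 1:

```python
import numpy as np

# Scale each sample (row) to unit Euclidean length: x' = x / ||x||.
X = np.array([[3.0, 4.0], [1.0, 0.0]])
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(X_unit)                          # [[0.6, 0.8], [1.0, 0.0]]
print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1
```

After this scaling, Euclidean distance between samples depends only on direction, not magnitude, which is why it matters for distance-based learners.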
3. Some cases where you don't need data normalization
3.1 Using a similarity function instead of a distance function
You can propose a similarity function rather than a distance function and plug it into a kernel (technically, this function must generate positive semi-definite Gram matrices).
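As a small sketch of what "must generate positive semi-definite matrices" means in practice, the code below (a hypothetical example, using the RBF similarity, which is known to yield a valid kernel) builds a Gram matrix from a similarity function and checks its eigenvalues:

```python
import numpy as np

# Build a Gram matrix from a similarity function and verify it is positive
# semi-definite (no significantly negative eigenvalues).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

def rbf_similarity(a, b, gamma=0.1):
    """RBF similarity in (0, 1]; the induced Gram matrix is PSD by theory."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

K = np.array([[rbf_similarity(a, b) for b in X] for a in X])
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min())  # >= 0 up to floating-point error
```

A home-made similarity function without this property would not be a valid kernel, and kernel methods built on it can fail in non-obvious ways.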
3.2 Random forest
Random forests never compare one feature with another in magnitude; each split thresholds a single feature, and thresholds depend only on the ordering of values within that feature, so the ranges don't matter.