Understanding Normalization, Standardization, and Feature Scaling

  • Feature scaling

    Feature scaling is a method used to normalize the range of independent variables or features of data.

    The objective functions of some machine learning algorithms will not work properly without normalization.

    Gradient descent converges much faster with feature scaling than without it.

    • Rescaling (min-max normalization)

      This method rescales the range of the features so that all values fall within [0,1] or [-1,1] (a NumPy sketch of this and the next three methods follows the list of formulas).

      The general formula for min-max scaling to [0,1] is:
      $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$

  • Mean normalization

    $x' = \frac{x - \text{average}(x)}{\max(x) - \min(x)}$

  • Standardization (Z-score Normalization)

    $x' = \frac{x - \bar{x}}{\sigma}$, where $\bar{x}$ is the mean and $\sigma$ is the standard deviation.

  • Scaling to unit length

    $x' = \frac{x}{\|x\|}$
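    All four formulas can be written in a few lines of NumPy. This is a minimal sketch applied to a single 1-D feature; the function names are my own:

    ```python
    import numpy as np

    def min_max_scale(x):
        # Rescaling: x' = (x - min(x)) / (max(x) - min(x)), result in [0, 1]
        return (x - x.min()) / (x.max() - x.min())

    def mean_normalize(x):
        # Mean normalization: x' = (x - average(x)) / (max(x) - min(x))
        return (x - x.mean()) / (x.max() - x.min())

    def standardize(x):
        # Z-score: x' = (x - mean) / standard deviation
        return (x - x.mean()) / x.std()

    def scale_to_unit_length(x):
        # x' = x / ||x||, so the vector ends up with Euclidean norm 1
        return x / np.linalg.norm(x)

    x = np.array([1.0, 5.0, 10.0, 20.0])
    print(min_max_scale(x))         # bounded in [0, 1]
    print(mean_normalize(x))        # centered on 0, range at most 1
    print(standardize(x))           # mean 0, standard deviation 1
    print(scale_to_unit_length(x))  # result has norm 1
    ```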

  • Why Should We Use Feature Scaling?

    Gradient-descent-based algorithms such as linear regression, logistic regression, and neural networks use gradient descent as their optimization technique and therefore require the data to be scaled.

    Distance-based algorithms such as KNN, K-means, and SVM are the most affected by the range of the features, as the scikit-learn sketch below demonstrates.

    Tree-based algorithms, by contrast, are fairly insensitive to the scale of the features: a decision tree splits a node on a single feature, and that split is not influenced by the other features.
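    To see the distance-based effect concretely, here is a small scikit-learn sketch. The wine dataset is just a convenient choice because its features span very different ranges; the exact accuracy numbers will vary:

    ```python
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_wine(return_X_y=True)

    # KNN on raw features: distances are dominated by the widest-ranged feature.
    raw = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()

    # The same model after standardization: every feature contributes comparably.
    scaled = cross_val_score(
        make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=5
    ).mean()

    print(f"KNN accuracy without scaling: {raw:.3f}")
    print(f"KNN accuracy with scaling:    {scaled:.3f}")
    ```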

  • Normalization vs. Standardization

    These are two of the most commonly used feature scaling techniques in machine learning, but they are often confused with each other.

    Standardization can be helpful when the data follows a Gaussian distribution. Unlike normalization, standardization does not have a bounding range, so even if your data contains outliers, standardization will not squash them into a fixed interval.

    Normalization is a good choice when you know that your data does not follow a Gaussian distribution.
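    The bounding-range difference is easy to see on a toy array with one outlier (the values here are made up purely for illustration):

    ```python
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is an outlier

    min_max = (x - x.min()) / (x.max() - x.min())
    z_score = (x - x.mean()) / x.std()

    # Min-max is bounded: the outlier maps to 1 and squeezes
    # everything else into a narrow band near 0.
    print(min_max)  # [0.     0.0101 0.0202 0.0303 1.    ]

    # Z-scores are unbounded: the outlier stays far out while the
    # remaining values keep their relative spread.
    print(z_score)  # roughly [-0.54 -0.51 -0.49 -0.46  2.  ]
    ```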

  • Normalization

    Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
    $X' = \frac{X - X_{min}}{X_{max} - X_{min}}$
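    In practice this is typically done with scikit-learn's MinMaxScaler rather than by hand; a minimal sketch on toy single-feature data:

    ```python
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.array([[50.0], [60.0], [80.0], [100.0]])  # one feature, four samples

    scaler = MinMaxScaler()            # default feature_range=(0, 1)
    X_scaled = scaler.fit_transform(X)
    print(X_scaled.ravel())            # [0.  0.2 0.6 1. ]
    ```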

  • Standardization

    Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation.
    $X' = \frac{X - \mu}{\sigma}$
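    The scikit-learn counterpart is StandardScaler; the same toy data, now centered on the mean with unit standard deviation:

    ```python
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.array([[50.0], [60.0], [80.0], [100.0]])

    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    print(X_scaled.ravel())                  # z-score of each sample
    print(X_scaled.mean(), X_scaled.std())   # ~0.0 and 1.0
    ```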

  • References

  1. Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization