
# 2 Standardization

Gradient descent is one of the many algorithms that benefit from feature scaling. In this section, we will use a feature scaling method called standardization, which gives our data the properties of a standard normal distribution and helps gradient descent converge more quickly. Standardization shifts each feature so that it is centered at zero and has a standard deviation of 1.

$$x_j' = \frac{x_j - \mu_j}{\sigma_j}$$

Here $x_j$ is the vector consisting of the $j$-th feature values of all $n$ training samples, $\mu_j$ is the mean of that feature over the training samples, and $\sigma_j$ is its standard deviation over the training samples.

One of the reasons why standardization helps with gradient descent learning is that the optimizer has to go through fewer steps to find a good or optimal solution (the global cost minimum), as illustrated in the following figure, where the subfigures represent the cost surface as a function of two model weights in a two-dimensional classification problem:
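As a minimal sketch of the formula above (the feature column here is made up for illustration), standardization takes just a few lines of NumPy:

```python
import numpy as np

# Hypothetical feature column of n = 5 training samples.
x_j = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

mu_j = x_j.mean()      # mean over the training samples
sigma_j = x_j.std()    # standard deviation over the training samples

x_j_std = (x_j - mu_j) / sigma_j

print(x_j_std.mean())  # 0 up to floating-point error
print(x_j_std.std())   # 1.0
```

In practice, `sklearn.preprocessing.StandardScaler` performs the same computation, and additionally remembers the training-set $\mu_j$ and $\sigma_j$ so the test set can be transformed with the same statistics.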

# 3 Normalization

Now, there are two common approaches to bring different features onto the same scale: normalization and standardization. Those terms are often used quite loosely in different fields, and the meaning has to be derived from the context. Most often, normalization refers to the rescaling of the features to a range of [0, 1], which is a special case of min-max scaling.

$$x_{norm}^{(i)} = \frac{x^{(i)} - x_{min}}{x_{max} - x_{min}}$$

For example, given the three sample values 1, 5, and 3:

$$x_{norm}^{(1)} = \frac{1-1}{5-1} = 0,\quad x_{norm}^{(2)} = \frac{5-1}{5-1} = 1,\quad x_{norm}^{(3)} = \frac{3-1}{5-1} = 0.5$$
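The worked example above can be reproduced with NumPy (the values 1, 5, and 3 come from the text):

```python
import numpy as np

# The three sample values from the worked example.
x = np.array([1.0, 5.0, 3.0])

# Min-max scaling to the range [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

print(x_norm)  # -> [0.  1.  0.5]
```

`sklearn.preprocessing.MinMaxScaler` implements the same transformation for whole feature matrices.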

# 4 Regularization

Regularization is a very useful method to handle collinearity (high correlation among features), filter out noise from data, and eventually prevent overfitting. The concept behind regularization is to introduce additional information (bias) to penalize extreme parameter (weight) values.

1. Reduce the number of features (feature reduction):
1.1 Manually keep a subset of features (do you think you can do that well? I certainly don't think I can)
1.2 Model selection algorithms (PCA, SVD, chi-squared tests)
2. Regularization: keep all the features, but penalize the coefficients $\theta$ so that they shrink toward 0; a small coefficient contributes little to the model.

$$\frac{\lambda}{2}\left\|w\right\|^2 = \frac{\lambda}{2}\sum_{j=1}^{m} w_j^2$$

Two reasons why the squared (L2) norm is preferred over the absolute-value (L1) norm:

1. For a computer, computing squares is simpler than computing absolute values;
2. The L2 norm is smooth and differentiable everywhere, while the L1 norm is not differentiable, at least at 0.
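As a sketch of how the L2 penalty above is computed (the weight vector and $\lambda$ here are made-up example values):

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # hypothetical weight vector
lam = 0.1                       # regularization strength lambda

# (lambda / 2) * ||w||^2 = (lambda / 2) * sum_j w_j^2
l2_penalty = lam / 2 * np.sum(w ** 2)
print(l2_penalty)  # -> 0.2625
```

This term is added to the training loss, so larger weights raise the cost and the optimizer is pushed toward smaller, less extreme parameter values.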

# 5 Summary

1. The difference between dimensional and dimensionless quantities is whether the physical quantity depends on its units.
2. There is no sharp distinction between standardization and normalization; which one is meant must be determined from context. Normalization rescales features to $[0, 1]$, while standardization rescales features to zero mean and unit standard deviation.
3. Regularization is something entirely different from standardization and normalization: it penalizes parameters that fit the training data too well, preventing overfitting and improving the model's ability to generalize.

