sklearn data preprocessing (1): StandardScaler

StandardScaler

What it does: centers the data (removes the mean) and scales it to unit variance. Each feature is transformed as (x - mean) / std, where mean and std are computed per feature dimension (per column), not per sample.

Standardization does not benefit every estimator, but as the scikit-learn documentation puts it:
“Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).”
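
A minimal sketch of how such an estimator is usually combined with StandardScaler in practice: inside a Pipeline, so the scaler is fit only on the training split and its statistics are reused on the test split. The SVC classifier and the synthetic make_classification data below are placeholders for illustration, not part of the original example.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic data purely for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# the pipeline fits the scaler on X_train only and reuses its
# statistics when transforming X_test inside score()/predict()
pipe = make_pipeline(StandardScaler(), SVC())
pipe.fit(X_train, y_train)
print('test accuracy:', pipe.score(X_test, y_test))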

from sklearn.preprocessing import StandardScaler
import numpy as np


def test_algorithm():
    np.random.seed(123)
    print('use sklearn')
    # Note: data shape is [n_samples, n_features]
    data = np.random.randn(10, 4)
    scaler = StandardScaler()
    scaler.fit(data)
    trans_data = scaler.transform(data)
    print('original data: ')
    print(data)
    print('transformed data: ')
    print(trans_data)
    print('scaler info: scaler.mean_: {}, scaler.var_: {}'.format(scaler.mean_, scaler.var_))
    print('\n')

    print('use numpy by self')
    mean = np.mean(data, axis=0)
    # np.std uses ddof=0 (population standard deviation) by default,
    # which matches the variance StandardScaler computes internally
    std = np.std(data, axis=0)
    var = std * std
    print('mean: {}, std: {}, var: {}'.format(mean, std, var))
    # NumPy broadcasting subtracts the per-column mean from every row
    another_trans_data = data - mean
    # Note: divide by the standard deviation, not the variance
    another_trans_data = another_trans_data / std
    print('another_trans_data: ')
    print(another_trans_data)


if __name__ == '__main__':
    test_algorithm()

Output:

use sklearn
original data: 
[[-1.0856306   0.99734545  0.2829785  -1.50629471]
 [-0.57860025  1.65143654 -2.42667924 -0.42891263]
 [ 1.26593626 -0.8667404  -0.67888615 -0.09470897]
 [ 1.49138963 -0.638902   -0.44398196 -0.43435128]
 [ 2.20593008  2.18678609  1.0040539   0.3861864 ]
 [ 0.73736858  1.49073203 -0.93583387  1.17582904]
 [-1.25388067 -0.6377515   0.9071052  -1.4286807 ]
 [-0.14006872 -0.8617549  -0.25561937 -2.79858911]
 [-1.7715331  -0.69987723  0.92746243 -0.17363568]
 [ 0.00284592  0.68822271 -0.87953634  0.28362732]]
transformed data: 
[[-0.94511643  0.58665507  0.5223171  -0.93064483]
 [-0.53659117  1.16247784 -2.13366794  0.06768082]
 [ 0.9495916  -1.05437488 -0.42049501  0.3773612 ]
 [ 1.13124423 -0.85379954 -0.19024378  0.06264126]
 [ 1.70696485  1.63376764  1.22910949  0.8229693 ]
 [ 0.52371324  1.02100318 -0.67235312  1.55466934]
 [-1.08067913 -0.85278672  1.13408114 -0.858726  ]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129 ]
 [-1.49776284 -0.9074785   1.15403514  0.30422599]
 [-0.06810748  0.31452186 -0.61717074  0.72793583]]
scaler info: scaler.mean_: [ 0.08737571  0.33094968 -0.24989369 -0.50195303], scaler.var_: [1.54038781 1.29032409 1.04082479 1.16464894]


use numpy by self
mean: [ 0.08737571  0.33094968 -0.24989369 -0.50195303], std: [1.24112361 1.13592433 1.02020821 1.07918902], var: [1.54038781 1.29032409 1.04082479 1.16464894]
another_trans_data: 
[[-0.94511643  0.58665507  0.5223171  -0.93064483]
 [-0.53659117  1.16247784 -2.13366794  0.06768082]
 [ 0.9495916  -1.05437488 -0.42049501  0.3773612 ]
 [ 1.13124423 -0.85379954 -0.19024378  0.06264126]
 [ 1.70696485  1.63376764  1.22910949  0.8229693 ]
 [ 0.52371324  1.02100318 -0.67235312  1.55466934]
 [-1.08067913 -0.85278672  1.13408114 -0.858726  ]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129 ]
 [-1.49776284 -0.9074785   1.15403514  0.30422599]
 [-0.06810748  0.31452186 -0.61717074  0.72793583]]

Process finished with exit code 0
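
The fitted scaler can also be reused on new data and the transformation can be undone. A minimal sketch (the new_data array below is a hypothetical example, not from the original post):

from sklearn.preprocessing import StandardScaler
import numpy as np

np.random.seed(123)
data = np.random.randn(10, 4)

scaler = StandardScaler()
# fit_transform is equivalent to calling fit() followed by transform()
trans_data = scaler.fit_transform(data)

# new samples are scaled with the mean/variance learned from `data`,
# not with statistics computed from the new samples themselves
new_data = np.random.randn(3, 4)
print(scaler.transform(new_data))

# inverse_transform undoes the scaling and recovers the original values
print(np.allclose(scaler.inverse_transform(trans_data), data))  # prints True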

 
