Data Preprocessing: Feature Scaling

1.class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
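Transforms features by scaling each feature to a given range.
This estimator scales and translates each feature individually so that its values fall within the given range, for example between 0 and 1.
Parameters of the MinMaxScaler class: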
feature_range : tuple (min, max), default=(0, 1) Desired range of the transformed data.
copy : boolean, optional, default True Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
Attributes of the MinMaxScaler class:
data_min_ : ndarray, shape (n_features,) Per feature minimum seen in the data
data_max_ : ndarray, shape (n_features,) Per feature maximum seen in the data
data_range_ : ndarray, shape (n_features,) Per feature range (data_max_ - data_min_) seen in the data

from sklearn.preprocessing import MinMaxScaler

data = [[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]]
scaler = MinMaxScaler()
print(scaler.fit(data))
print('----------------')
# per-feature (column) maximum seen during fit
print(scaler.data_max_)   
print('----------------')
# scale each column to the [0, 1] range
print(scaler.transform(data))  
print('----------------')
# apply the fitted scaler to new data ([2, 2, 2]); "fitted" means the per-feature minimum and maximum were already learned above
print(scaler.transform([[2, 2, 2]]))  

Output:

MinMaxScaler(copy=True, feature_range=(0, 1))
----------------
[18.  7. 10.]
----------------
[[0.22727273 0.625      0.        ]
 [1.         0.         0.        ]
 [0.36363636 1.         0.75      ]
 [0.         0.375      1.        ]]
----------------
[[0.27272727 0.375      0.        ]]
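The scaled values can also be reproduced by hand, which makes the roles of data_min_, data_range_, and feature_range explicit. The following is a minimal sketch, assuming NumPy is installed; the helper name manual_min_max is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def manual_min_max(X, data_min, data_range, feature_range=(0, 1)):
    """Hypothetical helper: apply the min-max formula by hand."""
    lo, hi = feature_range
    # Map each feature to [0, 1] first, then stretch to [lo, hi].
    X_unit = (np.asarray(X, dtype=float) - data_min) / data_range
    return X_unit * (hi - lo) + lo


data = [[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]]

# Use a non-default target range to show what feature_range does.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(data)

manual = manual_min_max(data, scaler.data_min_, scaler.data_range_, (-1, 1))

# The hand-computed result should agree with the estimator.
print(np.allclose(manual, scaler.transform(data)))
```

Because the minimum and range come from the fitted scaler, new data outside the range seen during fit can land outside feature_range.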

Official scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler

2.class sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)
Standardize features by removing the mean and scaling to unit variance
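This rescales each feature to zero mean and unit variance. It does not change the shape of the distribution, so the result is standard normal only if the feature was normally distributed to begin with.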

The standard score of a sample x is calculated as:

z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
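As a worked check of the formula, the sketch below computes z by hand with NumPy and compares it with StandardScaler. The sample values are made up for illustration. Note that s is the population standard deviation (ddof=0), which is NumPy's default for std.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: 3 samples, 2 features on very different scales.
x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

u = x.mean(axis=0)   # per-feature mean
s = x.std(axis=0)    # per-feature population standard deviation (ddof=0)
z_manual = (x - u) / s

z_sklearn = StandardScaler().fit_transform(x)

# Both routes should give the same standardized values.
print(np.allclose(z_manual, z_sklearn))
```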
Parameters:
with_mean : boolean, True by default

If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_std : boolean, True by default

If True, scale the data to unit variance (or equivalently, unit standard deviation).

copy : boolean, optional, default True

If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
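The with_mean restriction on sparse input can be seen in a small experiment. This is a sketch under the assumption that SciPy is installed alongside scikit-learn; the matrix values are arbitrary.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

X_sparse = sparse.csr_matrix(np.array([[0.0, 2.0], [0.0, 0.0], [3.0, 4.0]]))

# With the default with_mean=True, centering a sparse matrix is refused.
try:
    StandardScaler().fit(X_sparse)
    centering_allowed = True
except ValueError:
    centering_allowed = False

# Scaling without centering keeps the zeros, so the result stays sparse.
scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)

print(centering_allowed)
print(sparse.issparse(scaled))
```

Skipping the centering step means the features have unit variance but not zero mean.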

Attributes:

n_samples_seen_ : int The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

mean_ : array of floats with shape [n_features] The mean value for each feature in the training set.

var_ : array of floats with shape [n_features] The variance for each feature in the training set. Used to compute scale_.

scale_ : ndarray, shape (n_features,) Per feature relative scaling of the data. When with_std=True this equals the per-feature standard deviation, i.e. sqrt(var_).

from sklearn.preprocessing import StandardScaler
import numpy as np

x=np.arange(10).reshape(5,2)
ss=StandardScaler()
ss.fit(x) 
print(x)
print('----------------------')
print(ss.n_samples_seen_ )
print('----------------------')
print(ss.mean_)   # mean of each feature
print('----------------------')
print(ss.var_)    # variance of each feature
print('----------------------')
print(ss.scale_)
x=ss.fit_transform(x)
print(x)

Output:

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
----------------------
5
----------------------
[4. 5.]
----------------------
[8. 8.]
----------------------
[2.82842712 2.82842712]
[[-1.41421356 -1.41421356]
 [-0.70710678 -0.70710678]
 [ 0.          0.        ]
 [ 0.70710678  0.70710678]
 [ 1.41421356  1.41421356]]
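The n_samples_seen_ attribute is easiest to understand with partial_fit, which updates the statistics batch by batch. The sketch below splits the same 5x2 array into two batches; the split point is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.arange(10).reshape(5, 2)

ss = StandardScaler()
ss.partial_fit(x[:3])                 # first batch: 3 samples
seen_after_first = ss.n_samples_seen_
ss.partial_fit(x[3:])                 # second batch: 2 more samples
seen_after_second = ss.n_samples_seen_

# Fitting on all rows at once should give the same statistics.
full = StandardScaler().fit(x)

print(seen_after_first, seen_after_second)
print(np.allclose(ss.mean_, full.mean_), np.allclose(ss.var_, full.var_))
```

Calling fit instead of partial_fit on the second batch would discard the first batch and reset the count.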

Official scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler
