1. class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
Transform features by scaling each feature to a given range.
This estimator scales and translates each feature individually so that it falls within the given range on the training set, e.g. between zero and one.
Parameters of the MinMaxScaler class:
feature_range : tuple (min, max), default=(0, 1) Desired range of the transformed data. (A short sketch using a non-default range follows this parameter list.)
copy : boolean, optional, default True Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
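For illustration, a minimal sketch (with toy data of my own, not from the scikit-learn docs) showing how a non-default feature_range changes the output range:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Toy data: a single feature with values 0, 5, 10
X = np.array([[0.0], [5.0], [10.0]])
# Scale into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
print(scaler.fit_transform(X))
# [[-1.]
#  [ 0.]
#  [ 1.]]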
Attributes of the MinMaxScaler class:
data_min_ : ndarray, shape (n_features,) Per feature minimum seen in the data
data_max_ : ndarray, shape (n_features,) Per feature maximum seen in the data
data_range_ : ndarray, shape (n_features,) Per feature range (data_max_ - data_min_) seen in the data
from sklearn.preprocessing import MinMaxScaler
data = [[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]]
scaler = MinMaxScaler()
print(scaler.fit(data))
print('----------------')
# Maximum value of each column (feature) seen in the data
print(scaler.data_max_)
print('----------------')
# Scale each column into the [0, 1] range
print(scaler.transform(data))
print('----------------')
# Apply the fitted scaler to new data ([2, 2, 2]); 'fitted' means the per-feature min and max learned above are reused
print(scaler.transform([[2, 2, 2]]))
Output:
MinMaxScaler(copy=True, feature_range=(0, 1))
----------------
[18. 7. 10.]
----------------
[[0.22727273 0.625 0. ]
[1. 0. 0. ]
[0.36363636 1. 0.75 ]
[0. 0.375 1. ]]
----------------
[[0.27272727 0.375 0. ]]
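As a sanity check on the numbers above, the default feature_range=(0, 1) transform can be reproduced by hand from the fitted attributes (a minimal sketch reusing the same data):
import numpy as np
from sklearn.preprocessing import MinMaxScaler
data = np.array([[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]], dtype=float)
scaler = MinMaxScaler().fit(data)
# With the default feature_range=(0, 1) the transform is (X - data_min_) / data_range_
manual = (data - scaler.data_min_) / scaler.data_range_
print(np.allclose(manual, scaler.transform(data)))  # True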
Official scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler
2. class sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)
Standardize features by removing the mean and scaling to unit variance
That is, each feature is transformed so that it has zero mean and unit variance.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
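As a quick check of this formula, a minimal sketch (with toy data of my own) comparing StandardScaler against the same computation done directly in NumPy:
import numpy as np
from sklearn.preprocessing import StandardScaler
# Toy single-feature data
x = np.array([[1.0], [2.0], [3.0], [4.0]])
z_sklearn = StandardScaler().fit_transform(x)
# StandardScaler divides by the population standard deviation (ddof=0)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)
print(np.allclose(z_sklearn, z_manual))  # True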
Parameters:
with_mean : boolean, True by default
If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory. (A small sketch of the sparse-matrix case follows this parameter list.)
with_std : boolean, True by default
If True, scale the data to unit variance (or equivalently, unit standard deviation).
copy : boolean, optional, default True
If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
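To make the with_mean note concrete, here is a minimal sketch (toy data of my own): centering is disabled so that a scipy.sparse matrix can be scaled without densifying it:
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler
X = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]))
# Centering would densify the sparse matrix, so only scale to unit variance
ss = StandardScaler(with_mean=False)
print(ss.fit_transform(X).toarray())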
Attributes:
n_samples_seen_ : int The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. (See the partial_fit sketch after the example output below.)
mean_ : array of floats with shape [n_features] The mean value for each feature in the training set.
var_ : array of floats with shape [n_features] The variance for each feature in the training set. Used to compute scale_.
scale_ : ndarray, shape (n_features,) Per feature relative scaling of the data; with the default settings this is the per-feature standard deviation.
from sklearn.preprocessing import StandardScaler
import numpy as np
x=np.arange(10).reshape(5,2)
ss=StandardScaler()
ss.fit(x)
print(x)
print('----------------------')
print(ss.n_samples_seen_)
print('----------------------')
print(ss.mean_)  # mean of each feature
print('----------------------')
print(ss.var_)  # variance of each feature
print('----------------------')
print(ss.scale_)
x=ss.transform(x)  # the scaler is already fitted above, so transform() is enough
print(x)
Output:
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
----------------------
5
----------------------
[4. 5.]
----------------------
[8. 8.]
----------------------
[2.82842712 2.82842712]
[[-1.41421356 -1.41421356]
[-0.70710678 -0.70710678]
[ 0. 0. ]
[ 0.70710678 0.70710678]
[ 1.41421356 1.41421356]]
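The n_samples_seen_ value of 5 above came from a single call to fit; as noted in the attribute description, it also accumulates across partial_fit calls. A minimal sketch reusing the same 5x2 data, split into two batches:
import numpy as np
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
x = np.arange(10).reshape(5, 2)
# Feed the data in two batches; n_samples_seen_ accumulates across partial_fit calls
ss.partial_fit(x[:3])
ss.partial_fit(x[3:])
print(ss.n_samples_seen_)  # 5
print(ss.mean_)            # [4. 5.], same as a single fit on the full data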
Official scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler