Python: A Roundup of sklearn Functions for Data Standardization and Missing-Value Handling

偷偷搞塌

sklearn.preprocessing:
This blogger's write-up is very detailed:
https://blog.csdn.net/pipisorry/article/details/52247679

The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set. This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline:

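import numpy as np
from sklearn import preprocessing

# Training data: 3 samples, 3 features
X = np.array([[ 1., -1.,  2.],
              [ 2.,  0.,  0.],
              [ 0.,  1., -1.]])

# fit() learns the per-column mean and standard deviation from X
scaler = preprocessing.StandardScaler().fit(X)
scaler
StandardScaler(copy=True, with_mean=True, with_std=True)

scaler.mean_
array([ 1. …, 0. …, 0.33…])

scaler.scale_
array([ 0.81…, 0.81…, 1.24…])

# transform() standardizes each column with the learned statistics
scaler.transform(X)
array([[ 0. …, -1.22…, 1.33…],
       [ 1.22…, 0. …, -0.26…],
       [-1.22…, 1.22…, -1.06…]])
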
The scaler instance can then be used on new data to transform it the same way it did on the training set:

scaler.transform([[-1., 1., 0.]])
array([[-2.44…, 1.22…, -0.26…]])
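
As a minimal sketch of what the quoted passage describes, the snippet below checks that transform() amounts to (x - mean_) / scale_ using the training statistics, and then places StandardScaler as the first step of a pipeline. The labels y_train and the choice of LogisticRegression are assumptions made only for illustration; they are not part of the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
X_new = np.array([[-1., 1., 0.]])

scaler = StandardScaler().fit(X_train)

# transform() reuses the statistics learned from the training set
manual = (X_new - scaler.mean_) / scaler.scale_
same = np.allclose(manual, scaler.transform(X_new))
print(same)

# Hypothetical labels, only so that the pipeline has something to fit
y_train = np.array([0, 1, 0])

# The scaler is fitted on the training data and reapplied inside predict()
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pred = pipe.predict(X_new)
print(pred.shape)
```

Putting the scaler inside the pipeline means the new data is scaled with the training statistics without calling transform() by hand.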

Author: -柚子皮-
Source: CSDN
Original: https://blog.csdn.net/pipisorry/article/details/52247679
Copyright notice: this is the blogger's original article; please include a link to the post when reposting.

Binarization

sklearn.preprocessing.Binarizer

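import numpy as np
from sklearn import preprocessing

x = np.array([[ 1, -1,  2],
              [ 2,  0,  0],
              [ 0,  1, -1]])

# Binarizer is stateless, so fit() does nothing here
binarizer = preprocessing.Binarizer().fit(x)
binarizer
Out[136]: Binarizer(copy=True, threshold=0.0)

# Values greater than the threshold (0.0) become 1; all others become 0
binarizer.transform(x)
Out[137]:
array([[1, 0, 1],
       [1, 0, 0],
       [0, 1, 0]])
x
Out[138]:
array([[ 1, -1,  2],
       [ 2,  0,  0],
       [ 0,  1, -1]])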
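
The threshold can be changed from its default of 0.0. The sketch below uses threshold=1.0, a value chosen only for illustration, and compares the result with a plain NumPy comparison to show the rule Binarizer applies: strictly greater than the threshold maps to 1, everything else to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

x = np.array([[1, -1, 2],
              [2, 0, 0],
              [0, 1, -1]])

# Illustrative threshold; values equal to 1.0 are NOT above it, so they map to 0
binarized = Binarizer(threshold=1.0).fit(x).transform(x)

# The same rule written directly in NumPy
expected = (x > 1.0).astype(int)

matches = np.array_equal(binarized, expected)
print(binarized)
print(matches)
```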

For SimpleImputer usage, see the sklearn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
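SimpleImputer fills in missing values for each feature, that is, column by column. Compared with the old Imputer class (deprecated in scikit-learn 0.20 and removed in 0.22), it therefore no longer has an axis parameter: it always behaves like the old axis=0 setting (impute along columns), and the old axis=1 option (impute along rows) is gone.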

import numpy as np
from sklearn.impute import SimpleImputer

# Build the imputation model imp
imp=SimpleImputer(missing_values=np.nan,strategy='mean')
imp.fit([[1,2],[np.nan,3],[7,6]])

Out[163]:
SimpleImputer(copy=True, fill_value=None, missing_values=nan, strategy='mean',
              verbose=0)

x=[[np.nan,2],[6,np.nan],[7,6]]
# Use the fitted imp model to fill in the missing values of x
print(imp.transform(x))

[[4. 2. ]
[6. 3.66666667]
[7. 6. ]]
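
The sketch below looks at the fill values the imputer learned (stored per column in statistics_), tries two other strategies, and shows one possible way to get row-wise imputation by transposing the data. The transpose trick is an assumption made for illustration, not something the sklearn documentation prescribes, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

train = np.array([[1, 2], [np.nan, 3], [7, 6]])
x = np.array([[np.nan, 2], [6, np.nan], [7, 6]])

# One fill value per column, learned from the training data
imp = SimpleImputer(missing_values=np.nan, strategy='mean').fit(train)
col_means = imp.statistics_
print(col_means)

# Median of each training column instead of the mean
median_filled = SimpleImputer(strategy='median').fit(train).transform(x)
print(median_filled)

# A fixed constant instead of a learned statistic
const_filled = SimpleImputer(strategy='constant', fill_value=0).fit_transform(x)
print(const_filled)

# Row-wise means: transpose so that rows become columns, impute, transpose back
row_filled = SimpleImputer(strategy='mean').fit_transform(x.T).T
print(row_filled)
```

Note that the row-wise variant computes its statistics from x itself, so it cannot reuse values learned from a separate training set.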

For the difference between fit_transform and transform, see this post:
https://blog.csdn.net/quiet_girl/article/details/72517053
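
In short, fit_transform(X) learns the statistics from X and applies them in one call, while transform(X) only applies statistics learned earlier. A minimal sketch with StandardScaler, using made-up train and test arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
X_test = np.array([[-1., 1., 0.]])

# fit_transform on the training set equals fit followed by transform
a = StandardScaler().fit_transform(X_train)
scaler = StandardScaler().fit(X_train)
b = scaler.transform(X_train)
equivalent = np.allclose(a, b)
print(equivalent)

# Test data should only be transformed, reusing the training statistics
test_scaled = scaler.transform(X_test)

# Calling fit_transform on the test set re-learns the statistics from it;
# with a single row, every feature equals its own mean and becomes 0
refit = StandardScaler().fit_transform(X_test)
print(test_scaled)
print(refit)
```

This is why fit_transform is normally reserved for the training set and transform is used for everything that comes later.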
