Data Cleaning in Python with a sklearn Pipeline


SamWang_333


Category: Machine Learning

Article link: https://blog.csdn.net/qq_38844711/article/details/103450006


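This article shows how to build a Pipeline with Python's sklearn library to preprocess data. Missing values are first filled with the median using an imputer (Imputer in the original code, SimpleImputer in scikit-learn 0.20 and later), then CombinedAttributesAdder adds ratio columns, and StandardScaler standardizes the data. A DataFrameSelector class is also defined to select specific columns from a DataFrame. Finally, the processing flows for numerical and categorical features are combined with FeatureUnion to complete the preprocessing.

Summary generated automatically by CSDN.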

Set up the pipeline

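from sklearn.impute import SimpleImputer  # replaces Imputer, which was removed in scikit-learn 0.22
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# CombinedAttributesAdder (a custom transformer) and housing_num (the numerical
# columns of the housing DataFrame) are defined elsewhere and not shown here.
num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),  # fill missing values with the median
    ('attribs_adder', CombinedAttributesAdder()),   # add ratio columns
    ('std_scaler', StandardScaler()),               # standardize
])
housing_num_tr = num_pipeline.fit_transform(housing_num)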
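
The snippet above depends on two names the article does not define. As a minimal sketch, the following fills them in so the numerical pipeline can be run end to end. Both are assumptions made for illustration: the `CombinedAttributesAdder` here is a hypothetical stand-in that appends a single ratio column, and the toy `housing_num` values and column names are made up.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: appends one ratio column (numerator / denominator)."""

    def __init__(self, numerator_ix=0, denominator_ix=1):
        self.numerator_ix = numerator_ix
        self.denominator_ix = denominator_ix

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ratio = X[:, self.numerator_ix] / X[:, self.denominator_ix]
        return np.c_[X, ratio]  # original columns plus the new ratio column


# Made-up numerical data with one missing value in total_rooms.
housing_num = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, np.nan, 1274.0],
    "households": [126.0, 1138.0, 177.0, 219.0],
})

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill missing values with the median
    ("attribs_adder", CombinedAttributesAdder()),   # add the ratio column
    ("std_scaler", StandardScaler()),               # zero mean, unit variance per column
])
housing_num_tr = num_pipeline.fit_transform(housing_num)
print(housing_num_tr.shape)
```

Each step receives the output of the previous one, so the imputer has to run before the ratio is computed; otherwise the missing value would propagate into the new column.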

And a transformer to just select a subset of the Pandas DataFrame columns:
from sklearn.base import BaseEstimator, TransformerMixin

# Create a class to select numerical or categorical columns,
# since Scikit-Learn doesn't handle DataFrames yet
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def
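
The class definition breaks off at this point in the source. One plausible way to complete it, and to join the numerical and categorical flows with FeatureUnion as the summary describes, is sketched below. The `fit` and `transform` bodies, the use of `OneHotEncoder` for the categorical column, and the toy `housing` data are all assumptions rather than the author's code, and the ratio-adding step is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Selects a subset of DataFrame columns and returns them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # assumed: nothing to learn

    def transform(self, X):
        return X[self.attribute_names].values  # assumed: plain array output


# Made-up data: two numerical columns and one categorical column.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, np.nan, 1274.0],
    "households": [126.0, 1138.0, 177.0, 219.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
})
num_attribs = ["total_rooms", "households"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attribs)),
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("one_hot", OneHotEncoder()),  # one column per category
])

# FeatureUnion runs both pipelines on the same input and concatenates
# their outputs column-wise.
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
housing_prepared = full_pipeline.fit_transform(housing)

# The one-hot step yields a sparse matrix by default, so densify if needed.
if hasattr(housing_prepared, "toarray"):
    housing_prepared = housing_prepared.toarray()
print(housing_prepared.shape)
```

In scikit-learn 0.20 and later, `sklearn.compose.ColumnTransformer` can route DataFrame columns to different transformers directly, which may make a custom selector like this unnecessary.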
