sklearn: Pipeline Estimators

月疯

Article link: https://blog.csdn.net/chehec2010/article/details/116239724

Pipeline

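A Pipeline chains several estimators together in sequence, for example:

feature extraction -> dimensionality reduction -> fitting -> prediction

A Pipeline as a whole always behaves like its last estimator:

If the last estimator is a predictor, the Pipeline is a predictor.
If the last estimator is a transformer, the Pipeline is a transformer.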
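
The predictor case can be sketched as follows. This is a minimal illustration, not part of the original post: the toy data, the step names and the choice of LogisticRegression as the final step are all assumptions made for the example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression

# Toy data (made up for illustration): one feature, two well-separated classes.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# The last step is a classifier, so the whole Pipeline acts as a classifier.
clf_pipe = Pipeline([
    ("normalize", MinMaxScaler()),
    ("classify", LogisticRegression()),
])
clf_pipe.fit(X_toy, y_toy)

# predict() scales the new samples with the fitted MinMaxScaler,
# then hands them to the fitted LogisticRegression.
pred = clf_pipe.predict(np.array([[0.5], [11.5]]))
print(pred)

# The pipeline exposes predict because its final step does; it does not
# expose transform because LogisticRegression has no transform method.
has_predict = hasattr(clf_pipe, "predict")
has_transform = hasattr(clf_pipe, "transform")
print(has_predict, has_transform)
```

Swapping the final step for a transformer such as MinMaxScaler would reverse the situation: the pipeline would then offer transform and fit_transform instead of predict.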

Testing a Pipeline as a transformer:

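import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Two features with missing values (np.nan replaces np.NAN, which NumPy 2.0 removed).
a = np.array([[1, 2, 3, 4, 5, 6, np.nan, 5], [3, 4, 5, 6, np.nan, 3, np.nan, 9]])
X = np.transpose(a)  # transpose to shape (8 samples, 2 features)
print(X)

# The SimpleImputer step is named "impute", the MinMaxScaler step "normalize".
pipp = Pipeline([
    ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("normalize", MinMaxScaler()),
])

# The last step is a transformer, so pipp is a transformer too.
X_pro = pipp.fit_transform(X)
print(X_pro)

# Run the two steps separately for comparison.
aa = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)
mms = MinMaxScaler().fit_transform(aa)
print(mms)  # same result as the pipeline output above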

Test output:

F:\开发工具\pythonProject\tools\venv\Scripts\python.exe F:/开发工具/pythonProject/tools/python的sklear学习/sklearn07.py
[[ 1.  3.]
 [ 2.  4.]
 [ 3.  5.]
 [ 4.  6.]
 [ 5. nan]
 [ 6.  3.]
 [nan nan]
 [ 5.  9.]]
[[0.         0.        ]
 [0.2        0.16666667]
 [0.4        0.33333333]
 [0.6        0.5       ]
 [0.8        0.33333333]
 [1.         0.        ]
 [0.54285714 0.33333333]
 [0.8        1.        ]]

Process finished with exit code 0
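
To see where a value such as 0.54285714 in the output comes from, the two steps can be reproduced by hand with plain NumPy. The sketch below is an illustration added for this purpose: it rebuilds the same 8 x 2 input, and the variable names (X_check, X_manual and so on) are chosen for the example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Same 8 x 2 input as in the test above.
X_check = np.array([[1, 3], [2, 4], [3, 5], [4, 6],
                    [5, np.nan], [6, 3], [np.nan, np.nan], [5, 9]], dtype=float)

pipe = Pipeline([
    ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("normalize", MinMaxScaler()),
])
X_pipe = pipe.fit_transform(X_check)

# Step 1 by hand: replace each NaN with the mean of its column (NaN ignored).
col_mean = np.nanmean(X_check, axis=0)  # column 1: 26/7, column 2: 30/6
X_filled = np.where(np.isnan(X_check), col_mean, X_check)

# Step 2 by hand: rescale each column to [0, 1].
col_min = X_filled.min(axis=0)
col_max = X_filled.max(axis=0)
X_manual = (X_filled - col_min) / (col_max - col_min)

same = np.allclose(X_pipe, X_manual)
print(same)

# The fitted steps stay accessible by name; the imputer stores the column means.
learned_means = pipe.named_steps["impute"].statistics_
print(learned_means)
```

For the seventh row, the first column is filled with 26/7 and then scaled as (26/7 - 1) / (6 - 1), which gives the 0.54285714 seen in the output.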

FeatureUnion

To run several estimators side by side at the same node of a workflow, we can use FeatureUnion.
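
Before the full example, a minimal sketch of what "side by side" means may help. It is an illustration only: the small array and the choice of MinMaxScaler and StandardScaler as the two parallel branches are assumptions, not taken from the post.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up input: 3 samples, 2 features.
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Both transformers receive the same input; their outputs are
# concatenated column-wise in the order the branches are listed.
union = FeatureUnion([
    ("minmax", MinMaxScaler()),
    ("standard", StandardScaler()),
])
X_union = union.fit_transform(X_demo)

print(X_union.shape)  # 2 columns from each branch: (3, 4)
print(X_union)
```

The first two columns match what MinMaxScaler alone would return, the last two what StandardScaler alone would return.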

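Strategy:

For categorical variables: select -> fill with the most frequent value -> one-hot encode
For numeric variables: select -> fill with the mean -> min-max scale

The selection step is a custom transformer, DataFrameSelector, defined in the code below. Its key part is the transform method, which picks the values out of the input DataFrame X by column name.

Next we build a combined pipeline, full_pipe, which joins two pipelines in parallel: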

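categorical_pipe handles the categorical variables:

DataFrameSelector selects the columns

SimpleImputer fills None with the most frequent value

OneHotEncoder encodes the values and returns a dense (non-sparse) matrix

numeric_pipe handles the numeric variables:

DataFrameSelector selects the columns

SimpleImputer fills NaN with the mean

normalize (a MinMaxScaler) rescales the values to [0, 1]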

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given columns of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        return X[self.attribute_names].values


# Build the data from a dictionary (np.nan replaces np.NAN, which NumPy 2.0 removed).
fe = {
    "height": [1.67, 1.89, np.nan, 1.66, 1.88, np.nan],
    "weight": [56, 78, 92, np.nan, 78, 92],
    "age": [26, 34, 18, 34, 25, 27],
    "love": ["apple", "origine", "piss", "loss", "good", None],
}
X = pd.DataFrame(fe)
categorical_feature = ["love"]
numeric_feature = ["height", "age", "weight"]

categorical_pipe = Pipeline([
    ("select", DataFrameSelector(categorical_feature)),
    ("impute", SimpleImputer(missing_values=None, strategy="most_frequent")),
    # scikit-learn >= 1.2 spells this sparse_output=False; older releases use sparse=False.
    ("one_hot_encode", OneHotEncoder(sparse_output=False)),
])

numeric_pipe = Pipeline([
    ("select", DataFrameSelector(numeric_feature)),
    ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("normalize", MinMaxScaler()),
])

# Run both pipelines on the same DataFrame and concatenate their outputs column-wise.
full_pipe = FeatureUnion(transformer_list=[
    ("numeric_pipe", numeric_pipe),
    ("categorical_pipe", categorical_pipe),
])
x_pro = full_pipe.fit_transform(X)
print(x_pro)

Test output:

F:\开发工具\pythonProject\tools\venv\Scripts\python.exe F:/开发工具/pythonProject/tools/python的sklear学习/sklearn08.py
[[0.04347826 0.5        0.         1.         0.         0.
  0.         0.        ]
 [1.         1.         0.61111111 0.         0.         0.
  1.         0.        ]
 [0.5        0.         1.         0.         0.         0.
  0.         1.        ]
 [0.         1.         0.64444444 0.         0.         1.
  0.         0.        ]
 [0.95652174 0.4375     0.61111111 0.         1.         0.
  0.         0.        ]
 [0.5        0.5625     1.         1.         0.         0.
  0.         0.        ]]

Process finished with exit code 0
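
Each output row has 8 columns: the first three are height, age and weight after scaling (the order of numeric_feature), and the remaining five are the one-hot columns for love. The sketch below looks at the categorical branch in isolation to show how None gets filled and in what order the one-hot columns appear. It is an added illustration: it skips DataFrameSelector and starts from a hand-built object array, and it uses the default sparse output of OneHotEncoder with toarray() so that it does not depend on the version-specific sparse / sparse_output argument.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# The "love" column from the example, as a 6 x 1 object array.
love = np.array([["apple"], ["origine"], ["piss"], ["loss"], ["good"], [None]],
                dtype=object)

# Every value occurs exactly once, so "most frequent" is a tie;
# SimpleImputer then picks the smallest value, here "apple".
imputer = SimpleImputer(missing_values=None, strategy="most_frequent")
filled = imputer.fit_transform(love)
print(filled.ravel())

# OneHotEncoder sorts the categories, which fixes the column order.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(filled).toarray()
print(encoder.categories_[0])
print(onehot.shape)
```

This matches the test output above: the first and last rows both have a 1 in the first one-hot column, because the missing value in the last row was filled with "apple".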
