- Set up the pipeline
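from sklearn.impute import SimpleImputer  # replaces Imputer, which was removed in scikit-learn 0.22
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),  # fill missing values with the median
    ('attribs_adder', CombinedAttributesAdder()),   # add the ratio columns
    ('std_scaler', StandardScaler()),               # standardize the features
])
housing_num_tr = num_pipeline.fit_transform(housing_num)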
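To see what fit_transform does across the steps, here is a minimal sketch on toy data. The array X and the names toy_pipeline, X_tr and X_manual are made up for illustration; SimpleImputer (from sklearn.impute, scikit-learn 0.20 and later) does the median imputation, and the custom CombinedAttributesAdder step is left out so the snippet runs on its own.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for housing_num:
# two numeric columns, one missing value in the second column.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0],
              [4.0, 40.0]])

toy_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill NaN with the column median
    ("std_scaler", StandardScaler()),               # zero mean, unit variance per column
])

# The pipeline fits and transforms each step in order,
# passing every step's output to the next one.
X_tr = toy_pipeline.fit_transform(X)

# The same computation written out step by step, for comparison.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)
X_manual = StandardScaler().fit_transform(X_imputed)

print(np.allclose(X_tr, X_manual))
```

The pipeline is just a convenience: it applies the same transformations in the same order, but keeps them in one object that can be fitted once and reused on new data with transform.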
And a transformer to just select a subset of the Pandas DataFrame columns:
from sklearn.base import BaseEstimator, TransformerMixin

# Create a class to select numerical or categorical columns,
# since Scikit-Learn doesn't handle DataFrames yet
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def