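To define a custom transformer, all you need to do is create a class and implement three methods: fit() (returning self), transform(), and fit_transform(). If you add TransformerMixin as a base class, you get the last one for free. If you also add BaseEstimator as a base class (and avoid *args and **kwargs in the constructor), you gain two extra methods, get_params() and set_params(), which are very useful for automatic hyperparameter tuning.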
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Column indices of the source attributes in the housing array
room_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        rooms_per_household = X[:, room_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, room_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

# `housing` is assumed to be the pandas DataFrame loaded earlier
attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
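
To make concrete what the two base classes contribute, here is a minimal sketch. RatioAdder and its add_ratio parameter are hypothetical names invented for this illustration (they are not part of the housing example); the toy class only appends the ratio of the first two columns.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical toy transformer: appends column 0 / column 1."""
    def __init__(self, add_ratio=True):
        self.add_ratio = add_ratio

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        if self.add_ratio:
            return np.c_[X, X[:, 0] / X[:, 1]]
        return X

X_demo = np.array([[6.0, 2.0], [9.0, 3.0]])
adder = RatioAdder()

params = adder.get_params()          # provided by BaseEstimator
X_out = adder.fit_transform(X_demo)  # provided by TransformerMixin

adder.set_params(add_ratio=False)    # provided by BaseEstimator
X_plain = adder.transform(X_demo)    # ratio column is no longer added
```

Note that the class defines neither fit_transform() nor get_params()/set_params() itself. Keeping every hyperparameter as an explicit keyword argument in the constructor is what allows BaseEstimator to discover it.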
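Additional notes
- np.r_ concatenates arrays along the first axis: for two matrices it stacks one on top of the other, so they must have the same number of columns.
- np.c_ concatenates arrays along the last axis: for two matrices it places them side by side, so they must have the same number of rows. A 1-D array is treated as a column, which is why transform() above can append the new attributes directly to X.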
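
The difference between the two helpers is easiest to see on small arrays. The following is a short sketch; the arrays a, b, and v are made-up values used only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.r_[a, b]  # b is placed below a: shape (4, 2)
stacked_cols = np.c_[a, b]  # b is placed to the right of a: shape (2, 4)

# A 1-D array passed to np.c_ is appended as a new column
v = np.array([10, 20])
with_extra_col = np.c_[a, v]  # shape (2, 3)
```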