Studying, Thinking, Implementing

scikit-learn fit() transfrom() fit_transform()


In scikit-learn estimator api,

fit() : used for generating learning model parameters from training data

transform() : parameters generated from fit() method,applied upon model to generate transformed data set.

fit_transform() :combination of fit() and transform() api on same data set

enter image description here

Checkout Chapter-4 from this book & answer from stackexchange for more clarity

Further more explanation as follows (an example to explain the meaning of fit() and fit_transform() ):

To center the data (make it have zero mean and unit standard error), you subtract the mean and then divide the result by the standard deviation.

You do that on the training set of data. But then you have to apply the same transformation to your testing set (e.g. in cross-validation), or to newly obtained examples before forecast. But you have to use the same two parameters μ and σ  (values) that you used for centering the training set.

Hence, every sklearn's transform's fit() just calculates the parameters (e.g. μ and σ  in case of StandardScaler) and saves them as an internal objects state. Afterwards, you can call its transform() method to apply the transformation to a particular set of examples.

fit_transform() joins these two steps and is used for the initial fitting of parameters on the training set x, but it also returns a transformed x. Internally, it just calls first fit() and then transform() on the same data.

想对作者说点什么? 我来说一句