Summary of Problems Encountered in Hands-On Machine Learning with Scikit-Learn and TensorFlow
Problem
housing_prepared = full_pipeline.fit_transform(housing)
TypeError: fit_transform() takes 2 positional arguments but 3 were given
Note from the Author or Editor
- The LabelEncoder and LabelBinarizer classes were designed for preprocessing labels, not input features, so their fit() and fit_transform() methods only accept one parameter y instead of two parameters X and y. The proper way to convert categorical input features to one-hot vectors should be to use the OneHotEncoder class, but unfortunately it does not work with string categories, only integer categories (people are working on it, see Pull Request 7327: https://github.com/scikit-learn/scikit-learn/pull/7327). In the meantime, one workaround was to use the LabelBinarizer class, as shown in the book.
- Unfortunately, since Scikit-Learn 0.19.0, pipelines now expect each estimator to have a fit() or fit_transform() method with two parameters X and y, so the code shown in the book won’t work if you are using Scikit-Learn 0.19.0 (and possibly later as well). Avoiding such breakage is the reason why I specified the library versions to use in the requirements.txt file (including scikit-learn 0.18.1).
- A temporary workaround (until PR 7327 is finished and you can use a OneHotEncoder) is to create a small wrapper class around the LabelBinarizer class, to fix its fit_transform() method, like this:
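from sklearn.preprocessing import LabelBinarizer

class PipelineFriendlyLabelBinarizer(LabelBinarizer):
    def fit_transform(self, X, y=None):
        # Ignore y: LabelBinarizer.fit_transform() accepts a single argument
        return super(PipelineFriendlyLabelBinarizer, self).fit_transform(X)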
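- This problem is caused by differences between Scikit-Learn versions, and the fix is fairly simple. The author's approach is to define your own class that inherits from LabelBinarizer and overrides its fit_transform() method, changing the method's parameters by adding y=None.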
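To make the argument counting behind the error concrete, here is a minimal sketch in plain Python that does not use scikit-learn at all. The classes LabelOnlyBinarizer and PipelineFriendlyBinarizer and the helper run_last_step are hypothetical stand-ins invented for this illustration. They only mimic the calling convention described in the note, assuming a pipeline always passes both X and y to the fit_transform() of its final step.

```python
# Hypothetical stand-ins; none of these names come from scikit-learn.

class LabelOnlyBinarizer:
    """Mimics LabelBinarizer: fit_transform() accepts a single argument y."""

    def fit_transform(self, y):
        # Sorted unique categories define the one-hot column order.
        self.classes_ = sorted(set(y))
        return [[int(value == c) for c in self.classes_] for value in y]


class PipelineFriendlyBinarizer(LabelOnlyBinarizer):
    """Same idea as the workaround in the note: accept X plus an ignored y."""

    def fit_transform(self, X, y=None):
        return super().fit_transform(X)


def run_last_step(step, X, y=None):
    """Mimics a pipeline calling its final step with both X and y."""
    return step.fit_transform(X, y)


categories = ["INLAND", "NEAR BAY", "INLAND", "ISLAND"]

try:
    run_last_step(LabelOnlyBinarizer(), categories)
    error_message = None
except TypeError as exc:
    # self + X + y are 3 positional arguments, but only self + y are accepted.
    error_message = str(exc)

# The wrapper accepts the extra argument and simply drops it.
one_hot = run_last_step(PipelineFriendlyBinarizer(), categories)

print(error_message)
print(one_hot)
```

The count in the message includes self, which is why a method declared with one visible parameter reports that it "takes 2 positional arguments". The wrapper changes nothing about the encoding itself; it only widens the signature so the call made by the pipeline succeeds.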
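Reference
The book “Hands-On Machine Learning with Scikit-Learn and TensorFlow” still contains quite a few problems. Fortunately, the author and editors have answered and replied to the questions raised by readers. This post was compiled from the content of that website; the link is below: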