Introduction to lazypredict
lazypredict is a machine learning package built on scikit-learn that trains many models with a single line of code.
Installing lazypredict
Reference environment: lazypredict 0.2.7, Python 3.9.7, scikit-learn 1.1.1.
pip3 install lazypredict==0.2.7 numpy pandas tqdm scikit-learn xgboost lightgbm
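Because the import problems described later depend on which versions are installed, it can help to confirm them right after installation. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and not part of lazypredict.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing.

    Hypothetical helper for illustration; not part of lazypredict.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = ["lazypredict", "scikit-learn", "xgboost", "lightgbm"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'not installed'}")
```

If the reported scikit-learn version differs from the reference environment above, the import errors discussed later are more likely.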
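After installation, take a classification problem as an example: lazypredict fits a set of models on the samples with default settings and reports a score for each.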
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split
from sklearn import datasets
# Load the iris dataset and split it 70/30 into training and test sets
data = datasets.load_iris()
x, y = data["data"], data["target"]
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
# predictions=True makes fit() also return each model's test-set predictions
clf = LazyClassifier(predictions=True)
models, predictions = clf.fit(x_train, x_test, y_train, y_test)
# Show the table of per-model metrics
print(models)
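To make the idea of "many models in one call" concrete, here is a rough sketch of the same pattern written directly against scikit-learn. It is not lazypredict's implementation: the function name compare_classifiers, the three chosen estimators, and the fixed random_state are assumptions made for illustration, and the real package covers far more estimators and metrics.

```python
import time

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(x_train, x_test, y_train, y_test):
    """Fit a few classifiers and return (name, accuracy, seconds) rows, best first.

    Hypothetical helper; the estimator list is an arbitrary illustrative choice.
    """
    candidates = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    }
    rows = []
    for name, model in candidates.items():
        start = time.perf_counter()
        model.fit(x_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(x_test))
        rows.append((name, accuracy, time.perf_counter() - start))
    # Sort by accuracy, highest first
    return sorted(rows, key=lambda row: row[1], reverse=True)


x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=0
)
results = compare_classifiers(x_train, x_test, y_train, y_test)
for name, accuracy, seconds in results:
    print(f"{name:<24} accuracy={accuracy:.3f} time={seconds:.3f}s")
```

The value of lazypredict is that it runs this kind of loop over a much larger set of estimators without any of the boilerplate.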
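Import errors
lazypredict may fail to import because of a scikit-learn version mismatch: with scikit-learn 0.24.2, for example, import lazypredict fails. Besides installing a compatible scikit-learn version (1.1.1 or later should work), you can also copy the installed lazypredict folder into your project directory and patch a few lines of its source.
Edit Supervised.py in the lazypredict directory as follows:
Change the import section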
# from sklearn.utils.testing import all_estimators
from sklearn.utils import all_estimators
In removed_classifiers, replace each old path (commented out) with the new one below it
# sklearn.ensemble.gradient_boosting.GradientBoostingClassifier
sklearn.ensemble.GradientBoostingClassifier
# sklearn.gaussian_process.gpc.GaussianProcessClassifier
sklearn.gaussian_process._gpc.GaussianProcessClassifier
# sklearn.neural_network.multilayer_perceptron.MLPClassifier
sklearn.neural_network.MLPClassifier
# sklearn.linear_model.logistic.LogisticRegressionCV
sklearn.linear_model.LogisticRegressionCV
# sklearn.neighbors.classification.RadiusNeighborsClassifier
sklearn.neighbors.RadiusNeighborsClassifier
# sklearn.ensemble.voting.VotingClassifier
sklearn.ensemble.VotingClassifier
In removed_regressors, make the following changes
# removed_regressors = [('TheilSenRegressor', sklearn.linear_model.theil_sen.TheilSenRegressor),
removed_regressors = [('TheilSenRegressor', sklearn.linear_model.TheilSenRegressor),
# Remove the following line
('_SigmoidCalibration', sklearn.calibration._SigmoidCalibration)
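Before editing the two lists by hand, it may be useful to check which dotted paths actually resolve in the installed scikit-learn. The sketch below uses only the standard library; the helper name resolve and the candidates list are hypothetical, and the paths are copied from the changes above.

```python
from importlib import import_module


def resolve(dotted_path):
    """Return the object named by a dotted path, or None if it cannot be imported.

    Hypothetical helper for illustration; not part of lazypredict or scikit-learn.
    """
    module_path, _, attr = dotted_path.rpartition(".")
    try:
        return getattr(import_module(module_path), attr)
    except (ImportError, AttributeError, ValueError):
        # ImportError: module missing; AttributeError: name missing;
        # ValueError: path had no module part
        return None


candidates = [
    "sklearn.ensemble.gradient_boosting.GradientBoostingClassifier",
    "sklearn.ensemble.GradientBoostingClassifier",
    "sklearn.linear_model.theil_sen.TheilSenRegressor",
    "sklearn.linear_model.TheilSenRegressor",
    "sklearn.calibration._SigmoidCalibration",
]
for path in candidates:
    status = "ok" if resolve(path) is not None else "missing"
    print(f"{status:<8}{path}")
```

Paths reported as missing are the ones that need to be updated or removed in Supervised.py for the scikit-learn version in use.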