Fast automated machine learning (AutoML) in Python with just two packages: PyCaret 2.0 and AutoGluon

易烊千蝈

Last modified 2022-03-11 09:15:39

First published 2022-03-11 09:14:56

Article link: https://blog.csdn.net/weixin_39490300/article/details/123416083

Contents

1. PyCaret 2.0
2. AutoGluon: a high-performance AutoML package

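Machine learning and deep learning are also moving toward automation that needs less manual work, so this post introduces automated machine learning toolkits, starting with PyCaret 2.0 ("package-caller 2.0") and AutoGluon.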

1. PyCaret 2.0

Official release page:
https://github.com/pycaret/pycaret/releases/tag/2.0

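PyCaret, today's recommendation, is a Python library dedicated to automated machine learning. It is not yet aimed at end users because it has no GUI, but its low-code design puts it close to that goal: I believe many web developers could easily build an end-user machine learning platform with PyCaret at its core. In August 2020 PyCaret was updated to version 2.0, which added AutoML functionality and integrated MLflow to manage the "production process" of machine learning models.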

Why PyCaret 2.0 rather than another AutoML tool?

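Two major advantages:

1. Rapid model validation, which helps determine the following:
   1) whether the data is usable;
   2) the feature dimensionality and data volume to aim for in future data collection;
   3) a baseline model.

2. Plot-oriented output: reports, papers, and slide decks all need charts to present results, and PyCaret ships with an attractive built-in interactive plotting interface.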
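PyCaret's compare_models() trains a range of estimators with cross-validation and ranks them, which is how it produces a baseline quickly. The sketch below is not PyCaret code: it reproduces only that idea in plain Python, assuming made-up one-feature data, and compares a naive mean predictor with a one-feature least-squares model under k-fold cross-validation. All function names are illustrative.

```python
# Conceptual sketch only, NOT PyCaret's implementation: rank a few simple
# models by k-fold cross-validated MAE and keep the best one as a baseline.
from statistics import mean


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for determinism)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs, ys):
    """Naive baseline: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """One-feature ordinary least squares: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cv_mae(fit, xs, ys, k=5):
    """Mean absolute error on each held-out fold, averaged over the k folds."""
    scores = []
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        scores.append(mean(abs(model(xs[i]) - ys[i]) for i in test_idx))
    return mean(scores)


def compare(models, xs, ys, k=5):
    """Return (name, cv_mae) pairs sorted from best (lowest MAE) to worst."""
    results = [(name, cv_mae(fit, xs, ys, k)) for name, fit in models.items()]
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Made-up data: y is roughly 2 * x + 1 with a small alternating wobble.
    xs = list(range(20))
    ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
    for name, score in compare({"mean": fit_mean, "linear": fit_linear}, xs, ys):
        print(f"{name:>6}: CV MAE = {score:.3f}")
```

If a real model cannot clearly beat the mean baseline, that is an early hint that the features carry little signal, which is exactly the data-usability check listed above.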

Installing PyCaret

# Create a conda environment
conda create --name yourenvname python=3.6
# Activate the environment
conda activate yourenvname
# Install PyCaret 2.0
pip install pycaret==2.0
# Create a notebook kernel linked to the conda environment (requires ipykernel)
python -m ipykernel install --user --name yourenvname --display-name "display-name"

Usage

Using the London (UK) bike-sharing data as an example:

import pandas as pd

# Load the London bike-sharing data (london_merged.csv is assumed to be in the working directory)
dataset = pd.read_csv('london_merged.csv')
dataset.head(5)

dataset.head(5) displays the first five rows of the data.
Next, build the data preprocessing step with setup():

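from pycaret.regression import *

# Initialize the experiment: 'cnt' (bike-share count) is the target; the raw timestamp column is excluded
s = setup(dataset, target = 'cnt', ignore_features = ['timestamp'])

# Exploratory data analysis with interactive Bokeh plots
# (eda() is not in 2.0 itself; it arrived in a later 2.x release and also needs the autoviz package)
eda(display_format = 'bokeh')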
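Conceptually, setup() prepares the data before any model is trained. The following plain-Python sketch, assuming records are dicts, shows three of those steps: dropping the ignore_features columns, separating the target, and holding out a test split (PyCaret's train_size defaults to 0.7; here `seed` plays the role of session_id). It is not PyCaret's implementation, which additionally infers column types, imputes missing values, and encodes categorical features; simple_setup is a hypothetical name.

```python
# Conceptual sketch only, NOT PyCaret's code: three of the preparation steps
# that setup(dataset, target='cnt', ignore_features=['timestamp']) asks for.
import random


def simple_setup(rows, target, ignore_features=(), train_size=0.7, seed=123):
    """rows: list of dicts, one per record.

    Drops the ignored columns, separates the target, and splits the records
    into train/test sets after a reproducible shuffle.
    Returns (X_train, X_test, y_train, y_test).
    """
    dropped = set(ignore_features) | {target}
    X = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    y = [row[target] for row in rows]

    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # `seed` plays the role of session_id
    cut = round(train_size * len(order))
    train, test = order[:cut], order[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


if __name__ == "__main__":
    # Made-up records shaped like the bike-sharing data (values are not real).
    rows = [{"timestamp": f"2015-01-04 {h:02d}:00:00", "cnt": 100 + 10 * h, "t1": 3.0 + 0.1 * h}
            for h in range(10)]
    X_train, X_test, y_train, y_test = simple_setup(rows, "cnt", ["timestamp"])
    print(len(X_train), len(X_test))  # 7 3
    print(sorted(X_train[0]))         # ['t1']
```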

eda() offers a huge variety of plots to choose from.
High-resolution plots

Change scale to render the figure at a suitable size and resolution:

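# model: a trained classification model, e.g. the return value of create_model('ada');
# the 'auc' plot is only available in pycaret.classification, and scale = 2 renders at twice the default resolution
plot_model(model, plot = 'auc', scale = 2)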

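# Import the classification module
from pycaret.classification import *

# Initialize the setup: data is your DataFrame, 'name-of-target' is its label column
clf1 = setup(data, target = 'name-of-target')

# Train an AdaBoost model
adaboost = create_model('ada')

# ROC curve and AUC
plot_model(adaboost, plot = 'auc')

# Decision boundary
plot_model(adaboost, plot = 'boundary')

# Precision-recall curve
plot_model(adaboost, plot = 'pr')

# Validation curve
plot_model(adaboost, plot = 'vc')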
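The 'auc' plot draws the ROC curve and reports the area under it. Here is a minimal sketch of both quantities, assuming binary labels 0/1 with both classes present and real-valued scores: roc_points() lowers the threshold through each distinct score, and roc_auc() uses the equivalent pairwise definition (the probability that a random positive outscores a random negative, ties counting one half). It is for illustration only, not how PyCaret computes its plots.

```python
# Minimal ROC/AUC sketch for binary labels (1 = positive, 0 = negative).
# Assumes both classes are present. Quadratic-time pair counting: fine for
# illustration, not for large data.


def roc_points(y_true, y_score):
    """(FPR, TPR) points, lowering the threshold through each distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true, y_score):
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_points(y, scores))
    print(roc_auc(y, scores))  # 0.75
```

Integrating the ROC points with the trapezoid rule gives the same 0.75 here, which is why the plot and the AUC number agree.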

2. AutoGluon: a high-performance AutoML package

Official site: https://auto.gluon.ai/
AutoGluon documentation: AutoGluon: AutoML for Text, Image, and Tabular Data (AutoGluon Documentation 0.3.1) [https://auto.gluon.ai/stable/index.html]

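AutoGluon's features, in brief:

1. Three application areas: images (image classification, object detection), text (text classification), and tabular data (tabular prediction).
2. Two core capabilities:
   1) Automatic hyperparameter tuning, which supports PyTorch as well as MXNet; supported search strategies include random search, grid search, reinforcement learning (RL), and Bayesian optimization.
   2) Neural architecture search (NAS), which supports image classification only and currently offers only ENAS.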
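To make the difference between two of those search strategies concrete, here is a toy sketch of grid search and random search on a made-up objective that stands in for "validation score as a function of two hyperparameters" (lr and depth are hypothetical names). Grid search evaluates every combination of the listed values; random search draws a fixed number of configurations. AutoGluon's own searchers are not used.

```python
# Toy comparison of grid search and random search. `objective` is a made-up
# stand-in for a validation score; lr and depth are hypothetical names.
import itertools
import random


def objective(lr, depth):
    """Hypothetical score (higher is better), best at lr=0.1, depth=6."""
    return -100 * (lr - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def grid_search(score_fn, space):
    """Evaluate every combination from space (dict: name -> list of values)."""
    names = list(space)
    best = None
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best is None or score > best[1]:
            best = (params, score)
    return best


def random_search(score_fn, sampler, n_trials, seed=0):
    """Evaluate n_trials configurations drawn by sampler(rng); keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = sampler(rng)
        score = score_fn(**params)
        if best is None or score > best[1]:
            best = (params, score)
    return best


def sampler(rng):
    # lr log-uniform in [0.001, 1], depth a uniform integer in 2..8
    return {"lr": 10 ** rng.uniform(-3, 0), "depth": rng.randint(2, 8)}


if __name__ == "__main__":
    grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}  # 3 x 4 = 12 trials
    print("grid:  ", grid_search(objective, grid))
    print("random:", random_search(objective, sampler, n_trials=12))
```

With two hyperparameters a grid covers the space well; as hyperparameters are added, the grid size multiplies while random search keeps a fixed trial budget.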
The official site provides installation commands:
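If you are not familiar with installing from a mirror, see my step-by-step guide to installing PyTorch with pip ("Installing PyTorch with pip: a hand-holding tutorial").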

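# Create and activate a Python 3.9 environment with the CUDA 11.3 toolkit
conda create -n myenv python=3.9 cudatoolkit=11.3 -y
conda activate myenv

# Install the CUDA 11.3 build of PyTorch 1.10.1
pip3 install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Upgrade the packaging tools, then install AutoGluon
pip3 install -U pip
pip3 install -U setuptools wheel

pip3 install autogluon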

Training a classifier

from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 500  # subsample the large dataset for faster training
train_data = train_data.sample(n=subsample_size, random_state=0)
label = 'class'
save_path = 'agModels-predictClass'  # specifies folder to store trained models
predictor = TabularPredictor(label=label, path=save_path).fit(train_data)

Evaluating on the test set

test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]  # values to predict
test_data_nolab = test_data.drop(columns=[label])  # delete label column to prove we're not cheating
predictor.leaderboard(test_data, silent=True)

Side-by-side comparison: predictor.leaderboard() lists all trained models together with their scores.
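At its core, a leaderboard scores every trained model on the same labelled data and lists them best-first (when test data is passed, AutoGluon's leaderboard includes a score_test column). The sketch below shows only that ranking logic, assuming accuracy as the metric; the model names, labels, and predictions are hypothetical.

```python
# Sketch of the ranking logic behind a leaderboard. Model names, labels and
# predictions are hypothetical stand-ins, not real AutoGluon output.


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leaderboard(predictions, y_true, metric=accuracy):
    """Rank models best-first.

    predictions: dict mapping a model name to its list of predicted labels.
    Returns a list of (model_name, score_test) tuples, highest score first.
    """
    rows = [(name, metric(y_true, preds)) for name, preds in predictions.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    y_test = [1, 0, 0, 1, 1, 0]  # made-up labels
    predictions = {  # hypothetical models and their predictions
        "model_a": [1, 0, 1, 1, 0, 0],
        "model_b": [1, 0, 0, 1, 1, 0],
        "model_c": [0, 0, 0, 0, 0, 0],
    }
    for name, score in leaderboard(predictions, y_test):
        print(f"{name:<8} score_test={score:.3f}")
```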

Using the high-accuracy preset

time_limit = 500 # for quick demonstration only, you should set this to longest time you are willing to wait (in seconds)
metric = 'roc_auc'  # specify your evaluation metric here
predictor = TabularPredictor(label, eval_metric=metric).fit(train_data, time_limit=time_limit, presets='best_quality')
predictor.leaderboard(test_data, silent=True)
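time_limit is a budget in seconds for the whole fit. As a rough sketch of what such a budget means (an assumption about the general idea, not AutoGluon's scheduler): candidate models are trained one after another, and no new candidate is started once the budget is spent. The clock is injectable so the example is deterministic; candidate names, costs, and scores are hypothetical. Because presets='best_quality' also enables bagging and stack ensembling, each candidate costs more, which is why that preset benefits from a generous time_limit.

```python
# Conceptual sketch of a training time budget; NOT AutoGluon's implementation.
import time


def train_within_budget(candidates, time_limit, clock=time.monotonic):
    """Train candidates in order until `time_limit` seconds have elapsed.

    candidates: list of (name, train_fn); train_fn() trains a model and returns
    its validation score. The budget is only checked between models here; a
    real system would also limit or interrupt each model's own training.
    """
    start = clock()
    results = []
    for name, train_fn in candidates:
        if clock() - start >= time_limit:
            break  # budget spent: do not start any further candidates
        results.append((name, train_fn()))
    return results


if __name__ == "__main__":
    class FakeClock:
        """Deterministic stand-in for time.monotonic."""
        def __init__(self):
            self.now = 0.0

        def __call__(self):
            return self.now

    clock = FakeClock()

    def fake_model(name, cost, score):
        # Hypothetical model that "takes" `cost` seconds and reaches `score`.
        def train():
            clock.now += cost
            return score
        return name, train

    candidates = [fake_model("A", 200, 0.80), fake_model("B", 200, 0.85), fake_model("C", 200, 0.90)]
    print(train_within_budget(candidates, time_limit=300, clock=clock))
    # [('A', 0.8), ('B', 0.85)]
```

Note that model B starts before the budget runs out and then overshoots it; that is the gap a real scheduler closes by sizing or stopping individual models.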