AutoML Tools: AutoGluon

樱缘之梦

Tags: machine learning

Article link: https://blog.csdn.net/qq_28409193/article/details/131787289

1. Introduction

AutoGluon is an AutoML toolkit that covers image, text, time series, and tabular data.

2. Getting Started

2.1 Installation

pip install autogluon

2.2 Usage

(1) Tabular (data organized in rows and columns, such as a CSV file)

Two classes: TabularDataset and TabularPredictor

For an example, see: AutoGluon Tabular - Quick Start - AutoGluon 0.8.2 documentation

Workflow: training, prediction, evaluation

(2) Multimodal

One class: MultiModalPredictor

For an example, see: AutoGluon Multimodal - Quick Start - AutoGluon 0.8.2 documentation

Workflow: training, prediction, evaluation
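The training, prediction, evaluation workflow for MultiModalPredictor can be sketched roughly as follows. This is a minimal sketch rather than the official quick start: the toy rows, the column names `text` and `label`, the helper names `split_rows` and `run_multimodal_demo`, and the 60-second time limit are all assumptions made for illustration. The AutoGluon import sits inside the function so the data preparation can be inspected without the library installed.

```python
# Toy sentiment rows; the column names "text" and "label" are illustrative assumptions.
TOY_ROWS = [
    {"text": "great product, works as described", "label": 1},
    {"text": "terrible quality, broke in a day", "label": 0},
    {"text": "very happy with this purchase", "label": 1},
    {"text": "would not recommend to anyone", "label": 0},
    {"text": "excellent value for the price", "label": 1},
    {"text": "arrived damaged and late", "label": 0},
    {"text": "does exactly what I needed", "label": 1},
    {"text": "complete waste of money", "label": 0},
]


def split_rows(rows, test_size=2):
    """Hold out the last `test_size` rows for evaluation (hypothetical helper)."""
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]


def run_multimodal_demo(train_rows, test_rows, label="label", time_limit=60):
    """Train, predict, and evaluate; requires `pip install autogluon`."""
    import pandas as pd
    from autogluon.multimodal import MultiModalPredictor

    train_df = pd.DataFrame(train_rows)
    test_df = pd.DataFrame(test_rows)

    predictor = MultiModalPredictor(label=label)
    predictor.fit(train_df, time_limit=time_limit)                  # training
    predictions = predictor.predict(test_df.drop(columns=[label]))  # prediction
    scores = predictor.evaluate(test_df)                            # evaluation
    return predictions, scores


train_rows, test_rows = split_rows(TOY_ROWS)
# predictions, scores = run_multimodal_demo(train_rows, test_rows)  # uncomment to train
```

The toy rows only show the expected shape of the data; a real run needs a dataset with many more rows.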

(3) Time Series

Two classes: TimeSeriesDataFrame and TimeSeriesPredictor

For an example, see: AutoGluon Time Series - Forecasting Quick Start - AutoGluon 0.8.2 documentation

Workflow: training, prediction, evaluation

This module supports GPU acceleration.
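For time series, the same three steps could look something like the sketch below. The toy data, the column names `item_id`, `timestamp`, and `target`, the helper names, the 7-step horizon, and the 60-second time limit are assumptions for illustration, not values from the official quick start. Data is built in long format (one row per item and timestamp), which is the layout TimeSeriesDataFrame expects.

```python
from datetime import date, timedelta


def make_long_rows(item_ids=("A", "B"), n_days=60, start=date(2023, 1, 1)):
    """Build toy daily series in long format: one row per (item, timestamp)."""
    rows = []
    for k, item_id in enumerate(item_ids):
        for t in range(n_days):
            rows.append({
                "item_id": item_id,
                "timestamp": start + timedelta(days=t),
                "target": float(10 * (k + 1) + t % 7),  # simple weekly pattern
            })
    return rows


def run_forecast_demo(rows, prediction_length=7, time_limit=60):
    """Train, predict, and evaluate; requires `pip install autogluon`."""
    import pandas as pd
    from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

    df = pd.DataFrame(rows)
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    data = TimeSeriesDataFrame.from_data_frame(
        df, id_column="item_id", timestamp_column="timestamp"
    )
    # Hold out the last `prediction_length` steps of every series for evaluation.
    train_data = data.slice_by_timestep(None, -prediction_length)

    predictor = TimeSeriesPredictor(prediction_length=prediction_length, target="target")
    predictor.fit(train_data, time_limit=time_limit)  # training
    forecasts = predictor.predict(train_data)         # prediction
    scores = predictor.evaluate(data)                 # evaluation on the held-out steps
    return forecasts, scores


rows = make_long_rows()
# forecasts, scores = run_forecast_demo(rows)  # uncomment to train
```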

3. Advanced

(1) Tabular

Goal: classification and regression tasks

Taking classification as an example, the basic workflow is as follows:

from autogluon.tabular import TabularDataset, TabularPredictor


# Label column (defined first because the data preparation below uses it)
label = 'class'

# Prepare the training data
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 500  # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()
print("Summary of class variable: \n", train_data[label].describe())


# Test data
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]  # values to predict
test_data_nolab = test_data.drop(columns=[label])  # delete label column to prove we're not cheating
test_data_nolab.head()

# Train and save the model
save_path = 'agModels-predictClass'  # specifies folder to store trained models
predictor = TabularPredictor(label=label, path=save_path).fit(train_data)


# Load the model, then predict and evaluate
predictor = TabularPredictor.load(save_path)  # unnecessary, just demonstrates how to load previously-trained predictor from file

y_pred = predictor.predict(test_data_nolab)
print("Predictions:  \n", y_pred)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)


# Compare every trained model on the test set
predictor.leaderboard(test_data, silent=True)

Hyperparameters can also be tuned:

from autogluon.common import space

nn_options = {  # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10,  # number of training epochs (controls training time of NN models)
    'learning_rate': space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
    'dropout_prob': space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
}

gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
    'num_leaves': space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
}

hyperparameters = {  # hyperparameters of each model type
                   'GBM': gbm_options,
                   'NN_TORCH': nn_options,  # NOTE: comment this line out if you get errors on Mac OSX
                  }  # When these keys are missing from hyperparameters dict, no models of that type are trained

time_limit = 2*60  # train various models for ~2 min
num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'auto'  # to tune hyperparameters using random search routine with a local scheduler

hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'num_trials': num_trials,
    'scheduler' : 'local',
    'searcher': search_strategy,
}  # Refer to TabularPredictor.fit docstring for all valid values

metric = 'accuracy'  # evaluation metric; specified explicitly here just for the demo
predictor = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data,
    time_limit=time_limit,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)
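To make the search space more concrete, here is a small standalone sketch of what random search over a space shaped like `nn_options` amounts to. It does not use AutoGluon and is not how AutoGluon's searcher is implemented; the helpers `sample_log_uniform` and `sample_config` are hypothetical names. The point is the log-scale sampling: the learning rate is drawn uniformly in log space, so small and large magnitudes are explored equally often.

```python
import math
import random


def sample_log_uniform(lower, upper, rng):
    """Draw a value whose logarithm is uniform on [log(lower), log(upper)]."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


def sample_config(rng):
    """One random configuration from a space shaped like nn_options (illustrative)."""
    return {
        "learning_rate": sample_log_uniform(1e-4, 1e-2, rng),    # log-scale real
        "activation": rng.choice(["relu", "softrelu", "tanh"]),  # categorical
        "dropout_prob": rng.uniform(0.0, 0.5),                   # linear-scale real
    }


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(5)]  # 5 draws, mirroring num_trials = 5
for config in configs:
    print(config)
```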

Model ensembling (stacking/bagging):


predictor = TabularPredictor(label=label, eval_metric=metric).fit(train_data,
    num_bag_folds=5, num_bag_sets=1, num_stack_levels=1,
    hyperparameters = {'NN_TORCH': {'num_epochs': 2}, 'GBM': {'num_boost_round': 20}},  # last argument is just for quick demo here, omit it in real applications
)

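The key parameters are num_bag_folds and num_stack_levels. Larger values tend to improve accuracy but increase training time and memory usage. num_bag_sets sets how many times the k-fold bagging procedure is repeated, and auto_stack lets AutoGluon apply bagging and stacking automatically.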
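A quick back-of-the-envelope calculation shows why these parameters raise training cost. The function below is a rough estimate under simplifying assumptions (every model type is trained at every layer, and the final weighted ensemble is ignored); `count_bagged_fits` is a hypothetical helper, not part of AutoGluon.

```python
def count_bagged_fits(num_model_types, num_bag_folds, num_bag_sets=1, num_stack_levels=0):
    """Rough count of child-model fits for a bagged, stacked ensemble.

    Assumes every model type is trained at every layer and ignores the final
    weighted ensemble, so treat the result as an estimate only.
    """
    fits_per_bagged_model = num_bag_folds * num_bag_sets
    num_layers = num_stack_levels + 1  # base layer plus each stacking level
    return num_model_types * fits_per_bagged_model * num_layers


baseline = count_bagged_fits(num_model_types=2, num_bag_folds=5)
stacked = count_bagged_fits(num_model_types=2, num_bag_folds=5, num_bag_sets=1, num_stack_levels=1)
print(baseline, stacked)  # → 10 20
```

With the settings from the example above (two model types, 5 folds, 1 set, 1 stack level), that is roughly 20 model fits instead of 2.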

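Feature engineering is a routine step: AutoGluon takes care of data preprocessing, missing-value handling, problem-type inference, and similar tasks. For details, see: AutoGluon Tabular - Feature Engineering - AutoGluon 0.8.2 documentation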

(2) Multimodal Prediction

MultiModal is built on Hugging Face models. The tasks it supports were shown in a figure:

[Figure: tasks supported by MultiModal]

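(3) Time Series

Supported time series models:

Simple statistical models: ARIMA, ETS, Theta
Deep learning models: DeepAR, Temporal Fusion Transformer
Tree-based models: LightGBM
Ensemble models

Feature engineering:

(1) Static features

Static features can be added to the dataset, for example location information (country, state, city) or product attributes (brand, color, size, weight, etc.).

(2) Time-varying covariates

Known covariates: values known in advance for the whole forecast horizon, such as holidays, workdays, and weekends.

Past covariates: values observed only up to the start of the forecast horizon, such as promotion information and product sales information.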
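The difference between the two kinds of features is easiest to see in data. The sketch below uses only the standard library; the item IDs, feature values, and helper names are made up for illustration. Static features have one row per item, while a known covariate such as a weekend flag can be computed for future dates before any forecast is made. In AutoGluon, as far as the 0.8 documentation describes it, static features are attached through the `static_features` attribute of a TimeSeriesDataFrame, and known covariates are declared with `known_covariates_names` on TimeSeriesPredictor and supplied at prediction time through the `known_covariates` argument of `predict`.

```python
from datetime import date, timedelta


def weekend_flag(day):
    """1.0 on Saturday/Sunday, else 0.0; a covariate known for any future date."""
    return 1.0 if day.weekday() >= 5 else 0.0


def future_known_covariates(item_ids, last_day, prediction_length):
    """Rows of known covariates covering the forecast horizon of each item."""
    rows = []
    for item_id in item_ids:
        for step in range(1, prediction_length + 1):
            day = last_day + timedelta(days=step)
            rows.append({"item_id": item_id, "timestamp": day, "weekend": weekend_flag(day)})
    return rows


# Static features: one row per item, constant over time (values are made up).
static_features = [
    {"item_id": "A", "country": "US", "brand": "acme"},
    {"item_id": "B", "country": "DE", "brand": "globex"},
]

# Friday 2023-03-03 is the last observed day, so the horizon starts on a Saturday.
future_rows = future_known_covariates(["A", "B"], date(2023, 3, 3), prediction_length=3)
for row in future_rows[:3]:
    print(row["timestamp"], row["weekend"])
```

Past covariates have no such future rows: they stop at the last observed timestamp, which is why the model can only use their history.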

Backtesting:

Models are evaluated over multiple sliding windows.
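The idea behind multi-window backtesting can be sketched in a few lines of plain Python. This is an illustration of the general technique, not AutoGluon's own splitting logic, and `sliding_windows` is a hypothetical helper. Each window holds out the next `prediction_length` observations for scoring, and the evaluation window slides forward until it reaches the end of the series. The 0.8 documentation describes a `num_val_windows` argument to `fit` for this purpose; check the version you use.

```python
def sliding_windows(series_length, prediction_length, num_windows):
    """Return (train_end, test_end) index pairs, oldest window first.

    Each window trains on observations [0, train_end) and is scored on
    [train_end, test_end). Consecutive windows are `prediction_length` apart.
    """
    windows = []
    for k in range(num_windows, 0, -1):
        test_end = series_length - (k - 1) * prediction_length
        train_end = test_end - prediction_length
        if train_end <= 0:
            raise ValueError("series too short for the requested windows")
        windows.append((train_end, test_end))
    return windows


windows = sliding_windows(series_length=30, prediction_length=5, num_windows=3)
print(windows)  # → [(15, 20), (20, 25), (25, 30)]
```

Averaging the metric over several windows gives a steadier estimate than scoring on the final window alone.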

4. Summary

For detailed usage, see the official documentation.
