Getting Accurate Scikit-Learn Models with Optuna: A Hyperparameter Framework


Hyperparameter frameworks have featured heavily in discussions over the past couple of months. With several packages released and more still in development, picking one has become a tough choice. Such frameworks not only help fit an accurate model but can boost a data scientist's efficiency to the next level. Here I show how Optuna, a recently popular framework, can be used to find the best parameters for any Scikit-learn model. I have implemented only Random Forest and Logistic Regression as examples, but other algorithms can be handled in the same way shown here.


Why Optuna?

Optuna can become one of your workhorse tools if integrated into everyday experimentation. I was deeply impressed by how little effort it took to implement Logistic Regression with Optuna. Here are a few reasons why I like it:


  • Easy-to-use API
  • Great documentation
  • Flexibility to accommodate almost any algorithm
  • Features such as pruning and great built-in visualization modules

Documentation: https://optuna.readthedocs.io/en/stable/index.html


Github: https://github.com/optuna/optuna


Before we start looking at the functionality, we need to make sure that we have installed the prerequisite packages:


  1. Optuna
  2. Plotly
  3. Pandas
  4. Scikit-learn
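All four can typically be installed in one go with pip (assuming the standard PyPI package names):

```shell
pip install optuna plotly pandas scikit-learn
```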

Basic parameters and definitions:

Setting up the basic framework is pretty simple and straightforward. It can be divided broadly into 4 steps:


  1. Define an objective function (Step 1)

  2. Define a set of hyperparameters to try (Step 2)

  3. Define the variable/metric you want to optimize (Step 3)

  4. Finally, run the function (Step 4). Here you need to specify:

  • whether the scoring function/variable you are trying to optimize should be maximized or minimized

  • the number of trials you want to run. The more hyperparameters you tune and the more trials you define, the more computationally expensive it gets (unless you have a beefy machine or a GPU!)

In the Optuna world, a Trial is a single call of the objective function, and multiple such Trials together are called a Study.


Following is a basic implementation with Random Forest and Logistic Regression from the scikit-learn package:


# Importing the packages:
import optuna
import pandas as pd
from sklearn import linear_model
from sklearn import ensemble
from sklearn import datasets
from sklearn import model_selection

# Grabbing a sklearn classification dataset:
X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)

# Step 1. Define an objective function to be maximized.
def objective(trial):

    classifier_name = trial.suggest_categorical("classifier", ["LogReg", "RandomForest"])

    # Step 2. Set up values for the hyperparameters:
    if classifier_name == "LogReg":
        logreg_c = trial.suggest_float("logreg_c", 1e-10, 1e10, log=True)
        classifier_obj = linear_model.LogisticRegression(C=logreg_c)
    else:
        rf_n_estimators = trial.suggest_int("rf_n_estimators", 10, 1000)
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=rf_n_estimators
        )

    # Step 3. Scoring method:
    score = model_selection.cross_val_score(classifier_obj, X, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy

# Step 4. Running it:
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

When you run the above code, Optuna logs each trial to the console, showing the score it achieved and the hyperparameters it sampled.
