Learning automated hyperparameter tuning with NNI (3): launching the NNI framework from Python to tune a random forest (RandomForest) model

Latest recommended article published on 2022-03-05 20:18:34

呆萌的代Ma

Category: python. Tags: python, automation, random forest

This is an original article by CSDN blogger "呆萌的代Ma". When reposting, please cite the blog link: https://blog.csdn.net/weixin_35757704/

Article link: https://blog.csdn.net/weixin_35757704/article/details/120972280

First, write the trial script my_rf.py in the form the NNI framework expects:

import nni
from sklearn.model_selection import train_test_split
import logging
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

LOG = logging.getLogger('sklearn_classification')


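def load_data():  # load the generated (synthetic) data
    # Fix random_state so that every trial is scored on the same dataset
    gen_x, gen_y = make_classification(n_samples=10000, n_informative=10, n_classes=5, random_state=0)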
    x_train, x_test, y_train, y_test = train_test_split(gen_x, gen_y, random_state=0, test_size=.25)
    return x_train, x_test, y_train, y_test


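def get_model(PARAMS):
    '''Build a RandomForestClassifier from the given hyperparameters'''
    # Pass the hyperparameters through the constructor rather than setting attributes afterwards
    model = RandomForestClassifier(
        n_estimators=PARAMS.get('n_estimators'),
        min_samples_split=PARAMS.get('min_samples_split'),
    )
    return model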


def run(X_train, X_test, y_train, y_test, model):
    '''Train model and predict result'''
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    LOG.debug('score: %s', score)
    nni.report_final_result(score)


if __name__ == '__main__':
    X_train, X_test, y_train, y_test = load_data()
    try:
        # get parameters from tuner
        RECEIVED_PARAMS = nni.get_next_parameter()
        LOG.debug(RECEIVED_PARAMS)
        PARAMS = {
            'n_estimators': 10,
            'min_samples_split': 10,
        }
        PARAMS.update(RECEIVED_PARAMS)
        LOG.debug(PARAMS)
        model = get_model(PARAMS)
        run(X_train, X_test, y_train, y_test, model)
    except Exception as exception:
        LOG.exception(exception)
        raise  # re-raise so that a failed trial does not exit as if it had succeeded
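
The way the script layers the tuner's values over its own defaults can be tried out without NNI installed. The following is a minimal sketch of that step only; `merge_params` and `DEFAULT_PARAMS` are hypothetical names introduced for illustration and are not part of NNI or of my_rf.py.

```python
# Hypothetical helper (not part of NNI): layer tuner-supplied values over defaults,
# mirroring the PARAMS.update(RECEIVED_PARAMS) step in my_rf.py.
DEFAULT_PARAMS = {"n_estimators": 10, "min_samples_split": 10}


def merge_params(defaults, received):
    """Return a new dict in which values from `received` override `defaults`."""
    merged = dict(defaults)        # copy, so the defaults themselves stay untouched
    merged.update(received or {})  # an empty or None result keeps every default
    return merged


# A tuner that only proposes n_estimators leaves min_samples_split at its default.
print(merge_params(DEFAULT_PARAMS, {"n_estimators": 120}))
# With no values from the tuner, the defaults are used unchanged.
print(merge_params(DEFAULT_PARAMS, {}))
```

Keeping defaults in the script means it can still produce a model when it receives no parameters at all, which is convenient for a quick local check before starting an experiment.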

Then launch it from Python:

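The search space that would otherwise go into search_space.json is written directly in the code.
The runtime settings that would otherwise go into config.yml are also written directly in the code.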
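
To make the first point concrete, here is a small sketch showing that the in-code search space is the same data a search_space.json file would hold. It uses only the standard `json` module; the variable names `search_space_json` and `restored` are introduced here for illustration.

```python
import json

# The same search space used in this article, written as a Python dict.
search_space = {
    "n_estimators": {"_type": "randint", "_value": [10, 250]},
    "min_samples_split": {"_type": "randint", "_value": [2, 25]},
}

# This text is what a standalone search_space.json file would contain.
search_space_json = json.dumps(search_space, indent=4)
print(search_space_json)

# Parsing the JSON text gives back an identical dict, so nothing is lost
# by keeping the search space in code instead of in a separate file.
restored = json.loads(search_space_json)
print(restored == search_space)
```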

The resulting Python code is:

from nni.experiment import Experiment

search_space = {
    "n_estimators": {"_type": "randint", "_value": [10, 250]},
    "min_samples_split": {"_type": "randint", "_value": [2, 25]}
}  # the ranges to search for each hyperparameter

experiment = Experiment('local')
experiment.config.experiment_name = 'my rf example'
experiment.config.trial_concurrency = 2  # number of trials run at the same time
experiment.config.max_trial_number = 10  # maximum number of trials
experiment.config.search_space = search_space
experiment.config.trial_command = 'python my_rf.py'
# experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True

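experiment.run(port=12345)  # local port
# These calls return values; print them so that running the script shows the results
print(experiment.get_status())
print(experiment.export_data())
print(experiment.get_job_metrics())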
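
Once the trial results are available, the usual next step is to pick the best configuration. The sketch below shows one way to do that, assuming each exported record exposes the trial's hyperparameters and its final score. `TrialRecord` is a hypothetical stand-in written for this example, not an NNI class, and the scores are made-up illustrative values rather than results from a real run.

```python
from dataclasses import dataclass


@dataclass
class TrialRecord:
    """Hypothetical stand-in for one exported trial (not an NNI class)."""
    parameter: dict  # hyperparameters the tuner proposed for this trial
    value: float     # final score the trial reported


def best_trial(records):
    """Return the record with the highest score (optimize_mode is 'maximize')."""
    return max(records, key=lambda record: record.value)


# Made-up scores, for illustration only.
records = [
    TrialRecord({"n_estimators": 50, "min_samples_split": 4}, 0.81),
    TrialRecord({"n_estimators": 200, "min_samples_split": 2}, 0.86),
    TrialRecord({"n_estimators": 15, "min_samples_split": 20}, 0.74),
]

best = best_trial(records)
print(best.parameter, best.value)
```

If the experiment were minimizing a loss instead, `max` would be replaced by `min`.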