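[Notes] (Grid search, random search) Training an iris classification model on the iris plant dataset and identifying species from the model's predictions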

Latest recommended article published on 2023-07-01 18:13:24

程序猿的探索之路

Category column: 小菜鸡加油. Article tags: python, pytorch, artificial intelligence

Original source: https://github.com/aialgorithm/Blog/blob/master/projects/%E4%B8%80%E6%96%87%E5%BD%92%E7%BA%B3Ai%E8%B0%83%E5%8F%82%E7%82%BC%E4%B8%B9%E4%B9%8B%E6%B3%95/hyper_tune.ipynb


Use the iris plant dataset to train an iris classification model, then identify the species from the model's predictions.

# Import modules
import pandas as pd
from sklearn.datasets import load_iris

# Load the dataset
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['class'] = data.target
df.head()

	sepal length (cm) 	sepal width (cm) 	petal length (cm) 	petal width (cm) 	class
0 	5.1 	3.5 	1.4 	0.2 	0
1 	4.9 	3.0 	1.4 	0.2 	0
2 	4.7 	3.2 	1.3 	0.2 	0
3 	4.6 	3.1 	1.5 	0.2 	0
4 	5.0 	3.6 	1.4 	0.2 	0

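# pandas_profiling is a very handy data analysis module: it quickly reports missing values,
# feature distributions, and correlations (newer releases are published as ydata-profiling)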
import pandas_profiling

df.profile_report(title='iris')

# Feature engineering
# (Skipped) This dataset is of high quality, so data cleaning, missing-value imputation, etc. are not needed


# Separate the target label y from the features x
y = df['class']
x = df.drop('class', axis=1)


# Split into training and test sets
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(x, y)
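The split above uses the defaults of `train_test_split`, which hold out 25% of the rows and shuffle with a different seed on every run. As a minimal sketch, assuming a reproducible and class-balanced split is wanted, the variant below fixes `random_state` and passes `stratify`; both arguments and the value 42 are illustrative choices that do not appear in the original notebook.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the three species in (nearly) equal proportion in both parts;
# random_state=42 is an arbitrary seed chosen here only to make the split repeatable.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, stratify=y, random_state=42
)

# With the default test_size of 0.25, 150 rows split into 112 training and 38 test rows.
print(len(train_x), len(test_x))

# Per-class counts in the held-out part.
test_counts = Counter(test_y.tolist())
print(test_counts)
```

With only 38 test rows, a single misclassified sample changes the score by about 0.026, so scores measured on this split are coarse.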

%%time
# Model training
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

# Choose the model
model = RandomForestClassifier()

# Hyperparameter search space
param_grid = {
    'max_depth': np.arange(1, 20, 1),
    'n_estimators': np.arange(1, 50, 10),
    'max_leaf_nodes': np.arange(2, 100, 10)
}

# Grid search: evaluate every combination with 5-fold cross-validation on the full dataset
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='f1_micro')
grid_search.fit(x, y)
print(grid_search.best_params_)
print(grid_search.best_score_)
print(grid_search.best_estimator_)

# Random search: evaluate 200 randomly sampled combinations from the same space
rd_search = RandomizedSearchCV(model, param_grid, n_iter=200, cv=5, scoring='f1_micro')
rd_search.fit(x, y)
print(rd_search.best_params_)
print(rd_search.best_score_)
print(rd_search.best_estimator_)

{'max_depth': 9, 'max_leaf_nodes': 82, 'n_estimators': 41}
0.9733333333333334
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=9, max_features='auto', max_leaf_nodes=82,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=41,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
{'n_estimators': 11, 'max_leaf_nodes': 52, 'max_depth': 15}
0.9666666666666667
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=15, max_features='auto', max_leaf_nodes=52,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=11,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
Wall time: 1min 50s
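Most of the wall time above goes into the number of model fits. As a rough sketch of the arithmetic, the snippet below counts the combinations in the same search space with scikit-learn's `ParameterGrid` and compares that with the 200 candidates drawn by the random search. The variable names are illustrative, and the fit counts exclude the one final refit each search performs on the best parameters.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same search space as in the notebook cell above.
param_grid = {
    'max_depth': np.arange(1, 20, 1),         # 19 values
    'n_estimators': np.arange(1, 50, 10),     # 5 values
    'max_leaf_nodes': np.arange(2, 100, 10),  # 10 values
}
cv_folds = 5

# Grid search tries every combination: 19 * 5 * 10 candidates.
n_grid_candidates = len(ParameterGrid(param_grid))
grid_fits = n_grid_candidates * cv_folds

# Random search with n_iter=200 tries only 200 of them.
n_random_candidates = 200
random_fits = n_random_candidates * cv_folds

print(n_grid_candidates, grid_fits)      # 950 candidates, 4750 fits
print(n_random_candidates, random_fits)  # 200 candidates, 1000 fits
```

In this run the random search covered about a fifth of the grid and landed within 0.01 of the grid search's best cross-validated score.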

%%time
# Bayesian optimization
import numpy as np
from hyperopt import hp, tpe, Trials, STATUS_OK, anneal, fmin
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestClassifier

def model_metrics(model, x, y):
    """Evaluation function: micro-averaged F1 score of the model on (x, y)."""
    yhat = model.predict(x)
    return f1_score(y, yhat, average='micro')

def bayes_fmin(train_x, test_x, train_y, test_y, eval_iters=50):
    """
    Tune hyperparameters with hyperopt.
    eval_iters: number of search iterations
    """

    def factory(params):
        """Objective function for hyperparameter tuning."""
        # hp.quniform yields floats, so cast each value to int
        fit_params = {
            'max_depth': int(params['max_depth']),
            'n_estimators': int(params['n_estimators']),
            'max_leaf_nodes': int(params['max_leaf_nodes'])
        }

        # Choose the model
        model = RandomForestClassifier(**fit_params)
        model.fit(train_x, train_y)
        # Objective: minimize the negated F1 score on the test set
        test_metric = model_metrics(model, test_x, test_y)
        loss = -test_metric
        return {"loss": loss, "status": STATUS_OK}

    # Search space
    space = {
        'max_depth': hp.quniform('max_depth', 1, 20, 1),
        'n_estimators': hp.quniform('n_estimators', 2, 50, 1),
        'max_leaf_nodes': hp.quniform('max_leaf_nodes', 2, 100, 1)
    }
    # Search the space. Note: anneal.suggest is hyperopt's simulated-annealing search;
    # pass algo=tpe.suggest instead for TPE-based Bayesian optimization.
    best_params = fmin(factory, space, algo=anneal.suggest, max_evals=eval_iters, trials=Trials(), return_argmin=True)
    # Cast the best parameters to int
    best_params["max_depth"] = int(best_params["max_depth"])
    best_params["max_leaf_nodes"] = int(best_params["max_leaf_nodes"])
    best_params["n_estimators"] = int(best_params["n_estimators"])
    return best_params

# Search for the best parameters
best_params = bayes_fmin(train_x, test_x, train_y, test_y, 100)
print(best_params)

100%|████████████████████████████████████████████████| 100/100 [00:03<00:00, 25.47it/s, best loss: -0.9736842105263158]
{'max_depth': 12, 'max_leaf_nodes': 81, 'n_estimators': 49}
Wall time: 3.94 s
