超参数选择方法

understand?

于 2024-04-25 20:57:38 发布

阅读量328

点赞数 11

文章标签：机器学习算法人工智能

本文链接：https://blog.csdn.net/weixin_74875244/article/details/138199052

版权

本文介绍了传统手动调参的局限性，着重讲解了交叉验证用于数据集划分以提高模型可靠性，以及网格搜索在寻找最优超参数方面的优势。通过KNN算法实例，展示了如何使用网格搜索和交叉验证进行手写数字识别的模型调优过程。

摘要由CSDN通过智能技术生成

1. 传统或手动调参

在传统的调优中，我们通过手动检查随机超参数集来训练算法，并选择最适合我们目标的参数集。

缺点:

不能保证得到最佳的参数组合。
这是一种反复试验的方法，因此会消耗更多的时间。

不作为重点来讲。

2.交叉验证

是一种数据集的分割方法，将训练集划分为 n 份，拿一份做验证集（测试集）、其他n-1份做训练集。

交叉验证法，是划分数据集的一种方法，目的就是为了得到更加准确可信的模型评分。

3.网格搜索

• 为什么需要网格搜索？

• 模型有很多超参数，其能力也存在很大的差异。需要手动产生很多超参数组合，来训练模型

• 每组超参数都采用交叉验证评估，最后选出最优参数组合建立模型。

• 网格搜索是模型调参的有力工具。寻找最优超参数的工具！只需要将若干参数传递给网格搜索对象，它自动帮我们完成不同超参数的组合、模型训练、模型评估，最终返回一组最优的超参数。

• 网格搜索 + 交叉验证的强力组合 (模型选择和调优)

• 交叉验证解决模型的数据输入问题(数据集划分)得到更可靠的模型

• 网格搜索解决超参数的组合 • 两个组合再一起形成一个模型参数调优的解决ji交叉验证网格搜索 – API代码实现：

#1.加载数据
from sklearn.datasets import load_iris
iris_data=load_iris()
#数据划分
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(iris_data.data,iris_data.target,train_size=0.3,random_state=22)
#数据处理
from sklearn.preprocessing import StandardScaler
transfer=StandardScaler()
x_train=transfer.fit_transform(x_train)
x_test=transfer.transform(x_test)

#实例化模型
from sklearn.neighbors import KNeighborsClassifier
estimator=KNeighborsClassifier()

#交叉验证
from sklearn.model_selection import GridSearchCV
estimator=GridSearchCV(estimator=estimator,param_grid={'n_neighbors':[1,2,3,4,5]},cv=5)
estimator.fit(x_train,y_train)

#保存交叉验证结果
import pandas as pd
myret=pd.DataFrame(estimator.cv_results_)
myret.to_csv(path_or_buf='./myretresult.csv')

#模型评估
score=estimator.score(x_test,y_test)
print(f'模型score-->{score}')

结果如下：

4.利用KNN算法实现手写数字识别

以下是利用超参数交叉验证网格搜索KNN算法对手写数字识别的代码实现：

代码中数据为已知数据

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
def train_model():
    #加载数据
    data=pd.read_csv('./data/手写数字识别.csv')
    #print(data)
    #数据预处理
    x=data.iloc[:,1:]/255
    y=data.iloc[:,0]
    s_data=train_test_split(x,y,train_size=0.3,stratify=y,random_state=22)
    x_train,x_test,y_train,y_test=s_data

    #实例化模型
    estimator=KNeighborsClassifier()

    #定义网络参数
    param={'n_neighbors': range(1,21)}
    estimator=GridSearchCV(estimator=estimator,param_grid=param,cv=6)
    estimator.fit(x_train,y_train)

    #模型评估
    score=estimator.score(x_test,y_test)
    print(f'最优参数为：{estimator.best_params_}')
    print(f'分数：{estimator.best_score_}')

if __name__ == '__main__':
    train_model()

结果如下：

understand?

关注

11
点赞
踩
3

收藏

觉得还不错? 一键收藏
1
评论
超参数选择方法

只需要将若干参数传递给网格搜索对象，它自动帮我们完成不同超参数的组合、模型训练、模型评估，最终返回一组最优的超参数。是一种数据集的分割方法，将训练集划分为 n 份，拿一份做验证集（测试集）、其他n-1份做训练集。在传统的调优中，我们通过手动检查随机超参数集来训练算法，并选择最适合我们目标的参数集。• 模型有很多超参数，其能力也存在很大的差异。交叉验证法，是划分数据集的一种方法，目的就是为了得到更加准确可信的模型评分。• 交叉验证解决模型的数据输入问题(数据集划分)得到更可靠的模型。
复制链接

扫一扫