Machine Learning - Hyperparameter Tuning - Validation Curves (8)

我欲乘风归去

Tags: machine learning, python, sklearn

Article link: https://blog.csdn.net/qq_51724097/article/details/126580436

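Understanding validation curves:

1. Model performance = f(hyperparameter). For a classifier the score can be the F1 score. Note that validation_curve reports the estimator's default score (accuracy for classifiers) unless a scoring argument such as scoring='f1_weighted' is passed.
2. A validation curve is used to find a better hyperparameter value before the final model is built.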
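To make the first point concrete, here is a minimal sketch of computing F1 scores as a function of one hyperparameter. It assumes a synthetic dataset from make_classification (not the car data used below) and assumes max_depth as the hyperparameter being varied; both are illustrative choices, not taken from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic 3-class data (assumption: stands in for a real dataset)
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=7)

depths = [1, 3, 5]  # candidate max_depth values (illustrative)
model = RandomForestClassifier(n_estimators=20, random_state=7)

# scoring='f1_weighted' requests weighted F1; without it, a classifier
# is scored with its default metric, which is accuracy
train_f1, val_f1 = validation_curve(
    model, X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="f1_weighted",
)

# One row per hyperparameter value, one column per fold
for depth, score in zip(depths, val_f1.mean(axis=1)):
    print(f"max_depth={depth}: mean weighted F1 = {score:.3f}")
```

Each row of val_f1 is one evaluation of f(hyperparameter), repeated over 5 folds; averaging across folds gives one point on the curve.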

Code implementation:

import numpy as np
import pandas as pd
import sklearn.preprocessing as sp
import sklearn.ensemble as se
import sklearn.model_selection as ms
import matplotlib.pyplot as plt

data_pd = pd.read_csv('C:/Users/81936/Desktop/car.txt', delimiter=",")
data = np.array(data_pd)
train_x, train_y = [], []
encoders = []  # keep every fitted label encoder so new samples can be encoded at prediction time
for index, row in enumerate(data.T):  # data.T iterates over the columns of the table
    encoder = sp.LabelEncoder()  # maps text categories to integers 0, 1, 2, ...
    if index < (len(data.T) - 1):
        train_x.append(encoder.fit_transform(row))  # feature column
    else:
        train_y.append(encoder.fit_transform(row))  # last column is the label
    encoders.append(encoder)
train_x = np.array(train_x).T
train_y = np.array(train_y).T.reshape(-1)

# Create a random forest classifier
# max_depth: maximum tree depth, n_estimators: number of trees, random_state: random seed
model = se.RandomForestClassifier(max_depth=9, n_estimators=140, random_state=7)

# Use a validation curve to choose n_estimators; cv=5 runs 5-fold cross-validation for each value
train_scores, test_scores = ms.validation_curve(model, train_x, train_y, param_name='n_estimators', param_range=np.arange(50, 550, 50), cv=5)

Output of test_scores:

[[0.69942197 0.8150289  0.79768786 0.83188406 0.90434783]
 [0.70231214 0.78034682 0.79479769 0.83768116 0.89855072]
 [0.72254335 0.78034682 0.79479769 0.82608696 0.90434783]
 [0.69653179 0.76878613 0.79479769 0.83188406 0.89855072]
 [0.64739884 0.77456647 0.79479769 0.83478261 0.89855072]
 [0.68786127 0.78034682 0.79479769 0.84637681 0.89855072]
 [0.65895954 0.77456647 0.79479769 0.84057971 0.89855072]
 [0.69364162 0.77745665 0.79479769 0.84637681 0.89855072]
 [0.7283237  0.7716763  0.79479769 0.84347826 0.89565217]
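 [0.73699422 0.77456647 0.79479769 0.84057971 0.89565217]]

Each row corresponds to one of the 10 n_estimators values that were tried, and each column is one of the 5 cross-validation folds.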
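The matrix above can be reduced to one number per hyperparameter value by averaging over the folds. The sketch below does this on the printed scores and picks the value with the highest mean; the variable names (mean_scores, best_n and so on) are chosen here for illustration.

```python
import numpy as np

param_range = np.arange(50, 550, 50)  # the 10 n_estimators values: 50, 100, ..., 500

# test_scores copied from the output above (rows: values, columns: folds)
test_scores = np.array([
    [0.69942197, 0.8150289,  0.79768786, 0.83188406, 0.90434783],
    [0.70231214, 0.78034682, 0.79479769, 0.83768116, 0.89855072],
    [0.72254335, 0.78034682, 0.79479769, 0.82608696, 0.90434783],
    [0.69653179, 0.76878613, 0.79479769, 0.83188406, 0.89855072],
    [0.64739884, 0.77456647, 0.79479769, 0.83478261, 0.89855072],
    [0.68786127, 0.78034682, 0.79479769, 0.84637681, 0.89855072],
    [0.65895954, 0.77456647, 0.79479769, 0.84057971, 0.89855072],
    [0.69364162, 0.77745665, 0.79479769, 0.84637681, 0.89855072],
    [0.7283237,  0.7716763,  0.79479769, 0.84347826, 0.89565217],
    [0.73699422, 0.77456647, 0.79479769, 0.84057971, 0.89565217],
])

mean_scores = test_scores.mean(axis=1)  # average over the 5 folds
std_scores = test_scores.std(axis=1)    # fold-to-fold spread

best_index = int(np.argmax(mean_scores))
best_n = int(param_range[best_index])
best_mean = float(mean_scores[best_index])

for n, m, s in zip(param_range, mean_scores, std_scores):
    print(f"n_estimators={n}: mean={m:.4f}, std={s:.4f}")
print("best n_estimators:", best_n)
```

On these numbers the highest mean belongs to n_estimators=50, but all ten means lie within roughly 0.02 of each other while the scores vary much more from fold to fold, so the curve suggests that n_estimators has little influence in this range.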

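# Plot the validation curve as a line chart
plt.grid(linestyle=':')  # ':' draws dotted grid lines
plt.plot(np.arange(50, 550, 50), test_scores.mean(axis=1), 'o-', color='dodgerblue', label='validation curve')  # 'o-' joins the markers with a line
plt.xlabel('n_estimators')
plt.ylabel('mean cross-validation score')
plt.legend()
plt.show()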
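validation_curve also returns train_scores, which the plot above does not use. Drawing both curves, with a band for the fold-to-fold spread, can help show whether a change in the hyperparameter improves generalization or only the fit to the training data. The following is a sketch under stated assumptions: it uses synthetic data from make_classification rather than the car dataset, a small illustrative param_range, and a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in data (assumption; not the car dataset)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=7)

param_range = [5, 20, 50]  # illustrative n_estimators values
train_scores, test_scores = validation_curve(
    RandomForestClassifier(max_depth=3, random_state=7), X, y,
    param_name="n_estimators", param_range=param_range, cv=5,
)

train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
test_std = test_scores.std(axis=1)

fig, ax = plt.subplots()
ax.grid(linestyle=":")
ax.plot(param_range, train_mean, "o-", color="orangered", label="training score")
ax.plot(param_range, test_mean, "o-", color="dodgerblue", label="validation score")
# Shaded band: one standard deviation across the 5 folds
ax.fill_between(param_range, test_mean - test_std, test_mean + test_std,
                color="dodgerblue", alpha=0.2)
ax.set_xlabel("n_estimators")
ax.set_ylabel("accuracy")
ax.legend()
fig.savefig("validation_curve_sketch.png")  # hypothetical file name
```

A large gap between the two curves points to overfitting, and a wide band means the differences between neighbouring hyperparameter values may not be meaningful.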