LogisticRegression Parameters: A Detailed Guide to Machine Learning Model Evaluation and Hyperparameter Tuning

weixin_39986060

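This article examines machine learning model evaluation and hyperparameter tuning: how to simplify a workflow with pipelines, evaluate model performance with k-fold cross-validation, debug algorithms with learning and validation curves, and tune hyperparameters with grid search. It also stresses the importance of different performance metrics, particularly considerations beyond accuracy.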

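The following article is from Datawhale, written by Li Zuxian (李祖贤).

Introduction

Once a model has been built, how do we judge whether it is any good, and how do we improve it?

This article covers machine learning model evaluation and hyperparameter tuning. The topics are:

Simplifying the workflow with pipelines
Evaluating model performance with k-fold cross-validation
Debugging algorithms with learning and validation curves
Tuning hyperparameters with grid search
Comparing different performance metrics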

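I. Simplifying the Workflow with Pipelines

Many machine learning algorithms need a series of preparatory steps before modeling. For example, before fitting a logistic regression we may first standardize the data, then reduce its dimensionality with PCA, and only then fit the logistic regression model and predict. Is there a way to chain these operations so that they form a single workflow? The code below shows how.

1. Load the basic libraries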

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use("ggplot")
import warnings
warnings.filterwarnings("ignore")

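2. Load the data and do basic preprocessing

# Load the data (the file has no header row)
df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data",header=None)
# Basic preprocessing
from sklearn.preprocessing import LabelEncoder

X = df.iloc[:,2:].values    # columns 2 onward are the features
y = df.iloc[:,1].values     # column 1 is the diagnosis label ('M' or 'B')
le = LabelEncoder()    # encode the string labels 'M'/'B' as the integers 1/0
y = le.fit_transform(y)
le.transform(['M','B'])     # check the mapping: 'M' -> 1, 'B' -> 0
# Split the data 8:2 into training and test sets, stratified by class
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,stratify=y,random_state=1)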
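
To make the label-encoding step concrete, here is a minimal sketch on a small toy list of labels (the list `labels` is made up for illustration and is not part of the dataset). It shows that `LabelEncoder` stores the classes in sorted order, which is why 'B' becomes 0 and 'M' becomes 1.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels, assumed here only for illustration
labels = ["M", "B", "B", "M", "B"]

le = LabelEncoder()
encoded = le.fit_transform(labels)

# Classes are kept in sorted order, so index 0 is 'B' and index 1 is 'M'
print(le.classes_)
print(encoded)

# inverse_transform maps the integers back to the original strings
decoded = le.inverse_transform(encoded)
print(decoded)
```
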

3. Wrap all the operations in one pipeline to form a workflow: standardization + PCA + logistic regression

There are two ways to do this:

Method 1: make_pipeline

# Wrap all the operations in one pipeline to form a workflow:
## standardization + PCA + logistic regression

### Method 1: make_pipeline (step names are generated automatically)
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe_lr1 = make_pipeline(StandardScaler(),PCA(n_components=2),LogisticRegression(random_state=1))
pipe_lr1.fit(X_train,y_train)
y_pred1 = pipe_lr1.predict(X_test)
print("Test Accuracy: %.3f"% pipe_lr1.score(X_test,y_test))

Test Accuracy: 0.956

Method 2: Pipeline

# Wrap all the operations in one pipeline to form a workflow:
## standardization + PCA + logistic regression

### Method 2: Pipeline (each step is given an explicit name)
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe_lr2 = Pipeline([('std',StandardScaler()),('pca',PCA(n_components=2)),('lr',LogisticRegression(random_state=1))])
pipe_lr2.fit(X_train,y_train)
y_pred2 = pipe_lr2.predict(X_test)
print("Test Accuracy: %.3f"% pipe_lr2.score(X_test,y_test))

Test Accuracy: 0.956
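
To show what the pipeline does on our behalf, the following sketch repeats the same three steps by hand and compares the predictions. It rests on two assumptions that differ from the code above: it uses scikit-learn's bundled `load_breast_cancer` copy of the dataset instead of the UCI download (its labels are encoded the other way round, 0 = malignant and 1 = benign), and it fixes `svd_solver="full"` in PCA so that both runs are deterministic.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled copy of the dataset, so no network access is needed
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Manual version: fit every transformer on the training data only,
# then reuse the fitted transformers on the test data.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2, svd_solver="full").fit(scaler.transform(X_train))
clf = LogisticRegression(random_state=1)
clf.fit(pca.transform(scaler.transform(X_train)), y_train)
manual_pred = clf.predict(pca.transform(scaler.transform(X_test)))

# Pipeline version: fit() and predict() run the same chain internally
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=2, svd_solver="full"),
                     LogisticRegression(random_state=1))
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

same = np.array_equal(manual_pred, pipe_pred)
print("Same predictions:", same)

# make_pipeline names each step after its lowercased class name
print(list(pipe.named_steps))
```

The manual version has to remember to call only `transform` (never `fit`) on the test data; the pipeline handles that bookkeeping, which is the main practical reason to prefer it.
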

II. Evaluating Model Performance with k-Fold Cross-Validation
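
As a minimal sketch of the idea, k-fold cross-validation can be applied directly to a pipeline like the one built in Part I. The choices below are assumptions for illustration rather than the author's settings: 10 stratified folds, accuracy as the metric, and scikit-learn's bundled `load_breast_cancer` copy of the dataset in place of the UCI download.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled copy of the dataset, so no network access is needed
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=2),
                     LogisticRegression(random_state=1))

# Assumption: 10 stratified folds; each fold keeps the class ratio of y_train
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# The whole pipeline is refit on each training fold, so the scaler and PCA
# never see the data of the fold that is held out for validation.
scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Passing the pipeline rather than a bare classifier matters here: preprocessing fitted on the full training set before splitting would leak information from the validation folds into the model.
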
