The Complete Machine Learning Collection: Quickly Training and Evaluating 11 Machine Learning Classification Models with sklearn




About the Author


Author: 小白熊

Author bio: proficient in Python, MATLAB, and C#; specializes in machine learning, deep learning, machine vision, object detection, image classification, pose recognition, semantic segmentation, path planning, intelligent optimization algorithms, data analysis, and various innovative combinations of the above.

Contact email: xbx3144@163.com

For research tutoring, paid Q&A, customized development, or other collaboration, please contact the author.



I. Overview


  In data science, choosing the right machine learning model is crucial. As data grows explosively, demand for machine learning keeps rising across industries. Faced with so many candidate models, however, building and evaluating them quickly and effectively becomes a real challenge. To this end, the scikit-learn library provides a rich set of tools and interfaces that make constructing machine learning models far more convenient.

  This article explains in detail how to implement 11 popular classification models with scikit-learn. Starting from data preparation, we walk step by step through building, training, and evaluating each model. Through simple examples, it shows how to put these models to work quickly and helps you understand the characteristics and typical use cases of each one. Even complete beginners can get up to speed with machine learning fast!



II. Data Preparation


  We first need a dataset; the Iris dataset is used here as an example. The train_test_split function then splits it into training and validation sets at an 8:2 ratio.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training (80%) and validation (20%) sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)



III. Model Evaluation Metrics


Accuracy: the proportion of all samples that the model predicts correctly, reflecting overall performance.

Precision: the proportion of samples predicted as positive that are truly positive, measuring how reliable the positive predictions are.

Recall: the proportion of actual positive samples that the model correctly identifies, measuring its ability to capture the positive class.

F1-Score: the harmonic mean of precision and recall, summarizing performance on the positive class; it is especially useful for class-imbalanced data.
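
Since every model below reports the same four metrics, it can help to see them computed once in a small helper. The sketch below is only an illustration and is not part of the per-model code in this article; the evaluate function name and its weighted-average default are assumptions.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical helper: bundles the four metrics used throughout this article
def evaluate(y_true, y_pred, average='weighted'):
    return {
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision': precision_score(y_true, y_pred, average=average),
        'Recall': recall_score(y_true, y_pred, average=average),
        'F1-Score': f1_score(y_true, y_pred, average=average),
    }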



IV. The 11 Machine Learning Classification Models


1. Adaptive Boosting (AdaBoost)

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Initialize the model
ada_model = AdaBoostClassifier()

# Train the model
ada_model.fit(X_train, y_train)

# Predict
y_train_pred = ada_model.predict(X_train)
y_val_pred = ada_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('AdaBoost:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



2. Artificial Neural Network (ANN)

from sklearn.neural_network import MLPClassifier

# Initialize the model
ann_model = MLPClassifier(max_iter=300)

# Train the model
ann_model.fit(X_train, y_train)

# Predict
y_train_pred = ann_model.predict(X_train)
y_val_pred = ann_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('ANN:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



3. Decision Tree (DT)

from sklearn.tree import DecisionTreeClassifier

# Initialize the model
dt_model = DecisionTreeClassifier()

# Train the model
dt_model.fit(X_train, y_train)

# Predict
y_train_pred = dt_model.predict(X_train)
y_val_pred = dt_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('Decision Tree:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



4. Extremely Randomized Trees (Extra Trees)

from sklearn.ensemble import ExtraTreesClassifier

# Initialize the model
et_model = ExtraTreesClassifier()

# Train the model
et_model.fit(X_train, y_train)

# Predict
y_train_pred = et_model.predict(X_train)
y_val_pred = et_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('Extra Trees:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



5. Gradient Boosting Machine (GBM)

from sklearn.ensemble import GradientBoostingClassifier

# Initialize the model
gbm_model = GradientBoostingClassifier()

# Train the model
gbm_model.fit(X_train, y_train)

# Predict
y_train_pred = gbm_model.predict(X_train)
y_val_pred = gbm_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('Gradient Boosting:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



6. k-Nearest Neighbors (KNN)

from sklearn.neighbors import KNeighborsClassifier

# Initialize the model
knn_model = KNeighborsClassifier()

# Train the model
knn_model.fit(X_train, y_train)

# Predict
y_train_pred = knn_model.predict(X_train)
y_val_pred = knn_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('KNN:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



7. Light Gradient Boosting Machine (LightGBM)

import lightgbm as lgb

# Initialize the model
lgb_model = lgb.LGBMClassifier()

# Train the model
lgb_model.fit(X_train, y_train)

# Predict
y_train_pred = lgb_model.predict(X_train)
y_val_pred = lgb_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('LightGBM:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



8. Logistic Regression (LR)

from sklearn.linear_model import LogisticRegression

# Initialize the model
lr_model = LogisticRegression(max_iter=200)

# Train the model
lr_model.fit(X_train, y_train)

# Predict
y_train_pred = lr_model.predict(X_train)
y_val_pred = lr_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('Logistic Regression:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



9. Random Forest (RF)

from sklearn.ensemble import RandomForestClassifier

# Initialize the model
rf_model = RandomForestClassifier(n_estimators=300, random_state=0)

# Train the model
rf_model.fit(X_train, y_train)

# Predict
y_train_pred = rf_model.predict(X_train)
y_val_pred = rf_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('Random Forest:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



10. Support Vector Machine (SVM)

from sklearn.svm import SVC

# Initialize the model
svm_model = SVC(probability=True)

# Train the model
svm_model.fit(X_train, y_train)

# Predict
y_train_pred = svm_model.predict(X_train)
y_val_pred = svm_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('Support Vector Machine:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



11. Extreme Gradient Boosting (XGBoost)

import xgboost as xgb

# Initialize the model (note: use_label_encoder is deprecated in recent xgboost releases and can be omitted)
xgb_model = xgb.XGBClassifier(use_label_encoder=False, verbosity=0)

# Train the model
xgb_model.fit(X_train, y_train)

# Predict
y_train_pred = xgb_model.predict(X_train)
y_val_pred = xgb_model.predict(X_val)

# Training-set metrics
accuracy_train = accuracy_score(y_train, y_train_pred)
precision_train = precision_score(y_train, y_train_pred, average='weighted')
recall_train = recall_score(y_train, y_train_pred, average='weighted')
f1_train = f1_score(y_train, y_train_pred, average='weighted')

# Validation-set metrics
accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred, average='weighted')
recall_val = recall_score(y_val, y_val_pred, average='weighted')
f1_val = f1_score(y_val, y_val_pred, average='weighted')

# Print metrics
print('XGBoost:')
print(f"Train Accuracy: {accuracy_train:.2f}, Precision: {precision_train:.2f}, Recall: {recall_train:.2f}, F1-Score: {f1_train:.2f}")
print(f"Validation Accuracy: {accuracy_val:.2f}, Precision: {precision_val:.2f}, Recall: {recall_val:.2f}, F1-Score: {f1_val:.2f}")
print("*" * 100)



V. Conclusion


  The code examples above implement 11 popular machine learning models and train and evaluate them on the Iris dataset. The training and validation metrics of each model can be viewed directly in the terminal. Each model has its own strengths and suits different kinds of data and tasks, so you can pick the one that fits your specific needs.

  Beyond this, you can extend the workflow with hyperparameter tuning, feature engineering, or cross-validation to further improve model performance and robustness, as sketched below.
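
As a minimal illustration of the cross-validation step mentioned above, the sketch below runs 5-fold cross-validation on one of the models already used in this article. It reuses the X and y arrays from the data-preparation section; the choice of 5 folds and of Random Forest is an assumption made purely for demonstration.

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# 5-fold cross-validation on the full Iris dataset (fold count chosen for illustration)
cv_model = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(cv_model, X, y, cv=5, scoring='accuracy')
print(f"Cross-validation accuracy: {scores.mean():.2f} ± {scores.std():.2f}")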

  I hope this article helps you better understand and apply scikit-learn to machine learning classification tasks!
