2023.10.15 Study Notes: Ensemble Learning


q浅夜


Article link: https://blog.csdn.net/qianyeguiji/article/details/133850053


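Bayesian networks seem to involve quite a lot of background knowledge, so I will go through them a few more times later.

Fundamentals of Artificial Intelligence: Ensemble Learning

Ensemble learning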

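Bagging: train multiple classifiers in parallel and average their results

Strategy:

(1) First, sample the training data several times (random sampling with replacement), so that each sampled set is different

(2) Train a separate model on each sampled set, for example a tree model

(3) Collect the results of all the models and combine them to produce the final prediction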
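The three steps above can be sketched by hand, without any ensemble class. This is only a minimal illustration under assumptions: the `make_moons` data matches the exercises further down, while the number of models (25) and the majority-vote rule are choices made here for the example.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
n_models = 25  # illustrative choice; odd, so a binary majority vote cannot tie

# Steps (1) and (2): draw a bootstrap sample for each model and fit a tree on it
trees = []
for _ in range(n_models):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # sampling with replacement
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Step (3): collect every model's prediction and take the majority vote
all_preds = np.array([t.predict(X_test) for t in trees])  # shape (n_models, n_test)
vote_share = all_preds.mean(axis=0)                       # fraction of trees voting for class 1
y_pred = (vote_share > 0.5).astype(int)

bagged_accuracy = (y_pred == y_test).mean()
print('bagged accuracy:', bagged_accuracy)
```

Because each tree is fitted independently of the others, the loop could be run in parallel, which is the property the word "parallel" in the description refers to.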

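Typical representative: random forest

Random: both the data sampling and the feature selection are random;

Forest: many decision trees placed side by side in parallel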

Tree model: [figure]

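Because of this double randomness, the trees are almost never identical, and their individual results differ as well.

Advantages of random forests: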
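The double randomness can be observed on a small forest. The sketch below is an illustration under assumptions: the synthetic dataset from `make_classification` and the forest size of 5 are chosen here, not taken from the original notes. It prints, for each tree, which feature the root node splits on and how many nodes the tree has.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 10 features (an assumption for this illustration)
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=42)

forest = RandomForestClassifier(
    n_estimators=5,
    bootstrap=True,        # random data sampling: each tree sees a bootstrap sample
    max_features='sqrt',   # random feature selection: each split considers a random subset
    random_state=42,
)
forest.fit(X, y)

# Index of the feature used at each tree's root split, and each tree's size
root_features = [int(tree.tree_.feature[0]) for tree in forest.estimators_]
node_counts = [int(tree.tree_.node_count) for tree in forest.estimators_]
print('root split feature per tree:', root_features)
print('node count per tree:', node_counts)
```

The trees usually end up with different root features or different sizes, which is what makes averaging them worthwhile.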

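(1) It can handle high-dimensional data (many features) and performs feature selection automatically

(2) After training, it can report which features are more important

(3) The trees are built in parallel, so it runs fairly fast

(4) The trees can be visualized, which makes analysis easier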
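Advantage (2) corresponds to the `feature_importances_` attribute of a fitted scikit-learn forest. A minimal sketch follows; the iris dataset is an assumption chosen only because it ships with scikit-learn and has named features.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()  # example dataset, not part of the original notes
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(iris.data, iris.target)

# One importance score per feature; the scores are normalized to sum to 1
importances = forest.feature_importances_
ranked = sorted(zip(iris.feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f'{name}: {score:.3f}')
```

These scores are impurity-based, so they are best read as a rough ranking rather than an exact measure of how much each feature matters.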

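In theory, more trees give better results, but in practice, beyond a certain number of trees the performance just fluctuates around a fixed value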
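One way to check this is to train forests of increasing size and compare their test accuracy. The sketch below assumes the same `make_moons` data used in the exercises; the list of tree counts is an arbitrary choice for illustration, and the exact numbers will depend on the data and the random seed.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree_counts = [1, 5, 20, 50, 100, 200]  # illustrative values
scores = {}
for n in tree_counts:
    forest = RandomForestClassifier(n_estimators=n, random_state=42)
    forest.fit(X_train, y_train)
    scores[n] = accuracy_score(y_test, forest.predict(X_test))
    print(n, scores[n])
```

If the accuracy stops improving after some tree count, adding more trees mainly costs training time.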

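Code exercise 1: test hard voting and soft voting by classifying the two-class make_moons data plotted by the code below. The results suggest that ensemble learning, which combines several classifiers in a suitable way, can give better classification performance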

from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

fig1 = plt.figure()
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'yo', alpha=0.6)
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs', alpha=0.6)
plt.show()

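'''
Voting strategies: soft voting and hard voting
Hard voting: use the predicted class labels directly; the majority wins
Soft voting: take the (optionally weighted) average of the classifiers' predicted probabilities
'''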

# Create the classifiers
log_clf = LogisticRegression(random_state=42)  # logistic regression classifier
rnd_clf = RandomForestClassifier(random_state=42)  # random forest classifier
svm_clf = SVC(random_state=42)  # support vector machine classifier

voting_clf = VotingClassifier(estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)], voting='hard')  # voting classifier

voting_clf.fit(X_train, y_train)

# Hard voting: evaluate each individual classifier and the voting ensemble
print('Hard voting accuracy:')
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_predict = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_predict))

# Soft voting: average the predicted probabilities
# Create the classifiers
log_clf = LogisticRegression(random_state=42)  # logistic regression classifier
rnd_clf = RandomForestClassifier(random_state=42)  # random forest classifier
svm_clf = SVC(probability=True, random_state=42)  # SVM classifier; probability=True makes it output probabilities

voting_clf = VotingClassifier(estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)], voting='soft')  # voting classifier
voting_clf.fit(X_train, y_train)

print('Soft voting accuracy:')
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_predict = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_predict))

Console output: [screenshot]
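The difference between the two voting rules can be worked through by hand on a few made-up probabilities. Everything in this sketch is hypothetical: the probability table is invented for illustration and does not come from the classifiers above.

```python
import numpy as np

# Hypothetical P(class 1) from three classifiers (columns) for four samples (rows)
proba_class1 = np.array([
    [0.90, 0.45, 0.40],
    [0.20, 0.30, 0.10],
    [0.60, 0.55, 0.70],
    [0.45, 0.48, 0.95],
])

# Hard voting: each classifier casts a label vote, and the majority wins
labels = (proba_class1 >= 0.5).astype(int)
hard_vote = (labels.sum(axis=1) >= 2).astype(int)

# Soft voting: average the probabilities (equal weights here), then threshold
weights = np.array([1.0, 1.0, 1.0])
avg_proba = np.average(proba_class1, axis=1, weights=weights)
soft_vote = (avg_proba >= 0.5).astype(int)

print('hard:', hard_vote)  # [0 0 1 0]
print('soft:', soft_vote)  # [1 0 1 1]
```

In the first and last rows, one very confident classifier is outvoted under hard voting but pulls the average above 0.5 under soft voting, which is why the two rules can disagree.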

Code exercise 2: use a decision tree as the base classifier and train a bagging model on the data

import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from matplotlib import pyplot as plt
from sklearn.metrics import accuracy_score
from matplotlib.colors import ListedColormap

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

fig1 = plt.figure()
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'yo', alpha=0.6)
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs', alpha=0.6)
plt.show()

# A single decision tree only
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)
y_tree_predict = tree_clf.predict(X_test)
print(accuracy_score(y_test, y_tree_predict))  # 0.856

# Use bagging
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500, max_samples=100, bootstrap=True, n_jobs=-1, random_state=42)
# Parameters: build 500 tree models, each trained on 100 samples drawn at random with replacement
bag_clf.fit(X_train, y_train)
y_predict = bag_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_predict)
print(accuracy)  # 0.904


# Plot the decision boundary of a fitted classifier over the region given by axes
def plot_decision_boundary(clf, X, y, axes=(-1.5, 2.5, -1, 1.5), alpha=0.3, contour=True):
    # Build a 100 x 100 grid covering the plotting region
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    # Predict the class of every grid point and fill the regions by class
    y_predict = clf.predict(X_new).reshape(x1.shape)
    custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
    plt.contourf(x1, x2, y_predict, cmap=custom_cmap, alpha=alpha)
    if contour:  # also draw the contour lines
        custom_cmap2 = ListedColormap(['#7d7d58', '#4c4c7f', '#507d50'])
        plt.contour(x1, x2, y_predict, cmap=custom_cmap2, alpha=0.8)
    # Overlay the data points of both classes
    plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'yo', alpha=0.6)
    plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs', alpha=0.6)
    plt.axis(axes)
    plt.xlabel('x1')
    plt.ylabel('x2')

fig2 = plt.figure(figsize=(12, 5))
plt.subplot(121)
plot_decision_boundary(tree_clf, X, y)
plt.title('Decision Tree')
plt.subplot(122)
plot_decision_boundary(bag_clf, X, y)
plt.title('Bagging-Decision Tree')
plt.show()
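To see where the gain from bagging comes from, the individual trees inside the ensemble can be scored one by one and compared with the combined prediction. This is a sketch under assumptions: it uses 100 trees instead of 500 to keep it quick, and relies on the `estimators_` attribute of a fitted `BaggingClassifier`, which holds the fitted base models.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Smaller ensemble than in the exercise (100 trees), chosen only for speed
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=100, bootstrap=True, random_state=42)
bag_clf.fit(X_train, y_train)

# Accuracy of each member tree on its own, then of the whole ensemble
member_scores = np.array([
    accuracy_score(y_test, tree.predict(X_test)) for tree in bag_clf.estimators_
])
ensemble_score = accuracy_score(y_test, bag_clf.predict(X_test))

print('member trees: mean', member_scores.mean(),
      'min', member_scores.min(), 'max', member_scores.max())
print('ensemble:', ensemble_score)
```

Each member tree sees only 100 samples, so individually the trees tend to be weaker and more variable; combining them smooths out their individual errors, which matches the smoother decision boundary in the right-hand plot.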