Random Forest Algorithm: Base Classifiers Trained on Bootstrap Resamples

每天都要被自己菜醒

Category: Big Data. Tags: python, machine learning, deep learning, decision tree

Article link: https://blog.csdn.net/qq_45531594/article/details/108563213

A detailed look at the make_moons function in sklearn


Bootstrapping:
Resampling with replacement.
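As a small illustration of resampling with replacement, the sketch below draws a bootstrap sample from a toy array. The array `data` and the seed are made up for the example and are not from the original post.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # toy dataset of 10 samples (hypothetical)

# Draw as many indices as there are samples, with replacement:
# some samples appear more than once, others are not drawn at all
indices = rng.integers(0, len(data), size=len(data))
bootstrap_sample = data[indices]

# Samples that were never drawn in this round
left_out = np.setdiff1d(data, bootstrap_sample)

print("bootstrap sample:", bootstrap_sample)
print("left out:", left_out)
```

Because the draw is with replacement, the bootstrap sample has the same size as the original data but usually contains duplicates.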

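Bagging (bootstrap aggregating):
1. Resample the training set with replacement.
2. Train a classifier on the resampled data.

Repeat these two steps m times to obtain m classifiers.
To classify a data point, take a vote over the predictions of the m classifiers.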
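The steps above can be sketched by hand, without any ensemble class. This is only a minimal illustration: the choice of m = 25 decision trees and the reuse of the make_moons data from later in the post are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
m = 25  # number of base classifiers (arbitrary; odd, so a 0/1 vote cannot tie)
classifiers = []

for _ in range(m):
    # Step 1: resample the training set with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: train a classifier on the resampled data
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train[idx], y_train[idx])
    classifiers.append(clf)

# Predictions of all m classifiers for every test sample: shape (m, n_test)
all_preds = np.array([clf.predict(X_test) for clf in classifiers])

# Majority vote: labels are 0/1, so the mean is the share of votes for class 1
y_pred = (all_preds.mean(axis=0) > 0.5).astype(int)

bagging_accuracy = (y_pred == y_test).mean()
print("manual bagging accuracy:", bagging_accuracy)
```

sklearn's BaggingClassifier wraps the same loop and adds options such as the sample size per estimator and parallel training.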


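Difference between hard voting and soft voting: hard voting predicts the class that receives the most votes from the individual classifiers, while soft voting averages the classifiers' predicted class probabilities and predicts the class with the highest average.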
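A tiny worked example may make the difference concrete. The probabilities below are invented for illustration: two classifiers lean slightly towards class 0 and one is very confident in class 1, so the two voting schemes disagree.

```python
import numpy as np

# Hypothetical predicted probabilities for ONE sample from three classifiers
# columns: P(class 0), P(class 1)
probas = np.array([
    [0.55, 0.45],  # classifier A leans slightly towards class 0
    [0.60, 0.40],  # classifier B leans slightly towards class 0
    [0.10, 0.90],  # classifier C is confident in class 1
])

# Hard voting: each classifier casts one vote for its most likely class
votes = probas.argmax(axis=1)
hard_winner = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the largest average
mean_proba = probas.mean(axis=0)
soft_winner = mean_proba.argmax()

print("hard vote:", hard_winner)
print("soft vote:", soft_winner)
```

Hard voting picks class 0 (two votes against one), while soft voting picks class 1 because the averaged probability of class 1 is higher.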


# Comparing hard voting and soft voting
import numpy as np
import os
# Jupyter magic; remove this line when running as a plain script
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
import warnings
warnings.filterwarnings('ignore')
# Random seed
np.random.seed(42)

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
# Main parameters:
# n_samples: number of samples to generate
# noise: standard deviation of the Gaussian noise added to the data (default None, i.e. no noise)
# random_state: random seed; passing an int makes the generated data reproducible

# X: 500 rows by 2 columns of features; y: 500 labels
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
print(X, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)


plt.plot(X[:,0][y==0],X[:,1][y==0],'yo',alpha = 0.6) # class 0: yellow circles
plt.plot(X[:,0][y==1],X[:,1][y==1],'bs',alpha = 0.6) # class 1: blue squares


# With the dataset ready,
# run the hard voting experiment:
from sklearn.ensemble import RandomForestClassifier,VotingClassifier  # voting classifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC # support vector classifier

# This is only an experiment, so the hyperparameters are left at their defaults
log_clf = LogisticRegression(random_state=42)
rnd_clf = RandomForestClassifier(random_state=42)
svm_clf = SVC(random_state=42)

# Voting: the base classifiers are passed in as (name, estimator) pairs
voting_clf = VotingClassifier(estimators= [ ('lr',log_clf),('rf',rnd_clf),('svc',svm_clf) ] ,voting= 'hard' )


from sklearn.metrics import accuracy_score # accuracy metric

for clf in (log_clf,rnd_clf,svm_clf,voting_clf):
    clf.fit(X_train,y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__,accuracy_score(y_test,y_pred) )

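Soft voting is based on probabilities: every base classifier must be able to output class probabilities (predict_proba). For SVC this requires probability=True.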
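A possible soft voting counterpart to the hard voting experiment is sketched below. It reuses the same three classifiers and the same make_moons data; apart from voting='soft' and probability=True on SVC, the settings are assumptions carried over from the hard voting code rather than something the post shows.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression(random_state=42)
rnd_clf = RandomForestClassifier(random_state=42)
# probability=True makes SVC provide predict_proba, which soft voting needs
svm_clf = SVC(probability=True, random_state=42)

soft_voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='soft',
)
soft_voting_clf.fit(X_train, y_train)

soft_accuracy = accuracy_score(y_test, soft_voting_clf.predict(X_test))
print("soft voting accuracy:", soft_accuracy)
```

Whether soft voting beats hard voting depends on the data and on how well calibrated the base classifiers' probabilities are, so it is worth comparing both on a held-out set.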


Question


Building a bagging ensemble of decision trees (a classifier, the idea underlying random forests):


from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

"""
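n_estimators: int (default=10), number of base estimators in the ensemble
max_samples: int or float (default=1.0),
           number of samples drawn from X_train to train each base estimator; an int is a count, a float is a fraction
bootstrap: bool (default=True), whether samples are drawn with replacement (True) or without replacement (False)
n_jobs: int (default=None), number of jobs to run in parallel; -1 uses all processors
random_state: if an int, the seed used by the random number generator
"""

# Ensemble built with BaggingClassifier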
bag_clf = BaggingClassifier(DecisionTreeClassifier(),
                  n_estimators = 500,
                  max_samples = 100,
                  bootstrap = True,
                  n_jobs = -1,
                  random_state = 42
)
bag_clf.fit(X_train,y_train)
y_pred = bag_clf.predict(X_test)

accuracy_score(y_test,y_pred)


A single decision tree classifier, for comparison:

# Use a single decision tree classifier
tree_clf = DecisionTreeClassifier(random_state = 42)
tree_clf.fit(X_train,y_train)
y_pred_tree = tree_clf.predict(X_test)
accuracy_score(y_test,y_pred_tree)
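The post's title refers to random forests, while the ensemble above is built with BaggingClassifier. As a hedged sketch, sklearn's RandomForestClassifier can be tried on the same data: it also trains trees on bootstrap samples and, in addition, considers only a random subset of features at each split. The hyperparameters below mirror the bagging example and are an assumption, not the author's setup.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 500 trees, each trained on a bootstrap sample of 100 training points
rf_clf = RandomForestClassifier(
    n_estimators=500,
    max_samples=100,
    bootstrap=True,
    random_state=42,
)
rf_clf.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, rf_clf.predict(X_test))
print("random forest accuracy:", rf_accuracy)
```

Comparing this score with the single tree and the bagging ensemble on the same split gives a rough sense of how much the ensemble helps on this dataset.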