GeekDengshuo

Study notes

Chapter 8: Ensemble Learning

Ensemble learning

Depending on how the individual learners are generated, ensemble methods fall into two broad families:

Boosting: the individual learners depend strongly on one another, so they must be generated sequentially (a serialized method).
Bagging & Random Forest: the individual learners have no strong mutual dependence, so they can be generated simultaneously (a parallelized method).

To look up: the definitions of loss function versus cost function, and the difference between the two.
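One common (though not universal) convention is that the *loss* is measured on a single sample, while the *cost* aggregates the losses over the whole training set. A minimal sketch of that distinction, using squared error as an illustrative choice:

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Loss for ONE sample: (y - y_hat)^2."""
    return (y_true - y_pred) ** 2

def cost(y_true, y_pred):
    """Cost over a DATASET: the mean of the per-sample losses."""
    return np.mean([squared_loss(t, p) for t, p in zip(y_true, y_pred)])

print(squared_loss(1.0, 0.5))        # loss on a single sample -> 0.25
print(cost([1.0, 0.0], [0.5, 0.0]))  # average loss over two samples -> 0.125
```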

# How to use ensemble learning
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Split the dataset
X,y=make_moons(n_samples=500,noise=0.30,random_state=42)
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=42)

log_clf=LogisticRegression()
rnd_clf=RandomForestClassifier()
svm_clf=SVC()

voting_clf=VotingClassifier(estimators=[('lr',log_clf),('rf',rnd_clf),('svc',svm_clf)],
                           voting='hard')
voting_clf.fit(X_train,y_train)
from sklearn.metrics import accuracy_score
for clf in (log_clf,rnd_clf,svm_clf,voting_clf):
    clf.fit(X_train,y_train)
    y_pred=clf.predict(X_test)
    print(clf.__class__.__name__ ,accuracy_score(y_test,y_pred))
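The example above uses hard (majority) voting. A hedged sketch of the *soft* voting variant on the same make_moons split: every estimator must expose predict_proba, so SVC needs probability=True (the extra parameters here are assumptions for reproducibility, not from the original notes).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

soft_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(random_state=42)),
                ('svc', SVC(probability=True, random_state=42))],
    voting='soft')  # average the predicted class probabilities, then argmax
soft_clf.fit(X_train, y_train)
print(accuracy_score(y_test, soft_clf.predict(X_test)))
```

Soft voting often edges out hard voting because it gives more weight to highly confident votes.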

Exercise: implement the AdaBoost algorithm with unpruned decision trees as base learners, and train an AdaBoost ensemble on the watermelon dataset.

1. First consult other references (Hands-On Machine Learning with Scikit-Learn and TensorFlow).

First, look at a ready-made bagging algorithm:

# Decision trees as the base learners

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf=BaggingClassifier(DecisionTreeClassifier(),n_estimators=500,max_samples=100,
                         bootstrap=True,n_jobs=-1)
bag_clf.fit(X_train,y_train)
y_pred=bag_clf.predict(X_test)
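With bootstrap sampling, each predictor never sees roughly 37% of its training instances, so the ensemble can score itself on those out-of-bag samples without a separate validation set. A sketch of the same bagging setup with oob_score=True (random_state added here only for reproducibility):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            max_samples=100, bootstrap=True,
                            oob_score=True, random_state=42)
bag_clf.fit(X_train, y_train)
print(bag_clf.oob_score_)  # out-of-bag estimate of the test accuracy
```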
Next, a ready-made boosting algorithm:

from sklearn.ensemble import AdaBoostClassifier
ada_clf=AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),n_estimators=200,
                          algorithm="SAMME.R",learning_rate=0.5)
ada_clf.fit(X_train,y_train)

Scikit-Learn actually uses a multiclass version of AdaBoost called SAMME (which
stands for Stagewise Additive Modeling using a Multiclass Exponential loss function).
When there are just two classes, SAMME is equivalent to AdaBoost. Moreover, if the
predictors can estimate class probabilities (i.e., if they have a predict_proba()
method), Scikit-Learn can use a variant of SAMME called SAMME.R (the R stands
for “Real”), which relies on class probabilities rather than predictions and generally
performs better.

| algorithm : {‘SAMME’, ‘SAMME.R’}, optional (default=’SAMME.R’)
| If ‘SAMME.R’ then use the SAMME.R real boosting algorithm.
| base_estimator must support calculation of class probabilities.
| If ‘SAMME’ then use the SAMME discrete boosting algorithm.
| The SAMME.R algorithm typically converges faster than SAMME,
| achieving a lower test error with fewer boosting iterations.
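For the exercise itself, a minimal from-scratch sketch of discrete AdaBoost (SAMME with two classes reduces to classic AdaBoost, matching the quoted note above). The watermelon dataset from the book is not reproduced here, so make_moons stands in as an assumed substitute, with labels mapped to {-1, +1}; decision stumps serve as the weak learners.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
y = np.where(y == 1, 1, -1)  # AdaBoost math below assumes labels in {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

def adaboost_fit(X, y, n_rounds=30):
    m = len(X)
    w = np.full(m, 1 / m)            # start from uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()     # weighted training error
        if err >= 0.5:               # no better than chance: stop boosting
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
        w /= w.sum()                     # renormalize to a distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Weighted vote of the weak learners, thresholded at zero
    agg = sum(a * clf.predict(X) for clf, a in zip(learners, alphas))
    return np.sign(agg)

learners, alphas = adaboost_fit(X_train, y_train)
acc = (adaboost_predict(X_test, learners, alphas) == y_test).mean()
print(acc)
```

Swapping the stump for an unpruned DecisionTreeClassifier() would match the exercise statement literally, though stumps make the boosting effect easier to observe.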

Ensemble learning:
    An ensemble is essentially a collection of learners, so it requires understanding not only the base learners themselves but also how to combine them.
    For now I can only complete the ensemble-learning workflow with the sklearn package. My understanding of the library is still shallow, so I will keep learning it step by step.
