# Chapter 8: Ensemble Learning

## Ensemble learning

Boosting: the individual learners have strong dependencies on one another, so they must be generated serially (a sequential method).
Bagging & Random Forest: the individual learners have no strong dependencies on one another, so they can be generated at the same time (a parallel method).

To review: the definitions of, and the difference between, a loss function and a cost function.
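One common convention (not the only one in the literature): the *loss* is measured on a single example, while the *cost* is the loss aggregated over the whole training set, possibly plus a regularization term. A minimal sketch with squared error:

```python
import numpy as np

def loss(y, y_hat):
    # Squared-error loss for ONE example: l(y, y_hat) = (y - y_hat)^2
    return (y - y_hat) ** 2

def cost(y, y_hat):
    # Cost: the loss averaged over the whole dataset
    # (a regularization term could be added here as well)
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.2])
print(loss(y[0], y_hat[0]))  # loss on one sample
print(cost(y, y_hat))        # cost over the dataset
```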

```python
# How to use ensemble learning
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Build and split the dataset
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# Hard voting: the ensemble predicts the majority class
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)

# Compare each individual classifier against the voting ensemble
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
```
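Hard voting takes the majority class. When every classifier can estimate class probabilities, *soft* voting (averaging the predicted probabilities and picking the highest) often performs better, because confident votes weigh more. A sketch of the same setup with soft voting; `SVC` needs `probability=True` so that it exposes `predict_proba()` (the fixed `random_state` values here are just for reproducibility):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# probability=True makes SVC fit an extra calibration step so that
# predict_proba() becomes available, which soft voting requires
voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(random_state=42)),
                ('svc', SVC(probability=True, random_state=42))],
    voting='soft')
voting_clf.fit(X_train, y_train)
acc = accuracy_score(y_test, voting_clf.predict(X_test))
print(acc)
```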

1. First, follow *Hands-On Machine Learning with Scikit-Learn and TensorFlow* as a reference.

#### First, an off-the-shelf bagging algorithm

```python
# Bagging with decision trees as the base learner
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500, max_samples=100,
    bootstrap=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
```
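A useful extra that bagging gives for free: bootstrap sampling leaves roughly 37% of the training instances out of each predictor's sample, and these out-of-bag (oob) instances provide a validation estimate without a separate validation set. A sketch, assuming the same moons split as above; `oob_score=True` is the relevant switch:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# oob_score=True evaluates each predictor on the training instances
# it never saw in its own bootstrap sample
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500, max_samples=100,
    bootstrap=True, oob_score=True, n_jobs=-1, random_state=42)
bag_clf.fit(X_train, y_train)
print(bag_clf.oob_score_)  # an estimate of test-set accuracy
```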
```python
# AdaBoost with decision stumps as the base learner
from sklearn.ensemble import AdaBoostClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5)
ada_clf.fit(X_train, y_train)
```

Scikit-Learn actually uses a multiclass version of AdaBoost called SAMME (which
stands for Stagewise Additive Modeling using a Multiclass Exponential loss function).
When there are just two classes, SAMME is equivalent to AdaBoost. Moreover, if the
predictors can estimate class probabilities (i.e., if they have a predict_proba()
method), Scikit-Learn can use a variant of SAMME called SAMME.R (the R stands
for “Real”), which relies on class probabilities rather than predictions and generally
performs better.

> algorithm : {'SAMME', 'SAMME.R'}, optional (default='SAMME.R')
>     If 'SAMME.R' then use the SAMME.R real boosting algorithm.
>     base_estimator must support calculation of class probabilities.
>     If 'SAMME' then use the SAMME discrete boosting algorithm.
>     The SAMME.R algorithm typically converges faster than SAMME,
>     achieving a lower test error with fewer boosting iterations.
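The claim about "lower test error with fewer boosting iterations" can be checked empirically: `AdaBoostClassifier.staged_predict()` yields the ensemble's predictions after each boosting round, so one can track test error as a function of the number of iterations. A minimal sketch with decision stumps (the default `algorithm` is left alone here, since the available choices vary between sklearn versions):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=200, learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)

# Test error of the partial ensemble after each boosting iteration
errors = [1 - accuracy_score(y_test, y_pred)
          for y_pred in ada_clf.staged_predict(X_test)]
print(min(errors), errors.index(min(errors)))
```

Plotting `errors` against the iteration index shows how quickly the boosted ensemble converges, which is exactly the comparison the docstring excerpt above is talking about.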

Ensemble learning amounts to a combination of many learners: it requires not only understanding the base learners themselves, but also knowing how to combine them.
For now I can only use the sklearn package to carry out the ensemble-learning workflow. My knowledge of the package is still shallow, so I will have to learn it bit by bit.

