# Python sklearn Random Forest (Part 1)

### 1. Ensemble Learning

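• The core idea of bagging is to build multiple mutually independent estimators and then combine their predictions, by averaging or by majority vote, to determine the result of the ensemble estimator. Random Forest is the typical example.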
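The bagging idea described above can be sketched by hand. This is a minimal illustration, not how sklearn implements it: the helper name `bagging_predict`, the 70/30 split, and the seeds are assumptions made for the example. Each tree is trained on a bootstrap sample (rows drawn with replacement) and the final label is the majority vote. Note that a real random forest also samples features at each split, which this sketch does not do.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    """Train n_trees trees on bootstrap samples and combine them by majority vote.

    Hypothetical helper written for this illustration only.
    """
    rng = np.random.default_rng(seed)
    n_samples = X_train.shape[0]
    votes = []
    for _ in range(n_trees):
        # Bootstrap: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=int(rng.integers(0, 2**31 - 1)))
        tree.fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.array(votes)  # shape: (n_trees, n_test_samples)
    # Majority vote: the most frequent class label in each column.
    return np.array([np.bincount(col).argmax() for col in votes.T])


wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0
)
y_pred = bagging_predict(X_train, y_train, X_test)
print("bagged accuracy:", (y_pred == y_test).mean())
```

In practice `ensemble.RandomForestClassifier` does all of this internally, so the hand-written loop is only useful for seeing what "independent estimators plus a vote" means.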

### 2. The ensemble module in sklearn

• ensemble.RandomForestClassifier
• ensemble.RandomForestRegressor
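As a quick orientation, the two classes listed above share the same `fit`/`predict` interface and differ in the kind of target they predict. The sketch below is only illustrative: the choice of the bundled `load_diabetes` dataset for the regressor and the value `n_estimators=25` are assumptions made for the example.

```python
from sklearn.datasets import load_diabetes, load_wine
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: the target is a discrete class label.
wine = load_wine()
clf = RandomForestClassifier(n_estimators=25, random_state=0)
clf.fit(wine.data, wine.target)
clf_pred = clf.predict(wine.data[:3])
print(clf_pred)

# Regression: the target is a continuous value.
diabetes = load_diabetes()
reg = RandomForestRegressor(n_estimators=25, random_state=0)
reg.fit(diabetes.data, diabetes.target)
reg_pred = reg.predict(diabetes.data[:3])
print(reg_pred)
```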

### 3. RandomForestClassifier

• Comparing DecisionTreeClassifier and RandomForestClassifier
• The learning curve for n_estimators

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

# Load the wine dataset used in both experiments
wine = load_wine()

# 10-fold cross-validation scores for a random forest with 25 trees
rfc = RandomForestClassifier(n_estimators=25)
rfc_s = cross_val_score(rfc, wine.data, wine.target, cv=10)

# 10-fold cross-validation scores for a single decision tree
clf = DecisionTreeClassifier()
clf_s = cross_val_score(clf, wine.data, wine.target, cv=10)

# Plot the per-fold accuracy of both models
plt.plot(range(1, 11), rfc_s, label="Random Forest")
plt.plot(range(1, 11), clf_s, label="Decision Tree")
plt.legend()
plt.show()

# Learning curve for n_estimators
result = []
for i in range(200):
    rfc = RandomForestClassifier(n_estimators=i + 1, n_jobs=-1)
    # Mean accuracy over 10-fold cross-validation
    rfc_s = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    result.append(rfc_s)

# Print the best accuracy and the number of trees that achieved it
print(max(result), result.index(max(result)) + 1)

plt.plot(range(1, 201), result, label="Random Forest")
plt.legend()  # show the legend
plt.show()
```


### 4. Why RandomForest outperforms DecisionTree

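A majority vote is wrong only when more than half of the trees are wrong at the same time. If the trees err independently, that probability is a binomial tail and is much smaller than the error rate of a single tree:

```python
import numpy as np
from scipy.special import comb

# Assume 15 independent trees, each wrong with probability 0.15.
# The vote fails only when at least 8 of the 15 trees are wrong,
# so sum the binomial probabilities for i = 8, ..., 15.
error_rate = np.array([comb(15, i)*(0.15**i)*((1-0.15)**(15-i)) for i in range(8, 16)]).sum()
print(error_rate)
```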
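The calculation above can be wrapped in a small function to see how the result depends on the single-tree error rate. This is a sketch under the same idealized assumption that the trees err independently (real trees in a forest are correlated, so the actual gain is smaller); the function name `ensemble_error` and the sample error rates are hypothetical choices for illustration.

```python
from scipy.special import comb


def ensemble_error(n_trees, tree_error):
    """Probability that more than half of n_trees independent trees are wrong."""
    k_min = n_trees // 2 + 1  # smallest number of wrong trees that loses the vote
    return sum(
        comb(n_trees, k) * tree_error**k * (1 - tree_error) ** (n_trees - k)
        for k in range(k_min, n_trees + 1)
    )


# Compare the ensemble error with the single-tree error for 15 trees.
for eps in (0.15, 0.4, 0.6):
    print(eps, ensemble_error(15, eps))
```

With 15 trees the ensemble error falls below the single-tree error only when each tree is wrong less than half of the time. Once the base error exceeds 0.5, voting makes the result worse, which is why the base estimators need to be better than random guessing.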

