$$F(\mathbf{x})=\sum_{t=1}^{T}\alpha_{t}h(\mathbf{x};\theta_{t}),\quad h(\mathbf{x};\theta_{t})\in\{-1,1\}$$

$$\arg\min_{\alpha_{t},\theta_{t},\,t=1,2,\ldots,T}\;\sum_{i=1}^{N}\exp\left(-y_{i}F(\mathbf{x}_{i})\right)$$

$$J(\alpha,\theta)=\sum_{i=1}^{N}\exp\left(-y_{i}\left(F_{m-1}(\mathbf{x}_{i})+\alpha h(\mathbf{x}_{i};\theta)\right)\right)$$
$$(\alpha_{m},\theta_{m})=\arg\min_{\alpha,\theta}J(\alpha,\theta)$$

$$J(\alpha,\theta)=\sum_{i=1}^{N}\exp\left(-y_{i}F_{m-1}(\mathbf{x}_{i})\right)\cdot\exp\left(-y_{i}\alpha h(\mathbf{x}_{i};\theta)\right)=\sum_{i=1}^{N}w_{i}^{m}\cdot\exp\left(-y_{i}\alpha h(\mathbf{x}_{i};\theta)\right)$$
$$\theta_{m}=\arg\min_{\theta}\sum_{i=1}^{N}w_{i}^{m}\cdot\exp\left(-y_{i}\alpha h(\mathbf{x}_{i};\theta)\right)$$

$$\theta_{m}=\arg\min_{\theta}\Big\{P_{m}=\sum_{i=1}^{N}w_{i}^{m}\,I\big(1-y_{i}h(\mathbf{x}_{i};\theta)\big)\Big\},\quad I(u)=\begin{cases}0, & u=0\\ 1, & \text{otherwise}\end{cases}$$
$P_{m}$ is the total weight of the samples misclassified by the $m$-th classifier; the weak classifier must satisfy $P_{m}<0.5$.
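For intuition, here is a tiny numeric sketch of $P_{m}$; the labels, predictions, and weights below are made-up illustration values, not data from this post.

```python
import numpy as np

# Toy illustration (made-up values): weighted error P_m of one weak classifier.
y_true = np.array([ 1, -1,  1,  1, -1])
y_pred = np.array([ 1, -1, -1,  1, -1])   # the classifier gets sample 3 wrong
w = np.full(5, 1 / 5)                     # uniform weights w_i^m

P_m = np.sum(w * (y_pred != y_true))      # total weight of misclassified samples
print(P_m)                                # 0.2 < 0.5, so the weak learner is usable
```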

$$J(\alpha,\theta)=\sum_{i=1}^{N}w_{i}^{m}\cdot\exp\left(-y_{i}\alpha h(\mathbf{x}_{i};\theta_{m})\right)$$

$$\alpha_{m}=\arg\min_{\alpha}\left[\exp(-\alpha)(1-P_{m})+\exp(\alpha)P_{m}\right]$$

$$\left[\exp(-\alpha)(1-P_{m})+\exp(\alpha)P_{m}\right]'=0$$

$$\alpha_{m}=\frac{1}{2}\ln\frac{1-P_{m}}{P_{m}}$$
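Expanding the stationarity condition makes the closed form explicit:

```latex
% differentiate J with respect to \alpha and set it to zero
-e^{-\alpha}(1-P_{m})+e^{\alpha}P_{m}=0
\;\Longrightarrow\;
e^{2\alpha}=\frac{1-P_{m}}{P_{m}}
\;\Longrightarrow\;
\alpha_{m}=\frac{1}{2}\ln\frac{1-P_{m}}{P_{m}}
```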

$$w_{i}^{(m+1)}=\exp\left(-y_{i}F_{m}(\mathbf{x}_{i})\right)=w_{i}^{(m)}\exp\left(-y_{i}\alpha_{m}h(\mathbf{x}_{i};\theta_{m})\right)$$

1. Initialize the sample-weight vector $P^{1}=\{\frac{1}{N},\frac{1}{N},\ldots,\frac{1}{N}\}$, so that $\sum_{i=1}^{N}P_{i}=1$.
2. $\textbf{for}\ t=1\ \textbf{to}\ T$
3. --------train a weak classifier $h_{t}(\mathbf{x})$ on the weighted data $P^{t}$
4. --------compute its weighted error $\epsilon_{t}=\sum_{i=1}^{N}P_{i}^{t}[[h_{t}(\mathbf{x}_{i})\neq y_{i}]]$
5. --------compute its weight $\alpha_{t}=\frac{1}{2}\ln\frac{1-\epsilon_{t}}{\epsilon_{t}}$; the smaller $\epsilon_{t}$ is, the larger $\alpha_{t}$ becomes, and $\epsilon_{t}$ must stay below $0.5$ for $\alpha_{t}$ to be positive.
6. --------update the sample weights $P_{i}^{t+1}=P_{i}^{t}\exp(-\alpha_{t}y_{i}h_{t}(\mathbf{x}_{i}))$; misclassified samples gain weight, correctly classified ones lose weight.
7. --------normalize the weight vector: $P_{i}^{t+1}=P_{i}^{t+1}\big/\sum_{i=1}^{N}P_{i}^{t+1}$
8. $\textbf{end for}$
9. Final decision: $F(\mathbf{x})=\operatorname{sgn}[f_{T}(\mathbf{x})]=\operatorname{sgn}\big(\sum_{t=1}^{T}\alpha_{t}h_{t}(\mathbf{x})\big)$
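The numbered steps above can be sketched from scratch with decision stumps as the weak learners; the function and variable names below are ours for illustration, not from any library, and labels are assumed to be in $\{-1,1\}$.

```python
import numpy as np

def train_adaboost(X, y, T=10):
    """Steps 1-8: boost T exhaustively-searched decision stumps."""
    N = len(y)
    P = np.full(N, 1 / N)                 # step 1: uniform initial weights
    learners = []                         # list of (alpha, feature, thresh, sign)
    for _ in range(T):                    # step 2
        best = None
        # step 3: pick the stump with the lowest weighted error under P
        for j in range(X.shape[1]):
            for thresh in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thresh, 1, -1)
                    eps = np.sum(P * (pred != y))          # step 4
                    if best is None or eps < best[0]:
                        best = (eps, j, thresh, sign, pred)
        eps, j, thresh, sign, pred = best
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # step 5
        P = P * np.exp(-alpha * y * pred)                  # step 6
        P = P / P.sum()                                    # step 7
        learners.append((alpha, j, thresh, sign))
    return learners

def predict_adaboost(learners, X):
    """Step 9: sign of the alpha-weighted vote."""
    F = np.zeros(X.shape[0])
    for alpha, j, thresh, sign in learners:
        F += alpha * sign * np.where(X[:, j] <= thresh, 1, -1)
    return np.sign(F)
```

On a 1-D toy set labeled by an interval, e.g. `y = [-1, -1, 1, 1, 1, -1]`, no single stump can be exact, but a few boosting rounds combine stumps into a perfect fit on the training points.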

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import AdaBoostClassifier

# a depth-1 tree (decision stump) serves as the weak learner
tree = DecisionTreeClassifier(criterion='entropy', max_depth=1)
ada = AdaBoostClassifier(estimator=tree,   # base_estimator= in sklearn < 1.2
                         n_estimators=500,
                         learning_rate=0.1,
                         random_state=0)
tree.fit(X_train, y_train)
y_train_pred = tree.predict(X_train)
y_test_pred = tree.predict(X_test)
print('Decision tree train/test accuracies: %.3f/%.3f'
      % (accuracy_score(y_train, y_train_pred), accuracy_score(y_test, y_test_pred)))

ada.fit(X_train, y_train)
y_train_pred = ada.predict(X_train)
y_test_pred = ada.predict(X_test)
print('AdaBoost train/test accuracies: %.3f/%.3f'
      % (accuracy_score(y_train, y_train_pred), accuracy_score(y_test, y_test_pred)))

Decision tree train/test accuracies: 0.604/0.563


