The goal behind ensemble methods is to combine different classifiers into a meta-classifier that has better generalization performance than each individual classifier alone. Here we focus on the most popular ensemble methods, which use the majority voting principle.
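For intuition, here is a minimal sketch of the majority vote itself (the three hard-coded label votes are a made-up example, not from the book): with NumPy, the most frequent class label can be selected using np.bincount and np.argmax.

import numpy as np

# predicted class labels of three hypothetical base classifiers for one sample
votes = np.array([0, 0, 1])

# count the votes per class label and pick the most frequent one
print(np.argmax(np.bincount(votes)))  # -> 0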
1. Ensemble error
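For n base classifiers with equal and independent error rate ε, the majority vote is wrong when at least k = ⌈n/2⌉ of them err, so the ensemble error follows a binomial distribution:

$$\varepsilon_{\text{ensemble}} = P(y \geq k) = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \varepsilon^{k} (1-\varepsilon)^{n-k}$$

The following function computes this sum: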
from scipy.special import comb  # scipy.misc.comb was removed in newer SciPy versions
import math

def ensemble_error(n_classifier, error):
    # minimum number of classifiers that must err for the majority vote to be wrong
    k_start = int(math.ceil(n_classifier / 2.0))
    probs = [comb(n_classifier, k) * error**k * (1 - error)**(n_classifier - k)
             for k in range(k_start, n_classifier + 1)]
    return sum(probs)

print('Ensemble error', ensemble_error(n_classifier=11, error=0.25))
# Ensemble error 0.03432750701904297
2. Compute the ensemble error rates for a range of base error rates from 0.0 to 1.0 to visualize the relationship between ensemble and base errors in a line graph.
import numpy as np
import matplotlib.pyplot as plt  # needed for the plotting calls below

error_range = np.arange(0.0, 1.01, 0.01)
ens_errors = [ensemble_error(n_classifier=11, error=error)
              for error in error_range]

plt.plot(error_range, ens_errors, label='Ensemble error', linewidth=2)
plt.plot(error_range, error_range, linestyle='--', label='Base error', linewidth=2)
plt.xlabel('Base error')
plt.ylabel('Base/Ensemble error')
plt.legend(loc='upper left')
plt.grid()
plt.show()
As we can see in the resulting plot, the error probability of an ensemble is always lower than the error of an individual base classifier, as long as the base classifiers perform better than random guessing (ε < 0.5).
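As a quick sanity check of this break-even point (a small verification snippet, not part of the book's code), we can evaluate ensemble_error at and beyond ε = 0.5:

# at random guessing the majority vote offers no advantage
print(ensemble_error(n_classifier=11, error=0.5))   # 0.5 (exact for odd n)

# beyond it, majority voting amplifies the base error
print(ensemble_error(n_classifier=11, error=0.6))   # ~0.75, worse than the base error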
Reference: Python Machine Learning, Sebastian Raschka