Machine Learning Reading Notes (7): Combining Different Models for Ensemble Learning


Learning objectives:

  • Making predictions based on majority voting.
  • Reducing overfitting by drawing random combinations of the training set with repetition.
  • Building powerful models from weak learners that learn from their mistakes.

Ensemble Learning

Introduction

The goal behind ensemble methods is to combine different classifiers into a meta-classifier that has a better generalization performance than each individual classifier alone.

To achieve this goal, there are several approaches for creating such models. Among the most popular ensemble methods, majority voting stands out!

Majority voting (plurality voting)

Majority voting simply means that we select the class label that has been predicted by the majority of classifiers. In multi-class settings, where we select the label that receives the most votes, this is called plurality voting!

The idea can be seen very clearly in the following graph, which shows that majority voting is a rather simple method, not too hard to understand:
[Figure: schematic of majority voting among classifiers]
When we put it into practice, we start by training m different classifiers ($C_1, \dots, C_m$). The predicted labels of all these classifiers are then combined by a simple majority vote, which can also be illustrated by the following graph:
[Figure: combining the predictions of classifiers $C_1, \dots, C_m$ by majority vote]
Besides the graph, we can also express the voting process as a mathematical formula:
$$\hat{y} = \mathrm{mode}\{C_1(x), C_2(x), \dots, C_m(x)\}$$
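As a minimal sketch of this mode operation (using hypothetical predictions and m = 5 classifiers chosen just for illustration), the majority vote can be computed in Python:

import numpy as np

# Hypothetical class-label predictions of m = 5 classifiers for one sample
predictions = np.array([1, 0, 1, 1, 0])

# Count the votes per class label and select the label with the most votes
majority_label = np.argmax(np.bincount(predictions))
print(majority_label)  # -> 1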

Now comes the theory! To illustrate why ensemble methods can work better than individual classifiers alone, let's first make some assumptions.
For the following example:
First, we assume that all n base classifiers for a binary classification task have the same error rate $\varepsilon$.
Furthermore, we assume that the classifiers are independent and that their error rates are not correlated.
Under these assumptions, we can express the error probability of an ensemble of base classifiers in terms of the probability mass function of a binomial distribution:
$$P(y \geq k) = \sum_{i=k}^{n} \binom{n}{i} \varepsilon^{i} (1-\varepsilon)^{n-i} = \varepsilon_{\mathrm{ensemble}}$$
where $k = \lceil n/2 \rceil$ is the smallest number of base classifiers that constitutes a majority, so the ensemble makes a wrong prediction only when at least $k$ of the $n$ classifiers are wrong.

This is just a simple expression in probability, so we don't have to discuss it at length. Just remember: it gives the overall probability that the prediction of the ensemble is wrong. Now let's take a look at a more concrete example of 11 base classifiers (n = 11), each with an error rate $\varepsilon = 0.25$:
$$P(y \geq 6) = \sum_{i=6}^{11} \binom{11}{i} \, 0.25^{i} \, (1-0.25)^{11-i} \approx 0.034$$
As we can see, the error rate of the ensemble (0.034) is much lower than the error rate of each individual classifier (0.25).

To visualize this relationship between the ensemble error and the base error, we can plot the corresponding curve in Python as follows:

from scipy.special import comb
import math

def ensemble_error(n_classifier, error):
    # smallest number of wrong classifiers that forms a majority
    k_start = int(math.ceil(n_classifier / 2.))
    # binomial tail: probability that at least k_start classifiers are wrong
    probs = [comb(n_classifier, k) * error**k * (1-error)**(n_classifier - k)
             for k in range(k_start, n_classifier + 1)]
    return sum(probs)
ensemble_error(n_classifier=11, error=0.25)
# 0.03432750701904297
import numpy as np

# evaluate the ensemble error over a range of base error rates
error_range = np.arange(0.0, 1.01, 0.01)
ens_errors = [ensemble_error(n_classifier=11, error=error)
              for error in error_range]
import matplotlib.pyplot as plt

# ensemble error as a function of the base error
plt.plot(error_range,
         ens_errors,
         label='Ensemble error',
         linewidth=2)

# diagonal reference line: error of an individual base classifier
plt.plot(error_range,
         error_range,
         linestyle='--',
         label='Base error',
         linewidth=2)

plt.xlabel('Base error')
plt.ylabel('Base/Ensemble error')
plt.legend(loc='upper left')
plt.grid(alpha=0.5)
#plt.savefig('images/07_03.png', dpi=300)
plt.show()

Don't worry if you are a little confused by the code blocks above: they are just a translation of the process we discussed before into Python. Let's concentrate on the resulting plot!
[Figure: ensemble error versus base error, with the dashed diagonal showing the base error]
As we can see in the resulting plot, the error probability of an ensemble is always lower than the error of an individual base classifier as long as the base classifiers perform better than random guessing ($\varepsilon < 0.5$).
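We can double-check this crossover at $\varepsilon = 0.5$ with the ensemble_error function defined above (the specific values 0.49 and 0.51 are just illustrative choices):

# base error just below 0.5: the ensemble error comes out lower than the base error
print(ensemble_error(n_classifier=11, error=0.49))

# base error just above 0.5: the ensemble error comes out higher than the base error
print(ensemble_error(n_classifier=11, error=0.51))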

Implementing a simple majority vote classifier

In this section, we are going to extend the equal-weight voting scheme to an algorithm that associates individual weights with each classifier to express our confidence in it, and that can also handle classifiers which return the probability of a predicted class. Finally, Python will be used to implement this algorithm as usual.

1. Weighted majority voting can be written in more precise mathematical terms as follows:

$$\hat{y} = \arg\max_{i} \sum_{j=1}^{m} w_j \, \chi_A\bigl(C_j(x) = i\bigr)$$
where $w_j$ is the weight associated with base classifier $C_j$, $\hat{y}$ is the predicted class label of the ensemble, and $\chi_A$ is the characteristic function that equals 1 if $C_j(x) = i$ for a class label $i \in \{0, 1\}$ and 0 otherwise; the ensemble selects whichever of the two possible labels maximizes the weighted sum of votes.

When the weights are assumed to be equal, we recover the familiar simplified formula that we have seen before:
$$\hat{y} = \mathrm{mode}\{C_1(x), C_2(x), \dots, C_m(x)\}$$
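As a quick sketch of weighted voting (with three hypothetical classifiers and made-up weights), the arg max over the weighted vote counts can be computed with NumPy:

import numpy as np

# Hypothetical predictions of classifiers C_1, C_2, C_3 for one sample
predictions = np.array([0, 0, 1])
# Made-up confidence weights w_1, w_2, w_3
weights = np.array([0.2, 0.2, 0.6])

# Sum the weights per class label and pick the label with the largest weighted vote,
# mirroring arg max_i sum_j w_j * chi_A(C_j(x) = i)
weighted_votes = np.bincount(predictions, weights=weights)
print(np.argmax(weighted_votes))  # -> 1, since 0.6 > 0.2 + 0.2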
