Scouting:
Scouting is done by testing the classifiers in the pool on a training set T of N multidimensional data points x_1, ..., x_N with labels y_1, ..., y_N \in \{-1, +1\}.
We test and rank all classifiers in the expert pool by charging a cost $e^{\beta}$ any time a classifier fails (a miss), and a cost $e^{-\beta}$ every time a classifier provides the right label (a success or "hit"). We require $\beta > 0$ so that misses are penalized more heavily than hits. It might seem strange to penalize a hit with a non-zero cost, but as long as the penalty for a success is smaller than the penalty for a miss everything works out. This kind of error function, different from the usual squared Euclidean distance to the classification target, is called an exponential loss function.
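As a concrete illustration (a minimal sketch, not part of the original text; the function name is my own), the scouting cost of a single classifier can be computed from its predictions, assuming predictions and labels are arrays of ±1:

```python
import numpy as np

def scouting_cost(predictions, labels, beta=1.0):
    """Charge e^beta for every miss and e^(-beta) for every hit (beta > 0)."""
    hits = predictions == labels
    return float(np.where(hits, np.exp(-beta), np.exp(beta)).sum())
```

Ranking the pool then amounts to sorting the classifiers by this cost, lowest first.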
AdaBoost uses this exponential loss as its error criterion.
The main idea of AdaBoost is to proceed systematically, extracting one classifier from the pool in each of M iterations. The drafting process concentrates on selecting new classifiers for the committee, focusing on those which can help with the still-misclassified examples. The best team players are those which provide new insights to the committee. Classifiers being drafted should complement each other in an optimal way.
Drafting:
In each iteration we need to rank all classifiers so that we can select the current best out of the pool. At the m-th iteration we have already included m-1 classifiers in the committee and we want to draft the next one. The current linear combination of classifiers is

C_{m-1}(x_i) = \alpha_1 k_1(x_i) + \cdots + \alpha_{m-1} k_{m-1}(x_i),

where k_j denotes the j-th drafted classifier and \alpha_j its weight in the committee.
We define the total cost, or total error, of the extended classifier C_m(x_i) = C_{m-1}(x_i) + \alpha_m k_m(x_i) as the exponential loss

E = \sum_{i=1}^{N} e^{-y_i \left( C_{m-1}(x_i) + \alpha_m k_m(x_i) \right)},
where \alpha_m and k_m are yet to be determined in an optimal way. Since our intention is to draft k_m, we rewrite the above expression as

E = \sum_{i=1}^{N} w_i^{(m)} e^{-y_i \alpha_m k_m(x_i)}, \quad \text{where } w_i^{(m)} = e^{-y_i C_{m-1}(x_i)},

for i = 1, ..., N. In the first iteration w_i^{(1)} = 1 for i = 1, ..., N. During later iterations, the vector w^{(m)} = (w_1^{(m)}, \dots, w_N^{(m)}) represents the weights assigned to each data point in the training set at iteration m. We can split the above equation into two sums:

E = \sum_{y_i = k_m(x_i)} w_i^{(m)} e^{-\alpha_m} + \sum_{y_i \neq k_m(x_i)} w_i^{(m)} e^{\alpha_m}.
This means that the total cost is the weighted cost of all hits plus the weighted cost of all misses.
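The weight vector is simply the exponential loss of the current committee on each training point. A hedged sketch (the function name is my own) shows that with an empty committee every weight is 1, matching the first iteration:

```python
import numpy as np

def point_weights(committee_scores, labels):
    """w_i^(m) = exp(-y_i * C_{m-1}(x_i)).

    With an empty committee (all scores zero) every weight is 1; a point the
    committee currently misclassifies gets a weight greater than 1."""
    return np.exp(-labels * committee_scores)
```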
Writing the first summand as W_c e^{-\alpha_m} and the second as W_e e^{\alpha_m}, we simplify the notation to

E = W_c e^{-\alpha_m} + W_e e^{\alpha_m} = (W_c + W_e) e^{-\alpha_m} + W_e \left( e^{\alpha_m} - e^{-\alpha_m} \right).
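The split can be checked numerically (a small sketch with invented names): the two-term form equals the direct sum, because y_i k_m(x_i) is +1 on hits and -1 on misses.

```python
import numpy as np

def exponential_cost_split(weights, predictions, labels, alpha):
    """E = W_c * e^(-alpha) + W_e * e^(alpha), where W_c and W_e are the
    weight sums over hits and misses, respectively."""
    hits = predictions == labels
    W_c = weights[hits].sum()
    W_e = weights[~hits].sum()
    return W_c * np.exp(-alpha) + W_e * np.exp(alpha)
```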
Now, W_c + W_e is the total sum W of the weights of all data points, that is, a constant in the current iteration. Therefore

E = W e^{-\alpha_m} + W_e \left( e^{\alpha_m} - e^{-\alpha_m} \right),

and since e^{\alpha_m} - e^{-\alpha_m} > 0 for \alpha_m > 0, the right-hand side of the equation is minimized when at the m-th iteration we pick the classifier with the lowest total cost W_e (that is, the lowest weighted error). Intuitively this makes sense: the next draftee, k_m, should be the one with the lowest penalty given the current set of weights.
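The drafting rule can then be sketched as follows (an illustrative helper; `pool_predictions` is an assumed list of ±1 prediction arrays, one per classifier in the pool):

```python
import numpy as np

def draft_classifier(pool_predictions, weights, labels):
    """Return the index of the classifier with the lowest weighted error W_e,
    along with that error."""
    weighted_errors = [weights[preds != labels].sum() for preds in pool_predictions]
    best = int(np.argmin(weighted_errors))
    return best, float(weighted_errors[best])
```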
Weighting:
Once k_m has been drafted, the optimal weight \alpha_m follows from setting the derivative of E with respect to \alpha_m to zero:

\frac{dE}{d\alpha_m} = -W_c e^{-\alpha_m} + W_e e^{\alpha_m} = 0 \quad \Rightarrow \quad \alpha_m = \frac{1}{2} \ln\left( \frac{W_c}{W_e} \right).

Note that the pool of classifiers does not need to be given in advance; it only needs to exist, ideally. At each iteration it is enough to be able to produce some classifier with a low weighted error for the current weights.
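Putting the three phases together, here is a minimal AdaBoost sketch with decision stumps generated on demand (all names are illustrative; labels are assumed to be in {-1, +1}). The stump "pool" is never materialized: candidate stumps are enumerated inside the loop, which is exactly the point of the last remark above.

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """Greedy AdaBoost: draft the stump with the lowest weighted error W_e,
    weight it by alpha = 0.5 * ln(W_c / W_e), then re-weight the points."""
    n, d = X.shape
    w = np.ones(n)                      # w_i^(1) = 1 for all points
    committee = []                      # list of (alpha, feature, threshold, sign)
    for _ in range(rounds):
        best = None
        # Scouting/drafting: enumerate stumps on demand, track the lowest W_e.
        for j in range(d):
            for t in X[:, j]:
                for s in (1, -1):
                    preds = np.where(X[:, j] <= t, s, -s)
                    W_e = w[preds != y].sum()
                    if best is None or W_e < best[0]:
                        best = (W_e, j, t, s, preds)
        W_e, j, t, s, preds = best
        W_c = w.sum() - W_e
        if W_e == 0:                    # perfect stump: avoid division by zero
            committee.append((1.0, j, t, s))
            break
        alpha = 0.5 * np.log(W_c / W_e)
        committee.append((alpha, j, t, s))
        # Weighting: misses grow by e^alpha, hits shrink by e^(-alpha).
        w = w * np.exp(-alpha * y * preds)
    return committee

def predict(committee, X):
    score = sum(a * np.where(X[:, j] <= t, s, -s) for a, j, t, s in committee)
    return np.sign(score)
```

This is a sketch under stated assumptions, not a production implementation: real libraries enumerate thresholds more cleverly and normalize the weights each round (which changes nothing here, since only the ratio W_c / W_e matters).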