


Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features


This is a modified version of the official haartraining utility manual, with additional notes by the author of this version.

This document describes how to train and use a cascade of boosted classifiers for rapid object detection. A large, over-complete set of Haar-like features provides the basis for the simple individual classifiers. Examples of object detection tasks are face, eye and nose detection, as well as logo detection.


The sample detection task in this document is logo detection, since logo detection does not require the collection of a large set of registered and carefully marked object samples. Instead, we assume that a very large set of object examples can be derived from a single prototype image (see the createsamples utility below).


A detailed description of the training/evaluation algorithm can be found in [1] and [2].

Be aware that the haartraining utilities have almost no error handling. Check the option names yourself: the utilities silently ignore options that are specified incorrectly.
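Because misspelled options are silently ignored, it may be worth checking a command line against the documented option names before starting a long job. The following Python sketch is a hypothetical helper, not part of OpenCV; its option set is copied from the createsamples options documented in this manual, and a different set would be needed for haartraining or performance.

```python
from typing import List, Sequence, Set

# Option names documented for createsamples in this manual.
CREATESAMPLES_OPTIONS = {
    "-info", "-img", "-vec", "-bg", "-num", "-bgcolor", "-inv", "-randinv",
    "-bgthresh", "-maxidev", "-maxxangle", "-maxyangle", "-maxzangle",
    "-show", "-w", "-h",
}

def _is_number(text: str) -> bool:
    """True for values such as "-1" or "-0.5" that merely start with a dash."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def unknown_options(argv: Sequence[str], known: Set[str]) -> List[str]:
    """Return every dash-prefixed argument that is not a documented option."""
    return [arg for arg in argv
            if arg.startswith("-") and not _is_number(arg) and arg not in known]
```

For example, passing sys.argv[1:] of a wrapper script to unknown_options() would flag a typo such as -maxangle instead of letting the utility ignore it.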

Samples Creation

For training, training samples must be collected. There are two sample types: negative samples and positive samples. Negative samples correspond to non-object images. Positive samples correspond to object images.

Negative Samples

Negative samples are taken from arbitrary images. These images must not contain object representations. Negative samples are passed through a background description file. It is a text file in which each line contains the file name (relative to the directory of the description file) of a negative sample image. This file must be created manually. Note that negative samples and negative sample images are also called background samples or background sample images; the terms are used interchangeably in this document.


Example of a negative description file:

Directory structure:

  /img
    img1.jpg
    img2.jpg
  bg.txt

File bg.txt:

  img/img1.jpg
  img/img2.jpg

Such a description file can be created with a UNIX command such as:

 $ find img/ -name '*.jpg' > bg.txt
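On systems without find, the same file can be produced with a short Python sketch (the helper name is hypothetical). It writes one path per line, relative to the directory that contains the description file as required above, and, like the find command, matches the .jpg extension case-sensitively.

```python
import os
from typing import List

def write_background_file(image_dir: str, out_path: str, ext: str = ".jpg") -> List[str]:
    """Write bg.txt-style lines: image paths relative to the description file."""
    base = os.path.dirname(os.path.abspath(out_path))
    paths = []
    for root, _dirs, files in os.walk(os.path.abspath(image_dir)):
        for name in files:
            if name.endswith(ext):
                rel = os.path.relpath(os.path.join(root, name), base)
                paths.append(rel.replace(os.sep, "/"))  # forward slashes everywhere
    paths.sort()
    with open(out_path, "w") as f:
        f.write("".join(p + "\n" for p in paths))
    return paths
```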

Positive Samples

Positive samples are created by the createsamples utility. They may be created from a single object image or from a collection of previously marked-up images.

To see the list of createsamples options, run the utility without any arguments:

 $ createsamples

Usage: ./createsamples
  [-info <collection_file_name>]
  [-img <image_file_name>]
  [-vec <vec_file_name>]
  [-bg <background_file_name>]
  [-num <number_of_samples = 1000>]
  [-bgcolor <background_color = 0>]
  [-inv] [-randinv] [-bgthresh <background_color_threshold = 80>]
  [-maxidev <max_intensity_deviation = 40>]
  [-maxxangle <max_x_rotation_angle = 1.100000>]
  [-maxyangle <max_y_rotation_angle = 1.100000>]
  [-maxzangle <max_z_rotation_angle = 0.500000>]
  [-show [<scale = 4.000000>]]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]

The single object image may, for instance, contain a company logo. A large set of positive samples is then created from the given object image by randomly rotating it, changing the logo color, and placing the logo on arbitrary backgrounds.

The amount and range of randomness can be controlled by command line arguments.

Command line arguments:

-vec <vec_file_name>

name of the output file containing the positive samples for training

-img <image_file_name>

source object image (e.g., a company logo)

-bg <background_file_name>

background description file; contains a list of images into which randomly distorted versions of the object are pasted for positive sample generation

-num <number_of_samples>

number of positive samples to generate

-bgcolor <background_color>,

-bgthresh <background_color_threshold>

background color (currently grayscale images are assumed); the background color denotes the transparent color. Since there might be compression artifacts, the amount of color tolerance can be specified by -bgthresh. All pixels between bgcolor-bgthresh and bgcolor+bgthresh are regarded as transparent.

-inv

if specified, the colors will be inverted

-randinv

if specified, the colors will be inverted randomly

-maxidev <max_intensity_deviation>

maximal intensity deviation of the foreground sample pixels (in intensity units, range [0, 255])

-maxxangle <max_x_rotation_angle>

maximum rotation angle around the horizontal axis, in radians (distortion by rotation). Use it to get faces looking up and down.

-maxyangle <max_y_rotation_angle>

maximum rotation angle around the vertical axis, in radians (distortion by rotation). Use it to get faces looking left and right.

-maxzangle <max_z_rotation_angle>

maximum rotation angle around the depth axis, in radians (distortion by rotation). Use it to get faces with the head tilted to the left and right.

-show [<scale>]

if specified, each sample will be shown. Pressing 'Esc' continues the creation process without showing samples. Useful debugging option.

-w <sample_width>

width (in pixels) of the output samples

-h <sample_height>

height (in pixels) of the output samples

-info <collection_file_name>

See below


The following procedure is used to create a sample object instance:

The source image is rotated randomly around all three axes. The chosen angles are limited by -maxxangle, -maxyangle and -maxzangle. Next, pixels with intensities in the range [bg_color-bg_color_threshold; bg_color+bg_color_threshold] are regarded as transparent. White noise is added to the intensities of the foreground. If the -inv key is specified, the foreground pixel intensities are inverted. If the -randinv key is specified, it is randomly decided for each sample whether inversion will be applied. Finally, the obtained image is placed onto an arbitrary background from the background description file, resized to the pixel size specified by -w and -h, and stored in the file specified by the -vec command line parameter.
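The transparency, noise and inversion steps above can be illustrated on plain grayscale pixel arrays. The following Python sketch is hypothetical and deliberately partial: it skips the 3D rotation and the final resize, assumes the object and the background crop have the same size, and assumes uniformly distributed noise in [-maxidev, maxidev] and a 50% inversion probability for -randinv. The actual createsamples implementation may differ in all of these details.

```python
import random
from typing import List, Optional, Sequence

def make_sample(obj: Sequence[Sequence[int]], bg: Sequence[Sequence[int]],
                bgcolor: int = 0, bgthresh: int = 80, maxidev: int = 40,
                inv: bool = False, randinv: bool = False,
                rng: Optional[random.Random] = None) -> List[List[int]]:
    """Paste a grayscale object onto an equally sized background crop."""
    if rng is None:
        rng = random.Random()
    # One inversion decision per sample: -inv always, -randinv with 50% chance (assumed).
    invert = inv or (randinv and rng.random() < 0.5)
    lo, hi = bgcolor - bgthresh, bgcolor + bgthresh
    out = []
    for obj_row, bg_row in zip(obj, bg):
        row = []
        for p, b in zip(obj_row, bg_row):
            if lo <= p <= hi:
                row.append(b)  # transparent pixel: the background shows through
                continue
            # Foreground pixel: add bounded noise, clamp to [0, 255], then maybe invert.
            q = min(255, max(0, p + rng.randint(-maxidev, maxidev)))
            row.append(255 - q if invert else q)
        out.append(row)
    return out
```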


Positive samples may also be obtained from a collection of previously marked-up images. This collection is described by a text file similar to the background description file. Each line of this file corresponds to one collection image. The first element of the line is the image file name. It is followed by the number of object instances. The following numbers are the coordinates of the bounding rectangles (x, y, width, height).


Example of a description file:

Directory structure:

  /img
    img1.jpg
    img2.jpg
  info.dat

File info.dat:

  img/img1.jpg 1 140 100 45 45
  img/img2.jpg 2 100 200 50 50 50 30 25 25

  [filename] [# of objects] [[left_x top_y width height] [... 2nd object] ...]


Image img1.jpg contains a single object instance with bounding rectangle (140, 100, 45, 45). Image img2.jpg contains two object instances. In each rectangle, (x, y) is the position of the upper-left corner of the object, where the origin (0, 0) is the upper-left corner of the entire image.
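A line of such a description file can be checked or parsed with a few lines of Python. This is a sketch with a hypothetical function name; it assumes whitespace-separated fields and therefore file names without spaces.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def parse_info_line(line: str) -> Tuple[str, List[Rect]]:
    """Parse '[filename] [# of objects] [x y w h] ...' into name and rectangles."""
    fields = line.split()
    filename, count = fields[0], int(fields[1])
    nums = [int(v) for v in fields[2:]]
    if len(nums) != 4 * count:
        raise ValueError(f"expected {4 * count} coordinates, got {len(nums)}")
    rects = [tuple(nums[i:i + 4]) for i in range(0, len(nums), 4)]
    return filename, rects
```

Applied to the second line of info.dat above, it yields the two rectangles (100, 200, 50, 50) and (50, 30, 25, 25).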


To create positive samples from such a collection, the -info argument should be specified instead of -img:

-info <collection_file_name>

description file of the marked-up image collection


The scheme of sample creation in this case is as follows. The object instances are taken from the images, resized to the sample size and stored in the output file. No distortion is applied, so the only arguments that have an effect are -w, -h, -show and -num.
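The cut-and-resize step can be sketched as follows. This illustrative Python sketch uses a hypothetical function name and nearest-neighbour sampling; the interpolation method is an assumption, since this manual does not say which one createsamples uses.

```python
from typing import List, Sequence, Tuple

def crop_and_resize(img: Sequence[Sequence[int]], rect: Tuple[int, int, int, int],
                    w: int, h: int) -> List[List[int]]:
    """Cut rect = (x, y, width, height) out of img and scale it to w x h pixels."""
    x, y, rw, rh = rect
    if x < 0 or y < 0 or y + rh > len(img) or x + rw > len(img[0]):
        raise ValueError("rectangle lies outside the image")
    # Nearest-neighbour: output pixel (r, c) samples the source pixel it maps onto.
    return [[img[y + (r * rh) // h][x + (c * rw) // w] for c in range(w)]
            for r in range(h)]
```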


The createsamples utility may also be used to examine the samples stored in a positive samples file. To do this, only the -vec, -w and -h parameters should be specified.


Note that for training it does not matter how the positive samples file is generated. The createsamples utility is only one way to collect or create a vector file of positive samples.
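Because training only needs a vector file, such a file can in principle be produced by other tools. The Python sketch below assumes the layout commonly described for haartraining .vec files: a header of two 32-bit integers (sample count and width times height) followed by two unused 16-bit fields, then, for each sample, one padding byte and width times height 16-bit grayscale values, all little-endian. This layout is not specified in this manual, so verify it against the createsamples source of your OpenCV version before relying on it. The function names are hypothetical.

```python
import struct
from typing import List, Sequence

# Assumed header: sample count, vecsize (= w*h), two unused 16-bit fields.
HEADER = struct.Struct("<iihh")

def write_vec(path: str, samples: Sequence[Sequence[int]], w: int, h: int) -> None:
    """Write grayscale samples (each a flat list of w*h values) to a .vec file."""
    vecsize = w * h
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(samples), vecsize, 0, 0))
        for s in samples:
            if len(s) != vecsize:
                raise ValueError("each sample must contain exactly w*h values")
            f.write(b"\x00")  # one padding byte before every sample (assumed)
            f.write(struct.pack(f"<{vecsize}h", *s))

def read_vec(path: str) -> List[List[int]]:
    """Read all samples back as flat lists of pixel values."""
    with open(path, "rb") as f:
        count, vecsize, _, _ = HEADER.unpack(f.read(HEADER.size))
        samples = []
        for _ in range(count):
            f.read(1)  # skip the padding byte
            data = f.read(2 * vecsize)
            samples.append(list(struct.unpack(f"<{vecsize}h", data)))
    return samples
```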


The next step after sample creation is training the classifier. It is performed by the haartraining utility.

The haartraining utility accepts the following options:

Usage: haartraining
  -data <dir_name>
  -vec <vec_file_name>
  -bg <background_file_name>
  [-npos <number_of_positive_samples = 2000>]
  [-nneg <number_of_negative_samples = 2000>]
  [-nstages <number_of_stages = 14>]
  [-nsplits <number_of_splits = 1>]
  [-mem <memory_in_MB = 200>]
  [-sym (default)] [-nonsym]
  [-minhitrate <min_hit_rate = 0.995000>]
  [-maxfalsealarm <max_false_alarm_rate = 0.500000>]
  [-weighttrimming <weight_trimming = 0.950000>]
  [-eqw]
  [-mode <BASIC (default) | CORE | ALL>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]
  [-bt <DAB | RAB | LB | GAB (default)>]
  [-err <misclass (default) | gini | entropy>]
  [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
  [-minpos <min_number_of_positive_samples_per_cluster = 500>]


Command line arguments:

-data <dir_name>

name of the directory in which the trained classifier is stored

-vec <vec_file_name>

file name of the positive sample file (created by the createsamples utility or by any other means)

-bg <background_file_name>

background description file

-npos <number_of_positive_samples>,

-nneg <number_of_negative_samples>

number of positive/negative samples used in training of each classifier stage. Reasonable values are npos = 7000 and nneg = 3000.

-nstages <number_of_stages>

number of stages to be trained

-nsplits <number_of_splits>

determines the weak classifier used in the stage classifiers. If 1, a simple stump classifier is used; if 2 or more, a CART classifier with number_of_splits internal (split) nodes is used. This is the number of feature values to be used in a weak classifier, not the number of splits of the classifier tree. One stage is composed as a linear combination of the weak classifiers, and the final classifier is a cascade of such stages. PS: I do not think CART helps theoretically, but empirical results show that it helped [2].

-mem <memory_in_MB>

Available memory in MB for precalculation. The more memory you have, the faster the training process.

-sym (default),

-nonsym

specifies whether the object class under training has vertical symmetry or not. Vertical symmetry speeds up the training process. For instance, frontal faces exhibit vertical symmetry.

-minhitrate <min_hit_rate>

minimal desired hit rate for each stage classifier. The overall hit rate may be estimated as (min_hit_rate^number_of_stages).

-maxfalsealarm <max_false_alarm_rate>

maximal desired false alarm rate for each stage classifier. The overall false alarm rate may be estimated as (max_false_alarm_rate^number_of_stages).

-weighttrimming <weight_trimming>

Specifies whether and how much weight trimming should be used. A decent choice is 0.90. This is a parameter of the boosting algorithm; refer to the Boosting section of the OpenCV Machine Learning Reference (CvBoostParams::weight_trim_rate).

-eqw

Use equal weights for positives and negatives. You may feel that it is unfair to treat positives and negatives equally when their numbers are unequal. By default, the haartraining utility computes the error taking the ratio of the numbers of positives and negatives into account, to achieve this fairness; this option disables that behavior. [1] states that "error is calculated with respect to the weighted positive and negative samples."

-mode <BASIC (default) | CORE | ALL>

selects the type of Haar feature set used in training. BASIC uses only upright features, while ALL uses the full set of upright and 45-degree rotated features. See [1] for more details.

-w <sample_width>,

-h <sample_height>

Size of training samples (in pixels). Must have exactly the same values as used during training sample creation (createsamples utility).

-bt <DAB | RAB | LB | GAB (default)>

DAB is Discrete AdaBoost, RAB is Real AdaBoost, LB is LogitBoost, and GAB is Gentle AdaBoost. [2] states that "GAB is not only the best, but also the fastest classifier."

-err <misclass (default) | gini | entropy>

Type of error measure to use; available only when the Discrete AdaBoost algorithm is applied. misclass = number of misclassified samples / total number of samples. Gini or entropy can also be used as error measures.

-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>

Construct a tree of classifiers rather than a cascade. Theoretically a serial cascade should be enough, but empirically a tree structure may help.

-minpos <min_number_of_positive_samples_per_cluster = 500>
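The overall estimates given for -minhitrate and -maxfalsealarm simply multiply per-stage rates, which treats the stages as independent. The Python sketch below (function names are hypothetical) evaluates those estimates and, conversely, finds how many stages the estimate says are needed to reach a target false alarm rate.

```python
def overall_rate(per_stage_rate: float, n_stages: int) -> float:
    """Cascade-wide estimate per_stage_rate ** n_stages (stages assumed independent)."""
    return per_stage_rate ** n_stages

def stages_needed(max_false_alarm: float, target_false_alarm: float) -> int:
    """Smallest n with max_false_alarm ** n <= target_false_alarm."""
    if not 0.0 < max_false_alarm < 1.0:
        raise ValueError("max_false_alarm must lie strictly between 0 and 1")
    n, rate = 0, 1.0
    while rate > target_false_alarm:
        rate *= max_false_alarm
        n += 1
    return n

# Defaults from the usage listing: -minhitrate 0.995, -maxfalsealarm 0.5, -nstages 14.
print(f"estimated hit rate:         {overall_rate(0.995, 14):.4f}")
print(f"estimated false alarm rate: {overall_rate(0.5, 14):.2e}")
```

With the defaults, the estimates come out at roughly 0.932 for the hit rate and 6.1e-5 for the false alarm rate.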

Note: to take advantage of multiple processors, the utilities should be compiled with a compiler that supports the OpenMP 1.0 standard.
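The -weighttrimming option refers to the boosting parameter weight_trim_rate, which the OpenCV boosting documentation describes as excluding samples whose summary weight is at most 1 - weight_trim_rate from the next training iteration, with 0 turning the feature off. The Python sketch below illustrates that rule; the function name is hypothetical and this is not the haartraining code itself.

```python
from typing import List, Sequence

def trim_weights(weights: Sequence[float], trim_rate: float) -> List[int]:
    """Return indices of the samples kept for the next boosting iteration.

    Samples are taken in order of decreasing weight until their cumulative
    weight reaches trim_rate * total; the remaining light samples, whose
    summed weight is at most (1 - trim_rate) * total, are skipped.
    A trim_rate of 0 (or >= 1) disables trimming.
    """
    if trim_rate <= 0 or trim_rate >= 1:
        return list(range(len(weights)))
    total = sum(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept, acc = [], 0.0
    for i in order:
        kept.append(i)
        acc += weights[i]
        if acc >= trim_rate * total:
            break
    return sorted(kept)
```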

The haartraining utility produces output such as the following:

  |  N |%SMP|F|  ST.THR |    HR   |    FA   | EXP. ERR|
  |   1|100%|-|-0.857040| 1.000000| 1.000000| 0.082075|
  |   2|100%|+|-1.702127| 1.000000| 1.000000| 0.102168|



N: the iteration number of feature selection training.

%SMP: the percentage of original samples left.

F: '+' indicates that the feature is flipped. Related to the -sym (default) option.

ST.THR: stage threshold.

HR: hit rate.

FA: false alarm rate. FYI: (HR, FA) = (1.0, 1.0) means the detector simply raises an alarm every time, for everything.

EXP. ERR: expected (misclassification) error.
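When monitoring a long training run, it can help to pull these columns out of the log. The Python sketch below parses rows in the format shown above into records; the record and field names are hypothetical, and the layout is assumed to match the example exactly (pipe-separated cells, with '+' or '-' in the F column).

```python
from typing import List, NamedTuple

class Iteration(NamedTuple):
    n: int                  # N: feature selection iteration
    samples_left: float     # %SMP, as a fraction
    flipped: bool           # F: '+' means the feature is flipped
    stage_threshold: float  # ST.THR
    hit_rate: float         # HR
    false_alarm: float      # FA
    expected_error: float   # EXP. ERR

def parse_training_log(text: str) -> List[Iteration]:
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 7 or not cells[0].isdigit():
            continue  # skip the header and any unrelated output
        rows.append(Iteration(
            n=int(cells[0]),
            samples_left=float(cells[1].rstrip("%")) / 100.0,
            flipped=cells[2] == "+",
            stage_threshold=float(cells[3]),
            hit_rate=float(cells[4]),
            false_alarm=float(cells[5]),
            expected_error=float(cells[6]),
        ))
    return rows
```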

The haartraining utility creates a <dir_name>.xml file when training has completely finished, where <dir_name> is the argument of the -data option. If you want to generate an XML file before the haartraining utility has finished, you can use the convert_cascade utility located at OpenCV/samples/c/convert_cascade.c (in your OpenCV installation directory). Compile it. The usage of the utility is as follows:

 $ convert_cascade --size="<sample_width>x<sample_height>" <haartraining_output_dir> <output_xml_file>


The OpenCV function cvHaarDetectObjects() (used, in particular, by the haarFaceDetect demo) performs the detection.

There is a facedetect utility in OpenCV/samples/c/facedetect.c (in your OpenCV installation directory). Compile it. The usage of the utility is as follows:

$ facedetect --cascade=<xml_file> [filename(image or video)|camera_index]

Usually the camera_index is 0. If more than one camera is connected to your computer, it may be 1, 2, etc.
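Detection scans the image with a window that, in the simplest case, starts at the training sample size and is enlarged by a scale factor at each step; cvHaarDetectObjects() takes such a factor as a parameter, and the performance utility described below exposes it as -sf. The Python sketch below lists the window sizes visited under that scheme. It is an illustration with a hypothetical function name, not OpenCV code, and it ignores any minimum or maximum object size settings.

```python
from typing import List, Tuple

def window_sizes(win_w: int, win_h: int, img_w: int, img_h: int,
                 scale_factor: float = 1.2) -> List[Tuple[int, int]]:
    """Window sizes win * scale_factor**k that still fit inside the image."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    k = 0
    while True:
        s = scale_factor ** k
        w, h = win_w * s, win_h * s
        if w > img_w or h > img_h:
            break  # stop once the window exceeds the picture
        sizes.append((round(w), round(h)))
        k += 1
    return sizes
```

For a 24x24 window on a 60x60 image with the default factor 1.2, six window sizes are visited, from 24x24 up to about 60x60.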

Test Samples

To evaluate the performance of the trained classifier, a collection of marked-up images is needed. When such a collection is not available, test samples may be created from a single object image by the createsamples utility. The scheme of test sample creation in this case is similar to training sample creation, since each test sample is a background image into which a randomly distorted and randomly scaled instance of the object picture is pasted at a random position.


If both the -img and -info arguments are specified, test samples will be created by the createsamples utility. The sample image is randomly distorted as described above, then placed at a random location in a background image and stored. The corresponding description line is added to the file specified by the -info argument.


The -w and -h keys determine the minimal size of the placed object picture.


The test image filename format is as follows:

imageOrderNumber_x_y_width_height.jpg, where x, y, width and height are the coordinates of the placed object's bounding rectangle.
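The bounding rectangle of a test image can therefore be recovered from its file name alone. A small Python sketch (hypothetical helper), checked against the test file name that appears as an example in the performance section of this document:

```python
import os
from typing import Tuple

def parse_test_image_name(path: str) -> Tuple[int, int, int, int, int]:
    """Split 'imageOrderNumber_x_y_width_height.jpg' into its five integers."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected test image name: {path}")
    order, x, y, width, height = (int(p) for p in parts)
    return order, x, y, width, height
```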

Note that you should use a background image set different from the one used during training.

Performance Evaluation

To evaluate the performance of the classifier, the performance utility may be used. It takes a collection of marked-up images, applies the classifier and outputs the performance, i.e. the number of found objects, the number of missed objects, the number of false alarms and other information.

The performance utility accepts the following options:

Usage: ./performance
  -data <dir_name>
  -info <tests_collection_file_name>
  [-maxSizeDiff <max_size_difference = 1.500000>]
  [-maxPosDiff <max_position_difference = 0.300000>]
  [-sf <scale_factor = 1.200000>]
  [-ni]
  [-nos <number_of_stages = -1>]
  [-rs <roc_size = 40>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]

Command line arguments:

-data <dir_name>

name of the directory in which the trained classifier is stored. A haarcascade XML file can also be specified; in that case the -w and -h options are unnecessary and ignored, because the haarcascade XML file includes this information. FYI: the cvLoadHaarClassifierCascade function used in the performance utility supports both a classifier directory and a haarcascade XML file, but this function is obsolete.

-w <sample_width>,

-h <sample_height>

Size of training samples (in pixels). Must have exactly the same values as used during training (haartraining utility).

-info <tests_collection_file_name>

file with the test samples description

-maxSizeDiff <max_size_difference>,

-maxPosDiff <max_position_difference>

determine the criterion for deciding whether a reference rectangle and a detected rectangle coincide. Default values are 1.5 and 0.3 respectively.

-sf <scale_factor>

detection parameter. Default value is 1.2. The window size is repeatedly multiplied by this factor until it exceeds the size of the picture.

-ni

Do not save the resulting image files of detection. By default, the performance utility saves images that show the positions of detected objects as rectangles, and it requires output directories named after the test image directories with the prefix 'det-'. For example, if a test image file is named "tests/01/img01.bmp/0001_0341_0241_0039_0039.jpg", the directory "det-tests/01/img01.bmp" must be created beforehand; otherwise an error message such as "OpenCV ERROR: Unspecified error (could not save the image) in function cvSaveImage, loadsave.cpp(520)" appears. The error can be avoided with the -ni option or by creating the directories as follows:

    $ cat <tests_collection_file_name> | perl -pe 's!^(.*)/.*$!det-$1!g' | xargs mkdir -p

-rs <roc_size>

Resolution of the receiver operating characteristic (ROC) curves. Default value is 40. This is not a detection parameter; it only affects the output (it is just required for memory allocation).
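This manual does not spell out the exact coincidence test controlled by -maxSizeDiff and -maxPosDiff. One plausible reading, sketched below in Python purely as an assumption, is that a detection matches a reference rectangle when their centers are closer than maxPosDiff times the reference width and the width ratio lies strictly between 1/maxSizeDiff and maxSizeDiff. The function name is hypothetical; check the performance utility's source before depending on these details.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)

def rects_coincide(ref: Rect, det: Rect,
                   max_size_diff: float = 1.5, max_pos_diff: float = 0.3) -> bool:
    """Assumed matching rule for a reference and a detected rectangle."""
    rx, ry, rw, rh = ref
    dx, dy, dw, dh = det
    # Euclidean distance between the two rectangle centers.
    dist = math.hypot((rx + rw / 2) - (dx + dw / 2), (ry + rh / 2) - (dy + dh / 2))
    close_enough = dist < max_pos_diff * rw
    similar_size = rw / max_size_diff < dw < rw * max_size_diff
    return close_enough and similar_size
```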

An output of the performance utility looks as follows:

  |           File Name            | Hits |Missed| False|
  |tests/01/img01.bmp/0001_0153_005|     0|     1|     0|
  ...
  |                           Total|   874|   554|    72|

  Number of stages: 15
  Number of weak classifiers: 68
  Total time: 115.000000

        874    72      0.612045       0.050420
        874    72      0.612045       0.050420
        360    2       0.252101       0.001401
        115    0       0.080532       0.000000
        26     0       0.018207       0.000000
        8      0       0.005602       0.000000
        4      0       0.002801       0.000000
        1      0       0.000700       0.000000


'Hits' shows the number of correct detections. 'Missed' shows the number of missed detections, or false negatives (the object truly exists, but the detector failed to detect it). 'False' shows the number of false alarms, or false positives (no object truly exists, but the detector reported one).

The latter table contains the data for an ROC plot; its columns are the number of hits, the number of false alarms, the hit rate and the false alarm rate. An ROC curve shows how well objects are detected when a certain false alarm probability is allowed; the simplest way to detect everything is to raise an alarm always. You may plot it with the following MATLAB code:

>> ROC = [ 874   72   0.612045   0.050420
           874   72   0.612045   0.050420
           360    2   0.252101   0.001401
           115    0   0.080532   0.000000
            26    0   0.018207   0.000000
             8    0   0.005602   0.000000
             4    0   0.002801   0.000000
             1    0   0.000700   0.000000
           zeros(32, 4) ];   % remaining 32 of the 40 (-rs) rows are all zero

>> plot(ROC(:,4), ROC(:,3));   % x: false alarm rate, y: hit rate
>> xlabel('False alarm rate'); ylabel('Hit rate');

Here, 0.05 is the maximum false alarm value specified at the haartraining stage, so this ROC plot only has values up to 0.05, not up to 1.0 as in a usual ROC plot.
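The rate columns in this example are consistent with dividing each count by the number of marked-up objects, i.e. hits + missed = 874 + 554 = 1428: 874/1428 is about 0.612045 and 72/1428 is about 0.050420. The Python sketch below (hypothetical helper) reproduces these numbers; the normalization is inferred from the example output rather than documented in this manual.

```python
from typing import Tuple

def detection_rates(hits: int, missed: int, false_alarms: int) -> Tuple[float, float]:
    """Return (hit rate, false alarms per object), both divided by hits + missed."""
    n_objects = hits + missed
    if n_objects == 0:
        raise ValueError("no reference objects")
    return hits / n_objects, false_alarms / n_objects

hit_rate, fa_rate = detection_rates(874, 554, 72)
print(f"{hit_rate:.6f} {fa_rate:.6f}")
```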


[1] Rainer Lienhart and Jochen Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. Submitted to ICIP 2002.

[2] Alexander Kuranov, Rainer Lienhart, and Vadim Pisarevsky. An Empirical Analysis of Boosting Algorithms for Rapid Objects With an Extended Set of Haar-like Features. Intel Technical Report MRL-TR-July02-01, 2002.

[3] Paul Viola and Michael J. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE CVPR, 2001.






