Unofficial Documentation
This is an attempt to document the OpenCV Video Surveillance facility.
At the moment it is just a collection of insights from working with the under-documented code, not complete documentation. In fact, this document currently focuses only on the foreground / background discrimination part of the complete algorithm. Please feel free to add more details, improve the layout and edit the contents. (AdiShavit)
The OpenCV "Video Surveillance" facility, also called "blob tracker" through much of the code, is a simple but practical facility intended to track moving foreground objects against a relatively static background. Conceptually it consists of a three-stage video processing pipeline:
- A foreground/background discriminator which labels each pixel as either foreground or background.
- A blob detector which groups adjacent "foreground" pixels into blobs, flood-fill style.
- A blob tracker which assigns ID numbers to blobs and tracks their motion frame-to-frame.
Almost all the sophistication (and CPU time!) in this facility is devoted to the first stage, which uses a state-of-the-art (as of 2003) algorithm. The other two stages use relatively unsophisticated algorithms, which has the advantage of keeping the module fast and quite generic.
More specialized applications will typically go on to use sophisticated algorithms to classify the blobs and perhaps extract six-degree-of-freedom orientation information from them using domain-specific object models. The basic module provided does not do this, but may be used as a jumping-off point to develop such a system, for example by using the OpenCV Haar-based routines to classify the blobs.
The foreground/background discrimination stage is both memory and CPU intensive. It builds and maintains a histogram-based statistical model of the video image background on a per-pixel basis; consequently it may easily wind up using on the loose order of a gigabyte of ram to process a TV-resolution video stream. (This may sound like gross overkill. It is not. Vision is a hard problem. It seems deceptively easy only because humans come with extraordinarily good subconscious vision logic built in. Computers are not so lucky.)
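To get a feel for why the footprint is so large, here is a hypothetical back-of-envelope calculation. The per-entry byte sizes below are assumptions chosen for illustration, not the actual layout of OpenCV's internal per-pixel statistics structures; the point is only that keeping N2c color table entries and N2cc co-occurrence entries per pixel adds up quickly.

```cpp
// Back-of-envelope estimate of the FGD model's memory footprint.
// The per-entry sizes are ASSUMPTIONS for illustration, not the real
// layout of OpenCV's internal per-pixel statistics structures.
long long fgdModelBytes(long long width, long long height,
                        long long n2c, long long n2cc) {
    const long long colorEntryBytes = 16;  // assumed bytes per color table entry
    const long long coocEntryBytes  = 32;  // assumed bytes per co-occurrence entry
    return width * height * (n2c * colorEntryBytes + n2cc * coocEntryBytes);
}
```

With the default N2c = 25 and N2cc = 40 on a 720x576 frame, this already gives roughly 700 MB before any bookkeeping overhead, consistent with the "loose order of a gigabyte" figure above.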
Resource consumption may be controlled via a number of tuning parameters. In general, the results obtained do not depend too critically upon the parameter settings; the casual user can (and probably should) simply leave them at their default settings.
The casual user can simply treat the module as a black box which translates a video stream into a list of moving blobs. At this level, the sample program samples/c/blobtrack.cpp included with the OpenCV source distribution may be used as-is or lightly tweaked, or equivalent code may be written using (say) the OpenCV Python or MATLAB bindings. A Linux-based Gtk wrapper in C is available here. A "beta" version of the documentation for these modules can be found in the SVN: doc/vidsurv/Blob_Tracking_Modules.doc
To build more sophisticated computer vision applications you will need to understand and extend the detailed internal structure of this facility, which actually involves six major video pipeline processing stages, each of which in general is implemented by several alternative modules offering different space/time/complexity tradeoffs, each alternative module having its own set of tuning parameters.
An article providing a good overview of the complete facility internal architecture may be found here:
Chen, T.; Haussecker, H.; Bovyrin, A.; Belenov, R.; Rodyushkin, K.; Kuranov, A.; Eruhimov, V. "Computer Vision Workload Analysis: Case Study of Video Surveillance Systems." Intel Technology Journal. (May 2005). (PDF)
Foreground / Background Segmentation
The implementation provides a choice of two modules for this subtask:
- CV_BG_MODEL_FGD
- CV_BG_MODEL_MOG
CV_BG_MODEL_FGD
The most sophisticated (and default) module is based on an algorithm from the paper: Liyuan Li, Weimin Huang, Irene Y.H. Gu, and Qi Tian, "Foreground Object Detection from Videos Containing Complex Background", ACM MM 2003, available here (pdf).
Internally (in cvbgfg_acmmm2003.cpp) it uses a change-detection algorithm from: P. Rosin, "Thresholding for Change Detection", ICCV 1998, available here (pdf).
Parameters for this module are supplied via the CvFGDStatModelParams struct:
typedef struct CvFGDStatModelParams
{
    int    Lc;      /* Quantized levels per 'color' component. Power of two, typically 32, 64 or 128.       */
    int    N1c;     /* Number of color vectors used to model normal background color variation at a given pixel. */
    int    N2c;     /* Number of color vectors retained at given pixel. Must be > N1c, typically ~ 5/3 of N1c.    */
                    /* Used to allow the first N1c vectors to adapt over time to changing background.             */

    int    Lcc;     /* Quantized levels per 'color co-occurrence' component. Power of two, typically 16, 32 or 64. */
    int    N1cc;    /* Number of color co-occurrence vectors used to model normal background color variation at a given pixel. */
    int    N2cc;    /* Number of color co-occurrence vectors retained at given pixel. Must be > N1cc, typically ~ 5/3 of N1cc.  */
                    /* Used to allow the first N1cc vectors to adapt over time to changing background.             */

    int    is_obj_without_holes; /* If TRUE we ignore holes within foreground blobs. Defaults to TRUE.            */
    int    perform_morphing;     /* Number of erode-dilate-erode foreground-blob cleanup iterations.              */
                                 /* These erase one-pixel junk blobs and merge almost-touching blobs. Default value is 1. */

    float  alpha1;  /* How quickly we forget old background pixel values seen. Typically set to 0.1.              */
    float  alpha2;  /* "Controls speed of feature learning". Depends on T. Typical value circa 0.005.             */
    float  alpha3;  /* Alternative to alpha2, used (e.g.) for quicker initial convergence. Typical value 0.1.     */

    float  delta;   /* Affects color and color co-occurrence quantization, typically set to 2.                    */
    float  T;       /* "A percentage value which determines when new features can be recognized as new background." (Typically 0.9). */
    float  minArea; /* Discard foreground blobs whose bounding box is smaller than this threshold.                */
} CvFGDStatModelParams;
This is initialized with the defaults:
/* Default parameters of the foreground detection algorithm: */
#define  CV_BGFG_FGD_LC     128
#define  CV_BGFG_FGD_N1C    15
#define  CV_BGFG_FGD_N2C    25
#define  CV_BGFG_FGD_LCC    64
#define  CV_BGFG_FGD_N1CC   25
#define  CV_BGFG_FGD_N2CC   40

/* BG reference image update parameter: */
#define  CV_BGFG_FGD_ALPHA_1  0.1f

/* Stat model update parameter:
   0.002f ~ 1K frames (~45 sec), 0.005f ~ 18 sec (if 25 fps and absolutely static BG) */
#define  CV_BGFG_FGD_ALPHA_2  0.005f

/* Start value for the alpha parameter (to quickly initialize the statistical model): */
#define  CV_BGFG_FGD_ALPHA_3  0.1f

#define  CV_BGFG_FGD_DELTA    2
#define  CV_BGFG_FGD_T        0.9f
#define  CV_BGFG_FGD_MINAREA  15.f
#define  CV_BGFG_FGD_BG_UPDATE_TRESH  0.5f
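As an aside on how Lc and delta interact: one plausible reading (a sketch, not the exact code in cvbgfg_acmmm2003.cpp) is that delta is the bin width of the color quantization, so that Lc = 256 / delta bins cover the 0..255 range of each component. The defaults are consistent with this: Lc = 128 with delta = 2, and Lcc = 64 would correspond to a width-4 bin.

```cpp
// Illustration only: a plausible reading of the Lc/delta color
// quantization, ASSUMING delta is the bin width so that Lc = 256/delta
// bins cover a 0..255 component range. Not the exact OpenCV code.
int quantize(int value, int Lc) {
    int delta = 256 / Lc;   // bin width, e.g. 2 when Lc == 128
    return value / delta;   // bin index in [0, Lc)
}
```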
If we want to tune alpha2, the paper explains:
If we decide that n frames is a quick enough response to a "once-off" background change, we should choose the learning rate alpha2 according to (22):
(22) alpha2 > 1 - (1 - T)^(1/n)
For example, if we want the system to respond to an ideal "once-off" background change after 20 seconds at 25 fps with T = 90%, alpha2 should be larger than 0.0046, but not much larger, to keep the system from learning noise and foreground objects as background too quickly. So:
n = 20 * 25 = 500, T = 0.9
alpha2 > 1 - (1 - 0.9)^(1/500) = 0.00459458...
and the default CV_BGFG_FGD_ALPHA_2 = 0.005f just satisfies this bound.
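The bound from inequality (22) is easy to compute directly. The helper name below is made up for this sketch; only the formula comes from the paper:

```cpp
#include <cmath>

// Lower bound on alpha2 from inequality (22): the smallest learning
// rate that absorbs a "once-off" background change within n frames,
// given the feature-recognition threshold T.
double alpha2LowerBound(double T, int n) {
    return 1.0 - std::pow(1.0 - T, 1.0 / n);
}
```

With T = 0.9 and n = 500 this evaluates to about 0.004595, so the default alpha2 of 0.005 clears it with a small margin.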
T is a percentage value which determines when new features can be recognized as a new background appearance. With a large value of T the system is stable but slow to respond to "once-off" changes; with a small T the system readily learns frequent foreground features as new background appearances. In the authors' tests, T was set to 90%.
CV_BG_MODEL_MOG
This is an implementation of the mixture-of-Gaussians paper: P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection", in Proc. 2nd European Workshop on Advanced Video-Based Surveillance Systems, 2001. It can be found here (pdf) (CiteSeer).
Due to the complexity of the process and the lack of documentation, the following technical report may be quite helpful in understanding the necessary steps: http://www.merl.com/papers/docs/TR2003-36.pdf
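To illustrate the mixture-of-Gaussians idea itself (this is a self-contained sketch, not the OpenCV implementation), the following code updates a K-component mixture for a single grayscale pixel. All the constants (K, the learning rate, the match and weight thresholds, the variance floor) are assumptions chosen for clarity, and the background decision is simplified: the paper additionally orders components by weight/sigma and uses a cumulative-weight test.

```cpp
#include <cmath>

// One Gaussian component of a per-pixel mixture.
struct Gaussian { double weight, mean, var; };

const int    K        = 3;     // components per pixel (assumed)
const double ALPHA    = 0.05;  // learning rate (assumed)
const double MATCH    = 2.5;   // match threshold, in standard deviations
const double W_BG     = 0.3;   // simplified background-weight threshold (assumed)
const double VAR_INIT = 900.0; // variance given to a freshly created component
const double VAR_MIN  = 4.0;   // noise floor on the variance

// Update the mixture g[] with one observed pixel value x; return
// true if x is classified as background.
bool updatePixel(Gaussian g[], double x) {
    int hit = -1;
    for (int i = 0; i < K && hit < 0; ++i)
        if (std::fabs(x - g[i].mean) < MATCH * std::sqrt(g[i].var))
            hit = i;

    if (hit >= 0) {
        // Matched: pull the component toward the sample.
        g[hit].mean += ALPHA * (x - g[hit].mean);
        double d = x - g[hit].mean;
        g[hit].var += ALPHA * (d * d - g[hit].var);
        if (g[hit].var < VAR_MIN) g[hit].var = VAR_MIN;
    } else {
        // No match: replace the weakest component with a new one.
        int w = 0;
        for (int i = 1; i < K; ++i)
            if (g[i].weight < g[w].weight) w = i;
        g[w].weight = 0.05; g[w].mean = x; g[w].var = VAR_INIT;
        hit = w;
    }

    // Update and renormalize the weights.
    double sum = 0.0;
    for (int i = 0; i < K; ++i) {
        g[i].weight = (1.0 - ALPHA) * g[i].weight + (i == hit ? ALPHA : 0.0);
        sum += g[i].weight;
    }
    for (int i = 0; i < K; ++i) g[i].weight /= sum;

    // Simplified decision: background if the matched component has
    // accumulated enough weight.
    return g[hit].weight > W_BG;
}
```

Feeding a constant value for a while makes one component dominant, so that value is classified as background; a suddenly different value creates a new low-weight component and is reported as foreground until it persists long enough to accumulate weight.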
Blob Entrance Detection
The "Blob Entering Detection" module uses the result (the FG mask) of the "FG/BG Detection" module to detect when a new blob enters the scene.
Two implementations exist for this module:
- BD_CC - detects new blobs by tracking connected components of the foreground mask
- BD_Simple - detects new blobs by the uniform motion of connected components of the FG mask
The execution is similar for both. Using the background model explained above, a typical use of this module looks like this:
#include <iostream>
#include "cvaux.h"
#include "highgui.h"

CvCapture*      capture       = cvCaptureFromCAM(0);
IplImage*       current_frame;
int             nextBlobID    = 1;
int             FrameCount    = 0;
const int       FGTrainFrames = 30;   // frames used to train the BG model
CvBlobSeq       newBlobList, blobList;
CvBGStatModel*  bgModel    = cvCreateGaussianBGModel(cvQueryFrame(capture), NULL);
CvBlobDetector* blobDetect = cvCreateBlobDetectorCC(); // or cvCreateBlobDetectorSimple();

while (cvGrabFrame(capture))
{
    // Compute the FG mask.
    current_frame = cvRetrieveFrame(capture);
    cvUpdateBGStatModel(current_frame, bgModel);
    FrameCount++;

    // Once the BG model is trained, use the FG mask to detect new blobs.
    if (FrameCount > FGTrainFrames)
    {
        blobDetect->DetectNewBlob(current_frame, bgModel->foreground,
                                  &newBlobList, &blobList);

        // Loop over the new blobs found.
        for (int i = 0; i < newBlobList.GetBlobNum(); ++i)
        {
            CvBlob* pBN = newBlobList.GetBlob(i);

            // Insert the new blob in blobList only if it is big enough.
            if (pBN && pBN->w >= CV_BLOB_MINW && pBN->h >= CV_BLOB_MINH)
            {
                pBN->ID = nextBlobID;
                std::cout << "Add blob #" << nextBlobID << std::endl;
                blobList.AddBlob(pBN);
                nextBlobID++;
            }
        }
    }
    // A tracking step should then be performed to follow the blobs...
}
This code is a simplification of the code in "cvaux/vs/blobtrackingauto.cpp".