CamShift is short for the Continuously Adaptive Mean Shift algorithm. It is an improved algorithm built on MeanShift, first proposed and applied to face tracking by Gary R. Bradski et al., with good results. Because it tracks using color probability information, it runs quite efficiently. The CamShift algorithm consists of the following steps:
(1) Determine the initial target and its region;
(2) Compute the histogram of the target's hue (Hue) component;
(3) Use the histogram to compute the back-projection image of the input frame (explained further below);
(4) Run MeanShift on the back-projection image, iterating until it converges or reaches the maximum iteration count, and save the zeroth moment;
(5) Take the center of the converged search window from step (4) and the newly computed window size as the parameters for tracking the target in the next frame (i.e., jump back to step (2)).
A few remarks:
1. Before computing the back-projection, the input image is thresholded in HSV space to filter out some noise.
2. The back-projection image is a probability map: the value at each pixel is the probability that the pixel fits the target's distribution, or put directly, how target-like that pixel is. It is computed by looking up the pixel's value in the target histogram; the probability stored for that value becomes the pixel's value in the back-projection image.
3. How exactly does CamShift adapt the window size? Growing: before computing the new size, CamShift enlarges the window MeanShift produced by TOLERANCE in each of the four directions, i.e., both width and height grow by 2*TOLERANCE (a value you can tune yourself), which is what makes it possible for the window to get bigger. Shrinking: within the enlarged window it recomputes the zeroth-, first-, and second-order moments and derives a new width and height from them. CamShift therefore amounts to an adjustment applied on top of the MeanShift result, which lets the tracking window follow the target as its size changes.
Advantages: the algorithm is quite efficient; if the statistical histogram were built from more features, I suspect the experimental results would be even better.
Drawbacks: (1) Tracking on color statistics alone goes wrong when the background contains similar colors. (2) It cannot track multiple targets. (3) Because it iterates only from the initial position (rather than from every pixel), a bad initial position can leave it converged right where it started (that is, once the target is lost, it may never be found again).
Question: the paper describes adjusting the window size by iterating on the histogram, which I don't understand; I don't see it implemented in the code. Pointers from readers are welcome!
Below is the code of the CamShift demo:
Code:
#ifdef _CH_
#pragma package <opencv>
#endif

#define CV_NO_BACKWARD_COMPATIBILITY

#ifndef _EiC
#include "cv.h"
#include "highgui.h"
#include <stdio.h>
#include <ctype.h>
#endif

IplImage *image = 0, *hsv = 0, *hue = 0, *mask = 0, *backproject = 0, *histimg = 0;
CvHistogram *hist = 0;

int backproject_mode = 0;
int select_object = 0;
int track_object = 0;
int show_hist = 1;
CvPoint origin;
CvRect selection;
CvRect track_window;
CvBox2D track_box;
CvConnectedComp track_comp;
int hdims = 16;
float hranges_arr[] = {0, 180};
float *hranges = hranges_arr;
int vmin = 10, vmax = 256, smin = 30;

void on_mouse( int event, int x, int y, int flags, void *param )
{
    if( !image )
        return;

    if( image->origin )
        y = image->height - y;

    if( select_object )  // the target is still being selected
    {
        selection.x = MIN(x, origin.x);
        selection.y = MIN(y, origin.y);
        selection.width = selection.x + CV_IABS(x - origin.x);
        selection.height = selection.y + CV_IABS(y - origin.y);

        // keep the selection inside the image
        selection.x = MAX( selection.x, 0 );
        selection.y = MAX( selection.y, 0 );
        selection.width = MIN( selection.width, image->width );
        selection.height = MIN( selection.height, image->height );
        selection.width -= selection.x;
        selection.height -= selection.y;
    }

    switch( event )
    {
    case CV_EVENT_LBUTTONDOWN:  // start selecting the target
        origin = cvPoint(x, y);
        selection = cvRect(x, y, 0, 0);
        select_object = 1;
        break;
    case CV_EVENT_LBUTTONUP:    // selection finished
        select_object = 0;
        if( selection.width > 0 && selection.height > 0 )
            track_object = -1;
        break;
    }
}


CvScalar hsv2rgb( float hue )
{
    int rgb[3], p, sector;
    static const int sector_data[][3] =
        {{0,2,1}, {1,2,0}, {1,0,2}, {2,0,1}, {2,1,0}, {0,1,2}};
    hue *= 0.033333333333333333333333333333333f;
    sector = cvFloor(hue);
    p = cvRound(255*(hue - sector));
    p ^= sector & 1 ? 255 : 0;

    rgb[sector_data[sector][0]] = 255;
    rgb[sector_data[sector][1]] = 0;
    rgb[sector_data[sector][2]] = p;

    return cvScalar(rgb[2], rgb[1], rgb[0], 0);
}

int main( int argc, char **argv )
{
    CvCapture *capture = 0;

    if( argc == 1 || (argc == 2 && strlen(argv[1]) == 1 && isdigit(argv[1][0])))
        capture = cvCaptureFromCAM( argc == 2 ? argv[1][0] - '0' : 0 );
    else if( argc == 2 )
        capture = cvCaptureFromAVI( argv[1] );

    if( !capture )
    {
        fprintf(stderr, "Could not initialize capturing...\n");
        return -1;
    }

    printf( "Hot keys: \n"
        "\tESC - quit the program\n"
        "\tc - stop the tracking\n"
        "\tb - switch to/from backprojection view\n"
        "\th - show/hide object histogram\n"
        "To initialize tracking, select the object with mouse\n" );

    cvNamedWindow( "Histogram", 1 );
    cvNamedWindow( "CamShiftDemo", 1 );
    cvSetMouseCallback( "CamShiftDemo", on_mouse, 0 );
    cvCreateTrackbar( "Vmin", "CamShiftDemo", &vmin, 256, 0 );
    cvCreateTrackbar( "Vmax", "CamShiftDemo", &vmax, 256, 0 );
    cvCreateTrackbar( "Smin", "CamShiftDemo", &smin, 256, 0 );

    for(;;)
    {
        IplImage *frame = 0;
        int i, bin_w, c;

        frame = cvQueryFrame( capture );
        if( !frame )
            break;

        if( !image )
        {
            /* allocate all the buffers */
            image = cvCreateImage( cvGetSize(frame), 8, 3 );
            image->origin = frame->origin;
            hsv = cvCreateImage( cvGetSize(frame), 8, 3 );
            hue = cvCreateImage( cvGetSize(frame), 8, 1 );
            mask = cvCreateImage( cvGetSize(frame), 8, 1 );
            backproject = cvCreateImage( cvGetSize(frame), 8, 1 );
            hist = cvCreateHist( 1, &hdims, CV_HIST_ARRAY, &hranges, 1 );
            histimg = cvCreateImage( cvSize(320, 200), 8, 3 );
            cvZero( histimg );
        }

        cvCopy( frame, image, 0 );
        cvCvtColor( image, hsv, CV_BGR2HSV );

        if( track_object )
        {
            int _vmin = vmin, _vmax = vmax;

            cvInRangeS( hsv, cvScalar(0, smin, MIN(_vmin,_vmax), 0),
                        cvScalar(180, 256, MAX(_vmin,_vmax), 0), mask );
                        // suppress noise: mask is set only for values inside this range
            cvSplit( hsv, hue, 0, 0, 0 );  // extract the hue plane used for backprojection

            if( track_object < 0 )
            {
                float max_val = 0.f;
                cvSetImageROI( hue, selection );
                cvSetImageROI( mask, selection );
                cvCalcHist( &hue, hist, 0, mask );  // histogram of the selected region
                cvGetMinMaxHistValue( hist, 0, &max_val, 0, 0 );
                cvConvertScale( hist->bins, hist->bins, max_val ? 255. / max_val : 0., 0 );
                cvResetImageROI( hue );
                cvResetImageROI( mask );
                track_window = selection;
                track_object = 1;

                cvZero( histimg );
                bin_w = histimg->width / hdims;
                for( i = 0; i < hdims; i++ )
                {
                    int val = cvRound( cvGetReal1D(hist->bins, i)*histimg->height/255 );  // height of each bar
                    CvScalar color = hsv2rgb(i*180.f/hdims);  // each bar's color follows its bin index
                    cvRectangle( histimg, cvPoint(i*bin_w, histimg->height),  // draw the histogram
                                 cvPoint((i + 1)*bin_w, histimg->height - val),
                                 color, -1, 8, 0 );
                }
            }

            cvCalcBackProject( &hue, backproject, hist );  // compute the backprojection image
            cvAnd( backproject, mask, backproject, 0 );    // zero out points outside the thresholds
            cvCamShift( backproject, track_window,         // run CamShift on the 8-bit probability image
                        cvTermCriteria( CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1 ),
                        &track_comp, &track_box );
            track_window = track_comp.rect;                // the new tracking window

            if( backproject_mode )
                cvCvtColor( backproject, image, CV_GRAY2BGR );

            if( !image->origin )  // if false, the ellipse angle must be flipped
                track_box.angle = -track_box.angle;
            cvEllipseBox( image, track_box, CV_RGB(255,0,0), 3, CV_AA, 0 );  // draw the tracking ellipse
        }

        if( select_object && selection.width > 0 && selection.height > 0 )  // invert the region while selecting
        {
            cvSetImageROI( image, selection );
            cvXorS( image, cvScalarAll(255), image, 0 );
            cvResetImageROI( image );
        }

        cvShowImage( "CamShiftDemo", image );
        cvShowImage( "Histogram", histimg );

        c = cvWaitKey(10);
        if( (char) c == 27 )
            break;
        switch( (char) c )
        {
        case 'b':
            backproject_mode ^= 1;
            break;
        case 'c':
            track_object = 0;
            cvZero( histimg );
            break;
        case 'h':
            show_hist ^= 1;
            if( !show_hist )
                cvDestroyWindow( "Histogram" );
            else
                cvNamedWindow( "Histogram", 1 );
            break;
        default:
            ;
        }
    }

    cvReleaseCapture( &capture );
    cvDestroyWindow( "CamShiftDemo" );

    return 0;
}

#ifdef _EiC
main(1, "camshiftdemo.c");
#endif
Here I mainly walk through the MeanShift iteration, since CamShift is built around it. MeanShift is a method for finding a local extremum. Intuitively, it climbs step by step toward the peak, like a hill-climbing algorithm. How does it climb? The computed center of mass becomes the center of the next window, repeated until the window's position no longer changes. When first studying MeanShift, it helps to ignore the kernel function (which weights the statistics by distance) and the weight function (e.g., subjective, hand-chosen influence).
In CamShift, MeanShift computes the center of mass as the first-order moments divided by the zeroth-order moment. The implementation:
CV_IMPL int
cvMeanShift( const void* imgProb, CvRect windowIn,
             CvTermCriteria criteria, CvConnectedComp* comp )
{
    CvMoments moments;
    int i = 0, eps;
    CvMat stub, *mat = (CvMat*)imgProb;  // the whole input image
    CvMat cur_win;
    CvRect cur_rect = windowIn;          // the current window starts as the input window
    CV_FUNCNAME( "cvMeanShift" );
    if( comp )
        comp->rect = windowIn;           // initialize the connected component
    moments.m00 = moments.m10 = moments.m01 = 0;  // initialize the zeroth- and first-order moments
    __BEGIN__;
    CV_CALL( mat = cvGetMat( mat, &stub ));
    if( CV_MAT_CN( mat->type ) > 1 )
        CV_ERROR( CV_BadNumChannels, cvUnsupportedFormat );
    if( windowIn.height <= 0 || windowIn.width <= 0 )
        CV_ERROR( CV_StsBadArg, "Input window has non-positive sizes" );
    if( windowIn.x < 0 || windowIn.x + windowIn.width > mat->cols ||  // x,y are the corner, not the center
        windowIn.y < 0 || windowIn.y + windowIn.height > mat->rows )
        CV_ERROR( CV_StsBadArg, "Initial window is not inside the image ROI" );
    CV_CALL( criteria = cvCheckTermCriteria( criteria, 1., 100 ));  // termination criteria for the iteration
    eps = cvRound( criteria.epsilon * criteria.epsilon );
    for( i = 0; i < criteria.max_iter; i++ )
    {
        int dx, dy, nx, ny;
        double inv_m00;
        CV_CALL( cvGetSubRect( mat, &cur_win, cur_rect ));  // cur_win points at the data inside the window
        CV_CALL( cvMoments( &cur_win, &moments ));          // compute the moments inside the window
        /* Calculating center of mass */
        if( fabs(moments.m00) < DBL_EPSILON )
            break;
        inv_m00 = moments.inv_sqrt_m00 * moments.inv_sqrt_m00;
        dx = cvRound( moments.m10 * inv_m00 - windowIn.width*0.5 );   // centroid x minus half the width
        dy = cvRound( moments.m01 * inv_m00 - windowIn.height*0.5 );  // centroid y minus half the height
        nx = cur_rect.x + dx;  // the new x coordinate
        ny = cur_rect.y + dy;  // the new y coordinate
        if( nx < 0 )
            nx = 0;
        else if( nx + cur_rect.width > mat->cols )
            nx = mat->cols - cur_rect.width;
        if( ny < 0 )
            ny = 0;
        else if( ny + cur_rect.height > mat->rows )
            ny = mat->rows - cur_rect.height;
        dx = nx - cur_rect.x;  // recompute the shift after clamping
        dy = ny - cur_rect.y;
        cur_rect.x = nx;       // the new window position
        cur_rect.y = ny;
        /* Check for coverage centers mass & window */
        if( dx*dx + dy*dy < eps )  // stop iterating
            break;
    }
    __END__;
    if( comp )  // return the rectangle and the zeroth moment
    {
        comp->rect = cur_rect;
        comp->area = (float)moments.m00;
    }
    return i;   // return the number of iterations
}
The CamShift implementation:
CV_IMPL int
cvCamShift( const void* imgProb, CvRect windowIn,
            CvTermCriteria criteria,
            CvConnectedComp* _comp,
            CvBox2D* box )
{
    const int TOLERANCE = 10;
    CvMoments moments;
    double m00 = 0, m10, m01, mu20, mu11, mu02, inv_m00;
    double a, b, c, xc, yc;
    double rotate_a, rotate_c;
    double theta = 0, square;
    double cs, sn;
    double length = 0, width = 0;
    int itersUsed = 0;
    CvConnectedComp comp;
    CvMat cur_win, stub, *mat = (CvMat*)imgProb;
    CV_FUNCNAME( "cvCamShift" );
    comp.rect = windowIn;  // initialize comp
    __BEGIN__;
    CV_CALL( mat = cvGetMat( mat, &stub ));
    CV_CALL( itersUsed = cvMeanShift( mat, windowIn, criteria, &comp ));  // run MeanShift to find the centroid
    windowIn = comp.rect;  // the converged window position
    // for tolerance, grow the window by TOLERANCE on all four sides
    windowIn.x -= TOLERANCE;
    if( windowIn.x < 0 )
        windowIn.x = 0;
    windowIn.y -= TOLERANCE;
    if( windowIn.y < 0 )
        windowIn.y = 0;
    windowIn.width += 2 * TOLERANCE;
    if( windowIn.x + windowIn.width > mat->width )
        windowIn.width = mat->width - windowIn.x;
    windowIn.height += 2 * TOLERANCE;
    if( windowIn.y + windowIn.height > mat->height )
        windowIn.height = mat->height - windowIn.y;
    CV_CALL( cvGetSubRect( mat, &cur_win, windowIn ));  // pointer to the enlarged window's data
    /* Calculating moments in new center mass */
    CV_CALL( cvMoments( &cur_win, &moments ));  // recompute the moments inside the window
    m00 = moments.m00;
    m10 = moments.m10;
    m01 = moments.m01;
    mu11 = moments.mu11;
    mu20 = moments.mu20;
    mu02 = moments.mu02;
    if( fabs(m00) < DBL_EPSILON )
        EXIT;
    inv_m00 = 1. / m00;
    xc = cvRound( m10 * inv_m00 + windowIn.x );  // the new center coordinates
    yc = cvRound( m01 * inv_m00 + windowIn.y );
    a = mu20 * inv_m00;
    b = mu11 * inv_m00;
    c = mu02 * inv_m00;
    /* Calculating width & height */
    square = sqrt( 4 * b * b + (a - c) * (a - c) );
    /* Calculating orientation */
    theta = atan2( 2 * b, a - c + square );
    /* Calculating width & length of figure */
    cs = cos( theta );
    sn = sin( theta );
    rotate_a = cs * cs * mu20 + 2 * cs * sn * mu11 + sn * sn * mu02;
    rotate_c = sn * sn * mu20 - 2 * cs * sn * mu11 + cs * cs * mu02;
    length = sqrt( rotate_a * inv_m00 ) * 4;  // the major and minor axis lengths
    width = sqrt( rotate_c * inv_m00 ) * 4;
    /* In case, when tetta is 0 or 1.57... the Length & Width may be exchanged */
    if( length < width )
    {
        double t;
        CV_SWAP( length, width, t );
        CV_SWAP( cs, sn, t );
        theta = CV_PI*0.5 - theta;
    }
    /* Saving results */
    // recomputing width and height here is what makes the window size adapt
    if( _comp || box )
    {
        int t0, t1;
        int _xc = cvRound( xc );  // round to integers
        int _yc = cvRound( yc );
        t0 = cvRound( fabs( length * cs ));
        t1 = cvRound( fabs( width * sn ));
        t0 = MAX( t0, t1 ) + 2;  // recompute the width
        comp.rect.width = MIN( t0, (mat->width - _xc) * 2 );  // keep the width inside the image
        t0 = cvRound( fabs( length * sn ));
        t1 = cvRound( fabs( width * cs ));
        t0 = MAX( t0, t1 ) + 2;  // recompute the height
        comp.rect.height = MIN( t0, (mat->height - _yc) * 2 );  // keep the height inside the image
        comp.rect.x = MAX( 0, _xc - comp.rect.width / 2 );
        comp.rect.y = MAX( 0, _yc - comp.rect.height / 2 );
        comp.rect.width = MIN( mat->width - comp.rect.x, comp.rect.width );
        comp.rect.height = MIN( mat->height - comp.rect.y, comp.rect.height );
        comp.area = (float) m00;
    }
    __END__;
    if( _comp )  // return the rectangle and the zeroth moment
        *_comp = comp;
    if( box )
    {
        box->size.height = (float)length;
        box->size.width = (float)width;
        box->angle = (float)(theta*180./CV_PI);
        box->center = cvPoint2D32f( comp.rect.x + comp.rect.width*0.5f,
                                    comp.rect.y + comp.rect.height*0.5f );
    }
    return itersUsed;
}
Below is a usage-oriented explanation written in English by another author:
OpenCV's face tracker uses an algorithm called Camshift. Camshift consists of four steps: (1) create a histogram; (2) calculate a "face probability" for each pixel; (3) shift to a new location; (4) calculate size and angle.

Here's how each step works:

1. Create a histogram. Camshift represents the face it's tracking as a histogram (also called a barchart) of color values. Figure 1 shows two example histograms produced by the Camshift demo program that ships with OpenCV. The height of each colored bar indicates how many pixels in an image region have that "hue." Hue is one of three values describing a pixel's color in the HSV (Hue, Saturation, Value) color model. (For more on color and color models, see "The World of Color," SERVO Magazine, November 2005.)

2. Calculate a "face probability" for each pixel. "Face probability" sounds terribly complicated, and heavily mathematical, but it's neither! Here's how it works. Figure 2 shows the bars from a histogram stacked one atop the other. After stacking them, it's clear that the rightmost bar accounts for about 45% of the pixels in the region. That means the probability that a pixel selected randomly from this region would fall into the rightmost bin is 45%. That's the "face probability" for a pixel with this hue. The same reasoning indicates that the face probability for the next histogram bin to the right is about 20%, since it accounts for about 20% of the stack's total height. That's all there is to it. As new video frames arrive, the hue value for each pixel is determined. From that, the face histogram is used to assign a face probability to the pixel. This process is called "histogram backprojection" in OpenCV. There's a built-in method that implements it, called cvCalcBackProject(). Figure 3 shows the face-probability image in one video frame as Camshift tracks my face. Black pixels have the lowest probability value, and white, the highest. Gray pixels lie somewhere in the middle.

3. Shift to a new location. With each new video frame, Camshift "shifts" its estimate of the face location, keeping it centered over the area with the highest concentration of bright pixels in the face-probability image. It finds this new location by starting at the previous location and computing the center of gravity of the face-probability values within a rectangle. It then shifts the rectangle so it's right over the center of gravity. It does this a few times to center the rectangle well. The OpenCV function cvCamShift() implements the steps for shifting to the new location. This process of shifting the rectangle to correspond with the center of gravity is based on an algorithm called "Mean Shift," by Dorin Comaniciu. In fact, Camshift stands for "Continuously Adaptive Mean Shift."

4. Calculate size and angle. The OpenCV method is called "Continuously Adaptive," and not just "Mean Shift," because it also adjusts the size and angle of the face rectangle each time it shifts it. It does this by selecting the scale and orientation that are the best fit to the face-probability pixels inside the new rectangle location.