mean-shift均值偏移算法

最新推荐文章于 2024-06-04 22:23:28 发布

伴君

最新推荐文章于 2024-06-04 22:23:28 发布

阅读量1.6k

点赞数 1

分类专栏： # 三维重建代码学习文章标签：算法机器学习概率论

本文链接：https://blog.csdn.net/AAAA202012/article/details/119964883

版权

三维重建代码学习专栏收录该内容

8 篇文章 7 订阅

订阅专栏

背景

Meanshift算法是Fukunaga于1975年提出的，其基本思想是利用概率密度的梯度爬升来寻找局部最优。1995年，YizongCheng针对离x越近的采样点对x周围的统计特性越有效，定义了一族核函数，并根据所有样本点的重要性不足，设定了一个权重系数，扩大了Meanshift的使用范围。

原理

给定d维空间中的n个样本点 $x_i$ ( $i$ = 1,…, $n$ ),在x点的Meanshift向量的基本形式定义为：
在这里插入图片描述
其中， $S_h$ 是一个半径为h的高维球区域，k表示n个样本点中由k个点落入区域Sh中。Meanshift向量表示区域中k个样本点相对于点x求偏移量再平均，该向量指向概率密度梯度的方向。

在扩展的Meanshift中，引入了核函数和权重的概念，相当于给不同的点赋予不同的质量，求解这些不同质量分布的点的质心位置。Meanshift的扩展形式可以表示为：
在这里插入图片描述
在Meanshift目标跟踪算法中，其样本点 $x_i$ ( $i$ =1,…,n)为候选区域内的像素点，其和函数g采用Epanechnikov。即计算空间中任意一点到中心位置（质心处）的欧式距离。而最重要的是权重w( $x_i$ )的求解，方法如下：

首先计算目标核函数直方图：
在这里插入图片描述
其中C表示归一化常数，k表示核函数（在核估计中通常是平滑作用），目标区域共n个样本点，该区域颜色分布为m级，b( $x_i$ )表示像素点 $x_i$ 的量化值。在彩色图像中，一般有RGB三通道，每个通道量化为16（4个比特位），三个通道共4096级（12位）

然后是候选核函数直方图：
在这里插入图片描述
其中y表示候选区域的中心位置，h表示核函数k的窗宽（用作归一化）， $x_i$ 为目标候选区域的样本点。

当目标区域与候选区域之间存在一定差异时，直方图的相似性较弱，Meanshift通过加入核函数k，来改变候选框质心的位置，提高 $\hat{q}_u$ 、 $\hat{p}_u$ 的相似性，其相似度度量准则为：
在这里插入图片描述

经过Taylor展开，并将 $\hat{p}_u$ (y)代入并化简得到：
在这里插入图片描述
由于第一项为常量，因此我们最大化 $\hat{\rho}_u$ (y)，本质上是寻找新的质心y使得候选区域和模板的相似程度最大化。对相似性函数求导，可得目标的新位置：

Meanshift之图形分割

Meanshift是一种基于颜色空间分布的图像分割算法，该算法的输出是一个经过滤色的”分色“图像，其颜色会变得渐变，并且细纹理会变得平缓。在Meanshift算法中每个像素点用一个五维的向量表示，前两个量是像素点在图像中的坐标，后三个量是每个像素点的颜色分量（蓝、绿、红）。在颜色分布的峰值处开始，通过华东窗口不断寻找属于同一类的像素点并统一像素点的像素值。滑动窗口由半径和颜色幅度构成，半径决定了滑动窗口的范围，即坐标的范围，颜色幅度决定了半径内像素点分类的标准。通过不断地移动滑动窗口，实现基于像素点颜色的图像分割。由于分割后同一类像素点具由相同像素值，因此Meanshift算法的输出结果是一个颜色渐变、纹理平缓的图像。

openvc库中自带有基于Meanshift的分割方法pyrMeanShiftFiltering()。

void PyrMeanShiftFiltering( const CvArr* srcarr,  //输入图像
				            CvArr* dstarr,        //输出图像
				            double  sp,           //颜色域半径
				            double sr,            //空间域半径
				            int max_level,        //金字塔最大层数                    
			                CvTermCriteria termcrit     //迭代终止条件
)

其中函数前两个参数是待分割的输入图像和分割后的输出图像，两个图像具有相同的尺寸并且必须是CV_8U的三通道彩色图像。第三个参数为滑动窗口的半径，第四个参数为滑动窗口的颜色幅度。第五个参数为分割金字塔缩放层数，当参数大于1时构建maxLevel + 1层高斯金字塔。该算法首先在尺寸最小的图像层中进行分类，之后将结果传播到尺寸较大的图像层，并且仅在颜色与上一层颜色差异大于滑动窗口颜色幅度的像素上再次进行分类，从而使得颜色区域的边界更清晰。当分割金字塔缩放层数为0时表示直接在整个原始图像时进行均值平移分割。

具体实现如下：

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"

#include <iostream>

using namespace cv;
using namespace std;

static void help(char** argv)
{
	cout << "\nDemonstrate mean-shift based color segmentation in spatial pyramid.\n"
		<< "Call:\n   " << argv[0] << " image\n"
		<< "This program allows you to set the spatial and color radius\n"
		<< "of the mean shift window as well as the number of pyramid reduction levels explored\n"
		<< endl;
}

//This colors the segmentations
static void floodFillPostprocess(Mat& img, const Scalar& colorDiff = Scalar::all(1))
{
	CV_Assert(!img.empty());
	RNG rng = theRNG();
	Mat mask(img.rows + 2, img.cols + 2, CV_8UC1, Scalar::all(0));
	for (int y = 0; y < img.rows; y++)
	{
		for (int x = 0; x < img.cols; x++)
		{
			if (mask.at<uchar>(y + 1, x + 1) == 0)
			{
				Scalar newVal(rng(256), rng(256), rng(256));
				floodFill(img, mask, Point(x, y), newVal, 0, colorDiff, colorDiff);
			}
		}
	}
}

string winName = "meanshift";
int spatialRad, colorRad, maxPyrLevel;
Mat img, res;

static void meanShiftSegmentation(int, void*)
{
	cout << "spatialRad=" << spatialRad << "; "
		<< "colorRad=" << colorRad << "; "
		<< "maxPyrLevel=" << maxPyrLevel << endl;
	pyrMeanShiftFiltering(img, res, spatialRad, colorRad, maxPyrLevel);
	//Mat imgGray;
	//cvtColor(res,imgGray,CV_RGB2GRAY);
	//imshow("res",res);
	floodFillPostprocess(res, Scalar::all(2));
	imshow(winName, res);
}

int main(int argc, char** argv)
{
	img = imread("shiyan5.tif");

	if (img.empty())
		return -1;

	spatialRad = 10;
	colorRad = 10;
	maxPyrLevel = 1;

	namedWindow(winName, WINDOW_AUTOSIZE);
	//imshow("img",img);	


	createTrackbar("spatialRad", winName, &spatialRad, 80, meanShiftSegmentation);
	createTrackbar("colorRad", winName, &colorRad, 60, meanShiftSegmentation);
	createTrackbar("maxPyrLevel", winName, &maxPyrLevel, 5, meanShiftSegmentation);

	meanShiftSegmentation(0, 0);
	//floodFillPostprocess( img, Scalar::all(2) );
	//imshow("img2",img);
	waitKey();
	return 0;
}

其中，opencv中关于Meanshift的源码如下：

/****************************************************************************************\
*                                         Meanshift                                      *
\****************************************************************************************/
 
CV_IMPL void
cvPyrMeanShiftFiltering( const CvArr* srcarr, CvArr* dstarr,
                         double sp0, double sr, int max_level,
                         CvTermCriteria termcrit )
{
    const int cn = 3;
    const int MAX_LEVELS = 8;
 
    if( (unsigned)max_level > (unsigned)MAX_LEVELS )
        CV_Error( CV_StsOutOfRange, "The number of pyramid levels is too large or negative" );
 
    std::vector<cv::Mat> src_pyramid(max_level+1);
    std::vector<cv::Mat> dst_pyramid(max_level+1);
    cv::Mat mask0;
    int i, j, level;
    //uchar* submask = 0;
 
    #define cdiff(ofs0) (tab[c0-dptr[ofs0]+255] + \
        tab[c1-dptr[(ofs0)+1]+255] + tab[c2-dptr[(ofs0)+2]+255] >= isr22) // color diffference  Note： it‘s >= not <
 
    double sr2 = sr * sr;// sr: ||x|| color window radius 
    int isr2 = cvRound(sr2), isr22 = MAX(isr2,16);
    int tab[768]; // 256*,lookup table for fast square distance  computation 
    cv::Mat src0 = cv::cvarrToMat(srcarr);
    cv::Mat dst0 = cv::cvarrToMat(dstarr);
 
    if( src0.type() != CV_8UC3 )
        CV_Error( CV_StsUnsupportedFormat, "Only 8-bit, 3-channel images are supported" );
 
    if( src0.type() != dst0.type() )
        CV_Error( CV_StsUnmatchedFormats, "The input and output images must have the same type" );
 
    if( src0.size() != dst0.size() )
        CV_Error( CV_StsUnmatchedSizes, "The input and output images must have the same size" );
 
    if( !(termcrit.type & CV_TERMCRIT_ITER) )
        termcrit.max_iter = 5; // if we don't have set the number of iterations ,it will iterate 5  times at most .   
    termcrit.max_iter = MAX(termcrit.max_iter,1);
    termcrit.max_iter = MIN(termcrit.max_iter,100); // max iteration is 100
    if( !(termcrit.type & CV_TERMCRIT_EPS) )
        termcrit.epsilon = 1.f; // the default epsilon 
    termcrit.epsilon = MAX(termcrit.epsilon, 0.f);
 
    for( i = 0; i < 768; i++ )
        tab[i] = (i - 255)*(i - 255); //tab[0]=255^2,tab[255]=0,tab[512]=255^2
 
    // 1. construct pyramid
    src_pyramid[0] = src0;
    dst_pyramid[0] = dst0;
    for( level = 1; level <= max_level; level++ )
    {
        src_pyramid[level].create( (src_pyramid[level-1].rows+1)/2,
                        (src_pyramid[level-1].cols+1)/2, src_pyramid[level-1].type() );// plus 1 to ensure both the clos and row are even 
        dst_pyramid[level].create( src_pyramid[level].rows,
                        src_pyramid[level].cols, src_pyramid[level].type() ); //
        cv::pyrDown( src_pyramid[level-1], src_pyramid[level], src_pyramid[level].size() );// first using Gaussian blure then  removing every even-numbered row and column
        //CV_CALL( cvResize( src_pyramid[level-1], src_pyramid[level], CV_INTER_AREA ));
    }
 
    mask0.create(src0.rows, src0.cols, CV_8UC1);  // memory buf for mask of every scale 
    //CV_CALL( submask = (uchar*)cvAlloc( (sp+2)*(sp+2) ));
 
    // 2. apply meanshift, starting from the pyramid top (i.e. the smallest layer)
    for( level = max_level; level >= 0; level-- )
    {
        cv::Mat src = src_pyramid[level]; // the current processing layer 
        cv::Size size = src.size();
        uchar* sptr = src.data; //  
        int sstep = (int)src.step;// all bytee in a row(including the padded pixels )
        uchar* mask = 0;
        int mstep = 0;
        uchar* dptr;
        int dstep;
        float sp = (float)(sp0 / (1 << level)); // spatial window radius,keep the contents which the kernel can cover are identical     
        sp = MAX( sp, 1 );
 
        if( level < max_level ) //except for the top level，先跳过，其实也可以忽略
        {
            cv::Size size1 = dst_pyramid[level+1].size(); // notice that layer level+1 has been  processed 
            cv::Mat m( size.height, size.width, CV_8UC1, mask0.data );  // Note that the memory to which .data point don't have the same size as the m. Howerver,the former will alway large or equal to the later. We just use the mask0 as an big enough container that only allocate one time. 
            dstep = (int)dst_pyramid[level+1].step;//
            dptr = dst_pyramid[level+1].data + dstep + cn; //jump the first row and first cloumn(including 3 channels) 
            mstep = (int)m.step;
            mask = m.data + mstep;//jump the first  row
            //cvResize( dst_pyramid[level+1], dst_pyramid[level], CV_INTER_CUBIC );
            cv::pyrUp( dst_pyramid[level+1], dst_pyramid[level], dst_pyramid[level].size() ); // 这一行有意义吗？完全可以去掉啊？？？？？
			// Note：the image is first upsized with new even rows and cols filled with 0s ,thereafter the missing values is approximated with the Gaussian convolution.
            m.setTo(cv::Scalar::all(0));
 
            for( i = 1; i < size1.height-1; i++, dptr += dstep - (size1.width-2)*3, mask += mstep*2 )
			// dptr + dstep + width*3*2 : jump to the second point of the next row 
			// mstep*2  : 2 row in mask is correspondence to 1 rows in dst_pyramid[level+1]; 
                for( j = 1; j < size1.width-1; j++, dptr += cn )//jump the first and the last column,Notice that before jump to the next row,the dptr have pointed to the last column
                {
                    int c0 = dptr[0], c1 = dptr[1], c2 = dptr[2];
                    mask[j*2 - 1] = cdiff(-3) || cdiff(3) || cdiff(-dstep-3) || cdiff(-dstep) ||
                        cdiff(-dstep+3) || cdiff(dstep-3) || cdiff(dstep) || cdiff(dstep+3);//if any of it's 8 neigbours doesn't similar with it in color sapce,it should be proceed  which labeled with 1
                }
            }
 
            cv::dilate( m, m, cv::Mat() );
            mask = m.data;
        }
 
        dptr = dst_pyramid[level].data;
        dstep = (int)dst_pyramid[level].step;
 
        for( i = 0; i < size.height; i++, sptr += sstep - size.width*3,// jumpping the padding bytes in the ends of each row of src 
                                          dptr += dstep - size.width*3,// jumpping the padding bytes in the ends of each row of src
                                          mask += mstep ) // the offset of mask
        {
            for( j = 0; j < size.width; j++, sptr += 3, dptr += 3 )
            {
                int x0 = j, y0 = i, x1, y1, iter;// x1，y1： the position of mode 
                int c0, c1, c2;
 
                if( mask && !mask[j] ) // 可以忽略，mask !=0：except for the top level, mask[j]==0: similar to all of it's 8 neighbors
                    continue;
 
                c0 = sptr[0], c1 = sptr[1], c2 = sptr[2]; // b,g,r or L,u,v of the central position 
 
                // iterate meanshift procedure，核心部分
                for( iter = 0; iter < termcrit.max_iter; iter++ )
                {
                    uchar* ptr;
                    int x, y, count = 0; // count : count the number of pixels whithin the color support in a square window  
                    int minx, miny, maxx, maxy;
                    int s0 = 0, s1 = 0, s2 = 0, sx = 0, sy = 0;//
                    double icount;
                    int stop_flag;
 
                    //mean shift: process pixels in window (p-sigmaSp)x(p+sigmaSp)
					//a square window of (2*sp+1)*(2*sp+1)
                    minx = cvRound(x0 - sp); minx = MAX(minx, 0);//ensure minx doesn't less than the first column 
                    miny = cvRound(y0 - sp); miny = MAX(miny, 0);//ensure minx doesn't less than the first row
                    maxx = cvRound(x0 + sp); maxx = MIN(maxx, size.width-1);
                    maxy = cvRound(y0 + sp); maxy = MIN(maxy, size.height-1);
                    ptr = sptr + (miny - i)*sstep + (minx - j)*3;// move to (minx,miny)
 
                    for( y = miny; y <= maxy; y++, ptr += sstep - (maxx-minx+1)*3 ) 
                    {
                        int row_count = 0; // count the number of pixels whthin the color support in a row
                        x = minx;
                        #if CV_ENABLE_UNROLLED //展开，先跳过
                        for( ; x + 3 <= maxx; x += 4, ptr += 12 )//process 4 colums every cycle to reduce the cycle times. it will be  faster than loop every column
                        {
                            int t0 = ptr[0], t1 = ptr[1], t2 = ptr[2];
                            if( tab[t0-c0+255] + tab[t1-c1+255] + tab[t2-c2+255] <= isr2 )
                            {
                                s0 += t0; s1 += t1; s2 += t2;
                                sx += x; row_count++;
                            }
                            t0 = ptr[3], t1 = ptr[4], t2 = ptr[5];
                            if( tab[t0-c0+255] + tab[t1-c1+255] + tab[t2-c2+255] <= isr2 )
                            {
                                s0 += t0; s1 += t1; s2 += t2;
                                sx += x+1; row_count++;
                            }
                            t0 = ptr[6], t1 = ptr[7], t2 = ptr[8];
                            if( tab[t0-c0+255] + tab[t1-c1+255] + tab[t2-c2+255] <= isr2 )
                            {
                                s0 += t0; s1 += t1; s2 += t2;
                                sx += x+2; row_count++;
                            }
                            t0 = ptr[9], t1 = ptr[10], t2 = ptr[11];
                            if( tab[t0-c0+255] + tab[t1-c1+255] + tab[t2-c2+255] <= isr2 )
                            {
                                s0 += t0; s1 += t1; s2 += t2;
                                sx += x+3; row_count++;
                            }
                        }
                        #endif
                        for( ; x <= maxx; x++, ptr += 3 ) // if we have defined CV_ENABLE_UNROLLED then processing the remain (maxx+1)%4 cloumns otherwise processing all of columns
                        {
                            int t0 = ptr[0], t1 = ptr[1], t2 = ptr[2]; //b,g,r
                            if( tab[t0-c0+255] + tab[t1-c1+255] + tab[t2-c2+255] <= isr2 )//truncate with isr2 to have finite support(color similarity ) 
                            {  // the value [-255,255] first map to [0 510],then map to the squre distance to central position (i，j)
                                s0 += t0; s1 += t1; s2 += t2;
                                sx += x; 
								row_count++;
                            }
                        }
                        count += row_count;
                        sy += y*row_count;
                    }
 
                    if( count == 0 )
                        break;
 
                    icount = 1./count;
                    x1 = cvRound(sx*icount); // x mean 
                    y1 = cvRound(sy*icount); // Y mean 
                    s0 = cvRound(s0*icount); // b mean 
                    s1 = cvRound(s1*icount); // g mean 
                    s2 = cvRound(s2*icount); // r mean
 
                    stop_flag = (x0 == x1 && y0 == y1) || abs(x1-x0) + abs(y1-y0) + // converge to (i，j)
                        tab[s0 - c0 + 255] + tab[s1 - c1 + 255] +
                        tab[s2 - c2 + 255] <= termcrit.epsilon; //movement can be ignored
 
                    x0 = x1; y0 = y1;
                    c0 = s0; c1 = s1; c2 = s2;// Notice；the center color was replaced by the filtered value  
 
                    if( stop_flag )
                        break;
                }
				
                dptr[0] = (uchar)c0; //assign the filtered value of converging point to the starting point
                dptr[1] = (uchar)c1;
                dptr[2] = (uchar)c2;
            }
        }
    }
}
 
void cv::pyrMeanShiftFiltering( InputArray _src, OutputArray _dst,
                                double sp, double sr, int maxLevel,
                                TermCriteria termcrit )
{
    Mat src = _src.getMat();
 
    if( src.empty() )
        return;
 
    _dst.create( src.size(), src.type() );
    CvMat c_src = src, c_dst = _dst.getMat();
    cvPyrMeanShiftFiltering( &c_src, &c_dst, sp, sr, maxLevel, termcrit );
}

效果图

在这里插入图片描述

伴君

关注

1
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
mean-shift均值偏移算法

背景Meanshift算法是Fukunaga于1975年提出的，其基本思想是利用概率密度的梯度爬升来寻找局部最优。1995年，YizongCheng针对离x越近的采样点对x周围的统计特性越有效，定义了一族核函数，并根据所有样本点的重要性不足，设定了一个权重系数，扩大了Meanshift的使用范围。原理给定d维空间中的n个样本点xix_ixi(iii = 1,…,nnn),在x点的Meanshift向量的基本形式定义为：其中，ShS_hSh是一个半径为h的高维球区域，k表示n个样本点中由k个点落
复制链接

扫一扫