HOG Features

听雨听风眠

Article link: https://blog.csdn.net/hgz_gs/article/details/51957651

Principles: http://blog.csdn.net/zouxy09/article/details/7929348

http://www.xuebuyuan.com/582349.html

SVM: http://blog.csdn.net/hgz_gs/article/details/51942480

Pedestrian detection code (HOG+SVM): http://blog.csdn.net/pb09013037/article/details/41256945

http://blog.csdn.net/qianqing13579/article/details/46509037

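Pedestrian Detection Datasets

MIT database

This is one of the earliest public pedestrian databases. It contains 924 pedestrian images (PPM format, 64x128 pixels), with a shoulder-to-feet distance of about 80 pixels. It covers only frontal and rear views, has no negative samples, and is not split into training and test sets. Using "HOG+SVM", Dalal et al. reached a detection accuracy close to 100% on this database.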

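INRIA database

This is currently the most widely used database for pedestrian detection in static images. It provides the original images together with annotation files. The training set has 614 positive images (containing 2416 pedestrians) and 1218 negative images; the test set has 288 positive images (containing 1126 pedestrians) and 453 negative images. Most people in the images are standing and taller than 100 pixels, and some annotations may be incorrect. The images come mainly from GRAZ-01, personal photos, and Google, so their quality is fairly high. Under Windows XP some training or test images do not display properly, but OpenCV can read and display them normally.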

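Daimler pedestrian database

This database was captured with a vehicle-mounted camera and is divided into a detection dataset and a classification dataset. The training part of the detection dataset has 15560 (3915x4) positive images at each of the sizes 18x36 and 48x96, with a minimum pedestrian height of 72 pixels, and 6744 negative images (640x480 or 360x288). The test set is a video of about 27 minutes (640x480 resolution) with 21790 frames containing 56492 pedestrians. The classification database has three training sets and two test sets; each set has 4800 pedestrian images and 5000 non-pedestrian images, all of size 18x36. There are also 3 auxiliary non-pedestrian image sets with 1200 images each.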

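Caltech pedestrian database

This is a relatively large pedestrian database, recorded with a vehicle-mounted camera: about 10 hours of video at 640x480 resolution and 30 frames per second. About 250,000 frames (roughly 137 minutes) are annotated, with 350000 bounding boxes and 2300 pedestrians; the temporal correspondence between boxes and their occlusion are annotated as well. The dataset is divided into set00~set10, where set00~set05 form the training set and set06~set10 form the test set (whose annotations have not been released). There are three evaluation protocols: (1) train on external data and test on set06~set10; (2) 6-fold cross-validation: train on 5 folds, test on the remaining one, tune the parameters, and finally report the performance on the training set; (3) train on set00~set05 and test on set06~set10. Because the test annotations are not public, results have to be submitted to Piotr Dollár. For submission, every 30th frame is evaluated and the results are saved as text files (named I00029.txt, I00059.txt, ...); each line of a file describes one detected pedestrian in the format "[left, top, width, height, score]". If no pedestrian is detected, the file is empty. The database also comes with a Matlab toolbox that includes tools for reading the video annotations, plotting ROC (Receiver Operating Characteristic) curves, and non-maximum suppression.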
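
The result layout described above can be sketched as follows. This is only an illustration under stated assumptions: frame indices are taken to be zero-based (so every 30th frame gives I00029.txt, I00059.txt, ...), fields are assumed to be comma-separated, and the `Detection` struct and all function names are hypothetical rather than part of the Caltech toolbox.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical container for one detection (not part of any Caltech tool).
struct Detection {
    double left, top, width, height, score;
};

// File name for a zero-based frame index, e.g. 29 -> "I00029.txt".
std::string resultFileName(int frameIndex) {
    char name[32];
    std::snprintf(name, sizeof(name), "I%05d.txt", frameIndex);
    return name;
}

// One text line per detection: left,top,width,height,score
// (the comma separator is an assumption).
std::string formatDetection(const Detection& d) {
    char line[128];
    std::snprintf(line, sizeof(line), "%.2f,%.2f,%.2f,%.2f,%.4f",
                  d.left, d.top, d.width, d.height, d.score);
    return line;
}

// Frame indices that get a result file: every 30th frame, starting at 29.
std::vector<int> evaluatedFrames(int frameCount) {
    std::vector<int> frames;
    for (int f = 29; f < frameCount; f += 30) frames.push_back(f);
    return frames;
}
```

A frame without detections would simply produce an empty file with the corresponding name.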

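TUD pedestrian database

The TUD pedestrian database was built to evaluate the role of motion information in pedestrian detection, and provides image pairs so that optical flow can be computed. The training set has 1092 positive image pairs (720x576, containing 1776 pedestrians) and 192 negative (non-pedestrian) image pairs (85 from a handheld camera, 107 from a vehicle-mounted camera); another 26 pairs from a vehicle-mounted camera (containing 183 pedestrians) are provided as an additional training set. The test set has 508 image pairs (1 second apart, 640x480 resolution) with 1326 pedestrians in total. Andriluka et al. also built a database to validate their pedestrian detection approach that combines detection and tracking. Its training set provides bounding boxes, segmentation masks, and the size and position of each body part (feet, lower legs, upper legs, torso, and head). Its test set consists of 250 images (containing 311 fully visible pedestrians) for evaluating detectors and 2 video sequences (TUD-Campus and TUD-Crossing) for evaluating trackers.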

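NICTA pedestrian database

This is a relatively large pedestrian database of static images: 25551 images each containing a single person and 5207 high-resolution non-pedestrian images. The database is already split into training and test sets, which makes it easy to compare classifiers. Overett et al. used "RealBoost+Haar" to evaluate how translation, rotation, aspect ratio, and other properties of the training samples affect classification performance: (1) pedestrians should be at least 40 pixels tall; (2) at low resolution, for Haar features, increasing the sample width helps more than increasing the sample height; (3) training images should be larger than the actual pedestrian, that is, background context improves performance; (4) translating the training samples improves detection performance, while rotating them helps little. These conclusions are a useful guide for building pedestrian databases.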

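ETH pedestrian database

Ess et al. built a stereo-vision pedestrian database for research on multi-person detection and tracking. It was recorded with a pair of vehicle-mounted AVT Marlin F033C cameras at 640x480 resolution and 13-14 fps. Calibration data and pedestrian annotations are provided, and depth is obtained by belief propagation.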

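CVC pedestrian database

This database currently contains three datasets (CVC-01, CVC-02, and CVC-Virtual) and is aimed at pedestrian detection for driver assistance. CVC-01 [Geronimo, 2007] has 1000 pedestrian samples and 6175 non-pedestrian samples (taken from road regions of the images, unlike some pedestrian databases whose non-pedestrian samples are natural images such as sky, beaches, and trees). CVC-02 contains three subsets (CVC-02-CG, CVC-02-Classification, and CVC-02-System), which target three different tasks in pedestrian detection: region-of-interest generation, classification, and system performance evaluation. The images were captured with a Bumblebee2 color stereo vision system at 640x480 resolution with a 6 mm focal length; pedestrians 0~50 m from the camera are annotated, and the smallest pedestrian image is 12x24. CVC-02-CG targets candidate region generation and has 100 color images with depth and 3D point information. CVC-02-Classification targets pedestrian classification: the training set has 1016 positive and 7650 negative samples, and the test set is split into classification of cropped windows (570 pedestrians, 7500 non-pedestrians) and detection on whole images (250 images containing 587 pedestrians in total). CVC-02-System is used for system performance evaluation and contains 15 video sequences (4364 frames) with 7983 pedestrians. CVC-Virtual is a virtual pedestrian dataset generated with the Half-Life 2 graphics engine; it contains 1678 virtual pedestrians and 2048 non-pedestrian images for testing.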

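USC pedestrian database

This database contains three datasets (USC-A, USC-B, and USC-C), with annotations provided in XML format. USC-A [Wu, 2005] consists of 205 images from the web with 313 standing pedestrians, seen from the front or the back and not occluding each other. USC-B comes mainly from the CAVIAR video corpus and includes pedestrians from various viewpoints, some of them occluding each other: 54 images with 271 pedestrians. USC-C has 100 images from the web with 232 pedestrians (multiple viewpoints) who do not occlude each other.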

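Next, here are two ways to implement pedestrian detection with HOG+SVM in OpenCV.

Note: the programs were run with VS2013 + OpenCV 3.0.

Method 1

Let's start with the first method. Here is the code: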

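// Headers needed by the code below. FILEPATH is assumed to be defined
// elsewhere as the data directory; it is not shown in the original post.
#include <opencv2/opencv.hpp>
#include <cfloat>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace cv;
using namespace cv::ml;
using namespace std;

/// HOG+SVM recognition, mode 2 ///
void Train()
{
    // Read the training image paths and class labels
    vector<string> imagePath;
    vector<int> imageClass;
    int numberOfLine = 0;
    string buffer;
    ifstream trainingData(string(FILEPATH) + "TrainData.txt");

    // Loop on getline() itself instead of eof(), so a failed read is never processed
    while (getline(trainingData, buffer))
    {
        if (!buffer.empty())
        {
            ++numberOfLine;
            if (numberOfLine % 2 == 0)
            {
                // even lines: class label
                imageClass.push_back(atoi(buffer.c_str()));
            }
            else
            {
                // odd lines: image path
                imagePath.push_back(buffer);
            }
        }
    }

    // Close the file
    trainingData.close();


    // Compute the HOG features of the samples
    // Feature matrix: one row per sample
    const int descriptorLength = 3780; // HOG length for a 64x128 window with the parameters below
    int numberOfSample = numberOfLine / 2;
    Mat featureVectorOfSample(numberOfSample, descriptorLength, CV_32FC1);

    // Class labels of the samples
    Mat classOfSample(numberOfSample, 1, CV_32SC1);

    Mat trainImage;
    HOGDescriptor hog(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9);

    // Rows are filled in order; images that fail to load are skipped,
    // so no uninitialized row ends up in the training data
    int validCount = 0;
    for (size_t i = 0; i < imagePath.size() && i < imageClass.size(); ++i)
    {
        // Load the image
        Mat src = imread(imagePath[i], -1);
        if (src.empty())
        {
            cout << "can not load the image:" << imagePath[i] << endl;
            continue;
        }
        //cout << "processing:" << imagePath[i] << endl;

        // Normalize the size
        resize(src, trainImage, Size(64, 128));

        // Extract the HOG features
        vector<float> descriptors;
        double time1 = getTickCount();
        hog.compute(trainImage, descriptors); // a window stride (winStride) can be passed here if the image is larger than 64x128
        double time2 = getTickCount();
        double elapse_ms = (time2 - time1) * 1000 / getTickFrequency();
        //cout << "HOG dimensions:" << descriptors.size() << endl;
        //cout << "Compute time:" << elapse_ms << endl;

        // Store the descriptor in the feature matrix
        for (size_t j = 0; j < descriptors.size() && j < (size_t)descriptorLength; ++j)
        {
            featureVectorOfSample.at<float>(validCount, (int)j) = descriptors[j];
        }

        // Store the label. Note: the label type must be int
        classOfSample.at<int>(validCount, 0) = imageClass[i];
        ++validCount;
    }

    // Keep only the rows that were actually filled
    featureVectorOfSample = featureVectorOfSample.rowRange(0, validCount);
    classOfSample = classOfSample.rowRange(0, validCount);


    /// Train the SVM classifier ///
    // Set the parameters; note the use of Ptr
    Ptr<SVM> svm = SVM::create();
    svm->setType(SVM::C_SVC);
    svm->setKernel(SVM::LINEAR); // must be a linear SVM: the HOGDescriptor detection functions only support a linear decision function
    svm->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER, 1000, FLT_EPSILON));

    // Train
    svm->train(featureVectorOfSample, ROW_SAMPLE, classOfSample);

    // Save the classifier (SVM parameters, support vectors, alpha and rho)
    svm->save(string(FILEPATH) + "Classifier.xml");

    /*
    The XML file written after training contains an array of support vectors, an array alpha, and a floating-point number rho.
    Multiplying alpha by the support vectors (alpha * supportVector) gives a row vector; multiply it by -1 and append rho as its last element.
    The result is a detector that can directly replace OpenCV's default pedestrian detector via cv::HOGDescriptor::setSVMDetector().
    */
    // Support vectors: the matrix is CV_32F by default
    Mat supportVector = svm->getSupportVectors();

    // alpha and rho
    Mat alpha;   // Lagrange multiplier of each support vector; float64 by default
    Mat svIndex; // indices of the support vectors
    float rho = (float)svm->getDecisionFunction(0, alpha, svIndex);

    // Important: convert alpha to 32-bit float so it can be multiplied with the support vectors
    Mat alpha2;
    alpha.convertTo(alpha2, CV_32FC1);

    // Product of the two matrices: a 1 x descriptorLength row vector
    Mat result = alpha2 * supportVector;

    // Why multiply by -1?
    // svm.predict evaluates alpha*sv*x - rho and treats a negative value as the positive class,
    // while the HOG detection function evaluates rho + w*x, so w is alpha*sv multiplied by -1
    for (int i = 0; i < result.cols; ++i)
        result.at<float>(0, i) *= -1;

    // Save the detector to a file for HOG detection.
    // These are the actual parameters (w) of the decision function; HOG can use them directly
    FILE *fp = fopen((string(FILEPATH) + "HOG_SVM.txt").c_str(), "wb");
    if (!fp)
    {
        fprintf(stderr, "ERROR: could not open HOG_SVM.txt for writing\n");
        return;
    }
    for (int i = 0; i < result.cols; i++)
    {
        fprintf(fp, "%f \n", result.at<float>(0, i));
    }
    fprintf(fp, "%f", rho);

    fclose(fp);

}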

// Detect pedestrians with the trained classifier
void Detect()
{
    Mat img;
    FILE* f = 0;
    char _filename[1024];

    // Open the list of test image paths
    f = fopen((string(FILEPATH) + "TestData.txt").c_str(), "rt");
    if (!f)
    {
        fprintf(stderr, "ERROR: the specified file could not be loaded\n");
        return;
    }


    // Load the trained decision-function parameters (note: this is not the classifier file written by svm->save)
    vector<float> detector;
    ifstream fileIn(string(FILEPATH) + "HOG_SVM.txt", ios::in);
    float val = 0.0f;
    // Read until extraction fails, so the last value is never pushed twice
    while (fileIn >> val)
    {
        detector.push_back(val);
    }
    fileIn.close();

    // Set up HOG
    HOGDescriptor hog;
    hog.setSVMDetector(detector); // use our own trained detector
    //hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector()); // or use the detector trained for the CVPR 2005 paper, in which case Train() is not needed
    namedWindow("people detector", 1);

    // Process the images
    for (;;)
    {
        // Read the next file name
        char* filename = _filename;
        if (f)
        {
            if (!fgets(filename, (int)sizeof(_filename)-2, f))
                break;
            //while(*filename && isspace(*filename))
            //  ++filename;
            if (filename[0] == '#')
                continue;

            // Strip trailing whitespace
            int l = (int)strlen(filename);
            while (l > 0 && isspace(filename[l - 1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if (!img.data)
            continue;

        fflush(stdout);
        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        // Multi-scale detection
        hog.detectMultiScale(img, found, 0, Size(8, 8), Size(32, 32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("detection time = %gms\n", t*1000. / cv::getTickFrequency());
        size_t i, j;

        // Drop rectangles that lie entirely inside another one; keep the larger
        for (i = 0; i < found.size(); i++)
        {
            Rect r = found[i];
            for (j = 0; j < found.size(); j++)
                if (j != i && (r & found[j]) == r)
                    break;
            if (j == found.size())
                found_filtered.push_back(r);
        }

        // Shrink the rectangles slightly
        for (i = 0; i < found_filtered.size(); i++)
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects.
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0, 255, 0), 3);
        }

        imshow("people detector", img);
        int c = waitKey(0) & 255;
        if (c == 'q' || c == 'Q' || !f)
            break;
    }
    if (f)
        fclose(f);
    return;
}

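void HOG_SVM2()
{
    // With the default detector from the CVPR 2005 paper, Train() is not needed; call Detect() directly
    Train();
    Detect();
}

int main()
{
    // HOG+SVM mode 1: output the class label directly
    //HOG_SVM1();

    // HOG+SVM mode 2: output the rectangles of the targets found in the image
    HOG_SVM2();

    return 0;
}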
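
The filtering step in Detect(), which drops every rectangle that lies entirely inside another one, can be tried without OpenCV. The sketch below is only an illustration: `Box` is a hypothetical stand-in for cv::Rect and the function names are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for cv::Rect (hypothetical; the listing above uses cv::Rect).
struct Box {
    int x, y, width, height;
};

// True when `inner` lies entirely inside `outer` (identical boxes count as inside).
bool isInside(const Box& inner, const Box& outer) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.width <= outer.x + outer.width &&
           inner.y + inner.height <= outer.y + outer.height;
}

// Keep only the boxes that are not contained in some other box.
std::vector<Box> dropNestedBoxes(const std::vector<Box>& found) {
    std::vector<Box> kept;
    for (std::size_t i = 0; i < found.size(); ++i) {
        bool nested = false;
        for (std::size_t j = 0; j < found.size() && !nested; ++j)
            nested = (j != i) && isInside(found[i], found[j]);
        if (!nested) kept.push_back(found[i]);
    }
    return kept;
}
```

As with the `(r & found[j]) == r` test in the listing, two identical rectangles each count as nested in the other, so both are dropped.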

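A note on TrainData.txt: this file lists the path and class of every sample. As Train() reads it, each image path is on its own line, followed by a line holding that image's class label.
[Figure: sample contents of TrainData.txt]
To collect the paths of the positive and negative samples into a txt file you can use a batch file; I have uploaded mine to CSDN for download.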
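
The alternating layout that Train() expects (a path line, then a label line) can be parsed in isolation as sketched below. This is a minimal sketch using only the standard library; the function names are hypothetical, and skipping blank lines and tolerating Windows line endings are assumptions modeled on the loop in Train().

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads the next non-empty line; returns false at end of input.
bool nextNonEmptyLine(std::istream& in, std::string& line) {
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF files
        if (!line.empty()) return true;
    }
    return false;
}

// Parses "path line, label line" pairs into (path, label) samples.
// A trailing path without a label line is ignored.
std::vector<std::pair<std::string, int>> parseTrainList(std::istream& in) {
    std::vector<std::pair<std::string, int>> samples;
    std::string path, label;
    while (nextNonEmptyLine(in, path) && nextNonEmptyLine(in, label))
        samples.emplace_back(path, std::atoi(label.c_str()));
    return samples;
}
```

Reading pairs this way keeps paths and labels aligned even if the file ends with an unmatched path line.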

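Use at least 1000 positive and negative samples; with too few, the results are poor. HOG_SVM.txt holds the parameters of the decision function, which can be passed directly to HOG.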
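
How the contents of HOG_SVM.txt are derived in Train() can be shown with plain vectors: the weights are the negated sum of alpha-weighted support vectors, and rho is appended as the last element. The sketch below only illustrates that arithmetic; `buildHogDetector` is a hypothetical helper, not an OpenCV function.

```cpp
#include <cstddef>
#include <vector>

// Builds the vector expected by HOGDescriptor::setSVMDetector from a linear SVM:
// w = -(sum_i alpha[i] * supportVectors[i]), with rho appended as the last element.
std::vector<float> buildHogDetector(const std::vector<float>& alpha,
                                    const std::vector<std::vector<float>>& supportVectors,
                                    float rho) {
    const std::size_t dim = supportVectors.empty() ? 0 : supportVectors[0].size();
    std::vector<float> detector(dim + 1, 0.0f);
    for (std::size_t i = 0; i < supportVectors.size() && i < alpha.size(); ++i)
        for (std::size_t k = 0; k < dim; ++k)
            detector[k] -= alpha[i] * supportVectors[i][k];
    detector[dim] = rho; // bias term, one element beyond the descriptor length
    return detector;
}
```

For 3780-dimensional HOG features the resulting vector has 3781 elements, which is the "descriptor size + 1" case that setSVMDetector accepts.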
Here are my test results:
[Figure: two detection results with the self-trained classifier]

The detection results are reasonably good.
I have also uploaded the test images for download.

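Of course, you do not have to train your own classifier: you can use the one that ships with OpenCV, which was trained by the authors of the CVPR 2005 paper. Here are its results:
[Figure: two detection results with OpenCV's built-in classifier]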

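As the images show, OpenCV's built-in classifier performs better than the one I trained. The main reasons are probably the following:
1. Too few training samples: I had only a little over 900 positive and negative samples.
2. The positive images are not sharp enough, which introduces considerable error into the extracted features.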

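Recently some readers ran into problems running the program from this post and asked me to look at it. I could not tell what went wrong, so I packaged the whole program together with its test data and uploaded it to CSDN.
After unpacking, put the "Pedestrians64x128" folder in the root of drive D, create a new project from HOG.cpp, and it should run as is.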

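Note: the environment is VS2013 + OpenCV 3.0.0, Release build.

Method 2

The second method is the traditional one: extract features from each test sample, then classify them with the trained classifier. Code: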

/// HOG+SVM recognition, mode 1 ///
void HOG_SVM1()
{
    // Read the training image paths and class labels
    vector<string> imagePath;
    vector<int> imageClass;
    int numberOfLine = 0;
    string buffer;
    ifstream trainingData(string(FILEPATH) + "TrainData.txt", ios::in);

    // Loop on getline() itself instead of eof(), so a failed read is never processed
    while (getline(trainingData, buffer))
    {
        if (!buffer.empty())
        {
            ++numberOfLine;
            if (numberOfLine % 2 == 0)
            {
                // even lines: class label
                imageClass.push_back(atoi(buffer.c_str()));
            }
            else
            {
                // odd lines: image path
                imagePath.push_back(buffer);
            }
        }
    }
    trainingData.close();


    // Compute the HOG features of the samples
    // Feature matrix: one row per sample
    const int descriptorLength = 3780; // HOG length for a 64x128 window with the parameters below
    int numberOfSample = numberOfLine / 2;
    Mat featureVectorOfSample(numberOfSample, descriptorLength, CV_32FC1);

    // Class labels of the samples
    Mat classOfSample(numberOfSample, 1, CV_32SC1);

    // The same descriptor settings must be used for training and test images
    HOGDescriptor hog(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9);

    // Compute the HOG features of the training samples; images that fail to load are skipped
    int validCount = 0;
    for (size_t i = 0; i < imagePath.size() && i < imageClass.size(); ++i)
    {
        // Load the image
        Mat src = imread(imagePath[i], -1);
        if (src.empty())
        {
            cout << "can not load the image:" << imagePath[i] << endl;
            continue;
        }
        cout << "processing" << imagePath[i] << endl;

        // Resize
        Mat trainImage;
        resize(src, trainImage, Size(64, 128));

        // Extract the HOG features
        vector<float> descriptors;
        hog.compute(trainImage, descriptors); // a window stride (winStride) can be passed here if the image is larger than 64x128
        cout << "HOG dimensions:" << descriptors.size() << endl;

        // Store the descriptor in the feature matrix
        for (size_t j = 0; j < descriptors.size() && j < (size_t)descriptorLength; ++j)
        {
            featureVectorOfSample.at<float>(validCount, (int)j) = descriptors[j];
        }

        // Store the label. Note: the label type must be int
        classOfSample.at<int>(validCount, 0) = imageClass[i];
        ++validCount;
    }

    // Keep only the rows that were actually filled
    featureVectorOfSample = featureVectorOfSample.rowRange(0, validCount);
    classOfSample = classOfSample.rowRange(0, validCount);


    /// Train the SVM classifier ///
    // Set the parameters (following the OpenCV 3.0 demo)
    Ptr<SVM> svm = SVM::create();
    svm->setKernel(SVM::RBF);
    svm->setType(SVM::C_SVC);
    svm->setC(10);
    svm->setCoef0(1.0);
    svm->setP(1.0);
    svm->setNu(0.5);
    svm->setTermCriteria(TermCriteria(TermCriteria::EPS, 1000, FLT_EPSILON));

    // Train
    svm->train(featureVectorOfSample, ROW_SAMPLE, classOfSample);

    // Save the classifier
    svm->save("Classifier.xml");


    /// Classify the test images with the trained classifier ///
    vector<string> testImagePath;
    ifstream testData(string(FILEPATH) + "TestData.txt", ios::in);
    while (getline(testData, buffer))
    {
        if (!buffer.empty())
            testImagePath.push_back(buffer);
    }
    testData.close();

    ofstream fileOfPredictResult(string(FILEPATH) + "PredictResult.txt"); // final classification results
    for (size_t i = 0; i < testImagePath.size(); ++i)
    {
        // Load the test image
        Mat src = imread(testImagePath[i], -1);
        if (src.empty())
        {
            cout << "Can not load the image:" << testImagePath[i] << endl;
            continue;
        }

        // Resize to the training window size. The original used 64x64 here, which
        // yields 1764-dimensional features that do not match the 3780-dimensional training features
        Mat testImage;
        resize(src, testImage, Size(64, 128));

        // Extract the HOG features of the test image
        vector<float> descriptors;
        hog.compute(testImage, descriptors);
        cout << "HOG dimensions:" << descriptors.size() << endl;

        Mat featureVectorOfTestImage(1, (int)descriptors.size(), CV_32FC1);
        for (size_t j = 0; j < descriptors.size(); ++j)
        {
            featureVectorOfTestImage.at<float>(0, (int)j) = descriptors[j];
        }

        // Classify the test image and write the result to the file
        int predictResult = static_cast<int>(svm->predict(featureVectorOfTestImage));
        fileOfPredictResult << testImagePath[i] << " " << predictResult << "\n";

    }
    fileOfPredictResult.close();
}

int main()
{
    // HOG+SVM mode 1: output the class label directly
    HOG_SVM1();

    // HOG+SVM mode 2: output the rectangles of the targets found in the image
    //HOG_SVM2();

    return 0;
}
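
The feature matrix above is allocated with 3780 columns, which is the HOG descriptor length for a 64x128 window with 16x16 blocks, an 8x8 block stride, 8x8 cells, and 9 bins. A different window size gives a different length, so training and test images must use the same window. The sketch below reproduces the formula used by OpenCV's HOGDescriptor::getDescriptorSize() with plain integers; `Size2` and `hogDescriptorSize` are hypothetical names for this illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for cv::Size.
struct Size2 {
    int width, height;
};

// Descriptor length = bins * cells per block * block positions per window.
std::size_t hogDescriptorSize(Size2 win, Size2 block, Size2 stride, Size2 cell, int nbins) {
    const std::size_t cellsPerBlock =
        static_cast<std::size_t>(block.width / cell.width) * (block.height / cell.height);
    const std::size_t blocksX = (win.width - block.width) / stride.width + 1;
    const std::size_t blocksY = (win.height - block.height) / stride.height + 1;
    return static_cast<std::size_t>(nbins) * cellsPerBlock * blocksX * blocksY;
}
```

For the 64x128 window this is 9 x 4 x 7 x 15 = 3780, while a 64x64 window gives 9 x 4 x 7 x 7 = 1764.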

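Bootstrapping

Improving training with additional samples:

	1. Run the pedestrian detector on negative images, crop out the falsely detected rectangles, and add them as negative samples for retraining. This lowers the false detection rate and is known as bootstrapping.

	2. Run the pedestrian detector on positive images, mark the pedestrians that were missed with rectangles, and retrain with them.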
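
The first step (mining false detections as extra negatives) can be illustrated with a deliberately tiny toy model. This is a sketch under heavy assumptions: each sample is reduced to a single score, "training" just places a threshold halfway between the class means, and all names are hypothetical. A real pipeline would retrain the HOG+SVM detector on cropped false-positive windows instead.

```cpp
#include <cstddef>
#include <vector>

// Toy 1-D classifier: a sample is called "pedestrian" when its score exceeds the threshold.
struct ThresholdClassifier {
    double threshold;
    bool predict(double score) const { return score > threshold; }
};

double meanOf(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return v.empty() ? 0.0 : sum / static_cast<double>(v.size());
}

// Toy "training": put the threshold halfway between the two class means.
ThresholdClassifier trainToy(const std::vector<double>& positives,
                             const std::vector<double>& negatives) {
    return ThresholdClassifier{0.5 * (meanOf(positives) + meanOf(negatives))};
}

// Number of known-negative samples that are wrongly classified as positive.
int countFalsePositives(const ThresholdClassifier& c, const std::vector<double>& negativePool) {
    int n = 0;
    for (double x : negativePool)
        if (c.predict(x)) ++n;
    return n;
}

// Bootstrapping: run the current classifier on known-negative data, add every new
// false positive to the negative training set, retrain, and repeat.
ThresholdClassifier bootstrap(const std::vector<double>& positives,
                              std::vector<double> negatives,
                              const std::vector<double>& negativePool,
                              int maxRounds) {
    ThresholdClassifier clf = trainToy(positives, negatives);
    std::vector<bool> alreadyAdded(negativePool.size(), false);
    for (int round = 0; round < maxRounds; ++round) {
        bool addedAny = false;
        for (std::size_t i = 0; i < negativePool.size(); ++i) {
            if (!alreadyAdded[i] && clf.predict(negativePool[i])) {
                negatives.push_back(negativePool[i]); // hard negative
                alreadyAdded[i] = true;
                addedAny = true;
            }
        }
        if (!addedAny) break; // no new false positives: stop
        clf = trainToy(positives, negatives);
    }
    return clf;
}
```

In this toy setting each round can only raise the threshold, so the number of false positives on the negative pool does not increase; whether real retraining behaves as nicely depends on the data.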

Output the accuracy:

Output the miss rate:

Output the false detection rate:
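
One way to compute these three figures from a list of true and predicted labels is sketched below. The definitions are assumptions, since the post does not spell them out: accuracy = (TP+TN)/N, miss rate = FN/(TP+FN), and false detection rate = FP/(FP+TN), with label 1 taken to mean "pedestrian". All names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Confusion counts for a two-class problem.
struct Counts {
    int tp = 0, fp = 0, tn = 0, fn = 0;
};

// Label 1 is assumed to mean "pedestrian"; any other label means "non-pedestrian".
Counts countOutcomes(const std::vector<int>& truth, const std::vector<int>& predicted) {
    Counts c;
    for (std::size_t i = 0; i < truth.size() && i < predicted.size(); ++i) {
        const bool isPerson = (truth[i] == 1);
        const bool saidPerson = (predicted[i] == 1);
        if (isPerson && saidPerson) ++c.tp;
        else if (isPerson) ++c.fn;
        else if (saidPerson) ++c.fp;
        else ++c.tn;
    }
    return c;
}

// Safe division: returns 0 when the denominator is 0.
double ratio(int num, int den) {
    return den == 0 ? 0.0 : static_cast<double>(num) / den;
}

double accuracy(const Counts& c) { return ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn); }
double missRate(const Counts& c) { return ratio(c.fn, c.tp + c.fn); }
double falseAlarmRate(const Counts& c) { return ratio(c.fp, c.fp + c.tn); }
```

The predicted labels could come from PredictResult.txt written by HOG_SVM1(), paired with the known labels of the test images.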

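Annotated hog source code

The source uses Intel's IPP library to speed up the algorithm. When reading it, you can skip the code after each #ifdef HAVE_IPP and go straight to the code after #else; this does not affect understanding of the original HOG algorithm.

First, a diagram of the header files used by the hog source, which is useful when porting the code:

[Figure: header file dependencies of hog.cpp]

Below are my annotations of the hog source. I have little experience with C++, so some comments explain basic C++ syntax; please bear with me. There are also details I did not understand or may have annotated incorrectly, and I welcome discussion; many details only become clear when reading the source itself.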

hog.cpp:

/*M///
//
//  IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
//
//  By downloading, copying, installing or using the software you agree to this license.
//  If you do not agree to this license, do not download, install,
//  copy or use the software.
//
//
//                           License Agreement
//                For Open Source Computer Vision Library
//
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
// Third party copyrights are property of their respective owners.
//
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
//   * Redistribution's of source code must retain the above copyright notice,
//     this list of conditions and the following disclaimer.
//
//   * Redistribution's in binary form must reproduce the above copyright notice,
//     this list of conditions and the following disclaimer in the documentation
//     and/or other materials provided with the distribution.
//
//   * The name of the copyright holders may not be used to endorse or promote products
//     derived from this software without specific prior written permission.
//
// This software is provided by the copyright holders and contributors "as is" and
// any express or implied warranties, including, but not limited to, the implied
// warranties of merchantability and fitness for a particular purpose are disclaimed.
// In no event shall the Intel Corporation or contributors be liable for any direct,
// indirect, incidental, special, exemplary, or consequential damages
// (including, but not limited to, procurement of substitute goods or services;
// loss of use, data, or profits; or business interruption) however caused
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
//M*/

#include "precomp.hpp"
#include <iterator>
#ifdef HAVE_IPP
#include "ipp.h"
#endif
/****************************************************************************************\
      The code below is implementation of HOG (Histogram-of-Oriented Gradients)
      descriptor and object detection, introduced by Navneet Dalal and Bill Triggs.

      The computed feature vectors are compatible with the
      INRIA Object Detection and Localization Toolkit
      (http://pascal.inrialpes.fr/soft/olt/)
\****************************************************************************************/

namespace cv
{

size_t HOGDescriptor::getDescriptorSize() const
{
    // The two assertions ensure that a block holds a whole number of cells
    // and that the block can step across the window a whole number of times.
    CV_Assert(blockSize.width % cellSize.width == 0 &&
        blockSize.height % cellSize.height == 0);
    CV_Assert((winSize.width - blockSize.width) % blockStride.width == 0 &&
        (winSize.height - blockSize.height) % blockStride.height == 0 );
    // The return value is the dimension of the HOG vector of one detection window:
    // nbins * cells per block * block positions per window.
    return (size_t)nbins*
        (blockSize.width/cellSize.width)*
        (blockSize.height/cellSize.height)*
        ((winSize.width - blockSize.width)/blockStride.width + 1)*
        ((winSize.height - blockSize.height)/blockStride.height + 1);
}

// What is winSigma for? It is the sigma of the Gaussian window that weights the
// pixels of a block; a negative value selects the default computed below.
double HOGDescriptor::getWinSigma() const
{
    return winSigma >= 0 ? winSigma : (blockSize.width + blockSize.height)/8.;
}

// svmDetector is a member of HOGDescriptor, a vector that holds the SVM
// coefficients applied to the HOG features.
// The function returns true when the detector is empty, has the same length as
// the HOG descriptor, or is one element longer. The extra element is allowed
// because it holds the bias term (rho) of the linear SVM.
bool HOGDescriptor::checkDetectorSize() const
{
    size_t detectorSize = svmDetector.size(), descriptorSize = getDescriptorSize();
    return detectorSize == 0 ||
        detectorSize == descriptorSize ||
        detectorSize == descriptorSize + 1;
}

void HOGDescriptor::setSVMDetector(InputArray _svmDetector)
{
    // convertTo changes the element depth of the Mat (not the number of channels);
    // here all input SVM coefficients are converted to 32-bit float.
    _svmDetector.getMat().convertTo(svmDetector, CV_32F);
    CV_Assert( checkDetectorSize() );
}

#define CV_TYPE_NAME_HOG_DESCRIPTOR "opencv-object-detector-hog"

// FileNode is a file storage node class from OpenCV's core module; a node holds
// one element read from a file, usually an XML or YAML file.
// Because this function writes the node contents into member variables,
// it cannot be declared const.
bool HOGDescriptor::read(FileNode& obj)
{
    // isMap() tests whether the node is a mapping, in which case every child node
    // is associated with a name. The if statement therefore requires the node
    // being read to be a mapping.
    if( !obj.isMap() )
        return false;
    // obj["winSize"] returns the child node named winSize; this works because
    // the children of a mapping all have names.
    FileNodeIterator it = obj["winSize"].begin();
    // Operator >> reads data from a node; here the values that it points to are
    // read in turn into winSize.width and winSize.height.
    // The following statements work the same way.
    it >> winSize.width >> winSize.height;
    it = obj["blockSize"].begin();
    it >> blockSize.width >> blockSize.height;
    it = obj["blockStride"].begin();
    it >> blockStride.width >> blockStride.height;
    it = obj["cellSize"].begin();
    it >> cellSize.width >> cellSize.height;
    obj["nbins"] >> nbins;
    obj["derivAperture"] >> derivAperture;
    obj["winSigma"] >> winSigma;
    obj["histogramNormType"] >> histogramNormType;
    obj["L2HysThreshold"] >> L2HysThreshold;
    obj["gammaCorrection"] >> gammaCorrection;
    obj["nlevels"] >> nlevels;

    // isSeq() tests whether the node content is a sequence
    FileNode vecNode = obj["SVMDetector"];
    if( vecNode.isSeq() )
    {
        vecNode >> svmDetector;
        CV_Assert(checkDetectorSize());
    }
    // Everything has been read: report success
    return true;
}

void HOGDescriptor::write(FileStorage& fs, const String& objName) const
{
    // Write the name objName to the file fs
    if( !objName.empty() )
        fs << objName;

    fs << "{" CV_TYPE_NAME_HOG_DESCRIPTOR
    // The following lines write the members of the HOG descriptor to fs one by one,
    // each preceded by its name, so these nodes form a mapping.
    << "winSize" << winSize
    << "blockSize" << blockSize
    << "blockStride" << blockStride
    << "cellSize" << cellSize
    << "nbins" << nbins
    << "derivAperture" << derivAperture
    << "winSigma" << getWinSigma()
    << "histogramNormType" << histogramNormType
    << "L2HysThreshold" << L2HysThreshold
    << "gammaCorrection" << gammaCorrection
    << "nlevels" << nlevels;
    if( !svmDetector.empty() )
        // svmDetector is written directly as a sequence, also with a name.
        fs << "SVMDetector" << "[:" << svmDetector << "]";
    fs << "}";
}

// Read the parameters from the given file
bool HOGDescriptor::load(const String& filename, const String& objname)
{
    FileStorage fs(filename, FileStorage::READ);
    // A file node has many children and so holds a lot of content; here it holds
    // the parameters needed by HOGDescriptor.
    FileNode obj = !objname.empty() ? fs[objname] : fs.getFirstTopLevelNode();
    return read(obj);
}

// Write the parameters of the class to a file as a file node.
void HOGDescriptor::save(const String& filename, const String& objName) const
{
    FileStorage fs(filename, FileStorage::WRITE);
    write(fs, !objName.empty() ? objName : FileStorage::getDefaultObjectName(filename));
}

// Copy this HOG descriptor into c
void HOGDescriptor::copyTo(HOGDescriptor& c) const
{
    c.winSize = winSize;
    c.blockSize = blockSize;
    c.blockStride = blockStride;
    c.cellSize = cellSize;
    c.nbins = nbins;
    c.derivAperture = derivAperture;
    c.winSigma = winSigma;
    c.histogramNormType = histogramNormType;
    c.L2HysThreshold = L2HysThreshold;
    c.gammaCorrection = gammaCorrection;
    // vectors can also be assigned with =
    c.svmDetector = svmDetector;
    c.nlevels = nlevels;
}

// Computes the gradient magnitude image grad and the gradient orientation image qangle of img.
// paddingTL is the padding to add at the top-left of the original image img; likewise,
// paddingBR is the padding to add at the bottom-right.
void HOGDescriptor::computeGradient(const Mat& img, Mat& grad, Mat& qangle,
                                    Size paddingTL, Size paddingBR) const
{
    // Only 8-bit single-channel or 3-channel images are supported.
    CV_Assert( img.type() == CV_8U || img.type() == CV_8UC3 );

    // Enlarge the image by the requested padding. This is not the padding needed to
    // compute gradients at the border (that is handled by the border interpolation
    // further down); it enlarges the gradient images so that detection windows can
    // extend past the image border.
    Size gradsize(img.cols + paddingTL.width + paddingBR.width,
                  img.rows + paddingTL.height + paddingBR.height);
    grad.create(gradsize, CV_32FC2);  // <magnitude*(1-alpha), magnitude*alpha>
    qangle.create(gradsize, CV_8UC2); // [0..nbins-1] - quantized gradient orientation
    Size wholeSize;
    Point roiofs;
    // If img is a sub-region (ROI) of a larger parent image, locateROI returns the
    // size of the parent in wholeSize and the position of img's top-left corner
    // within the parent in roiofs.
    // For a positive sample the parent is img itself, so wholeSize equals img.size();
    // for negative samples the two differ. Since the relationship is hard to follow,
    // simply think of wholeSize as the size of img and of roiofs as Point(0, 0).
    img.locateROI(wholeSize, roiofs);

    int i, x, y;
    int cn = img.channels();

    // _lut is a row vector used as a lookup table from 8-bit pixel values to floats
    Mat_<float> _lut(1, 256);
    const float* lut = &_lut(0,0);

    // Gamma correction here takes the square root of every pixel value in 0..255,
    // which compresses the range and makes the mapping nonlinear.
    if( gammaCorrection )
        for( i = 0; i < 256; i++ )
            _lut(0,i) = std::sqrt((float)i);
    else
        for( i = 0; i < 256; i++ )
            _lut(0,i) = (float)i;

    // Integer buffer of length gradsize.width + gradsize.height + 4
    AutoBuffer<int> mapbuf(gradsize.width + gradsize.height + 4);
    int* xmap = (int*)mapbuf + 1;
    int* ymap = xmap + gradsize.width + 2;

    // In other words borderType equals 4, because the OpenCV source defines:
    // #define IPL_BORDER_REFLECT_101    4
    // enum{...,BORDER_REFLECT_101=IPL_BORDER_REFLECT_101,...}
    // borderType is the way the pixels added by the padding are filled.
    /*
    Various border types, image boundaries are denoted with '|'

    * BORDER_REPLICATE:     aaaaaa|abcdefgh|hhhhhhh
    * BORDER_REFLECT:       fedcba|abcdefgh|hgfedcb
    * BORDER_REFLECT_101:   gfedcb|abcdefgh|gfedcba
    * BORDER_WRAP:          cdefgh|abcdefgh|abcdefg
    * BORDER_CONSTANT:      iiiiii|abcdefgh|iiiiiii  with some specified 'i'
   */
    const int borderType = (int)BORDER_REFLECT_101;

    /* int borderInterpolate(int p, int len, int borderType)
       p is a coordinate in the padded image along one axis; len is the length of
       the source image along that axis; borderType is the border type described above.
       The function maps a pixel coordinate of the padded image to the coordinate
       of the corresponding pixel in the source image.
    */
    // What do xmap and ymap hold? xmap stores, for each pixel in a row of the padded
    // image, the x coordinate of the corresponding pixel in the original image img.
    // Some entries of xmap are equal, because a padded pixel maps to a position in
    // img that is also reached by a pixel inside img itself.
    // Likewise, ymap stores, for each pixel in a column of the padded image,
    // the y coordinate of the corresponding pixel in img.
    for( x = -1; x < gradsize.width + 1; x++ )
        xmap[x] = borderInterpolate(x - paddingTL.width + roiofs.x,
                        wholeSize.width, borderType) - roiofs.x;
    for( y = -1; y < gradsize.height + 1; y++ )
        ymap[y] = borderInterpolate(y - paddingTL.height + roiofs.y,
                        wholeSize.height, borderType) - roiofs.y;

    // x- & y- derivatives for the whole row
    int width = gradsize.width;
    AutoBuffer<float> _dbuf(width*4);
    float* dbuf = _dbuf;
    // Dx is the horizontal gradient, Dy the vertical gradient, Mag the gradient
    // magnitude and Angle the gradient angle, each for one image row.
    // The 4th constructor argument is the location of the Mat data in memory,
    // so these four rows are stored contiguously in dbuf.
    Mat Dx(1, width, CV_32F, dbuf);
    Mat Dy(1, width, CV_32F, dbuf + width);
    Mat Mag(1, width, CV_32F, dbuf + width*2);
    Mat Angle(1, width, CV_32F, dbuf + width*3);

    int _nbins = nbins;
    // angleScale == 9/pi when nbins == 9
    float angleScale = (float)(_nbins/CV_PI);
#ifdef HAVE_IPP
    Mat lutimg(img.rows,img.cols,CV_MAKETYPE(CV_32F,cn));
    Mat hidxs(1, width, CV_32F);
    Ipp32f* pHidxs  = (Ipp32f*)hidxs.data;
    Ipp32f* pAngles = (Ipp32f*)Angle.data;

    IppiSize roiSize;
    roiSize.width = img.cols;
    roiSize.height = img.rows;

    for( y = 0; y < roiSize.height; y++ )
    {
       const uchar* imgPtr = img.data + y*img.step;
       float* imglutPtr = (float*)(lutimg.data + y*lutimg.step);

       for( x = 0; x < roiSize.width*cn; x++ )
       {
          imglutPtr[x] = lut[imgPtr[x]];
       }
    }

#endif
    for( y = 0; y < gradsize.height; y++ )
    {
#ifdef HAVE_IPP
        const float* imgPtr  = (float*)(lutimg.data + lutimg.step*ymap[y]);
        const float* prevPtr = (float*)(lutimg.data + lutimg.step*ymap[y-1]);
        const float* nextPtr = (float*)(lutimg.data + lutimg.step*ymap[y+1]);
#else
        // imgPtr points to the start of the img row that corresponds to output row y
        // (row ymap[y]); prevPtr and nextPtr point to the rows for y-1 and y+1.
        const uchar* imgPtr  = img.data + img.step*ymap[y];
        const uchar* prevPtr = img.data + img.step*ymap[y-1];
        const uchar* nextPtr = img.data + img.step*ymap[y+1];
#endif
        float* gradPtr = (float*)grad.ptr(y);
        uchar* qanglePtr = (uchar*)qangle.ptr(y);

        // Case: the input image img has a single channel
        if( cn == 1 )
        {
            for( x = 0; x < width; x++ )
            {
                int x1 = xmap[x];
#ifdef HAVE_IPP
                dbuf[x] = (float)(imgPtr[xmap[x+1]] - imgPtr[xmap[x-1]]);
                dbuf[width + x] = (float)(nextPtr[x1] - prevPtr[x1]);
#else
                // These two lines compute Dx and Dy, whose memory lives in dbuf
                dbuf[x] = (float)(lut[imgPtr[xmap[x+1]]] - lut[imgPtr[xmap[x-1]]]);
                dbuf[width + x] = (float)(lut[nextPtr[x1]] - lut[prevPtr[x1]]);
#endif
            }
        }
        // Case cn == 3: the input image has 3 channels
        else
        {
            for( x = 0; x < width; x++ )
            {
                // x1 is the offset of source column xmap[x] within a row (3 bytes per pixel)
                int x1 = xmap[x]*3;
                float dx0, dy0, dx, dy, mag0, mag;
#ifdef HAVE_IPP
                const float* p2 = imgPtr + xmap[x+1]*3;
                const float* p0 = imgPtr + xmap[x-1]*3;

                dx0 = p2[2] - p0[2];
                dy0 = nextPtr[x1+2] - prevPtr[x1+2];
                mag0 = dx0*dx0 + dy0*dy0;

                dx = p2[1] - p0[1];
                dy = nextPtr[x1+1] - prevPtr[x1+1];
                mag = dx*dx + dy*dy;

                if( mag0 < mag )
                {
                    dx0 = dx;
                    dy0 = dy;
                    mag0 = mag;
                }

                dx = p2[0] - p0[0];
                dy = nextPtr[x1] - prevPtr[x1];
                mag = dx*dx + dy*dy;
#else
                // p2 is the address of column x+1 in row y
                // p0 is the address of column x-1 in row y
                const uchar* p2 = imgPtr + xmap[x+1]*3;
                const uchar* p0 = imgPtr + xmap[x-1]*3;

                // Squared gradient magnitude of channel 2
                dx0 = lut[p2[2]] - lut[p0[2]];
                dy0 = lut[nextPtr[x1+2]] - lut[prevPtr[x1+2]];
                mag0 = dx0*dx0 + dy0*dy0;

                // Squared gradient magnitude of channel 1
                dx = lut[p2[1]] - lut[p0[1]];
                dy = lut[nextPtr[x1+1]] - lut[prevPtr[x1+1]];
                mag = dx*dx + dy*dy;

                // Keep the channel with the larger magnitude
                if( mag0 < mag )
                {
                    dx0 = dx;
                    dy0 = dy;
                    mag0 = mag;
                }

                // Squared gradient magnitude of channel 0
                dx = lut[p2[0]] - lut[p0[0]];
                dy = lut[nextPtr[x1]] - lut[prevPtr[x1]];
                mag = dx*dx + dy*dy;
#endif
                // Keep the channel with the larger magnitude
                if( mag0 < mag )
                {
                    dx0 = dx;
                    dy0 = dy;
                    mag0 = mag;
                }

                // Store the horizontal and vertical gradients of the dominant channel
                dbuf[x] = dx0;
                dbuf[x+width] = dy0;
            }
        }
 422 #ifdef HAVE_IPP
 423         ippsCartToPolar_32f((const Ipp32f*)Dx.data, (const Ipp32f*)Dy.data, (Ipp32f*)Mag.data, pAngles, width);
 424         for( x = 0; x < width; x++ )
 425         {
 426            if(pAngles[x] < 0.f)
 427              pAngles[x] += (Ipp32f)(CV_PI*2.);
 428         }
 429 
 430         ippsNormalize_32f(pAngles, pAngles, width, 0.5f/angleScale, 1.f/angleScale);
 431         ippsFloor_32f(pAngles,(Ipp32f*)hidxs.data,width);
 432         ippsSub_32f_I((Ipp32f*)hidxs.data,pAngles,width);
 433         ippsMul_32f_I((Ipp32f*)Mag.data,pAngles,width);
 434 
 435         ippsSub_32f_I(pAngles,(Ipp32f*)Mag.data,width);
 436         ippsRealToCplx_32f((Ipp32f*)Mag.data,pAngles,(Ipp32fc*)gradPtr,width);
 437 #else
 438     //cartToPolar()函数是计算2个矩阵对应元素的幅度和角度，最后一个参数为是否
 439     //角度使用度数表示，这里为false表示不用度数表示，即用弧度表示。
 440     //如果只需计算2个矩阵对应元素的幅度图像，可以采用magnitude()函数。
 441     //-pi/2<Angle<pi/2;
 442         cartToPolar( Dx, Dy, Mag, Angle, false );
 443 #endif
 444         for( x = 0; x < width; x++ )
 445         {
 446 #ifdef HAVE_IPP
 447             int hidx = (int)pHidxs[x];
 448 #else
 449         //-5<angle<4
 450             float mag = dbuf[x+width*2], angle = dbuf[x+width*3]*angleScale - 0.5f;
 451             //cvFloor()返回不大于参数的最大整数
 452         //hidx={-5,-4,-3,-2,-1,0,1,2,3,4};
 453             int hidx = cvFloor(angle);
 454             //0<=angle<1;angle表示的意思是与其相邻的较小的那个bin的弧度距离(即弧度差)
 455             angle -= hidx;
 456             //gradPtr为grad图像的指针
 457         //gradPtr[x*2]表示的是与x处梯度方向相邻较小的那个bin的幅度权重；
 458         //gradPtr[x*2+1]表示的是与x处梯度方向相邻较大的那个bin的幅度权重
 459         gradPtr[x*2] = mag*(1.f - angle);
 460             gradPtr[x*2+1] = mag*angle;
 461 #endif
 462             if( hidx < 0 )
 463                 hidx += _nbins;
 464             else if( hidx >= _nbins )
 465                 hidx -= _nbins;
 466             assert( (unsigned)hidx < (unsigned)_nbins );
 467 
 468             qanglePtr[x*2] = (uchar)hidx;
 469             hidx++;
 470             //-1在补码中的表示为11111111,与-1相与的话就是自己本身了；
 471         //0在补码中的表示为00000000,与0相与的结果就是0了.
 472             hidx &= hidx < _nbins ? -1 : 0;
 473             qanglePtr[x*2+1] = (uchar)hidx;
 474         }
 475     }
 476 }
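
// A minimal standalone sketch of the soft bin assignment performed at the end
// of computeGradient() above. The namespace hog_sketch, the struct BinVote and
// the function voteBins are illustrative names, not part of OpenCV. The sketch
// assumes unsigned gradients, i.e. a scale factor of nbins/pi, and an input
// angle in radians in [0, 2*pi).
```cpp
#include <cassert>
#include <cmath>

namespace hog_sketch {

// Two neighboring orientation bins and the magnitude share given to each.
struct BinVote { int bin0, bin1; float w0, w1; };

inline BinVote voteBins(float mag, float angleRad, int nbins)
{
    assert(nbins > 0);
    const float kPi = 3.14159265358979f;
    // Scale so that one unit equals one bin, then shift by half a bin so that
    // bin centers fall on integers.
    float a = angleRad * (nbins / kPi) - 0.5f;
    int hidx = (int)std::floor(a);
    float frac = a - hidx;            // 0 <= frac < 1
    // Wrap the lower bin index into [0, nbins).
    if (hidx < 0) hidx += nbins;
    else if (hidx >= nbins) hidx -= nbins;
    // The upper neighbor of the last bin is bin 0.
    int next = hidx + 1;
    if (next >= nbins) next = 0;
    BinVote v = { hidx, next, mag * (1.f - frac), mag * frac };
    return v;
}

} // namespace hog_sketch
```
// For example, with nbins == 9 an angle of pi/9 (20 degrees) lies halfway
// between the centers of bins 0 and 1, so each bin receives half the magnitude.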


struct HOGCache
{
    struct BlockData
    {
        BlockData() : histOfs(0), imgOffset() {}
        int histOfs;
        Point imgOffset;
    };

    struct PixData
    {
        size_t gradOfs, qangleOfs;
        int histOfs[4];
        float histWeights[4];
        float gradWeight;
    };

    HOGCache();
    HOGCache(const HOGDescriptor* descriptor,
        const Mat& img, Size paddingTL, Size paddingBR,
        bool useCache, Size cacheStride);
    virtual ~HOGCache() {};
    virtual void init(const HOGDescriptor* descriptor,
        const Mat& img, Size paddingTL, Size paddingBR,
        bool useCache, Size cacheStride);

    Size windowsInImage(Size imageSize, Size winStride) const;
    Rect getWindow(Size imageSize, Size winStride, int idx) const;

    const float* getBlock(Point pt, float* buf);
    virtual void normalizeBlockHistogram(float* histogram) const;

    vector<PixData> pixData;
    vector<BlockData> blockData;

    bool useCache;
    vector<int> ymaxCached;
    Size winSize, cacheStride;
    Size nblocks, ncells;
    int blockHistogramSize;
    int count1, count2, count4;
    Point imgoffset;
    Mat_<float> blockCache;
    Mat_<uchar> blockCacheFlags;

    Mat grad, qangle;
    const HOGDescriptor* descriptor;
};

// Default constructor: no cache, block histogram size 0, no descriptor.
HOGCache::HOGCache()
{
    useCache = false;
    blockHistogramSize = count1 = count2 = count4 = 0;
    descriptor = 0;
}

// Constructor with arguments; it delegates all work to init().
HOGCache::HOGCache(const HOGDescriptor* _descriptor,
        const Mat& _img, Size _paddingTL, Size _paddingBR,
        bool _useCache, Size _cacheStride)
{
    init(_descriptor, _img, _paddingTL, _paddingBR, _useCache, _cacheStride);
}

// Initialization routine of the HOGCache struct.
void HOGCache::init(const HOGDescriptor* _descriptor,
        const Mat& _img, Size _paddingTL, Size _paddingBR,
        bool _useCache, Size _cacheStride)
{
    descriptor = _descriptor;
    cacheStride = _cacheStride;
    useCache = _useCache;

    // First call computeGradient() to obtain the weighted gradient-magnitude
    // image (grad) and the quantized-orientation image (qangle) of the input.
    descriptor->computeGradient(_img, grad, qangle, _paddingTL, _paddingBR);
    // imgoffset is a Point and _paddingTL is a Size. Both hold a pair of
    // integers, and OpenCV provides the conversion, so the assignment compiles.
    imgoffset = _paddingTL;

    winSize = descriptor->winSize;
    Size blockSize = descriptor->blockSize;
    Size blockStride = descriptor->blockStride;
    Size cellSize = descriptor->cellSize;
    int i, j, nbins = descriptor->nbins;
    // rawBlockSize is the number of pixels in one block.
    int rawBlockSize = blockSize.width*blockSize.height;

    // nblocks is a Size whose width and height are the numbers of block
    // positions in one window horizontally and vertically
    // (taking the block stride into account).
    nblocks = Size((winSize.width - blockSize.width)/blockStride.width + 1,
                   (winSize.height - blockSize.height)/blockStride.height + 1);
    // ncells is also a Size: the number of cells that fit in one block
    // horizontally and vertically.
    ncells = Size(blockSize.width/cellSize.width, blockSize.height/cellSize.height);
    // blockHistogramSize is the number of descriptor elements contributed by one block.
    blockHistogramSize = ncells.width*ncells.height*nbins;

    if( useCache )
    {
        // cacheStride is passed in by the caller; compute() and detect() set it
        // to gcd(winStride, blockStride), the grid on which block origins can fall.
        // cacheSize.width is the number of block positions across the padded image;
        // cacheSize.height is the number of cached block rows (one window height).
        Size cacheSize((grad.cols - blockSize.width)/cacheStride.width+1,
                       (winSize.height/cacheStride.height)+1);
        // blockCache is a float Mat; note that each row holds
        // cacheSize.width block histograms back to back.
        blockCache.create(cacheSize.height, cacheSize.width*blockHistogramSize);
        // blockCacheFlags is a uchar Mat with one "already computed" flag per cached block.
        blockCacheFlags.create(cacheSize);
        size_t cacheRows = blockCache.rows;
        // ymaxCached is a vector<int>, so this is std::vector::resize():
        // one entry per cache row.
        ymaxCached.resize(cacheRows);
        // Initialize every entry of ymaxCached to -1 (row not cached yet).
        for(size_t ii = 0; ii < cacheRows; ii++ )
            ymaxCached[ii] = -1;
    }

    // weights is a 2-D Gaussian table of size blockSize; the code below fills it.
    Mat_<float> weights(blockSize);
    float sigma = (float)descriptor->getWinSigma();
    float scale = 1.f/(sigma*sigma*2);

    for(i = 0; i < blockSize.height; i++)
        for(j = 0; j < blockSize.width; j++)
        {
            float di = i - blockSize.height*0.5f;
            float dj = j - blockSize.width*0.5f;
            weights(i,j) = std::exp(-(di*di + dj*dj)*scale);
        }

    // blockData is a vector<BlockData>; BlockData is a nested struct of HOGCache.
    // nblocks.width*nblocks.height is the number of blocks in one detection window,
    // whereas the block cache above is sized for the whole padded image width.
    blockData.resize(nblocks.width*nblocks.height);
    // pixData is a vector<PixData>; PixData is likewise a nested struct of HOGCache.
    // rawBlockSize is the number of pixels per block. Three regions of that size are
    // allocated: pixels voting into 1 cell, into 2 cells, and into 4 cells.
    pixData.resize(rawBlockSize*3);
 618     // Initialize 2 lookup tables, pixData & blockData.
 619     // Here is why:
 620     //
 621     // The detection algorithm runs in 4 nested loops (at each pyramid layer):
 622     //  loop over the windows within the input image
 623     //    loop over the blocks within each window
 624     //      loop over the cells within each block
 625     //        loop over the pixels in each cell
 626     //
 627     // As each of the loops runs over a 2-dimensional array,
 628     // we could get 8(!) nested loops in total, which is very-very slow.
 629     //
 630     // To speed the things up, we do the following:
 631     //   1. loop over windows is unrolled in the HOGDescriptor::{compute|detect} methods;
 632     //         inside we compute the current search window using getWindow() method.
 633     //         Yes, it involves some overhead (function call + couple of divisions),
 634     //         but it's tiny in fact.
 635     //   2. loop over the blocks is also unrolled. Inside we use pre-computed blockData[j]
 636     //         to set up gradient and histogram pointers.
 637     //   3. loops over cells and pixels in each cell are merged
 638     //       (since there is no overlap between cells, each pixel in the block is processed once)
 639     //      and also unrolled. Inside we use PixData[k] to access the gradient values and
 640     //      update the histogram
 641     //count1,count2,count4分别表示block中同时对1个cell，2个cell，4个cell有贡献的像素点的个数。
 642     count1 = count2 = count4 = 0;
 643     for( j = 0; j < blockSize.width; j++ )
 644         for( i = 0; i < blockSize.height; i++ )
 645         {
 646             PixData* data = 0;
 647             //cellX和cellY表示的是block内该像素点所在的cell横坐标和纵坐标索引，以小数的形式存在。
 648             float cellX = (j+0.5f)/cellSize.width - 0.5f;
 649             float cellY = (i+0.5f)/cellSize.height - 0.5f;
 650             //cvRound返回最接近参数的整数;cvFloor返回不大于参数的整数;cvCeil返回不小于参数的整数
 651             //icellX0和icellY0表示所在cell坐标索引，索引值为该像素点相邻cell的那个较小的cell索引
 652             //当然此处就是由整数的形式存在了。
 653             //按照默认的系数的话，icellX0和icellY0只可能取值-1,0,1,且当i和j<3.5时对应的值才取-1
 654             //当i和j>11.5时取值为1，其它时刻取值为0(注意i，j最大是15，从0开始的)
 655             int icellX0 = cvFloor(cellX);
 656             int icellY0 = cvFloor(cellY);
 657             int icellX1 = icellX0 + 1, icellY1 = icellY0 + 1;
 658             //此处的cellx和celly表示的是真实索引值与最近邻cell索引值之间的差，
 659             //为后面计算同一像素对不同cell中的hist权重的计算。
 660             cellX -= icellX0;
 661             cellY -= icellY0;
 662       
 663                //满足这个if条件说明icellX0只能为0,也就是说block横坐标在(3.5,11.5)之间时
 664             if( (unsigned)icellX0 < (unsigned)ncells.width &&
 665                 (unsigned)icellX1 < (unsigned)ncells.width )
 666             {
 667                //满足这个if条件说明icellY0只能为0,也就是说block纵坐标在(3.5,11.5)之间时
 668                 if( (unsigned)icellY0 < (unsigned)ncells.height &&
 669                     (unsigned)icellY1 < (unsigned)ncells.height )
 670                 {
 671                     //同时满足上面2个if语句的像素对4个cell都有权值贡献
 672                     //rawBlockSize表示的是1个block中存储像素点的个数
 673                     //而pixData的尺寸大小为block中像素点的3倍，其定义如下：
 674                     //pixData.resize(rawBlockSize*3);
 675                     //pixData的前面block像素大小的内存为存储只对block中一个cell
 676                     //有贡献的pixel；中间block像素大小的内存存储对block中同时2个
 677                     //cell有贡献的pixel；最后面的为对block中同时4个cell都有贡献
 678                     //的pixel
 679                     data = &pixData[rawBlockSize*2 + (count4++)];
 680                     //下面计算出的结果为0
 681                     data->histOfs[0] = (icellX0*ncells.height + icellY0)*nbins;
 682                      //为该像素点对cell0的权重
 683                     data->histWeights[0] = (1.f - cellX)*(1.f - cellY);
 684                     //下面计算出的结果为18
 685                     data->histOfs[1] = (icellX1*ncells.height + icellY0)*nbins;
 686                     data->histWeights[1] = cellX*(1.f - cellY);
 687                     //下面计算出的结果为9
 688                     data->histOfs[2] = (icellX0*ncells.height + icellY1)*nbins;
 689                     data->histWeights[2] = (1.f - cellX)*cellY;
 690                     //下面计算出的结果为27
 691                     data->histOfs[3] = (icellX1*ncells.height + icellY1)*nbins;
 692                     data->histWeights[3] = cellX*cellY;
 693                 }
 694                 else
 695                    //满足这个else条件说明icellY0取-1或者1,也就是说block纵坐标在(0, 3.5)
 696                 //和(11.5, 15)之间.
 697                 //此时的像素点对相邻的2个cell有权重贡献
 698                 {
 699                     data = &pixData[rawBlockSize + (count2++)];                    
 700                     if( (unsigned)icellY0 < (unsigned)ncells.height )
 701                     {
 702                         //(unsigned)-1等于127>2，所以此处满足if条件时icellY0==1；
 703                         //icellY1==1;
 704                         icellY1 = icellY0;
 705                         cellY = 1.f - cellY;
 706                     }
 707                     //不满足if条件时，icellY0==-1;icellY1==0;
 708                     //当然了，这2种情况下icellX0==0;icellX1==1;
 709                     data->histOfs[0] = (icellX0*ncells.height + icellY1)*nbins;
 710                     data->histWeights[0] = (1.f - cellX)*cellY;
 711                     data->histOfs[1] = (icellX1*ncells.height + icellY1)*nbins;
 712                     data->histWeights[1] = cellX*cellY;
 713                     data->histOfs[2] = data->histOfs[3] = 0;
 714                     data->histWeights[2] = data->histWeights[3] = 0;
 715                 }
 716             }
 717             //当block中横坐标满足在(0, 3.5)和(11.5, 15)范围内时，即
 718             //icellX0==-1或==1
 719             else
 720             {
 721                 
 722                 if( (unsigned)icellX0 < (unsigned)ncells.width )
 723                 {
 724                     //icellX1=icllX0=1;
 725                     icellX1 = icellX0;
 726                     cellX = 1.f - cellX;
 727                 }
 728                 //当icllY0=0时，此时对2个cell有贡献
 729                 if( (unsigned)icellY0 < (unsigned)ncells.height &&
 730                     (unsigned)icellY1 < (unsigned)ncells.height )
 731                 {                    
 732                     data = &pixData[rawBlockSize + (count2++)];
 733                     data->histOfs[0] = (icellX1*ncells.height + icellY0)*nbins;
 734                     data->histWeights[0] = cellX*(1.f - cellY);
 735                     data->histOfs[1] = (icellX1*ncells.height + icellY1)*nbins;
 736                     data->histWeights[1] = cellX*cellY;
 737                     data->histOfs[2] = data->histOfs[3] = 0;
 738                     data->histWeights[2] = data->histWeights[3] = 0;
 739                 }
 740                 else
 741                 //此时只对自身的cell有贡献
 742                 {
 743                     data = &pixData[count1++];
 744                     if( (unsigned)icellY0 < (unsigned)ncells.height )
 745                     {
 746                         icellY1 = icellY0;
 747                         cellY = 1.f - cellY;
 748                     }
 749                     data->histOfs[0] = (icellX1*ncells.height + icellY1)*nbins;
 750                     data->histWeights[0] = cellX*cellY;
 751                     data->histOfs[1] = data->histOfs[2] = data->histOfs[3] = 0;
 752                     data->histWeights[1] = data->histWeights[2] = data->histWeights[3] = 0;
 753                 }
 754             }
 755             //为什么每个block中i,j位置的gradOfs和qangleOfs都相同且是如下的计算公式呢？
 756             //那是因为输入的_img参数不是代表整幅图片而是检测窗口大小的图片，所以每个
 757             //检测窗口中关于block的信息可以看做是相同的
 758             data->gradOfs = (grad.cols*i + j)*2;
 759             data->qangleOfs = (qangle.cols*i + j)*2;
 760             //每个block中i，j位置的权重都是固定的
 761             data->gradWeight = weights(i,j);
 762         }
 763 
 764     //保证所有的点都被扫描了一遍
 765     assert( count1 + count2 + count4 == rawBlockSize );
 766     // defragment pixData
 767     //将pixData中按照内存排满，这样节省了2/3的内存
 768     for( j = 0; j < count2; j++ )
 769         pixData[j + count1] = pixData[j + rawBlockSize];
 770     for( j = 0; j < count4; j++ )
 771         pixData[j + count1 + count2] = pixData[j + rawBlockSize*2];
 772     //此时count2表示至多对2个cell有贡献的所有像素点的个数
 773     count2 += count1;
 774     //此时count4表示至多对4个cell有贡献的所有像素点的个数
 775     count4 += count2;
 776 
 777     //上面是初始化pixData,下面开始初始化blockData
 778     // initialize blockData
 779     for( j = 0; j < nblocks.width; j++ )
 780         for( i = 0; i < nblocks.height; i++ )
 781         {
 782             BlockData& data = blockData[j*nblocks.height + i];
 783             //histOfs表示该block对检测窗口贡献的hog描述变量起点在整个
 784             //变量中的坐标
 785             data.histOfs = (j*nblocks.height + i)*blockHistogramSize;
 786             //imgOffset表示该block的左上角在检测窗口中的坐标
 787             data.imgOffset = Point(j*blockStride.width,i*blockStride.height);
 788         }
 789         //一个检测窗口对应一个blockData内存，一个block对应一个pixData内存。
 790 }
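
// A small sketch that reproduces only the classification done in the pixel
// loop of init() above: how many pixels of a block vote into 1, 2 or 4 cells.
// The names hog_sketch, Counts and countContributions are illustrative and do
// not exist in OpenCV; the weights and offsets are deliberately left out.
```cpp
#include <cassert>
#include <cmath>

namespace hog_sketch {

struct Counts { int c1, c2, c4; };

inline Counts countContributions(int blockW, int blockH, int cellW, int cellH)
{
    assert(cellW > 0 && cellH > 0);
    const int ncx = blockW / cellW, ncy = blockH / cellH;
    Counts c = { 0, 0, 0 };
    for (int j = 0; j < blockW; j++)
        for (int i = 0; i < blockH; i++)
        {
            // Index of the lower neighboring cell along each axis.
            int x0 = (int)std::floor((j + 0.5f) / cellW - 0.5f);
            int y0 = (int)std::floor((i + 0.5f) / cellH - 0.5f);
            // A pixel interpolates along an axis only if both neighbors exist.
            bool twoX = x0 >= 0 && x0 + 1 < ncx;
            bool twoY = y0 >= 0 && y0 + 1 < ncy;
            if (twoX && twoY) c.c4++;
            else if (twoX || twoY) c.c2++;
            else c.c1++;
        }
    return c;
}

} // namespace hog_sketch
```
// With a 16x16 block and 8x8 cells, the 8x8 central pixels vote into 4 cells,
// the 4 corner patches of 4x4 pixels vote into 1 cell, and the rest into 2.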


// pt is the top-left corner of the block in (unpadded) image coordinates;
// buf is the output buffer that receives the block histogram when no cache is used.
// The function returns a pointer to the block's descriptor.
const float* HOGCache::getBlock(Point pt, float* buf)
{
    float* blockHist = buf;
    assert(descriptor != 0);

    Size blockSize = descriptor->blockSize;
    pt += imgoffset;

    CV_Assert( (unsigned)pt.x <= (unsigned)(grad.cols - blockSize.width) &&
               (unsigned)pt.y <= (unsigned)(grad.rows - blockSize.height) );

    if( useCache )
    {
        // With the default strides cacheStride equals blockStride.
        // The assertion guarantees that the requested block lies on the cache
        // grid, i.e. that it is a position a sliding block can actually reach.
        CV_Assert( pt.x % cacheStride.width == 0 &&
                   pt.y % cacheStride.height == 0 );
        // cacheIdx is the block's position in the cache, in units of blocks.
        Point cacheIdx(pt.x/cacheStride.width,
                      (pt.y/cacheStride.height) % blockCache.rows);
        // ymaxCached has blockCache.rows entries and remembers which image row
        // each cache row currently holds.
        if( pt.y != ymaxCached[cacheIdx.y] )
        {
            // The cache row holds a different image row: clear its flags.
            Mat_<uchar> cacheRow = blockCacheFlags.row(cacheIdx.y);
            cacheRow = (uchar)0;
            ymaxCached[cacheIdx.y] = pt.y;
        }

        // blockHist points to the cached histogram slot of this block;
        // its content is valid only if the flag below is set.
        blockHist = &blockCache[cacheIdx.y][cacheIdx.x*blockHistogramSize];
        uchar& computedFlag = blockCacheFlags(cacheIdx.y, cacheIdx.x);
        if( computedFlag != 0 )
            return blockHist;
        computedFlag = (uchar)1; // set it at once, before actual computing
    }

    int k, C1 = count1, C2 = count2, C4 = count4;
    // Pointers to the block's top-left pixel in grad and qangle (2 values per pixel).
    const float* gradPtr = (const float*)(grad.data + grad.step*pt.y) + pt.x*2;
    const uchar* qanglePtr = qangle.data + qangle.step*pt.y + pt.x*2;

    CV_Assert( blockHist != 0 );
#ifdef HAVE_IPP
    ippsZero_32f(blockHist,blockHistogramSize);
#else
    for( k = 0; k < blockHistogramSize; k++ )
        blockHist[k] = 0.f;
#endif

    const PixData* _pixData = &pixData[0];

    // C1 is the number of pixels that vote only into their own cell.
    for( k = 0; k < C1; k++ )
    {
        const PixData& pk = _pixData[k];
        // a points to the pixel's two magnitude shares.
        const float* a = gradPtr + pk.gradOfs;
        float w = pk.gradWeight*pk.histWeights[0];
        // h points to the pixel's two quantized orientation bin indexes.
        const uchar* h = qanglePtr + pk.qangleOfs;

        // grad has 2 channels because each pixel's magnitude is split between
        // its two neighboring bins; qangle has 2 channels because it stores
        // the indexes of those two bins.
        int h0 = h[0], h1 = h[1];
        float* hist = blockHist + pk.histOfs[0];
        float t0 = hist[h0] + a[0]*w;
        float t1 = hist[h1] + a[1]*w;
        // hist accumulates the weighted gradient magnitudes.
        hist[h0] = t0; hist[h1] = t1;
    }

    for( ; k < C2; k++ )
    {
        const PixData& pk = _pixData[k];
        const float* a = gradPtr + pk.gradOfs;
        float w, t0, t1, a0 = a[0], a1 = a[1];
        const uchar* h = qanglePtr + pk.qangleOfs;
        int h0 = h[0], h1 = h[1];

        // These pixels vote into 2 cells; this is the vote for the first cell.
        float* hist = blockHist + pk.histOfs[0];
        w = pk.gradWeight*pk.histWeights[0];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        // The vote for the second cell.
        hist = blockHist + pk.histOfs[1];
        w = pk.gradWeight*pk.histWeights[1];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;
    }

    // Same as above, for pixels that vote into 4 cells.
    for( ; k < C4; k++ )
    {
        const PixData& pk = _pixData[k];
        const float* a = gradPtr + pk.gradOfs;
        float w, t0, t1, a0 = a[0], a1 = a[1];
        const uchar* h = qanglePtr + pk.qangleOfs;
        int h0 = h[0], h1 = h[1];

        float* hist = blockHist + pk.histOfs[0];
        w = pk.gradWeight*pk.histWeights[0];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[1];
        w = pk.gradWeight*pk.histWeights[1];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[2];
        w = pk.gradWeight*pk.histWeights[2];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[3];
        w = pk.gradWeight*pk.histWeights[3];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;
    }

    normalizeBlockHistogram(blockHist);

    return blockHist;
}


void HOGCache::normalizeBlockHistogram(float* _hist) const
{
    float* hist = &_hist[0];
#ifdef HAVE_IPP
    size_t sz = blockHistogramSize;
#else
    size_t i, sz = blockHistogramSize;
#endif

    float sum = 0;
#ifdef HAVE_IPP
    ippsDotProd_32f(hist,hist,sz,&sum);
#else
    // First pass: accumulate the sum of squares.
    for( i = 0; i < sz; i++ )
        sum += hist[i]*hist[i];
#endif
    // The denominator is the square root of the sum of squares plus 0.1*sz
    // (not a constant 0.1).
    float scale = 1.f/(std::sqrt(sum)+sz*0.1f), thresh = (float)descriptor->L2HysThreshold;
#ifdef HAVE_IPP
    ippsMulC_32f_I(scale,hist,sz);
    ippsThreshold_32f_I( hist, sz, thresh, ippCmpGreater );
    ippsDotProd_32f(hist,hist,sz,&sum);
#else
    for( i = 0, sum = 0; i < sz; i++ )
    {
        // Second pass: scale, clip at thresh, and accumulate the sum of
        // squares of the clipped values.
        hist[i] = std::min(hist[i]*scale, thresh);
        sum += hist[i]*hist[i];
    }
#endif

    scale = 1.f/(std::sqrt(sum)+1e-3f);
#ifdef HAVE_IPP
    ippsMulC_32f_I(scale,hist,sz);
#else
    // Final normalization.
    for( i = 0; i < sz; i++ )
        hist[i] *= scale;
#endif
}
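
// A standalone sketch of the two-pass (L2-Hys style) normalization performed
// by normalizeBlockHistogram() above, written against std::vector instead of a
// raw pointer. The names hog_sketch and l2Hys are illustrative; thresh plays
// the role of descriptor->L2HysThreshold.
```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace hog_sketch {

inline void l2Hys(std::vector<float>& h, float thresh)
{
    assert(thresh > 0.f);
    // Pass 1: L2 norm of the raw histogram.
    float sum = 0.f;
    for (std::size_t i = 0; i < h.size(); i++)
        sum += h[i] * h[i];
    float scale = 1.f / (std::sqrt(sum) + h.size() * 0.1f);

    // Pass 2: scale, clip large entries at thresh, re-accumulate the norm.
    sum = 0.f;
    for (std::size_t i = 0; i < h.size(); i++)
    {
        h[i] = std::min(h[i] * scale, thresh);
        sum += h[i] * h[i];
    }

    // Final renormalization; the small epsilon avoids division by zero.
    scale = 1.f / (std::sqrt(sum) + 1e-3f);
    for (std::size_t i = 0; i < h.size(); i++)
        h[i] *= scale;
}

} // namespace hog_sketch
```
// Clipping before the second normalization limits the influence of a single
// very strong gradient on the whole block; an all-zero histogram stays zero.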


// Returns how many detection windows fit in the image horizontally and vertically.
Size HOGCache::windowsInImage(Size imageSize, Size winStride) const
{
    return Size((imageSize.width - winSize.width)/winStride.width + 1,
                (imageSize.height - winSize.height)/winStride.height + 1);
}


// Given the image size, the window stride and the linear index of a detection
// window, returns the rectangle (position and size) of that window.
Rect HOGCache::getWindow(Size imageSize, Size winStride, int idx) const
{
    int nwindowsX = (imageSize.width - winSize.width)/winStride.width + 1;
    int y = idx / nwindowsX;      // quotient: window row
    int x = idx - nwindowsX*y;    // remainder: window column
    return Rect( x*winStride.width, y*winStride.height, winSize.width, winSize.height );
}
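
// The index arithmetic of windowsInImage() and getWindow() above, restated on
// plain integers so it can be checked by hand. The names hog_sketch, Win,
// windowsAcross and windowOrigin are illustrative, not OpenCV API.
```cpp
#include <cassert>

namespace hog_sketch {

struct Win { int x, y; };

// Number of window positions along one image row.
inline int windowsAcross(int imgW, int winW, int strideX)
{
    assert(strideX > 0 && imgW >= winW);
    return (imgW - winW) / strideX + 1;
}

// Top-left corner of the idx-th window, counting row by row.
inline Win windowOrigin(int idx, int imgW, int winW, int strideX, int strideY)
{
    int nx = windowsAcross(imgW, winW, strideX);
    int row = idx / nx;          // quotient
    int col = idx - nx * row;    // remainder
    Win w = { col * strideX, row * strideY };
    return w;
}

} // namespace hog_sketch
```
// For a 128-pixel-wide image, a 64-pixel-wide window and a stride of 8 there
// are 9 windows per row, so index 10 is the second window of the second row.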


void HOGDescriptor::compute(const Mat& img, vector<float>& descriptors,
                            Size winStride, Size padding,
                            const vector<Point>& locations) const
{
    // Size() has width and height both equal to 0.
    if( winStride == Size() )
        winStride = cellSize;
    // gcd is the greatest common divisor; with the default values
    // winStride and blockStride are equal, so cacheStride equals both.
    Size cacheStride(gcd(winStride.width, blockStride.width),
                     gcd(winStride.height, blockStride.height));
    size_t nwindows = locations.size();
    // alignSize(m, n) returns the smallest multiple of n that is >= m.
    padding.width = (int)alignSize(std::max(padding.width, 0), cacheStride.width);
    padding.height = (int)alignSize(std::max(padding.height, 0), cacheStride.height);
    Size paddedImgSize(img.cols + padding.width*2, img.rows + padding.height*2);

    HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride);

    if( !nwindows )
        // Size::area() is width*height, here the total number of windows.
        nwindows = cache.windowsInImage(paddedImgSize, winStride).area();

    const HOGCache::BlockData* blockData = &cache.blockData[0];

    int nblocks = cache.nblocks.area();
    int blockHistogramSize = cache.blockHistogramSize;
    size_t dsize = getDescriptorSize(); // length of the HOG descriptor of one window
    // descriptors is a vector<float>, so this is std::vector::resize():
    // it makes room for the descriptors of all windows of the image.
    descriptors.resize(dsize*nwindows);

    for( size_t i = 0; i < nwindows; i++ )
    {
        // descriptor points to the start of the i-th window's descriptor.
        float* descriptor = &descriptors[i*dsize];

        Point pt0;
        // Explicit window locations were supplied.
        if( !locations.empty() )
        {
            pt0 = locations[i];
            // Skip locations whose window would fall outside the padded image.
            if( pt0.x < -padding.width || pt0.x > img.cols + padding.width - winSize.width ||
                pt0.y < -padding.height || pt0.y > img.rows + padding.height - winSize.height )
                continue;
        }
        // No locations were supplied: scan the whole image.
        else
        {
            // pt0 is the top-left corner of the i-th window in unpadded image coordinates.
            pt0 = cache.getWindow(paddedImgSize, winStride, (int)i).tl() - Point(padding);
            CV_Assert(pt0.x % cacheStride.width == 0 && pt0.y % cacheStride.height == 0);
        }

        for( int j = 0; j < nblocks; j++ )
        {
            const HOGCache::BlockData& bj = blockData[j];
            // pt is the block's top-left corner in image coordinates.
            Point pt = pt0 + bj.imgOffset;

            // dst is where this block's histogram goes within the window's descriptor.
            float* dst = descriptor + bj.histOfs;
            const float* src = cache.getBlock(pt, dst);
            if( src != dst )
#ifdef HAVE_IPP
               ippsCopy_32f(src,dst,blockHistogramSize);
#else
                for( int k = 0; k < blockHistogramSize; k++ )
                    dst[k] = src[k];
#endif
        }
    }
}
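
// A worked calculation of the per-window descriptor length that compute()
// above allocates for each window, derived from the nblocks, ncells and
// blockHistogramSize formulas in HOGCache::init(). The names hog_sketch and
// descriptorLength are illustrative; the sketch assumes that
// getDescriptorSize() returns the same product, which is not shown in this listing.
```cpp
#include <cassert>
#include <cstddef>

namespace hog_sketch {

inline std::size_t descriptorLength(int winW, int winH,
                                    int blockW, int blockH,
                                    int strideW, int strideH,
                                    int cellW, int cellH, int nbins)
{
    assert(strideW > 0 && strideH > 0 && cellW > 0 && cellH > 0);
    // Block positions per window along each axis.
    int nblocksX = (winW - blockW) / strideW + 1;
    int nblocksY = (winH - blockH) / strideH + 1;
    // Cells per block along each axis.
    int ncellsX = blockW / cellW;
    int ncellsY = blockH / cellH;
    // Each block contributes one nbins-long histogram per cell.
    std::size_t blockHistogramSize = (std::size_t)ncellsX * ncellsY * nbins;
    return (std::size_t)nblocksX * nblocksY * blockHistogramSize;
}

} // namespace hog_sketch
```
// With the usual pedestrian settings (64x128 window, 16x16 block, 8x8 stride,
// 8x8 cell, 9 bins) this gives 7 * 15 blocks of 36 values each, i.e. 3780.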


void HOGDescriptor::detect(const Mat& img,
    vector<Point>& hits, vector<double>& weights, double hitThreshold,
    Size winStride, Size padding, const vector<Point>& locations) const
{
    // hits receives the top-left corners of the windows in which a target was detected.
    hits.clear();
    if( svmDetector.empty() )
        return;

    if( winStride == Size() )
        winStride = cellSize;
    Size cacheStride(gcd(winStride.width, blockStride.width),
                     gcd(winStride.height, blockStride.height));
    size_t nwindows = locations.size();
    padding.width = (int)alignSize(std::max(padding.width, 0), cacheStride.width);
    padding.height = (int)alignSize(std::max(padding.height, 0), cacheStride.height);
    Size paddedImgSize(img.cols + padding.width*2, img.rows + padding.height*2);

    HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride);

    if( !nwindows )
        nwindows = cache.windowsInImage(paddedImgSize, winStride).area();

    const HOGCache::BlockData* blockData = &cache.blockData[0];

    int nblocks = cache.nblocks.area();
    int blockHistogramSize = cache.blockHistogramSize;
    size_t dsize = getDescriptorSize();

    // If svmDetector has one more element than the descriptor, that last
    // element is the bias term rho.
    double rho = svmDetector.size() > dsize ? svmDetector[dsize] : 0;
    vector<float> blockHist(blockHistogramSize);

    for( size_t i = 0; i < nwindows; i++ )
    {
        Point pt0;
        if( !locations.empty() )
        {
            pt0 = locations[i];
            if( pt0.x < -padding.width || pt0.x > img.cols + padding.width - winSize.width ||
                pt0.y < -padding.height || pt0.y > img.rows + padding.height - winSize.height )
                continue;
        }
        else
        {
            pt0 = cache.getWindow(paddedImgSize, winStride, (int)i).tl() - Point(padding);
            CV_Assert(pt0.x % cacheStride.width == 0 && pt0.y % cacheStride.height == 0);
        }
        double s = rho;
        // svmVec points to the first element of svmDetector.
        const float* svmVec = &svmDetector[0];
#ifdef HAVE_IPP
        int j;
#else
        int j, k;
#endif
        for( j = 0; j < nblocks; j++, svmVec += blockHistogramSize )
        {
            const HOGCache::BlockData& bj = blockData[j];
            Point pt = pt0 + bj.imgOffset;

            // vec points to the histogram of the block located at pt.
            const float* vec = cache.getBlock(pt, &blockHist[0]);
#ifdef HAVE_IPP
            Ipp32f partSum;
            ippsDotProd_32f(vec,svmVec,blockHistogramSize,&partSum);
            s += (double)partSum;
#else
            // Dot product with the matching slice of the SVM weights, unrolled by 4.
            for( k = 0; k <= blockHistogramSize - 4; k += 4 )
                s += vec[k]*svmVec[k] + vec[k+1]*svmVec[k+1] +
                    vec[k+2]*svmVec[k+2] + vec[k+3]*svmVec[k+3];
            for( ; k < blockHistogramSize; k++ )
                s += vec[k]*svmVec[k];
#endif
        }
        if( s >= hitThreshold )
        {
            hits.push_back(pt0);
            weights.push_back(s);
        }
    }
}

// Overload that does not return the detection confidences (weights).
void HOGDescriptor::detect(const Mat& img, vector<Point>& hits, double hitThreshold,
                           Size winStride, Size padding, const vector<Point>& locations) const
{
    vector<double> weightsV;
    detect(img, hits, weightsV, hitThreshold, winStride, padding, locations);
}
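
// The per-window decision made by detect() above is a linear SVM score: the
// bias rho plus the dot product of the window descriptor with the weight
// vector. The sketch below restates it on whole vectors rather than block by
// block. The names hog_sketch and svmScore are illustrative; unlike the
// original, products are accumulated in double, so results can differ in the
// last digits.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace hog_sketch {

// svm holds desc.size() weights, optionally followed by the bias rho.
inline double svmScore(const std::vector<float>& desc,
                       const std::vector<float>& svm)
{
    assert(svm.size() == desc.size() || svm.size() == desc.size() + 1);
    double s = svm.size() > desc.size() ? svm[desc.size()] : 0.0;
    for (std::size_t k = 0; k < desc.size(); k++)
        s += (double)desc[k] * svm[k];
    return s;
}

} // namespace hog_sketch
```
// A window is reported as a hit when svmScore(...) >= hitThreshold.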
1158 
struct HOGInvoker
{
    HOGInvoker( const HOGDescriptor* _hog, const Mat& _img,
                double _hitThreshold, Size _winStride, Size _padding,
                const double* _levelScale, ConcurrentRectVector* _vec,
                ConcurrentDoubleVector* _weights=0, ConcurrentDoubleVector* _scales=0 )
    {
        hog = _hog;
        img = _img;
        hitThreshold = _hitThreshold;
        winStride = _winStride;
        padding = _padding;
        levelScale = _levelScale;
        vec = _vec;
        weights = _weights;
        scales = _scales;
    }

    void operator()( const BlockedRange& range ) const
    {
        int i, i1 = range.begin(), i2 = range.end();
        double minScale = i1 > 0 ? levelScale[i1] : i2 > 1 ? levelScale[i1+1] : std::max(img.cols, img.rows);
        // Allocate a buffer sized for the largest resized image in this range
        Size maxSz(cvCeil(img.cols/minScale), cvCeil(img.rows/minScale));
        Mat smallerImgBuf(maxSz, img.type());
        vector<Point> locations;
        vector<double> hitsWeights;

        for( i = i1; i < i2; i++ )
        {
            double scale = levelScale[i];
            Size sz(cvRound(img.cols/scale), cvRound(img.rows/scale));
            // smallerImg is only a header over smallerImgBuf's data; no pixels are copied
            Mat smallerImg(sz, img.type(), smallerImgBuf.data);
            // No resizing needed: reference the original image directly
            if( sz == img.size() )
                smallerImg = Mat(sz, img.type(), img.data, img.step);
            // Resizing needed
            else
                resize(img, smallerImg, sz);
            // detect() writes its results into locations and hitsWeights;
            // locations holds the top-left corner of each detected window
            hog->detect(smallerImg, locations, hitsWeights, hitThreshold, winStride, padding);
            Size scaledWinSize = Size(cvRound(hog->winSize.width*scale), cvRound(hog->winSize.height*scale));
            for( size_t j = 0; j < locations.size(); j++ )
            {
                // Store the detected region, mapped back to the original image scale
                vec->push_back(Rect(cvRound(locations[j].x*scale),
                                    cvRound(locations[j].y*scale),
                                    scaledWinSize.width, scaledWinSize.height));
                // Store the scale at which it was found
                if (scales) {
                    scales->push_back(scale);
                }
            }
            // Store the SVM response values
            if (weights && (!hitsWeights.empty()))
            {
                for (size_t j = 0; j < locations.size(); j++)
                {
                    weights->push_back(hitsWeights[j]);
                }
            }
        }
    }

    const HOGDescriptor* hog;
    Mat img;
    double hitThreshold;
    Size winStride;
    Size padding;
    const double* levelScale;
    //typedef tbb::concurrent_vector<Rect> ConcurrentRectVector;
    ConcurrentRectVector* vec;
    //typedef tbb::concurrent_vector<double> ConcurrentDoubleVector;
    ConcurrentDoubleVector* weights;
    ConcurrentDoubleVector* scales;
};


void HOGDescriptor::detectMultiScale(
    const Mat& img, vector<Rect>& foundLocations, vector<double>& foundWeights,
    double hitThreshold, Size winStride, Size padding,
    double scale0, double finalThreshold, bool useMeanshiftGrouping) const
{
    double scale = 1.;
    int levels = 0;

    vector<double> levelScale;

    // nlevels defaults to 64
    for( levels = 0; levels < nlevels; levels++ )
    {
        levelScale.push_back(scale);
        if( cvRound(img.cols/scale) < winSize.width ||
            cvRound(img.rows/scale) < winSize.height ||
            scale0 <= 1 )
            break;
        // Only scales at which the image is still at least as large as
        // the detection window are considered
        scale *= scale0;
    }
    levels = std::max(levels, 1);
    levelScale.resize(levels);

    ConcurrentRectVector allCandidates;
    ConcurrentDoubleVector tempScales;
    ConcurrentDoubleVector tempWeights;
    vector<double> foundScales;

    // Process the pyramid levels in parallel with TBB
    parallel_for(BlockedRange(0, (int)levelScale.size()),
                 HOGInvoker(this, img, hitThreshold, winStride, padding, &levelScale[0], &allCandidates, &tempWeights, &tempScales));
    // Copy tempScales into foundScales; back_inserter appends to the end of the given container
    std::copy(tempScales.begin(), tempScales.end(), back_inserter(foundScales));
    // clear() removes all elements from the container
    foundLocations.clear();
    // Store the candidate windows in foundLocations
    std::copy(allCandidates.begin(), allCandidates.end(), back_inserter(foundLocations));
    foundWeights.clear();
    // Store the candidate confidences in foundWeights
    std::copy(tempWeights.begin(), tempWeights.end(), back_inserter(foundWeights));

    if ( useMeanshiftGrouping )
    {
        groupRectangles_meanshift(foundLocations, foundWeights, foundScales, finalThreshold, winSize);
    }
    else
    {
        // Cluster the overlapping rectangles
        groupRectangles(foundLocations, (int)finalThreshold, 0.2);
    }
}

// Overload that ignores the confidence of the detections
void HOGDescriptor::detectMultiScale(const Mat& img, vector<Rect>& foundLocations,
                                     double hitThreshold, Size winStride, Size padding,
                                     double scale0, double finalThreshold, bool useMeanshiftGrouping) const
{
    vector<double> foundWeights;
    detectMultiScale(img, foundLocations, foundWeights, hitThreshold, winStride,
                     padding, scale0, finalThreshold, useMeanshiftGrouping);
}

typedef RTTIImpl<HOGDescriptor> HOGRTTI;

CvType hog_type( CV_TYPE_NAME_HOG_DESCRIPTOR, HOGRTTI::isInstance,
                 HOGRTTI::release, HOGRTTI::read, HOGRTTI::write, HOGRTTI::clone);

vector<float> HOGDescriptor::getDefaultPeopleDetector()
{
    static const float detector[] = {
       0.05359386f, -0.14721455f, -0.05532170f, 0.05077307f,
       0.11547081f, -0.04268804f, 0.04635834f, ........
    };
    // Return a vector built from the whole detector array
    return vector<float>(detector, detector + sizeof(detector)/sizeof(detector[0]));
}
// This function returns 1981 SVM coeffs obtained from daimler's base.
// To use these coeffs the detection window size should be (48,96)
vector<float> HOGDescriptor::getDaimlerPeopleDetector()
{
    static const float detector[] = {
        0.294350f, -0.098796f, -0.129522f, 0.078753f,
        0.387527f, 0.261529f, 0.145939f, 0.061520f,
      ........
    };
    // Return a vector built from the whole detector array
    return vector<float>(detector, detector + sizeof(detector)/sizeof(detector[0]));
}

}
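
The scale loop at the top of detectMultiScale is easier to follow in isolation. The following is a minimal sketch of it, assuming plain integers in place of cv::Mat and cv::Size; `pyramidScales` is a hypothetical helper name, not an OpenCV function, and `std::lround` stands in for `cvRound` (the two can differ on exact .5 ties).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): mirrors the scale loop of
// HOGDescriptor::detectMultiScale using plain integer sizes.
std::vector<double> pyramidScales(int imgCols, int imgRows,
                                  int winWidth, int winHeight,
                                  double scale0, int nlevels)
{
    assert(nlevels > 0 && winWidth > 0 && winHeight > 0);

    std::vector<double> levelScale;
    double scale = 1.0;
    int levels = 0;
    for (levels = 0; levels < nlevels; levels++)
    {
        levelScale.push_back(scale);
        // Stop once the scaled image would be smaller than the window,
        // or when scale0 cannot shrink the image at all.
        if (std::lround(imgCols / scale) < winWidth ||
            std::lround(imgRows / scale) < winHeight ||
            scale0 <= 1)
            break;
        scale *= scale0;
    }
    // On break, `levels` was not incremented, so the scale that failed the
    // test is dropped here; at least one level (scale 1) is always kept.
    levels = std::max(levels, 1);
    levelScale.resize(static_cast<std::size_t>(levels));
    return levelScale;
}
```

For a 128x256 image, a 64x128 window and scale0 = 2, this yields the two scales 1 and 2: at scale 4 the image would be 32x64, smaller than the window, so that level is discarded. With scale0 <= 1 only the original scale is searched.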

The HOG-related part of objdetect.hpp:

// HOG (Histogram-of-Oriented-Gradients) Descriptor and Object Detector

struct CV_EXPORTS_W HOGDescriptor
{
public:
    enum { L2Hys=0 };
    enum { DEFAULT_NLEVELS=64 };

    CV_WRAP HOGDescriptor() : winSize(64,128), blockSize(16,16), blockStride(8,8),
        cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1),
        histogramNormType(HOGDescriptor::L2Hys), L2HysThreshold(0.2), gammaCorrection(true),
        nlevels(HOGDescriptor::DEFAULT_NLEVELS)
    {}

    // The constructor arguments are passed to the member initializer list after the
    // colon, so each member is initialized as it is constructed instead of being
    // default-constructed first and assigned later in the constructor body.
    CV_WRAP HOGDescriptor(Size _winSize, Size _blockSize, Size _blockStride,
                  Size _cellSize, int _nbins, int _derivAperture=1, double _winSigma=-1,
                  int _histogramNormType=HOGDescriptor::L2Hys,
                  double _L2HysThreshold=0.2, bool _gammaCorrection=false,
                  int _nlevels=HOGDescriptor::DEFAULT_NLEVELS)
    : winSize(_winSize), blockSize(_blockSize), blockStride(_blockStride), cellSize(_cellSize),
    nbins(_nbins), derivAperture(_derivAperture), winSigma(_winSigma),
    histogramNormType(_histogramNormType), L2HysThreshold(_L2HysThreshold),
    gammaCorrection(_gammaCorrection), nlevels(_nlevels)
    {}

    // The descriptor can also be initialized by loading it from a file
    CV_WRAP HOGDescriptor(const String& filename)
    {
        load(filename);
    }

    HOGDescriptor(const HOGDescriptor& d)
    {
        d.copyTo(*this);
    }

    virtual ~HOGDescriptor() {}

    // size_t is an unsigned integer type (unsigned long on many 64-bit platforms)
    CV_WRAP size_t getDescriptorSize() const;
    CV_WRAP bool checkDetectorSize() const;
    CV_WRAP double getWinSigma() const;

    // virtual: enables polymorphic dispatch when called through a pointer or reference
    CV_WRAP virtual void setSVMDetector(InputArray _svmdetector);

    virtual bool read(FileNode& fn);
    virtual void write(FileStorage& fs, const String& objname) const;

    CV_WRAP virtual bool load(const String& filename, const String& objname=String());
    CV_WRAP virtual void save(const String& filename, const String& objname=String()) const;
    virtual void copyTo(HOGDescriptor& c) const;

    CV_WRAP virtual void compute(const Mat& img,
                         CV_OUT vector<float>& descriptors,
                         Size winStride=Size(), Size padding=Size(),
                         const vector<Point>& locations=vector<Point>()) const;
    //with found weights output
    CV_WRAP virtual void detect(const Mat& img, CV_OUT vector<Point>& foundLocations,
                        CV_OUT vector<double>& weights,
                        double hitThreshold=0, Size winStride=Size(),
                        Size padding=Size(),
                        const vector<Point>& searchLocations=vector<Point>()) const;
    //without found weights output
    virtual void detect(const Mat& img, CV_OUT vector<Point>& foundLocations,
                        double hitThreshold=0, Size winStride=Size(),
                        Size padding=Size(),
                        const vector<Point>& searchLocations=vector<Point>()) const;
    //with result weights output
    CV_WRAP virtual void detectMultiScale(const Mat& img, CV_OUT vector<Rect>& foundLocations,
                                  CV_OUT vector<double>& foundWeights, double hitThreshold=0,
                                  Size winStride=Size(), Size padding=Size(), double scale=1.05,
                                  double finalThreshold=2.0,bool useMeanshiftGrouping = false) const;
    //without found weights output
    virtual void detectMultiScale(const Mat& img, CV_OUT vector<Rect>& foundLocations,
                                  double hitThreshold=0, Size winStride=Size(),
                                  Size padding=Size(), double scale=1.05,
                                  double finalThreshold=2.0, bool useMeanshiftGrouping = false) const;

    CV_WRAP virtual void computeGradient(const Mat& img, CV_OUT Mat& grad, CV_OUT Mat& angleOfs,
                                 Size paddingTL=Size(), Size paddingBR=Size()) const;

    CV_WRAP static vector<float> getDefaultPeopleDetector();
    CV_WRAP static vector<float> getDaimlerPeopleDetector();

    CV_PROP Size winSize;
    CV_PROP Size blockSize;
    CV_PROP Size blockStride;
    CV_PROP Size cellSize;
    CV_PROP int nbins;
    CV_PROP int derivAperture;
    CV_PROP double winSigma;
    CV_PROP int histogramNormType;
    CV_PROP double L2HysThreshold;
    CV_PROP bool gammaCorrection;
    CV_PROP vector<float> svmDetector;
    CV_PROP int nlevels;
};
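
To see how the layout members (winSize, blockSize, blockStride, cellSize, nbins) determine the length of the feature vector that getDescriptorSize() reports, here is a minimal sketch. `hogDescriptorSize` is a hypothetical free function written for illustration, not the OpenCV member, and it assumes the standard layout in which every block contributes one nbins-bin histogram per cell.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free function (not the OpenCV member): length of the HOG
// descriptor for a single detection window, given the layout parameters.
std::size_t hogDescriptorSize(int winW, int winH, int blockW, int blockH,
                              int strideW, int strideH,
                              int cellW, int cellH, int nbins)
{
    // A block must split evenly into cells, and blocks stepped by the
    // stride must cover the window exactly.
    assert(blockW % cellW == 0 && blockH % cellH == 0);
    assert((winW - blockW) % strideW == 0 && (winH - blockH) % strideH == 0);

    std::size_t cellsPerBlock = (blockW / cellW) * (blockH / cellH);
    std::size_t blocksX = (winW - blockW) / strideW + 1;
    std::size_t blocksY = (winH - blockH) / strideH + 1;
    return nbins * cellsPerBlock * blocksX * blocksY;
}
```

With the default constructor's values (64x128 window, 16x16 blocks, 8x8 stride and cells, 9 bins) this gives 7 x 15 blocks x 4 cells x 9 bins = 3780. For a 48x96 window with the same block settings it gives 1980, which is consistent with the 1981 coefficients mentioned for the Daimler detector if the extra value is the SVM bias term.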