Why SIFT takes more time when nOctaveLayers is 1

Article link: https://blog.csdn.net/mightbxg/article/details/118997638

This post is a question and answer that I posted on StackOverflow; see:
Why SIFT costs more time with fewer octave layers?

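The principles of the SIFT algorithm are already explained in detail in plenty of good references, for example:

A while ago, while using OpenCV's SIFT, I ran into a somewhat counterintuitive problem, which I am recording here.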

Problem description

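It is well known that slow computation has always been a major drawback limiting the use of SIFT. To reduce its running time, the first thing I thought of was tuning the algorithm's parameters, since that is the simplest approach. The factory function of the SIFT detector is: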

static Ptr<SIFT> create (int nfeatures=0, int nOctaveLayers=3,
  double contrastThreshold=0.04, double edgeThreshold=10, double sigma=1.6)

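Here nOctaveLayers is the number of layers in each octave of the image pyramid that are used for extracting keypoints. It corresponds to the number of layers inside the red box in the figure below (the figure comes from the first blog post mentioned above):
[Figure: nOctaveLayers=3]
By default nOctaveLayers is 3, which means each octave of the pyramid has 3+2=5 DoG layers and 3+2+1=6 Gaussian-blurred layers. The height of the pyramid (the number of octaves) is computed automatically from the resolution of the input image.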

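That suggests a way to speed things up: reducing nOctaveLayers should reduce the number of Gaussian-blurred layers and DoG layers, lower the number of keypoints, and shorten the descriptor computation.

I wrote a simple test program to measure the running time of SIFT for different values of nOctaveLayers: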

// img_src is the test image, assumed to be loaded beforehand.
const int test_num = 100;  // number of runs to average over
const int layers = 5;      // nOctaveLayers value under test
cout << "layers: " << layers << endl;

auto sift = SIFT::create(0, layers);
vector<KeyPoint> kps;
Mat descs;

// Time test_num full detect-and-describe runs and report the average.
auto t1 = chrono::high_resolution_clock::now();
for (int i = 0; i < test_num; ++i)
    sift->detectAndCompute(img_src, noArray(), kps, descs);
auto t2 = chrono::high_resolution_clock::now();
cout << "num of kps: " << kps.size() << endl;
cout << "avg time cost: " << chrono::duration<double>(t2 - t1).count() * 1e3 / test_num << " ms" << endl;

However, the results were not what I expected:

nOctaveLayers	Keypoints	Time (ms)
1	1026	63.41
2	1795	45.07
3	2043	45.74
4	2173	47.83
5	2224	51.86

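As nOctaveLayers decreases, the number of keypoints does go down, but the total time does not drop significantly. When nOctaveLayers is 1, the time actually goes up rather than down, and even exceeds the time measured with nOctaveLayers set to 5!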

Cause analysis

After several hours of reading and debugging the source code, I finally found the culprit: Gaussian blur.

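The SIFT pipeline is:

1. Create the initial image: convert the input image to float, double its resolution, and apply one Gaussian blur (sigma² = 1.56, i.e. sigma ≈ 1.25, with the default parameters)
2. Build the Gaussian pyramid
3. Find keypoints: build the DoG pyramid and search for scale-space extrema
4. Compute the descriptor of each keypoint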

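When nOctaveLayers increases, the time spent in steps 3 and 4 does increase, but because these steps run multithreaded, the increase is not significant (a few milliseconds).

Step 2, however, takes more than half of the total time! With nOctaveLayers set to 3, building the Gaussian pyramid took 25.27 ms (out of 43.49 ms in total); with nOctaveLayers set to 1, it took 51.16 ms (out of 63.10 ms in total). The reason is that, to keep the scale space continuous across octaves, the sigma used for the Gaussian blur grows sharply as the number of layers decreases, and a very large sigma makes the Gaussian blur extremely slow: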

vector<double> sig1 = { 1.6, 2.77128, 5.54256, 11.0851 };
vector<double> sig3 = { 1.6, 1.22627, 1.54501, 1.94659, 2.45255, 3.09002 };
vector<double> sig5 = { 1.6, 0.9044, 1.03888, 1.19336, 1.37081, 1.57465, 1.8088, 2.07777 };

// Times the chain of incremental blurs for one octave, averaged over test_num runs.
// sigs[0] is the sigma of the base image; only sigs[1..] are applied as blurs.
auto blurTest = [](const vector<double>& sigs, const string& label) {
    const int test_num = 100;
    auto t1 = chrono::high_resolution_clock::now();
    for (int run = 0; run < test_num; ++run) {
        vector<Mat> pyr;
        pyr.resize(sigs.size());
        pyr[0] = Mat::zeros(960, 1280, CV_32FC1);
        // Each layer is produced by blurring the previous one.
        for (size_t i = 1; i < sigs.size(); ++i)
            GaussianBlur(pyr[i - 1], pyr[i], Size(), sigs[i], sigs[i]);
    }
    auto t2 = chrono::high_resolution_clock::now();
    auto time = chrono::duration<double>(t2 - t1).count() * 1e3 / test_num;
    cout << label << ": " << time << " ms\n";
};

blurTest(sig1, "1");
blurTest(sig3, "3");
blurTest(sig5, "5");

/* output:
1: 45.3958 ms
3: 28.5943 ms
5: 31.4827 ms
*/