C++ 双峰高斯函数拟合

在数据分析与清洗中经常遇到这样的数据:数据不仅仅向单个中心靠拢,而是类似分段的向两个甚至多个中心靠拢。数据向单个中心靠拢时,可以简单地用高斯函数去拟合,得到相关参数,那么可以通过3sigma准则对数据进行去噪以为后续处理做准备。当数据向两个中心靠拢时,单峰高斯函数就不再适用了,这时候就需要双峰高斯函数出马了。

高斯函数(Gaussian Function)是一种常见的数学函数,广泛应用于概率论、统计学、信号处理、图像处理等领域。


一维高斯函数

一维高斯函数的公式为:
f ( x ) = A ⋅ exp ⁡ ( − ( x − μ ) 2 2 σ 2 ) f(x)=A\cdot\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) f(x)=Aexp(2σ2(xμ)2)

参数说明:

  • A A A:振幅(Amplitude),控制函数的最大值。
  • μ \mu μ:均值(Mean),表示函数的中心位置(即峰值所在的位置)。
  • σ \sigma σ:标准差(Standard Deviation),控制函数的宽度(越小越尖锐,越大越平缓)。
  • x x x:自变量。

归一化形式:
如果需要将高斯函数归一化(使其在 ( − ∞ -\infty ) 到 ( + ∞ +\infty +) 的积分等于 1),则公式变为:
f ( x ) = 1 2 π σ ⋅ exp ⁡ ( − ( x − μ ) 2 2 σ 2 ) f(x)=\frac{1}{\sqrt{2\pi}\sigma}\cdot\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) f(x)=2π σ1exp(2σ2(xμ)2)
此时,( A = \frac{1}{\sqrt{2\pi}\sigma} )。


二维高斯函数

二维高斯函数通常用于描述二维空间中的分布,例如图像处理中的模糊滤波器。其公式为:
f ( x , y ) = A ⋅ exp ⁡ ( − ( x − x 0 ) 2 2 σ x 2 − ( y − y 0 ) 2 2 σ y 2 ) f(x,y)=A\cdot\exp\left(-\frac{(x-x_0)^2}{2\sigma_x^2}-\frac{(y-y_0)^2}{2\sigma_y^2}\right) f(x,y)=Aexp(2σx2(xx0)22σy2(yy0)2)

各向同性高斯函数
当 ( \sigma_x = \sigma_y = \sigma ) 时,二维高斯函数变为:
f ( x , y ) = A ⋅ exp ⁡ ( − ( x − x 0 ) 2 + ( y − y 0 ) 2 2 σ 2 ) f(x,y)=A\cdot\exp\left(-\frac{(x-x_0)^2+(y-y_0)^2}{2\sigma^2}\right) f(x,y)=Aexp(2σ2(xx0)2+(yy0)2)


多维高斯函数

对于 n 维高斯函数,其通用形式为:

f ( x ) = A ⋅ exp ⁡ ( − 1 2 ( x − μ ) ⊤ Σ − 1 ( x − μ ) ) f(\mathbf{x}) = A \cdot \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) f(x)=Aexp(21(xμ)Σ1(xμ))

参数说明:

  • x \mathbf{x} x:( n )-维向量,表示自变量。
  • μ \boldsymbol{\mu} μ:( n )-维向量,表示均值。
  • Σ \Sigma Σ:协方差矩阵,控制分布的形状和方向。
  • Σ − 1 \Sigma^{-1} Σ1:协方差矩阵的逆矩阵。

各向同性多维高斯函数
当协方差矩阵为对角矩阵且所有对角元素相等时(即 Σ = σ 2 I \Sigma = \sigma^2 I Σ=σ2I,其中 I I I 是单位矩阵),公式简化为:
f ( x ) = A ⋅ exp ⁡ ( − ∥ x − μ ∥ 2 2 σ 2 ) f(\mathbf{x}) = A \cdot \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}\|^2}{2\sigma^2}\right) f(x)=Aexp(2σ2xμ2)
其中 ∥ x − μ ∥ \|\mathbf{x} - \boldsymbol{\mu}\| xμ 表示欧几里得距离。


一维双峰高斯函数

对于双峰高斯函数,我们可以将其视为两个这样的高斯函数的叠加,每个都有自己的振幅,均值及标准差。因此,双峰高斯函数的形式可以表示为:

f ( x ) = A 1 exp ⁡ ( − ( x − μ 1 ) 2 2 σ 1 2 ) + A 2 exp ⁡ ( − ( x − μ 2 ) 2 2 σ 2 2 ) f(x) = A_1 \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) + A_2 \exp\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right) f(x)=A1exp(2σ12(xμ1)2)+A2exp(2σ22(xμ2)2)

可以用于拟合或描述具有双峰特性的数据集。


代码实现

这里基于ceres,在C++中对双峰高斯拟合进行实现。
思路很简单,输入一维的数据和直方图的宽度,构建直方图,然后通过直方图的顶端坐标来拟合双峰高斯函数,稍微麻烦的是找到合适的初始值:

  1. 建直方图
  2. 确定初值
  3. 优化求解
struct BimodalGaussianResidual
{
	BimodalGaussianResidual(double x, double y) : x_(x), y_(y)
	{
	}

	template <typename T>
	bool operator()(const T* const params, T* residual) const
	{
		const T A1 = params[0];
		const T mu1 = params[1];
		const T sigma1 = params[2];
		const T A2 = params[3];
		const T mu2 = params[4];
		const T sigma2 = params[5];

		T f1 = A1 * ceres::exp(-(x_ - mu1) * (x_ - mu1) / (T(2) * sigma1 * sigma1));
		T f2 = A2 * ceres::exp(-(x_ - mu2) * (x_ - mu2) / (T(2) * sigma2 * sigma2));

		residual[0] = y_ - (f1 + f2);
		return true;
	}

private:
	const double x_;
	const double y_;
};



bool FitBimodalGaussian(std::vector<float>& data, double* params, const double  bin_width)
{
	//==========================建直方图==========================
	double min_val = *std::min_element(data.begin(), data.end());
	double max_val = *std::max_element(data.begin(), data.end());

	std::map<double, int> histogram;
	for (double value : data)
	{
		double bin = std::floor((value - min_val) / bin_width) * bin_width; // 找到所属的bin
		histogram[bin]++;
	}

	std::vector<double> x_data;
	std::vector<double> heights;
	for (const auto& entry : histogram)
	{
		x_data.push_back(entry.first + min_val + 0.5 * bin_width);
		heights.push_back(entry.second);
	}

	//==========================确定初值==========================

	// 找到直方图中的两个峰值
	double max_height1 = 0, max_height2 = 0;
	double mu1 = 0, mu2 = 0;

	size_t peak1_index = 0; // 第一个峰值的索引

	// 找第一个峰值
	for (size_t i = 0; i < heights.size(); ++i)
	{
		if (heights[i] > max_height1)
		{
			max_height1 = heights[i];
			mu1 = x_data[i];
			peak1_index = i;
		}
	}

	// 找第二个峰值(避开第一个峰值附近的区域)
	const double separation_threshold = 5; // 峰值之间的最小距离      // 根据数据确定
	for (size_t i = 0; i < heights.size(); ++i)
	{
		if (std::abs(x_data[i] - mu1) > separation_threshold && heights[i] > max_height2)
		{
			max_height2 = heights[i];
			mu2 = x_data[i];
		}
	}

	// 估计标准差 sigma1 和 sigma2
	double sigma1 = 0.5, sigma2 = 0.5; // 初始猜测值
	const double half_max_height1 = max_height1 / 2.0;
	const double half_max_height2 = max_height2 / 2.0;

	// 找第一个峰值的半高宽
	double left_edge1 = mu1, right_edge1 = mu1;
	for (size_t i = 0; i < x_data.size(); ++i)
	{
		if (x_data[i] < mu1 && heights[i] >= half_max_height1)
		{
			left_edge1 = x_data[i];
		}
		if (x_data[i] > mu1 && heights[i] >= half_max_height1)
		{
			right_edge1 = x_data[i];
			break;
		}
	}
	sigma1 = (right_edge1 - left_edge1) / (2 * std::sqrt(2 * std::log(2)));

	// 找第二个峰值的半高宽
	double left_edge2 = mu2, right_edge2 = mu2;
	for (size_t i = 0; i < x_data.size(); ++i)
	{
		if (x_data[i] < mu2 && heights[i] >= half_max_height2)
		{
			left_edge2 = x_data[i];
		}
		if (x_data[i] > mu2 && heights[i] >= half_max_height2)
		{
			right_edge2 = x_data[i];
			break;
		}
	}
	sigma2 = (right_edge2 - left_edge2) / (2 * std::sqrt(2 * std::log(2)));

	// 设置初始参数 [A1, mu1, sigma1, A2, mu2, sigma2]
	params[0] = max_height1;
	params[1] = mu1;
	params[2] = sigma1;
	params[3] = max_height2;
	params[4] = mu2;
	params[5] = sigma2;

#if MY_LOG
	std::cout << std::fixed << std::setprecision(6);
	std::cout << "\nInitial Parameters:" << std::endl;
	std::cout << "A1 = " << params[0] << ", mu1 = " << params[1] << ", sigma1 = " << params[2] << std::endl;
	std::cout << "A2 = " << params[3] << ", mu2 = " << params[4] << ", sigma2 = " << params[5] << std::endl;
#endif

	//==========================优化求解==========================
	ceres::Problem problem;
	for (size_t i = 0; i < x_data.size(); ++i)
	{
		problem.AddResidualBlock(new ceres::AutoDiffCostFunction<BimodalGaussianResidual, 1, 6>(new BimodalGaussianResidual(x_data[i], heights[i])), nullptr, params);
	}

	ceres::Solver::Options options;
	options.linear_solver_type = ceres::DENSE_QR;
	options.minimizer_progress_to_stdout = false;

	ceres::Solver::Summary summary;
	ceres::Solve(options, &problem, &summary);

#if MY_LOG
	std::cout << "\nFinal Parameters:\n";
	std::cout << "A1: " << params[0] << ", mu1: " << params[1] << ", sigma1: " << params[2] << "\n";
	std::cout << "A2: " << params[3] << ", mu2: " << params[4] << ", sigma2: " << params[5] << "\n";
#endif

	return true;
}

用一组真实数据测试,可得:

Initial Parameters:
A1 = 16223.000000, mu1 = -1.018652, sigma1 = 0.084932
A2 = 2868.000000, mu2 = 7.181348, sigma2 = 0.084932

Final Parameters:
A1: 15804.267874, mu1: -1.138264, sigma1: 0.744859
A2: 2530.132897, mu2: 6.820187, sigma2: 0.759935

在这里插入图片描述

二维双峰高斯函数

类似的,二维双峰高斯函数可表达式如下:
f ( x , y ) = A 1 exp ⁡ ( − ( x − μ 1 x ) 2 2 σ 1 x 2 − ( y − μ 1 y ) 2 2 σ 1 y 2 ) + A 2 exp ⁡ ( − ( x − μ 2 x ) 2 2 σ 2 x 2 − ( y − μ 2 y ) 2 2 σ 2 y 2 ) f(x, y) = A_1 \exp\left(-\frac{(x-\mu_{1x})^2}{2\sigma_{1x}^2} - \frac{(y-\mu_{1y})^2}{2\sigma_{1y}^2}\right) + A_2 \exp\left(-\frac{(x-\mu_{2x})^2}{2\sigma_{2x}^2} - \frac{(y-\mu_{2y})^2}{2\sigma_{2y}^2}\right) f(x,y)=A1exp(2σ1x2(xμ1x)22σ1y2(yμ1y)2)+A2exp(2σ2x2(xμ2x)22σ2y2(yμ2y)2)

代码实现

#include <iostream>
#include <vector>
#include <cmath>
#include <random>
#include <map>
#include <iomanip>  
#include "ceres/ceres.h"
#include "gnuplot-iostream.h"

// 二维双峰高斯残差函数
struct BimodalGaussianResidual2D
{
    BimodalGaussianResidual2D(double x, double y, double z) : x_(x), y_(y), z_(z)
    {
    }

    template <typename T>
    bool operator()(const T* const params, T* residual) const
    {
        const T A1 = params[0];
        const T mu1_x = params[1];
        const T mu1_y = params[2];
        const T sigma1_x = params[3];
        const T sigma1_y = params[4];

        const T A2 = params[5];
        const T mu2_x = params[6];
        const T mu2_y = params[7];
        const T sigma2_x = params[8];
        const T sigma2_y = params[9];

        T f1 = A1 * ceres::exp(-(T(x_ - mu1_x) * T(x_ - mu1_x) / (T(2) * sigma1_x * sigma1_x) +
            T(y_ - mu1_y) * T(y_ - mu1_y) / (T(2) * sigma1_y * sigma1_y)));
        T f2 = A2 * ceres::exp(-(T(x_ - mu2_x) * T(x_ - mu2_x) / (T(2) * sigma2_x * sigma2_x) +
            T(y_ - mu2_y) * T(y_ - mu2_y) / (T(2) * sigma2_y * sigma2_y)));

        residual[0] = z_ - (f1 + f2);
        return true;
    }

private:
    const double x_;
    const double y_;
    const double z_;
};


// 二维双峰高斯数据生成
void GenerateBimodalGaussianData(std::vector<double>& x_data, std::vector<double>& y_data, std::vector<double>& z_data,
    double A1, double mu1_x, double mu1_y, double sigma1_x, double sigma1_y,
    double A2, double mu2_x, double mu2_y, double sigma2_x, double sigma2_y,
    int num_points, double noise_level)
{
    std::default_random_engine generator;
    std::normal_distribution<double> distribution(0.0, noise_level);

    for (int i = 0; i < num_points; ++i)
    {
        double x = static_cast<double>(rand()) / RAND_MAX * 10.0 - 5.0; // 范围 [-5, 5]
        double y = static_cast<double>(rand()) / RAND_MAX * 10.0 - 5.0; // 范围 [-5, 5]

        double f1 = A1 * exp(-(pow(x - mu1_x, 2) / (2 * pow(sigma1_x, 2)) +
            pow(y - mu1_y, 2) / (2 * pow(sigma1_y, 2))));
        double f2 = A2 * exp(-(pow(x - mu2_x, 2) / (2 * pow(sigma2_x, 2)) +
            pow(y - mu2_y, 2) / (2 * pow(sigma2_y, 2))));

        double z = f1 + f2 + distribution(generator); // 添加噪声

        x_data.push_back(x);
        y_data.push_back(y);
        z_data.push_back(z);
    }
}

// 二维双峰高斯函数拟合
bool FitBimodalGaussian2D(const std::vector<double>& x_data, const std::vector<double>& y_data, const std::vector<double>& z_data, double* params)
{
    ceres::Problem problem;

    for (size_t i = 0; i < x_data.size(); ++i)
    {
        problem.AddResidualBlock(
            new ceres::AutoDiffCostFunction<BimodalGaussianResidual2D, 1, 10>(
                new BimodalGaussianResidual2D(x_data[i], y_data[i], z_data[i])),
            nullptr, params);
    }

    ceres::Solver::Options options;
    options.linear_solver_type = ceres::DENSE_QR;
    options.minimizer_progress_to_stdout = false;

    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);

    return true;
}


int main()
{
    //==========================数据生成==========================

    std::vector<double> x_data, y_data, z_data;
    int num_points = 1000;
    double noise_level = 0.2;

    double A1_true = -3.0, mu1_x_true = -2.0, mu1_y_true = 2.0, sigma1_x_true = 1.0, sigma1_y_true = 1.0;
    double A2_true = 5.0, mu2_x_true = 2.0, mu2_y_true = -2.0, sigma2_x_true = 1.5, sigma2_y_true = 1.5;

    GenerateBimodalGaussianData(x_data, y_data, z_data,
        A1_true, mu1_x_true, mu1_y_true, sigma1_x_true, sigma1_y_true,
        A2_true, mu2_x_true, mu2_y_true, sigma2_x_true, sigma2_y_true,
        num_points, noise_level);

    //==========================确定初值==========================
    double initial_params[] = { A1_true, mu1_x_true, mu1_y_true,  sigma1_x_true, sigma1_y_true, A2_true , mu2_x_true ,  mu2_y_true, sigma2_x_true, sigma2_y_true };
    // 添加随机噪声
    std::random_device rd;
    std::mt19937 gen(rd());
    std::normal_distribution<> d(0, 2*noise_level);
    for (int i = 0; i < 10;++i)initial_params[i]+= d(gen);

    //==========================优化求解==========================

    std::cout << "\nInitial Parameters:\n";
    std::cout << "A1: " << initial_params[0] << ", mu1_x: " << initial_params[1] << ", mu1_y: " << initial_params[2] << ", sigma1_x: " << initial_params[3] << ", sigma1_y: " << initial_params[4] << "\n";
    std::cout << "A2: " << initial_params[5] << ", mu2_x: " << initial_params[6] << ", mu2_y: " << initial_params[7] << ", sigma2_x: " << initial_params[8] << ", sigma2_y: " << initial_params[9] << "\n";
    FitBimodalGaussian2D(x_data, y_data, z_data, initial_params);
    std::cout << "\nFinal Fitted Parameters:\n";
    std::cout << "A1: " << initial_params[0] << ", mu1_x: " << initial_params[1] << ", mu1_y: " << initial_params[2] << ", sigma1_x: " << initial_params[3] << ", sigma1_y: " << initial_params[4] << "\n";
    std::cout << "A2: " << initial_params[5] << ", mu2_x: " << initial_params[6] << ", mu2_y: " << initial_params[7] << ", sigma2_x: " << initial_params[8] << ", sigma2_y: " << initial_params[9] << "\n";


    //==========================图形绘制==========================

    // 将数据传递给 Gnuplot 并绘制
    std::vector<std::tuple<double, double, double>> data;
    for (size_t i = 0; i < x_data.size(); ++i)
    {
        data.emplace_back(x_data[i], y_data[i], z_data[i]);
    }

    auto format_param = [](double value)
    {
        std::ostringstream oss;
        oss << std::fixed << std::setprecision(3) << value;
        return oss.str();
    };

    Gnuplot gp;

    // 设置绘图标题和坐标轴标签
    gp << "set terminal wxt enhanced\n";    // 使用支持交互的终端
    gp << "set mouse\n";                    // 启用鼠标交互
    gp << "set title 'BimodalGaussian2D Fit'\n";
    gp << "set xlabel 'X'\n";
    gp << "set ylabel 'Y'\n";
    gp << "set zlabel 'Z'\n";

    // 设置绘图为三维模式
    gp << "set style data points\n";    // 使用点样式
    gp << "set ticslevel 0\n";          // 调整 Z 轴刻度位置
    gp << "set isosamples 100,100\n";   // 设置采样数量
    gp << "set hidden3d\n";             // 隐藏被遮挡区域

    gp << "splot '-' with points pointtype 7 pointsize 0.5 title 'Scatter Points', "
        << format_param(initial_params[0]) << "*exp(-((x - " << format_param(initial_params[1])
        << ")**2/(2*" << format_param(initial_params[3]) << "**2) + (y - " << format_param(initial_params[2])
        << ")**2/(2*" << format_param(initial_params[4]) << "**2))) "
        << "+ " << format_param(initial_params[5]) << "*exp(-((x - " << format_param(initial_params[6])
        << ")**2/(2*" << format_param(initial_params[8]) << "**2) + (y - " << format_param(initial_params[7])
        << ")**2/(2*" << format_param(initial_params[9]) << "**2)))\n";
    gp.send(data);
    gp.flush();

    return 0;
}

在真值基础上添加2倍随机噪声的初值及拟合结果如下所示,可以看到其拟合得相当准确的。

Initial Parameters:
A1: -3.11395, mu1_x: -2.43491, mu1_y: 1.9397, sigma1_x: 1.21568, sigma1_y: 1.3008
A2: 4.93584, mu2_x: 2.1473, mu2_y: -1.74558, sigma2_x: 1.88884, sigma2_y: 1.73732

Final Fitted Parameters:
A1: -3.00556, mu1_x: -2.01663, mu1_y: 1.98635, sigma1_x: 0.96315, sigma1_y: 1.01506
A2: 4.95913, mu2_x: 2.01769, mu2_y: -2.01035, sigma2_x: 1.49838, sigma2_y: 1.49798

在这里插入图片描述

在这里插入图片描述


多维多峰高斯函数

同样,理论上也可以发散到多维多峰高斯函数的情形,不过一方面这种情况比较少见,另一方面高维的初值一般不好自动确定,可能应用得很少,这里不再作说明和演示。


补充
对ceres拟合感兴趣可移步C++ 带约束的Ceres形状拟合
对gnuplot绘制感兴趣可移步如何在C++中优雅地绘制图表


打完收工。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

WoooChi

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值