Bayes' Theorem / Bayesian Estimation / Recursive Bayesian Filter: Principles + MATLAB Program

Bayesian Estimation

Supplement: an example that shows the difference between maximum a posteriori estimation (Bayesian estimation with the hit-or-miss cost function) and maximum likelihood estimation

Xiao Ming did not come to school today. There are three possible hypotheses (θ):

Xiao Ming is sick today  /  US President Trump is meeting Xiao Ming  /  the Earth has been struck by a meteorite

The maximum-likelihood (MLE) estimate θ̂ (our estimate of θ) is "the Earth has been struck by a meteorite", because

Likelihood(Xiao Ming absent from school | meteorite strike) = 1

The maximum a posteriori (MAP) estimate, however, is "Xiao Ming is sick today", because it takes the prior into account: the prior probabilities of "a meteorite strike" and "Trump meeting Xiao Ming" are both far lower than that of "Xiao Ming is sick today".

Occam's razor explains the same phenomenon: the more complex the model (cosmic model > international-relations model > everyday-life model), the lower its prior probability.
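To make the contrast concrete, here is a minimal MATLAB sketch with made-up prior and likelihood values (all numbers are illustrative assumptions, not data):

% Hypotheses: 1 = sick, 2 = Trump visits, 3 = meteorite strike
prior      = [0.05, 1e-9, 1e-12];    % assumed prior probabilities
likelihood = [0.90, 0.95, 1.00];     % assumed p(absent | hypothesis)

[~, mle] = max(likelihood);          % MLE ignores the prior -> 3 (meteorite)
[~, map] = max(likelihood .* prior); % MAP weighs in the prior -> 1 (sick)
fprintf('MLE picks hypothesis %d, MAP picks hypothesis %d\n', mle, map);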

The intuition behind the theorem

Imagine you have a hypothesis about some phenomenon in the world. What is the probability that the hypothesis is true?

 

To answer the question, you make observations related to the hypothesis and use Bayes’ theorem to update its probability. The hypothesis has some prior probability which is based on past knowledge (previous observations). To update it with a new observation, you multiply the prior probability by the respective likelihood term and then divide by the evidence term. The updated prior probability is called the posterior probability.

The posterior probability then becomes the next prior which you can update from another observation. And so the cycle continues.
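As a sketch of this prior → posterior → prior cycle, consider estimating a coin's bias on a discrete grid; the flips below are hypothetical, and the Bernoulli likelihood is the standard choice for this toy problem:

theta = 0:0.01:1;                  % grid of hypotheses for the coin bias
prior = ones(size(theta));         % start from a flat prior
prior = prior / sum(prior);

flips = [1 0 1 1 1 0 1 1];         % hypothetical observations (1 = heads)
for z = flips
    like  = theta.^z .* (1-theta).^(1-z); % Bernoulli likelihood p(z|theta)
    post  = like .* prior;                % numerator of Bayes' theorem
    post  = post / sum(post);             % divide by the evidence (normalize)
    prior = post;                         % the posterior becomes the next prior
end
plot(theta, post);                 % posterior after all observations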

Notational Conventions

Example of Bayesian Inference (Risk Factors Associated with Mortality in Game of Thrones)

Some Key Terms in Bayesian Inference in plain English

Bayesian = Stems from the well-known Bayes' theorem, which was first derived by Reverend Thomas Bayes.

Inference = Educated guessing

Bayesian Inference = Guessing in the style of Bayes

Prior probability density p(θ)

Our knowledge about θ is assumed to be contained in a known prior distribution p(θ), which expresses previous knowledge of θ (for example, from past experience) in the absence of direct evidence.

Likelihood function p(z|θ)

The form of p(z|θ) is assumed known, but the value of θ is not known exactly. The likelihood reads as "the probability of the observation, given that the hypothesis is true". This term represents how strongly the hypothesis predicts the observation.

So, the higher the likelihood, the higher the posterior probability is going to be.

Normalization factor (evidence) p(z)

The rest of our knowledge about θ is contained in a set D of n random observations x1, x2, …, xn that follow p(z).

In the context of Bayes' theorem, the evidence is the marginal probability of the observation itself: p(z) = Σ_θ p(z|θ) p(θ) (an integral for continuous θ). It measures how probable the observation is under all hypotheses combined.

Posterior probability density p(θ|z)

The posterior probability is obtained after multiplying the prior probability by the likelihood and then dividing by the evidence.

Bayes' Theorem (quick reminder): posterior = (likelihood × prior) / evidence, i.e. p(θ|z) = p(z|θ) p(θ) / p(z)

A Real-Life Example of Bayesian Inference

There are two ways to apply Bayes' formula here. The first is the one shown on the slide. In the second, we view the prior as p(woman), the likelihood as p(long hair | woman) (your observation), the evidence as p(long hair) = p(man & long hair) + p(woman & long hair), and the posterior as your updated knowledge p(woman | long hair).
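With some assumed numbers (purely illustrative), the second reading computes as follows:

% Assumed numbers for the long-hair example (illustrative only)
p_woman      = 0.5;    % prior p(woman)
p_long_woman = 0.75;   % likelihood p(long hair | woman)
p_long_man   = 0.15;   % likelihood p(long hair | man)

evidence  = p_long_woman*p_woman + p_long_man*(1 - p_woman); % p(long hair)
posterior = p_long_woman*p_woman / evidence;                 % p(woman | long hair)
fprintf('p(woman | long hair) = %.3f\n', posterior);         % about 0.833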

Recursive Bayesian Filter

 If the variables are normally distributed and the transitions are linear, the Bayes filter becomes equal to the Kalman filter.

That is, when the state and observation equations are linear in Θ (with Gaussian noise), the recursive Bayes filter is exactly the Kalman filter.
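A minimal 1-D sketch of that equivalence, using assumed numbers: with a Gaussian prior and a linear-Gaussian measurement z = θ + v, multiplying the two Gaussians in Bayes' theorem and normalizing gives exactly the Kalman update:

mu_prior  = 0;   var_prior = 4;   % assumed prior N(0, 4)
var_meas  = 1;                    % assumed measurement-noise variance
z         = 2.5;                  % one hypothetical measurement

Kgain    = var_prior / (var_prior + var_meas); % Kalman gain
mu_post  = mu_prior + Kgain*(z - mu_prior);    % posterior mean
var_post = (1 - Kgain) * var_prior;            % posterior variance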

Here z denotes the data we observe about θ.

The denominator is constant with respect to θ, so we can always replace it with a coefficient C, which can usually be ignored in practice. The numerator can be computed and then simply normalized, since the integral of the posterior must be unity.
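In code the trick is simply "evaluate the numerator on a grid, then divide by its sum"; a sketch with an assumed prior, likelihood, and observation:

theta = linspace(-5, 5, 201);         % hypothetical parameter grid
prior = exp(-theta.^2 / (2*2^2));     % assumed Gaussian prior (unnormalized)
like  = exp(-(1.0 - theta).^2 / 2);   % likelihood of an assumed z = 1.0
numer = like .* prior;                % Bayes numerator p(z|theta) p(theta)
post  = numer / sum(numer);           % normalize; C = p(z) is never computed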

 

In this MATLAB example the quantity being estimated is a static constant, so the problem can be viewed as repeated observations of a single state x_k in a first-order Markov process; if θ were a dynamic process, z_k would depend only on x_k.

Treating the world as a Markov process is a simplification of the non-stationary real environment: it keeps enough structure to handle complex problems while remaining mathematically tractable.
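In the dynamic case, each cycle would add a prediction through the motion model before the measurement update. A grid-based sketch of one predict-update cycle (the motion kernel, noise levels, and measurement are all assumed for illustration):

xgrid = -10:0.1:10;                              % 1-D state grid
bel   = exp(-xgrid.^2/2);  bel = bel/sum(bel);   % current belief (assumed)

% Predict: convolve the belief with an assumed Gaussian motion kernel
motion = exp(-xgrid.^2/(2*0.5^2));
motion = motion/sum(motion);
bel_pred = conv(bel, motion, 'same');            % p(x_k | z_{1:k-1})

% Update: multiply by the likelihood of an assumed measurement z = 1.2
like = exp(-(1.2 - xgrid).^2/2);
bel  = bel_pred .* like;
bel  = bel/sum(bel);                             % p(x_k | z_{1:k})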

Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior.

→ Using the evidence narrows the probability distribution.

Some parameter values fit the data better (are more consistent with it), so they receive more probability mass. The formula itself shows why the posterior becomes sharper: repeatedly multiplying prior by likelihood concentrates mass where both agree.
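For Gaussians the sharpening can be written down directly: with prior variance σ0² and n observations of noise variance σ², the posterior variance is 1/(1/σ0² + n/σ²), which shrinks monotonically as n grows. A quick sketch with assumed variances:

sigma0sq = 4;  sigmasq = 1;                 % assumed prior and noise variances
n = 1:20;                                   % number of observations
post_var = 1 ./ (1/sigma0sq + n/sigmasq);   % posterior variance vs. n
plot(n, post_var);                          % a monotonically narrowing posterior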

The Benefit of the Bayesian Approach

Bayesian inference versus Frequentist inference

Two different interpretations of probability have long existed. In Bayesian inference, the prior probabilities are specified and then Bayes' theorem is used to make probability statements about the parameter, as in the equation above. In frequentist inference such prior probabilities are considered nonsensical: the parameter θ is treated as an unknown constant, not a random variable, and since it is not random, making probability statements about it does not make sense. A counterargument is that even if θ is a constant, since it is unknown we may view it as a random variable; the uncertainty may be treated as randomness.

This is where the two schools of statistics come in, with their two views of probability.

The frequentist school focuses on a point estimate, while the Bayesian school focuses on the uncertainty.

θ might be one value, it might be another, it might be a third. Such arguments can and have continued for many years and are very interesting.

Bayesian inference versus Frequentist inference

Bayesian inference updates the probability estimate for a hypothesis as additional evidence is acquired. It is explicitly based on both the evidence and prior opinion, which allows it to combine multiple sets of evidence.

Frequentist inference is capable of making operational decisions and estimating parameters with or without confidence intervals. Frequentist inference is based solely on the probability of the data, which is often a single set of evidence.

Maximum Likelihood Estimation versus Bayesian Estimation

If you are just interested in determining θ, Bayesian and frequentist methods both offer promising paths toward a solution. Often the two methods generate extremely similar answers anyway, making any argument about which one is better nearly meaningless from the standpoint of whether the method arrives at the correct value of θ. Specifically, often the MSEs of the two methods are identical or nearly identical.  

There are certain problems where the frequentist solution (usually Maximum Likelihood Estimation) is easier to follow, other problems where the Bayesian solution is easier to follow. Thus, a knowledge of both methods is useful.
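A quick Monte-Carlo sketch of the "nearly identical MSE" claim for estimating a Gaussian mean; the sample size, noise level, and broad prior below are assumptions chosen for illustration:

rng(0); T = 5000; nobs = 20;
theta = 3; sigma = 2; sigma0 = 10;  % true mean, noise std, broad prior std
mse_mle = 0; mse_bayes = 0;
for t = 1:T
    z = theta + sigma*randn(nobs,1);
    mle = mean(z);                                      % MLE: sample mean
    w   = (nobs/sigma^2)/(nobs/sigma^2 + 1/sigma0^2);   % posterior-mean weight
    bay = w*mean(z);                                    % Bayes: mean shrunk toward the prior mean 0
    mse_mle   = mse_mle   + (mle - theta)^2 / T;
    mse_bayes = mse_bayes + (bay - theta)^2 / T;
end
fprintf('MSE  MLE: %.4f   Bayes: %.4f\n', mse_mle, mse_bayes);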

Bayesian estimation

Bayesian estimation considers θ (the parameter vector to be estimated) to be a random variable.

Before we observe the data, the parameters are described by a prior which is typically very broad. Once we have observed the data, we can use Bayes' theorem to find the posterior.

Our general setup is that we have a random sample Z = (x1, x2, …, xn) from a distribution p(x|θ), with θ unknown.

Our goal is to use all the available information to construct an estimate of θ.

 

Bayes Estimator & Cost function

Bayesian estimators are defined by a minimization problem: they seek the value of θ̂ that minimizes the average cost (the Bayes risk).
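Different cost functions pick out different summaries of the same posterior: the hit-or-miss cost gives the MAP point, the quadratic cost gives the posterior mean, and the absolute cost gives the posterior median. A sketch on an assumed (bimodal) grid posterior:

theta = linspace(0, 10, 1001);
post  = exp(-(theta-4).^2/2) + 0.5*exp(-(theta-7).^2/0.5); % assumed posterior
post  = post / sum(post);

[~, i]   = max(post);  map_est = theta(i);  % hit-or-miss cost -> MAP
mean_est = sum(theta .* post);              % quadratic cost   -> posterior mean
cdf      = cumsum(post);                    % absolute cost    -> posterior median
[~, j]   = min(abs(cdf - 0.5));  med_est = theta(j);
fprintf('MAP %.2f, mean %.2f, median %.2f\n', map_est, mean_est, med_est);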


MATLAB Program

Result (output figure omitted; running the program below reproduces it):

 

%%
figure(1); clf;
figure(2); clf;
N = 1000;         % number of observations
s = [3; 5];       % true value

n = 2*randn(2,N); % Gaussian white noise, standard deviation 2 (variance 4)
x = zeros(2,N);   % preallocate the observation sequence

figure(1);
h = plot(s(1), s(2), 'r.');  % true value
set(h, 'markersize', 40, 'linewidth', 3);
axis([0,10,0,10]);
hold on
for i = 1:N   % plot the noisy observations
    x(:,i) = s + n(:,i);
    plot(x(1,i), x(2,i), 'k.', 'markersize', 10);
end
pause
%% Bayesian iteration: initialization

% Set up the 2-D grid, resolution 0.05
Sa = 2:0.05:4;
Sb = 4:0.05:6;

% No prior knowledge: uniform distribution
% (equivalent to a normal with infinite variance)
L = length(Sa);
Pr = ones(L,L); % initialize the prior
Po = ones(L,L); % initialize the posterior

Pr = Pr/sum(sum(Pr)); % normalize
Po = Po/sum(sum(Po)); % normalize
figure(1); clf;
colormap(hsv)
mesh(Sa, Sb, Po), axis([2 4 4 6 0 0.015])

%% Bayesian iteration

% Start from the point of maximal prior probability (here the posterior
% still equals the prior; with a uniform prior every cell ties for the
% maximum, so take the first match)
[a,b] = find(Po == max(max(Po)), 1);
sest = [Sa(a); Sb(b)];
figure(1); clf;
figure(2); clf;
subplot(211); plot(1, sest(1)); hold on;
line([1,N], [s(1),s(1)]); % horizontal line at the true value
subplot(212); plot(1, sest(2)); hold on;
line([1,N], [s(2),s(2)]); % horizontal line at the true value

K = [3,0; 0,3]; % assumed 2-D likelihood covariance (the simulated noise
                % variance is actually 4; the estimate converges anyway)
% Start at k = 2: the k = 1 point above came from the flat prior
% (loop variable renamed from n, which would shadow the noise array)
for k = 2:N
    Pr = Po;    % the previous posterior becomes the new prior
    m = 0*Pr;   % preallocate the unnormalized posterior grid
    % Assume the likelihood is Gaussian with covariance K; evaluate it at
    % every grid point (candidate mean) for the current noisy observation,
    % then multiply by the prior to refresh the posterior. The peak
    % location effectively averages all samples seen so far.
    for i = 1:L
        for j = 1:L
            me = [Sa(i); Sb(j)];
            m(i,j) = 1/sqrt((2*pi)^2*det(K)) * exp(-(x(:,k)-me)'*(K\(x(:,k)-me))/2); % Gaussian likelihood
            m(i,j) = m(i,j) * Pr(i,j); % numerator of Bayes' theorem
        end
    end
    Po = m/sum(sum(m)); % normalize
    % Plot the 3-D posterior (transpose so rows map to the y-axis Sb,
    % as surf expects)
    figure(1); colormap(hsv); surf(Sa, Sb, Po'), axis tight

    figure(2);
    [a,b] = find(Po == max(max(Po)), 1); % grid indices of the current MAP estimate
    sest = [Sa(a); Sb(b)];               % corresponding point in the 2-D plane
    subplot(211); plot(k, sest(1), 'k.'); axis([0 N 2 4]);
    h1 = text(k, sest(1)+0.1, num2str(sest(1)), 'color', 'r');
    hold on; drawnow; set(h1, 'Visible', 'off');
    subplot(212); plot(k, sest(2), 'k.'); axis([0 N 4 6]);
    h2 = text(k, sest(2)+0.1, num2str(sest(2)), 'color', 'r');
    hold on; drawnow; set(h2, 'Visible', 'off');
end
subplot(211); hold off;
subplot(212); hold off;
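A natural extension of the program: under a quadratic cost the Bayes estimate is the posterior mean rather than the MAP point, and it can be read off the final grid. A sketch that reuses Po, Sa, Sb from the script above (rows of Po index Sa, columns index Sb):

Pa = sum(Po, 2)';              % marginal over Sb -> distribution on Sa
Pb = sum(Po, 1);               % marginal over Sa -> distribution on Sb
sest_mean = [Sa*Pa'; Sb*Pb'];  % expected value along each axis
disp(sest_mean);               % should be close to the true s = [3; 5]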
