Bayes' Principle / Bayesian Estimation / Recursive Bayesian Filter: Theory + MATLAB Program

Bayesian Estimation

Supplement: an example showing the difference between maximum a posteriori (MAP, i.e. Bayesian estimation with a hit-or-miss cost function) and maximum likelihood (MLE)

Xiao Ming did not come to school today. There are three possible hypotheses (θ):

Xiao Ming is sick today / US President Trump is meeting Xiao Ming / Earth was struck by a meteorite

Maximum likelihood estimation (MLE) yields θ_hat (the estimate of θ) = "Earth was struck by a meteorite", because

Likelihood(Xiao Ming did not come to school | Earth was struck by a meteorite) = 1

Maximum a posteriori estimation instead yields "Xiao Ming is sick today", because it takes the prior into account: "Earth struck by a meteorite" and "Trump meeting Xiao Ming" are both far less probable a priori than "Xiao Ming is sick today".

In terms of Occam's razor: the more complex the model (cosmology model > international-relations model > everyday-life model), the lower its prior probability.
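A minimal MATLAB sketch of this comparison; the likelihood and prior values below are made-up numbers for illustration only:

% Hypotheses: 1 = sick, 2 = Trump visit, 3 = meteorite strike
% All numbers below are assumptions for illustration
lik   = [0.30, 0.90, 1.00];      % p(absent | theta): how strongly each hypothesis predicts the absence
prior = [1e-2, 1e-7, 1e-12];     % p(theta): everyday events are far more probable a priori

[~, mle] = max(lik);             % MLE ignores the prior -> picks hypothesis 3 (meteorite)
[~, map] = max(lik .* prior);    % MAP weighs likelihood by prior -> picks hypothesis 1 (sick)
fprintf('MLE picks hypothesis %d, MAP picks hypothesis %d\n', mle, map);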

The intuition behind the theorem

Imagine you have a hypothesis about some phenomenon in the world. What is the probability that the hypothesis is true?


To answer the question, you make observations related to the hypothesis and use Bayes’ theorem to update its probability. The hypothesis has some prior probability which is based on past knowledge (previous observations). To update it with a new observation, you multiply the prior probability by the respective likelihood term and then divide by the evidence term. The updated prior probability is called the posterior probability.

The posterior probability then becomes the next prior which you can update from another observation. And so the cycle continues.
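As a minimal sketch of this prior-to-posterior cycle (the coin-flip setup and the particular flip sequence are assumptions for illustration):

theta = 0:0.01:1;                % grid of hypotheses for a coin's heads probability
prior = ones(size(theta));       % start with a flat prior
prior = prior / sum(prior);
flips = [1 0 1 1 0 1 1 1];       % assumed observations: 1 = heads, 0 = tails
for z = flips
    lik  = theta.^z .* (1-theta).^(1-z);  % Bernoulli likelihood of this flip
    post = lik .* prior;                  % numerator of Bayes' theorem
    post = post / sum(post);              % normalize (the evidence term)
    prior = post;                         % posterior becomes the next prior
end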

Notational Conventions

Example of Bayesian Inference (Risk Factors Associated with Mortality in Game of Thrones)

Some Key Terms in Bayesian Inference in plain English

Bayesian = Stems from the well-known Bayes' theorem, which was first derived by Reverend Thomas Bayes.

Inference = Educated guessing

Bayesian Inference = Guessing in the style of Bayes

Prior probability density p(θ)

Our knowledge about θ is assumed to be contained in a known prior distribution p(θ), which expresses previous knowledge of θ, for example from past experience, in the absence of direct evidence.

Likelihood function p(z|θ)

The form of p(z|θ) is assumed known, but the value of θ is not known exactly. The likelihood reads as "the probability of the observation, given that the hypothesis is true". This term represents how strongly the hypothesis predicts the observation.

So, other things being equal, the higher the likelihood, the higher the posterior probability.

Normalization factor (evidence) p(z)

The rest of our knowledge about θ is contained in a set D of n random observations z1, z2, …, zn that follow p(z).

In the context of Bayes' theorem, the evidence is the overall probability of making the observation, averaged over all hypotheses: p(z) = Σ_θ p(z|θ) p(θ), or the corresponding integral for continuous θ.

Posterior probability density p(θ|z)

The posterior probability is obtained after multiplying the prior probability by the likelihood and then dividing by the evidence.

Bayes' theorem (quick reminder): posterior = (likelihood × prior) / evidence, i.e. p(θ|z) = p(z|θ) p(θ) / p(z)

A Real-Life Example of Bayesian Inference

There are two ways to apply Bayes' formula here; the first is the one on the slide. In the second, we take the prior to be "woman", the likelihood to be the observation "long hair, given woman", the evidence to be "man & long hair + woman & long hair", and the posterior to be our updated belief "woman, given long hair".
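A small numeric sketch of this example; the proportions of long hair among men and women below are assumed values, not data:

p_woman = 0.5;  p_man = 0.5;         % assumed prior
p_long_given_woman = 0.75;           % assumed likelihoods
p_long_given_man   = 0.15;
evidence = p_long_given_woman*p_woman + p_long_given_man*p_man;   % 'man & long hair + woman & long hair'
posterior = p_long_given_woman * p_woman / evidence;              % p(woman | long hair)
% posterior is about 0.833: seeing long hair raises the belief from 0.5 to about 0.83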

Recursive Bayesian Filter

If the variables are normally distributed and the transitions are linear, the Bayes filter is equivalent to the Kalman filter.

In other words, when the observation equation is linear in θ (with Gaussian noise), the recursive Bayesian filter reduces to the Kalman filter.
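A minimal scalar sketch of this equivalence (all numbers assumed): a Gaussian prior combined with a linear Gaussian observation via Bayes' rule yields exactly the Kalman measurement-update mean and variance.

mu0 = 0; P0 = 4;        % assumed Gaussian prior N(mu0, P0)
R = 1;  z = 2.5;        % assumed observation noise variance and measurement
% Kalman measurement update (observation matrix H = 1)
Kgain = P0/(P0+R);
mu1 = mu0 + Kgain*(z-mu0);   % = 2.0
P1  = (1-Kgain)*P0;          % = 0.8
% Grid-based Bayesian update gives the same result
x = -10:0.001:10;
prior = exp(-(x-mu0).^2/(2*P0));
post  = prior .* exp(-(x-z).^2/(2*R));
post  = post / sum(post);
mu_b  = sum(x.*post);                 % matches mu1
P_b   = sum((x-mu_b).^2.*post);       % matches P1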

Here z is the data we observe about θ.

The denominator is constant with respect to θ, so we can replace it with a normalizing constant C, which can usually be ignored in practice. The numerator can be computed and then simply normalized, since the posterior must integrate to one.


In this MATLAB example the quantity being estimated is a static constant, so the experiment can be viewed as observing a single state x_k of a first-order Markov process; if θ were a dynamic process, then z_k would depend only on x_k.

Treating the system as a Markov process is a simplification of the non-stationary real environment: it keeps the essence needed to handle complex problems while remaining convenient for mathematical analysis.

Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior.

→ Incorporating evidence narrows the probability distribution.

Some parameter values fit (are more consistent with) the data better, so they become more probable; the formula above shows why the distribution becomes more peaked.
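A minimal 1-D sketch of the two points above, with assumed Gaussians: the evidence term is replaced by simple normalization, and the resulting posterior is visibly narrower than the prior.

x = -10:0.01:10;
prior = exp(-x.^2/(2*9));            % broad prior, sigma = 3
lik   = exp(-(x-1).^2/(2*1));        % likelihood of an assumed observation z = 1, sigma = 1
num   = lik .* prior;                % numerator of Bayes' theorem
post  = num / sum(num);              % divide by a constant C instead of computing p(z)
prior = prior / sum(prior);
sd = @(p) sqrt(sum(x.^2 .* p) - sum(x.*p)^2);
fprintf('prior std %.2f, posterior std %.2f\n', sd(prior), sd(post));  % posterior is narrower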

The Benefits of the Bayesian Approach

Bayesian inference versus Frequentist inference

Two different interpretations of probability have long existed. In Bayesian inference, the prior probabilities are specified and then Bayes' theorem is used to make probability statements about the parameter, as in the equation above. In frequentist inference such prior probabilities are considered nonsensical: the parameter θ is treated as an unknown constant, not a random variable, and since it is not random, making probability statements about it does not make sense. A counterargument is that even if θ is a constant, since it is unknown we may view it as a random variable; the uncertainty may be treated as randomness.

This is where the two schools of statistics, with their two interpretations of probability, come in.

The frequentist school focuses on a single point estimate, while the Bayesian school focuses on the uncertainty.

It might be one value, it might be another, it might be a third. Such arguments can and have continued for many years and are very interesting.

Bayesian inference versus Frequentist inference (continued)

Bayesian inference updates the probability estimate for a hypothesis as additional evidence is acquired. Bayesian inference is explicitly based on the evidence and prior opinion, which allows it to be based on multiple sets of evidence.

Frequentist inference is capable of making operational decisions and estimating parameters with or without confidence intervals. Frequentist inference is based solely on the probability of the data, which is often one set of evidence.

Maximum Likelihood Estimation versus Bayesian Estimation

If you are just interested in determining θ, Bayesian and frequentist methods both offer promising paths toward a solution. Often the two methods generate extremely similar answers anyway, making any argument about which one is better nearly meaningless from the standpoint of whether the method arrives at the correct value of θ. Specifically, often the MSEs of the two methods are identical or nearly identical.  

There are certain problems where the frequentist solution (usually Maximum Likelihood Estimation) is easier to follow, other problems where the Bayesian solution is easier to follow. Thus, a knowledge of both methods is useful.

Bayesian estimation

Bayesian estimation considers θ (the parameter vector to be estimated) to be a random variable.

Before we observe the data, the parameters are described by a prior, which is typically very broad. Once we have observed the data, we can use Bayes' theorem to find the posterior.

Our general setup is that we have a random sample Z = (z1, z2, …, zn) from a distribution p(z|θ), with θ unknown.

Our goal is to use all the available information to construct an estimate θ_hat of θ.


Bayes Estimator & Cost function

Bayesian estimators are defined by a minimization problem: they seek the value of θ_hat that minimizes the average (expected) cost E[C(θ, θ_hat) | z].
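A small sketch of how the choice of cost function determines the estimator, using an assumed discrete posterior: the hit-or-miss (0-1) cost leads to the MAP estimate (the posterior mode), while the squared-error cost leads to the posterior mean.

theta = 0:0.01:1;
post = exp(-(theta-0.3).^2/(2*0.01)) + 0.5*exp(-(theta-0.8).^2/(2*0.01));  % assumed bimodal posterior
post = post / sum(post);          % normalize over the grid
[~, i] = max(post);
theta_map  = theta(i);            % hit-or-miss cost -> posterior mode (MAP)
theta_mmse = sum(theta .* post);  % squared-error cost -> posterior mean (MMSE)
fprintf('MAP = %.2f, MMSE = %.2f\n', theta_map, theta_mmse);

With a symmetric unimodal posterior the two estimates coincide; with a skewed or multimodal posterior, as here, they differ.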


MATLAB Program

Result: (figure: the posterior surface contracting and the per-component estimates converging to the true value [3; 5].)

%% 
figure(1);clf;
figure(2);clf;
N = 1000;
s=[3;5];  % true value

n=2*randn(2,N); % Gaussian white noise with standard deviation 2
x=zeros(2,N); % preallocate the observation sequence

figure(1);
h=plot(s(1),s(2),'r.');  % plot the true value
set(h,'markersize',40,'linewidth',3);
axis([0,10,0,10]);
hold on
for i=1:N % plot the noisy observations
    x(:,i)=s+n(:,i);
    plot(x(1,i),x(2,i),'k.','markersize',10);
end;
pause % press any key to continue
%% Initialize the Bayesian iteration

% Set up the 2-D grid, resolution 0.05
Sa=[2:0.05:4];
Sb=[4:0.05:6];

% No prior knowledge: uniform distribution (equivalent to a Gaussian with infinite variance)
L=length(Sa);
Pr=ones(L,L); % initialize the prior
Po=ones(L,L); % initialize the posterior

Pr=Pr/sum(sum(Pr)); % normalize
Po=Po/sum(sum(Po)); % normalize
figure(1);clf;
colormap(hsv)
mesh(Sa,Sb,Po), axis([2 4 4 6 0 0.015])

%% Start the Bayesian iteration

[a,b]=find(Po==max(max(Po)));  % start from the maximum of the prior (here the posterior equals the prior)
a=a(1); b=b(1);                % the flat prior has many maxima; take the first
sest=[Sa(a);Sb(b)];  % most probable point under the prior
figure(1);
clf
figure(2);
clf
subplot(211); plot(1,sest(1)); hold on;
line([1,N],[s(1),s(1)]); % draw the true-value line
subplot(212); plot(1,sest(2)); hold on;
line([1,N],[s(2),s(2)]); % draw the true-value line

K=[3,0;0,3]; % covariance matrix of the 2-D Gaussian likelihood
for k=2:size(x,2)
    Pr=Po; % the previous posterior becomes the next prior
    m=0*Pr; % preallocate the unnormalized posterior
    % Assume the likelihood is a Gaussian with covariance K; evaluate it at every grid
    % point for the current noisy observation, then multiply by the prior.
    % Height: the constant in front of the exponential after completing the square.
    % Location: effectively a probability-weighted mean over all sample points.
    for i=1:length(Pr)
       for j=1:length(Pr)
           me=[Sa(i);Sb(j)];
           m(i,j) = 1/sqrt((2*pi)^2*det(K)) * exp(-(x(:,k)-me)'*inv(K)*(x(:,k)-me)/2); % Gaussian likelihood
           m(i,j) = m(i,j) * Pr(i,j); % likelihood times prior: the numerator of Bayes' formula
       end;
    end;
    Po=m/sum(sum(m)); % normalize
    figure(1);colormap(hsv);surf(Sa,Sb,Po), axis tight % 3-D plot of the posterior

    figure(2);
    [a,b]=find(Po==max(max(Po)));  % grid indices of the current MAP estimate
    a=a(1); b=b(1);
    sest=[Sa(a);Sb(b)];  % corresponding location in the 2-D plane
    subplot(211);plot(k,sest(1),'k.');axis([0 N 2 4 ]);h1=text(k,sest(1)+0.1,num2str(sest(1)),'color','r');hold on;drawnow;set(h1,'Visible','off');
    subplot(212); plot(k,sest(2),'k.');axis([0 N 4 6 ]);h2=text(k,sest(2)+0.1,num2str(sest(2)),'color','r');hold on;drawnow;set(h2,'Visible','off');
end;
subplot(211); hold off;
subplot(212); hold off;
