Bayesian inference/decision theory is the basic approach to making decisions within a probabilistic framework. Bayesian methods consider how to minimise the overall loss based on probabilities and losses.
Common Probability Distributions
Let $Z$ be some random variable. Then associated with $Z$ is a probability distribution function that assigns probabilities to the different outcomes $Z$ can take.
Poisson Distribution
If $Z$ is discrete, then its distribution is called a probability mass function, which measures the probability that $Z$ takes on the value $k$, denoted $P(Z=k)$. Let's introduce the first very useful probability mass function. We say that $Z$ is Poisson-distributed ($Z \sim \mathrm{Poi}(\lambda)$) if:

$$P(Z=k)=\frac{\lambda^{k} e^{-\lambda}}{k!},\quad \lambda>0,\ k=0,1,2,\cdots$$

One useful property of the Poisson distribution is that its expected value is equal to its parameter. That is,

$$\mathrm{E}[Z\mid\lambda]=\lambda$$
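This property is easy to check numerically. The following is a minimal sketch using NumPy; the rate $\lambda = 4.5$ and the sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 4.5  # the Poisson rate parameter lambda (arbitrary choice)

# Draw many Poisson samples and compare the empirical mean to lambda.
samples = rng.poisson(lam, size=200_000)
print(lam, samples.mean())  # the empirical mean should be close to lambda
```

With this many samples the standard error of the mean is about $\sqrt{\lambda/n} \approx 0.005$, so the agreement is tight.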
Binomial Distribution and Bernoulli Distribution
Like the Poisson distribution, the binomial distribution ($Z\sim\mathrm{Bin}(N,p)$) is also discrete; however, it assigns probability to the values $0$ through $N$ rather than $0$ to $\infty$:

$$P(Z=k)=\binom{N}{k}p^{k}(1-p)^{N-k}$$

The expected value is

$$\mathrm{E}[Z\mid N,p]=Np$$

When $N=1$, the binomial distribution reduces to the Bernoulli distribution.
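Both facts can be verified directly from the pmf. A small sketch (the values $N=10$, $p=0.3$ are arbitrary):

```python
from math import comb

def binom_pmf(k, N, p):
    """P(Z = k) for Z ~ Bin(N, p)."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

N, p = 10, 0.3

# The expected value computed from the pmf should equal N * p.
mean = sum(k * binom_pmf(k, N, p) for k in range(N + 1))
print(mean)  # 3.0 (up to floating-point error)

# With N = 1 the pmf reduces to the Bernoulli distribution:
print(binom_pmf(1, 1, p), binom_pmf(0, 1, p))  # 0.3 0.7
```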
Exponential Density
Instead of a probability mass function, a continuous random variable has a probability density function. An example of a continuous random variable is a random variable with exponential density ($Z\sim \mathrm{Exp}(\lambda)$):

$$f_{Z}(z\mid\lambda)=\lambda e^{-\lambda z},\quad z\geq 0$$

Given a specific $\lambda$, the expected value of an exponential random variable is equal to the inverse of $\lambda$, i.e.

$$\mathrm{E}[Z\mid\lambda]=1/\lambda$$
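A quick numerical check of this expectation, again a sketch with an arbitrary rate. Note that NumPy parameterises the exponential by the scale $1/\lambda$, not the rate:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0  # rate parameter lambda (arbitrary choice)

# numpy's exponential takes the scale 1/lambda, not the rate lambda.
samples = rng.exponential(scale=1 / lam, size=200_000)
print(1 / lam, samples.mean())  # empirical mean should be close to 1/lambda
```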
Normal Distribution
For the sake of data analysis, we denote a normal distribution as $Z\sim N(\mu, 1/\tau)$, where $\tau$ is the precision: the smaller $\tau$, the larger the variance $1/\tau$, and the broader the density. The probability density of a normally distributed random variable is:

$$f(z\mid\mu,\tau)=\sqrt{\frac{\tau}{2\pi}}\exp\left(-\frac{\tau}{2}(z-\mu)^{2}\right)$$

The expectation of a normal distribution is nothing but $\mu$.
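The precision parameterisation is easy to implement directly from the formula. A minimal sketch; the $\tau$ values below are arbitrary and only illustrate how the peak height $\sqrt{\tau/2\pi}$ shrinks as $\tau$ decreases:

```python
import numpy as np

def normal_pdf(z, mu, tau):
    """Density of N(mu, 1/tau), parameterised by the precision tau = 1/sigma^2."""
    return np.sqrt(tau / (2 * np.pi)) * np.exp(-0.5 * tau * (z - mu) ** 2)

# Smaller tau means larger variance 1/tau, hence a broader, flatter density:
# the peak at z = mu has height sqrt(tau / (2*pi)).
for tau in (0.2, 1.0, 5.0):
    print(tau, normal_pdf(0.0, 0.0, tau))
```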
A Brief Overview of Bayesian Methods
There are two major schools in statistics: the frequentist school (also called the classical school) and the Bayesian school. Frequentists draw statistical inferences from population information and sample information; the Bayesian school differs in that it also uses prior information. Simply put, the Bayesian method continually updates our beliefs as new evidence is obtained.
Basic Principles
The most fundamental Bayesian viewpoint is this: any unknown quantity $\theta$ (or the $\lambda$, $\mu$, and $\tau$ above) can be regarded as a random variable and described by a probability distribution, called the prior distribution (denoted $\pi(\theta)$). Since every unknown quantity carries uncertainty, probability and probability distributions are the best language for expressing the degree of that uncertainty. A density function depending on the parameter $\theta$ is written $p(x,\theta)$ in classical statistics, meaning that different values of $\theta$ in the parameter space $\Theta$ correspond to different distributions. In Bayesian statistics it should instead be written $p(x\mid\theta)$: the conditional density of $X$ given that the random variable $\theta$ takes some particular value.
From the Bayesian viewpoint, a sample $x$ is generated in two steps. First, imagine drawing a sample $\theta'$ from the prior distribution $\pi(\theta)$; this step is invisible to us, hence "imagine" (the imagined draw is still meaningful: it reflects how different people may view the same event differently, leaving room for personal judgment). Second, draw a sample $x=(x_{1},x_{2},x_{3},\dots,x_{n})$ from $p(x\mid\theta')$. The joint conditional density of the sample $x$ is then:
$$p(x\mid\theta')=\prod_{i=1}^{n}p(x_{i}\mid\theta')$$

This joint distribution combines the population information and the sample information, and is also called the likelihood function; it is no different from the likelihood function in maximum-likelihood estimation. However, $\theta'$ is still unknown; it was generated according to the prior distribution $\pi(\theta)$. To incorporate the prior information, we cannot consider only $\theta'$; over $\theta$
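The two-step generative view above can be sketched numerically. In this sketch the Beta(2, 2) prior and the Bernoulli likelihood are my own illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: "imagine" drawing theta' from the prior pi(theta).
# A Beta(2, 2) prior on a Bernoulli parameter is assumed here for illustration.
theta_prime = rng.beta(2, 2)

# Step 2: draw a sample x = (x_1, ..., x_n) from p(x | theta').
n = 20
x = rng.binomial(1, theta_prime, size=n)

# The joint conditional density (the likelihood) factorises over the x_i:
def likelihood(theta, x):
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

print(theta_prime, likelihood(theta_prime, x))
```

The first step is invisible in practice, which is exactly why the prior is needed: we must average over all $\theta$ rather than condition on the one unseen draw $\theta'$.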