Bayesian inference/decision theory is the basic approach to making decisions within a probabilistic framework. Bayesian methods consider how to minimise the overall loss based on probabilities and losses.
Common Probability Distributions
Let $Z$ be some random variable. Then associated with $Z$ is a probability distribution function that assigns probabilities to the different outcomes $Z$ can take.
Poisson Distribution
If $Z$ is discrete, then its distribution is called a probability mass function, which measures the probability that $Z$ takes on the value $k$, denoted $P(Z=k)$. Let's introduce the first very useful probability mass function. We say that $Z$ is Poisson-distributed ($Z \sim \mathrm{Poi}(\lambda)$) if:

$$P(Z=k)=\frac{\lambda^{k} e^{-\lambda}}{k!},\quad \lambda>0,\ k=0,1,2,\cdots$$

One useful property of the Poisson distribution is that its expected value is equal to its parameter. That is,

$$\mathrm{E}[Z\mid\lambda]=\lambda$$
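This property is easy to check numerically. The following is a minimal sketch using NumPy; the rate $\lambda = 4.5$ and the sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 4.5  # the Poisson rate parameter lambda (arbitrary choice)

# Draw many Poisson samples and compare the empirical mean to lambda.
samples = rng.poisson(lam, size=200_000)
print(lam, samples.mean())  # the empirical mean should be close to lambda
```

With this many samples the standard error of the mean is about $\sqrt{\lambda/n} \approx 0.005$, so the agreement is tight.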
Binomial Distribution and Bernoulli Distribution
Like the Poisson distribution, the binomial distribution ($Z\sim\mathrm{Bin}(N,p)$) is also discrete; however, it assigns probability to the values $0$ through $N$ rather than $0$ to $\infty$:

$$P(Z=k)=\binom{N}{k}p^{k}(1-p)^{N-k}$$

The expected value is

$$\mathrm{E}[Z\mid N,p]=Np$$

When $N=1$, the binomial distribution reduces to the Bernoulli distribution.
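Both facts can be verified directly from the pmf. A small sketch (the values $N=10$, $p=0.3$ are arbitrary):

```python
from math import comb

def binom_pmf(k, N, p):
    """P(Z = k) for Z ~ Bin(N, p)."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

N, p = 10, 0.3

# The expected value computed from the pmf should equal N * p.
mean = sum(k * binom_pmf(k, N, p) for k in range(N + 1))
print(mean)  # 3.0 (up to floating-point error)

# With N = 1 the pmf reduces to the Bernoulli distribution:
print(binom_pmf(1, 1, p), binom_pmf(0, 1, p))  # 0.3 0.7
```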
Exponential Density
Instead of a probability mass function, a continuous random variable has a probability density function. An example of a continuous random variable is a random variable with exponential density ($Z\sim \mathrm{Exp}(\lambda)$):

$$f_{Z}(z\mid\lambda)=\lambda e^{-\lambda z},\quad z\geq 0$$

Given a specific $\lambda$, the expected value of an exponential random variable is equal to the inverse of $\lambda$, i.e.

$$\mathrm{E}[Z\mid\lambda]=1/\lambda$$
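A quick numerical check of this expectation, again a sketch with an arbitrary rate. Note that NumPy parameterises the exponential by the scale $1/\lambda$, not the rate:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0  # rate parameter lambda (arbitrary choice)

# numpy's exponential takes the scale 1/lambda, not the rate lambda.
samples = rng.exponential(scale=1 / lam, size=200_000)
print(1 / lam, samples.mean())  # empirical mean should be close to 1/lambda
```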
Normal Distribution
For the sake of data analysis, we denote a normal distribution as $Z\sim N(\mu, 1/\tau)$, where $\tau$ is the precision: the smaller $\tau$, the larger the variance $1/\tau$, and the broader the density. The probability density of a normally distributed random variable is:

$$f(z\mid\mu,\tau)=\sqrt{\frac{\tau}{2\pi}}\exp\left(-\frac{\tau}{2}(z-\mu)^{2}\right)$$

The expectation of a normal distribution is nothing but $\mu$.
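The precision parameterisation is easy to implement directly from the formula. A minimal sketch; the $\tau$ values below are arbitrary and only illustrate how the peak height $\sqrt{\tau/2\pi}$ shrinks as $\tau$ decreases:

```python
import numpy as np

def normal_pdf(z, mu, tau):
    """Density of N(mu, 1/tau), parameterised by the precision tau = 1/sigma^2."""
    return np.sqrt(tau / (2 * np.pi)) * np.exp(-0.5 * tau * (z - mu) ** 2)

# Smaller tau means larger variance 1/tau, hence a broader, flatter density:
# the peak at z = mu has height sqrt(tau / (2*pi)).
for tau in (0.2, 1.0, 5.0):
    print(tau, normal_pdf(0.0, 0.0, tau))
```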
A Brief Overview of Bayesian Methods
There are two major schools in statistics: the frequentist school (also called the classical school) and the Bayesian school. Frequentists draw statistical inferences from population information and sample information; the Bayesian school differs in that it also uses prior information. Simply put, the Bayesian method continually updates our beliefs as new evidence is obtained.
Basic Principles
The most fundamental Bayesian viewpoint is this: any unknown quantity $\theta$ (or the $\lambda$, $\mu$, and $\tau$ above) can be regarded as a random variable and described by a probability distribution, called the prior distribution (denoted $\pi(\theta)$). Since every unknown quantity carries uncertainty, probability and probability distributions are the best language for expressing the degree of that uncertainty. A density function depending on the parameter $\theta$ is written $p(x,\theta)$ in classical statistics, meaning that different values of $\theta$ in the parameter space $\Theta$ correspond to different distributions. In Bayesian statistics it should instead be written $p(x\mid\theta)$: the conditional density of $X$ given that the random variable $\theta$ takes some particular value.
From the Bayesian viewpoint, a sample $x$ is generated in two steps. First, imagine drawing a sample $\theta'$ from the prior distribution $\pi(\theta)$; this step is invisible to us, hence "imagine" (the imagined draw is still meaningful: it reflects how different people may view the same event differently, leaving room for personal judgment). Second, draw a sample $x=(x_{1},x_{2},x_{3},\dots,x_{n})$ from $p(x\mid\theta')$. The joint conditional density of the sample $x$ is then:
$$p(x\mid\theta')=\prod_{i=1}^{n}p(x_{i}\mid\theta')$$

This joint distribution combines the population information and the sample information, and is also called the likelihood function; it is no different from the likelihood function in maximum-likelihood estimation. However, $\theta'$ is still unknown; it was generated according to the prior distribution $\pi(\theta)$. To incorporate the prior information, we cannot consider only $\theta'$; over $\theta$
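The two-step generative view above can be sketched numerically. In this sketch the Beta(2, 2) prior and the Bernoulli likelihood are my own illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: "imagine" drawing theta' from the prior pi(theta).
# A Beta(2, 2) prior on a Bernoulli parameter is assumed here for illustration.
theta_prime = rng.beta(2, 2)

# Step 2: draw a sample x = (x_1, ..., x_n) from p(x | theta').
n = 20
x = rng.binomial(1, theta_prime, size=n)

# The joint conditional density (the likelihood) factorises over the x_i:
def likelihood(theta, x):
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

print(theta_prime, likelihood(theta_prime, x))
```

The first step is invisible in practice, which is exactly why the prior is needed: we must average over all $\theta$ rather than condition on the one unseen draw $\theta'$.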