2.1_最小错误率贝叶斯决策

最新推荐文章于 2024-07-17 21:40:42 发布

Gantnocap

最新推荐文章于 2024-07-17 21:40:42 发布

阅读量1.7k

点赞数 46

分类专栏：模式识别与机器学习文章标签：人工智能机器学习算法概率论

本文链接：https://blog.csdn.net/2301_79449205/article/details/134646696

版权

模式识别与机器学习专栏收录该内容

7 篇文章 2 订阅

订阅专栏

目标：最小化错误率
问题：错误率是啥？

先定义错误概率：给定样本 $\mathbf{x}_d$ , 是d维列向量（具有d个特征的样本），把该样本分类错误的概率，比如把明明是一只狗，分类成猫的概率是多少，这就是对于该样本的分类的错误概率！对象是一个样本。

那什么又是错误率呢？：
错误率指的就是对于所有的样本，把每个样本分错的概率加权平均就是对于整个样本集的分类错误率。即错误概率的期望。对象是所有样本。

接下来给出具体分析，考虑二分类情况 $\omega_1$ 和 $\omega_2$ , 给定一个样本 $\mathbf{x}_d$ , $\mathbf{x}_d$ 要么属于 $\omega_1$ 类，要么属于 $\omega_2$ 类。假定各类的先验概率 $P(\omega_i)$ 已知，且已知各类中的样本分布密度，即类条件概率密度 $P(\mathbf{x}_d|\omega_i)$ 。

我们要做的决策就是对于某个未知样本 $\mathbf{x}_d$ , 判断该样本属于哪一类。
$即给定\mathbf{x}_d, \\ \mathbf{x}_d \in \omega_1\quad or \in \omega_2$

翻译成概率就是 $P(\omega_i|\mathbf{x}_d)$ 这个条件概率的大小。

定义错误概率为如下：
$p(e|\mathbf{x}_d) = \begin{cases} P(\omega_2|\mathbf{x}_d),\quad \mathbf{x}_d \in \omega_1 \\ P(\omega_1|\mathbf{x}_d),\quad \mathbf{x}_d \in \omega_2 \end{cases} \tag{1}$

定义错误率：（为错误概率的期望，考虑所有的样本）
$P(e)=E[p(e|\mathbf{x}_d)]=\int p(e|\mathbf{x}_d)p(\mathbf{x}_d)d\mathbf{x}_d \tag{2}$
其中 $p(\mathbf{x}_d)$ 就是所有样本对应的概率密度。上式就是代表所有样本分类错误概率的加权平均。

目标：最小化式（2）错误率，即
$\int p(e|\mathbf{x})p(\mathbf{x})d\mathbf{x} \tag{3}$

对于式（3）的积分，我们知道样本本身的分布是一定的，即概率密度 $p(\mathbf{x})$ 不会因分类的错误而改变，所以最小化式（3），可以简化成最小化每个样本的错误概率就行了，即
$\quad p(e|\mathbf{x}) \tag{4}$

又由式（1）可知：
$min\quad p(e|\mathbf{x}) = \begin{cases} min\quad P(\omega_2|\mathbf{x}),\quad \mathbf{x} \in \omega_1 \\ min \quad P(\omega_1|\mathbf{x}),\quad \mathbf{x}\in \omega_2 \end{cases} \tag{5}$

注：
$P(\omega_1|\mathbf{x}) + P(\omega_2|\mathbf{x}) = 1\tag{6}$

则对于一个本属于 $\omega_1$ 类的样本 $\mathbf{x}$ , 要最小化 $P(\omega_2|\mathbf{x})$ , 就是最大化 $P(\omega_1|\mathbf{x})$ , 即：
$\quad \mathbf{x} \in \omega_1 \\ min\quad P(\omega_2|\mathbf{x}) => max\quad P(\omega_1|\mathbf{x}) \tag{7}$
所以最小错误率就是最大化该样本的后验概率。得以下决策规则：
$\quad P(\omega_1|\mathbf{x}) > P(\omega_2|\mathbf{x}), then \quad \mathbf{x}\in \omega_1 \quad else \quad \mathbf{x}\in \omega_2$
还可以记作：
$\quad P(\omega_1|\mathbf{x}) \gtrless P(\omega_2|\mathbf{x}) \quad then\quad \mathbf{x}\in \begin{cases}\omega_1 \\ \omega_2\end{cases} \tag{8}$

式（8）即为最小错误率贝叶斯决策，其它等价形式为：
$P(\omega_i|\mathbf{x}) = \max_{j=1, 2}P(\omega_j|\mathbf{x}),\quad then \quad \mathbf{x}\in \omega_i \tag{9}$

根据贝叶斯定理，可得：
$P(\omega_i|\mathbf{x}) = \frac{P(\omega_i, \mathbf{x})}{P(\mathbf{x})}=\frac{P(\mathbf{x}|\omega_i)P(\omega_i)}{\sum\limits_{i=1}^2P(\mathbf{x}|\omega_i)P(\omega_i)} \tag{10}$

对于上式， $P(\omega_i|\mathbf{x})$ 代表后验概率， $P(\mathbf{x}|\omega_i)$ 代表类条件概率， $P(\omega_i)$ 代表先验概率，同时分母是代表样本的分布，是一定的，所以这里对后验的影响只需考虑分子即可，即类条件概率和先验概率。所以决策规则中，两个后验概率的比较，可以转换成式（10）中分子的比较, 可以写成如下:
$\quad P(\omega_1|\mathbf{x}) \gtrless P(\omega_2|\mathbf{x})$
$P(\mathbf{x}|\omega_1)P(\omega_1) \gtrless P(\mathbf{x}|\omega_2)P(\omega_2) \tag{11}$
又因为先验 $P(\omega_i)$ 与样本是无关的，所以继续整理式（11），得决策规则为：
$\quad l(\mathbf{x})=\frac{P(\mathbf{x}|\omega_1)}{P(\mathbf{x}|\omega_2)} \gtrless \frac{P(\omega_2)}{P(\omega_1)} = \lambda (阈值), \quad then \quad \mathbf{x}\in \begin{cases}\omega_1 \\ \omega_2\end{cases} \tag{12}$

对于式（12）出现的 $l(\mathbf{x})$ , 可知类条件概率密度 $P(\mathbf{x}|\omega_i)$ 反映了在 $\omega_i$ 类中，观察到样本 $\mathbf{x}$ 的相对可能性（likelihood），似然度，故 $l(\mathbf{x})$ 被称作似然比（likelihood ratio）。如果对其取负对数，就化成了加法，即：
$h(\mathbf{x}) = -\ln{l(\mathbf{x})}=-\ln{P(\mathbf{x}|\omega_1)} + \ln{P(\mathbf{x}|\omega_2)} \tag{13}$
$\quad h(\mathbf{x}) \gtrless \ln{\frac{P(\omega_1)}{P(\omega_2)}}, \quad then \quad \mathbf{x} \begin{cases}\omega_1 \\ \omega_2 \end{cases} \tag{14}$

以上依然是决策规则，只是使用对数简化了计算，但本质没有改变。

二. 多类情况

决策规则可以表示为：
$\quad P(\omega_i|\mathbf{x}) = \max_{j=1,2,...,n} P(\omega_j|\mathbf{x}), \quad then \quad \mathbf{x}\in \omega_i \tag{15}$
或者等价于：
$\quad P(\omega_i|\mathbf{x}) = P(\mathbf{x}|\omega_i)P(\omega_i) = \max_{j=1,2,...,n} P(\omega_j|\mathbf{x}), \quad then \quad \mathbf{x}\in \omega_i \tag{16}$