Bayesian Discriminant Analysis

References:

  1. 6 判别分析 | 多元统计分析示例 (Chapter 6, "Discriminant Analysis", of "Multivariate Statistical Analysis Examples")
  2. https://www.cnblogs.com/qizhou/p/13495598.html

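1. Problem description

Bayesian discriminant analysis is, at its core, a classification problem: given a set of labeled samples, learn a classifier that assigns a new sample to a class while minimizing the probability of misclassification.

Assumptions:

  • There are n classes in total, c_0,c_1,...,c_{n-1}
  • Every class is described by the same p attributes
  • The population is sampled m times, giving the samples x_0,x_1,...,x_{m-1}
  • Each sample x_i has p attributes, x_i = [f_0,f_1,...,f_{p-1}], and carries a class label y_i \in \{c_0,c_1,...,c_{n-1}\}

The Bayesian approach treats the class c of a new sample x as a random variable with distribution p(c|x). Computing p(c_0|x),p(c_1|x),...,p(c_{n-1}|x) and choosing the class with the largest posterior probability gives the decision that minimizes the probability of misclassification.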
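As a minimal illustration of this decision rule, the sketch below picks the class with the largest posterior. The posterior values and the helper name `map_decision` are made up for illustration and do not come from the original article.

```python
import numpy as np

def map_decision(posteriors):
    """Return the index of the class with the largest posterior probability."""
    posteriors = np.asarray(posteriors, dtype=float)
    return int(np.argmax(posteriors))

# Hypothetical posteriors p(c_0|x), p(c_1|x), p(c_2|x) for one sample.
posteriors = [0.2, 0.7, 0.1]
chosen = map_decision(posteriors)

# Choosing class k is wrong with probability 1 - p(c_k|x),
# so the largest posterior gives the smallest error probability.
error_prob = 1.0 - posteriors[chosen]
print(chosen, error_prob)
```

Any other choice would have an error probability of 0.8 or 0.9 here, which is why the argmax is the error-minimizing decision.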

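2. Derivation

The posterior p(c|x) cannot be computed directly, so Bayes' theorem is used to rewrite it as a product of the prior p(c) and the likelihood p(x|c). The evidence p(x) does not depend on c, so it can be dropped when comparing classes:

p(c|x)=\frac{p(c)p(x|c)}{p(x)}\propto p(c)p(x|c)

The prior p(c) describes the proportion of each class in the whole population. Without prior information, the classes can be assumed to be uniformly distributed, i.e. p(c_i)=\frac{1}{n}; alternatively, the prior can be approximated by the frequency with which each class appears in the available samples.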
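The two choices of prior can be sketched as follows. The function names `uniform_prior` and `empirical_prior` and the label array are hypothetical and only serve as an illustration.

```python
import numpy as np

def uniform_prior(n_classes):
    """Prior with no extra information: every class gets 1/n."""
    return np.full(n_classes, 1.0 / n_classes)

def empirical_prior(labels, n_classes):
    """Prior approximated by the class frequencies in the samples."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

# Made-up integer labels for six samples and three classes.
labels = np.array([0, 0, 1, 2, 2, 2])
prior_u = uniform_prior(3)
prior_e = empirical_prior(labels, 3)
print(prior_u, prior_e)
```

When every class has the same number of samples, as in the iris data used later, both choices coincide.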

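The likelihood p(x|c) describes how the attributes are distributed within a class. It is usually assumed to follow a multivariate Gaussian distribution:

p(x|c) = \frac{1}{\sqrt{(2\pi)^p|\Sigma|}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)

where \mu is the mean vector of the attributes for class c, \Sigma is the covariance matrix of the attributes for class c, and |\Sigma| is its determinant.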
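A minimal sketch of evaluating this density with NumPy is shown below. The function name `gaussian_density` and the test values are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Multivariate Gaussian density with mean mu and covariance sigma."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Normalizing constant sqrt((2*pi)^p * |Sigma|).
    norm = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(sigma))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = diff @ np.linalg.inv(sigma) @ diff
    return np.exp(-0.5 * quad) / norm

# Standard 2-D Gaussian evaluated at its mean: the density is 1 / (2*pi).
density_at_mean = gaussian_density([0.0, 0.0], np.zeros(2), np.eye(2))
print(density_at_mean)
```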

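Therefore the posterior satisfies

p(c|x)\propto p(c)p(x|c)=p(c)\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)

Taking the logarithm and dropping the term -\frac{p}{2}\ln(2\pi), which is the same for every class, gives

\ln p(c|x) = \ln p(c) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) + \text{const}

Estimate the mean and covariance of the likelihood p(x|c) for each class, evaluate this expression for the new sample under every class, and assign the sample to the class with the largest value. Because the logarithm is monotonic, this is also the class with the largest posterior.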
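The whole procedure can be sketched end to end as below. This is only an illustrative sketch: the function names are hypothetical, and the data are synthetic two-class Gaussians generated with a fixed seed rather than the iris data used later.

```python
import numpy as np

def fit_gaussian_classes(X, y, n_classes):
    """Estimate (prior, mean, covariance) for each class from labeled data."""
    params = []
    for c in range(n_classes):
        Xc = X[y == c]
        prior = len(Xc) / len(X)            # class frequency as prior
        mu = Xc.mean(axis=0)                # class mean vector
        sigma = np.cov(Xc, rowvar=False)    # unbiased class covariance
        params.append((prior, mu, sigma))
    return params

def log_score(x, prior, mu, sigma):
    """ln p(c) - 0.5 ln|Sigma| - 0.5 (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(sigma))
            - 0.5 * diff @ np.linalg.inv(sigma) @ diff)

def predict(x, params):
    """Return the index of the class with the largest log score."""
    scores = [log_score(x, *p) for p in params]
    return int(np.argmax(scores))

# Synthetic, well-separated two-class data (an assumption for illustration).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

params = fit_gaussian_classes(X, y, 2)
pred_near_0 = predict(np.array([0.2, -0.1]), params)
pred_near_1 = predict(np.array([4.8, 5.3]), params)
print(pred_near_0, pred_near_1)
```

Each class keeps its own covariance matrix here, so the decision boundary between two classes is in general quadratic in x.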

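3. Estimating the class means and covariances

The mean of each class is straightforward to compute:

\mu_i = [\mu_i^0,\mu_i^1,...,\mu_i^{p-1}] = \frac{1}{m_i}\sum_{j=0}^{m_i-1}x_i^j

where \mu_i is the mean of all features for class i, a p-dimensional vector with one entry per feature; m_i is the number of samples drawn from class i; and x_i^j is the j-th sample belonging to class i.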
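A small worked example of the class mean, using made-up data with p = 2 attributes and two classes (the helper name `class_mean` is hypothetical):

```python
import numpy as np

# Made-up samples: three belong to class 0, two belong to class 1.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [0.0, 1.0],
              [2.0, 3.0]])
y = np.array([0, 0, 0, 1, 1])

def class_mean(X, y, i):
    """Average the m_i samples whose label equals i, feature by feature."""
    Xi = X[y == i]
    return Xi.sum(axis=0) / len(Xi)

mu_0 = class_mean(X, y, 0)  # [3., 4.]
mu_1 = class_mean(X, y, 1)  # [1., 2.]
print(mu_0, mu_1)
```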

The covariance matrix of class i is \Sigma_i=\frac{1}{m_i-1}B^TB, where

 B=\begin{bmatrix} x_i^0-\mu_i \\ x_i^1-\mu_i \\ \vdots \\ x_i^{m_i-1}-\mu_i \end{bmatrix} is an m_i \times p matrix whose rows are the centered samples of class i.
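The formula can be checked numerically against `np.cov`, which also divides by m_i - 1 by default. The data and the helper name `class_covariance` below are made up for illustration.

```python
import numpy as np

def class_covariance(Xi):
    """Unbiased covariance B^T B / (m_i - 1) of the rows of Xi."""
    mu = Xi.mean(axis=0)
    B = Xi - mu                      # m_i x p matrix of centered samples
    return B.T @ B / (len(Xi) - 1)

# Three made-up samples with two attributes each.
Xi = np.array([[1.0, 2.0],
               [3.0, 4.0],
               [5.0, 9.0]])
sigma = class_covariance(Xi)         # [[4., 7.], [7., 13.]]
matches_numpy = np.allclose(sigma, np.cov(Xi, rowvar=False))
print(sigma, matches_numpy)
```

Here the mean is [3, 5], so B has rows [-2, -3], [0, -1], [2, 4], and B^T B / 2 gives the matrix in the comment.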

4. Implementation

Python implementation

import numpy as np

# Load the CSV as strings: row 0 is the header, column 0 is Id, the last column is Species.
tmp = np.loadtxt("iris.csv", dtype=str, delimiter=",")
data = tmp[1:, 1:-1].astype(float)
# Map species names to integer labels 0, 1, 2 (names as they appear in iris.csv).
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
label = np.array([species.index(t) for t in tmp[1:, -1]])

# Per-class mean vector and unbiased sample covariance matrix.
data_0 = data[label == 0, :]
m_data_0 = np.mean(data_0, axis=0)
sigma_0 = np.matmul((data_0 - m_data_0).T, (data_0 - m_data_0)) / (len(data_0) - 1)

data_1 = data[label == 1, :]
m_data_1 = np.mean(data_1, axis=0)
sigma_1 = np.matmul((data_1 - m_data_1).T, (data_1 - m_data_1)) / (len(data_1) - 1)

data_2 = data[label == 2, :]
m_data_2 = np.mean(data_2, axis=0)
sigma_2 = np.matmul((data_2 - m_data_2).T, (data_2 - m_data_2)) / (len(data_2) - 1)

# Score the last sample under each class:
# ln p(c) - 0.5 ln|Sigma| - 0.5 (d - mu)^T Sigma^{-1} (d - mu), with prior 50/150 per class.
d = data[-1]
p_0 = np.log(50 / 150) - 0.5 * np.log(np.linalg.det(sigma_0)) - 0.5 * (d - m_data_0) @ np.linalg.inv(sigma_0) @ (d - m_data_0)
p_1 = np.log(50 / 150) - 0.5 * np.log(np.linalg.det(sigma_1)) - 0.5 * (d - m_data_1) @ np.linalg.inv(sigma_1) @ (d - m_data_1)
p_2 = np.log(50 / 150) - 0.5 * np.log(np.linalg.det(sigma_2)) - 0.5 * (d - m_data_2) @ np.linalg.inv(sigma_2) @ (d - m_data_2)

# The class with the largest score is the predicted class.
print(p_0, p_1, p_2, "predicted class:", np.argmax([p_0, p_1, p_2]))
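When a covariance matrix is close to singular, computing the determinant and the explicit inverse can lose precision. One possible variant, sketched below, uses `np.linalg.slogdet` for the log-determinant and `np.linalg.solve` for the quadratic form. The function names and the test numbers are assumptions, not part of the original code.

```python
import numpy as np

def log_score_stable(x, prior, mu, sigma):
    """Same score as above, without forming det(sigma) or inv(sigma)."""
    diff = x - mu
    sign, logdet = np.linalg.slogdet(sigma)   # (sign, ln|det|)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Solve sigma * z = diff instead of multiplying by the inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    return np.log(prior) - 0.5 * logdet - 0.5 * quad

def log_score_direct(x, prior, mu, sigma):
    """Direct det/inv version, kept only for comparison."""
    diff = x - mu
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(sigma))
            - 0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Made-up, well-conditioned inputs: both versions should agree.
x = np.array([1.0, 2.0])
mu = np.array([0.5, 1.5])
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
stable = log_score_stable(x, 1.0 / 3.0, mu, sigma)
direct = log_score_direct(x, 1.0 / 3.0, mu, sigma)
print(stable, direct)
```

For the four-feature iris data the direct version is adequate; the variant mainly matters with many features or few samples per class.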

MATLAB implementation

clc;
clear;

% Read the CSV; the columns are Id, four features, Species.
tmp = readtable('iris.csv');
data = tmp{:, 2:5};
% Rows 1-50, 51-100 and 101-150 belong to classes 0, 1 and 2.
label = zeros(1,150);
label(51:100) = 1;
label(101:end) = 2;

% Per-class mean (row vector) and unbiased sample covariance.
data_0 = data(label == 0, :);
m_data_0 = mean(data_0, 1);
sigma_0 = cov(data_0);

data_1 = data(label == 1, :);
m_data_1 = mean(data_1, 1);
sigma_1 = cov(data_1);

data_2 = data(label == 2, :);
m_data_2 = mean(data_2, 1);
sigma_2 = cov(data_2);

% Score the last sample under each class; the largest score wins.
d = data(end, :);
p_0 = log(50 / 150) - 0.5 * log(det(sigma_0)) - 0.5 * (d - m_data_0) * inv(sigma_0) * (d - m_data_0)'
p_1 = log(50 / 150) - 0.5 * log(det(sigma_1)) - 0.5 * (d - m_data_1) * inv(sigma_1) * (d - m_data_1)'
p_2 = log(50 / 150) - 0.5 * log(det(sigma_2)) - 0.5 * (d - m_data_2) * inv(sigma_2) * (d - m_data_2)'

Test data: iris.csv

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
6,5.4,3.9,1.7,0.4,Iris-setosa
7,4.6,3.4,1.4,0.3,Iris-setosa
8,5.0,3.4,1.5,0.2,Iris-setosa
9,4.4,2.9,1.4,0.2,Iris-setosa
10,4.9,3.1,1.5,0.1,Iris-setosa
11,5.4,3.7,1.5,0.2,Iris-setosa
12,4.8,3.4,1.6,0.2,Iris-setosa
13,4.8,3.0,1.4,0.1,Iris-setosa
14,4.3,3.0,1.1,0.1,Iris-setosa
15,5.8,4.0,1.2,0.2,Iris-setosa
16,5.7,4.4,1.5,0.4,Iris-setosa
17,5.4,3.9,1.3,0.4,Iris-setosa
18,5.1,3.5,1.4,0.3,Iris-setosa
19,5.7,3.8,1.7,0.3,Iris-setosa
20,5.1,3.8,1.5,0.3,Iris-setosa
21,5.4,3.4,1.7,0.2,Iris-setosa
22,5.1,3.7,1.5,0.4,Iris-setosa
23,4.6,3.6,1.0,0.2,Iris-setosa
24,5.1,3.3,1.7,0.5,Iris-setosa
25,4.8,3.4,1.9,0.2,Iris-setosa
26,5.0,3.0,1.6,0.2,Iris-setosa
27,5.0,3.4,1.6,0.4,Iris-setosa
28,5.2,3.5,1.5,0.2,Iris-setosa
29,5.2,3.4,1.4,0.2,Iris-setosa
30,4.7,3.2,1.6,0.2,Iris-setosa
31,4.8,3.1,1.6,0.2,Iris-setosa
32,5.4,3.4,1.5,0.4,Iris-setosa
33,5.2,4.1,1.5,0.1,Iris-setosa
34,5.5,4.2,1.4,0.2,Iris-setosa
35,4.9,3.1,1.5,0.1,Iris-setosa
36,5.0,3.2,1.2,0.2,Iris-setosa
37,5.5,3.5,1.3,0.2,Iris-setosa
38,4.9,3.1,1.5,0.1,Iris-setosa
39,4.4,3.0,1.3,0.2,Iris-setosa
40,5.1,3.4,1.5,0.2,Iris-setosa
41,5.0,3.5,1.3,0.3,Iris-setosa
42,4.5,2.3,1.3,0.3,Iris-setosa
43,4.4,3.2,1.3,0.2,Iris-setosa
44,5.0,3.5,1.6,0.6,Iris-setosa
45,5.1,3.8,1.9,0.4,Iris-setosa
46,4.8,3.0,1.4,0.3,Iris-setosa
47,5.1,3.8,1.6,0.2,Iris-setosa
48,4.6,3.2,1.4,0.2,Iris-setosa
49,5.3,3.7,1.5,0.2,Iris-setosa
50,5.0,3.3,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
53,6.9,3.1,4.9,1.5,Iris-versicolor
54,5.5,2.3,4.0,1.3,Iris-versicolor
55,6.5,2.8,4.6,1.5,Iris-versicolor
56,5.7,2.8,4.5,1.3,Iris-versicolor
57,6.3,3.3,4.7,1.6,Iris-versicolor
58,4.9,2.4,3.3,1.0,Iris-versicolor
59,6.6,2.9,4.6,1.3,Iris-versicolor
60,5.2,2.7,3.9,1.4,Iris-versicolor
61,5.0,2.0,3.5,1.0,Iris-versicolor
62,5.9,3.0,4.2,1.5,Iris-versicolor
63,6.0,2.2,4.0,1.0,Iris-versicolor
64,6.1,2.9,4.7,1.4,Iris-versicolor
65,5.6,2.9,3.6,1.3,Iris-versicolor
66,6.7,3.1,4.4,1.4,Iris-versicolor
67,5.6,3.0,4.5,1.5,Iris-versicolor
68,5.8,2.7,4.1,1.0,Iris-versicolor
69,6.2,2.2,4.5,1.5,Iris-versicolor
70,5.6,2.5,3.9,1.1,Iris-versicolor
71,5.9,3.2,4.8,1.8,Iris-versicolor
72,6.1,2.8,4.0,1.3,Iris-versicolor
73,6.3,2.5,4.9,1.5,Iris-versicolor
74,6.1,2.8,4.7,1.2,Iris-versicolor
75,6.4,2.9,4.3,1.3,Iris-versicolor
76,6.6,3.0,4.4,1.4,Iris-versicolor
77,6.8,2.8,4.8,1.4,Iris-versicolor
78,6.7,3.0,5.0,1.7,Iris-versicolor
79,6.0,2.9,4.5,1.5,Iris-versicolor
80,5.7,2.6,3.5,1.0,Iris-versicolor
81,5.5,2.4,3.8,1.1,Iris-versicolor
82,5.5,2.4,3.7,1.0,Iris-versicolor
83,5.8,2.7,3.9,1.2,Iris-versicolor
84,6.0,2.7,5.1,1.6,Iris-versicolor
85,5.4,3.0,4.5,1.5,Iris-versicolor
86,6.0,3.4,4.5,1.6,Iris-versicolor
87,6.7,3.1,4.7,1.5,Iris-versicolor
88,6.3,2.3,4.4,1.3,Iris-versicolor
89,5.6,3.0,4.1,1.3,Iris-versicolor
90,5.5,2.5,4.0,1.3,Iris-versicolor
91,5.5,2.6,4.4,1.2,Iris-versicolor
92,6.1,3.0,4.6,1.4,Iris-versicolor
93,5.8,2.6,4.0,1.2,Iris-versicolor
94,5.0,2.3,3.3,1.0,Iris-versicolor
95,5.6,2.7,4.2,1.3,Iris-versicolor
96,5.7,3.0,4.2,1.2,Iris-versicolor
97,5.7,2.9,4.2,1.3,Iris-versicolor
98,6.2,2.9,4.3,1.3,Iris-versicolor
99,5.1,2.5,3.0,1.1,Iris-versicolor
100,5.7,2.8,4.1,1.3,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
103,7.1,3.0,5.9,2.1,Iris-virginica
104,6.3,2.9,5.6,1.8,Iris-virginica
105,6.5,3.0,5.8,2.2,Iris-virginica
106,7.6,3.0,6.6,2.1,Iris-virginica
107,4.9,2.5,4.5,1.7,Iris-virginica
108,7.3,2.9,6.3,1.8,Iris-virginica
109,6.7,2.5,5.8,1.8,Iris-virginica
110,7.2,3.6,6.1,2.5,Iris-virginica
111,6.5,3.2,5.1,2.0,Iris-virginica
112,6.4,2.7,5.3,1.9,Iris-virginica
113,6.8,3.0,5.5,2.1,Iris-virginica
114,5.7,2.5,5.0,2.0,Iris-virginica
115,5.8,2.8,5.1,2.4,Iris-virginica
116,6.4,3.2,5.3,2.3,Iris-virginica
117,6.5,3.0,5.5,1.8,Iris-virginica
118,7.7,3.8,6.7,2.2,Iris-virginica
119,7.7,2.6,6.9,2.3,Iris-virginica
120,6.0,2.2,5.0,1.5,Iris-virginica
121,6.9,3.2,5.7,2.3,Iris-virginica
122,5.6,2.8,4.9,2.0,Iris-virginica
123,7.7,2.8,6.7,2.0,Iris-virginica
124,6.3,2.7,4.9,1.8,Iris-virginica
125,6.7,3.3,5.7,2.1,Iris-virginica
126,7.2,3.2,6.0,1.8,Iris-virginica
127,6.2,2.8,4.8,1.8,Iris-virginica
128,6.1,3.0,4.9,1.8,Iris-virginica
129,6.4,2.8,5.6,2.1,Iris-virginica
130,7.2,3.0,5.8,1.6,Iris-virginica
131,7.4,2.8,6.1,1.9,Iris-virginica
132,7.9,3.8,6.4,2.0,Iris-virginica
133,6.4,2.8,5.6,2.2,Iris-virginica
134,6.3,2.8,5.1,1.5,Iris-virginica
135,6.1,2.6,5.6,1.4,Iris-virginica
136,7.7,3.0,6.1,2.3,Iris-virginica
137,6.3,3.4,5.6,2.4,Iris-virginica
138,6.4,3.1,5.5,1.8,Iris-virginica
139,6.0,3.0,4.8,1.8,Iris-virginica
140,6.9,3.1,5.4,2.1,Iris-virginica
141,6.7,3.1,5.6,2.4,Iris-virginica
142,6.9,3.1,5.1,2.3,Iris-virginica
143,5.8,2.7,5.1,1.9,Iris-virginica
144,6.8,3.2,5.9,2.3,Iris-virginica
145,6.7,3.3,5.7,2.5,Iris-virginica
146,6.7,3.0,5.2,2.3,Iris-virginica
147,6.3,2.5,5.0,1.9,Iris-virginica
148,6.5,3.0,5.2,2.0,Iris-virginica
149,6.2,3.4,5.4,2.3,Iris-virginica
150,5.9,3.0,5.1,1.8,Iris-virginica