# Python机器学习入门 - - 贝叶斯算法学习笔记

## 二、贝叶斯算法的数学原理

### 1. 条件概率

P ( A ∣ B ) = P ( A B ) P ( B ) P(A|B) = \frac{P(AB)}{P(B)}

### 2. 全概率公式

P ( A ) = ∑ i = 1 n P ( A ∣ B i ) P ( B i ) P(A)= \sum_{i=1}^{n} P(A|B_{i})P(B_{i})

### 3. 贝叶斯公式

P ( A ∣ B ) = P ( B ∣ A ) P ( A ) P ( B ) P(A|B) = \frac{P(B|A)P(A)}{P(B)}

### 4. 朴素贝叶斯分类器

P ( y ∣ x 1 , x 2 , . . . , x n ) = P ( y ) P ( x 1 , x 2 , . . . , x n ∣ y ) P ( x 1 , x 2 , . . . , x n ) P(y|x_1, x_2, ..., x_n) = \frac{P(y)P(x_1, x_2, ..., x_n|y)}{P(x_1, x_2, ..., x_n)}

P ( x 1 , x 2 , . . . , x n ∣ y ) = ∏ i = 1 n P ( x i ∣ y ) P(x_1, x_2, ..., x_n|y) = \prod_{i=1}^{n} P(x_i | y)

P ( y ∣ x 1 , x 2 , . . . , x n ) = P ( y ) ∏ i = 1 n P ( x i ∣ y ) P ( x 1 , x 2 , . . . , x n ) P(y|x_1, x_2, ..., x_n) = \frac{P(y) \prod\limits_{i=1}^{n}P(x_i|y)}{P(x_1, x_2, ..., x_n)}

y ^ = arg ⁡ max ⁡ y P ( y ) ∏ i = 1 n P ( x i ∣ y ) \hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i | y)

### 5. 高斯朴素贝叶斯分类器和伯努利朴素贝叶斯分类器

y ^ = arg ⁡ max ⁡ y P ( y ) ∏ i = 1 n 1 2 π σ y , i 2 exp ⁡ ( − ( x i − μ y , i ) 2 2 σ y , i 2 ) \hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)

y ^ = arg ⁡ max ⁡ y P ( y ) ∏ i = 1 n P i ∣ y x i ( 1 − P i ∣ y ) 1 − x i \hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P_{i|y}^{x_i} (1 - P_{i|y})^{1-x_i}

## 三、Python实现朴素贝叶斯分类

Python实现朴素贝叶斯分类的思路，首先导包，这次我们选择月亮数据集和块状数据集，选择高斯朴素贝叶斯分类器和伯努利朴素贝叶斯分类器，接着实例化对象并创建画布，然后定义一个画图函数，首先标准化数据集并画好网格，接着画散点图并创建子图，最后训练模型，并把返回预测值，拉长进行决策边界等高线可视化

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_blobs
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

# 模型的名字
names = ["Gaussian", "Bernoulli"]
# 创建我们的模型对象
classifiers = [GaussianNB(), BernoulliNB()]
# 创建数据集
datasets = [ make_moons(noise=0.2, random_state=0),make_blobs(centers=2, random_state=2),]
# 创建画布
figure = plt.figure(figsize=(12, 8))

def plot_clf(NB_clf, dataset, name, i):
X, y = dataset
#     标准化数据集
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)
#     对画布画网格线
x1_min, x1_max = X[:, 0].min() - .5, X[:, 0].max() + .5
x2_min, x2_max = X[:, 1].min() - .5, X[:, 1].max() + .5
array1, array2 = np.meshgrid(np.arange(x1_min, x1_max, 0.2),
np.arange(x2_min, x2_max, 0.2))
cm = plt.cm.RdBu
cm_bright = ListedColormap(['#fafab0', '#9898ff'])
i += 1
ax = plt.subplot(len(dataset), 2, i)
ax.set_title("dataset")
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train,
cmap=cm_bright, edgecolors='k')
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test,
cmap=cm_bright, alpha=0.6, edgecolors='k')
ax.set_xlim(array1.min(), array1.max())
ax.set_ylim(array2.min(), array2.max())
ax.set_xticks(())
ax.set_yticks(())
i += 1
ax = plt.subplot(len(dataset), 2, i)
clf = NB_clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
Z = clf.predict_proba(np.c_[array1.ravel(), array2.ravel()])[:, 1]
Z = Z.reshape(array1.shape)
ax.contourf(array1, array2, Z, cmap=cm, alpha=.8)
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
edgecolors='k')
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
edgecolors='k', alpha=0.6)
ax.set_xlim(array1.min(), array1.max())
ax.set_ylim(array2.min(), array2.max())
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(name)
ax.text(array1.max() - .3, array2.min() + .3, ('{:.1f}%'.format(score * 100)),
size=15, horizontalalignment='right')

for i in range(2):
plot_clf(classifiers[i], datasets[i], names[i], 2*i)
plt.tight_layout()
plt.show()


## 总结

• 4
点赞
• 7
收藏
觉得还不错? 一键收藏
• 打赏
• 2
评论
07-11 129
08-29
12-20 527
08-24 1万+
12-20 117
08-29
05-30 80
07-26 115

### “相关推荐”对你有帮助么？

• 非常没帮助
• 没帮助
• 一般
• 有帮助
• 非常有帮助

szu_ljm

¥1 ¥2 ¥4 ¥6 ¥10 ¥20

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。