Implementing a Gaussian Naïve Bayes (Independence Bayes) Model from Scratch

“Why is Google censuring me?!” Claire asked (true story). Sure, she’s always been a prolific emailer, but she is no scammer — and she assures me her days as a Nigerian prince are long since over. So why did Gmail suddenly lock her account today as if she was a spam-sending super-bot?

My answer, after building this: “Most likely? Honestly, you’re probably doing some pretty unusual things in your email… unusual enough that your Gmail account is a very clear outlier compared to almost any other human user, so Gmail is flagging you as ‘probably a spam account or bot.’ Maybe some combination of how many emails you’re sending, to how many people, with what type of chain-email-like language in them?”

And most likely, Thomas Bayes was involved.

Naïve Bayes: A Quick Intro

Naïve Bayes (a.k.a. Independence Bayes) classifiers are a category of supervised probabilistic models that, despite relying on pretty simple underlying assumptions, are still widely used in today’s world. Because they are very fast even when working with huge datasets, and perform surprisingly well despite said simplistic assumptions, they are consulted and/or used for many critical real-world tasks, spam filtering being a classic example.

At its root, Naïve Bayes analyzes conditional probabilities to make its predictions, relying on the same Bayes’ Theorem you learned about in high school:

P(A | B) = P(B | A) · P(A) / P(B)

Or, as applied to our current case of deciding whether an email is spam or not:

P(spam | email characteristics) = P(email characteristics | spam) · P(spam) / P(email characteristics)

How Naïve Bayes Works: Bayesian Probabilities with Multiple Classes and Many Features

Naïve Bayes models ultimately take any new, previously unseen data point such as a new email, and (1) calculate the probabilities of that data point (its particular set of features) separately as if it belonged to each different class, then (2) choose the most probable class based on which probability is highest. Because the Bayes denominator above is a constant that stays the same for every class — e.g., P(email characteristics) is the same regardless of whether the class is “spam” or “not spam” — it does not help us compare probabilities.

As such, we are really just comparing the Bayes’ Theorem numerator across every possible class. We do this in four steps:

(1) Calculate the independent overall probabilities p(class_i) of each class (the class priors) from the provided training data.

(2) Calculate the class-conditional probability distributions p(feature_n | class_i) for each feature-class combination.

(3) Calculate the “likelihood” probability that a specific input data point belongs to each class.

(4) Choose the class with the maximum probability, and predict that class for the given input data point.

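Taken together, the four steps above amount to the maximum a posteriori (MAP) decision rule. In symbols (notation mine, for N features x_1, …, x_N):

```latex
\hat{c} = \arg\max_{c_i} \; p(c_i) \prod_{n=1}^{N} p(x_n \mid c_i)
```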
Implementation: Building a Gaussian Naïve Bayes Classifier from Scratch

Step 1: Calculate the Class Priors, the Raw Probabilities p(class) of Each Class:

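The code screenshot for this step did not survive the scrape. A minimal sketch of computing the class priors in NumPy might look like the following; the dataset here is hypothetical (two numeric features per email, with label 0 = "not spam" and 1 = "spam"), not the original post's data:

```python
import numpy as np

# Hypothetical toy dataset: two features per email,
# labels 0 = "not spam", 1 = "spam".
X_train = np.array([[1.0, 20.0], [2.0, 15.0], [1.5, 18.0],
                    [8.0, 300.0], [9.0, 280.0]])
y_train = np.array([0, 0, 0, 1, 1])

def class_priors(y):
    """Step 1: estimate p(class_i) as each class's relative frequency."""
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): float(n) / len(y) for c, n in zip(classes, counts)}

priors = class_priors(y_train)
print(priors)  # {0: 0.6, 1: 0.4}
```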

Step 2: Get the Class-conditional Probability Distributions p(feature_n | class_i) for Each Feature-Class Combination:

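Again, the original screenshot is lost. Under the Gaussian assumption, each p(feature_n | class_i) is a normal distribution, so "fitting" it just means recording every feature's mean and standard deviation within each class. A sketch, using the same hypothetical toy data as in step 1:

```python
import numpy as np

# Same hypothetical toy data as in step 1.
X_train = np.array([[1.0, 20.0], [2.0, 15.0], [1.5, 18.0],
                    [8.0, 300.0], [9.0, 280.0]])
y_train = np.array([0, 0, 0, 1, 1])

def gaussian_params(X, y):
    """Step 2: per class, the (mean, std) of every feature -- these two
    numbers fully define each Gaussian p(feature_n | class_i)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # A small floor on std avoids division by zero for constant features.
        params[int(c)] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-9)
    return params

params = gaussian_params(X_train, y_train)
means_class0, stds_class0 = params[0]
```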

Step 3: Calculate the “Likelihood” Probability that the Input Data Point Belongs to Each Class

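The lost screenshots for this step might be sketched as follows: with the priors and Gaussian parameters in hand, each class's "likelihood" score is the prior times the product of the per-feature Gaussian densities, i.e. the Bayes' Theorem numerator that gets compared across classes (same hypothetical toy data as above):

```python
import numpy as np

# Hypothetical toy data, as in the earlier steps.
X_train = np.array([[1.0, 20.0], [2.0, 15.0], [1.5, 18.0],
                    [8.0, 300.0], [9.0, 280.0]])
y_train = np.array([0, 0, 0, 1, 1])

def gaussian_pdf(x, mean, std):
    """Univariate normal density, evaluated element-wise per feature."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def class_scores(x, X, y):
    """Step 3: p(class_i) * prod_n p(feature_n | class_i) for a point x."""
    scores = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(y)
        mean, std = Xc.mean(axis=0), Xc.std(axis=0) + 1e-9
        scores[int(c)] = prior * np.prod(gaussian_pdf(x, mean, std))
    return scores

# A point that resembles the "not spam" cluster:
scores = class_scores(np.array([1.2, 19.0]), X_train, y_train)
```

Note that multiplying many small densities can underflow to zero in floating point; real implementations sum log-densities instead, which is what the step 4 sketch below does.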

Step 4: Choose the Class with the Maximum Probability, and Predict that Class for the Given Input Data Point

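Putting all four steps together, a minimal from-scratch classifier might look like this. This is an illustrative sketch (the post's own code screenshots were lost), working in log space so that products of tiny densities do not underflow:

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal from-scratch Gaussian Naive Bayes (illustrative sketch)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {}   # step 1: p(class_i)
        self.params_ = {}   # step 2: (mean, std) per feature, per class
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[int(c)] = len(Xc) / len(y)
            self.params_[int(c)] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-9)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            scores = {}
            for c in self.classes_:
                mean, std = self.params_[int(c)]
                # Step 3, in log space: log p(c) + sum_n log p(x_n | c)
                log_pdf = (-0.5 * ((x - mean) / std) ** 2
                           - np.log(std * np.sqrt(2.0 * np.pi)))
                scores[int(c)] = np.log(self.priors_[int(c)]) + log_pdf.sum()
            # Step 4: predict the class with the maximum (log-)probability.
            preds.append(max(scores, key=scores.get))
        return np.array(preds)

# Hypothetical toy data: class 0 = "not spam", class 1 = "spam".
X_train = np.array([[1.0, 20.0], [2.0, 15.0], [1.5, 18.0],
                    [8.0, 300.0], [9.0, 280.0]])
y_train = np.array([0, 0, 0, 1, 1])

model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict(np.array([[1.2, 19.0], [8.5, 295.0]])))  # [0 1]
```

Taking the argmax over log-scores is valid because the logarithm is monotonic: the class with the highest log-probability is also the class with the highest probability.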

Translated from: https://medium.com/@chrishuskey/implementing-a-gaussian-naïve-bayes-independence-bayes-model-from-scratch-8a9280215f83
