Naive Bayes and Semi-Naive Bayes: Everything You Need to Know About Naive Bayes

This article introduces Naive Bayes and semi-naive Bayes in detail, covering their theoretical foundations and application scenarios, to help readers gain a deeper understanding of these two important probabilistic classification algorithms.

Contents

  • Discriminative Model & Generative Model
  • Main Idea (1): Bayesian Rule
  • Main Idea (2): Conditional Independence Hypothesis
  • Main Idea (3): Construct a Classifier from a Probability Model
  • Gaussian Naive Bayes (GNB)
  • Naive Bayes in Python & R
  • Interview Questions & Other Materials
  • Summary

Discriminative Model & Generative Model


Discriminative Model

As we all know, machine learning is essentially the technique of predicting a target y from features X, which means computing the conditional probability p(y|X).

For the discriminative model, we only consider estimating this conditional probability: under the condition of a limited sample, we build the discriminant function directly and study the prediction model without modeling how the samples are generated. In a binary classification problem, this can be understood as deciding which of the two classes has the larger conditional probability p(y|X).


Generative Model

The generative model instead works with p(y, X): we need to find the joint distribution of the features X and the target y. In a binary classification problem, the idea is then to compute the joint probability for each of the two classes and take the class with the larger value.

What is Naive Bayes?


First of all, Naive Bayes is a generative model. The Naive Bayes classifier is a family of simple probabilistic classifiers based on applying Bayes' theorem under a strong independence assumption between the features.

Naive Bayes has been extensively studied since the 1950s. In the early 1960s it was introduced into the field of text information retrieval under another name, and it is still a popular (baseline) method for text classification: using word frequencies as features to decide which category a document belongs to (spam vs. legitimate mail, sports vs. politics, and so on). With proper preprocessing it can compete with more advanced methods in this field, including support vector machines. It is also used in automatic medical diagnosis.

Naive Bayes is a simple way to build a classifier. The classifier model assigns class labels, drawn from a finite set, to problem instances represented as vectors of feature values. It is not a single algorithm for training such a classifier but a family of algorithms based on the same principle: all naive Bayes classifiers assume that each feature of a sample is unrelated to every other feature. For example, if a fruit is red, round, and about 3 inches in diameter, it can be judged to be an apple. Even though these features may depend on one another, or some may be determined by others, the Naive Bayes classifier treats all of these attributes as contributing independently to the probability that the fruit is an apple.

For certain types of probability models, very good classification results can be obtained on a supervised training set. In many practical applications, the parameters of a Naive Bayes model are estimated by maximum likelihood; in other words, one can use the Naive Bayes model without adopting Bayesian probability or any other Bayesian methods.

Despite these naive ideas and over-simplified assumptions, the naive Bayes classifier can still achieve quite good results in many complex real-world situations. In 2004, an article analyzing the Bayesian classification problem revealed several theoretical reasons for the seemingly implausible effectiveness of the naive Bayes classifier. Nevertheless, a 2006 article comparing various classification methods in detail found that newer methods (such as decision trees and random forests) outperform Bayes classifiers.

One advantage of the naive Bayes classifier is that it only needs a small amount of training data to estimate the necessary parameters (the mean and variance of each variable). Because the variables are assumed independent, only the distribution of each variable within each class needs to be estimated; there is no need to determine the entire covariance matrix.


Main Idea (1): Bayesian Rule

In theory, the probability model of the classifier is a conditional probability model, which is nothing new to us; that is, the classifier is asked to compute

p(y | x_1, …, x_n)

The dependent class variable y takes one of several categories, conditioned on several feature variables. The problem is that if the number of features n is large, or if each feature can take a large number of values, a model based on raw probability tables is infeasible, so the model must be reformulated to make it tractable:

p(y | x_1, …, x_n) = p(y) · p(x_1, …, x_n | y) / p(x_1, …, x_n)

In simple language, it can be expressed as:

posterior = (prior × likelihood) / evidence

For example, suppose we have a deck of playing cards and we want to know the probability that the card we draw is a King, given that it is a face card (a card with a person on it). According to Bayes' formula, we know that:

  • p(King) is equal to 4/52, because there are 4 Kings in a deck of cards
  • p(Face|King) is equal to 1, because every King is a face card
  • p(Face) is equal to 12/52, because each suit has 3 face cards, so there are 12 face cards in total
p(King | Face) = p(Face | King) · p(King) / p(Face) = (1 × 4/52) / (12/52) = 1/3
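
To make the arithmetic concrete, here is a tiny Python check of this calculation with exact fractions (purely illustrative):

from fractions import Fraction

p_king = Fraction(4, 52)         # 4 Kings in a 52-card deck
p_face_given_king = Fraction(1)  # every King is a face card
p_face = Fraction(12, 52)        # 3 face cards per suit, 4 suits

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)         # 1/3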

The following formula is one we will encounter often:

p(y | X) = p(X | y) · p(y) / p(X)

In practice, we only care about the numerator of this fraction: the denominator does not depend on y, and since the values of the features X are given, the denominator is effectively a constant. The numerator is therefore equivalent to the joint distribution model.

p(y) · p(x_1, …, x_n | y) = p(y, x_1, …, x_n)

Applying the chain rule repeatedly, this can be written as a product of conditional probabilities, as follows:

p(y, x_1, …, x_n) = p(y) · p(x_1 | y) · p(x_2 | y, x_1) ⋯ p(x_n | y, x_1, …, x_{n−1})

Main Idea (2): Conditional Independence Hypothesis

This is where the "naive" assumption of conditional independence starts to do its work. What is conditional independence? Two events A and B are conditionally independent given Y if, once Y is known, whether A happens tells us nothing about whether B happens; in terms of the conditional probability distribution, p(A, B | Y) = p(A | Y) · p(B | Y).

So we assume that the features are conditionally independent given the class, meaning that

p(x_i | y, x_j) = p(x_i | y)

for any i ≠ j. The joint distribution model can then be expressed as

p(y, x_1, …, x_n) = p(y) · p(x_1 | y) · p(x_2 | y) ⋯ p(x_n | y) = p(y) ∏ p(x_i | y)

This means that the conditional distribution of the class variable y can be expressed as

p(y | x_1, …, x_n) = (1 / Z) · p(y) ∏ p(x_i | y)

where Z (the evidence) is a scaling factor that depends only on the features; since the feature values are given, Z is simply a constant. The model has thus been decomposed into the so-called prior probability p(y) and the independent per-feature distributions p(x_i | y), which play the role of the likelihood. The stability of the probability estimates of this model is greatly improved.
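
As a minimal sketch of how this factorization is used, assuming we already have a prior and per-feature likelihood tables (the dictionaries and values below are made up purely for illustration):

prior = {'spam': 0.4, 'ham': 0.6}                  # hypothetical p(y)
likelihood = {                                     # hypothetical p(x_i = word | y)
    'spam': {'money': 0.30, 'meeting': 0.05},
    'ham':  {'money': 0.02, 'meeting': 0.20},
}
features = ['money', 'meeting']                    # observed feature values x_1, ..., x_n

# Unnormalized score p(y) * prod_i p(x_i | y) for each class
scores = {y: prior[y] for y in prior}
for y in scores:
    for x in features:
        scores[y] *= likelihood[y][x]

Z = sum(scores.values())                           # the evidence, a constant once X is given
posterior = {y: scores[y] / Z for y in scores}
print(posterior)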

Main Idea (3): Construct a Classifier from a Probability Model

To construct a classifier from the naive Bayes probability model, we need a decision rule. A very common rule is to pick the most probable class, i.e., the maximum a posteriori (MAP) decision criterion; the corresponding classifier is defined by the formula below. Here we assume that y has K categories, so we have

ŷ = argmax_{k ∈ {1, …, K}} p(y = k) ∏ p(x_i | y = k)

All of the model's parameters can be estimated from relative frequencies in the training set: the prior of each class from how often that class occurs, and each likelihood from how often a feature value occurs within that class.
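
As a rough sketch (a simplified illustration, not the article's implementation), here is how those frequency estimates can be combined with the MAP rule for categorical features:

from collections import Counter, defaultdict

def fit_categorical_nb(X, y):
    # Estimate p(y = k) and p(x_i = v | y = k) from relative frequencies.
    prior = {k: c / len(y) for k, c in Counter(y).items()}
    counts = defaultdict(Counter)                  # (class, feature index) -> value counts
    for xs, k in zip(X, y):
        for i, v in enumerate(xs):
            counts[(k, i)][v] += 1
    likelihood = {key: {v: c / sum(cnt.values()) for v, c in cnt.items()}
                  for key, cnt in counts.items()}
    return prior, likelihood

def predict_map(xs, prior, likelihood):
    # MAP rule: argmax over k of p(y = k) * prod_i p(x_i | y = k).
    best, best_score = None, -1.0
    for k, p_k in prior.items():
        score = p_k
        for i, v in enumerate(xs):
            score *= likelihood.get((k, i), {}).get(v, 0.0)   # unseen values get 0 here
        if score > best_score:
            best, best_score = k, score
    return best

# Toy usage with made-up data
X = [['red', 'round'], ['red', 'long'], ['green', 'round']]
y = ['apple', 'pepper', 'apple']
prior, likelihood = fit_categorical_nb(X, y)
print(predict_map(['red', 'round'], prior, likelihood))       # 'apple'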

Extension (1): Gaussian Naive Bayes (GNB)

In order to estimate the distribution parameters of the features, we assume that the training data follow some kind of distribution. Here, we assume a normal (Gaussian) distribution.

[Figure credited to ChrisAlbon]

If you are dealing with continuous data, a common assumption is that the continuous values are Gaussian. For example, suppose the training set contains a continuous feature x. We first group the data by class and then compute the mean and variance of x within each class. We have:

μ_k = the mean of x over the training samples of class k,  σ_k² = the variance of x over the training samples of class k

Then, taking an observed value x = v as an example, the class-conditional likelihood is

p(x = v | y = k) = (1 / √(2π σ_k²)) · exp(−(v − μ_k)² / (2σ_k²))

Another commonly used technique for dealing with continuous values is to discretize them. In general, when the number of training samples is small or the precise distribution is known, fitting a probability distribution is the better choice; when there are many samples, discretization performs better, because a large sample can learn the distribution of the data through the bin frequencies. Since NB is typically used with a large number of samples (the more data, the higher the classification accuracy), NB often uses discretization rather than probability distribution estimation.
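
As a quick sketch of the discretization approach (the feature values and quantile-based bin edges below are arbitrary illustrations):

import numpy as np

x = np.array([4.7, 5.1, 5.9, 6.3, 6.8, 7.2])   # a continuous feature
y = np.array([0, 0, 0, 1, 1, 1])               # class labels

edges = np.quantile(x, [0.25, 0.5, 0.75])      # 3 cut points -> 4 bins
x_binned = np.digitize(x, edges)               # map each value to a bin index

# Class-conditional bin frequencies replace the Gaussian density estimate.
for k in np.unique(y):
    counts = np.bincount(x_binned[y == k], minlength=len(edges) + 1)
    print(k, counts / counts.sum())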

Here is how GNB works: for every data point, its z-score with respect to each class mean can be computed as

z = (v − μ_k) / σ_k

If a given class and feature value never appear together in the training set, the frequency-based estimate of that probability will be 0. This is a problem: when the zero is multiplied with the other probabilities, all of the other information is wiped out. Therefore, the probability estimates for small counts are usually adjusted so that no probability is exactly 0.

Although the independence assumption is often inaccurate in practice, several properties of the NB classifier make it possible to obtain surprisingly good results in practice. In particular, the decoupling of the class-conditional features (in mathematics, decoupling means turning an equation involving multiple variables into a system of equations each involving a single variable, so that the variables no longer jointly affect the result and analysis and computation are simplified) means that the distribution of each feature can be estimated independently as a one-dimensional distribution.

This alleviates the problems caused by the curse of dimensionality: when the number of features grows, the required sample size does not need to grow exponentially. On the other hand, Naive Bayes usually cannot produce very accurate estimates of the class probabilities themselves, but in many applications this is not required.

For example, under the maximum a posteriori (MAP) decision rule, the Naive Bayes classifier gives the correct classification as long as the posterior probability of the correct class is higher than that of the other classes. So even if the probability estimates are mildly or even severely imprecise, the classification result can remain correct. In this way, the classifier is robust enough to ignore the defects of the underlying Naive Bayes probability model.

Naive Bayes in Python & R

  • Python

The code below is released under the MIT license.

#How To Implement Naive Bayes From Scratch in Python
#http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
#Dataset
#https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data


import csv
import math
import random


#Handle data
def loadCsv(filename):
	lines = csv.reader(open(filename, "r"))
	dataset = list(lines)
	for i in range(len(dataset)):
		dataset[i] = [float(x) for x in dataset[i]]
	return dataset


#Test handling data
""" 
filename = 'pima-indians-diabetes.data.csv'
SomeDataset = loadCsv(filename)
print("Loaded data file {0:s} with {1:5d} rows".format(filename,len(SomeDataset)))
"""


#Split dataset with ratio
def splitDataset(dataset, splitRatio):
	trainSize = int(len(dataset) * splitRatio)
	trainSet = []
	copy = list(dataset)
	while len(trainSet) < trainSize:
		index = random.randrange(len(copy))
		trainSet.append(copy.pop(index))
	return [trainSet, copy]


#Test splitting data
"""
dataset = [[1], [2], [3], [4], [5]]
splitRatio = 0.67
train, test = splitDataset(dataset, splitRatio)
print('Split {0} rows into train with {1} and test with {2}'.format(len(dataset),train,test))
"""


#Separate by Class
def separateByClass(dataset):
	separated = {}
	for i in range(len(dataset)):
		vector = dataset[i]
		if (vector[-1] not in separated):
			separated[vector[-1]] = []
		separated[vector[-1]].append(vector)
	return separated


#Test separating by class
"""
dataset = [[1,20,1],[2,21,0],[3,22,1]]
separated = separateByClass(dataset)
print('Separated instances: {0}'.format(separated))
"""


#Calculate Mean
def mean(numbers):
	return sum(numbers)/float(len(numbers))


def stdev(numbers):
	avg = mean(numbers)
	variance = sum([pow(x-avg,2) for x in numbers])/float(len(numbers)-1)
	return math.sqrt(variance)


#Test stdev & mean calculation
"""
numbers = [1,2,3,4,5]
print('Summary of {0}: mean={1}, stdev={2}'.format(numbers, mean(numbers), stdev(numbers)))
"""


#Summarize Dataset
def summarize(dataset):
	summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
	del summaries[-1]
	return summaries


#Test summarizing data
"""
dataset = [[1,20,0], [2,21,1], [3,22,0]]
summary = summarize(dataset)
print('Attribute summaries: {0}'.format(summary))
"""


#Summarize attributes by class
def summarizeByClass(dataset):
	separated = separateByClass(dataset)
	summaries = {}
	for classValue, instances in separated.items():
		summaries[classValue] = summarize(instances)
	return summaries


#Test summarizing attributes
"""
dataset = [[1,20,1], [2,21,0], [3,22,1], [4,22,0]]
summary = summarizeByClass(dataset)
print('Summary by class value: {0}'.format(summary))
"""


#Calculate Gaussian Probability Density Function
def calculateProbability(x, mean, stdev):
	exponent = math.exp(-(math.pow(x-mean,2)/(2*math.pow(stdev,2))))
	return (1/(math.sqrt(2*math.pi)*stdev))*exponent


#Testing Gaussian PDF
"""
x = 71.5
mean = 73
stdev = 6.2
probability = calculateProbability(x,mean,stdev)
print('Probability of belonging to this class: {0}'.format(probability))
"""


#Calculate Class Probabilities
def calculateClassProbabilities(summaries, inputVector):
	probabilities = {}
	for classValue, classSummaries in summaries.items():
		probabilities[classValue] = 1
		for i in range(len(classSummaries)):
			mean, stdev = classSummaries[i]
			x = inputVector[i]
			probabilities[classValue] *= calculateProbability(x, mean, stdev)
	return probabilities


#Testing Class Probability calculation
"""
summaries = {0:[(1, 0.5)], 1:[(20, 5.0)]}
inputVector = [1.1, '?']
probabilities = calculateClassProbabilities(summaries, inputVector)
print('Probabilities for each class: {0}'.format(probabilities))
"""


#Make a prediction
def predict(summaries, inputVector):
	probabilities = calculateClassProbabilities(summaries, inputVector)
	bestLabel, bestProb = None, -1
	for classValue, probability in probabilities.items():
		if bestLabel is None or probability > bestProb:
			bestProb = probability
			bestLabel = classValue
	return bestLabel


#Test prediction
"""
summaries = {'A':[(1, 0.5)], 'B':[(20, 5.0)]}
inputVector = [1.1, '?']
result = predict(summaries, inputVector)
print('Prediction: {0}'.format(result))
"""


#Get predictions


def getPredictions(summaries, testSet):
	predictions = []
	for i in range(len(testSet)):
		result = predict(summaries, testSet[i])
		predictions.append(result)
	return predictions


#Test predictions
"""
summaries = {'A':[(1, 0.5)], 'B':[(20, 5.0)]}
testSet = [[1.1,'?'], [19.1,'?']]
predictions = getPredictions(summaries, testSet)
print('Predictions: {0}'.format(predictions))
"""


#Get Accuracy
def getAccuracy(testSet, predictions):
	correct = 0
	for x in range(len(testSet)):
		if testSet[x][-1] == predictions[x]:
			correct += 1
	return (correct/float(len(testSet)))*100.0


#Test accuracy
"""
testSet = [[1,1,1,'a'], [2,2,2,'a'], [3,3,3,'b']]
predictions = ['a', 'a', 'a']
accuracy = getAccuracy(testSet, predictions)
print('Accuracy: {0}'.format(accuracy))
"""
def main():
	filename = 'pima-indians-diabetes.data.csv'
	splitRatio = 0.67
	dataset = loadCsv(filename)
	trainingSet, testSet = splitDataset(dataset, splitRatio)
	print('Split {0} rows into train = {1} and test = {2} rows'.format(len(dataset),len(trainingSet),len(testSet)))
	#prepare model
	summaries = summarizeByClass(trainingSet)
	#test model
	predictions = getPredictions(summaries, testSet)
	accuracy = getAccuracy(testSet, predictions)
	print('Accuracy: {0}%'.format(accuracy))


main()
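
For comparison, here is a minimal sketch of the same task using scikit-learn's GaussianNB (assuming scikit-learn is installed and the same 'pima-indians-diabetes.data.csv' file is available locally; this is an illustrative addition, not part of the tutorial code above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X, y = data[:, :-1], data[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = GaussianNB()            # fits a per-class mean and variance for every feature
model.fit(X_train, y_train)
print('Accuracy: {0:.2f}%'.format(100.0 * model.score(X_test, y_test)))
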
  • R

library(datasets)
data(iris)
set.seed(100)

training_rows <- sort(c(sample(1:50, 40), sample(51:100, 40), sample(101:150, 40)))
training_x <- as.data.frame(iris[training_rows, 1:4])
training_y <- iris[training_rows, 5]

iris_nb <- function(x, trainx, trainy){
  train <- cbind(trainx, trainy)

  class_virginica <- train[which(train$trainy == 'virginica'),]
  class_setosa <- train[which(train$trainy == 'setosa'),]
  class_versicolor <- train[which(train$trainy == 'versicolor'),]

  posterior <- function(x, classtype){
    p_Sepal.Length <- dnorm(x[1], mean(classtype[,1]), sd(classtype[,1]))
    p_Sepal.Width <- dnorm(x[2], mean(classtype[,2]), sd(classtype[,2]))
    p_Petal.Length <- dnorm(x[3], mean(classtype[,3]), sd(classtype[,3]))
    p_Petal.Width <- dnorm(x[4], mean(classtype[,4]), sd(classtype[,4]))

    vec <- 0.33 * p_Sepal.Length * p_Sepal.Width * p_Petal.Length * p_Petal.Width  # for each species
    return(vec)
  }

  return(list(virginica = sum(posterior(x, class_virginica)),
              setosa = sum(posterior(x, class_setosa)),
              versicolor = sum(posterior(x, class_versicolor))))
}

test_case_1 <- as.matrix(iris[1, 1:4])
iris_nb(test_case_1, training_x, training_y)

## $virginica
## [1]

## $setosa
## [1]

## $versicolor
## [1]

Interview Questions & Other Materials

Interview Questions

  • What is the difference between Naive Bayes and LR?

Simply put: Naive Bayes is a generative model. It learns the prior probability P(Y) and the conditional probability P(X|Y) from the existing samples, forms the joint distribution P(X, Y), and finally uses Bayes' theorem to obtain P(Y|X). LR (logistic regression) is a discriminative model: it obtains the conditional probability P(Y|X) directly by maximizing the log-likelihood. Naive Bayes relies on a strong conditional independence assumption (given the class Y, the values of the feature variables are independent of one another), whereas LR does not require this. Naive Bayes is suitable for scenarios with few data points, while LR is suitable for large-scale datasets.

  • Why is Naive Bayes naive?

When using Bayes' theorem to obtain the joint probability P(X, Y), you need to compute the conditional probability P(X|Y). When computing P(X|Y), Naive Bayes makes a strong conditional independence assumption (once Y is determined, the values of the components of X are independent of one another), namely:

P(X | Y) = P(X_1, X_2, …, X_n | Y) = P(X_1 | Y) · P(X_2 | Y) ⋯ P(X_n | Y)

  • What to do if the probability is 0 when estimating the conditional probability P(X|Y)?

Simply put: introduce a smoothing parameter λ. When λ = 1, this is called Laplace smoothing.

Laplacian smoothing is a correction method for handling zero probabilities in Naive Bayes. When classifying, it may happen that a certain attribute value never appears together with a certain class in the training set. If the probabilities are computed directly from the Naive Bayes classifier's expression, a zero probability appears. To prevent the information carried by the other attributes from being "erased" by attribute values that never appeared in the training set, the Laplacian estimator is used for correction. Concretely: add 1 to the numerator; for the prior probability, add the number of possible classes in the training set to the denominator; for the conditional probability, add the number of possible values of the i-th attribute to the denominator.
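
A minimal sketch of add-λ (Laplace) smoothing for one conditional probability estimate (the counts below are made-up illustrations):

from collections import Counter

def smoothed_cond_prob(value, values_in_class, n_possible_values, lam=1.0):
    # p(x_i = value | y) with add-lambda smoothing:
    # (count + lambda) / (class size + lambda * number of possible values of x_i)
    counts = Counter(values_in_class)
    return (counts[value] + lam) / (len(values_in_class) + lam * n_possible_values)

# 'blue' never occurs in this class, yet its smoothed probability is nonzero.
print(smoothed_cond_prob('blue', ['red', 'red', 'green'], n_possible_values=3))  # ~0.167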

  • Pros and cons of Naive Bayes

Pros: performs well on small-scale data; suitable for multi-class tasks; suitable for incremental training.

Cons: very sensitive to how the input data are represented (discrete vs. continuous, extreme values, and so on).

  • Are there any hyperparameters in Naive Bayes that can be adjusted?

Naive Bayes has no hyperparameters to adjust, so no parameter tuning is needed. The classifier is determined by the training set, and the classification result is essentially fixed. The Laplace estimator λ is not a parameter of the Naive Bayes classifier itself, so it cannot be used to tune Naive Bayes.

Other Materials

Summary

Among all machine learning classification algorithms, Naive Bayes differs from most of the others. Most classification algorithms, such as decision trees, KNN, logistic regression, and support vector machines, are discriminative methods: they directly learn the relationship between the output Y and the features X, either as a decision function Y = f(X) or as a conditional distribution P(Y|X). Naive Bayes, however, is a generative method: it directly finds the joint distribution P(X, Y) of the features X and the output Y, and then infers P(Y|X) = P(X, Y) / P(X). Naive Bayes is very intuitive, does not require a large amount of computation, and is widely applied in many fields.

Originally published at https://zg104.github.io.

Translated from: https://medium.com/@zg104/everything-you-need-to-know-about-naïve-bayes-9a97cff1cba3
