Machine Learning for Anomaly Detection: The Mathematics Behind It


Machine Learning has applications across a wide variety of domains. Today I'll write about one such real-life application, which is widely used to detect defective items in a mixture of defective and non-defective ones.

Before we jump into the algorithm, let's brush up on some basic statistics.

Gaussian Distribution

A Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. Its bell-shaped curve describes the probability density of a variable, say X, parameterized by the distribution's mean and variance.

The probability density of the distribution is given by the following formula:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

The value of f(x) is obtained by substituting the mean, the variance, and a particular value of x into the equation above. Plotting x on the x-axis and f(x) on the y-axis gives a graph that looks roughly like the following image.

[Image: bell curve of a standard normal distribution]
Example Graph

The x value corresponding to the highest point on the graph is the mean of the distribution, and the greater the value of sigma, the standard deviation (i.e. the square root of the variance), the more spread out the curve is. Here it is a standard normal distribution with mean = 0 and standard deviation = 1.

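As a quick sanity check, the density formula above can be coded directly. This is a minimal sketch using only the standard library; the function name is my own:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# For the standard normal (mu = 0, sigma = 1) the peak sits at x = mu,
# and the density falls off as we move into the tails:
peak = gaussian_pdf(0.0)   # 1 / sqrt(2 * pi), roughly 0.399
tail = gaussian_pdf(3.0)   # three standard deviations out, much smaller
```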
But why do we need it?

We’ll come to the answer shortly, but before that let’s look at how our training set of m examples and n features is represented:

Training set: { x^(1), x^(2), ..., x^(m) }

where x is an n-dimensional vector.

Now, we’ll plot the examples under two of the features, say x(1) and x(2), where the points mostly correspond to non-anomalous examples (suppose the graph looks something like this):

[Image: scatter plot of mostly non-anomalous examples under two features]
Image Courtesy: Coursera (Machine Learning by Andrew Ng, Lecture 15)

Here we see that most of the points are concentrated around the center, i.e. the density is highest there and keeps decreasing as we move away. From this observation we can infer that the probability of a point being a non-anomalous example increases as we approach the center, or equivalently, increases with the density of the other non-anomalous examples under features x(1) and x(2). So when we are given two more points that lie somewhere like this in the following graph:

[Image: the same scatter plot with two new points, one near the center and one far away]
Image Courtesy: Coursera (Machine Learning by Andrew Ng, Lecture 15)

We can guess that one of them lies in the higher-density region and thus indicates a non-anomalous example, whereas the other, which lies far away, indicates an anomalous one.

Now let’s answer the question above. The Gaussian distribution is a fundamental part of statistics used in many mathematical problems, and this anomaly detection problem is one of them. We assume that every feature follows a Gaussian distribution (in the ideal case). So we plot the probability distribution, i.e. the density estimate, of a particular feature over its m examples (using their mean and variance). The points that fall in the low-density tails of that distribution are the anomalous cases.

[Image: Gaussian curve of one feature, with red crosses in the low-density tails marking anomalies]

The red crosses are our anomalous cases, and the curve is the Gaussian distribution of a particular feature over its m examples. As mentioned above, we calculate f(x) for a particular value of x from the following formula:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

where μ and σ² are the mean and the variance, respectively.

Choosing what features to use

As previously discussed, we assume that every feature follows a Gaussian distribution, but this holds only in the ideal case. Real-world datasets offer a large number of candidate features, and not all of them will follow this distribution. So it is very important to use only those features that do follow a Gaussian distribution. We can either pick such features directly after plotting their graphs, or transform the non-Gaussian features and their examples so that the transformed feature follows our distribution.

For example, if a particular feature x(j) does not give us a Gaussian distribution, we can transform it, e.g. by squaring it (x(j)²) or by taking its logarithm (log(x(j))), to get a graph that is closer to Gaussian.

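A log transform, for instance, pulls in the long right tail of a skewed positive feature. This is a sketch; the helper name is mine, and a small eps guards against log(0):

```python
import math

def log_transform(values, eps=1e-9):
    """Map a positive, right-skewed feature closer to Gaussian via log."""
    return [math.log(v + eps) for v in values]

skewed = [1.0, 2.0, 2.0, 3.0, 10.0, 100.0]   # heavy right tail
transformed = log_transform(skewed)
# On the log scale the extreme value 100 is pulled in dramatically,
# flattening the tail while preserving the ordering of the values.
```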
Now let’s merge things up and come up with the algorithm!

We have a training set with n features, containing mostly non-anomalous examples. We want to find the probability distribution of each feature and combine them to get a function that best fits our training set.

For this, we first find the parameters μ and σ² for each feature in our training set. We calculate the mean (μ) and variance (σ²) from the following formulas and fit these parameters to our model:

μj = (1/m) · Σ_{i=1..m} xj^(i)
σj² = (1/m) · Σ_{i=1..m} (xj^(i) − μj)²

Here i indexes the examples (1 to m) and j indexes the features (1 to n). Then we calculate f(x), the probability density, for every feature. Since the features are assumed to be unrelated to each other, they are treated as independent observations; hence we multiply their densities together to get p(x), also called the likelihood function, which will in turn be used to determine our result. That is:

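In code, fitting the parameters is just per-feature means and variances over the m × n training matrix. A plain-Python sketch; the function and variable names are my own:

```python
def fit_gaussian_params(X):
    """Per-feature mean mu_j and variance sigma_j^2 of an m x n matrix X."""
    m, n = len(X), len(X[0])
    mus = [sum(row[j] for row in X) / m for j in range(n)]
    variances = [sum((row[j] - mus[j]) ** 2 for row in X) / m
                 for j in range(n)]
    return mus, variances

# Three examples (m = 3), two features (n = 2):
X = [[1.0, 10.0],
     [2.0, 12.0],
     [3.0, 14.0]]
mus, variances = fit_gaussian_params(X)   # mus == [2.0, 12.0]
```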
p(x) = Π_{j=1..n} f(xj ; μj, σj²)

Now, given a new example x, we calculate p(x) for it from the following formula:

p(x) = Π_{j=1..n} (1 / (σj√(2π))) · exp(−(xj − μj)² / (2σj²))

With this formula, the density of the new example is computed under each feature’s Gaussian independently. The per-feature densities are then multiplied together to give the likelihood of that example, which tells us how likely it is to be an anomaly: a very low likelihood suggests an anomaly.

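Scoring a new example is then just a product of per-feature densities. A sketch with made-up fitted parameters; in practice μj and σj² come from the fitting step:

```python
import math

def feature_density(xj, mu, var):
    """Gaussian density of one feature value given its mean and variance."""
    return math.exp(-((xj - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def likelihood(x, mus, variances):
    """p(x): product of the per-feature Gaussian densities."""
    p = 1.0
    for xj, mu, var in zip(x, mus, variances):
        p *= feature_density(xj, mu, var)
    return p

mus, variances = [2.0, 12.0], [0.7, 2.7]            # illustrative parameters
typical = likelihood([2.0, 12.0], mus, variances)   # near the means: high p(x)
outlier = likelihood([8.0, 30.0], mus, variances)   # far away: tiny p(x)
```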
Now that we’ve found p(x) for this example, we check whether it is smaller than a threshold ε. The value of ε is a very small number determined from the p(x) values of our training set. This ε acts as the threshold between anomalous and non-anomalous cases: we flag the example as an anomaly if its p(x) is less than ε. That is:

Flag as anomaly if p(x) < ε

From this condition we determine our target value y, where:

y = 1 (anomaly) if p(x) < ε
y = 0 (normal) if p(x) ≥ ε
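The final decision rule is a single comparison against ε. A sketch; the ε value here is arbitrary, and in practice it would be chosen from the p(x) values observed on the training set:

```python
def classify(p_x, epsilon):
    """Return y = 1 (anomaly) if p(x) < epsilon, else y = 0 (normal)."""
    return 1 if p_x < epsilon else 0

epsilon = 1e-4                         # illustrative threshold
scores = [2e-3, 5e-5, 8e-2, 1e-6]      # p(x) for four new examples
labels = [classify(s, epsilon) for s in scores]   # [0, 1, 0, 1]
```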

Conclusion

Anomaly Detectors are used worldwide in various industries for a multitude of purposes. They are a key part of building robust distributed software. I hope this article gives a little insight as to how they really work!

Translated from: https://medium.com/srm-mic/machine-learning-for-anomaly-detection-the-mathematics-behind-it-7a2c3b5a755
