Local Outlier Factor
The local outlier factor (LOF) lets us detect outliers that lie far away from most of the samples. Before outlining the algorithm, we need a few concepts:
Reachability Distance
$$\mathrm{RD}_k(x, x') = \max\left(\left\| x' - x'^{(k)} \right\|,\ \left\| x - x' \right\|\right),$$
where $x'^{(k)}$ stands for the $k$-th point nearest to $x'$.
Local Reachability Density
$$\mathrm{LRD}_k(x) = \left(\frac{1}{k} \sum_{j=1}^{k} \mathrm{RD}_k\left(x, x^{(j)}\right)\right)^{-1},$$
where $x^{(j)}$ denotes the $j$-th point nearest to $x$.
Local Outlier Factor
$$\mathrm{LOF}_k(x) = \frac{\frac{1}{k} \sum_{j=1}^{k} \mathrm{LRD}_k\left(x^{(j)}\right)}{\mathrm{LRD}_k(x)}.$$
Evidently, as the LOF of $x$ ascends, the probability that $x$ is an outlier grows: $\mathrm{LOF}_k(x) \approx 1$ means $x$ is about as densely surrounded as its neighbors, while $\mathrm{LOF}_k(x) \gg 1$ means $x$ lies in a much sparser region than its neighbors do.
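To make the definitions concrete, here is a tiny worked example (the samples and the choice $k = 1$ are our own illustration): take the one-dimensional points $0, 1, 2, 10$. The nearest neighbor of the isolated point $10$ is $2$, and the nearest neighbor of $2$ is $1$, so
$$\mathrm{RD}_1(10, 2) = \max(\|2 - 1\|, \|10 - 2\|) = 8, \qquad \mathrm{LRD}_1(10) = \tfrac{1}{8},$$
$$\mathrm{RD}_1(2, 1) = \max(\|1 - 0\|, \|2 - 1\|) = 1, \qquad \mathrm{LRD}_1(2) = 1,$$
$$\mathrm{LOF}_1(10) = \frac{\mathrm{LRD}_1(2)}{\mathrm{LRD}_1(10)} = 8 \gg 1,$$
while the clustered points $0, 1, 2$ all get $\mathrm{LOF}_1 = 1$.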
Here is a simple MATLAB example:
% n samples: a uniform background plus a Gaussian cluster;
% plant an outlier by moving the last sample to x1=14
n=100; x=[(rand(n/2,2)-0.5)*20; randn(n/2,2)]; x(n,1)=14;
k=3; x2=sum(x.^2,2);
% Pairwise Euclidean distances, sorted within each row:
% s holds sorted distances, t the matching indices (column 1 is the point itself)
[s, t]=sort(sqrt(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'), 2);
RD=zeros(n,k); LRD=zeros(n,k+1);
for i=1:k+1               % i=1: each point itself; i=2..k+1: its k nearest neighbors
  for j=1:k
    % Reachability distance to the j-th neighbor: max of that neighbor's
    % k-distance (column k+1, since column 1 is the self-distance 0)
    % and the actual distance
    RD(:,j)=max(s(t(t(:,i),j+1),k+1), s(t(:,i),j+1));
  end
  LRD(:,i)=1./mean(RD,2);  % local reachability density
end
% LOF: mean LRD of the k neighbors divided by each point's own LRD
LOF=mean(LRD(:,2:k+1),2)./LRD(:,1);
figure(1); clf; hold on
plot(x(:,1),x(:,2),'rx');
for i=1:n                  % circle radius proportional to the LOF score
  plot(x(i,1),x(i,2),'bo', 'MarkerSize', LOF(i)*10);
end
KL Divergence
In unsupervised learning problems, there is usually little information about the outliers. However, when a set of samples known to be normal is available in addition to the test samples, outliers can be detected by comparing the test distribution against the normal one.
Kullback-Leibler (KL) divergence, also known as relative entropy, is a powerful tool to estimate the probability density ratio of normal samples to test samples:
$$\mathrm{KL}(p' \,\|\, p) = \int p'(x) \log \frac{p'(x)}{p(x)}\, \mathrm{d}x,$$
where $p'(x)$ denotes the probability density of the normal samples and $p(x)$ that of the test samples. The ratio $w(x) = p'(x)/p(x)$ takes small values at points that are unlikely under the normal density, which is how outliers among the test samples are identified.
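This ratio can be estimated directly, without estimating either density, by the KL importance estimation procedure (KLIEP): model $w(x)$ as a nonnegative mixture of kernels, maximize the average log-ratio over the normal samples, and constrain the average ratio over the test samples to one. Below is a minimal MATLAB sketch in the same spirit as the LOF example; the data, the Gaussian kernel width h, the step size, and the iteration count are all illustrative assumptions.

% Normal samples xn (density p') and test samples xt (density p)
% with one planted outlier -- this setup is purely illustrative
np=100; n=100;
xn=randn(np,2);
xt=[randn(n-1,2); [8 8]];
h=1;                                % Gaussian kernel width (assumed)
d2=@(a,b) repmat(sum(a.^2,2),1,size(b,1)) ...
         +repmat(sum(b.^2,2)',size(a,1),1)-2*a*b';
Kn=exp(-d2(xn,xn)/(2*h^2));         % kernels: normal samples vs. centers (=xn)
Kt=exp(-d2(xt,xn)/(2*h^2));         % kernels: test samples vs. centers
a=ones(np,1)/np;                    % initial mixing coefficients
for iter=1:1000
  a=a+0.001*(Kn'*(1./(Kn*a)))/np;   % gradient ascent on mean log w over xn
  a=max(a,0);                       % keep the ratio model nonnegative
  a=a/mean(Kt*a);                   % enforce mean of w over test samples = 1
end
w=Kt*a;                             % estimated p'(x)/p(x) at the test samples
% small w flags test samples that are unlikely under the normal density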