Local Outlier Factor
The local outlier factor (LOF) lets us detect outliers that lie far away from most of the samples. Before outlining the algorithm, we need a few concepts:
Reachability Distance
$$\mathrm{RD}_k(x, x') = \max\left(\left\| x' - x'^{(k)} \right\|,\ \left\| x - x' \right\|\right),$$
where $x'^{(k)}$ stands for the $k$-th point nearest to $x'$.
Local Reachability Density
$$\mathrm{LRD}_k(x) = \left(\frac{1}{k} \sum_{j=1}^{k} \mathrm{RD}_k\left(x, x^{(j)}\right)\right)^{-1},$$
where $x^{(j)}$ denotes the $j$-th point nearest to $x$.
Local Outlier Factor
$$\mathrm{LOF}_k(x) = \frac{\frac{1}{k} \sum_{j=1}^{k} \mathrm{LRD}_k\left(x^{(j)}\right)}{\mathrm{LRD}_k(x)}.$$
Evidently, as the LOF of $x$ ascends, the probability that $x$ is an outlier grows: $\mathrm{LOF}_k(x) \approx 1$ means $x$ is about as densely surrounded as its neighbors, while $\mathrm{LOF}_k(x) \gg 1$ means $x$ lies in a much sparser region than its neighbors do.
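To make the definitions concrete, here is a tiny worked example (the samples and the choice $k = 1$ are our own illustration): take the one-dimensional points $0, 1, 2, 10$. The nearest neighbor of the isolated point $10$ is $2$, and the nearest neighbor of $2$ is $1$, so
$$\mathrm{RD}_1(10, 2) = \max(\|2 - 1\|, \|10 - 2\|) = 8, \qquad \mathrm{LRD}_1(10) = \tfrac{1}{8},$$
$$\mathrm{RD}_1(2, 1) = \max(\|1 - 0\|, \|2 - 1\|) = 1, \qquad \mathrm{LRD}_1(2) = 1,$$
$$\mathrm{LOF}_1(10) = \frac{\mathrm{LRD}_1(2)}{\mathrm{LRD}_1(10)} = 8 \gg 1,$$
while the clustered points $0, 1, 2$ all get $\mathrm{LOF}_1 = 1$.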
Here is a simple MATLAB example:
% n samples: a uniform background plus a Gaussian cluster;
% plant an outlier by moving the last sample to x1=14
n=100; x=[(rand(n/2,2)-0.5)*20; randn(n/2,2)]; x(n,1)=14;
k=3; x2=sum(x.^2,2);
% Pairwise Euclidean distances, sorted within each row:
% s holds sorted distances, t the matching indices (column 1 is the point itself)
[s, t]=sort(sqrt(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'), 2);
RD=zeros(n,k); LRD=zeros(n,k+1);
for i=1:k+1               % i=1: each point itself; i=2..k+1: its k nearest neighbors
  for j=1:k
    % Reachability distance to the j-th neighbor: max of that neighbor's
    % k-distance (column k+1, since column 1 is the self-distance 0)
    % and the actual distance
    RD(:,j)=max(s(t(t(:,i),j+1),k+1), s(t(:,i),j+1));
  end
  LRD(:,i)=1./mean(RD,2);  % local reachability density
end
% LOF: mean LRD of the k neighbors divided by each point's own LRD
LOF=mean(LRD(:,2:k+1),2)./LRD(:,1);
figure(1); clf; hold on
plot(x(:,1),x(:,2),'rx');
for i=1:n                  % circle radius proportional to the LOF score
  plot(x(i,1),x(i,2),'bo', 'MarkerSize', LOF(i)*10);
end
KL Divergence
In unsupervised learning problems, there is usually little information about the outliers. However, when a set of samples known to be normal is available in addition to the test samples, outliers can be detected by comparing the test distribution against the normal one.
Kullback-Leibler (KL) divergence, also known as relative entropy, is a powerful tool to estimate the probability density ratio of normal samples to test samples:
$$\mathrm{KL}(p' \,\|\, p) = \int p'(x) \log \frac{p'(x)}{p(x)}\, \mathrm{d}x,$$
where $p'(x)$ denotes the probability density of the normal samples and $p(x)$ that of the test samples. The ratio $w(x) = p'(x)/p(x)$ takes small values at points that are unlikely under the normal density, which is how outliers among the test samples are identified.
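This ratio can be estimated directly, without estimating either density, by the KL importance estimation procedure (KLIEP): model $w(x)$ as a nonnegative mixture of kernels, maximize the average log-ratio over the normal samples, and constrain the average ratio over the test samples to one. Below is a minimal MATLAB sketch in the same spirit as the LOF example; the data, the Gaussian kernel width h, the step size, and the iteration count are all illustrative assumptions.

% Normal samples xn (density p') and test samples xt (density p)
% with one planted outlier -- this setup is purely illustrative
np=100; n=100;
xn=randn(np,2);
xt=[randn(n-1,2); [8 8]];
h=1;                                % Gaussian kernel width (assumed)
d2=@(a,b) repmat(sum(a.^2,2),1,size(b,1)) ...
         +repmat(sum(b.^2,2)',size(a,1),1)-2*a*b';
Kn=exp(-d2(xn,xn)/(2*h^2));         % kernels: normal samples vs. centers (=xn)
Kt=exp(-d2(xt,xn)/(2*h^2));         % kernels: test samples vs. centers
a=ones(np,1)/np;                    % initial mixing coefficients
for iter=1:1000
  a=a+0.001*(Kn'*(1./(Kn*a)))/np;   % gradient ascent on mean log w over xn
  a=max(a,0);                       % keep the ratio model nonnegative
  a=a/mean(Kt*a);                   % enforce mean of w over test samples = 1
end
w=Kt*a;                             % estimated p'(x)/p(x) at the test samples
% small w flags test samples that are unlikely under the normal density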