1999.VLDB.Finding intensional knowledge of distance-based outliers

paper

pdf

main idea

intensional knowledge:
a description or an explanation of why an identified outlier is exceptional.
two main issues:
what kinds of intensional knowledge to provide;
how to optimize the computation of such knowledge.

contribution

1. Define two notions of outliers, strongest outliers and weak outliers, together with the corresponding intensional knowledge.
2. Develop a naive and a semi-naive algorithm for computing the strongest and weak outliers and their intensional knowledge.
3. Optimize the computation through effective sharing of I/O, and evaluate the algorithms experimentally.

method

other’s citation

citation 1

The authors identify outliers in subspaces of the feature space using a distance-based anomaly detection method. This serves as an explanation since the identified anomalies are outliers in the specific subspaces found, meaning that the features constituting the subspace are those that best discriminate the instance. The authors introduce the notions of strongest, weak and trivial outliers. An outlier is non-trivial in a subspace A if it is not an outlier in any subspace included in A. A strongest outlier is an outlier in a strongest outlying feature space (if no outlier exists in any subspace included in A, then A is a strongest feature space). A weak outlier is a non-trivial outlier that is not a strongest outlier. Algorithms are provided to identify (and thus explain) strong and weak outliers. This anomaly explanation method is model-specific because it is designed for distance-based methods. It is also local because it helps explain one outlier at a time. [1]
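The definitions in this citation can be sketched as a brute-force check over all proper subspaces. This is only an illustrative sketch, not the naive or semi-naive algorithm from the paper. It assumes the DB(p, D) notion of a distance-based outlier (a point is an outlier if at least a fraction p of the points lie farther than D from it) and Euclidean distance; the function names `db_outliers` and `classify` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def db_outliers(points, dims, p, D):
    """Indices of points that are DB(p, D)-outliers using only attributes `dims`."""
    proj = [tuple(pt[d] for d in dims) for pt in points]
    threshold = p * len(proj)
    found = set()
    for i, a in enumerate(proj):
        # Count the points farther than D from point i (itself is at distance 0).
        far = sum(1 for b in proj if dist(a, b) > D)
        if far >= threshold:
            found.add(i)
    return found


def classify(points, dims, p, D):
    """Label each outlier of subspace `dims` as strongest, weak, or trivial."""
    here = db_outliers(points, dims, p, D)
    below = set()  # outliers found in any proper, non-empty subspace of `dims`
    for k in range(1, len(dims)):
        for sub in combinations(dims, k):
            below |= db_outliers(points, sub, p, D)
    labels = {}
    for i in sorted(here):
        if i in below:
            labels[i] = "trivial"    # already an outlier in a smaller subspace
        elif below:
            labels[i] = "weak"       # non-trivial, but a smaller subspace has outliers
        else:
            labels[i] = "strongest"  # no outlier exists in any smaller subspace
    return labels


# Hypothetical toy data: two tight clusters plus one point that only stands out in 2D.
cluster_lo = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
cluster_hi = [(10, 10), (10, 9), (9, 10), (9, 9), (9.5, 9.5)]
data = cluster_lo + cluster_hi + [(0, 10)]

print(classify(data, (0, 1), p=0.9, D=5.0))
print(classify(data + [(50, 50)], (0, 1), p=0.9, D=5.0))
```

In the first call the point (0, 10) is a strongest outlier, because neither single attribute contains an outlier. Adding (50, 50), which is already an outlier on each attribute alone, makes that new point trivial in the 2D space and downgrades (0, 10) to a weak outlier.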

citation 2

For example, Knorr and Ng [50] define the outlier categories C = {“trivial outlier,” “weak outlier,” “strongest outlier”} to help gain better insights about the nature of outliers. They define an anomalous data point o as the “strongest outlier” in a subspace A if it meets two criteria: (i) o is not an outlier in any subspace B ⊂ A, and (ii) no outlier exists in any subspace B ⊂ A. If o does not satisfy the criteria in (i), then it is a “weak outlier.” If it does not fit the two criteria, then it is a “trivial outlier.” The terms “trivial outlier,” “weak outlier,” and “strongest outlier” are used to separate noise from meaningful abnormal data [2].

Figure 3 shows an illustration of strongest, weak, and trivial outliers in the 3D space {A, B, C}. P1 and P5 are non-trivial outliers in the subspace AB because they are not outliers in subspace A or subspace B. They are also the strongest outliers in AB because there is no other anomalous point in the subspace A or B. P20 is a weak outlier in the subspace AC because there is another outlier point P11 in the subspace C. P11 is a trivial outlier in the subspace AC because it is also an outlier in the subspace C.[3]
[Figure 3: strongest, weak, and trivial outliers in the 3D space {A, B, C}]
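The Figure 3 example can be replayed with a few lines of code. The sketch below takes the per-subspace outlier sets exactly as the text describes them (no outliers in A or B, P11 in C, P1 and P5 in AB, P20 and P11 in AC) and applies the strongest/weak/trivial rules. The name `categorize` and the single-letter encoding of subspaces are assumptions made for illustration, and subspaces missing from the table are assumed to contain no outliers.

```python
from itertools import combinations

# Outlier sets per subspace, as described for Figure 3.
fig3 = {
    "A": set(),
    "B": set(),
    "C": {"P11"},
    "AB": {"P1", "P5"},
    "AC": {"P20", "P11"},
}


def categorize(outliers, space):
    """Categorize the outliers of `space` given the outlier sets of all subspaces."""
    # All proper, non-empty subspaces of `space`, e.g. "AC" -> ["A", "C"].
    subs = ["".join(c) for k in range(1, len(space)) for c in combinations(space, k)]
    # Assumption: a subspace that is not listed has no outliers.
    below = set().union(*(outliers.get(s, set()) for s in subs))
    result = {}
    for point in outliers[space]:
        if point in below:
            result[point] = "trivial"
        elif below:
            result[point] = "weak"
        else:
            result[point] = "strongest"
    return result


print(categorize(fig3, "AB"))
print(categorize(fig3, "AC"))
```

The results match the description of the figure: P1 and P5 are strongest outliers in AB, while in AC the point P20 is weak and P11 is trivial.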

definitions


experiment

reference

[1] 2022.DKE.Anomaly explanation: A review, Section 3.1.1 "Non-weighted feature importance"
[2] 2022.VLDB.A survey on outlier explanations, Section 2.1.2 "Categorical ranking of outliers"
[3] 2022.VLDB.A survey on outlier explanations, Section 5.2 "Techniques to find categorical rankings of outliers"
