2019.ECML PKDD.Beyond outlier detection: lookout for pictorial explanation

shaoyue1234

Last modified 2024-10-04 21:39:35

Tags: anomaly detection

First published 2024-09-05 14:48:51

Article link: https://blog.csdn.net/shaoyue1234/article/details/141823748

paper
main idea
- example
contribution
method
experiment
reference

paper

pdf
code

main idea

LookOut uses a set of focus-plots. Each focus-plot “blames” or “explains away” a subset of the input outliers, namely those whose outlierness is best showcased by that plot's pair of features.

example

[Figure not preserved]

contribution

The proposed LOOKOUT makes four contributions:
(a) problem formulation: we introduce an “analyst-centered” problem formulation for explaining outliers via focus-plots,
(b) explanation algorithm: we propose a plot-selection objective and the LOOKOUT algorithm to approximate it with optimality guarantees,
(c) generality: our explanation algorithm is both domain- and detector-agnostic,
(d) scalability: LOOKOUT scales linearly with the size of input outliers to explain and the explanation budget.

method

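Problem definition

A focus-plot is two-dimensional, with each axis corresponding to one feature, so every focus-plot corresponds to one feature pair. For data with d features, the b best feature pairs must be chosen from $l=\frac{d(d-1)}{2}$ possible pairs.
[Figure not preserved]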
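The size of the candidate set can be checked by enumerating the pairs directly. This is a small illustrative sketch: the helper name `feature_pairs` and the feature names are hypothetical, chosen only for the example.

```python
from itertools import combinations

def feature_pairs(feature_names):
    """Return every unordered pair of distinct features (one candidate focus-plot each)."""
    return list(combinations(feature_names, 2))

# Hypothetical feature names, used only for illustration.
features = ["model", "unitprice", "unitweight", "quantity"]
pairs = feature_pairs(features)

d = len(features)
# The number of candidate plots matches l = d(d-1)/2.
assert len(pairs) == d * (d - 1) // 2
```

Because the candidate set grows quadratically in d, a budget b that is much smaller than l is what keeps the explanation readable.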

Procedure

1. Scoring by feature pairs: every outlier is scored in every feature pair, so each outlier receives |P| = l = d(d-1)/2 scores. If the outliers were detected by a “black-box” detector, the same detector is reused; otherwise any off-the-shelf detector can be used.
2. Plot selection: b focus-plots are selected one at a time according to marginal gain, satisfying a condition that was given as a formula image and is not preserved here.
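Step 1 can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: `zscore_distance` is a deliberately simple stand-in for whatever detector is actually used, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def zscore_distance(points_2d, query):
    """Stand-in 2-D detector: distance of `query` from the data mean, in standard deviations."""
    total = 0.0
    for axis in range(2):
        values = [p[axis] for p in points_2d]
        mu, sigma = mean(values), pstdev(values)
        z = (query[axis] - mu) / sigma if sigma > 0 else 0.0
        total += z * z
    return sqrt(total)

def score_by_feature_pairs(data, outlier_rows, detector=zscore_distance):
    """Return {(j, k): [score of each outlier in the plot of features j and k]}."""
    d = len(data[0])
    scores = {}
    for j, k in combinations(range(d), 2):
        # Project every row onto the two features of this candidate plot.
        projected = [(row[j], row[k]) for row in data]
        scores[(j, k)] = [detector(projected, projected[i]) for i in outlier_rows]
    return scores

# Toy data (hypothetical): the last row is unusual in feature 0 only.
data = [
    [1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0],
    [0.9, 1.1, 1.1],
    [1.0, 1.0, 0.9],
    [5.0, 1.0, 1.0],
]
scores = score_by_feature_pairs(data, outlier_rows=[4])
```

In this toy data the outlier scores high only in plots that include feature 0, which is the kind of signal the plot-selection step relies on.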

Example

[Figure not preserved]

Algorithm

[Figure not preserved]
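The greedy, marginal-gain selection can be sketched as follows. This is a sketch rather than the authors' code: it assumes the objective is the sum, over outliers, of each outlier's best score among the selected plots (the original formula and pseudocode were images that did not survive), and it assumes non-negative scores. The score table and the names `objective` and `greedy_select` are hypothetical.

```python
def objective(scores, selected):
    """Assumed objective: sum over outliers of the best score among the selected plots."""
    if not selected:
        return 0.0
    n_outliers = len(next(iter(scores.values())))
    return sum(max(scores[p][i] for p in selected) for i in range(n_outliers))

def greedy_select(scores, budget):
    """Pick up to `budget` plots, each time adding the plot with the largest marginal gain."""
    selected = []
    remaining = set(scores)
    while remaining and len(selected) < budget:
        current = objective(scores, selected)
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda p: objective(scores, selected + [p]) - current)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical scores: one list entry per outlier, one key per candidate plot.
scores = {
    ("model", "unitprice"): [0.9, 0.8, 0.1],
    ("unitweight", "unitprice"): [0.2, 0.3, 0.9],
    ("model", "unitweight"): [0.5, 0.5, 0.5],
}
chosen = greedy_select(scores, budget=2)
```

With these made-up scores, the first pick is the plot with the largest total score, and the second pick is the plot that most improves the outlier the first plot explained poorly. An objective of this max-then-sum form is monotone and submodular, which is the setting in which greedy selection has the standard 1 − 1/e approximation guarantee.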

citation

LookOut uses focus plots to explain a group of outliers. Focus plots are 2-dimensional feature plots. The explanation algorithm tries to find the set of feature pairs that best discriminates the outliers in the group. All possible pairwise plots are generated and, for each pair of features, the outlier scores of the data points in the group are computed using only the two features in the pair. The pair that gives the highest anomaly score is kept. Some heuristics are used to limit the search in the feature space. The method, named LookOut, is model-agnostic. Outliers can be diverse, and trying to explain a set of random outliers using LookOut is not efficient, as the algorithm will try to make a compromise between the outliers to produce the final focus plots. The final plots may therefore not include the best focus plot for each outlier individually. For example, the best focus plot for outlier 2 is (model, unitprice) and the best focus plot for outlier 3 is (unitweight, unitprice). If we want to explain these two outliers using LookOut, the method may select the first focus plot, which is not optimal for outlier 3. [1]
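The compromise described above can be illustrated with a budget of one plot. The numbers below are invented for illustration, and ranking a shared plot by total score is an assumption about how the compromise is made, not a statement of the exact LookOut objective.

```python
# Hypothetical per-plot scores for the two outliers from the example.
scores = {
    ("model", "unitprice"): {"outlier 2": 0.9, "outlier 3": 0.4},
    ("unitweight", "unitprice"): {"outlier 2": 0.3, "outlier 3": 0.8},
}

def best_plot_for(outlier):
    """Plot in which this single outlier scores highest."""
    return max(scores, key=lambda plot: scores[plot][outlier])

def best_shared_plot():
    """Single plot with the highest total score over both outliers (assumed criterion)."""
    return max(scores, key=lambda plot: sum(scores[plot].values()))

shared = best_shared_plot()
# The shared choice matches outlier 2's best plot but not outlier 3's.
mismatch = [o for o in ("outlier 2", "outlier 3") if best_plot_for(o) != shared]
```

Here the single shared plot is (model, unitprice), so outlier 3 is shown in a plot other than its individually best one, which is the limitation the cited passage points out.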

experiment

Our experiments were designed to answer the following questions:
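[Q1] Quality of Explanation: How well can LOOKOUT “explain” or “blame” the given outliers?
The evaluation uses a metric the authors propose themselves, based on F(S), the objective optimized during plot selection above.
[Figure not preserved]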

[Q2] Scalability: How does LOOKOUT scale with the input graph size and the number of outliers?
[Q3] Discoveries: Does LOOKOUT lead to interesting and intuitive explanations on real world data?