2019.ECML PKDD.Beyond outlier detection: lookout for pictorial explanation
paper
main idea
Use a set of focus-plots, each of which “blames” or “explains away” a subset of the input outliers: the subset whose outlierness is best showcased by that plot's pair of features.
example
contribution
The proposed LOOKOUT makes four contributions:
(a) problem formulation: we introduce an “analyst-centered” problem formulation for explaining outliers via focus-plots,
(b) explanation algorithm: we propose a plot-selection objective and the LOOKOUT algorithm to approximate it with optimality guarantees,
(c) generality: our explanation algorithm is both domain- and detector-agnostic, and
(d) scalability: LOOKOUT scales linearly with the size of input outliers to explain and the explanation budget.
method
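problem definition
A focus-plot is two-dimensional, with each axis corresponding to one feature, so every focus-plot corresponds to one feature pair. For data with d features, the task is to select the b best feature pairs out of the $l=\frac{d(d-1)}{2}$ distinct pairs.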
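As a quick check on the count above, the candidate pairs can be enumerated directly. This is only a minimal sketch: the feature list is hypothetical (the name quantity is invented for illustration), and the notes do not say how the authors enumerate pairs.

```python
from itertools import combinations

# Hypothetical feature names, for illustration only.
features = ["model", "unitprice", "unitweight", "quantity"]
d = len(features)

# Every unordered pair of distinct features is one candidate focus-plot.
pairs = list(combinations(features, 2))

# The number of candidates matches l = d(d-1)/2.
assert len(pairs) == d * (d - 1) // 2
print(len(pairs))  # prints 6
```

Because the candidate set grows quadratically in d, the budget b is what keeps the number of plots shown to the analyst small.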
procedure
1. Scoring by feature pairs: each outlier receives |P| = l = d(d-1)/2 scores, one per feature pair. If the outliers were detected by a “black-box” detector, reuse that same detector; otherwise use any off-the-shelf detector.
2. Plot selection: select b focus-plots one at a time, each time adding the plot with the largest marginal gain in the plot-selection objective.
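The two steps above can be sketched in plain Python. Several things here are assumptions rather than details taken from these notes: the detector is a simple stand-in (standardized distance from the centroid of each 2-D subspace), the objective is assumed to credit each outlier with its best score among the selected plots, and the function names pair_scores, objective and select_plots are hypothetical. The greedy loop recomputes the objective naively, so it illustrates the selection rule and not the linear scaling claimed for LOOKOUT.

```python
from itertools import combinations
import math

def pair_scores(points, outlier_idx, d):
    """Step 1: score each outlier in every 2-D feature subspace.

    Stand-in detector (an assumption): distance from the subspace
    centroid, measured in per-feature standard deviations.
    """
    n = len(points)
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    std = [math.sqrt(sum((p[j] - mean[j]) ** 2 for p in points) / n) or 1.0
           for j in range(d)]
    scores = {}
    for a, b in combinations(range(d), 2):
        scores[(a, b)] = [
            math.hypot((points[i][a] - mean[a]) / std[a],
                       (points[i][b] - mean[b]) / std[b])
            for i in outlier_idx
        ]
    return scores

def objective(scores, selected, n_outliers):
    """Assumed objective: each outlier counts its best score over the selected plots."""
    return sum(max((scores[p][i] for p in selected), default=0.0)
               for i in range(n_outliers))

def select_plots(scores, budget, n_outliers):
    """Step 2: greedily add the feature pair with the largest marginal gain."""
    selected = []
    for _ in range(min(budget, len(scores))):
        base = objective(scores, selected, n_outliers)
        best = max((p for p in scores if p not in selected),
                   key=lambda p: objective(scores, selected + [p], n_outliers) - base)
        selected.append(best)
    return selected

# Toy data: six inliers plus two outliers that deviate in different features.
points = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0),
          (9, 9, 0),   # extreme in features 0 and 1
          (0, 1, 9)]   # extreme in feature 2
outlier_idx = [6, 7]
demo_scores = pair_scores(points, outlier_idx, d=3)
print(select_plots(demo_scores, budget=2, n_outliers=len(outlier_idx)))
```

Swapping in another detector only changes pair_scores; the selection step works on the score table alone, which is the sense in which the procedure is detector-agnostic.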
example
algorithm
citation
Focus plots are used to explain a group of outliers. A focus plot is a 2-dimensional plot over a pair of features. The explanation algorithm looks for the set of feature pairs that best discriminates the outliers in the group. All possible pairwise plots are generated and, for each pair of features, the outlier scores of the data points in the group are computed using only the two features in that pair. The pair that gives the highest anomaly score is kept. Some heuristics are used to limit the search in the feature space. This method, named LookOut, is model-agnostic.
Outliers can be diverse, however, and explaining a set of random outliers with LookOut is not efficient, because the algorithm compromises between the outliers when producing the final focus plots. The selected plots may therefore not include the best focus plot for each individual outlier. For example, suppose the best focus plot for outlier 2 is (model, unitprice) and the best focus plot for outlier 3 is (unitweight, unitprice). If these two outliers are explained together, LookOut may select the first focus plot, which is not optimal for outlier 3. [1]
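The compromise described above can be reproduced with a tiny score table. The numbers are invented purely for illustration, and the sum-of-scores rule for a single shared plot is an assumption about how the compromise is made; only the outlier and feature names come from the quoted example.

```python
# Hypothetical per-plot scores for the two outliers (values are made up).
scores = {
    ("model", "unitprice"):      {"outlier 2": 0.9, "outlier 3": 0.5},
    ("unitweight", "unitprice"): {"outlier 2": 0.3, "outlier 3": 0.8},
}

# Best focus plot for each outlier when it is explained on its own.
best_individual = {
    o: max(scores, key=lambda p: scores[p][o])
    for o in ("outlier 2", "outlier 3")
}

# With a budget of one plot for the whole group, keep the pair whose
# total score over both outliers is largest (assumed criterion).
shared = max(scores, key=lambda p: sum(scores[p].values()))

print(best_individual)
print(shared)
```

In this toy table the shared plot coincides with outlier 2's best plot, so outlier 3 is shown in a plot where it scores lower than it would in its own best plot.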
experiment
Our experiments were designed to answer the following questions:
[Q1] Quality of Explanation: How well can LOOKOUT “explain” or “blame” the given outliers?
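The evaluation uses a metric proposed by the authors themselves: it is based on F(s), the same optimization objective used for plot selection earlier.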
[Q2] Scalability: How does LOOKOUT scale with the input graph size and the number of outliers?
[Q3] Discoveries: Does LOOKOUT lead to interesting and intuitive explanations on real world data?
reference
[1] 2022.DKE.Anomaly explanation: A review. Section 3.1.1, Non-weighted feature importance