(2019) Outlier Detection in Graphs: A Study on the Impact of Multiple Graph Models (paper notes)

1. A single graph representation derived from a given dataset.
2. Multiple graph models representing the same dataset.

The classical approach for detecting outliers in a dataset is to model the data as a single graph and to apply a single outlier detection method, as sketched in Figure 1(a).
By using this approach, the identification of outliers is biased by the given model and the selected algorithm.

Alternatively, one could use an ensemble approach to apply a set of complementary outlier detection methods on a single graph and combine their results, such that the algorithm bias is reduced. This approach is sketched in Figure 1(b).

Existing work on outlier detection in graphs follows the methodologies in Figures 1(a) and 1(b). As a consequence, the built-in bias from the graph model selection is not addressed.

Here we propose a new methodology that reduces the graph model bias in outlier detection by generating multiple graph models to represent the same data.

The overall workflow for an ensemble method combining outlier detection results from multiple graphs is depicted in Figure 1(c).
First, multiple graph models represent the same dataset, possibly taking different aspects of the dataset into account for deriving different graph models. We assume, though, that the nodes in different graphs represent the same entities. Only their relations change from model to model.

Next, an algorithm for detecting (node) outliers in graphs is applied to each graph model.

In the last step, results from the outlier detection on the different graph representations are combined.

Through the ensemble of different graphs modeling the same data, we can expect increased precision and robustness of the outlier detection.
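The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the node names, the two edge lists, and the degree-based score are invented stand-ins, not the detection algorithms or data used in the paper. The only property taken from the text is that every graph model shares the same node set while the edges differ.

```python
from statistics import mean

def degree_outlier_scores(nodes, edges):
    """Toy node-outlier score (assumption): fewer neighbours -> higher score in [0, 1]."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_deg = max(degree.values()) or 1  # avoid division by zero on an empty graph
    return {n: 1.0 - degree[n] / max_deg for n in nodes}

def combine_mean(score_dicts):
    """Combine per-graph scores by averaging them node by node."""
    nodes = score_dicts[0].keys()
    return {n: mean(s[n] for s in score_dicts) for n in nodes}

# Step 1: several graph models over the SAME entities (hypothetical data).
nodes = ["a", "b", "c", "d"]
graph_models = {
    "model_1": [("a", "b"), ("b", "c"), ("a", "c")],
    "model_2": [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
}

# Step 2: run a node-outlier detector on each graph model.
per_graph = [degree_outlier_scores(nodes, edges) for edges in graph_models.values()]

# Step 3: combine the per-graph results into one ensemble score per node.
ensemble = combine_mean(per_graph)
ranking = sorted(nodes, key=lambda n: ensemble[n], reverse=True)
```

In this toy setup node `d` ends up at the top of `ranking`, because it is isolated in one model and only weakly connected in the other. Any real detector that returns one score per node could replace `degree_outlier_scores` without changing the rest of the pipeline.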

Conclusion

Outlier detection is a subjective and unsupervised task that demands good knowledge and understanding of the data.

Using a single graph model of relation-rich datasets may only model some aspects of the data, thus not making proper use of potential information.

Using multiple graph models may capture more and complementary information.

We therefore suggest, based on our findings, to explore real world data using multiple graph models that are as complementary as possible.

In a practical application, a data analyst is interested in certain entities that lend themselves as a set of nodes in a graph representation while several attributes or inter-relational connections may be represented as edges between nodes. Instead of looking for the one and only, best-ever graph representation of some given raw data, the data analyst should therefore generate multiple graph models describing different aspects of the raw data, capturing a large variety of characteristics, or putting different emphasis on certain characteristics. That is, the graphs may differ both quantitatively (how dense they are) and qualitatively (which relationships are expressed in the graph structure).
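As a rough sketch of what "differing qualitatively and quantitatively" can look like in practice, the snippet below derives several graph models from one small table of records. Everything here is hypothetical: the records, the attribute names (`city`, `team`, `age`), and both construction rules are assumptions made for illustration, not the graph models from the paper.

```python
from itertools import combinations

# Hypothetical raw data: one record per entity (node).
records = {
    "p1": {"city": "Porto", "team": "A", "age": 30},
    "p2": {"city": "Porto", "team": "B", "age": 31},
    "p3": {"city": "Lisbon", "team": "A", "age": 45},
    "p4": {"city": "Lisbon", "team": "B", "age": 29},
}

def shared_attribute_graph(records, attribute):
    """Qualitative choice: connect two nodes if they share the given attribute value."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(records), 2)
        if records[u][attribute] == records[v][attribute]
    }

def age_threshold_graph(records, max_gap):
    """Quantitative choice: a larger max_gap yields a denser graph."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(records), 2)
        if abs(records[u]["age"] - records[v]["age"]) <= max_gap
    }

def density(num_nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    return 2 * len(edges) / (num_nodes * (num_nodes - 1))

# Same node set in every model; only the relations change.
graph_models = {
    "same_city": shared_attribute_graph(records, "city"),
    "same_team": shared_attribute_graph(records, "team"),
    "age_gap_2": age_threshold_graph(records, 2),
    "age_gap_20": age_threshold_graph(records, 20),
}
densities = {name: density(len(records), e) for name, e in graph_models.items()}
```

Here `same_city` and `same_team` have the same density but share no edges, so they express different relationships, while `age_gap_2` and `age_gap_20` express the same relationship at different densities.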

These multiple graph models aim to materialize the various perspectives that the analyst wants to highlight, that is, they should cover the problem scenario as well as possible and in as many different ways as suitable.

Clearly, many questions remain open. We focused in this study purely on the aspect of the impact of multiple graph models for a given dataset.

We evaluated this impact using two different outlier detection algorithms, four combination functions, and two similarity measures on synthetic and real world data.
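These notes do not name the four combination functions, so the sketch below uses four common choices (mean, median, max, min) purely as an assumption, to show how the choice of combiner changes the final score. The per-graph scores are made-up values, assumed to be already normalized to [0, 1].

```python
from statistics import mean, median

# Assumed set of combiners; not necessarily the four evaluated in the paper.
COMBINERS = {"mean": mean, "median": median, "max": max, "min": min}

def combine(per_graph_scores, how):
    """Combine one score dict per graph model into a single score per node."""
    fn = COMBINERS[how]
    nodes = per_graph_scores[0].keys()
    return {n: fn([scores[n] for scores in per_graph_scores]) for n in nodes}

# Hypothetical outlier scores for nodes a, b, c on three graph models.
per_graph_scores = [
    {"a": 0.1, "b": 0.9, "c": 0.2},
    {"a": 0.2, "b": 0.3, "c": 0.1},
    {"a": 0.0, "b": 0.6, "c": 0.9},
]

results = {how: combine(per_graph_scores, how) for how in COMBINERS}
```

With these values, `max` gives node `c` the same score as node `b` because `c` looks outlying in a single graph, whereas `median` and `min` keep `c` low. In general, a max-style combiner flags a node that stands out in any one model, while averaging favors nodes that stand out consistently.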

For a practical application, various aspects will have a strong influence on the achievable quality, for example the algorithm used to detect outliers on the individual graphs and the method used to combine the individual results (as we have seen in this evaluation).
However, based on our study we can maintain the recommendation to consider several different graph representations in any case.
