【2019/IJCAI】AddGraph: Anomaly Detection in Dynamic Graph Using Attention-based Temporal GCN

最新推荐文章于 2024-03-28 09:38:58 发布

梦家

最新推荐文章于 2024-03-28 09:38:58 发布

阅读量909

点赞数 1

分类专栏： paper reading 文章标签：异常检测动态图推荐系统

本文链接：https://blog.csdn.net/DreamHome_S/article/details/111246463

版权

paper reading 专栏收录该内容

5 篇文章 0 订阅

订阅专栏

文章链接：https://www.ijcai.org/Proceedings/2019/0614.pdf
源码链接：未公布

TL;DR

论文结合GNN提出了动态图中半监督的边异常检测模型 AddGraph，同时考虑了节点的结构，属性和时序特征。对于标签数据不足的问题，在训练过程中采用了 negative sampling 和 margin loss 两个技巧。在两个真实数据集的实验中取得了较好的效果。

Problem Definition

论文中的方法主要用于推荐系统中的异常操作检测，举个例子：异常的用户想自己的物品被推荐，那怎么办，那就找几个人疯狂点自己的东西的同时点那些本身就已经是popular的item，这样就会让自己的物品和popular的有相似的特征等，就会增加这些物品的被推荐的rank。这同时会产生很多新的edge，这些都是异常的edge。如下图所示：

异常模式特征：当异常的用户频繁执行操作时，用户和物品间的边会组成一个 dense subgraph，这种异常的特征模式就需要被检测出来；

Algorithm / Model

AddGraph 的整体架构如下图所示：

主要分为三部分：

GCN for content and structural features.
GRU with attention to combine short-term and long-term states.
Anomalous score computation for edges.

GCN

给定 $t$ 时刻图流中的一个 snapshot $\mathcal{G}^{t}=\left(\mathcal{V}^{t}, \mathcal{E}^{t}\right)$ 、对应的邻接矩阵 $A^t$ 和 $t - 1$ 时刻的节点隐含状态矩阵 $\mathbf{H}^{t-1} \in \mathbb{R}^{n \times d}$ ，那么当前时刻的节点状态矩阵可以通过 GCN 来更新，
$\text { Current }^{t}=\mathrm{GCN}_{L}\left(\mathbf{H}^{t-1}\right)$
以上一时刻的节点特征作为当前时刻节点模式的输入，融合了 long-term 特征；对于一个 $L$ 层的GCN，卷积过程如下所示：
$\begin{aligned} \mathbf{Z}^{(0)} &=\mathbf{H}^{t-1} \\ \mathbf{Z}^{(l)} &=\operatorname{Re} L U\left(\hat{\mathbf{A}}^{t} \mathbf{Z}^{(l-1)} \mathbf{W}^{(l-1)}\right) \\ \text { Current }^{t} &=\operatorname{Re} L U\left(\hat{\mathbf{A}}^{t} \mathbf{Z}^{(L-1)} \mathbf{W}^{(L-1)}\right) \end{aligned}$

GRU

为了考虑节点 short-term 的特征，采用了基于上下文注意力模型，以当前的某个时间窗口来编码节点特征，过程如下：
$\begin{array}{rlrl} \mathbf{C}_{h, i}^{t} & =\left[\mathbf{h}_{i}^{t-\omega} ; \ldots ; \mathbf{h}_{i}^{t-1}\right] & \mathbf{C}_{h, i}^{t} \in \mathbb{R}^{\omega \times d} \\ \mathbf{e}_{h, i}^{t} & =\mathbf{r}^{T} \tanh \left(\mathbf{Q}_{h}\left(\mathbf{C}_{h, i}^{t}\right)^{T}\right) & \mathbf{e}_{h, i}^{t} \in \mathbb{R}^{\omega} \\ \mathbf{a}_{h, i}^{t} & =\operatorname{softmax}\left(\mathbf{e}_{h, i}^{t}\right) & \mathbf{a}_{h, i}^{t} \in \mathbb{R}^{\omega} \\ \mathbf{s h o r t}_{i}^{t} & =\left(\mathbf{a}_{h, i} \mathbf{C}_{h, i}^{t}\right)^{T} & \text { short }_{i}^{t} \in \mathbb{R}^{d} \end{array}$
其中 $h_i^t$ 表示节点 $h_i$ 在 $t$ 时刻的特征， $\omega$ 表示时间窗口的大小， $\mathbf{Q}_{h}$ 和 $\mathbf{r}$ 是模型需优化的参数；对于所有节点表示成矩阵的计算公司如下：
$\text { Short }^{t}=\mathrm{CAB}\left(\mathbf{H}^{t-\omega} ; \ldots ; \mathbf{H}^{t-1}\right)$
目前我们得到了包含 long-term 信息的节点特征 $\text { Current }^{t}$ 和 short-term 信息的节点特征 $\text { Short }^{t}$ ，为了考虑时序特征在模型中使用 GRU 来处理这两种特征：
$\mathbf{H}^{t}=\text { GRU(Current }^{t}, \text { Short } \left.^{t}\right)$
具体的前向传播计算公式如下所示：
$\begin{aligned} \mathbf{P}^{t} &=\sigma\left(\mathbf{U}_{P} \text { Current }^{t}+\mathbf{W}_{P} \text { Short }^{t}+\mathbf{b}_{P}\right) \\ \mathbf{R}^{t} &=\sigma\left(\mathbf{U}_{R} \text { Current }^{t}+\mathbf{W}_{R} \text { Short }^{t}+\mathbf{b}_{R}\right) \\ \tilde{\mathbf{H}}^{t} &=\tanh \left(\mathbf{U}_{c} \mathbf{C u r r e n t}^{t}+\mathbf{W}_{c}\left(\mathbf{R}^{t} \odot \text { Short }^{t}\right)\right) \\ \mathbf{H}^{t} &=\left(1-\mathbf{P}^{t}\right) \odot \text { Short }^{t}+\mathbf{P}^{t} \odot \tilde{\mathbf{H}}^{t} \end{aligned}$
经过上述特征融合的步骤之后，得到的节点特征 $\mathbf{H}^{t}$ 就包含了结构、属性、时序信息；

Anomalous score computation

在编码节点特征 $\mathbf{H}^{t}$ 之后，图中每条边 $\in \mathcal{E}^{t}$ 的异常概率通过👇的公式进行计算：
$\cdot \sigma\left(\beta \cdot\left(\left\|\mathbf{a} \odot \mathbf{h}_{i}+\mathbf{b} \odot \mathbf{h}_{j}\right\|_{2}^{2}-\mu\right)\right)$
其中 $\sigma(x)=\frac{1}{1+e^{x}}$ 是sigmoid函数， $\mathbf{a}$ 和 $\mathbf{b}$ 是输出层的参数， $\beta$ 和 $\mu$ 是超参数；

loss function

为了去解决anomaly data不足的情况，我们建立一个模型去描述normal的分布情况，假设训练样本中全部的edge都是normal的，对应每个normal的edge我们产生一个负样本当作anomaly的edge （替换边中的source或者target）。用一个伯努利分布，重新计算了normal边对应两边node的degree，如果这个degree越大则表示越可能是异常的，越小则是正常的；

marginbased pairwise loss
度量损失：异常边的评分足够大，正常边的评分足够小；
$\begin{aligned} \mathcal{L}^{t}=\min & \sum_{(i, j, w) \in \mathcal{E}^{t}\left(i^{\prime}, j^{\prime}, w\right) \notin \mathcal{E}^{t}} \\ & \max \left\{0, \gamma+f(i, j, w)-f\left(i^{\prime}, j^{\prime}, w\right)\right\} \end{aligned}$
L2 regularization loss
正则损失防止过拟合；
$\begin{aligned} \mathcal{L}_{r e g} &=\sum\left(\left\|\mathbf{W}_{1}\right\|_{2}^{2}+\left\|\mathbf{W}_{2}\right\|_{2}^{2}+\left\|\mathbf{Q}_{h}\right\|_{2}^{2}+\|\mathbf{r}\|_{2}^{2}\right.\\ &+\left\|\mathbf{U}_{z}\right\|_{2}^{2}+\left\|\mathbf{W}_{z}\right\|_{2}^{2}+\left\|\mathbf{b}_{z}\right\|_{2}^{2}+\left\|\mathbf{U}_{r}\right\|_{2}^{2}+\left\|\mathbf{W}_{r}\right\|_{2}^{2} \\ &\left.+\left\|\mathbf{b}_{r}\right\|_{2}^{2}+\left\|\mathbf{U}_{c}\right\|_{2}^{2}+\left\|\mathbf{W}_{c}\right\|_{2}^{2}+\|\mathbf{a}\|_{2}^{2}+\|\mathbf{b}\|_{2}^{2}\right) \end{aligned}$
综上训练的loss 为：
$\mathcal{L}=\mathcal{L}^{t}+\lambda \mathcal{L}_{r e g}$