3. Mining top-k high utility itemsets with effective threshold raising

Effective threshold raising strategies

1. Problem the paper addresses:

It is difficult for a user to choose a suitable minimum utility threshold.

2. The authors' goals:

  • Raise the minimum utility threshold as quickly as possible to reduce the total number of candidates generated in the growth stages (that is, design better threshold raising strategies that significantly improve the performance of top-k HUI mining on dense datasets).
  • Be easily adaptable to the top-k sequential, streaming, and on-shelf variants of the problem.
  • Capture long contiguous sequences of items: the LIU structure aims to capture the utilities of long contiguous sequences of items in a compact form.

3. Problem definition:

High-utility itemset (HUI):
An itemset X is a high-utility itemset if its utility u(X) is no less than the minimum utility threshold.

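Top-k high utility itemset:
An itemset X is a top-k HUI if fewer than k itemsets have a utility greater than u(X).
Optimal minimum utility threshold value:
The smallest utility among the top-k HUIs, i.e. the highest threshold that still returns every top-k HUI.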

High-utility itemset mining (HUIM):
The HUI mining problem primarily involves determining the set of all itemsets that satisfy a user-specified minimum utility threshold value.
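
The definitions above can be made concrete with a small brute-force sketch. The toy database `DB` and the helper names (`utility`, `all_utilities`, `huis`, `top_k`, `optimal_threshold`) are assumptions made for illustration and do not come from the paper; each transaction is assumed to store, per item, the utility already multiplied out (quantity times unit profit).

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility in that transaction.
DB = [
    {"a": 5, "c": 1, "d": 2},
    {"a": 10, "c": 6, "e": 6},
    {"b": 8, "c": 3, "d": 6, "e": 3},
    {"a": 5, "b": 4, "c": 1, "e": 3},
]

def utility(itemset, db=DB):
    """u(X): utility of X summed over every transaction that contains all of X."""
    items = set(itemset)
    return sum(sum(t[i] for i in items) for t in db if items.issubset(t))

def all_utilities(db=DB):
    """Brute force: utility of every itemset that occurs in at least one transaction."""
    universe = sorted(set().union(*db))
    result = {}
    for size in range(1, len(universe) + 1):
        for x in combinations(universe, size):
            u = utility(x, db)
            if u > 0:
                result[x] = u
    return result

def huis(min_util, db=DB):
    """HUI mining: all itemsets whose utility is at least min_util."""
    return {x: u for x, u in all_utilities(db).items() if u >= min_util}

def top_k(k, db=DB):
    """The k itemsets with the highest utility (ties broken arbitrarily)."""
    return sorted(all_utilities(db).items(), key=lambda p: -p[1])[:k]

def optimal_threshold(k, db=DB):
    """Utility of the k-th best itemset: the threshold a top-k miner tries to reach."""
    return top_k(k, db)[-1][1]

if __name__ == "__main__":
    print(top_k(3))
    print(optimal_threshold(3))
```

Enumerating every itemset is exponential in the number of items, which is why practical algorithms start from a low threshold and raise it during the search instead.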

Two-phase method:
The potential top-k HUIs are mined in the first phase; in the second phase, the actual top-k HUIs are determined by computing the exact utility values and filtering out the false-positive top-k itemsets.
Examples: TKU (Wu et al., 2012) and REPT (Ryang & Yun, 2015).

One-phase method:
Directly mine the top-k HUIs in one phase, without generating intermediate candidates or potential top-k HUIs.

4. Method:

Set the minimum utility threshold to zero initially, then raise it with better threshold raising strategies that significantly improve the performance of top-k HUI mining on dense datasets.
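
The general idea of starting at zero and raising the threshold can be sketched with a bounded min-heap. This is a generic illustration, not the paper's code: the class `TopKThreshold` and its method `offer` are hypothetical names, and the sketch assumes every utility offered belongs to a distinct itemset.

```python
import heapq

class TopKThreshold:
    """Tracks the k largest utilities seen so far (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self.heap = []      # min-heap holding at most k utilities
        self.min_util = 0   # starts at zero and never decreases

    def offer(self, utility):
        """Record the exact utility of one distinct itemset; return the threshold."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, utility)
        elif utility > self.heap[0]:
            heapq.heapreplace(self.heap, utility)
        # Once k itemsets are known, the k-th best utility is a safe threshold:
        # no itemset below it can belong to the top-k result.
        if len(self.heap) == self.k:
            self.min_util = self.heap[0]
        return self.min_util

if __name__ == "__main__":
    tracker = TopKThreshold(k=3)
    for u in [31, 5, 28, 24, 22]:
        print(u, tracker.offer(u))
```

The earlier the large utilities are offered, the sooner the threshold becomes useful for pruning, which is the motivation for the dedicated raising strategies described in the next section.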

5. Key techniques

(1) Leaf Itemset Utility (LIU) structure

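First, an upper triangular matrix stores the utility values of contiguous itemsets. For example, entry (e, c) holds the utility of the itemset {e, c}, and entry (f, c) holds the utility of the itemset {f, d, a, e, c}, i.e. every item from f through c in the item order.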

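Then the Leaf Itemset Utility (LIU) structure is used to build an itemset tree in which each leaf node is the last item of a contiguous itemset.
[In the figure, all entries marked with squares denote contiguous sequences with c as the last item.
Oval entries denote contiguous sequences with e as the last item.
Likewise, double-rectangle and diamond entries denote contiguous sequences ending with a and d, respectively.
The structure can be visualized as the leaf-node level of an itemset tree for a given last item.]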
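
A minimal sketch of how such a matrix could be filled is shown below. The toy database, the item order `ORDER`, and the function names `build_liu` and `itemset_of` are assumptions made for illustration; the paper's own construction and item ordering may differ. Entry (first, last) accumulates the utility of the contiguous itemset that runs from `first` to `last` in `ORDER`.

```python
# Hypothetical toy database: item -> utility in that transaction.
DB = [
    {"a": 5, "c": 1, "d": 2},
    {"a": 10, "c": 6, "e": 6},
    {"b": 8, "c": 3, "d": 6, "e": 3},
    {"a": 5, "b": 4, "c": 1, "e": 3},
]
ORDER = ["b", "d", "a", "e", "c"]  # assumed processing order of the items

def build_liu(db, order):
    """Return {(first, last): utility} for contiguous runs of at least two items."""
    n = len(order)
    liu = {}
    for t in db:
        for i in range(n):
            if order[i] not in t:
                continue
            running = t[order[i]]
            for j in range(i + 1, n):
                if order[j] not in t:
                    break  # the run is no longer fully contained in t
                running += t[order[j]]
                key = (order[i], order[j])
                liu[key] = liu.get(key, 0) + running
    return liu

def itemset_of(entry, order):
    """Expand an entry such as ('a', 'c') into the contiguous itemset it stands for."""
    first, last = entry
    return order[order.index(first): order.index(last) + 1]

if __name__ == "__main__":
    for entry, u in build_liu(DB, ORDER).items():
        print(entry, itemset_of(entry, ORDER), u)
```

Only the upper triangle is ever populated because `first` always precedes `last` in the order; a dictionary is used here for brevity where a real implementation would likely use a triangular array.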

(2) threshold raising strategy

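① Entries of the matrix whose utility is not less than the current threshold are added to the priority queue PQ_LIU.
The top-k itemsets in the matrix are then used to raise the threshold.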
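
Step ① might look roughly like the sketch below. The `LIU` values are hypothetical, and the function name `raise_threshold_liu_e` is an illustrative choice rather than the paper's; only the queue name `pq_liu` mirrors the PQ_LIU mentioned above. Because every matrix entry is the exact utility of a distinct itemset, the k-th largest entry is a valid threshold.

```python
import heapq

# Hypothetical LIU matrix: (first item, last item) -> exact utility of that contiguous itemset.
LIU = {("d", "a"): 7, ("a", "e"): 24, ("a", "c"): 31, ("e", "c"): 22, ("b", "d"): 14}

def raise_threshold_liu_e(liu, k, min_util=0):
    """Scan the LIU entries, keep the k largest in a min-heap, raise min_util."""
    pq_liu = []
    for utility in liu.values():
        if utility < min_util:
            continue  # below the current threshold, cannot be in the top-k
        if len(pq_liu) < k:
            heapq.heappush(pq_liu, utility)
        elif utility > pq_liu[0]:
            heapq.heapreplace(pq_liu, utility)
        if len(pq_liu) == k:
            min_util = max(min_util, pq_liu[0])
    return min_util, sorted(pq_liu, reverse=True)

if __name__ == "__main__":
    print(raise_threshold_liu_e(LIU, k=3))
```

If the matrix holds fewer than k qualifying entries, the threshold is left unchanged, since fewer than k itemsets are known at that point.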

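② Cumulative utility property

③ Utility lower bound property
Properties ② and ③ use the utilities of contiguous itemsets stored in the LIU matrix to estimate lower bounds on the utility of other, related itemsets.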
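
The exact statements of properties ② and ③ were in figures that are not reproduced in these notes. As a hedged illustration of the idea only, the sketch below uses one bound that follows directly from the definitions when all utilities are non-negative: if Y is obtained from a contiguous itemset X by removing some items, every transaction containing X also contains Y, so u(Y) ≥ u(X) − Σ u(z) over the removed items z. The database and the names `utility` and `lower_bound` are hypothetical, and the paper's formulation may differ.

```python
# Hypothetical toy database: item -> utility in that transaction.
DB = [
    {"a": 5, "c": 1, "d": 2},
    {"a": 10, "c": 6, "e": 6},
    {"b": 8, "c": 3, "d": 6, "e": 3},
    {"a": 5, "b": 4, "c": 1, "e": 3},
]

def utility(itemset, db=DB):
    """u(X): utility of X summed over every transaction that contains all of X."""
    items = set(itemset)
    return sum(sum(t[i] for i in items) for t in db if items.issubset(t))

def lower_bound(contiguous, removed, db=DB):
    """Lower bound on u(contiguous minus removed), assuming non-negative utilities.

    In a real miner u(contiguous) would be read from the LIU matrix; it is
    recomputed here only to keep the sketch self-contained.
    """
    return utility(contiguous, db) - sum(utility({z}, db) for z in removed)

if __name__ == "__main__":
    x = ["a", "e", "c"]           # contiguous itemset, as stored in the matrix
    bound = lower_bound(x, ["e"])  # bound for {a, c} without scanning for it
    exact = utility(["a", "c"])
    print(bound, exact, bound <= exact)
```

A bound like this lets the threshold be raised using itemsets whose exact utility was never computed, at the cost of being looser than the true value.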

(3) utility lower bound estimation method

6. Innovations (contributions)

(1) A new method, THUI, which uses a novel Leaf Itemset Utility (LIU) data structure to store utility information of itemsets.
(2) A new threshold raising strategy (LIU-Exact utilities) that effectively raises the minimum utility threshold by leveraging the information stored in LIU.
(3) A novel utility lower bound estimation method (LIU-LB), which helps raise the minimum utility threshold significantly without computing the actual utility of long itemsets.
(4) A rigorous experimental evaluation against two state-of-the-art methods (KHMC and TKO) to demonstrate the usefulness of the proposed ideas.

7. Related work

KHMC (Duong et al., 2016) and TKO (Tseng et al., 2016) algorithms leverage a vertical utility list based data structure.
