Pattern Evaluation

南极光

Category: Data Mining. Tags: pattern evaluation

Article link: https://blog.csdn.net/rk2900/article/details/43867993

@(Pattern Discovery in Data Mining)
This article introduces methods for evaluating how sound the patterns and rules produced by pattern mining are.

Limitation of Support-Confidence Framework

Pattern-mining will generate a large set of patterns/rules. However, not all the generated patterns/rules are interesting.

Interestingness measures: objective vs. subjective
* Objective interestingness measures
  * Support, confidence, correlation, …
* Subjective interestingness measures: one man’s trash could be another man’s treasure
  * Query-based: relevant to a user’s particular request
  * Against one’s knowledge base: unexpectedness, freshness, timeliness
  * Visualization tools: multi-dimensional, interactive examination

An example of limitations:
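As a hedged illustration with assumed counts (the numbers below are hypothetical and not taken from the source), the following sketch shows a rule B → C whose support and confidence both look respectable, even though containing B actually makes C less likely than its baseline rate.

```python
# Hypothetical counts chosen for illustration only; they are assumptions.
N = 1000      # total number of transactions
n_b = 600     # transactions containing B
n_c = 750     # transactions containing C
n_bc = 400    # transactions containing both B and C

support_bc = n_bc / N          # s(B ∪ C): fraction of transactions with both items
confidence_b_c = n_bc / n_b    # c(B -> C): fraction of B-transactions that also contain C
support_c = n_c / N            # s(C): baseline rate of C over all transactions

print(f"s(B ∪ C)  = {support_bc:.3f}")
print(f"c(B -> C) = {confidence_b_c:.3f}")
print(f"s(C)      = {support_c:.3f}")

# The rule looks strong by support and confidence alone, yet its confidence
# is below the baseline rate of C, so B is in fact a negative signal for C.
misleading = confidence_b_c < support_c
print("confidence below baseline:", misleading)
```

Support and confidence never compare the rule against the baseline rate of the consequent, which is the gap that correlation measures are meant to close.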

Interestingness Measures: Lift and $\chi ^2$

Lift
- Measure of dependent/correlated events: lift

  $lift(B,C) = \frac{c(B \to C)}{s(C)} = \frac{s(B \cup C)}{s(B)\times s(C)}$

  where $s(\cdot)$ denotes support and $c(\cdot)$ denotes confidence.
- Lift(B, C) tells how B and C are correlated
  - Lift(B, C) = 1: B and C are independent
  - Lift(B, C) > 1: positively correlated
  - Lift(B, C) < 1: negatively correlated

Example:

Thus, B and C are negatively correlated since lift(B, C) < 1, but B and $\neg C$ are positively correlated since lift(B, $\neg C$) > 1.
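A conclusion of this kind can be checked numerically. The sketch below computes lift from raw counts; the helper name `lift` and the counts are assumptions made for illustration, not values from the source.

```python
def lift(n_xy, n_x, n_y, n_total):
    """lift(X, Y) = s(X ∪ Y) / (s(X) * s(Y)), computed from raw counts."""
    s_xy = n_xy / n_total   # support of X and Y together
    s_x = n_x / n_total     # support of X
    s_y = n_y / n_total     # support of Y
    return s_xy / (s_x * s_y)

# Hypothetical counts (assumptions for illustration).
N = 1000
n_b, n_c, n_bc = 600, 750, 400
n_not_c = N - n_c          # transactions without C
n_b_not_c = n_b - n_bc     # transactions with B but without C

lift_b_c = lift(n_bc, n_b, n_c, N)
lift_b_not_c = lift(n_b_not_c, n_b, n_not_c, N)

print(f"lift(B, C)  = {lift_b_c:.3f}")       # below 1: negatively correlated
print(f"lift(B, ¬C) = {lift_b_not_c:.3f}")   # above 1: positively correlated
```

With these counts lift(B, C) falls below 1 while lift(B, ¬C) rises above 1, matching the pattern described above.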

$\chi ^2$
- Measure to test correlated events

  $\chi ^2 = \sum{\frac{(Observed - Expected)^2}{Expected}}$
- General rules:
  - $\chi ^2 = 0$: independent
  - $\chi ^2 > 0$: correlated, either positively or negatively, so an additional test is needed to determine the direction

Example:
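As a minimal sketch under assumed counts (the 2×2 table and the helper name `chi_square` are hypothetical, not from the source), $\chi ^2$ can be computed by deriving each expected cell count from the row and column totals and summing the squared deviations divided by the expected counts.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical table (assumption): rows are C / ¬C, columns are B / ¬B.
table = [
    [400, 350],   # C:  with B, without B
    [200, 50],    # ¬C: with B, without B
]

chi2 = chi_square(table)
print(f"chi-square = {chi2:.2f}")

# The additional test for direction: compare the observed (B, C) cell with
# its expected count under independence.
expected_bc = sum(table[0]) * (table[0][0] + table[1][0]) / 1000
direction = "negative" if table[0][0] < expected_bc else "positive"
print("direction of correlation between B and C:", direction)
```

A large $\chi ^2$ only signals that B and C are not independent; comparing the observed and expected counts of the (B, C) cell is what tells whether the correlation is positive or negative.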

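Null transactions ($\neg A \cap \neg B$)
- Note: Lift and $\chi ^2$ are not always good measures, because both are sensitive to the number of null transactions, i.e. transactions that contain neither A nor B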

Null Invariance Measures

Null invariance: the value of the measure does not change with the number of null transactions.
Why is null invariance crucial for the analysis of massive transaction data? Because many transactions may contain neither milk nor coffee!
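The effect can be sketched with hypothetical milk/coffee counts. The code below varies only the number of null transactions and compares lift with the Kulczynski measure; Kulczynski is not defined in this section and is used here only as one commonly cited null-invariant measure, and all counts and helper names are assumptions.

```python
def lift(n_ab, n_a, n_b, n_total):
    """lift(A, B) = s(A ∪ B) / (s(A) * s(B)); depends on the total transaction count."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def kulczynski(n_ab, n_a, n_b):
    """Kulc(A, B) = 0.5 * (P(A|B) + P(B|A)); the total transaction count never appears."""
    return 0.5 * (n_ab / n_a + n_ab / n_b)

# Hypothetical counts (assumptions): milk and coffee co-occur rarely.
n_both = 100          # transactions with milk and coffee
n_milk_only = 1000    # milk without coffee
n_coffee_only = 1000  # coffee without milk
n_milk = n_both + n_milk_only
n_coffee = n_both + n_coffee_only

results = {}
for n_null in (1000, 100000):   # transactions with neither milk nor coffee
    n_total = n_both + n_milk_only + n_coffee_only + n_null
    results[n_null] = (
        lift(n_both, n_milk, n_coffee, n_total),
        kulczynski(n_both, n_milk, n_coffee),
    )
    print(f"null={n_null:>6}: lift={results[n_null][0]:.3f}, "
          f"kulc={results[n_null][1]:.3f}")
```

In this sketch lift moves from below 1 to well above 1 purely because more null transactions were added, while the Kulczynski value stays the same, which is the behavior a null-invariant measure is expected to show.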