Pattern Evaluation
@(Pattern Discovery in Data Mining)
This article covers methods for evaluating how sound the patterns and rules produced by pattern mining are.
Limitations of the Support-Confidence Framework
Pattern mining can generate a very large set of patterns/rules, but not all of them are interesting.
Interestingness measures: objective vs. subjective
* Objective interestingness measures
    * Support, confidence, correlation, …
* Subjective interestingness measures: one man's trash could be another man's treasure
    * Query-based: relevant to a user's particular request
    * Against one's knowledge base: unexpectedness, freshness, timeliness
    * Visualization tools: multi-dimensional, interactive examination
An example of limitations:
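A minimal Python sketch of this kind of limitation, using hypothetical counts chosen only for illustration (B = "plays basketball", C = "eats cereal" are assumed labels). It computes the support and confidence of the rule B → C and compares the confidence with how often C occurs overall.

```python
# Hypothetical counts, assumed for illustration only.
# B = "plays basketball", C = "eats cereal".
n_b_and_c = 400          # transactions containing both B and C
n_b_and_not_c = 200      # B without C
n_not_b_and_c = 350      # C without B
n_neither = 50           # neither B nor C

total = n_b_and_c + n_b_and_not_c + n_not_b_and_c + n_neither
n_b = n_b_and_c + n_b_and_not_c
n_c = n_b_and_c + n_not_b_and_c

support_b_to_c = n_b_and_c / total      # s(B -> C) = s(B ∪ C)
confidence_b_to_c = n_b_and_c / n_b     # c(B -> C) = s(B ∪ C) / s(B)
support_c = n_c / total                 # s(C): how often C occurs overall

print(f"s(B->C) = {support_b_to_c:.3f}")
print(f"c(B->C) = {confidence_b_to_c:.3f}")
print(f"s(C)    = {support_c:.3f}")
# The rule looks strong (c = 0.667), yet C occurs even more often overall
# (s(C) = 0.750), so knowing B actually lowers the chance of C.
```

With these counts the rule B → C clears typical support and confidence thresholds, even though buying into B makes C less likely than the base rate. Support and confidence alone cannot reveal that.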
Interestingness Measures: Lift and χ²
Lift
Lift is a measure of dependent/correlated events:
$$lift(B, C) = \frac{c(B \rightarrow C)}{s(C)} = \frac{s(B \cup C)}{s(B) \times s(C)}$$
lift(B, C) indicates how B and C are correlated:
- lift(B, C) = 1: B and C are independent
- lift(B, C) > 1: B and C are positively correlated
- lift(B, C) < 1: B and C are negatively correlated
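A small Python sketch of the lift formula and the three-way interpretation. The function names and the floating-point tolerance `tol` are assumptions made for this illustration; supports are passed as fractions.

```python
def lift(s_bc: float, s_b: float, s_c: float) -> float:
    """lift(B, C) = s(B ∪ C) / (s(B) * s(C)), equivalently c(B -> C) / s(C)."""
    return s_bc / (s_b * s_c)


def interpret_lift(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to independent / positively / negatively correlated."""
    if abs(value - 1.0) <= tol:          # tolerance guards against float rounding
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


print(interpret_lift(lift(0.25, 0.5, 0.5)))  # s(B ∪ C) equals s(B) * s(C)
print(interpret_lift(lift(0.30, 0.5, 0.5)))  # co-occur more often than chance
print(interpret_lift(lift(0.10, 0.5, 0.5)))  # co-occur less often than chance
```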
Example:
Thus, B and C are negatively correlated since lift < 1, but B and ¬C are positively correlated since lift > 1.
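The same conclusion can be reproduced in Python from a 2×2 contingency table. The counts below are hypothetical (assumed for illustration, not taken from the example above); `lift_from_counts` is a hypothetical helper that converts counts to supports before applying the lift formula.

```python
def lift_from_counts(n_xy: int, n_x: int, n_y: int, total: int) -> float:
    """lift(X, Y) = s(X ∪ Y) / (s(X) * s(Y)), with supports derived from counts."""
    return (n_xy / total) / ((n_x / total) * (n_y / total))


# Hypothetical counts, assumed for illustration only.
total = 1000
n_b = 600                        # transactions containing B
n_c = 750                        # transactions containing C
n_b_and_c = 400                  # transactions containing both B and C
n_not_c = total - n_c            # transactions without C
n_b_and_not_c = n_b - n_b_and_c  # transactions with B but not C

lift_bc = lift_from_counts(n_b_and_c, n_b, n_c, total)
lift_b_not_c = lift_from_counts(n_b_and_not_c, n_b, n_not_c, total)
print(f"lift(B, C)  = {lift_bc:.2f}")       # below 1: negative correlation
print(f"lift(B, ¬C) = {lift_b_not_c:.2f}")  # above 1: positive correlation
```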
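χ²
χ² is a measure for testing whether events are correlated:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
General rules:
- χ² = 0: independent
- χ² > 0: correlated, either positively or negatively, so an additional test is needed to tell which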
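A minimal Python sketch of the χ² statistic for a 2×2 table, where each expected count is row total × column total / grand total. Since χ² itself carries no sign, the sketch also includes one simple choice of "additional test" (an assumption, not necessarily the one intended here): compare the observed and expected counts of the (B, C) cell. The table counts are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of observed counts.

    Rows: B, ¬B.  Columns: C, ¬C.  Assumes no row or column total is zero.
    Expected count of a cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def correlation_sign(table):
    """Hypothetical follow-up test: observed vs. expected count of the (B, C) cell."""
    total = sum(map(sum, table))
    expected_bc = sum(table[0]) * (table[0][0] + table[1][0]) / total
    if table[0][0] > expected_bc:
        return "positive"
    if table[0][0] < expected_bc:
        return "negative"
    return "independent"


observed = [[400, 200],   # B:  C, ¬C   (hypothetical counts)
            [350, 50]]    # ¬B: C, ¬C
chi2 = chi_square_2x2(observed)
print(f"chi2 = {chi2:.2f}, correlation: {correlation_sign(observed)}")
```

Here χ² is well above 0, and the observed (B, C) count (400) falls below its expected value (450), so the correlation is negative.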
Example:
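- Null transaction: a transaction that contains neither A nor B, i.e. ¬A ∩ ¬B
- Note: lift and χ² are not always good measures, because their values are affected by the number of null transactions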
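To see the effect of null transactions, the Python sketch below keeps every count involving A or B fixed and varies only the number of null transactions. All counts are hypothetical, and `lift_and_chi2` is a hypothetical helper built from the lift and χ² formulas above.

```python
def lift_and_chi2(n_ab, n_a_only, n_b_only, n_null):
    """Return (lift, chi2) for items A and B from the four cells of a 2x2 table."""
    total = n_ab + n_a_only + n_b_only + n_null
    n_a = n_ab + n_a_only
    n_b = n_ab + n_b_only
    lift = n_ab * total / (n_a * n_b)   # s(A ∪ B) / (s(A) * s(B)), written with counts
    observed = [[n_ab, n_a_only], [n_b_only, n_null]]
    row_totals = [n_a, total - n_a]
    col_totals = [n_b, total - n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return lift, chi2


# Same A/B counts (hypothetical); only the number of null transactions differs.
for n_null in (100, 100_000):
    lift, chi2 = lift_and_chi2(100, 1000, 1000, n_null)
    print(f"null transactions = {n_null:>7}: lift = {lift:.3f}, chi2 = {chi2:.1f}")
```

With 100 null transactions lift is below 1 (negative correlation). With 100,000 it is well above 1 (positive correlation), although nothing about A or B changed.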
Null Invariance Measures
- Null invariance: the value of a measure does not change with the number of null transactions.
- Why is null invariance crucial for the analysis of massive transaction data? Because many transactions may contain neither milk nor coffee!
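A Python sketch of null invariance, using the milk/coffee setting with hypothetical transactions. It uses cosine and Kulczynski as examples of null-invariant measures; these two are not named in the text above and are chosen here only for illustration. Transactions containing only "bread" stand in for null transactions (neither milk nor coffee).

```python
import math


def measures(transactions, a, b):
    """Compute lift, cosine and Kulczynski for items a and b from raw transactions."""
    n = len(transactions)
    s_a = sum(a in t for t in transactions) / n
    s_b = sum(b in t for t in transactions) / n
    s_ab = sum(a in t and b in t for t in transactions) / n   # s(A ∪ B)
    return {
        "lift": s_ab / (s_a * s_b),                # depends on n
        "cosine": s_ab / math.sqrt(s_a * s_b),     # n cancels out
        "kulc": 0.5 * (s_ab / s_a + s_ab / s_b),   # n cancels out
    }


# Hypothetical data: 100 with both items, 1000 with only milk, 1000 with only coffee.
base = [{"milk", "coffee"}] * 100 + [{"milk"}] * 1000 + [{"coffee"}] * 1000
few = measures(base + [{"bread"}] * 100, "milk", "coffee")        # 100 null transactions
many = measures(base + [{"bread"}] * 100_000, "milk", "coffee")   # 100,000 null transactions
for name in ("lift", "cosine", "kulc"):
    print(f"{name:>6}: {few[name]:.4f} -> {many[name]:.4f}")
```

Adding null transactions rescales every support by the same factor. Cosine and Kulczynski are ratios in which that factor cancels, so they stay put while lift swings from below 1 to above 1.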
Comparison of Null-invariance Measures
The Imbalance Ratio (IR) measures how imbalanced two itemsets A and B are in rule implications.
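The text does not give the IR formula. The Python sketch below assumes the definition used in Han, Kamber and Pei's *Data Mining: Concepts and Techniques*, IR(A, B) = |s(A) − s(B)| / (s(A) + s(B) − s(A ∪ B)), where 0 means perfectly balanced and values near 1 mean very imbalanced. The counts in the usage lines are hypothetical.

```python
def imbalance_ratio(s_a, s_b, s_ab):
    """IR(A, B) = |s(A) - s(B)| / (s(A) + s(B) - s(A ∪ B)).

    s_ab is the support of A ∪ B (transactions containing both A and B).
    Supports may be counts or fractions; the common scale cancels, so IR is
    also unaffected by null transactions.
    """
    return abs(s_a - s_b) / (s_a + s_b - s_ab)


print(imbalance_ratio(1100, 1100, 100))     # balanced itemsets: 0.0
print(imbalance_ratio(11000, 1100, 1000))   # A far more frequent than B
```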